Hi all, I am looking into computing an accuracy metric for classification,
in the context of text classification.
The metrics API does not expose accuracy directly, so I tried two things:
1: accuracy = np.mean(pred.ravel() == y_test.ravel())
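(For context, here is a tiny self-contained check of that one-liner; the
arrays are made-up test data, not from my actual experiment:)

```python
import numpy as np

# Toy true labels and predictions, just to sanity-check the one-liner
y_test = np.array([0, 1, 1, 0, 1])
pred = np.array([0, 1, 0, 0, 1])

# Fraction of positions where the prediction matches the truth
accuracy = np.mean(pred.ravel() == y_test.ravel())
print(accuracy)  # 4 of the 5 predictions match -> 0.8
```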
2: I went into the metrics source where recall/precision/f1 are
calculated, and added an analogous calculation for accuracy:
true_neg = np.zeros(n_labels, dtype=np.double)
... ...
for i, label_i in enumerate(labels):
    ... ...
    # true_neg[i] here is added for the calculation of accuracy
    true_neg[i] = np.sum(y_pred[y_true != label_i] != label_i)
    true_pos[i] = np.sum(y_pred[y_true == label_i] == label_i)
    false_pos[i] = np.sum(y_pred[y_true != label_i] == label_i)
    false_neg[i] = np.sum(y_pred[y_true == label_i] != label_i)
    support[i] = np.sum(y_true == label_i)
... ...
accuracy = (true_pos + true_neg) / (true_pos + true_neg + false_pos + false_neg)
... ...
accuracy[(true_pos + true_neg + false_pos + false_neg) == 0.0] = 0.0
... ...
if acc == 1:
    # As defined in fbeta_score() in the sklearn metrics source code
    if accuracy.shape[0] == 2:
        return accuracy[1]
    else:
        return np.average(accuracy, weights=support)
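(To make the per-label counting above concrete, here is a self-contained
sketch of it; the labels and arrays are made-up test data:)

```python
import numpy as np

# Made-up multiclass data for illustration
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

labels = np.unique(y_true)
n_labels = labels.shape[0]
true_pos = np.zeros(n_labels, dtype=np.double)
true_neg = np.zeros(n_labels, dtype=np.double)
false_pos = np.zeros(n_labels, dtype=np.double)
false_neg = np.zeros(n_labels, dtype=np.double)
support = np.zeros(n_labels, dtype=np.double)

for i, label_i in enumerate(labels):
    # One-vs-rest counts for label_i
    true_pos[i] = np.sum(y_pred[y_true == label_i] == label_i)
    true_neg[i] = np.sum(y_pred[y_true != label_i] != label_i)
    false_pos[i] = np.sum(y_pred[y_true != label_i] == label_i)
    false_neg[i] = np.sum(y_pred[y_true == label_i] != label_i)
    support[i] = np.sum(y_true == label_i)

# Per-label "accuracy" in the one-vs-rest sense
accuracy = (true_pos + true_neg) / (true_pos + true_neg + false_pos + false_neg)
print(accuracy)  # [4/6, 5/6, 5/6]
```

Note that each per-label value counts the true negatives of that label, so
with more than two classes these numbers sit above the plain fraction of
correct predictions (here 4/6), which may be related to why approach 2
comes out so high.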
However, neither of these two approaches produces the accuracy
((TP + TN) / (TP + TN + FP + FN)) I expected:
1 gives the same value as recall.
2 is always as high as 93-97%, while recall/precision/f1 range from
75% to 90%.
I am now confused: what is the right way?
Thank you!
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general