Hi all, I am looking into computing accuracy metrics for a classifier,
in the context of text classification.

The API does not expose accuracy directly, so I tried two things:

1: accuracy = np.mean(pred.ravel() == y_test.ravel())
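(For reference, here is a minimal self-contained check of what that one-liner computes, on made-up toy data, assuming `pred` and `y_test` are plain 1-D arrays of class labels of the same length:)

```python
import numpy as np

# Hypothetical toy data: 1-D arrays of class labels
y_test = np.array([0, 1, 2, 2, 1, 0])
pred   = np.array([0, 2, 2, 2, 1, 1])

# Element-wise comparison, then the mean of the boolean array:
# this is (number of correct predictions) / (total samples),
# i.e. overall accuracy -- provided both arrays have the same shape.
accuracy = np.mean(pred.ravel() == y_test.ravel())
print(accuracy)  # 4 correct out of 6
```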

2: I went into the metrics code where recall/precision/F1 are
calculated, and added an analogous calculation for accuracy:

    true_neg = np.zeros(n_labels, dtype=np.double)
    ... ...
    for i, label_i in enumerate(labels):
        ... ...
        # true_neg[i] here is added for calculation of accuracy
        true_neg[i] = np.sum(y_pred[y_true != label_i] != label_i)
        true_pos[i] = np.sum(y_pred[y_true == label_i] == label_i)
        false_pos[i] = np.sum(y_pred[y_true != label_i] == label_i)
        false_neg[i] = np.sum(y_pred[y_true == label_i] != label_i)
        support[i] = np.sum(y_true == label_i)
    ... ...
    accuracy = (true_pos + true_neg) / (true_pos + true_neg + false_pos +
false_neg)
    ... ...
    accuracy[(true_pos + true_neg + false_pos + false_neg) == 0.0] = 0.0
    ... ...
    if acc == 1:
        # As defined in fbeta_score() in sklearn-metrics source code
        if accuracy.shape[0] == 2:
            return accuracy[1]
        else:
            return np.average(accuracy, weights=support)

However, neither of these two approaches produces accuracy
( (TP + TN) / (TP + TN + FP + FN) ) as I expected:
1 gives the same value as recall..
2 is always as high as 93.00 - 97.00%, while recall/precision/F1 range
from 75% to 90%.
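
For what it's worth, here is a small self-contained sketch of the per-label (TP + TN) calculation from approach 2, on made-up toy data (labels and arrays are hypothetical). With several classes, the true negatives for each label dominate its count, so the per-label "accuracy" comes out much higher than the plain overall accuracy:

```python
import numpy as np

# Hypothetical toy data with 4 classes, 2 samples each
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 1, 1, 2, 2, 3, 3, 0])

labels = np.unique(y_true)
per_label_acc = np.zeros(labels.shape[0])
for i, label_i in enumerate(labels):
    # Same per-label counts as in the snippet above
    tp = np.sum(y_pred[y_true == label_i] == label_i)
    tn = np.sum(y_pred[y_true != label_i] != label_i)
    fp = np.sum(y_pred[y_true != label_i] == label_i)
    fn = np.sum(y_pred[y_true == label_i] != label_i)
    per_label_acc[i] = (tp + tn) / (tp + tn + fp + fn)

# Plain overall accuracy for comparison
overall = np.mean(y_pred == y_true)

print(per_label_acc)  # each label: 6/8 = 0.75, inflated by true negatives
print(overall)        # 4/8 = 0.5
```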

I am now confused; what is the right way?

Thank you!
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
