> Firstly, balanced accuracy is a different thing, and yes, it should be
> supported.
> Secondly, I am correct in thinking you're talking about multiclass (not
> multilabel).
Sorry for the confusion, and yes, you are right. I think I have mixed up the
terms “average per-class accuracy” and “balanced accuracy” then.
Maybe to clarify, here is a corrected example of what I meant. Given the
confusion matrix

                 predicted label
              [  3,  0,  0]
   true label [  7, 50, 12]
              [  0,  0, 18]

I’d compute the accuracy as the number of correct predictions over all
samples, i.e., (3 + 50 + 18) / 90 ≈ 0.79,
and the “average per-class accuracy” as
(83/90 + 71/90 + 78/90) / 3 = (83 + 71 + 78) / (3 * 90) ≈ 0.86
(I hope I got it right this time!)
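If it helps, the two numbers above can be reproduced with a few lines of NumPy
(a sketch of my own; the variable names are made up):

```python
import numpy as np

# Confusion matrix from the example above: rows = true label, cols = predicted
cm = np.array([[ 3,  0,  0],
               [ 7, 50, 12],
               [ 0,  0, 18]])
n = cm.sum()  # 90 samples in total

# Plain accuracy: correct predictions (the diagonal) over all samples
accuracy = np.trace(cm) / n  # (3 + 50 + 18) / 90 ≈ 0.79

# "Average per-class accuracy": per class, count true positives plus true
# negatives (everything not in that class's row or column, plus its own
# diagonal cell), divide by n, then average over the classes
per_class = []
for k in range(cm.shape[0]):
    tp = cm[k, k]
    tn = n - cm[k, :].sum() - cm[:, k].sum() + tp
    per_class.append((tp + tn) / n)
avg_per_class_accuracy = sum(per_class) / len(per_class)
# -> (83/90 + 71/90 + 78/90) / 3 ≈ 0.86
```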
In any case, I am not finding any literature describing this, and I am also not
proposing to add it to scikit-learn; I just wanted to find out whether this
is implemented or not. Thanks! :)
> On Mar 8, 2016, at 8:29 PM, Joel Nothman <[email protected]> wrote:
>
> Firstly, balanced accuracy is a different thing, and yes, it should be
> supported.
>
> Secondly, I am correct in thinking you're talking about multiclass (not
> multilabel).
>
> However, what you're describing isn't accuracy. It's actually micro-averaged
> recall, except that your dataset is impossible because you're allowing there
> to be fewer predictions than instances. If we assume that we're allowed to
> predict some negative class, that's fine; we can nowadays exclude it from
> micro-averaged recall with the labels parameter to recall_score. (If all
> labels are included in a multiclass problem, micro-averaged recall =
> precision = fscore = accuracy.)
>
> I had assumed you meant binarised accuracy, which would add together both
> true positives and true negatives for each class.
>
> Either way, if there's no literature on this, I think we'd really best not
> support it.
>
> On 9 March 2016 at 11:15, Sebastian Raschka <[email protected]> wrote:
> I haven’t seen this in practice yet, either. A colleague was looking for
> this in scikit-learn recently and asked me whether it is implemented. I
> couldn’t find anything in the docs and was just curious about your
> opinion. However, I just found this entry on Wikipedia:
>
> https://en.wikipedia.org/wiki/Accuracy_and_precision
> > Another useful performance measure is the balanced accuracy[10] which
> > avoids inflated performance estimates on imbalanced datasets. It is defined
> > as the arithmetic mean of sensitivity and specificity, or the average
> > accuracy obtained on either class:
>
> > Am I right in thinking that in the binary case, this is identical to
> > accuracy?
>
>
> I think it would only be equal to the “accuracy” if the class labels are
> uniformly distributed.
>
> > I'm not sure what this metric is getting at.
>
> I have to think about this more, but I think it may be useful for imbalanced
> datasets where you want to emphasize the minority class. E.g., let’s say we
> have a dataset of 120 samples and three class labels 1, 2, 3. And the classes
> are distributed like this
> 10 x 1
> 50 x 2
> 60 x 3
>
> Now, let’s assume we have a model that makes the following predictions
>
> - it gets 0 out of 10 from class 1 right
> - 45 out of 50 from class 2
> - 55 out of 60 from class 3
>
> So, the accuracy would then be computed as
>
> (0 + 45 + 55) / 120 = 0.833
>
> But the “balanced accuracy” would be much lower, because the model did really
> badly on class 1, i.e.,
>
> (0/10 + 45/50 + 55/60) / 3 = 0.61
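This example can be checked directly with scikit-learn, since the quantity
above is the macro-averaged recall (the label arrays below just rebuild the
stated counts):

```python
from sklearn.metrics import accuracy_score, recall_score

# Rebuild the example: 10 / 50 / 60 samples of classes 1 / 2 / 3, with
# 0, 45 and 55 correct predictions; wrong guesses go to an arbitrary
# other class (which class they go to affects neither score here).
y_true = [1] * 10 + [2] * 50 + [3] * 60
y_pred = ([2] * 10                  # all 10 class-1 samples missed
          + [2] * 45 + [3] * 5      # 45 of 50 class-2 samples correct
          + [3] * 55 + [2] * 5)     # 55 of 60 class-3 samples correct

acc = accuracy_score(y_true, y_pred)                 # (0 + 45 + 55) / 120 ≈ 0.833
bal = recall_score(y_true, y_pred, average='macro')  # (0/10 + 45/50 + 55/60) / 3 ≈ 0.61
```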
>
> Hm, if I see this correctly, this is actually very similar to the F1 score.
> But instead of computing the harmonic mean of precision and recall (the true
> positive rate), we compute the arithmetic mean of the per-class true
> positive rates.
>
> > On Mar 8, 2016, at 6:40 PM, Joel Nothman <[email protected]> wrote:
> >
> > I've not seen this metric used (references?). Am I right in thinking that
> > in the binary case, this is identical to accuracy? If I predict all
> > elements to be the majority class, then adding more minority classes into
> > the problem increases my score. I'm not sure what this metric is getting at.
> >
> > On 8 March 2016 at 11:57, Sebastian Raschka <[email protected]> wrote:
> > Hi,
> >
> > I was just wondering why there’s no support for average per-class
> > accuracy in the scorer functions (unless I am overlooking something).
> > E.g., we have 'f1_macro', 'f1_micro', 'f1_samples', and ‘f1_weighted’, but
> > I didn’t see an ‘accuracy_macro’, i.e.,
> > (acc.class_1 + acc.class_2 + … + acc.class_n) / n
> >
> > Would you discourage its usage (in favor of other metrics for imbalanced
> > class problems), or has it simply not been implemented yet?
> >
> > Best,
> > Sebastian
> > ------------------------------------------------------------------------------
> > Transform Data into Opportunity.
> > Accelerate data analysis in your applications with
> > Intel Data Analytics Acceleration Library.
> > Click to learn more.
> > http://makebettercode.com/inteldaal-eval
> > _______________________________________________
> > Scikit-learn-general mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >