You mean TP / N, not TP / TN.
And I think the average per-class accuracy does some weird things. Like:
true = [1, 1, 1, 0, 0]
pred = [1, 1, 1, 1, 1]
a.p.c.a = (3 + 3) / 5 / 2
true = [1, 1, 1, 0, 2]
pred = [1, 1, 1, 1, 1]
a.p.c.a = (4 + 4 + 3) / 5 / 3
I don't think that's very useful.
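
For concreteness, a quick numpy sketch of those two cases, taking "average
per-class accuracy" to mean the per-class binarised accuracy (TP + TN) / N,
averaged over classes:

import numpy as np

def avg_per_class_accuracy(y_true, y_pred):
    # binarise per class ("is it class c?"), take that accuracy, then average
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.union1d(y_true, y_pred)
    return np.mean([np.mean((y_true == c) == (y_pred == c)) for c in classes])

print(avg_per_class_accuracy([1, 1, 1, 0, 0], [1, 1, 1, 1, 1]))  # (3 + 3) / 5 / 2 = 0.6
print(avg_per_class_accuracy([1, 1, 1, 0, 2], [1, 1, 1, 1, 1]))  # (4 + 4 + 3) / 5 / 3 = 0.733...

The score goes up in the second case purely because a third class appears,
which is the oddity above.
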
On 9 March 2016 at 13:36, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> > Firstly, balanced accuracy is a different thing, and yes, it should be
> supported.
>
> > Secondly, I am correct in thinking you're talking about multiclass (not
> multilabel).
>
>
> Sorry for the confusion, and yes, you are right. I think I have mixed up
> the terms “average per-class accuracy” and “balanced accuracy” then.
>
> Maybe to clarify, here is a corrected example of what I meant. Given the
> confusion matrix
>
>                predicted label
>
>         [  3,  0,  0]
> true    [  7, 50, 12]
> label   [  0,  0, 18]
>
>
> I’d compute the accuracy as TP / TN = (3 + 50 + 18) / 90 = 0.79
>
> and the “average per-class accuracy” as
>
> (83/90 + 71/90 + 78/90) / 3 = (83 + 71 + 78) / (3 * 90) = 0.86
>
> (I hope I got it right this time!)
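>
> For concreteness, a small numpy sketch of both computations from that
> confusion matrix (per-class accuracy here being the binarised (TP + TN) / N
> for each class):
>
> import numpy as np
>
> cm = np.array([[ 3,  0,  0],
>                [ 7, 50, 12],
>                [ 0,  0, 18]])   # rows: true label, columns: predicted label
> n = float(cm.sum())             # 90 samples in total
>
> accuracy = np.trace(cm) / n     # (3 + 50 + 18) / 90 = 0.79
> tp = np.diag(cm)
> fp = cm.sum(axis=0) - tp
> fn = cm.sum(axis=1) - tp
> tn = n - tp - fp - fn
> avg_per_class_acc = np.mean((tp + tn) / n)   # (83 + 71 + 78) / (3 * 90) = 0.86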
>
> In any case, I am not finding any literature describing this, and I am
> also not proposing to add it to scikit-learn; I just wanted to find out
> whether this is implemented or not. Thanks! :)
>
>
>
> > On Mar 8, 2016, at 8:29 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
> >
> > Firstly, balanced accuracy is a different thing, and yes, it should be
> supported.
> >
> > Secondly, I am correct in thinking you're talking about multiclass (not
> multilabel).
> >
> > However, what you're describing isn't accuracy. It's actually
> micro-averaged recall, except that your dataset is impossible because
> you're allowing there to be fewer predictions than instances. If we assume
> that we're allowed to predict some negative class, that's fine; we can
> nowadays exclude it from micro-averaged recall with the labels parameter to
> recall_score. (If all labels are included in a multiclass problem,
> micro-averaged recall = precision = fscore = accuracy.)
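> >
> > A minimal sketch of that, with made-up labels where class 0 plays the role
> > of the negative class:
> >
> > from sklearn.metrics import recall_score
> >
> > y_true = [0, 1, 1, 2, 2, 2]
> > y_pred = [1, 1, 2, 2, 2, 0]
> >
> > # over all labels, micro-averaged recall is just accuracy: 3/6 = 0.5
> > recall_score(y_true, y_pred, average='micro')
> > # excluding the negative class 0 via the labels parameter: 3/5 = 0.6
> > recall_score(y_true, y_pred, labels=[1, 2], average='micro')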
> >
> > I had assumed you meant binarised accuracy, which would add together
> both true positives and true negatives for each class.
> >
> > Either way, if there's no literature on this, I think we'd really best
> not support it.
> >
> > On 9 March 2016 at 11:15, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
> > I haven’t seen this in practice yet, either. A colleague was looking
> for this in scikit-learn recently, and he asked me whether it is
> implemented. I couldn’t find anything in the docs and was just curious
> about your opinion. However, I just found this entry on Wikipedia:
> >
> > https://en.wikipedia.org/wiki/Accuracy_and_precision
> > > Another useful performance measure is the balanced accuracy[10] which
> avoids inflated performance estimates on imbalanced datasets. It is defined
> as the arithmetic mean of sensitivity and specificity, or the average
> accuracy obtained on either class:
> >
> > > Am I right in thinking that in the binary case, this is identical to
> accuracy?
> >
> >
> > I think it would only be equal to the “accuracy” if the class labels are
> uniformly distributed.
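> >
> > For example (made-up binary data), a majority-class predictor on an
> > imbalanced set gets high accuracy but only 0.5 balanced accuracy:
> >
> > import numpy as np
> >
> > y_true = np.array([0] * 8 + [1] * 2)   # 8 negatives, 2 positives
> > y_pred = np.zeros(10, dtype=int)       # always predict the majority class 0
> >
> > accuracy = np.mean(y_true == y_pred)              # 0.8
> > sensitivity = np.mean(y_pred[y_true == 1] == 1)   # 0.0
> > specificity = np.mean(y_pred[y_true == 0] == 0)   # 1.0
> > balanced_acc = (sensitivity + specificity) / 2    # 0.5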
> >
> > > I'm not sure what this metric is getting at.
> >
> > I have to think about this more, but I think it may be useful for
> imbalanced datasets where you want to emphasize the minority class. E.g.,
> let’s say we have a dataset of 120 samples and three class labels 1, 2, 3.
> And the classes are distributed like this:
> > 10 x 1
> > 50 x 2
> > 60 x 3
> >
> > Now, let’s assume we have a model that makes the following predictions:
> >
> > - it gets 0 out of 10 from class 1 right
> > - 45 out of 50 from class 2
> > - 55 out of 60 from class 3
> >
> > So, the accuracy would then be computed as
> >
> > (0 + 45 + 55) / 120 = 0.833
> >
> > But the “balanced accuracy” would be much lower, because the model did
> really badly on class 1, i.e.,
> >
> > (0/10 + 45/50 + 55/60) / 3 = 0.61
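> >
> > Sketching that with scikit-learn (the predictions are made up to match the
> > counts above; the quantity computed here is just macro-averaged recall):
> >
> > from sklearn.metrics import accuracy_score, recall_score
> >
> > y_true = [1] * 10 + [2] * 50 + [3] * 60
> > y_pred = ([2] * 10               # class 1: 0/10 right (errors placed arbitrarily)
> >           + [2] * 45 + [3] * 5   # class 2: 45/50 right
> >           + [3] * 55 + [2] * 5)  # class 3: 55/60 right
> >
> > accuracy_score(y_true, y_pred)                 # (0 + 45 + 55) / 120 = 0.833
> > recall_score(y_true, y_pred, average='macro')  # (0/10 + 45/50 + 55/60) / 3 = 0.61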
> >
> > Hm, if I see this correctly, this is actually quite similar in spirit to
> the F1 score. But instead of computing the harmonic mean of precision and
> the true positive rate (recall), we compute the arithmetic mean of the true
> positive rate and the true negative rate.
> >
> > > On Mar 8, 2016, at 6:40 PM, Joel Nothman <joel.noth...@gmail.com>
> wrote:
> > >
> > > I've not seen this metric used (references?). Am I right in thinking
> that in the binary case, this is identical to accuracy? If I predict all
> elements to be the majority class, then adding more minority classes into
> the problem increases my score. I'm not sure what this metric is getting at.
> > >
> > > On 8 March 2016 at 11:57, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
> > > Hi,
> > >
> > > I was just wondering why there’s no support for the average per-class
> accuracy in the scorer functions (if I am not overlooking something).
> > > E.g., we have 'f1_macro', 'f1_micro', 'f1_samples', ‘f1_weighted’ but
> I didn’t see an ‘accuracy_macro’, i.e.,
> > > (acc.class_1 + acc.class_2 + … + acc.class_n) / n
> > >
> > > Would you discourage its usage (in favor of other metrics in
> imbalanced class problems), or has it simply not been implemented yet?
> > >
> > > Best,
> > > Sebastian
> > >