Hi Joel and others,

Sorry, but I am still confused. If I am using *stratified shuffle
splitting*, shouldn't I always have *some positives in the testing set* (I
do have positives in the full dataset)? The message says: "The sum of true
positives and false positives (in other words, the total # of positives in
the testing fold) are equal to zero for some labels"
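
For concreteness, here is a minimal sketch of the check I mean (assuming
the 0.14-style sklearn.cross_validation API from my earlier message, and
toy labels with the same counts as my dataset):

    from collections import Counter

    import numpy as np
    from sklearn.cross_validation import StratifiedShuffleSplit

    # Toy labels matching my distribution: 15 x A, 8 x B, 6 x C.
    y = np.array([0] * 15 + [1] * 8 + [2] * 6)

    # Same setup as before: K = 3, 20 repetitions.
    sss = StratifiedShuffleSplit(y, n_iter=20, test_size=1.0 / 3)
    for train_index, test_index in sss:
        # Stratification should place every label in every test fold,
        # so all three labels should show up in each Counter.
        print(Counter(y[test_index]))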

Thanks,

Josh

On Sat, Aug 24, 2013 at 7:13 PM, Josh Wasserstein <ribonucle...@gmail.com> wrote:

> Thanks Joel. That makes sense.
>
> Josh
>
>
> On Sat, Aug 24, 2013 at 5:57 PM, Joel Nothman <
> jnoth...@student.usyd.edu.au> wrote:
>
>> On Sun, Aug 25, 2013 at 3:28 AM, Josh Wasserstein <ribonucle...@gmail.com
>> > wrote:
>>
>>> I am working on a multi-class classification problem with admittedly
>>> very little data. My total dataset has 29 examples with the following
>>> label distribution:
>>>
>>> Label A: 15 examples
>>> Label B: 8 examples
>>> Label C: 6 examples
>>>
>>> For cross-validation I am using stratified repeated K-fold CV with K =
>>> 3 and 20 repetitions:
>>>
>>>     from sklearn.cross_validation import StratifiedShuffleSplit
>>>
>>>     sfs = StratifiedShuffleSplit(y, n_iter=n_iter, test_size=1.0/K)
>>>
>>> The problem comes when I do an SVM grid search, e.g.:
>>>
>>>     clf = GridSearchCV(SVC(C=1, cache_size=5000, probability=True),
>>>                        tuned_parameters,
>>>                        scoring=score_func,
>>>                        verbose=1, n_jobs=1, cv=sfs)
>>>     clf.fit(X, y)
>>>
>>> where score_func is usually one of:
>>> f1_micro
>>> f1_macro
>>> f1_weighted
>>>
>>> I get warning messages like the following:
>>>
>>> > /path/to/python2.7/site-packages/sklearn/metrics/metrics.py:1249:
>>> > UserWarning: The sum of true positives and false positives are equal
>>> > to zero for some labels. Precision is ill defined for those labels
>>> > [0].  The precision and recall are equal to zero for some labels.
>>> > fbeta_score is ill defined for those labels [0 2].
>>> > average=average)
>>>
>>> My questions are:
>>>
>>> *1.* Why does this happen? I thought that F1 scoring would choose an
>>> operating point (i.e. a score threshold) where we get at least *some*
>>> positives (regardless of whether they are FP or TP).
>>>
>>
>> The threshold is chosen by the classifier, not the metric. But what you
>> describe is also often impossible: your classifier might return A and B
>> ahead of C for every example, for instance, or it might validly predict a
>> label that isn't present in your evaluation data.
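>>
>> As a minimal sketch of the first case (hypothetical toy labels, nothing
>> here is specific to your data): a classifier that never predicts some
>> label makes TP + FP zero for it, which is exactly what the warning
>> reports:
>>
>>     import numpy as np
>>     from sklearn.metrics import precision_score
>>
>>     y_true = np.array([0, 0, 1, 1, 2, 2])
>>     # The classifier never predicts label 2, so TP + FP == 0 for it.
>>     y_pred = np.array([0, 0, 1, 1, 0, 1])
>>
>>     # average=None gives per-label precision; label 2 is ill defined,
>>     # and scikit-learn falls back to 0.0 (emitting the UserWarning).
>>     print(precision_score(y_true, y_pred, average=None))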
>>
>> The reason for the warning is that you might argue that a label with no
>> predicted entries should get a precision of 1. You can also argue that it
>> should be 0. Similarly for the recall of a predicted label that does not
>> appear in your test data. This decision makes a big difference to macro
>> F1.
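>>
>> As a toy illustration of how much that choice can move macro F1
>> (hypothetical per-label scores, not computed from your data):
>>
>>     # Per-label F1 on three labels; label 2 had no predicted positives,
>>     # so its F1 is ill defined and must be filled in by convention.
>>     defined = [0.8, 0.6]
>>
>>     print((sum(defined) + 0.0) / 3)  # zero convention -> ~0.47
>>     print((sum(defined) + 1.0) / 3)  # one convention  -> 0.80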
>>
>>> *2.* Can I reliably trust the scores that I get when I get this warning?
>>>
>>
>> Scikit-learn opts for 0 in these cases, so the result is a lower bound on
>> the metric. But a micro-average may be more suitable/stable.
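>>
>> For instance (again hypothetical data), micro-averaging pools counts
>> across labels, so one unpredicted rare label barely moves it, while a
>> macro-average drops sharply:
>>
>>     import numpy as np
>>     from sklearn.metrics import f1_score
>>
>>     y_true = np.array([0] * 5 + [1] * 3 + [2] * 2)
>>     # Label 2 is never predicted; everything else is correct.
>>     y_pred = np.array([0] * 5 + [1] * 3 + [0, 1])
>>
>>     print(f1_score(y_true, y_pred, average='micro'))  # 0.8
>>     print(f1_score(y_true, y_pred, average='macro'))  # ~0.59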