On Sun, Aug 25, 2013 at 3:28 AM, Josh Wasserstein <ribonucle...@gmail.com> wrote:
> I am working on a multi-class classification problem with admittedly
> very little data. My total dataset has 29 examples with the following label
> distribution:
>
> Label A: 15 examples
> Label B: 8 examples
> Label C: 6 examples
>
> For cross-validation I am using stratified repeated K-fold CV, with K = 3
> and 20 repetitions:
> sfs = StratifiedShuffleSplit(y, n_iter=n_iter, test_size=1.0/K)
>
> The problem comes when I do a SVM grid search, e.g.:
>
> clf = GridSearchCV(SVC(C=1, cache_size=5000, probability=True),
>                    tuned_parameters,
>                    scoring=score_func,
>                    verbose=1, n_jobs=1, cv=sfs)
> clf.fit(X, y)
>
> where score_func is usually one of:
> f1_micro
> f1_macro
> f1_weighted
>
> I get warning messages like the following:
>
> > /path/to/python2.7/site-packages/sklearn/metrics/metrics.py:1249:
> > UserWarning: The sum of true positives and false positives are equal
> > to zero for some labels. Precision is ill defined for those labels
> > [0]. The precision and recall are equal to zero for some labels.
> > fbeta_score is ill defined for those labels [0 2].
> > average=average)
>
> My questions are:
>
> *1.* Why does this happen? I thought that F1 scoring would choose an
> operating point (i.e. a score threshold) where we get at least *some*
> positives (regardless of whether they are FP or TP).
>
The threshold is chosen by the classifier, not the metric. And what you
describe is often impossible anyway: your classifier might rank A and B
ahead of C for every test sample, for instance, so C is never predicted, or
it might validly predict a label that isn't present in your evaluation data.
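
To see why this is so easy to hit with your class counts, here is a quick
sketch (illustrative only, with hypothetical labels, and using the same
0.13/0.14-era StratifiedShuffleSplit constructor as your snippet; newer
releases moved it to sklearn.model_selection with a different signature). It
just prints how many examples of each class land in each test fold:

import numpy as np
from sklearn.cross_validation import StratifiedShuffleSplit

# Hypothetical labels with the same counts as your dataset:
# 15 x A (0), 8 x B (1), 6 x C (2).
y = np.array([0] * 15 + [1] * 8 + [2] * 6)

K = 3
sfs = StratifiedShuffleSplit(y, n_iter=5, test_size=1.0 / K, random_state=0)

for train_idx, test_idx in sfs:
    # Each test fold has only ~10 samples, of which only ~2 are class C,
    # so a classifier fitted on the rest can easily predict no C at all.
    print(np.bincount(y[test_idx], minlength=3))

Whenever that happens for a fold, the precision of C on that fold is 0/0 and
you get the warning.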
The reason for the warning is that one could argue that a label with zero
predicted entries should get a precision of 1, and one could equally argue
it should be 0. The same goes for the recall of a predicted label that does
not appear in your test data. This choice makes a big difference to macro
F1.
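
You can see exactly what scikit-learn does with a tiny made-up example
(hypothetical labels and predictions, nothing to do with your folds):
average=None gives the per-label values, and the label that is never
predicted contributes a 0 to the macro average.

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up predictions: label 2 occurs in y_true but is never predicted,
# so its precision is 0/0 and scikit-learn substitutes 0 (with a warning).
y_true = np.array([0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 0, 1])

print(precision_score(y_true, y_pred, average=None))  # [ 0.667  0.5    0.   ]
print(recall_score(y_true, y_pred, average=None))     # [ 0.667  1.     0.   ]
print(f1_score(y_true, y_pred, average=None))         # [ 0.667  0.667  0.   ]
print(f1_score(y_true, y_pred, average='macro'))      # ~0.444, pulled down by label 2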
> *2.* Can I reliably trust the scores that I get when I get this warning?
>
Scikit-learn opts for 0 in these cases, so the result is a lower bound on
the metric. But a micro-average may be more suitable/stable.
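
As a rough illustration with the same made-up predictions as above,
micro-averaging pools the TP/FP/FN counts over all labels, so a single
never-predicted rare label does not zero out a third of the score:

import numpy as np
from sklearn.metrics import f1_score

# Same made-up predictions: label 2 is never predicted.
y_true = np.array([0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 0, 1])

# Macro: unweighted mean of per-label F1, including the 0 for label 2.
print(f1_score(y_true, y_pred, average='macro'))  # ~0.44
# Micro: computed from the pooled counts, so it stays well defined.
print(f1_score(y_true, y_pred, average='micro'))  # ~0.57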