I am working on a multi-class classification problem with admittedly very
little data. My total dataset has 29 examples with the following label
distribution:

Label A: 15 examples
Label B: 8 examples
Label C: 6 examples

For cross-validation I am using repeated stratified splits (via
StratifiedShuffleSplit), with test_size = 1/K for K = 3 and 20 repetitions:

    sfs = StratifiedShuffleSplit(y, n_iter=n_iter, test_size=1.0/K)  # n_iter = 20

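With 29 examples and test_size = 1/3, each test fold holds only about 10 examples, roughly 2 of which are Label C. A quick sketch with the current scikit-learn API (the older API in the snippet above passed y to the constructor; synthetic labels matching the distribution stated earlier) shows the fold composition:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Labels matching the distribution above: 15 A, 8 B, 6 C (encoded 0/1/2)
y = np.array([0] * 15 + [1] * 8 + [2] * 6)
X = np.zeros((len(y), 1))  # placeholder features; only y matters for the split

K = 3
sss = StratifiedShuffleSplit(n_splits=20, test_size=1.0 / K, random_state=0)

test_counts = []
for _, test_idx in sss.split(X, y):
    # Per-class counts in each test fold. The rarest class (C) contributes
    # only ~2 of the ~10 test examples, so any fold where the classifier
    # never predicts C is enough to trigger the "ill defined" warning.
    test_counts.append(np.bincount(y[test_idx], minlength=3))
```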
The problem comes when I run an SVM grid search, e.g.:

    clf = GridSearchCV(SVC(C=1, cache_size=5000, probability=True),
                       tuned_parameters,
                       scoring=score_func,
                       verbose=1, n_jobs=1, cv=sfs)
    clf.fit(X, y)

where score_func is usually one of f1_micro, f1_macro, or f1_weighted.
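For reference, a self-contained sketch of roughly the same setup in the current scikit-learn API (the parameter grid and the synthetic X are hypothetical, just for illustration):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# Synthetic stand-in for the 29-example dataset described above
y = np.array([0] * 15 + [1] * 8 + [2] * 6)
rng = np.random.RandomState(0)
X = rng.rand(len(y), 4)

# Hypothetical parameter grid, just for illustration
tuned_parameters = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1.0]}

cv = StratifiedShuffleSplit(n_splits=20, test_size=1.0 / 3, random_state=0)
clf = GridSearchCV(SVC(cache_size=500), tuned_parameters,
                   scoring="f1_macro", cv=cv, n_jobs=1)
# With data this small, some folds typically leave a class unpredicted
# and trigger the warning during scoring.
clf.fit(X, y)
```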

I get warning messages like the following:

> /path/to/python2.7/site-packages/sklearn/metrics/metrics.py:1249:
> UserWarning: The sum of true positives and false positives are equal
> to zero for some labels. Precision is ill defined for those labels
> [0].  The precision and recall are equal to zero for some labels.
> fbeta_score is ill defined for those labels [0 2].
> average=average)

My questions are:

1. Why does this happen? I thought that F1 scoring would choose an
operating point (i.e., a score threshold) at which we get at least *some*
positives (regardless of whether they are FPs or TPs).

2. Can I reliably trust the scores I get when this warning appears?

Josh
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
