Whoops, my previous reply got munged up, so I'm resubmitting it. Please
ignore my previous messed-up email.
 
> 2012/10/30 Afik Cohen <afik@...>:
> >> Do you know what they are doing? I would expect they just do a soft-max.
> > I don't. :) But according to the LIBLINEAR FAQ: "If you really would like to
> > have probability outputs for SVM in LIBLINEAR, you can consider using the
> > simple probability model of logistic regression. Simply modify the following
> > subroutine in linear.cpp." So I assume that it's using the same method
> > logistic regression is using to determine probability estimates.
> 
> Judging from your code, it is, which is why I suggested copying over
> the new predict_proba. It does exactly the same thing as Liblinear's
> code for logistic regression probability predictions.
That is good to know; we'll try that. However, now you've got us thinking that
maybe this method is unreliable, and we're somewhat less confident about using it
in production code...
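
In the meantime, here is roughly what we understood the suggestion to be. This is
just a sketch on our end: squashing decision_function margins through a logistic
sigmoid and renormalising. The helper name and the exact formula are our
assumptions, not the actual scikit-learn/Liblinear code:

import numpy as np

def margin_probabilities(clf, X):
    # Our sketch of turning LinearSVC margins into pseudo-probabilities.
    scores = clf.decision_function(X)
    if scores.ndim == 1:
        # binary case: one margin per sample, apply the logistic sigmoid
        pos = 1.0 / (1.0 + np.exp(-scores))
        return np.column_stack([1.0 - pos, pos])
    # multiclass: squash each margin, then renormalise so rows sum to one
    squashed = 1.0 / (1.0 + np.exp(-scores))
    return squashed / squashed.sum(axis=1)[:, np.newaxis]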

> 
> > I think it's worth doing, because as I mentioned, we seemed to be getting
> > meaningful results. We also compared probability outputs from LinearSVC()
> > and OneVsRestClassifier(LinearSVC()); in the former, we would get N
> > probabilities that the input belonged to each class, and in the latter we
> > would get N "IS" and "IS NOT" pairs, showing for each class the probability
> > that the input was closer to that class or to the rest of the classes.
> > Again, these probability estimates did not seem like meaningless noise!
> 
> Those "probabilities" are guaranteed to be >.5 for positive predictions and
> <=.5 for negative ones in binary classification. They will always sum to one
> for each sample in multiclass classification.  That's still the case with the
> hack I suggested. However, you could also have used the output from
> decision_function; that produces a more appropriate confidence score for
> linear SVMs. It's not a number between zero and one, though, but either
> positive or negative and farther away from zero as the confidence increases.
Yes, we have noticed that the estimates are >.5 for positive predictions and
<=.5 for negative ones. Here is some example output from
LinearSVC.predict_proba():

With a single classifier (a single LinearSVC() instance trained on all
classes):
0.00091845721710660235, 0.00091952391997811766,
0.00092169857946579239, 0.00092723763293324924, 0.00093133854468835234,
0.001014397289942081, 0.0010818874768571949, 0.0018864265035381381,
0.00091323156582493283, 0.00091434117232201174, 0.00091437125286051744,
0.00091637654884632082, ...

Here, 0.0018864265035381381 is the highest probability, so its class is the one
chosen. This happens to be the correct prediction.
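
(For completeness, the selection on our side is nothing fancier than taking the
largest entry; pick_class is just a helper name of ours:)

import numpy as np

def pick_class(clf, probs):
    # probs: the per-class probability vector from the patched predict_proba()
    return clf.classes_[np.argmax(probs)]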

With a one-vs-rest strategy, fitting with OneVsRestClassifier(LinearSVC()):

[array([[ 0.74510559,  0.25489441]]), array([[ 0.43768196,  0.56231804]]),
array([[ 0.73616065,  0.26383935]]), array([[ 0.73083986,  0.26916014]]),
array([[ 0.73569696,  0.26430304]]), array([[ 0.73282635,  0.26717365]]),
array([[ 0.72934341,  0.27065659]]), ...]

You can interpret this as each per-class classifier's output, shown as array([[
"IS_NOT", "IS" ]]). The array with the largest "IS" probability is array([[
0.43768196,  0.56231804]]), so that's the class that was picked.

Testing this on an email we know does not belong to any existing class produces
something like this:

[array([[ 0.73209442,  0.26790558]]), array([[ 0.69946787,  0.30053213]]),
array([[ 0.73971788,  0.26028212]]), array([[ 0.73583213,  0.26416787]]),
array([[ 0.73277501,  0.26722499]])]

The highest "IS" percentage is array([[ 0.69946787, 0.30053213]]), so that is
the returned class, but note the low percentage. This could mean that this is a
reliable way of establishing confidence thresholds, i.e. determining a point
below which a match returned could be considered 'low confidence' and thus
probably not to be trusted.
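
If we go that route, the thresholding itself would look something like the sketch
below. The cut-off value and the helper name are our own, per_class_probs is the
list of [IS_NOT, IS] arrays shown above, and the threshold would need tuning on
held-out data:

import numpy as np

REJECT_THRESHOLD = 0.5   # illustrative cut-off, not a tuned value

def predict_with_reject(per_class_probs, labels):
    # per_class_probs: list of (1, 2) arrays shaped [[IS_NOT, IS]], one per class
    is_probs = np.array([p[0, 1] for p in per_class_probs])
    best = np.argmax(is_probs)
    if is_probs[best] < REJECT_THRESHOLD:
        return None          # low confidence: treat as "no existing class"
    return labels[best]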

> 
> I don't get the remark about OneVsRestClassifier. What do you mean by "is" and
> "is not" pairs? What does your target vector y look like?

I should have been clearer; I've seen this nomenclature in other machine learning
papers. "IS" is the probability that the input belongs to the class, whereas
"IS NOT" is the probability that it doesn't.
> 
> >> Do you have strong reasons not to use logistic regression?
> > Correct me if I've misunderstood, but regression is meant for fitting to a
> > continuous variable or something, not classifying inputs to discrete
> > classes, right? We're classifying emails into ~1200 distinct classes, so
> > Logistic Regression is meaningless for us (in fact, when we tried it, it
> > achieved a hilarious 48% cross-validated k=3 accuracy. LinearSVC achieves
> > 95% accuracy.)
> 
> Actually, logistic regression *is* a classification model, it just has a very
> unfortunate name ("In the terminology of statistics, this model is known as
> logistic regression, although it should be emphasized that this is a model for
> classification rather than regression" -- C.M.  Bishop, Pattern Recognition
> and Machine Learning, p. 205).
> 
> 48% accuracy is extreme compared to 95%, though. How were you applying
> logistic regression?
> 
Hah, thanks for the explanation :) But yes, the accuracy was terrible. In fact,
we just ran another cross-validated run (k=3) with our current data and got these
results:

Training LogisticRegression(C=1.0, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1, penalty=l2, tol=0.0001) 
Running Cross-Validated accuracy testing with 3 folds.
done [4276.551s]
Results: 
Accuracy: 0.639312 (+/- 0.003300)
Training time:  4276.55051398 
Input Data: (10480, 405562) 
Labels:  1144

As you can see, roughly 64% accuracy with 10480 document vectors of 405562
features each. Pretty awful compared to LinearSVC, which gives us upwards of 95%.
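
For reference, the comparison boils down to something like this. It's a sketch
only: the TF-IDF feature extraction is a stand-in for our real pipeline,
documents/labels are placeholders, and on older scikit-learn releases
cross_val_score lives in sklearn.cross_validation rather than
sklearn.model_selection:

from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score

def compare_classifiers(documents, labels):
    # documents: list of email texts; labels: parallel list of class names
    X = TfidfVectorizer().fit_transform(documents)
    for clf in (LogisticRegression(C=1.0), LinearSVC()):
        scores = cross_val_score(clf, X, labels, cv=3)   # 3-fold CV, as above
        print(clf.__class__.__name__, scores.mean(), scores.std())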

Afik


