2015-05-22 9:01 GMT+02:00 zhenjiang zech xu <zhenjiang...@gmail.com>:
> I tested the following code and its outputs show that predict_proba and predict
> give very different results; even samples with a high probability (0.7) of being
> label 1 are predicted as label 1. I am very surprised. Is this problem specific
> to the algorithm SVC uses to generate probabilities? I haven't tested other
> types of models. Would they have a similar problem?

No, other models shouldn't have this problem; it's SVC-specific. It's in the
docstring for predict_proba, in fact:

        The probability model is created using cross validation, so
        the results can be slightly different than those obtained by
        predict. Also, it will produce meaningless results on very small
        datasets.

(I'm not really sure how "slightly" the results differ; what the docstring
is trying to say is that for large enough training sets, the results
should *usually* be consistent.)

The thing is that SVC's probabilities come from a separate Platt scaling
model [1], fitted on its decision values using internal cross-validation,
while its predictions come straight from the decision function. In all
other probabilistic models (AFAIK), the prediction is instead derived from
the probabilities by a simple argmax, so the two can't disagree.


[1] https://en.wikipedia.org/wiki/Platt_scaling
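
For example, a quick sketch along these lines (using a made-up toy dataset
from make_classification, not the original poster's code) may show a handful
of disagreements for SVC, but is guaranteed to show none for
LogisticRegression:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

svc = SVC(probability=True, random_state=0).fit(X, y)
# predict() uses the decision function directly; predict_proba() uses a
# separate Platt scaling model fitted by internal cross-validation.
svc_pred = svc.predict(X)
svc_from_proba = svc.classes_[np.argmax(svc.predict_proba(X), axis=1)]
print("SVC disagreements:", np.sum(svc_pred != svc_from_proba))

lr = LogisticRegression().fit(X, y)
# Here predict() is just the argmax over predict_proba(), so the two
# can never disagree.
lr_pred = lr.predict(X)
lr_from_proba = lr.classes_[np.argmax(lr.predict_proba(X), axis=1)]
print("LogisticRegression disagreements:", np.sum(lr_pred != lr_from_proba))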
