2015-05-22 9:01 GMT+02:00 zhenjiang zech xu <zhenjiang...@gmail.com>:

> I tested the following code and its outputs show that predict_proba and
> predict give very different results, even for samples with a high
> probability (0.7) of being label 1 that are predicted as label 1. I am
> very surprised. Is this problem specific to the SVC algorithm used to
> generate the probabilities? I haven't tested other types of models;
> would they have a similar problem?
No, other models shouldn't have this problem; it's SVC-specific. It's in the docstring for predict_proba, in fact:

    The probability model is created using cross validation, so the
    results can be slightly different than those obtained by predict.
    Also, it will produce meaningless results on very small datasets.

(I'm not really sure how "slightly" the results differ; what the docstring is trying to say is that for large enough training sets, the results should *usually* be consistent.)

The thing is that SVC's probabilities are derived from its predictions plus a Platt scaling model [1]. In all other probability models (AFAIK), the prediction is instead derived from the probabilities by a simple argmax.

[1] https://en.wikipedia.org/wiki/Platt_scaling

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
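[Editor's note: a minimal sketch of the discrepancy described above, not code from the original thread. It assumes scikit-learn is installed and uses a synthetic dataset; the exact number of disagreements will vary with the data and parameters.]

```python
# Compare SVC.predict with the argmax of SVC.predict_proba.
# Because SVC's probabilities come from an internal cross-validated
# Platt scaling model, the two can disagree on some samples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary classification data (assumed setup, for illustration only)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(probability=True, random_state=0).fit(X, y)

pred = clf.predict(X)
# predict_proba's columns are ordered like clf.classes_,
# so map the argmax back to actual class labels
proba_pred = clf.classes_[np.argmax(clf.predict_proba(X), axis=1)]

n_disagree = int(np.sum(pred != proba_pred))
print(f"{n_disagree} of {len(y)} predictions disagree with argmax(predict_proba)")
```

For most estimators (e.g. LogisticRegression, RandomForestClassifier) the count would be exactly zero by construction, since predict is defined as the argmax of predict_proba; for SVC it need not be.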