Re: [Scikit-learn-general] inconsistencies between predict_proba and predict

chyi-kwei yau Fri, 22 May 2015 08:05:34 -0700

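As far as I know, an SVM predicts the class label directly from its decision
function, without using any probability information.
The per-class probabilities are only an estimate fitted separately on top of
the decision values, so their argmax need not agree with predict.
(See Section 8 of the libsvm paper:
http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf)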
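
A minimal sketch of how one might check this, assuming a recent scikit-learn
(where train_test_split lives in sklearn.model_selection). The variable names
are illustrative, and the number of disagreeing samples may differ from the
original report depending on the library version:

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Same heavily regularized model as in the quoted message.
# random_state only fixes the internal split used for the probability fit.
clf = svm.SVC(kernel="linear", C=0.01, probability=True, random_state=0)
clf.fit(X_train, y_train)

# predict() is based on the SVM decision values, not on predict_proba().
y_pred = clf.predict(X_test)

# predict_proba() returns a separately fitted estimate, one column per class
# in the order given by clf.classes_.
proba = clf.predict_proba(X_test)
y_from_proba = clf.classes_[np.argmax(proba, axis=1)]

# Indices of test samples where the two answers disagree (may be empty).
diff = np.flatnonzero(y_pred != y_from_proba)
print("disagreeing samples:", diff)
print("predict:            ", y_pred[diff])
print("argmax(proba):      ", y_from_proba[diff])
```

If the predicted labels are what matter, predict is the one to rely on; the
probabilities are better treated as a rough confidence estimate.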



On Fri, May 22, 2015 at 3:03 AM zhenjiang zech xu <zhenjiang...@gmail.com>
wrote:

> Hi all,
>
> I tested the following code, and its output shows that predict_proba and
> predict give very different results: even samples with a high probability
> (0.7) of being label 1 are predicted as label 2. I am very surprised. Is this
> problem specific to the algorithm SVC uses to generate probabilities? I
> haven't tested other types of models. Would they have a similar problem?
>
> import numpy as np
> import matplotlib.pyplot as plt
>
> from sklearn import svm, datasets
> from sklearn.model_selection import train_test_split
> from sklearn.metrics import confusion_matrix
>
> # import some data to play with
> iris = datasets.load_iris()
> X = iris.data
> y = iris.target
>
> # Split the data into a training set and a test set
> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>
> # Run classifier, using a model that is too regularized (C too low) to see
> # the impact on the results
> classifier = svm.SVC(kernel='linear', C=0.01, probability=True)
> classifier.fit(X_train, y_train)
> y_pred = classifier.predict(X_test)
> y_pred_p = classifier.predict_proba(X_test)
> y_pred_p_l = np.argmax(y_pred_p, axis=1)
> # indices where predict and the argmax of predict_proba disagree
> diff = np.argwhere(y_pred != y_pred_p_l).ravel()
> y_pred[diff]
> y_pred_p_l[diff]
> y_pred_p[diff,]
>
> Here is the output:
>
> $ y_pred[diff]
> : array([2, 2, 2, 2, 2, 2, 2, 2, 2])
>
> $ y_pred_p_l[diff]
> : array([1, 1, 1, 1, 1, 1, 1, 1, 1])
>
> $ y_pred_p[diff,]
> :
> array([[ 0.01,  0.59,  0.4 ],
>        [ 0.01,  0.57,  0.41],
>        [ 0.02,  0.7 ,  0.28],
>        [ 0.01,  0.67,  0.31],
>        [ 0.01,  0.72,  0.27],
>        [ 0.01,  0.61,  0.38],
>        [ 0.01,  0.59,  0.4 ],
>        [ 0.01,  0.56,  0.43],
>        [ 0.01,  0.56,  0.43]])
>

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general