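As far as I know, an SVM predicts the class label directly from its
decision function, without using any probability information.
The per-class probabilities are only estimates fitted afterwards, so
their argmax can disagree with the label returned by predict.
(See section 8 of the libsvm paper:
http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf)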
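
To check this on the iris example from the original message, here is a
minimal sketch that counts how often the two outputs disagree. It assumes a
recent scikit-learn, where train_test_split lives in sklearn.model_selection.
The random_state=0 passed to SVC is my addition, meant only to make the
internal probability fitting repeatable. The count you get may differ
between versions.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Same data and split as in the original message
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Same heavily regularized linear SVC; random_state is an added assumption
clf = svm.SVC(kernel="linear", C=0.01, probability=True, random_state=0)
clf.fit(X_train, y_train)

# Label chosen by the SVM decision function
y_pred = clf.predict(X_test)

# Label with the highest estimated probability
proba = clf.predict_proba(X_test)
y_from_proba = clf.classes_[np.argmax(proba, axis=1)]

# Count the samples where the two labels differ
n_disagree = int(np.sum(y_pred != y_from_proba))
print("test samples:", len(X_test))
print("predict vs argmax(predict_proba) disagreements:", n_disagree)
```

If you need labels that are consistent with the model's decisions, use
predict. Treat predict_proba as a separate, approximate estimate.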
On Fri, May 22, 2015 at 3:03 AM zhenjiang zech xu <zhenjiang...@gmail.com>
wrote:
> Hi all,
>
> I tested the following code, and its output shows that predict_proba and
> predict give very different results: even samples with a high probability
> (0.7) of being label 1 are predicted as label 2. I am very surprised. Is this
> problem specific to the algorithm SVC uses to generate probabilities? I
> haven't tested other types of models. Would they have a similar problem?
>
> import numpy as np
> import matplotlib.pyplot as plt
>
> from sklearn import svm, datasets
> from sklearn.model_selection import train_test_split
> from sklearn.metrics import confusion_matrix
>
> # import some data to play with
> iris = datasets.load_iris()
> X = iris.data
> y = iris.target
>
> # Split the data into a training set and a test set
> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>
> # Run classifier, using a model that is too regularized (C too low) to see
> # the impact on the results
> classifier = svm.SVC(kernel='linear', C=0.01, probability=True)
> classifier.fit(X_train, y_train)
> y_pred = classifier.predict(X_test)
> y_pred_p = classifier.predict_proba(X_test)
> y_pred_p_l = np.argmax(y_pred_p, axis=1)
> diff = np.argwhere(y_pred != y_pred_p_l).ravel()
> y_pred[diff]
> y_pred_p_l[diff]
> y_pred_p[diff,]
>
> Here is the output:
>
> $ y_pred[diff]
> : array([2, 2, 2, 2, 2, 2, 2, 2, 2])
>
> $ y_pred_p_l[diff]
> : array([1, 1, 1, 1, 1, 1, 1, 1, 1])
>
> $ y_pred_p[diff,]
> :
> array([[ 0.01, 0.59, 0.4 ],
>        [ 0.01, 0.57, 0.41],
>        [ 0.02, 0.7 , 0.28],
>        [ 0.01, 0.67, 0.31],
>        [ 0.01, 0.72, 0.27],
>        [ 0.01, 0.61, 0.38],
>        [ 0.01, 0.59, 0.4 ],
>        [ 0.01, 0.56, 0.43],
>        [ 0.01, 0.56, 0.43]])
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>