Hi all,

I tested the following code, and its output shows that predict_proba and
predict give very different results: even samples with a high probability
(0.7) of being label 1 are predicted as label 2. I am very surprised. Is this
problem specific to the algorithm SVC uses to generate probabilities? I
haven't tested other types of models. Would they have a similar problem?

import numpy as np
import matplotlib.pyplot as plt

from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Run classifier, using a model that is too regularized (C too low) to see
# the impact on the results
classifier = svm.SVC(kernel='linear', C=0.01, probability=True)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
y_pred_p = classifier.predict_proba(X_test)
y_pred_p_l = np.argmax(y_pred_p, axis=1)
# Indices where predict() disagrees with the argmax of predict_proba()
diff = np.argwhere(y_pred != y_pred_p_l).ravel()
y_pred[diff]
y_pred_p_l[diff]
y_pred_p[diff,]

Here is the output:

$ y_pred[diff]
: array([2, 2, 2, 2, 2, 2, 2, 2, 2])

$ y_pred_p_l[diff]
: array([1, 1, 1, 1, 1, 1, 1, 1, 1])

$ y_pred_p[diff,]
:
array([[ 0.01,  0.59,  0.4 ],
       [ 0.01,  0.57,  0.41],
       [ 0.02,  0.7 ,  0.28],
       [ 0.01,  0.67,  0.31],
       [ 0.01,  0.72,  0.27],
       [ 0.01,  0.61,  0.38],
       [ 0.01,  0.59,  0.4 ],
       [ 0.01,  0.56,  0.43],
       [ 0.01,  0.56,  0.43]])
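
One way to check whether other model types show the same mismatch is to run
the same comparison on a second estimator. The following is only a minimal
sketch: the helper name count_disagreements and the choice of
LogisticRegression as the comparison model are assumptions for illustration,
not something tested in the code above. Logistic regression derives both
predict and predict_proba from the same decision scores, so no disagreements
would be expected there, whereas SVC with probability=True obtains its
probabilities from a separate calibration step (Platt scaling fitted with
internal cross-validation), which can disagree with predict.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def count_disagreements(model, X_train, y_train, X_test):
    """Count test samples where predict() differs from the argmax of predict_proba().

    Hypothetical helper, introduced here for illustration only.
    """
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    proba = model.predict_proba(X_test)
    # Columns of predict_proba follow model.classes_, so map indices to labels.
    y_from_proba = model.classes_[np.argmax(proba, axis=1)]
    return int(np.sum(y_pred != y_from_proba))


iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Same SVC settings as above; LogisticRegression is an assumed comparison model.
models = {
    "SVC": svm.SVC(kernel="linear", C=0.01, probability=True),
    "LogisticRegression": LogisticRegression(C=0.01, max_iter=1000),
}

results = {}
for name, model in models.items():
    results[name] = count_disagreements(model, X_train, y_train, X_test)
    print(name, results[name])
```

The SVC count may vary between runs and library versions, because the
calibration uses an internal cross-validation split, so no specific number
is claimed here.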
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
