I'm having an issue using the prediction probabilities from a sparse SVM: many of the predictions come out identical for my test instances. These probabilities are produced during cross-validation, and when I plot an ROC curve for the folds, the results look very strange, with only a handful of clustered points on the graph. Here is my cross-validation code, which I based on the samples on the scikit-learn website:

    skf = StratifiedKFold(y, n_folds=numfolds)
    for train_index, test_index in skf:
        # split the training and testing sets
        X_train, X_test = X_scaled[train_index], X_scaled[test_index]
        y_train, y_test = y[train_index], y[test_index]

        # train on the subset for this fold
        print 'Training on fold ' + str(fold)
        classifier = svm.SVC(C=C_val, kernel='rbf', gamma=gamma_val, probability=True)
        probas_ = classifier.fit(X_train, y_train).predict_proba(X_test)

        # compute the ROC curve and the area under the curve
        fpr, tpr, thresholds = roc_curve(y_test, probas_[:, 1])
        mean_tpr += interp(mean_fpr, fpr, tpr)
        mean_tpr[0] = 0.0
        roc_auc = auc(fpr, tpr)

I'm just trying to figure out if there's something I'm obviously missing here, since I used this same training set and the same SVM parameters with libsvm and got much better results. When I used libsvm, printed the distances from the hyperplane for the CV test instances, and then plotted the ROC, the curve looked much more like I expected and the AUC was much better.

Any pointers would be greatly appreciated!

Brett Meyer
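
For comparison with the libsvm run described above, here is a minimal sketch of the same cross-validation loop scored with hyperplane distances (decision_function) instead of predict_proba. The synthetic data and the C and gamma values are assumptions for illustration only, not the original setup, and the sketch uses the newer sklearn.model_selection API rather than the older StratifiedKFold(y, n_folds=...) form in the post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed stand-in data; the original X_scaled and y are not available.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

mean_fpr = np.linspace(0, 1, 100)
mean_tpr = np.zeros_like(mean_fpr)
fold_aucs = []

skf = StratifiedKFold(n_splits=5)
for train_index, test_index in skf.split(X_scaled, y):
    # C and gamma are placeholder values, not the poster's C_val / gamma_val.
    clf = SVC(C=1.0, kernel="rbf", gamma=0.05)
    clf.fit(X_scaled[train_index], y[train_index])

    # Signed distance from the separating hyperplane for each test instance.
    scores = clf.decision_function(X_scaled[test_index])

    fpr, tpr, _ = roc_curve(y[test_index], scores)
    mean_tpr += np.interp(mean_fpr, fpr, tpr)
    fold_aucs.append(auc(fpr, tpr))

# Average the interpolated curves and pin the endpoints.
mean_tpr /= skf.get_n_splits()
mean_tpr[0] = 0.0
mean_tpr[-1] = 1.0
mean_auc = auc(mean_fpr, mean_tpr)
```

roc_curve only needs a ranking score, so decision_function output can be passed to it directly, with no probability calibration step in between.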
_______________________________________________ Scikit-learn-general mailing list Scikit-learn-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/scikit-learn-general