Hi Brett, your code looks good.
Can you share a full gist that clearly shows the problem? My first check
would be to repeat the run with X.toarray() to make sure the dense and
sparse code paths agree.

Alex

On Mon, Jun 9, 2014 at 9:36 PM, Brett Meyer <brett.me...@crowdstrike.com> wrote:
> I’m having an issue using the prediction probabilities for sparse SVM, where
> many of the predictions come out the same for my test instances. These
> probabilities are produced during cross validation, and when I plot an ROC
> curve for the folds, the results look very strange, as there are a handful
> of clustered points on the graph. Here is my cross validation code, I based
> it off of the samples on the scikit website:
>
> skf = StratifiedKFold(y, n_folds=numfolds)
>
> for train_index, test_index in skf:
>     #split the training and testing sets
>     X_train, X_test = X_scaled[train_index], X_scaled[test_index]
>     y_train, y_test = y[train_index], y[test_index]
>
>     #train on the subset for this fold
>     print 'Training on fold ' + str(fold)
>     classifier = svm.SVC(C=C_val, kernel='rbf', gamma=gamma_val,
>                          probability=True)
>     probas_ = classifier.fit(X_train, y_train).predict_proba(X_test)
>
>     #Compute ROC curve and area under the curve
>     fpr, tpr, thresholds = roc_curve(y_test, probas_[:, 1])
>     mean_tpr += interp(mean_fpr, fpr, tpr)
>     mean_tpr[0] = 0.0
>     roc_auc = auc(fpr, tpr)
>
> I’m just trying to figure out if there’s something I’m obviously missing
> here, since I used this same training set and SVM parameters with libsvm and
> got much better results. When I used libsvm and printed out the distances
> from the hyperplane for the CV test instances and then plotted the ROC, it
> came out much more like I expected, and a much better AUC. Any pointers
> would be greatly appreciated!
>
> Brett Meyer
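
For reference, the dense / sparse check suggested above could look
roughly like the sketch below. It is only an illustration under stated
assumptions: it targets a current Python 3 / scikit-learn install rather
than the 2014 API in the quoted code, the data is a synthetic stand-in
from make_classification, the C and gamma values are arbitrary, and
dense_sparse_gap is a hypothetical helper name. It compares
decision_function rather than predict_proba, because the probability
estimates come from an internal cross-validation step that adds
randomness of its own.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def dense_sparse_gap(X_sparse, y, **svc_params):
    """Fit the same SVC on a sparse matrix and on its dense copy, and
    return the largest absolute difference in decision values."""
    X_dense = X_sparse.toarray()
    clf_sparse = SVC(**svc_params).fit(X_sparse, y)
    clf_dense = SVC(**svc_params).fit(X_dense, y)
    d_sparse = clf_sparse.decision_function(X_sparse)
    d_dense = clf_dense.decision_function(X_dense)
    return float(np.max(np.abs(d_sparse - d_dense)))


# Synthetic stand-in for the real data (assumption: not Brett's data set).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X[np.abs(X) < 0.5] = 0.0          # zero out small entries to make X sparse
X_sparse = sparse.csr_matrix(X)

# Arbitrary illustrative hyperparameters, not the ones from the thread.
gap = dense_sparse_gap(X_sparse, y, C=1.0, kernel="rbf", gamma=0.1)
print("max |dense - sparse| decision value:", gap)
```

If the printed gap is not close to zero on the real data, that would
point at the sparse code path; if it is, the clustered ROC points more
likely come from somewhere else, such as the probability calibration.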
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general