[Scikit-learn-general] bias in svm.LinearSVC classification accuracy in very small data sample?

Fabrizio Fasano Fri, 24 Apr 2015 09:54:12 -0700

Dear community,

I'm performing a binary classification on a very small data set:


Details:
- binary classification (y = 0, 1)
- small data set (16 samples)
- large feature set (112 features)
- balanced labels (y=0 and y=1 occur 8 times each)
- linear SVM classifier (svm.LinearSVC)

The accuracy was 100% when tested on the true y. But for every random
assignment of the 16 values to y that I tried (keeping 0 and 1 equally
populated), the accuracy is still >60%. I tested this by cross-validation
with 25% test / 75% train, using several CV schemes, not only the
StratifiedShuffleSplit one in the code below.
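
The random-label check described above can be sketched as follows. This is
only a minimal sketch, not the original analysis: pure-noise features stand in
for the real 16 x 112 data (an assumption), the helper name
random_label_scores is hypothetical, and it assumes a recent scikit-learn in
which sklearn.model_selection has replaced sklearn.cross_validation.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.svm import LinearSVC

# Assumption: pure-noise features stand in for the real 16 x 112 data set.
rng = np.random.RandomState(0)
X = rng.randn(16, 112)
y_true = np.array([0] * 8 + [1] * 8)

clf = LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
cv = StratifiedShuffleSplit(n_splits=100, test_size=0.25, random_state=0)


def random_label_scores(X, y, n_permutations=20):
    """Mean CV accuracy for each random reassignment of the balanced labels."""
    scores = []
    for _ in range(n_permutations):
        y_perm = rng.permutation(y)  # shuffling keeps 8 zeros and 8 ones
        scores.append(cross_val_score(clf, X, y_perm, cv=cv).mean())
    return np.array(scores)


perm_scores = random_label_scores(X, y_true)
print("mean accuracy over random labelings: %.2f" % perm_scores.mean())
print("min / max: %.2f / %.2f" % (perm_scores.min(), perm_scores.max()))
```

If the whole procedure is unbiased, the mean over random labelings should sit
near 50%. scikit-learn also ships
sklearn.model_selection.permutation_test_score, which runs the same kind of
check and reports a p-value.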

I understand that "small sample, large feature set" is a bad thing, but how
can the procedure always return a good result?

Thanks a lot for your help,

Fabrizio

CODE:

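# Assumes X_scaled (scaled 16 x 112 feature matrix) and y (16 labels) are
# NumPy arrays defined earlier; they are not shown in this post.
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import LinearSVC

print("\nWhen a stratified shuffle split is applied")
# 100 random splits, 25% test / 75% train, class balance preserved
sss = StratifiedShuffleSplit(n_splits=100, test_size=0.25, random_state=0)
print("shuffled permutations:")
print(sss)
clf = LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
cv_scores = []

for train_index, test_index in sss.split(X_scaled, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X_scaled[train_index], X_scaled[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print("true label:", y_test)
    print("predicted label:", y_pred)
    # Fraction of correctly classified test samples in this split
    cv_scores.append(np.mean(y_pred == y_test))

# Mean accuracy in percent, +/- two standard deviations (both rounded up)
print("Accuracy", np.ceil(100 * np.mean(cv_scores)), "+/-",
      np.ceil(200 * np.std(cv_scores)))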
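
The manual loop above is roughly what cross_val_score does when given the same
splitter. The following is a minimal sketch under stated assumptions: random
noise (X_demo, y_demo, both hypothetical names) stands in for X_scaled and y,
which are not shown in the post, and a recent scikit-learn with
sklearn.model_selection is assumed.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.svm import LinearSVC

# Assumption: random noise stands in for X_scaled and y, which are not shown.
rng = np.random.RandomState(0)
X_demo = rng.randn(16, 112)
y_demo = np.array([0] * 8 + [1] * 8)

sss = StratifiedShuffleSplit(n_splits=100, test_size=0.25, random_state=0)
clf = LinearSVC(penalty="l1", dual=False, C=1, random_state=1)

# One accuracy value per split, as in the explicit loop.
scores = cross_val_score(clf, X_demo, y_demo, cv=sss, scoring="accuracy")

mean_pct = 100 * scores.mean()
spread_pct = 200 * scores.std()  # two standard deviations, in percent
print("Accuracy %.0f +/- %.0f" % (mean_pct, spread_pct))
```

With 4 test samples per split, each per-split accuracy can only be 0, 0.25,
0.5, 0.75 or 1, so the spread across splits is necessarily coarse.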

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
