Dear experts,

I'm seeing a dramatic improvement in cross-validation accuracy when the data are standardised: accuracy increases from 48% to 100% when I switch from X to X_scaled = preprocessing.scale(X).

Does that make sense in your opinion?

Thank you very much for any suggestions,
Fabrizio

My code:

import numpy as np
from sklearn import preprocessing
from sklearn.svm import LinearSVC
from sklearn.cross_validation import StratifiedShuffleSplit

# Dataset: 16 samples, 14 features; column 0 holds the class label
data = np.loadtxt("data.txt")
y = data[:, 0]
X = data[:, 1:15]

# Standardise every feature using all 16 samples
X_scaled = preprocessing.scale(X)

# 10000 stratified random splits, 25% of the samples held out each time
sss = StratifiedShuffleSplit(y, 10000, test_size=0.25, random_state=0)
clf = LinearSVC(penalty="l1", dual=False, C=1, random_state=1)

cv_scores = []
for train_index, test_index in sss:
    X_train, X_test = X_scaled[train_index], X_scaled[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # Fraction of correctly classified test samples in this split
    cv_scores.append(np.sum(y_pred == y_test) / float(np.size(y_test)))

# Mean accuracy and two standard deviations, both in percent
print "Accuracy ", np.ceil(100*np.mean(cv_scores)), "+/-", np.ceil(200*np.std(cv_scores))

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
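
One possible sanity check on the comparison described above, sketched under assumptions rather than taken from the original post: preprocessing.scale(X) computes each feature's mean and standard deviation from all samples, including those later used for testing, whereas a pipeline refits the scaler on the training part of every split. The snippet below compares raw features, globally scaled features, and per-split scaling. It assumes a recent scikit-learn (sklearn.model_selection), uses a synthetic stand-in for data.txt, and runs 200 splits instead of 10000 to keep it quick; the data, the helper make_clf and all variable names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, scale
from sklearn.svm import LinearSVC

# Synthetic stand-in for data.txt (assumption): 16 samples, 14 features
# whose magnitudes span several orders, two balanced classes.
rng = np.random.RandomState(0)
y = np.repeat([0, 1], 8)
X = rng.normal(size=(16, 14)) * np.logspace(-2, 3, 14)
X[:, 0] += 0.02 * y  # class signal placed in the smallest-scale feature

cv = StratifiedShuffleSplit(n_splits=200, test_size=0.25, random_state=0)


def make_clf():
    # Same estimator settings as in the post
    return LinearSVC(penalty="l1", dual=False, C=1, random_state=1)


# 1) Raw features
raw = cross_val_score(make_clf(), X, y, cv=cv)
# 2) Features scaled once, using statistics from all 16 samples
global_scaled = cross_val_score(make_clf(), scale(X), y, cv=cv)
# 3) Scaler refit on the training samples of each split
per_split = cross_val_score(
    make_pipeline(StandardScaler(), make_clf()), X, y, cv=cv
)

for name, scores in [("raw", raw),
                     ("scaled on all data", global_scaled),
                     ("scaled per split", per_split)]:
    print("%-20s %.2f +/- %.2f" % (name, scores.mean(), 2 * scores.std()))
```

If the second and third variants score about the same, the gain would appear to come from the scaling itself rather than from test samples contributing to the scaling statistics.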