Data preprocessing is important. One thing you might want to do is compute your scaling values (mean and standard deviation) over the training data only. Technically, computing them over the whole dataset is not valid, because that includes the test data.
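As a rough sketch of what that could look like: the scaler goes into a pipeline, so it is refit on the training part of every split and the test samples are only transformed with those training statistics. The random data below is just a stand-in for data.txt (an assumption, as are the 100 splits, chosen to keep it quick), and the import path is `sklearn.model_selection`, which is where newer scikit-learn releases keep `StratifiedShuffleSplit`.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for data.txt (assumption): 16 samples, 14 features, 2 classes.
rng = np.random.RandomState(0)
X = rng.normal(size=(16, 14))
y = np.array([0, 1] * 8)

# StandardScaler is fit inside the pipeline, i.e. on the training rows of
# each split only; the test rows are scaled with the training mean and std.
clf = make_pipeline(
    StandardScaler(),
    LinearSVC(penalty="l1", dual=False, C=1, random_state=1),
)

sss = StratifiedShuffleSplit(n_splits=100, test_size=0.25, random_state=0)
cv_scores = []
for train_index, test_index in sss.split(X, y):
    clf.fit(X[train_index], y[train_index])
    cv_scores.append(clf.score(X[test_index], y[test_index]))

cv_scores = np.array(cv_scores)
print("Accuracy %.2f +/- %.2f" % (cv_scores.mean(), 2 * cv_scores.std()))
```

If the accuracy stays high with the scaling done this way, the improvement is less likely to be an artifact of test data leaking into the preprocessing.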
It is hard to say whether 100% is believable or not, but you should probably fit the scaling on the training data only.

On Wed, Apr 29, 2015 at 11:13 AM, Fabrizio Fasano <han...@gmail.com> wrote:
> Dear experts,
>
> I'm experiencing a dramatic improvement in cross-validation when data are
> standardised.
>
> I mean accuracy increased from 48% to 100% when I shift from X to
> X_scaled = preprocessing.scale(X)
>
> Does it make sense in your opinion?
>
> Thank you a lot for any suggestion,
>
> Fabrizio
>
> my CODE:
>
> import numpy as np
> from sklearn import preprocessing
> from sklearn.svm import LinearSVC
> from sklearn.cross_validation import StratifiedShuffleSplit
>
> # 14 features, 16 samples dataset
> data = np.loadtxt("data.txt")
> y = data[:, 0]
> X = data[:, 1:15]
> X_scaled = preprocessing.scale(X)
>
> sss = StratifiedShuffleSplit(y, 10000, test_size=0.25, random_state=0)
> clf = LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
> cv_scores = []
>
> for train_index, test_index in sss:
>     X_train, X_test = X_scaled[train_index], X_scaled[test_index]
>     y_train, y_test = y[train_index], y[test_index]
>     clf.fit(X_train, y_train)
>     y_pred = clf.predict(X_test)
>     cv_scores.append(np.sum(y_pred == y_test) / float(np.size(y_test)))
>
> print "Accuracy ", np.ceil(100*np.mean(cv_scores)), "+/-", np.ceil(200*np.std(cv_scores))
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general