Hi Kyle,

Thank you for the suggestion.
If I standardise only the training set, will the classifier work well on the non-standardised test set? Do I have to do something to the test set before applying my classifier?

Fabrizio

> On 29 Apr 2015, at 17:36, Kyle Kastner <kastnerk...@gmail.com> wrote:
>
> Data preprocessing is important. One thing you might want to do is get
> your preprocessing scaling values over the training data - technically
> getting the value over the whole dataset is not valid as that includes
> the test data.
>
> It is hard to say whether 100% is believable or not, but you should
> probably only take scaling over training data.
>
> On Wed, Apr 29, 2015 at 11:13 AM, Fabrizio Fasano <han...@gmail.com> wrote:
>> Dear experts,
>>
>> I’m experiencing a dramatic improvement in cross-validation when data are
>> standardised.
>>
>> I mean accuracy increased from 48% to 100% when I shift from X to X_scaled =
>> preprocessing.scale(X)
>>
>> Does it make sense in your opinion?
>>
>> Thanks a lot for any suggestion,
>>
>> Fabrizio
>>
>>
>> my CODE:
>>
>> import numpy as np
>> from sklearn import preprocessing
>> from sklearn.svm import LinearSVC
>> from sklearn.cross_validation import StratifiedShuffleSplit
>>
>> # 14 features, 16 samples dataset
>> data = np.loadtxt("data.txt")
>> y = data[:, 0]     # first column holds the class labels
>> X = data[:, 1:15]  # remaining 14 columns are the features
>> # scaling is computed over the whole dataset, before any split
>> X_scaled = preprocessing.scale(X)
>>
>> sss = StratifiedShuffleSplit(y, 10000, test_size=0.25, random_state=0)
>> clf = LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
>> cv_scores = []
>>
>> for train_index, test_index in sss:
>>     X_train, X_test = X_scaled[train_index], X_scaled[test_index]
>>     y_train, y_test = y[train_index], y[test_index]
>>     clf.fit(X_train, y_train)
>>     y_pred = clf.predict(X_test)
>>     # fraction of correctly classified test samples in this split
>>     cv_scores.append(np.sum(y_pred == y_test) / float(np.size(y_test)))
>>
>> print("Accuracy ", np.ceil(100*np.mean(cv_scores)), "+/-",
>>       np.ceil(200*np.std(cv_scores)))
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
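
To illustrate the suggestion quoted above (take the scaling values from the training data only), here is a minimal NumPy sketch. The random array is a made-up stand-in for data.txt, and the fixed 12/4 split is a hypothetical choice for illustration only; neither comes from the thread. It suggests an answer to the question about the test set: the test samples are not left unscaled, they are transformed with the mean and standard deviation computed on the training samples.

```python
import numpy as np

# Made-up stand-in for data.txt: 16 samples, 14 features
# (only the shape is taken from the thread; the values are random).
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=3.0, size=(16, 14))

# Hypothetical split for illustration: 12 training samples, 4 test samples.
train_index = np.arange(12)
test_index = np.arange(12, 16)
X_train, X_test = X[train_index], X[test_index]

# Scaling values are computed from the training samples only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std[std == 0] = 1.0  # avoid division by zero for constant features

# The same training-derived values are applied to both sets.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

In the cross-validation loop from the thread, this step would presumably go inside each iteration, on the unscaled X, before clf.fit, so that every split gets its own scaling values. In scikit-learn the same pattern is available through preprocessing.StandardScaler: call fit on the training data, then transform on both the training and the test data.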