Re: [Scikit-learn-general] Dramatic improvement by standardizing data?

josef.pktd Wed, 29 Apr 2015 08:29:07 -0700

On Wed, Apr 29, 2015 at 11:13 AM, Fabrizio Fasano <han...@gmail.com> wrote:


> Dear experts,
>
> I’m seeing a dramatic improvement in cross-validation accuracy when the data
> are standardised.
>
> Specifically, accuracy increased from 48% to 100% when I switched from X to
> X_scaled = preprocessing.scale(X).
>
> Does this make sense in your opinion?
>
> Thanks a lot for any suggestions,
>
> Fabrizio
>


Related question: does scikit-learn do any autoscaling of the penalties?

I'm just looking into the scaling of penalties for statsmodels. If neither
the data nor the penalties are scaled, the penalized estimator might not
make much sense unless the user has already put the data on an appropriate
scale.
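
To make the scale dependence concrete, here is a minimal sketch using a
one-feature ridge estimator without intercept, which has a simple closed
form. It is only an illustration, not how statsmodels or scikit-learn
implement their penalties; the helper name ridge_1d, the toy data, and the
choice alpha = 1 are all made up for the example. The same feature is
expressed in two units, and the fraction of the unpenalized coefficient that
survives the penalty is compared.

```python
def ridge_1d(x, y, alpha):
    """Closed-form ridge coefficient for one feature, no intercept:
    beta = sum(x*y) / (sum(x*x) + alpha). alpha = 0 gives least squares."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


# Toy data (hypothetical): y = 2 * x exactly, with x measured in metres.
x_m = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
# The same feature expressed in kilometres.
x_km = [xi / 1000.0 for xi in x_m]

alpha = 1.0  # identical penalty weight for both parameterizations
shrink = {}
for name, x in (("metres", x_m), ("kilometres", x_km)):
    beta_ols = ridge_1d(x, y, 0.0)    # unpenalized fit
    beta_pen = ridge_1d(x, y, alpha)  # penalized fit
    # Fraction of the unpenalized coefficient left after penalization.
    shrink[name] = beta_pen / beta_ols
    print(name, shrink[name])
```

With the feature in metres the penalty barely changes the fit, while in
kilometres the same alpha shrinks the coefficient almost to zero, although
the unpenalized predictions are identical in both units.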

Josef



>
>
>
> my CODE:
>
> import numpy as np
> from sklearn import preprocessing
> from sklearn.svm import LinearSVC
> # Note: in scikit-learn >= 0.18 this class lives in sklearn.model_selection,
> # takes n_splits instead of (y, n_iter), and is iterated via sss.split(X, y).
> from sklearn.cross_validation import StratifiedShuffleSplit
>
> # 14 features, 16 samples dataset; column 0 holds the class label
> data = np.loadtxt("data.txt")
> y = data[:, 0]
> X = data[:, 1:15]
> X_scaled = preprocessing.scale(X)
>
> sss = StratifiedShuffleSplit(y, 10000, test_size=0.25, random_state=0)
> clf = LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
> cv_scores = []
>
> for train_index, test_index in sss:
>     X_train, X_test = X_scaled[train_index], X_scaled[test_index]
>     y_train, y_test = y[train_index], y[test_index]
>     clf.fit(X_train, y_train)
>     y_pred = clf.predict(X_test)
>     cv_scores.append(np.sum(y_pred == y_test) / float(np.size(y_test)))
>
> # Mean accuracy in percent, +/- two standard deviations
> print("Accuracy %d +/- %d" % (np.ceil(100 * np.mean(cv_scores)),
>                               np.ceil(200 * np.std(cv_scores))))
>
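
For anyone who wants to rerun this comparison, the following is a rough
sketch of the same experiment, assuming a scikit-learn release that provides
sklearn.model_selection (0.18 or later). The original data.txt is not
available, so the array below is a synthetic stand-in with the same shape (16
samples, 14 features) and deliberately mixed feature scales; the numbers it
produces say nothing about the real data. Unlike the code above, the scaler
sits inside a pipeline, so its mean and standard deviation are estimated on
each training split only. The number of splits is reduced to 100 to keep the
run short.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for data.txt (assumption): 16 samples, 14 features
# whose scales span several orders of magnitude, two balanced classes.
rng = np.random.RandomState(0)
y = np.array([0, 1] * 8)
scales = np.logspace(-3, 3, 14)
X = rng.normal(size=(16, 14)) * scales
X[:, 0] += 2.0 * scales[0] * y  # put some class signal in a small-scale feature

cv = StratifiedShuffleSplit(n_splits=100, test_size=0.25, random_state=0)

# Same classifier settings as in the original post.
raw_clf = LinearSVC(penalty="l1", dual=False, C=1, random_state=1)
scaled_clf = make_pipeline(
    StandardScaler(),  # refit on the training part of every split
    LinearSVC(penalty="l1", dual=False, C=1, random_state=1),
)

raw_scores = cross_val_score(raw_clf, X, y, cv=cv)
scaled_scores = cross_val_score(scaled_clf, X, y, cv=cv)

print("raw:    %.2f +/- %.2f" % (raw_scores.mean(), 2 * raw_scores.std()))
print("scaled: %.2f +/- %.2f" % (scaled_scores.mean(), 2 * scaled_scores.std()))
```

Because the L1 penalty acts on the coefficients directly, features on very
different scales are penalized unevenly at a fixed C, which is one plausible
reason for a large gap between the two scores.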
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>

