Sounds to me like you've got it, but perhaps "because theoretically it's
wrong" needs a little more explanation: your estimator is fitted to the
feature values as they were scaled in training, so it's inappropriate to
scale them differently at test time. It's also inappropriate because in
the real world you often don't have a test set, only lone test samples,
so computing the mean of all test samples before processing any of them
is not very helpful.
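
A minimal sketch of that workflow, in case it helps. The toy data and the
linear kernel are made up for illustration; StandardScaler and SVC are the
scikit-learn classes discussed in this thread. The scaler is fitted on the
training set only and then reused for a lone test sample; the last line
shows what happens if the lone sample is scaled by its own statistics
instead.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data, made up for illustration: two features on very different scales.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
y_train = np.array([0, 0, 1, 1])

# Learn the per-feature mean and standard deviation from the training set only.
scaler = StandardScaler().fit(X_train)

# Train the SVM on the scaled training data.
clf = SVC(kernel="linear").fit(scaler.transform(X_train), y_train)

# A lone test sample is scaled with the stored training statistics,
# not with statistics computed from the sample itself.
x_new = np.array([[3.5, 350.0]])
x_new_scaled = scaler.transform(x_new)
prediction = clf.predict(x_new_scaled)

# Scaling the lone sample by its own mean and std collapses it to the
# origin, so it carries no information for the classifier.
x_self_scaled = StandardScaler().fit_transform(x_new)
```

Here x_self_scaled is all zeros whatever values x_new holds, while
x_new_scaled keeps the sample's position relative to the training data.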


On Tue, May 28, 2013 at 1:07 AM, Gianni Iannelli <giannicrist...@msn.com> wrote:

> Thank you! It's clear! Please, tell me if I understood correctly (or I'm
> completely stupid):
>
>
>    1. take the training set and calculate the mean and the standard
>    deviation for each feature (the transformation itself just subtracts
>    the mean and divides by the std, as seen in the posted link on
>    stackoverflow);
>    2. transform the training set using these values;
>    3. train my SVM;
>    4. take the test set and apply the transformation *without calculating
>    again* the mean and the std, just using the already calculated ones;
>    5. classify each point.
>
>
> Doing as I was doing (preprocessing.scale() for the training set and the
> test set), the difference is in the fourth point: it calculates the mean
> and the std again over the whole test set and applies the transformation.
> In this case, looking at the mean, the two regions (training and test)
> could be shifted relative to each other and, consequently, the
> classification could be wrong. I wrote "could" because I have tried three
> different datasets, and the second method (the one you proposed) gave
> better results twice and worse results once compared to the one that I
> was using.
>
> In conclusion, the method that I was using must be avoided because
> theoretically it's wrong.
>
> Is this correct?
>
> Thanks to all and thanks for your time!
> Solimyr
>
> > From: l.j.buiti...@uva.nl
> > Date: Mon, 27 May 2013 16:38:12 +0200
>
> > To: scikit-learn-general@lists.sourceforge.net
> > Subject: Re: [Scikit-learn-general] SVM - Scaling data or not?
> >
> > 2013/5/27 Gianni Iannelli <giannicrist...@msn.com>:
> > > Found it! But now it has a different name: StandardScaler.
> >
> > Ah, yes, excuse me.
> >
> > > Could you please explain to me what the point of storing the mean and
> > > standard deviation is? It's not so clear to me. And how is the
> > > transform done? Sorry for my limited knowledge of this stuff... I want
> > > to be sure I understand everything.
> >
> > The transform method centers and scales using the mean and stddev that
> > it has learned from the training set. This makes sure, as I explained
> > previously, that the test set is mapped to the same region of feature
> > space where the training set lives, and where the classifier has
> > learned its decision boundary.
> >
> > Suppose you'd want to apply a classifier, trained on a scaled training
> > set, to a single sample. If you don't center and scale, it may live in
> > the wrong region of space wrt. the decision boundary, which will be
> > somewhere near the origin. If you'd center and scale the single sample
> > using its own mean and stddev, it would always end up at the origin
> > because the mean of one point is the point itself, and no meaningful
> > classification can be performed.
> >
> > --
> > Lars Buitinck
> > Scientific programmer, ILPS
> > University of Amsterdam
> >
> >
> > ------------------------------------------------------------------------------
> > Try New Relic Now & We'll Send You this Cool Shirt
> > New Relic is the only SaaS-based application performance monitoring service
> > that delivers powerful full stack analytics. Optimize and monitor your
> > browser, app, & servers with just a few lines of code. Try New Relic
> > and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_may
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
>