Re: [Scikit-learn-general] To standardize is the question ...

Andreas Mueller Sat, 01 Jun 2013 11:19:32 -0700

On 06/01/2013 07:51 PM, o m wrote:
> > The main question is, what is your definition of an "important" variable?
> >
> > Gilles
> That's a good question;-) Seriously.
>
> I would define it - with many closely related variables - as a member of a 
> set that gives you the best predictability.
> LARS and LASSO with cross validation provide a good story along these lines. 
> But performing standardization can influence that.
>
> What do people typically do in these situations?
The way I think about it is: do you believe that a priori all variables 
have the same importance? Then standardize.
Do you believe that all variables share the same scale? Then don't 
standardize.
This is basically true for all machine learning algorithms that are 
sensitive to the scale of their inputs.
For example, if your units are meters (or feet), does a change of 1m in 
the first variable have the same meaning as a change of 1m in the second? 
If so, you shouldn't standardize. If you standardize anyway and one 
variable only has small changes, these will be blown up compared to the 
others.
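
To make the "blown up" effect concrete, here is a minimal NumPy sketch. The
two features, their spreads (about 10m and about 1cm), and the helper name
standardize are made up for illustration. The helper centers each column and
divides by its standard deviation, which is roughly what
sklearn.preprocessing.StandardScaler does with default settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical features, both measured in meters (illustrative data only).
x1 = rng.normal(loc=0.0, scale=10.0, size=1000)  # varies by roughly 10 m
x2 = rng.normal(loc=0.0, scale=0.01, size=1000)  # varies by roughly 1 cm
X = np.column_stack([x1, x2])


def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


Z = standardize(X)

# How large is a 1 m change in each feature once it is expressed in
# standardized units? It is 1 / (standard deviation of that column).
step_std = 1.0 / X.std(axis=0)

print("std before standardizing:", X.std(axis=0))
print("std after standardizing: ", Z.std(axis=0))
print("1 m step in standardized units:", step_std)
```

After standardizing, both columns have unit spread, so a 1m change in the
second feature counts for far more than a 1m change in the first. If 1m
really means the same thing in both variables, that rescaling throws the
shared scale away.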


Hth,
Andy

