On 06/01/2013 07:51 PM, o m wrote:
> > The main question is, what is your definition of an "important" variable?
> >
> > Gilles
> That's a good question ;-) Seriously.
>
> I would define it - with many closely related variables - as a member of a
> set that gives you the best predictability.
> LARS and LASSO with cross validation provide a good story along these lines.
> But performing standardization can influence that.
>
> What do people typically do in these situations?

The way I think about it is: do you believe that a priori all variables have
the same importance? Then standardize.
Do you believe that all variables share the same scale? Then don't standardize.
This is basically true for all machine learning algorithms.

For example, if your units are meters (or feet), does a change in the first
variable by 1m have the same meaning as a change by 1m in the second?
If so, you shouldn't standardize. If one variable only has small changes,
standardizing will blow these up relative to the others.
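To make the last point concrete, here is a minimal sketch using only the Python standard library. The data and the `standardize` helper are made up for illustration; the helper centers each column and divides by its population standard deviation, which as far as I know is also what `sklearn.preprocessing.StandardScaler` does by default.

```python
from statistics import mean, pstdev


def standardize(column):
    """Center a column to mean 0 and scale it to unit (population) std."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]


# Hypothetical data: both variables are measured in meters.
# The first varies over hundreds of meters, the second by a few centimeters.
x1 = [100.0, 250.0, 400.0, 550.0, 700.0]
x2 = [1.00, 1.01, 1.02, 1.03, 1.04]

# Spread of each variable on the shared, original scale.
raw_spread = (pstdev(x1), pstdev(x2))

# After standardization both variables have unit spread, so the
# centimeter-level changes in x2 now count as much as the changes in x1.
z1, z2 = standardize(x1), standardize(x2)
scaled_spread = (pstdev(z1), pstdev(z2))

print("spread before:", raw_spread)
print("spread after: ", scaled_spread)
```

If a 1m change really means the same thing in both variables, the raw spreads are the honest picture and the second variable should stay small. If you instead believe both variables matter equally a priori, the standardized version is the one to feed to LASSO or LARS.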
Hth,
Andy

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general