Andy, after reading your tip and reflecting on what I do, I'm tempted to
claim that standardization is very important, regardless ...

Assume x0 is very important but has a tiny range (-1/100, 1/100), all
other variables having significantly larger ranges.
Lars/Lasso will not pick x0 up until the very end of the path, because
the associated parameter estimate would have to be large. I'd therefore
(wrongly) conclude that x0 must not be very important.
Moreover, that conclusion would be reinforced if the combined effect of
ten other useless variables masked the effect/contribution of x0. If I
standardize everything, Lars/Lasso would put x0 in its place right from
the start.

Is there a flaw?
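For what it's worth, the tiny-range scenario above is easy to reproduce. Here is a minimal sketch (the data, the coefficient of 100, and alpha=0.1 are all made up for illustration): fit a Lasso with a fixed penalty on the raw columns and on standardized columns, then compare the coefficient on x0.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n = 200
x0 = rng.uniform(-0.01, 0.01, n)        # important, but tiny range
noise = rng.uniform(-10, 10, (n, 10))   # useless, but much larger range
X = np.column_stack([x0, noise])
y = 100.0 * x0 + 0.1 * rng.randn(n)     # x0 drives y; its true coef is large

# Fixed penalty on the raw data: keeping x0 would require a coefficient
# near 100, whose L1 cost outweighs the fit improvement, so x0 is
# zeroed out -- while some of the large-range noise columns sneak in.
raw = Lasso(alpha=0.1).fit(X, y)

# On standardized data all columns compete on the same scale and x0 is kept.
Xs = StandardScaler().fit_transform(X)
std = Lasso(alpha=0.1).fit(Xs, y)

print("raw coef on x0:         ", raw.coef_[0])   # exactly 0.0
print("standardized coef on x0:", std.coef_[0])   # clearly nonzero
```

With cross-validated alpha the contrast is less extreme (CV will eventually shrink the penalty enough to admit x0), but the entry order along the path shows the same effect.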

Gael mentioned randomized sparsity, which I'm unfamiliar with but would
like to investigate further.
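By "randomized sparsity" Gael is presumably referring to stability selection (Meinshausen and Buehlmann): fit the Lasso on many randomly perturbed subsamples and count how often each variable is selected. A minimal hand-rolled sketch, with the function name and the parameters (sample_fraction, scaling) chosen here purely for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.1, n_resamples=50,
                        sample_fraction=0.75, scaling=0.5, random_state=0):
    """Fraction of randomized Lasso fits in which each feature is selected.

    Each round fits a Lasso on a random subsample, with each column's
    penalty randomly weakened (dividing column j by w_j in [scaling, 1]
    is equivalent to shrinking its penalty to alpha * w_j), and counts
    how often each coefficient comes out nonzero.
    """
    rng = np.random.RandomState(random_state)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(sample_fraction * n), replace=False)
        w = rng.uniform(scaling, 1.0, p)   # per-feature penalty weakening
        coef = Lasso(alpha=alpha).fit(X[idx] / w, y[idx]).coef_
        counts += coef != 0
    return counts / n_resamples

# Toy data: x0 is informative, the ten other columns are noise.
rng = np.random.RandomState(0)
X = rng.randn(200, 11)
y = X[:, 0] + 0.1 * rng.randn(200)

freq = stability_selection(X, y)
print(freq)  # freq[0] close to 1, the noise frequencies much lower
```

The selection frequencies are far less sensitive to the exact penalty than a single Lasso fit, which is the appeal of the approach.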

Thanks.

Best Regards.


On 06/01/2013 07:51 PM, o m wrote:
> > > The main question is, what is your definition of an "important"
> > > variable?
> > >
> > > Gilles
> >
> > That's a good question ;-) Seriously.
> >
> > I would define it - with many closely related variables - as a member
> > of a set that gives you the best predictability.
> > LARS and LASSO with cross-validation provide a good story along these
> > lines. But performing standardization can influence that.
> >
> > What do people typically do in these situations?
>
> The way I think about it is: do you believe that a priori all variables
> have the same importance? Then standardize.
> Do you believe that all variables share the same scale? Then don't
> standardize.
> This is basically true for all machine learning algorithms.
> For example, if your units are meters (or feet), does a change in the
> first variable by 1m have the same meaning as a change by 1m in the
> second? If so, you shouldn't standardize. If one variable only has
> small changes, these will be blown up compared to the others.

Hth,
Andy
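Andy's last point is easy to see numerically. A small sketch (the centimetre/metre scales are made up for illustration) using StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Both columns are in meters, but the first one barely varies.
X = np.column_stack([rng.normal(0, 0.01, 500),    # fluctuates by ~1 cm
                     rng.normal(0, 10.0, 500)])   # fluctuates by ~10 m

scaler = StandardScaler().fit(X)
Xs = scaler.transform(X)

# After standardization both columns have unit variance, so a 1 cm step
# in the first column now "means" as much as a ~10 m step in the second:
# the small fluctuations have been blown up.
print(Xs.std(axis=0))  # ~ [1.0, 1.0]
step = scaler.transform(np.array([[0.01, 0.01]])) \
     - scaler.transform(np.array([[0.0, 0.0]]))
print(step)  # the same 1 cm step is far larger along the first axis
```

So whether that blow-up is a feature or a bug depends entirely on whether the shared units carry meaning, which is exactly Andy's criterion.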
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
