Hi James,

In R speak:

The reason you see the advice to choose a higher alpha if nobs < nvars
and a lower alpha otherwise is that alpha is the mixing weight between
the L1 and L2 penalties (whereas lambda is the overall regularization
level), and the L1 penalty tends to set more coefficients to zero than
L2. So if nvars >> nobs this seems like good advice, since you'll end up
with a more parsimonious and interpretable model. I would suggest that
the advice above is a good rule of thumb but also a bit hand-wavy. In
practice, alpha is not nearly as sensitive as lambda (the level of
regularization). It may be reasonable to try some discrete set of alphas
in (0, 1], fit a path of lambdas for each, and choose the best model
from these.
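
For example, roughly (a minimal sketch in sklearn terms; the data and
parameter values are placeholders, and note the naming caveat below):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNetCV

    # Placeholder data with nvars >> nobs.
    X, y = make_regression(n_samples=50, n_features=500, noise=1.0)

    # A few discrete mixing weights (R's alpha), with a log-spaced path
    # of 100 regularization levels (R's lambda) fit for each one.
    model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 1.0],
                         n_alphas=100, cv=5)
    model.fit(X, y)
    print(model.l1_ratio_, model.alpha_)  # best mixing weight and level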

Confusingly, sklearn uses l1_ratio to mean alpha and alpha to mean lambda.
Reading some of the previous thread, maybe this is responsible for some
confusion between the two sets of documentation?
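
For reference, the rough correspondence (values are placeholders):

    # R:       glmnet(x, y, alpha = 0.5, lambda = 0.1)
    # sklearn: ElasticNet(l1_ratio=0.5, alpha=0.1)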

-
John


On Mon, Oct 14, 2013 at 9:33 AM,
<scikit-learn-general-requ...@lists.sourceforge.net> wrote:

> Today's Topics:
>
>    1. Re: recommendation systems (Olivier Grisel)
>    2. Re: centering of sparse data for elastic net (James Jensen)
>    3. Re: choice of regularization parameter grid for elastic net
>       (James Jensen)
>    4. Re: centering of sparse data for elastic net (Lars Buitinck)
>    5. Re: choice of regularization parameter grid for elastic net
>       (Nicholas Dronen)
>    6. Contributing to scikit-learn (Ankit Agrawal)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 14 Oct 2013 12:05:24 +0200
> From: Olivier Grisel <olivier.gri...@ensta.org>
> Subject: Re: [Scikit-learn-general] recommendation systems
> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
>
> Actually the mrec implementation is not the original SLIM algorithm,
> but a variant described by the library's author here:
>
> http://slideshare.net/MarkLevy/efficient-slides
>
> Thanks @larsmans for the tweet :)
>
> --
> Olivier
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 14 Oct 2013 08:13:30 -0700
> From: James Jensen <jdjen...@eng.ucsd.edu>
> Subject: Re: [Scikit-learn-general] centering of sparse data for
>         elastic net
> To: scikit-learn-general@lists.sourceforge.net
>
> Thank you, Olivier.
>
> Just to clarify: you say
>
>     You can control the centering with the `normalize=True` flag of
>     the ElasticNet class (or any other linear regression model).
>
> I've noticed people use the term "normalize" in different ways. In the
> case of the `normalize=True` flag of the linear models, does it mean
> both scaling samples to have unit norm and centering them to have mean
> zero? If so, this is inconsistent with the usage in, say, the
> preprocessing module, where "normalization" refers only to scaling to
> unit norm, and the word "standardization" is used to refer to doing both
> (although the function to standardize is scale(), and "scale" seems more
> naturally associated with normalization, in my mind). Because of this, I
> had supposed that the `normalize=True` flag did not determine centering.
>
> ------------------------------
>
> Message: 3
> Date: Mon, 14 Oct 2013 08:34:17 -0700
> From: James Jensen <jdjen...@eng.ucsd.edu>
> Subject: Re: [Scikit-learn-general] choice of regularization parameter
>         grid for elastic net
> To: scikit-learn-general@lists.sourceforge.net
>
> Thanks, Alex. That is helpful. It looks like the glmnet documentation
> says this is how they do it as well. What they don't explain is how to
> find alpha_max in the first place. The only thing I've thought of is
> something like a binary search until you find the smallest alpha that
> yields a coef_ of all zeros, with some limit on the number of steps.
> But is there a better way?
>
> Also, how do you choose the smallest alpha value (or in other words, how
> do you choose eps)? I came across an unofficial third-party description
> of glmnet that said that if nobs < nvars, a higher value is chosen
> (0.01, I think), whereas if nobs > nvars, a smaller value is chosen
> (say, 0.0001). The basic idea makes sense, but it seems a bit ad hoc to
> me, and it seems like it would be sensible to have more than two
> possible values, based on the ratio of nobs to nvars. Any thoughts?
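>
> Concretely, I take the grid described below to be something like this
> sketch (assuming alpha_max and eps are given):
>
>     import numpy as np
>
>     n_alphas = 100
>     # Log-spaced from alpha_max down to alpha_max / 10**eps.
>     alphas = np.logspace(np.log10(alpha_max),
>                          np.log10(alpha_max) - eps,
>                          n_alphas)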
>
> > hi James,
> >
> > for a given value of l1_ratio, the grid of alphas is chosen in log scale
> > starting from alpha_max to alpha_max / 10**eps. Any value of alpha
> > larger than alpha_max will lead to a coef_ full of zeros.
> >
> > HTH
> > Alex
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 14 Oct 2013 17:40:33 +0200
> From: Lars Buitinck <larsm...@gmail.com>
> Subject: Re: [Scikit-learn-general] centering of sparse data for
>         elastic net
> To: jdjen...@ucsd.edu,  scikit-learn-general
>         <scikit-learn-general@lists.sourceforge.net>
>
> 2013/10/14 James Jensen <jdjen...@eng.ucsd.edu>:
> > I've noticed people use the term "normalize" in different ways. In the
> > case of the `normalize=True` flag of the linear models, does it mean
> > both scaling samples to have unit norm and centering them to have mean
> > zero? If so, this is inconsistent with the usage in, say, the
> > preprocessing module, where "normalization" refers only to scaling to
> > unit norm, and the word "standardization" is used to refer to doing
> > both (although the function to standardize is scale(), and "scale"
> > seems more naturally associated with normalization, in my mind).
> > Because of this, I had supposed that the `normalize=True` flag did not
> > determine centering.
>
> Yes, this is inconsistent with the preprocessing module. "normalize"
> in linear_models is what preprocessing calls "standard scaling".
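>
> To illustrate the two meanings in preprocessing terms (a minimal
> sketch):
>
>     import numpy as np
>     from sklearn.preprocessing import normalize, scale
>
>     X = np.array([[1.0, 2.0], [3.0, 4.0]])
>     Xn = normalize(X)  # each *row* rescaled to unit L2 norm
>     Xs = scale(X)      # each *column* centered to mean 0, unit variance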
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 14 Oct 2013 10:00:17 -0600
> From: Nicholas Dronen <ndro...@gmail.com>
> Subject: Re: [Scikit-learn-general] choice of regularization parameter
>         grid for elastic net
> To: scikit-learn-general@lists.sourceforge.net
>
> Hi, James:
>
> If by 'alpha' you mean what the lasso literature refers to as 'lambda', my
> recollection is that the maximum lambda is determined simply by the L1 norm
> of the coefficients of the ordinary least squares solution, because any
> value greater than that provides no constraint for the lasso solution.
> This was mentioned in a talk at ICML this year:
>
>
> http://techtalks.tv/talks/the-lasso-persistence-and-cross-validation/58279/
>
> Regards,
>
> Nick
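>
> In the penalized form there is also a simple closed form, so no search
> is needed; roughly (a sketch, in sklearn's naming, assuming X and y
> are centered):
>
>     import numpy as np
>
>     n_samples = X.shape[0]
>     # Smallest penalty that zeroes every coefficient; only the L1 part
>     # of the elastic net penalty matters here, hence the l1_ratio term.
>     alpha_max = np.abs(np.dot(X.T, y)).max() / (n_samples * l1_ratio)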
>
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 14 Oct 2013 22:02:59 +0530
> From: Ankit Agrawal <aaaagra...@gmail.com>
> Subject: [Scikit-learn-general] Contributing to scikit-learn
> To: scikit-learn-general@lists.sourceforge.net
>
> Hi,
>
>
>     I am Ankit Agrawal, a 4th-year undergrad majoring in EE with a
> specialization in Communications and Signal Processing at IIT Bombay. I
> completed my GSoC with scikit-image this year and have a good grasp of
> Python (and a little Cython). I have completed a course in ML and have
> taken some courses where it is applied, namely Computer Vision, NLP,
> and Speech Processing.
>
>     I would like to contribute to scikit-learn to improve my understanding
> of different ML algorithms. I have started going through some parts of the
> documentation and also through the Contributing page. If there are any
> other pointers to go through to get started, please let me know. Thanks.
>
>
> Regards,
> Ankit Agrawal,
> Communication and Signal Processing,
> IIT Bombay.
>
> ------------------------------
>
>
>
>
> End of Scikit-learn-general Digest, Vol 45, Issue 16
> ****************************************************
>