John, you're right about the difference in nomenclature. I've been using scikit-learn's names for the parameters, so the alpha I've referred to is the regularization strength and corresponds to lambda in glmnet. The mixing parameter, referred to in glmnet as alpha, is the L1-ratio in scikit-learn.
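
In case it helps anyone else keep the two straight, the mapping as I understand it (just the naming; glmnet's default standardization and objective scaling are a separate question):

   from sklearn.linear_model import ElasticNet

   # scikit-learn's alpha    plays the role of glmnet's lambda (overall strength)
   # scikit-learn's l1_ratio plays the role of glmnet's alpha  (L1/L2 mixing)
   model = ElasticNet(alpha=0.1, l1_ratio=0.5)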

Nick, thank you very much for the tip on how the L1 norm of an OLS solution is used to determine the maximum regularization strength for lasso. Thinking about how that would extend to elastic net: with an L1-ratio of 1, alpha_max is the L1 norm of an OLS solution, because elastic net reduces to lasso in that case. But with L1-ratios between zero and one, couldn't alpha_max be greater than the L1 norm of an OLS solution, since alpha_max for the elastic net is not the L1 regularization strength but the overall regularization strength, distributed between L1 and L2? As the ElasticNet documentation says, alpha = L1 strength + L2 strength, and l1_ratio = L1 strength / (L1 strength + L2 strength). It seems like the alpha_max for elastic net with a given L1-ratio could be some function of both the L1 and L2 norms of an OLS solution, perhaps even a simple combination, but I haven't found it in the literature and am unsure how to derive it.
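
To spell out that decomposition in code (the values are arbitrary, just to make the relationship concrete):

   alpha, l1_ratio = 0.5, 0.7                # arbitrary example values
   l1_strength = alpha * l1_ratio            # the "a" in the docstring
   l2_strength = alpha * (1 - l1_ratio)      # the "b" in the docstring
   assert abs(alpha - (l1_strength + l2_strength)) < 1e-12
   assert abs(l1_ratio - l1_strength / (l1_strength + l2_strength)) < 1e-12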

I did find the part in coordinate_descent.py where alpha_max is chosen, but I don't fully understand the reasoning behind it:

   alpha_max = np.abs(Xy).max() / (n_samples * l1_ratio)
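
If I'm reading it right (this is my own reasoning, not anything stated in the code), Xy there is the precomputed np.dot(X.T, y), and the formula gives the smallest alpha at which the all-zero coefficient vector satisfies the optimality conditions: only the L1 term can hold coefficients at exactly zero, which is why the lasso threshold max|X^T y| / n_samples gets divided by l1_ratio. A quick numerical check (untested sketch, assuming centered data):

   import numpy as np
   from sklearn.linear_model import ElasticNet

   rng = np.random.RandomState(0)
   X = rng.randn(50, 10)
   y = rng.randn(50)
   X -= X.mean(axis=0)                  # center, as the formula assumes
   y -= y.mean()

   l1_ratio = 0.5
   alpha_max = np.abs(np.dot(X.T, y)).max() / (X.shape[0] * l1_ratio)

   # At or above alpha_max, every coefficient should be exactly zero...
   enet = ElasticNet(alpha=1.001 * alpha_max, l1_ratio=l1_ratio).fit(X, y)
   print(np.all(enet.coef_ == 0))       # expect True

   # ...while just below it, at least one coefficient becomes nonzero.
   enet = ElasticNet(alpha=0.99 * alpha_max, l1_ratio=l1_ratio).fit(X, y)
   print(np.any(enet.coef_ != 0))       # expect True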


Another concern: if the data does not have mean zero and/or unit variance (I've been told this might be ok if, for example, I want to preserve sparsity in the input), might this affect the magnitude of the solution coefficients and hence the calculation of alpha_max?
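
To make the concern concrete, this is the kind of effect I mean (toy sketch of my own):

   import numpy as np

   rng = np.random.RandomState(0)
   X = rng.randn(50, 3)
   y = rng.randn(50)
   l1_ratio = 0.5

   alpha_max = np.abs(np.dot(X.T, y)).max() / (X.shape[0] * l1_ratio)

   # Rescaling a single feature rescales its entry in X^T y, which can
   # change both which feature attains the max and the value of the max:
   X[:, 0] *= 100.0
   alpha_max_rescaled = np.abs(np.dot(X.T, y)).max() / (X.shape[0] * l1_ratio)
   print(alpha_max, alpha_max_rescaled)   # generally differ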

And I'm still not sure how to pick the smallest value of alpha (or rather "eps," the ratio of the smallest to the largest value).
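
From what I can tell in coordinate_descent.py, once alpha_max is known the grid is just log-spaced down to eps * alpha_max (the defaults below are the ones in the ElasticNetCV docstring):

   import numpy as np

   alpha_max = 1.0                       # placeholder; computed as above
   eps, n_alphas = 1e-3, 100             # docstring defaults
   alphas = np.logspace(np.log10(alpha_max * eps), np.log10(alpha_max),
                        num=n_alphas)[::-1]    # descending from alpha_max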

Now for the L1-ratio. The ElasticNetCV class does not automatically choose a set of L1-ratios to test as it does with the alphas; it's up to the user to supply them. The ElasticNetCV documentation does, however, offer this hint:

   Note that a good choice of list of values for l1_ratio is often to
   put more values close to 1 (i.e. Lasso) and less close to 0 (i.e.
   Ridge), as in [.1, .5, .7, .9, .95, .99, 1]

I understand John's reasoning that good L1-ratios are likely to be higher the greater the proportion of variables to samples. If anyone knows of other considerations that could go into choosing an appropriate set of L1-ratios, let me know.
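
For reference, here is roughly what I have in mind with that suggested list (sketch with synthetic data):

   import numpy as np
   from sklearn.linear_model import ElasticNetCV

   rng = np.random.RandomState(0)
   X = rng.randn(100, 20)
   y = np.dot(X[:, :3], [1.0, 2.0, -1.5]) + 0.1 * rng.randn(100)

   # Grid weighted toward the lasso end, per the docstring's suggestion:
   enet_cv = ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, .99, 1], cv=5)
   enet_cv.fit(X, y)
   print(enet_cv.l1_ratio_, enet_cv.alpha_)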

Lastly: I was excited about the idea of trying first with a sparse grid and then repeating the search in more detail in the area of parameter values yielding high cross-validation scores. However, I notice in the paper associated with Nick's link that it says "In practice, an upper bound must be selected for any grid-search optimization [over values of the L1 regularization parameter]. Note that more advanced optimization techniques are generally not practical as the CV objective function [...] is often noisy." Any thoughts on this?
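
What I was picturing was roughly this two-stage scheme (a sketch; the refinement window is my own arbitrary choice), though per the quote above the noise in the CV score may make the coarse-grid winner unstable:

   import numpy as np
   from sklearn.linear_model import ElasticNetCV

   rng = np.random.RandomState(0)
   X = rng.randn(100, 20)
   y = np.dot(X[:, :3], [1.0, 2.0, -1.5]) + 0.1 * rng.randn(100)

   # Stage 1: coarse grid over both parameters.
   coarse = ElasticNetCV(l1_ratio=[.1, .5, .9, 1.], n_alphas=20, cv=5).fit(X, y)

   # Stage 2: refine alpha within a decade either side of the coarse winner,
   # keeping the selected l1_ratio fixed.
   fine_alphas = np.logspace(np.log10(coarse.alpha_) - 1,
                             np.log10(coarse.alpha_) + 1, num=50)
   fine = ElasticNetCV(l1_ratio=coarse.l1_ratio_, alphas=fine_alphas,
                       cv=5).fit(X, y)
   print(fine.alpha_, fine.l1_ratio_)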
