Thanks to everyone for their help with this.

From your input, I now know how to compute the maximum regularization 
strength for both lasso and elastic net. I thought my problem was 
solved, but I'm realizing that it probably isn't, and I'll explain why. 
If anyone has ideas on how to approach the remaining issue, I'd welcome 
them; if not, I understand that this is beginning to stray from the 
methods directly implemented in scikit-learn.

As I mentioned earlier, I'm using elastic net to compute regularized 
canonical correlation. Given data matrices X and Y, I find coefficient 
vectors a and b that maximize the correlation between Xa and Yb. This 
can be done by iteratively regressing Yb on X (to estimate a) and then 
Xa on Y (to estimate b), repeating these two regressions until 
convergence.
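
For concreteness, the scheme looks roughly like this (a stripped-down 
sketch, not my actual code; the normalization and convergence test are 
simplified):

    import numpy as np
    from sklearn.linear_model import ElasticNet

    def alternating_cca(X, Y, alpha_a, alpha_b, l1_ratio=0.5,
                        n_iter=100, tol=1e-6):
        # Sparse CCA by alternating elastic-net regressions.
        rng = np.random.default_rng(0)
        b = rng.standard_normal(Y.shape[1])
        b /= np.linalg.norm(Y @ b)
        for _ in range(n_iter):
            # Update a by regressing the current variate Yb on X ...
            a = ElasticNet(alpha=alpha_a, l1_ratio=l1_ratio,
                           fit_intercept=False).fit(X, Y @ b).coef_
            a /= np.linalg.norm(X @ a)
            # ... then update b by regressing Xa on Y.
            b_new = ElasticNet(alpha=alpha_b, l1_ratio=l1_ratio,
                               fit_intercept=False).fit(Y, X @ a).coef_
            b_new /= np.linalg.norm(Y @ b_new)
            if np.linalg.norm(b_new - b) < tol:
                b = b_new
                break
            b = b_new
        return a, b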

Here's the thing: the method for choosing the maximum regularization 
strength for elastic net depends on the training and target data, but in 
this iterative approach to CCA, the targets (Yb and Xa) change at each 
iteration.
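
To spell that out: as I understand it, the bound behind _alpha_grid() 
amounts to something like the following (assuming centered data; this is 
my reading, not a quote of the implementation):

    import numpy as np

    def alpha_max(X, y, l1_ratio):
        # Smallest alpha at which the elastic-net solution is all
        # zeros; only the L1 part sets the threshold, hence the
        # division by l1_ratio.
        return np.max(np.abs(X.T @ y)) / (X.shape[0] * l1_ratio)

In the alternating scheme the target y is Yb (or Xa), so this bound 
moves at every iteration.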

It occurred to me that, for each candidate pair of L1 ratios (a pair 
because there is one for each of the two regressions), one could compute 
and fit at the pair of maximal alphas (again, one per regression) at 
each iteration, run this to convergence on a CCA solution, and use the 
final pair to define the two grids of other alpha values to try. But I 
doubt that would work: there's no guarantee it would yield the highest 
possible alphas, is there? The route it takes to convergence (if it 
converges at all) depends on the regularization, which would itself be 
changing at every step.
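
In code, the idea would look something like this, reusing alpha_max() 
and the imports from the sketches above. Note that fitting exactly at 
the bound would zero out all the coefficients, so this hypothetical 
version fits at some fraction frac of it, which is already a fudge:

    def cca_tracking_bounds(X, Y, l1_ratio_a, l1_ratio_b,
                            frac=0.1, n_iter=50):
        # Iterate near the current data-dependent bounds and record
        # them, hoping the final pair can top the alpha grids.
        rng = np.random.default_rng(0)
        b = rng.standard_normal(Y.shape[1])
        b /= np.linalg.norm(Y @ b)
        for _ in range(n_iter):
            am_a = alpha_max(X, Y @ b, l1_ratio_a)
            a = ElasticNet(alpha=frac * am_a, l1_ratio=l1_ratio_a,
                           fit_intercept=False).fit(X, Y @ b).coef_
            a /= np.linalg.norm(X @ a)
            am_b = alpha_max(Y, X @ a, l1_ratio_b)
            b = ElasticNet(alpha=frac * am_b, l1_ratio=l1_ratio_b,
                           fit_intercept=False).fit(Y, X @ a).coef_
            b /= np.linalg.norm(Y @ b)
        return am_a, am_b  # the bounds at the last iteration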

I've been told that elastic net can also be parameterized by the number 
of desired nonzero coefficients rather than by the Lagrange parameter, 
although this isn't how it's implemented in scikit-learn. I'm interested 
in the idea, but wouldn't it have the same problem, in a sense? That is, 
the amount of regularization needed to guarantee n nonzero coefficients 
would also change at each iteration.
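
For what it's worth, my understanding is that this goes through the 
usual augmentation trick from the elastic net paper (Zou & Hastie, 
2005): elastic net on (X, y) is a lasso on row-augmented data, and the 
lasso path can then be run with LARS and stopped after roughly n steps. 
A sketch, in case I'm mangling it:

    import numpy as np
    from sklearn.linear_model import lars_path

    def enet_n_nonzero(X, y, lam2, n_nonzero):
        # Stack sqrt(lam2) * I under X and zeros under y, then stop
        # the LARS lasso path after n_nonzero steps. Approximate:
        # lasso-mode LARS can also drop variables, and this ignores
        # the elastic-net rescaling of the coefficients.
        p = X.shape[1]
        X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1 + lam2)
        y_aug = np.concatenate([y, np.zeros(p)])
        alphas, active, coefs = lars_path(X_aug, y_aug, method='lasso',
                                          max_iter=n_nonzero)
        return coefs[:, -1]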

I'm afraid that deriving an expression for the maximum alphas in this 
CCA case is probably beyond me mathematically. I suppose I could go back 
to the idea of something like a binary search in the parameter space; 
it's just that the possibility of computing them directly makes me 
reluctant to settle for this less efficient and less precise option.
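
By binary search I mean something like the sketch below; for the CCA 
case, is_all_zero would have to wrap the whole alternating procedure, 
which is exactly where it gets expensive:

    import numpy as np
    from sklearn.linear_model import ElasticNet

    def alpha_max_bisect(is_all_zero, lo=1e-8, hi=1e8, rtol=1e-3):
        # Smallest alpha whose fit is entirely zero, assuming
        # monotonicity: all-zero at hi, not all-zero at lo.
        while hi / lo > 1 + rtol:
            mid = np.sqrt(lo * hi)  # geometric midpoint
            if is_all_zero(mid):
                hi = mid
            else:
                lo = mid
        return hi

    # For a single regression, for instance:
    # is_all_zero = lambda alpha: not np.any(
    #     ElasticNet(alpha=alpha, l1_ratio=0.5,
    #                fit_intercept=False).fit(X, y).coef_)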

Would it be valid to run the iterative regressions for CCA with OLS, and 
to take the L1 norm of each of the two resulting coefficient vectors as 
the maximum regularization strength for CCA with the lasso? That would 
at least give lower bounds on the alphas for CCA with elastic net, 
though I'm not sure how helpful that would be.

Another concern: glmnet solves candidate models in descending order of 
regularization strength, and makes use of warm starts to speed it up 
considerably. The scikit-learn implementation of elastic net supports 
warm starts. However, the CCA problem with iterative regression is 
biconvex, not convex. Would warm starting present a problem in the 
biconvex case because there can be local optima?
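
What I have in mind is the usual glmnet-style pathwise fit, as in the 
sketch below (alpha_hi is a placeholder for whatever upper bound I end 
up with); my worry is whether the warm start stays benign when each such 
fit sits inside a biconvex outer loop:

    import numpy as np
    from sklearn.linear_model import ElasticNet

    # Descending alphas, each fit warm-started from the previous
    # solution, glmnet-style. X, y and alpha_hi are placeholders.
    enet = ElasticNet(l1_ratio=0.5, warm_start=True, fit_intercept=False)
    coef_path = []
    for alpha in np.geomspace(alpha_hi, alpha_hi * 1e-3, num=100):
        enet.set_params(alpha=alpha)
        enet.fit(X, y)  # starts from the previous enet.coef_
        coef_path.append(enet.coef_.copy())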

P.S. Thanks, Alex, for pointing out where alpha_max comes from in 
_alpha_grid(). I'd take your word for it; still, as Olivier said, would 
you mind pointing me to a reference or explaining the derivation? It 
might give me some ideas or leads.
