Thanks to everyone for their help with this. From your input, I now know how to compute the maximum regularization strength for both lasso and elastic net. I thought my problem was solved, but I'm realizing that it probably isn't, and I'll explain why. If anyone has ideas of how to approach the remaining issue, I'd welcome them; if not, I understand that this is beginning to stray from the methods directly implemented in scikit-learn.
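For concreteness, here is that computation as I understand it, a minimal sketch mirroring what scikit-learn's _alpha_grid() does (this is my own helper, not library code, and it assumes the data are already centered):

    import numpy as np

    def alpha_max(X, y, l1_ratio=1.0):
        # Smallest alpha at which every elastic net coefficient is zero.
        # The L2 part of the penalty vanishes at coef == 0, so only the
        # L1 weight (alpha * l1_ratio) sets the threshold; l1_ratio=1.0
        # recovers the lasso case.  Assumes X and y are centered, as
        # scikit-learn's preprocessing ensures before the same formula.
        n_samples = X.shape[0]
        return np.abs(np.dot(X.T, y)).max() / (n_samples * l1_ratio)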
As I mentioned earlier, I'm using elastic net to compute regularized canonical correlation. Given data matrices X and Y, I find coefficient vectors a and b that maximize the correlation between Xa and Yb. This can be done by alternating two regressions: regress Yb on X (to estimate a), then regress Xa on Y (to estimate b), and repeat until convergence. (I've put a rough sketch of this loop at the end of this message.)

Here's the thing: the formula for the maximum regularization strength for elastic net depends on the training and target data, but in this iterative approach to CCA, both change at every iteration. It occurred to me that, for each candidate pair of L1 ratios (a pair because there's one for each of the two regressions), one could compute and apply the current pair of maximal alphas at every iteration, take the pair obtained at convergence to a CCA solution, and use it to define the two grids of other alpha values to try. But it seems like that would probably not work. There's no guarantee it would yield the highest possible alphas, is there? The route it takes to convergence (if it even converges) depends on the regularization changing at each step.

I've been told that elastic net can also be parameterized by the desired number of nonzero coefficients rather than by the Lagrange parameter, although that isn't how it's implemented in scikit-learn. I'm interested in the idea, but wouldn't it have the same problem, in a sense? That is, the amount of regularization needed to guarantee n nonzero coefficients would also change at each iteration.

I'm afraid that deriving an expression for the maximum alphas in this CCA case is probably beyond me mathematically. I suppose I could go back to the idea of something like a binary search in the parameter space; it's just that the possibility of computing them directly makes me reluctant to settle for that less efficient and less precise option. Would it be valid to solve CCA by the iterative regressions with OLS, and take the L1 norm of each of the two resulting coefficient vectors as the maximum regularization strength for CCA with the lasso? That would at least give lower bounds on the alphas for CCA with elastic net, though I'm not sure how helpful that would be.

Another concern: glmnet solves the candidate models in descending order of regularization strength and uses warm starts to speed this up considerably. The scikit-learn implementation of elastic net supports warm starts as well. However, the CCA problem attacked by iterative regression is biconvex, not convex. Would warm starting present a problem in the biconvex case, where there can be local optima?

P.S. Thanks, Alex, for pointing out where alpha_max comes from in _alpha_grid(). I'd take your word for it; still, as Olivier said, would you mind pointing me to a reference or explaining the derivation? Maybe it would give me some ideas or leads.
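As promised above, here is the rough shape of the alternating loop, a hypothetical sketch rather than vetted code (the function name and the normalization details are mine). I've folded in warm_start=True, since that's exactly the step I'm unsure about in the biconvex setting:

    import numpy as np
    from sklearn.linear_model import ElasticNet

    def cca_by_alternating_enet(X, Y, alpha_a, alpha_b,
                                l1_ratio_a=0.5, l1_ratio_b=0.5,
                                max_iter=100, tol=1e-6):
        # Random start for b, scaled so the variate Yb has unit norm.
        rng = np.random.RandomState(0)
        b = rng.randn(Y.shape[1])
        b /= np.linalg.norm(np.dot(Y, b))

        # warm_start=True reuses the previous coef_ as the starting
        # point of each fit, which is the step I'm asking about above.
        enet_a = ElasticNet(alpha=alpha_a, l1_ratio=l1_ratio_a,
                            warm_start=True)
        enet_b = ElasticNet(alpha=alpha_b, l1_ratio=l1_ratio_b,
                            warm_start=True)

        last_corr = 0.0
        for _ in range(max_iter):
            # Estimate a: regress the current canonical variate Yb on X.
            enet_a.fit(X, np.dot(Y, b))
            a = enet_a.coef_
            Xa = np.dot(X, a)
            norm_a = np.linalg.norm(Xa)
            if norm_a == 0:
                break  # alpha_a shrank everything to zero
            Xa /= norm_a

            # Estimate b: regress the current canonical variate Xa on Y.
            enet_b.fit(Y, Xa)
            nb = np.linalg.norm(np.dot(Y, enet_b.coef_))
            if nb == 0:
                break  # alpha_b shrank everything to zero
            b = enet_b.coef_ / nb

            corr = np.corrcoef(np.dot(X, a), np.dot(Y, b))[0, 1]
            if abs(corr - last_corr) < tol:
                break
            last_corr = corr
        return a, b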
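And the per-iteration bookkeeping I described, recording the current maximal alphas for each regression and keeping the pair obtained at convergence, would be just this (using the alpha_max() helper from the start of this message):

    def alpha_max_pair(X, Y, a, b, l1_ratio_a, l1_ratio_b):
        # Current alpha_max for each of the two regressions, given the
        # present estimates a and b.  Inside the alternating loop these
        # values change at every iteration, which is the crux of my
        # question about whether the converged pair is the right one.
        return (alpha_max(X, np.dot(Y, b), l1_ratio_a),
                alpha_max(Y, np.dot(X, a), l1_ratio_b))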