Hello!
You may already be familiar with canonical correlation analysis (CCA).
Given two sets of variables, CCA yields the linear combinations with
maximum correlation between them. It is similar to PCA, which finds
projections with maximum variance for a single set of variables; in fact,
PCA can be treated as a special case of CCA. Sigg et al. 2007
(http://ml2.inf.ethz.ch/papers/2007/0_nonnegative_cca.pdf) discuss how the
canonical correlation problem can be solved by iterative regression. This
provides a straightforward and flexible way to apply regularization
penalties and useful constraints such as non-negativity in CCA. I think CCA
would be useful to have in Scikit-learn and that it would be relatively
easy to write a CCA solver using Scikit-learn's existing linear methods. I
have wanted to contribute something myself along these lines but have met
with some difficulties.
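To make the iterative-regression idea concrete, here is a rough sketch of
what I have in mind for the first pair of directions (untested, and the
names are my own; Lasso is just a stand-in for whichever penalized
regressor one wants, and I assume X and Y are centered):

import numpy as np
from sklearn.linear_model import Lasso

def first_direction(X, Y, alpha=0.01, n_iter=50):
    # Alternate between regressing each view's projection onto the other
    # view; any penalized regressor could be dropped in where Lasso is.
    rng = np.random.RandomState(0)
    b = rng.randn(Y.shape[1])
    b /= np.linalg.norm(Y @ b)          # naive normalization of projection
    for _ in range(n_iter):
        a = Lasso(alpha=alpha, fit_intercept=False).fit(X, Y @ b).coef_
        a /= np.linalg.norm(X @ a)
        b = Lasso(alpha=alpha, fit_intercept=False).fit(Y, X @ a).coef_
        b /= np.linalg.norm(Y @ b)
    return a, b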
CCA involves finding multiple directions that are orthogonal to each other,
much like PCA. For all directions beyond the first, the objective function
to be minimized includes not only the regression and regularization penalty
terms, but also a penalty intended to enforce the orthogonality of the
solution, a term that is zero within the feasible space.
Here is the objective function:
For the k-th direction a_k,

    a_k = \underset{a}{\arg\min}\ \frac{1}{2 n_{samples}} \|X a - Y b\|_2^2
          + \alpha \rho \|a\|_1 + \frac{\alpha (1 - \rho)}{2} \|a\|_2^2
          + \lambda a^\top O a

which is the elastic net objective plus the orthogonality penalty term

    O = \sum_{l < k} C_{XX} a_l a_l^\top C_{XX}

where C_{XX} is the covariance matrix of X.
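In plain NumPy, the quantity to be minimized would be something like the
following (just a sketch to fix notation; A_prev is my name for the matrix
whose columns are the previously found directions, and X and Y are assumed
centered):

import numpy as np

def objective(a, b, X, Y, A_prev, alpha, rho, lam):
    # Elastic net terms plus the orthogonality penalty, for one direction.
    # A_prev: (n_features_X, k-1) array holding a_1, ..., a_{k-1} as columns.
    n_samples = X.shape[0]
    C_xx = X.T @ X / n_samples
    r = X @ a - Y @ b
    fit = 0.5 / n_samples * (r @ r)
    enet = alpha * rho * np.abs(a).sum() + 0.5 * alpha * (1 - rho) * (a @ a)
    CA = C_xx @ A_prev                  # columns are C_XX a_l
    O = CA @ CA.T                       # sum_l C_XX a_l a_l^T C_XX
    return fit + enet + lam * (a @ O @ a)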
Clearly I can't solve this using Scikit-learn's elastic net as-is. I
thought perhaps I could find a way to use the same underlying coordinate
descent code to optimize this new function. But I have not been able to
make sense of it. Could anyone give me some pointers? Of course, I would be
delighted if any of the experienced developers were interested in
implementing this themselves, since I am a novice, but I realize that they
have many things to work on already.
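For what it's worth, since the extra term is quadratic, I believe each
coordinate update still has a closed form (the orthogonality matrix just
adds to the quadratic coefficients). If my derivation is right, one full
pass in pure Python would look like:

import numpy as np

def soft(z, t):
    # Soft-thresholding operator for the L1 term.
    return np.sign(z) * max(abs(z) - t, 0.0)

def cd_pass(a, X, yb, O, alpha, rho, lam):
    # One sweep of coordinate descent on the penalized objective, where
    # yb = Y @ b is the current regression target and O is symmetric.
    n = X.shape[0]
    for j in range(a.shape[0]):
        r = yb - X @ a + X[:, j] * a[j]    # residual with feature j removed
        z = X[:, j] @ r / n - 2 * lam * (O[j] @ a - O[j, j] * a[j])
        q = X[:, j] @ X[:, j] / n + alpha * (1 - rho) + 2 * lam * O[j, j]
        a[j] = soft(z, alpha * rho) / q
    return a

But I assume the Cython implementation has subtleties (active sets, duality
gap checks) that this ignores.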
Note: since appropriate regularization penalties might not be known
beforehand, and different regularizations can be applied to each of the two
sets of variables, it would be nice to also enable cross-validation for
model selection, as in ElasticNetCV. But I should probably figure out the
simpler case first.
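Concretely, what I mean is something like the following (hypothetical
names; "fit" stands for a per-pair solver, and I score by the correlation
of the projections on held-out rows):

from itertools import product
import numpy as np

def select_alphas(X, Y, fit, alphas_x, alphas_y, train, test):
    # Grid search over separate penalties for the two views, scored by the
    # correlation of the projections on the held-out samples.
    best_pair, best_r = None, -np.inf
    for ax, ay in product(alphas_x, alphas_y):
        a, b = fit(X[train], Y[train], ax, ay)
        r = np.corrcoef(X[test] @ a, Y[test] @ b)[0, 1]
        if r > best_r:
            best_pair, best_r = (ax, ay), r
    return best_pair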
I am also not confident about the details of how to arrive at a good value
for \lambda. I understand that the basic idea is to begin with a very small
value, so that we're close to an unconstrained solution to the problem
(i.e. without the penalty term affecting the solution), and increase
\lambda (using the previous solution as our initial guess in each
iteration) until the solution no longer changes, i.e. we've converged to a
solution that's within the feasible space. But how do I know how much to
increase \lambda by at each iteration?
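In pseudocode, the loop I have in mind is roughly this, where the geometric
schedule (multiplying by a constant factor) is pure guesswork on my part:

import numpy as np

def continuation(a0, solve, lam0=1e-4, factor=10.0, tol=1e-6, max_outer=20):
    # solve(a_init, lam) is assumed to minimize the objective for fixed lam,
    # warm-started at a_init.  Stop once increasing lam no longer moves a.
    a, lam = a0, lam0
    for _ in range(max_outer):
        a_new = solve(a, lam)
        if np.linalg.norm(a_new - a) < tol:
            return a_new
        a, lam = a_new, lam * factor
    return a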
I'm curious about the motivation for using coordinate descent, and about
any other methods that can be used for this sort of thing. From what I
understand, the advantage of coordinate descent here is that the
coefficients for the entire path of the regularization parameter can be
found at very little extra cost. Aside from that, isn't it slower than
gradient-based methods (i.e., doesn't it take more iterations to converge)?
Although I guess the derivative of the L1 norm is not defined wherever any
element of the vector is zero, so perhaps plain gradient-based methods are
out of the question (see the proximal-gradient sketch after this
paragraph)? Are there other methods that can be used for elastic net that
would accommodate the orthogonality penalty term? Could LARS be adapted to
accommodate it? I see that Sigg et al. use monotone incremental forward
stagewise regression (MIFSR), but they do it with an L1 penalty only, and I
suspect that monotonicity would not hold for the elastic net penalty. If
that's incorrect, let me know.
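One thing I did run across is proximal-gradient methods such as ISTA, which
handle the L1 term by soft-thresholding instead of differentiating it; the
quadratic orthogonality term would simply join the smooth part. A single
step would look roughly like this, though I don't know whether it would be
competitive with coordinate descent here:

import numpy as np

def ista_step(a, X, yb, O, alpha, rho, lam, step):
    # Gradient step on the smooth part (fit + L2 + orthogonality), followed
    # by the proximal operator of the L1 term (soft-thresholding).
    n = X.shape[0]
    grad = X.T @ (X @ a - yb) / n + alpha * (1 - rho) * a + 2 * lam * (O @ a)
    z = a - step * grad
    return np.sign(z) * np.maximum(np.abs(z) - step * alpha * rho, 0.0)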
And I've been told there are probably more sophisticated methods for
dealing with this kind of orthogonality constraint than this penalty
method, but I don't know what these methods are. If anyone is familiar with
this and could direct me to a good resource, I would appreciate it.
Sorry for the long email with many questions. Thanks for your help and for
an extremely useful library. And congrats on version 0.14.
James