Re: [Scikit-learn-general] Speeding up lasso_path by a factor of 2000

Gael Varoquaux Fri, 15 Feb 2013 07:21:01 -0800

On Fri, Feb 15, 2013 at 03:45:03PM +0100, Olivier Grisel wrote:
> Alright for minimizing the RMSE on the training set, but on the test
> set the error might be minimized in between kinks, as on the real
> distribution the kink locations might be slightly off the positions
> found on the training set. Or am I making a mistake?


Darn, I made the mistake myself. I remembered that I had to think about
this carefully when I wrote LassoLarsCV, but I spoke too quickly this
morning (that's the problem with doing too many things in parallel).
Charles-Pierre, please accept my apologies.

The problem is not that the kink positions are shifted; the problem is
that the function is quadratic. The residuals are given by:

  res = y - np.dot(X, coef)

where X and y can be train, test, or whatever you want.

Up to a constant factor, the RMSE is np.sqrt(np.sum(res ** 2)). I like to
think about the square of the RMSE, because it is easier and has the same
minima: np.sum(res ** 2).
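A minimal sketch of these quantities, on made-up data (the arrays, the
candidate coefficients and the helper names are illustrative assumptions,
not anything from scikit-learn). It only checks that the RMSE and the sum
of squared residuals pick the same candidate:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 3)                      # illustrative design matrix
true_coef = np.array([1.0, 0.0, -2.0])    # made-up ground truth
y = np.dot(X, true_coef) + 0.1 * rng.randn(50)


def squared_error(X, y, coef):
    """Sum of squared residuals for a given coefficient vector."""
    res = y - np.dot(X, coef)
    return np.sum(res ** 2)


def rmse(X, y, coef):
    """Root mean squared error: sqrt of the mean squared residual."""
    res = y - np.dot(X, coef)
    return np.sqrt(np.mean(res ** 2))


# Three hypothetical candidate coefficient vectors.
candidates = [np.zeros(3), np.array([0.5, 0.0, -1.0]), true_coef]
sse = [squared_error(X, y, c) for c in candidates]
rms = [rmse(X, y, c) for c in candidates]

# Both criteria rank the candidates identically, so they share an argmin.
best_sse = int(np.argmin(sse))
best_rmse = int(np.argmin(rms))
```

The same helpers work unchanged whether X and y are the training set or a
held-out set.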

Anyhow, this is not a piecewise affine function, it is a piecewise
quadratic function, and can have minima between knots. My claim was
bullshit. Sorry Charles-Pierre. However, there was a motivation behind
writing the _lars_path_residuals function, and to make up for my false
claims, let me walk you through how it can be used to efficiently solve
your problem:

The good news is that the gradient of the squared residual norm with
respect to coef is, up to a factor of -2, piecewise affine along the path:

 D = np.dot(X.T, res)
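To make the affine/quadratic distinction concrete, here is a small
numerical check on one segment between two knots. The data and the two
coefficient vectors are made up for illustration; they are not the output
of an actual path computation. On equally spaced points of the segment,
an affine quantity has zero second differences and a quadratic one has
constant second differences:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(40, 4)
y = rng.randn(40)

# Two hypothetical consecutive knots of a path (coefficients are made up).
coef_a = np.array([0.0, 0.5, 0.0, 0.0])
coef_b = np.array([0.3, 0.9, 0.0, -0.2])


def D_of(coef):
    """X.T res, i.e. the gradient of the squared error up to a factor -2."""
    res = y - np.dot(X, coef)
    return np.dot(X.T, res)


def sse_of(coef):
    """Sum of squared residuals."""
    res = y - np.dot(X, coef)
    return np.sum(res ** 2)


# Five equally spaced points on the segment from coef_a to coef_b.
ts = np.linspace(0.0, 1.0, 5)
coefs = [(1 - t) * coef_a + t * coef_b for t in ts]
D_values = np.array([D_of(c) for c in coefs])
sse_values = np.array([sse_of(c) for c in coefs])

# Affine in t: second differences vanish.
D_second_diff = np.diff(D_values, n=2, axis=0)
# Quadratic in t: second differences are constant and positive.
sse_second_diff = np.diff(sse_values, n=2)
```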

So to find where the minimum along the path actually lies, we can find
the 3 knots in the path bordering the minimum by a simple argsort. Or,
more cleanly, check where the derivative changes sign:

   not np.all((D_k >= 0) * (D_(k - 1) <= 0))

Where the 'k' index denotes the knot in the path.
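One possible reading of this sign test, as a sketch: rather than the full
vector D, it uses the scalar slope of the squared error along each segment
of the path, which is my own simplification. The function name and the
toy residuals are hypothetical. The residuals are assumed to be stacked
with one row per knot, and to vary linearly between consecutive knots:

```python
import numpy as np


def bracket_minima(residuals):
    """Indices k of segments [k, k + 1] where the slope changes sign.

    residuals has shape (n_knots, n_samples). Only minima strictly
    inside a segment (or at its end knot) are reported; a minimum at
    the very first knot is not detected by this test.
    """
    residuals = np.asarray(residuals, dtype=float)
    steps = np.diff(residuals, axis=0)          # r_(k+1) - r_k
    # Slope of sum(res ** 2) at the start and at the end of each segment.
    slope_start = 2 * np.sum(residuals[:-1] * steps, axis=1)
    slope_end = 2 * np.sum(residuals[1:] * steps, axis=1)
    return np.flatnonzero((slope_start < 0) & (slope_end >= 0))


# Toy residuals at four knots, two samples each (made-up numbers).
knot_residuals = np.array([[3.0, 1.0],
                           [1.0, 1.0],
                           [-1.0, 2.0],
                           [-3.0, 4.0]])
segments = bracket_minima(knot_residuals)
```

On this toy input the squared errors at the knots are 10, 2, 5 and 25,
and the sign change falls in the segment between the second and third
knots.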

Finding the exact location of the minimum is then a question of solving D
== 0, using the fact that coef (or more simply res) is affine in this
domain. This is a simple linear algebra problem.
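A sketch of that solve on a single segment, under stated assumptions: the
residuals at the two bordering knots are known, and both res and alpha
are interpolated linearly between them (coef is piecewise linear in alpha
along the path). Instead of the full vector equation, it solves the
scalar condition that the slope along the segment vanishes, which amounts
to np.dot(coef_b - coef_a, D) == 0. The function name and signature are
hypothetical, not an existing scikit-learn API:

```python
import numpy as np


def segment_minimum(res_a, res_b, alpha_a, alpha_b):
    """Minimize sum(res ** 2) on the segment between two knots.

    res(t) = res_a + t * (res_b - res_a) for t in [0, 1]. Returns the
    interpolated alpha at the minimum and the squared error there.
    """
    res_a = np.asarray(res_a, dtype=float)
    res_b = np.asarray(res_b, dtype=float)
    step = res_b - res_a
    denom = np.dot(step, step)
    if denom == 0.0:
        t = 0.0                                  # residuals do not move
    else:
        # Zero of the slope 2 * dot(res(t), step), clipped to the segment.
        t = -np.dot(res_a, step) / denom
        t = min(max(t, 0.0), 1.0)
    res = res_a + t * step
    alpha = alpha_a + t * (alpha_b - alpha_a)
    return alpha, float(np.sum(res ** 2))


# Interior minimum: lower than the squared error at either knot (2 and 5).
alpha_in, sse_in = segment_minimum([1.0, 1.0], [-1.0, 2.0], 0.5, 0.3)
# Unconstrained minimum beyond the end knot: clipped to that knot.
alpha_clip, sse_clip = segment_minimum([3.0, 1.0], [1.0, 1.0], 1.0, 0.5)
```

Clipping to [0, 1] keeps the answer on the segment; when the clip is
active, the minimum along the path lies in a neighbouring segment or at
the knot itself.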

This, I believe, is the right way to do it. While it may seem more
complex/costly to do than a simple grid, I have found that it is much
more robust, because the knots of the path concentrate in the interesting
regions, and covering these regions as well with a grid requires many
points. I have not implemented this specific approach, sorry, just a
variant, so I cannot provide the code. However, _lars_path_residuals
is the cornerstone of it. You can also have a look at LassoCV, which
contains useful code (and which can be improved).

Actually, if you want to contribute a function that, given an X matrix
and the residuals and alphas corresponding to knots in the path,
implements the above strategy, that would be a very useful contribution.

Thanks for the discussion!

Gaël

