On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer <[email protected]> wrote:
> I'm aware of the issue - it seems to me that Bob is right, but I can
> hardly tell based on empirical evidence. Truncated gradient is quite a
> crude procedure anyway - Olivier once suggested using a projected
> gradient approach instead (see J. Duchi's work). Anyway, the current
> approach does a reasonable job of obtaining a sparse solution with high
> predictive performance (I compared mostly with liblinear, though).

For me, Duchi's L1-ball projection method is only useful if your data is dense, because the basic algorithm has a linear dependency on the number of features (zero or not). They also developed an algorithm using red-black trees with a logarithmic dependency on the number of features, but it looks really complicated. In comparison, unconstrained formulations like Carpenter's and Langford's depend only on the number of *non-zero* features, thanks to the lazy-update trick.

D. Sculley proposed a simpler algorithm based on a binary search in his web-scale k-means paper. Duchi's linear-time algorithm is more complicated. Again, it makes more sense if your data is dense (which is the case for cluster centers).

> BTW: presentation, code and errata of the paper can be found on the
> author's website:
> http://www.logos.ic.i.u-tokyo.ac.jp/~tsuruoka/

Good to know, I'll have a look later.

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
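
To make the "linear dependency on the number of features" point concrete, here is a minimal sketch of the simple sort-based variant of Euclidean projection onto the L1 ball (in the spirit of Duchi et al.; this is an illustration, not their exact code). Note that it sorts and soft-thresholds *every* coordinate, zero or not, which is exactly why it is less attractive for sparse data:

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto the L1 ball of radius z.

    Sort-based sketch, O(n log n) in the total number of features:
    every coordinate is touched, zero or not, which is the linear
    dependency on the feature count discussed above.
    """
    if np.abs(v).sum() <= z:
        return v.copy()              # already inside the ball
    u = np.sort(np.abs(v))[::-1]     # magnitudes, descending
    css = np.cumsum(u)               # cumulative sums of magnitudes
    # largest index rho with u[rho] > (css[rho] - z) / (rho + 1)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - z))[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)   # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

Sculley's binary-search alternative searches for the same threshold `theta` numerically instead of deriving it from a sort, which is simpler to implement but still visits every coordinate per bisection step.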
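
For contrast, here is a hypothetical minimal sketch of the lazy-update trick mentioned above for the unconstrained (Carpenter/Langford-style) formulation — not scikit-learn's actual implementation. The accumulated L1 penalty for a weight is applied only when its feature is next seen, so each SGD step costs O(nnz) rather than O(n_features); the helper name `lazy_l1_sgd` and the squared loss are illustrative choices:

```python
import numpy as np

def lazy_l1_sgd(X_rows, y, alpha=0.01, eta=0.1):
    """L1-penalized SGD (squared loss) with lazily applied truncation.

    X_rows: list of sparse rows, each a dict {feature_index: value}.
    alpha: L1 penalty strength; eta: constant learning rate.
    """
    n_features = 1 + max(j for row in X_rows for j in row)
    w = np.zeros(n_features)
    last = np.zeros(n_features, dtype=int)  # step at which w[j] was last touched
    for t, (row, yi) in enumerate(zip(X_rows, y), start=1):
        for j in row:
            # catch up: apply the penalty for all steps skipped since last touch
            skipped = t - 1 - last[j]
            w[j] = np.sign(w[j]) * max(abs(w[j]) - skipped * eta * alpha, 0.0)
            last[j] = t - 1
        pred = sum(w[j] * v for j, v in row.items())
        g = pred - yi                       # squared-loss gradient factor
        for j, v in row.items():
            w[j] -= eta * g * v             # gradient step on non-zeros only
            # truncate this step's penalty immediately for touched weights
            w[j] = np.sign(w[j]) * max(abs(w[j]) - eta * alpha, 0.0)
            last[j] = t
    return w
```

The key design point is the `last` array: zero-valued features incur no per-step work, and their pending penalty is settled in one arithmetic catch-up when they reappear.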
