On 09/11/2011 14:32, Mathieu Blondel wrote:
> On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer
> <[email protected]> wrote:
>
>> I'm aware of the issue - it seems to me that Bob is right, but I can
>> hardly tell based on empirical evidence. Truncated gradient is quite a
>> crude procedure anyways - Olivier once suggested using a projected
>> gradient approach instead (see J. Duchi's work). Anyways, the current
>> approach does a reasonable job of obtaining a sparse solution with high
>> predictive performance (I compared mostly with liblinear, though).
>
> For me, Duchi's L1 ball projection method is only useful if your data
> is dense, because their basic algorithm has a linear dependency on the
> number of features (zero or not). They also developed an algorithm
> using red-black trees with a logarithmic dependency on the number of
> features, but it looks really complicated. In comparison, the
> unconstrained formulations like Carpenter's and Langford's only depend
> on the number of *non-zero* features, thanks to the lazy-update trick.

This might be a bit off-topic, but I am not sure the linear complexity
of Duchi's algorithm (simple version) is such a big issue in practice
(and practicality beats purity, right ;-)).
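In case it helps, here is roughly what the simple version boils down
to - a quick NumPy sketch in the spirit of Duchi et al.'s sorting-based
algorithm (function and variable names are mine; the gist below is the
version I actually tested):

import numpy as np

def project_l1_ball(v, z=1.0):
    # Euclidean projection of v onto {w : ||w||_1 <= z}, z > 0.
    # Simple version: sort the magnitudes, take cumulative sums,
    # find the threshold theta, then soft-threshold.
    u = np.abs(v)
    if u.sum() <= z:              # already inside the ball
        return v.copy()
    s = np.sort(u)[::-1]          # magnitudes in decreasing order
    css = np.cumsum(s)
    # largest index rho with s[rho] > (css[rho] - z) / (rho + 1)
    rho = np.nonzero(s * np.arange(1, len(s) + 1) > css - z)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)
    return np.sign(v) * np.maximum(u - theta, 0.0)

For example, projecting [3., 1.] onto the ball of radius 2 gives
[2., 0.], as expected from soft-thresholding with theta = 1.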
For my own needs (projected gradient descent), I quickly implemented it
here: https://gist.github.com/1272551 (I tested it against Duchi's own
Matlab code). Despite its linear complexity, it has the nice perk of
being super simple, and it scales easily to dense vectors with millions
of dimensions. The bottleneck is probably computing the cumulative sums
of the sorted vector, which is already quite fast. It could be made even
faster with an ad hoc implementation for sparse vectors if necessary,
but my guess is that you would need really huge and very sparse vectors
for it to be worth your while.

My 1.99 cents (not really worth 2).

Adrien

> D. Sculley proposed a simpler algorithm based on a binary search in
> his web-scale k-means paper. The linear-time algorithm of Duchi is
> more complicated. Again, it makes more sense if your data is dense
> (which is the case for cluster centers).
>
>> BTW: presentation, code and errata of the paper can be found on the
>> author's website:
>> http://www.logos.ic.i.u-tokyo.ac.jp/~tsuruoka/
>
> Good to know, I'll have a look later.
>
> Mathieu
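PS: the binary-search projection that Mathieu attributes to D. Sculley
above also fits in a few lines. A rough sketch of the idea (the
tolerance and iteration cap are my own choices, not from the paper):
the L1 norm of the soft-thresholded vector decreases monotonically in
the threshold theta, so you can bisect theta until the norm reaches the
radius:

import numpy as np

def project_l1_ball_bisect(v, z=1.0, tol=1e-6, max_iter=100):
    # f(theta) = ||max(|v| - theta, 0)||_1 decreases monotonically
    # from ||v||_1 (at theta = 0) to 0 (at theta = max|v|),
    # so bisect theta until f(theta) is within tol of z.
    u = np.abs(v)
    if u.sum() <= z:              # already inside the ball
        return v.copy()
    lo, hi = 0.0, u.max()
    theta = 0.5 * (lo + hi)
    for _ in range(max_iter):
        theta = 0.5 * (lo + hi)
        norm = np.maximum(u - theta, 0.0).sum()
        if abs(norm - z) <= tol:
            break
        if norm > z:
            lo = theta            # projection still too large: raise theta
        else:
            hi = theta            # shrunk too much: lower theta
    return np.sign(v) * np.maximum(u - theta, 0.0)

Each iteration is O(n) with no sort, at the price of an approximate
threshold - which seems fine for dense cluster centers.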
