On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer <[email protected]> wrote:
> I'm aware of the issue - it seems to me that Bob is right, but I can
> hardly tell based on empirical evidence. Truncated gradient is quite a
> crude procedure anyway - Olivier once suggested using a projected
> gradient approach instead (see J. Duchi's work). Anyway, the current
> approach does a reasonable job of obtaining a sparse solution with high
> predictive performance (I compared mostly with liblinear, though).

For me, Duchi's L1-ball projection method is only useful if your data is dense, because the basic algorithm has a linear dependency on the number of features (zero or not). They also developed an algorithm using red-black trees with a logarithmic dependency on the number of features, but it looks really complicated. In comparison, unconstrained formulations like Carpenter's and Langford's depend only on the number of *non-zero* features, thanks to the lazy-update trick.

D. Sculley proposed a simpler algorithm based on a binary search in his web-scale k-means paper. Duchi's linear-time algorithm is more complicated. Again, it makes more sense if your data is dense (which is the case for cluster centers).

> BTW: presentation, code and errata of the paper can be found on the
> author's website:
> http://www.logos.ic.i.u-tokyo.ac.jp/~tsuruoka/

Good to know, I'll have a look later.

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
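
To make the "linear dependency on the number of features" point concrete, here is a minimal sketch of the simple sort-based variant of Euclidean projection onto the L1 ball (in the spirit of Duchi et al.; this is an illustration, not their exact code). Note that it sorts and soft-thresholds *every* coordinate, zero or not, which is exactly why it is less attractive for sparse data:

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto the L1 ball of radius z.

    Sort-based sketch, O(n log n) in the total number of features:
    every coordinate is touched, zero or not, which is the linear
    dependency on the feature count discussed above.
    """
    if np.abs(v).sum() <= z:
        return v.copy()              # already inside the ball
    u = np.sort(np.abs(v))[::-1]     # magnitudes, descending
    css = np.cumsum(u)               # cumulative sums of magnitudes
    # largest index rho with u[rho] > (css[rho] - z) / (rho + 1)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - z))[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)   # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

Sculley's binary-search alternative searches for the same threshold `theta` numerically instead of deriving it from a sort, which is simpler to implement but still visits every coordinate per bisection step.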
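
For contrast, here is a hypothetical minimal sketch of the lazy-update trick mentioned above for the unconstrained (Carpenter/Langford-style) formulation — not scikit-learn's actual implementation. The accumulated L1 penalty for a weight is applied only when its feature is next seen, so each SGD step costs O(nnz) rather than O(n_features); the helper name `lazy_l1_sgd` and the squared loss are illustrative choices:

```python
import numpy as np

def lazy_l1_sgd(X_rows, y, alpha=0.01, eta=0.1):
    """L1-penalized SGD (squared loss) with lazily applied truncation.

    X_rows: list of sparse rows, each a dict {feature_index: value}.
    alpha: L1 penalty strength; eta: constant learning rate.
    """
    n_features = 1 + max(j for row in X_rows for j in row)
    w = np.zeros(n_features)
    last = np.zeros(n_features, dtype=int)  # step at which w[j] was last touched
    for t, (row, yi) in enumerate(zip(X_rows, y), start=1):
        for j in row:
            # catch up: apply the penalty for all steps skipped since last touch
            skipped = t - 1 - last[j]
            w[j] = np.sign(w[j]) * max(abs(w[j]) - skipped * eta * alpha, 0.0)
            last[j] = t - 1
        pred = sum(w[j] * v for j, v in row.items())
        g = pred - yi                       # squared-loss gradient factor
        for j, v in row.items():
            w[j] -= eta * g * v             # gradient step on non-zeros only
            # truncate this step's penalty immediately for touched weights
            w[j] = np.sign(w[j]) * max(abs(w[j]) - eta * alpha, 0.0)
            last[j] = t
    return w
```

The key design point is the `last` array: zero-valued features incur no per-step work, and their pending penalty is settled in one arithmetic catch-up when they reappear.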
