Yes, I was about to answer the same thing: SGD is great when n_samples > n_features, but the case n_samples << n_features also comes up.
In such a situation, I believe that cyclic coordinate descent with a clever way of choosing the coordinates is the fastest approach. In some sense it is the transpose of SGD (hand-wavingly): SGD visits one sample at a time, whereas coordinate descent visits one feature at a time. I would indeed like to see a fast coordinate descent solver for logistic regression. I am more interested in the l1 penalty, but the l2 penalty is also useful. The multinomial loss could fall within the scope of such work. For such a contribution to be actually useful, I'd like the code to be really fast with large n_features: we don't need a solver that doesn't scale to real problems.

I am not an expert, but I think that a reference I recently mentioned could be useful: http://www.jmlr.org/papers/volume11/yuan10c/yuan10c.pdf

Obviously, doing this right is quite a lot of work. I think that my group could invest some effort in this direction. We were starting to discuss this a bit.

Gaël

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
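As a rough illustration of the cyclic coordinate descent idea for the l1-penalized case, here is a minimal NumPy sketch. It is a toy under stated assumptions, not the solver from the cited paper, which uses per-coordinate Newton steps with a line search. `l1_logreg_cd` and `soft_threshold` are hypothetical names. Labels are assumed to be in {-1, +1}, there is no intercept, and X is dense. Instead of a Newton step, each update minimizes a quadratic upper bound on the loss along one coordinate, using the fact that the logistic loss has curvature at most 1/4. The margins X @ w are kept up to date incrementally, so one coordinate step costs O(n_samples). That per-step cost is what makes the approach attractive when n_features is large.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * |x| (scalar soft-thresholding)."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def l1_logreg_cd(X, y, lam, n_iter=100, tol=1e-6):
    """Hypothetical cyclic coordinate descent sketch for

        min_w  sum_i log(1 + exp(-y_i * x_i @ w)) + lam * ||w||_1

    with labels y_i in {-1, +1} and no intercept.

    Each coordinate step minimizes a quadratic upper bound of the loss
    along that coordinate (curvature of the logistic loss <= 1/4), so
    the objective never increases. This replaces the Newton step plus
    line search used by more refined solvers.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    margins = np.zeros(n_samples)  # invariant: margins == X @ w
    # Per-coordinate curvature bound: 0.25 * ||X[:, j]||^2
    L = 0.25 * np.einsum("ij,ij->j", X, X)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(n_features):
            if L[j] == 0.0:
                continue  # all-zero column: w_j stays at 0
            # p_i = sigmoid(-y_i * margin_i), written with tanh to avoid overflow
            p = 0.5 * (1.0 - np.tanh(0.5 * y * margins))
            # Partial derivative of the loss with respect to w_j
            grad_j = -np.dot(X[:, j], y * p)
            # Minimize the quadratic bound plus lam * |w_j| in closed form
            w_new = soft_threshold(w[j] - grad_j / L[j], lam / L[j])
            delta = w_new - w[j]
            if delta != 0.0:
                margins += delta * X[:, j]  # O(n_samples) update
                w[j] = w_new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # a full sweep barely moved any coordinate
    return w


if __name__ == "__main__":
    # Toy n_samples << n_features problem (synthetic data, illustration only)
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 500))
    y = np.where(X[:, 0] - X[:, 1] > 0, 1.0, -1.0)
    lam_max = 0.5 * np.max(np.abs(X.T @ y))  # smallest lam giving w == 0
    w = l1_logreg_cd(X, y, lam=0.3 * lam_max)
    print("non-zero coefficients:", np.count_nonzero(w))
```

Choosing coordinates cyclically is the simplest rule. The "clever way of choosing the coordinates" mentioned above (for instance, skipping coordinates that stay at zero) would plug into the inner loop. A fast version would also need compiled code and sparse column access rather than a Python loop over dense columns.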
