My experience with Pegasos is also a bit mixed... on the one hand, it requires less hyper-parameter tuning than plain SGD. On the other hand, in my experience properly tuned hyper-parameters for SGD outperform Pegasos.
Scikit-learn's SGD uses Leon Bottou's algorithm: he adopted the learning rate schedule of Pegasos and combined it with a heuristic to determine the initial learning rate.

best,
Peter

Disclaimer: My experience is heavily biased towards high-dimensional, sparse problems.

2011/10/21 Alexandre Passos <[email protected]>:
> On Fri, Oct 21, 2011 at 09:24, Andreas Mueller <[email protected]> wrote:
>> Hi everybody.
>> I have a question about the implementation of SGD. As far as I can tell,
>> it follows Leon Bottou's work while using the learning rate from Pegasos.
>> As far as I can tell, a difference between Bottou's SGD and Shalev-Shwartz's
>> Pegasos is the projection step in Pegasos that enforces the
>> regularization constraints (if I understood correctly).
>> The authors claim that this is an important part of their algorithm.
>
> If I recall correctly, in their own code the projection step is almost
> always commented out. The really important part of the algorithm is
> the learning rate scaled by the strong convexity constant.
>
> When I implemented Pegasos I found that the projection step made
> no difference at all, and hence also commented it out.
>
>> What was the reason to favour the version of the algorithm without
>> the projection step? Has anyone done any experiments comparing
>> the different SGD approaches?
>> I am trying to get into this a bit more and would love to understand
>> the differences.
>>
>> On a related topic: Does anyone have experience using SGD
>> for kernelized SVMs? There is LASVM by Bottou, and
>> Pegasos can also do kernelized classification.
>> Would it be worth including this in sklearn?
>
> I've implemented this in the past, and kernelized Pegasos was always
> far too slow to be usable, as predicting on a new data point involves
> computing the kernel between this data point and every single other
> point on which an update has ever happened.
> LaSVM is much faster because it is very clever about keeping its
> support set small, and it might be worth implementing. I should have
> inefficient pure-Python code for it lying around somewhere.
>
> --
> - Alexandre

--
Peter Prettenhofer

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
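[For readers of the archive: a minimal sketch (my own, not taken from either scikit-learn or the Pegasos reference code) of the hinge-loss Pegasos update discussed above. It shows the 1/(lambda*t) learning rate scaled by the strong convexity constant, and the projection step as an optional flag, since the thread notes it is usually commented out.]

```python
import numpy as np

def pegasos_step(w, x, y, lam, t, project=False):
    """One Pegasos update for the hinge loss with L2 regularization.

    lam is the regularization (strong convexity) constant; the learning
    rate eta_t = 1 / (lam * t) is the schedule the thread identifies as
    the important part of the algorithm.
    """
    eta = 1.0 / (lam * t)
    if y * np.dot(w, x) < 1.0:
        # Margin violated: step on the hinge-loss gradient plus regularizer.
        w = (1.0 - eta * lam) * w + eta * y * x
    else:
        # No loss gradient: only the regularizer shrinks w.
        w = (1.0 - eta * lam) * w
    if project:
        # Optional projection onto the ball of radius 1/sqrt(lam),
        # reportedly of little practical effect.
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w = w * (radius / norm)
    return w
```

[A kernelized variant would keep the same update but represent w as coefficients over the examples seen so far, which is why prediction cost grows with every update, as Alexandre describes.]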
