Thanks for the quick answer, Alexandre.
> On Fri, Oct 21, 2011 at 09:24, Andreas Mueller <[email protected]> wrote:
>> Hi everybody.
>> I have a question about the implementation of SGD. As far as I can tell,
>> it follows Leon Bottou's work while using the learning rate from Pegasos.
>> As far as I can tell, a difference between Bottou's SGD and Shwartz's
>> Pegasos is the projection step in Pegasos that enforces the
>> regularization constraints (if I understood correctly).
>> The authors claim that this is an important part of their algorithm.
>
> If I recall correctly, in their own code the projection step is almost
> always commented out. The really important part of the algorithm is
> the learning rate scaled by the strong convexity constant.
>
> When I implemented pegasos I found that the projection step made
> no difference at all, and hence also commented it out.

That seems pretty weird, given that they presented the projection as an integral part of their algorithm in the first paper. In the journal paper, though, they present it as a variation of the basic algorithm, which supports your experience.
Did you do your experiments on text data? Or did you also try some dense datasets?

>> What was the reason to favour the version of the algorithm without
>> the projection step? Has anyone done any experiments comparing
>> the different SGD approaches?
>> I am trying to get into this a bit more and would love to understand
>> the differences.
>>
>> On a related topic: Has anyone any experience in using SGD
>> for kernelized SVMs? There is the LASVM by Bottou and
>> Pegasos can also do kernelized classification.
>> Would it be worth including this in sklearn?
>
> I've implemented this in the past, and kernelized pegasos was always
> far too slow to be usable, as predicting on a new data point involves
> computing the kernel between this data point and every single other
> point on which an update has ever happened.
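For reference, the update we're discussing can be sketched roughly as below. This is only an illustration of the Pegasos-style step with the optional projection onto the ball of radius 1/sqrt(lambda); the function name and parameters are my own, not sklearn's API:

```python
import numpy as np

def pegasos_train(X, y, lam=0.1, n_iter=2000, project=True, seed=0):
    """Rough sketch of Pegasos-style SGD for a linear SVM
    (hinge loss + L2 regularization). Illustration only."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iter + 1):
        i = rng.randint(n)
        # learning rate scaled by the strong convexity constant lam
        eta = 1.0 / (lam * t)
        margin = y[i] * np.dot(w, X[i])
        # gradient step on the regularized hinge loss
        w *= (1.0 - eta * lam)
        if margin < 1.0:
            w += eta * y[i] * X[i]
        if project:
            # the optional projection step from the first Pegasos paper;
            # as noted above, it is often commented out in practice
            radius = 1.0 / np.sqrt(lam)
            norm = np.linalg.norm(w)
            if norm > radius:
                w *= radius / norm
    return w
```

With `project=False` this reduces to the plain regularized SGD step, which is presumably why dropping the projection changes so little: the 1/(lam*t) step size already controls the norm of w.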
> LaSVM is much faster
> because it is very clever about keeping its support set small, and it
> might be worth implementing. I should have inefficient pure-python
> code for it lying around somewhere.

That seems to be consistent with what they report. I haven't looked into LaSVM much, but it sounds interesting. Again, have you tried sparse or dense data? And do you think it makes a difference?
Is LaSVM tricky to implement? I might give it to a student to do on CUDA ;)

Thanks again,
Andy

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
