My experience with Pegasos is also a bit mixed: on the one hand, it
requires less hyper-parameter tuning than plain SGD. On the other
hand, in my experience plain SGD with properly tuned hyper-parameters
outperforms Pegasos.

Scikit-learn's SGD uses Leon Bottou's algorithm: he adopted the
learning rate schedule of Pegasos and combined it with a heuristic to
determine the initial learning rate.
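Roughly, that schedule decays like 1/t, scaled by the regularization
strength. Here is a minimal sketch; the `t0` offset below is just my
way of illustrating the "heuristic initial learning rate" idea, it is
not scikit-learn's exact internal computation (plain Pegasos
corresponds to t0 = 0, i.e. eta_t = 1 / (alpha * t)):

```python
def pegasos_eta(t, alpha, t0=1.0):
    """Pegasos-style learning rate schedule: eta_t = 1 / (alpha * (t0 + t)).

    alpha is the regularization (strong convexity) constant; t0 is an
    illustrative offset standing in for the initial-rate heuristic.
    """
    return 1.0 / (alpha * (t0 + t))

# With alpha = 0.01 the step size decays like 1/t:
etas = [pegasos_eta(t, 0.01) for t in (1, 10, 100)]
```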

best,
 Peter

Disclaimer: My experience is heavily biased towards high-dimensional,
sparse problems.
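For concreteness, the update and the projection step discussed in the
quoted thread below can be sketched like this. This is only an
illustrative toy version of the Pegasos step for the hinge-loss SVM
objective, not scikit-learn's implementation:

```python
import numpy as np

def pegasos_step(w, x, y, lam, t, project=False):
    """One Pegasos update on example (x, y) with y in {-1, +1}.

    eta_t = 1 / (lam * t) is the learning rate scaled by the strong
    convexity constant lam. The optional projection clips w back into
    the ball of radius 1 / sqrt(lam); this is the step that is often
    left commented out in practice.
    """
    eta = 1.0 / (lam * t)
    margin = y * np.dot(w, x)          # evaluated before the update
    w = (1.0 - eta * lam) * w          # shrink (regularization gradient)
    if margin < 1.0:                   # hinge loss is active
        w = w + eta * y * x
    if project:
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w = w * (radius / norm)
    return w

# One step from w = 0 on a positive example:
w_new = pegasos_step(np.zeros(2), np.array([1.0, 0.0]), 1.0, 1.0, 1)
```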

2011/10/21 Alexandre Passos <[email protected]>:
> On Fri, Oct 21, 2011 at 09:24, Andreas Mueller <[email protected]> 
> wrote:
>> Hi everybody.
>> I have a question about the implementation of SGD. As far as I can tell,
>> it follows Leon Bottou's work while using the learning rate from Pegasos.
>> As far as I can tell, a difference between Bottou's SGD and
>> Shalev-Shwartz's Pegasos is the projection step in Pegasos that
>> enforces the regularization constraints (if I understood correctly).
>> The authors claim that this is an important part of their algorithm.
>
> If I recall correctly, in their own code the projection step is almost
> always commented out. The really important part of the algorithm is
> the learning rate scaled by the strong convexity constant.
>
> When I implemented pegasos I found out that the projection step made
> no difference at all, and hence also commented it out.
>
>> What was the reason to favour the version of the algorithm without
>> the projection step? Has anyone done any experiments on comparing
>> the different SGD approaches?
>> I am trying to get into this a bit more and would love to understand
>> the differences.
>>
>> On a related topic: Has any one any experience in using SGD
>> for kernelized SVMs? There is the LASVM by Bottou and
>> Pegasos can also do kernelized classification.
>> Would it be worth including this in sklearn?
>
> I've implemented this in the past, and kernelized pegasos was always
> far too slow to be usable, as predicting on a new data point involves
> computing the kernel between this data point and every single other
> point on which an update has ever happened. LaSVM is much faster
> because it is very clever about keeping its support set small, and it
> might be worth implementing. I should have inefficient pure-Python
> code for it lying around somewhere.
>
>
> --
>  - Alexandre
>
> ------------------------------------------------------------------------------
> The demand for IT networking professionals continues to grow, and the
> demand for specialized networking skills is growing even more rapidly.
> Take a complimentary Learning@Cisco Self-Assessment and learn
> about Cisco certifications, training, and career opportunities.
> http://p.sf.net/sfu/cisco-dev2dev
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Peter Prettenhofer

