Hi everybody.
Could someone please explain to me the learning rate heuristic in SGD?
Why is \eta_0 initialized the way it is?
Peter Prettenhofer mentioned it is taken from Leon Bottou's
sgd code. I found it there but no further explanation.
Is it explained in any of the papers? I could not find it.
I thought the initial learning rate in SGD is chosen using
a subset of the training set. This seems to
contradict using the heuristic.
Which one is actually used?

Also, I think there is a typo in the doc where they
explain the learning rate schedule.
In 3.3.6.1 SGD, below the formula for the schedule,
it says "t_0 is the time step [...], t_0 is chosen
automatically". I think the first "t_0" should
actually be "t". Is that right?
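
To make the question concrete, here is a minimal sketch of the
schedule as I understand it, assuming it has the form
eta(t) = 1 / (alpha * (t_0 + t)). The function name and the values
of alpha and t_0 are made up for illustration; they are not taken
from the scikit-learn source or from Bottou's code.

```python
def eta_schedule(t, alpha, t0):
    """Assumed schedule: eta(t) = 1 / (alpha * (t0 + t)).

    t     -- current time step (the quantity I think the doc means)
    alpha -- regularization strength
    t0    -- fixed offset, chosen once before training starts
    """
    return 1.0 / (alpha * (t0 + t))


# Hypothetical values, only to show how the rate decays with t.
alpha = 1e-4
t0 = 1e4

for t in (0, 1, 10, 100, 1000):
    print(t, eta_schedule(t, alpha, t0))
```

In this reading t grows with every update while t_0 stays fixed, so
under this assumed form the initial rate is 1 / (alpha * t_0).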

Any help would be appreciated!

Thanks,
Andy

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
