Hi everybody,

Could someone please explain the learning-rate heuristic in SGD to me? Why is \eta_0 initialized the way it is? Peter Prettenhofer mentioned that it is taken from Leon Bottou's SGD code. I found it there, but without any further explanation. Is it explained in any of the papers? I could not find it. I thought the initial learning rate in SGD is chosen using a subset of the training set, which seems to contradict the use of a heuristic. Which one is actually used?
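For reference, here is a minimal Python sketch of the heuristic as I understand it. This is an assumption on my part, not a confirmed description of Bottou's or scikit-learn's code: the schedule is taken to be eta(t) = 1 / (alpha * (t0 + t)), where t is the time step and t0 is an offset chosen so that eta(0) equals a heuristic eta0. The heuristic assumes the training samples have norm of roughly 1. The function names below (`hinge_dloss`, `log_dloss`, `initial_eta`, `optimal_schedule`) are hypothetical and are not part of the scikit-learn API.

```python
import math


def hinge_dloss(z):
    """Negated derivative of the hinge loss max(0, 1 - z) w.r.t. the margin z."""
    return 1.0 if z < 1.0 else 0.0


def log_dloss(z):
    """Negated derivative of the log loss log(1 + exp(-z)) w.r.t. the margin z."""
    return 1.0 / (1.0 + math.exp(z))


def initial_eta(alpha, dloss):
    """Heuristic initial learning rate eta0 (sketch, assumes |x| is about 1)."""
    # Rough scale of the weight norm for an L2-regularized problem.
    maxw = 1.0 / math.sqrt(alpha)
    # A "typical" weight magnitude, somewhere between 1 and maxw.
    typw = math.sqrt(maxw)
    # Pick eta0 so that the first update is roughly the size of a typical weight.
    return typw / max(1.0, dloss(-typw))


def optimal_schedule(alpha, dloss, n_steps):
    """Learning rates eta(t) = 1 / (alpha * (t0 + t)) for t = 0 .. n_steps - 1."""
    eta0 = initial_eta(alpha, dloss)
    # Offset t0 chosen so that eta(0) == eta0; t is the actual time step.
    t0 = 1.0 / (alpha * eta0)
    return [1.0 / (alpha * (t0 + t)) for t in range(n_steps)]


# Usage: with alpha = 1e-4, eta0 ≈ 10.0 and t0 ≈ 1000,
# so the learning rate halves after about 1000 steps.
etas = optimal_schedule(1e-4, hinge_dloss, 2001)
print(etas[0], etas[1000])
```

In this sketch only eta0 comes from the heuristic; the time step t simply counts updates, which is why t and t0 play different roles in the schedule.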
Also, I think there is a typo in the documentation where the learning-rate schedule is explained. In 3.3.6.1 SGD, below the formula for the schedule, it says "t_0 is the time step [...], t_0 is chosen automatically". I think the first "t_0" should actually be "t". Is that right?

Any help would be appreciated!

Thanks,
Andy

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
