In most SGD papers I know, people do:

1) Sample instance x_i
2) Predict the label for x_i
3) Regularize the weights
4) Update the weights if a non-zero loss is suffered
However, J. Langford and B. Carpenter do:

1) Sample instance x_i
2) Regularize the weights
3) Predict the label for x_i
4) Update the weights if a non-zero loss is suffered

The regularization doesn't depend on the prediction, but it may change the prediction, so I guess it makes sense to do it like J. Langford and B. Carpenter. Still, does anyone have feedback on which ordering is usually better empirically?

Mathieu
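P.S. For concreteness, here is a minimal sketch of one step under each ordering, assuming hinge loss with labels in {-1, +1} and plain multiplicative L2 shrinkage; the names (w, x, y, eta, lam) and this particular regularization form are my own choices for illustration, not taken from either paper:

import numpy as np

def step_predict_first(w, x, y, eta=0.01, lam=1e-4):
    """Ordering 1: sample, predict, regularize, update."""
    margin = y * np.dot(w, x)   # 2) predict with the current weights
    w *= (1.0 - eta * lam)      # 3) L2 shrinkage of the weights
    if margin < 1.0:            # 4) hinge loss was non-zero
        w += eta * y * x        #    gradient step on the loss term
    return w

def step_regularize_first(w, x, y, eta=0.01, lam=1e-4):
    """Ordering 2: sample, regularize, predict, update."""
    w *= (1.0 - eta * lam)      # 2) L2 shrinkage before predicting
    margin = y * np.dot(w, x)   # 3) predict with the shrunken weights
    if margin < 1.0:            # 4) hinge loss was non-zero
        w += eta * y * x
    return w

The only difference is whether the margin is computed before or after the shrinkage, which is exactly the point in question: near the decision boundary the shrinkage can flip which side of the margin the example falls on.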
