In most SGD papers I know of, people do:

1) Sample instance x_i
2) Predict label for x_i
3) Regularize weights
4) Update weights if a non-zero loss is suffered

However, J. Langford and B. Carpenter do it in this order (both orderings are sketched in code after the list):

1) Sample instance x_i
2) Regularize weights
3) Predict label for x_i
4) Update weights if a non-zero loss is suffered
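
For concreteness, here is a minimal sketch of one step under each
ordering, assuming hinge loss and a truncated L1 penalty; the function
and parameter names are mine, not from either paper:

import numpy as np

def sgd_step(w, x, y, lr=0.1, l1=0.01, regularize_first=True):
    """One SGD step on (x, y), with y in {-1, +1} and hinge loss."""
    def l1_shrink(w):
        # Truncated L1 step: shrink each weight toward zero.
        return np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    if regularize_first:
        # Langford/Carpenter ordering: regularize, then predict.
        w = l1_shrink(w)
        margin = y * np.dot(w, x)
    else:
        # The other ordering: predict, then regularize.
        margin = y * np.dot(w, x)
        w = l1_shrink(w)
    if margin < 1.0:  # non-zero hinge loss suffered
        w = w + lr * y * x
    return w

The only difference is whether l1_shrink runs before or after the
margin is computed, i.e. whether the shrunk weights are the ones used
for the prediction.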

Regularization doesn't depend on the prediction, but it may change the
prediction: e.g. an L1 step can shrink a small weight to zero before
the dot product is computed, which can flip the predicted label. So I
guess it makes sense to do it like J. Langford and B. Carpenter, but
do people have feedback about which ordering is usually empirically
better?

Mathieu
