I found that, for sparse data, the scikit-learn implementation of SGD uses an
intercept_decay variable set to 0.01 (SPARSE_INTERCEPT_DECAY) to avoid intercept
oscillation. Shouldn't this be determined by the learning_rate instead? I'm
asking because it adds a layer of tuning that the user has no control
over.
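To make the concern concrete, here is a minimal sketch of plain SGD for squared loss. It is not scikit-learn's code. The intercept step is multiplied by an intercept_decay factor, and I am assuming this roughly mirrors how the decay enters the intercept update in scikit-learn's SGD loop. The helper name sgd_squared_loss and the toy data are hypothetical. With a decay of 0.01, the intercept's effective step size is eta * 0.01. Tuning the learning rate rescales both steps but cannot change their 100:1 ratio.

```python
import numpy as np


def sgd_squared_loss(X, y, eta=0.1, intercept_decay=1.0, n_epochs=20, seed=0):
    """Plain SGD for least squares with a damped intercept step.

    Hypothetical helper for illustration only: constant learning rate,
    no regularization, no averaging. The intercept update is multiplied
    by `intercept_decay`, so its effective step size is eta * intercept_decay.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    intercept_trace = []
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            p = X[i] @ w + b
            dloss = p - y[i]                # d/dp of 0.5 * (p - y) ** 2
            update = -eta * dloss
            w += update * X[i]              # weights take the full step
            b += update * intercept_decay   # intercept takes a scaled-down step
            intercept_trace.append(b)
    return w, b, np.array(intercept_trace)


# Toy data (hypothetical): y = 2 * x + 5, no noise.
data_rng = np.random.default_rng(42)
X = data_rng.random((200, 1))
y = 2.0 * X[:, 0] + 5.0

# Same seed, so both runs visit samples in the same order.
w_dense, b_dense, trace_dense = sgd_squared_loss(X, y, intercept_decay=1.0)
w_sparse, b_sparse, trace_sparse = sgd_squared_loss(X, y, intercept_decay=0.01)
print(f"decay=1.0:  intercept={b_dense:.3f}")
print(f"decay=0.01: intercept={b_sparse:.3f}")
```

After the same number of passes, the undamped run recovers the offset of 5. In the damped run, the weight absorbs part of the offset and the intercept is still further from 5. Changing eta alone cannot close that gap without also changing how fast the weights move.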
Danny                                     
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
