Re: [Scikit-learn-general] Possible bug in SGD with L1 regularization and question

Mathieu Blondel Fri, 18 Nov 2011 23:36:05 -0800

On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer
<[email protected]> wrote:


> Unfortunately, I'm not that familiar with "SGD-L1 (Clipped +
> Lazy-Update)" either - I just quickly skimmed over a technical report
> of Bob [1]. I agree with your description: it seems to me that the
> major difference is the fact that the cumulative penalty approach
> `remembers` the amount that has been clipped and applies it the next
> time the feature becomes active. The "SGD-L1 (Clipped + Lazy-Update)"
> approach discards the penalty that has been clipped.

For those interested, I can now confirm that the only difference
between Langford/Carpenter's method and Tsuruoka's is that the latter
keeps track of the amount of penalty *actually* received by each
weight, i.e., the part of the penalty that would have pushed a weight
past zero is buffered for later updates. Therefore, if a given weight
becomes non-zero again due to an update, it also receives the buffered
penalty. With Langford/Carpenter's method, the part that would have
crossed zero is lost. Tsuruoka's method is therefore more aggressive
at driving weights to zero.
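
To make the difference concrete, here is a minimal sketch for a single
weight. It is not the scikit-learn implementation. It assumes a dense
update (the penalty is applied at every step, so the lazy bookkeeping
is left out), and the names clip_toward_zero, clipped_step,
CumulativePenalty and run_both are hypothetical. The cumulative update
follows my reading of the rule in Tsuruoka et al.: u is the total
penalty a weight could have received so far, q is the (signed) total it
actually received.

```python
def clip_toward_zero(w, penalty):
    """Shrink w toward zero by `penalty`, stopping at zero."""
    if w > 0:
        return max(0.0, w - penalty)
    if w < 0:
        return min(0.0, w + penalty)
    return 0.0


def clipped_step(w, grad_step, penalty):
    """Clipped update: any penalty that would cross zero is discarded."""
    return clip_toward_zero(w + grad_step, penalty)


class CumulativePenalty:
    """Cumulative-penalty update for one weight (assumed form)."""

    def __init__(self):
        self.u = 0.0  # total penalty the weight could have received
        self.q = 0.0  # total penalty actually applied (signed)

    def step(self, w, grad_step, penalty):
        self.u += penalty
        w = w + grad_step
        z = w  # weight before regularization
        if w > 0:
            w = max(0.0, w - (self.u + self.q))
        elif w < 0:
            w = min(0.0, w + (self.u - self.q))
        self.q += w - z  # record what was actually applied
        return w


def run_both(grad_steps, penalty):
    """Apply the same gradient steps with both schemes, starting at 0."""
    w_clip, w_cum = 0.0, 0.0
    cum = CumulativePenalty()
    for g in grad_steps:
        w_clip = clipped_step(w_clip, g, penalty)
        w_cum = cum.step(w_cum, g, penalty)
    return w_clip, w_cum


if __name__ == "__main__":
    # Small step, idle step, then a larger step that revives the weight.
    print(run_both([0.1, 0.0, 0.5], penalty=0.3))
```

With these steps the clipped weight ends up non-zero, because the
penalty it could not absorb while sitting at zero is forgotten. The
cumulative version applies that buffered penalty as soon as the weight
moves away from zero and pushes it back to zero.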

Mathieu

