2011/11/19 Mathieu Blondel <[email protected]>:
> On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer
> <[email protected]> wrote:
>
>> Unfortunately, I'm not that familiar with "SGD-L1 (Clipped +
>> Lazy-Update)" either - I just quickly skimmed a technical report
>> by Bob [1]. I agree with your description: it seems to me that the
>> major difference is that the cumulative penalty approach
>> `remembers` the amount that has been clipped and applies it the next
>> time the feature becomes active, whereas the "SGD-L1 (Clipped +
>> Lazy-Update)" approach discards the penalty that has been clipped.
>
> For those interested, I can now confirm that the only difference
> between Langford/Carpenter's method and Tsuruoka's is that the latter
> keeps track of the amount of penalty *actually* received by each
> weight: the part of the penalty that would have pushed the weight
> across zero is buffered for subsequent updates. Therefore, if a given
> weight becomes non-zero again due to a later update, it also receives
> the buffered penalty. With Langford/Carpenter's method, the part that
> crossed zero is lost. Tsuruoka's method is therefore more aggressive.

Thanks for the update, Mathieu. I think this kind of implementation
detail should be added to the docstring, the narrative documentation,
or both.
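
For concreteness, here is a minimal sketch of the two penalty steps as
I understand them (plain Python; `eta_lambda`, `u` and `q` are my
names, loosely following Tsuruoka's paper, not anything that exists in
scikit-learn):

    def clipped_l1_step(w, eta_lambda):
        # Langford/Carpenter: truncate at zero; any penalty that
        # would push the weight past zero is simply discarded.
        if w > 0:
            return max(0.0, w - eta_lambda)
        elif w < 0:
            return min(0.0, w + eta_lambda)
        return w

    def cumulative_l1_step(w, q, u):
        # Tsuruoka: u is the total penalty every weight *should* have
        # received so far, q is what this weight has *actually*
        # received (signed). The difference is the buffered amount,
        # applied as soon as the weight becomes active again.
        z = w
        if w > 0:
            w = max(0.0, w - (u + q))
        elif w < 0:
            w = min(0.0, w + (u - q))
        q += w - z  # record what was actually applied
        return w, q

A toy trace with a per-step penalty of 0.5 (the weight values are the
results of hypothetical gradient steps) shows why the cumulative
variant is more aggressive:

    # clipped: the 0.2 that crossed zero at step 1 is lost
    w = clipped_l1_step(0.3, 0.5)         # -> 0.0
    w = clipped_l1_step(0.6, 0.5)         # -> 0.1, weight escapes zero

    # cumulative: the missing 0.2 is buffered in q and applied later
    u, q = 0.5, 0.0
    w, q = cumulative_l1_step(0.3, q, u)  # w = 0.0, q = -0.3
    u += 0.5
    w, q = cumulative_l1_step(0.6, q, u)  # w = 0.0, stays at zero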

Also, do you have any sense of whether this difference has an impact
on the test error in practice on your data?

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
