On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer <[email protected]> wrote:
> Unfortunately, I'm not that familiar with "SGD-L1 (Clipped +
> Lazy-Update)" either - I just quickly skimmed over a technical report
> of Bob [1]. I agree with your description: it seems to me that the
> major difference is the fact that the cumulative penalty approach
> `remembers` the amount that has been clipped and applies it the next
> time the feature becomes active. The "SGD-L1 (Clipped + Lazy-Update)"
> approach discards the penalty that has been clipped.

For those interested, I can now confirm that the only difference between Langford/Carpenter's method and Tsuruoka's is that the latter keeps track of the penalty *actually* received by each weight, so the part of the penalty that would have pushed a weight past zero is buffered for subsequent updates. If a weight later becomes non-zero again because of a gradient update, it receives this buffered penalty. With Langford/Carpenter's method, the part that crossed zero is lost. Tsuruoka's method is therefore more aggressive.

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
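To make the difference concrete, here is a minimal sketch for a single weight, assuming a constant learning rate `eta`, an L1 strength `C`, and a toy sequence of gradients where `0.0` means the feature is absent from the sample. The function names and the example data are hypothetical. The cumulative variant follows the bookkeeping of Tsuruoka's cumulative penalty as I understand it (`u` is the total penalty a weight could have received so far, `q` the signed penalty it actually received); the lazy clipped variant simply drops whatever is clipped at zero.

```python
# Sketch: lazy clipped L1 updates vs. cumulative-penalty L1 updates.
# Assumptions (hypothetical, not from the thread): one weight, constant
# learning rate `eta`, L1 strength `C`, and 0.0 meaning "feature inactive".


def clipped_lazy(grad_steps, eta=0.1, C=0.5):
    """Apply the deferred L1 penalty when the feature is active; clip at zero.

    Any penalty that would push the weight past zero is discarded.
    """
    w = 0.0
    u = 0.0       # total per-weight penalty accumulated so far
    u_last = 0.0  # value of u when this weight was last penalized
    for g in grad_steps:
        u += eta * C
        if g == 0.0:
            continue              # inactive feature: defer the penalty
        w -= eta * g              # gradient step on the loss
        pending = u - u_last      # penalty owed since the last update
        if w > 0:
            w = max(0.0, w - pending)
        elif w < 0:
            w = min(0.0, w + pending)
        u_last = u                # the clipped remainder is forgotten
    return w


def cumulative_penalty(grad_steps, eta=0.1, C=0.5):
    """Cumulative penalty: remember how much penalty was actually applied."""
    w = 0.0
    u = 0.0  # total penalty this weight could have received
    q = 0.0  # signed total penalty actually applied to this weight
    for g in grad_steps:
        u += eta * C
        if g == 0.0:
            continue              # inactive feature: u keeps growing
        w -= eta * g              # gradient step on the loss
        z = w
        if w > 0:
            w = max(0.0, w - (u + q))  # u + q: penalty still owed
        elif w < 0:
            w = min(0.0, w + (u - q))
        q += w - z                # record what was really applied
    return w


steps = [-1.0, 0.0, 0.0, 0.0, 0.0, -1.0, -1.0, -2.0]
print(clipped_lazy(steps), cumulative_penalty(steps))
```

On this toy sequence the clipped lazy update ends near 0.20 while the cumulative penalty ends near 0.10: the penalty clipped at zero on the sixth step is lost in the first case and applied on later updates in the second.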
