Hello,

I was re-reading the Tsuruoka et al. paper on which SGDClassifier's
L1 regularization is based, and found this interesting (as usual!)
post by Bob Carpenter:

http://lingpipe-blog.com/2009/09/18/tsuruoka-tsujii-ananiadou-2009-stochastic-gradient-descent-training-for-l1-regularized-log-linear-models-with-cumulative-penalty/

He mentions a possible bug in the paper which, if confirmed, also
affects SGDClassifier in scikit-learn: the lazy updates that are still
buffered at the end of training may never be applied, so it is
necessary to run the L1 regularization routine one more time after the
main loop to make sure all buffered updates have been applied. This
means that the solutions found by the current SGDClassifier may not be
as sparse as they could be.
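For concreteness, here is a minimal sketch (my own toy code, not
scikit-learn's actual implementation) of the cumulative penalty method
from Figure 2 of the paper, using squared loss, with the extra penalty
pass after the training loop. The names u and q follow the paper;
everything else (sgd_l1_cumulative, the loss, the hyperparameters) is
made up for illustration:

```python
import numpy as np

def sgd_l1_cumulative(X, y, lam=0.01, eta=0.1, n_epochs=5):
    """Toy SGD with the cumulative L1 penalty of Tsuruoka et al. (2009),
    sketched here for squared loss. u and q follow the paper's Figure 2."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    q = np.zeros(n_features)  # penalty actually applied to each weight so far
    u = 0.0                   # total penalty each weight *should* have received

    def apply_penalty(j):
        # Apply the difference between the penalty w[j] should have
        # received (u) and the penalty it actually received (q[j]),
        # clipping at zero (lines 18-21 of Figure 2).
        z = w[j]
        if w[j] > 0:
            w[j] = max(0.0, w[j] - (u + q[j]))
        elif w[j] < 0:
            w[j] = min(0.0, w[j] + (u - q[j]))
        q[j] += w[j] - z

    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            u += eta * lam
            # plain SGD step on the (unregularized) squared loss
            grad = (np.dot(w, x_i) - y_i) * x_i
            w -= eta * grad
            # lazily apply the L1 penalty only to active features
            for j in np.nonzero(x_i)[0]:
                apply_penalty(j)

    # The fix discussed above: one extra pass so that features inactive
    # near the end of training still receive their buffered penalties.
    for j in range(n_features):
        apply_penalty(j)
    return w
```

With dense data the final pass is redundant, but with sparse data it is
exactly what flushes the updates still buffered at the end of training.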

As an aside, I'm not sure I fully grasp the difference between what
the authors call "SGD-L1 (Clipped + Lazy-Update)" and the proposed
cumulative penalty method. In the former, since a weight doesn't take
part in the inner product until the corresponding feature is actually
active in an instance, we can simply buffer the updates until we
actually need to make them. In the cumulative penalty case, they
basically do the same thing, i.e., they accumulate the penalties. The
only potential difference I see is that they keep accumulating the
penalties even when the weight is clipped to 0 (line 21 in Figure 2).
Their method obtains much sparser solutions than just "SGD-L1 (Clipped
+ Lazy-Update)", so I guess I'm missing something. Any explanation
would be greatly appreciated.
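To make the one difference I do see concrete, here is a toy
illustration of my own (not from the paper): a single weight receives
alternating gradients, so it keeps bouncing across zero. Plain clipping
forgets the part of the penalty lost each time the weight is clipped,
while the cumulative version remembers it through u and q and ends up
pinning the weight at exactly zero:

```python
eta, lam, steps = 0.1, 0.2, 19
grads = [(-1) ** t * 0.5 for t in range(steps)]  # alternating +/- 0.5

# "Clipped": apply at most eta*lam per step, discard the clipped remainder
w_clip = 0.0
for g in grads:
    w_clip -= eta * g
    if w_clip > 0:
        w_clip = max(0.0, w_clip - eta * lam)
    elif w_clip < 0:
        w_clip = min(0.0, w_clip + eta * lam)

# "Cumulative": u tracks the total penalty owed, q the penalty applied,
# and q keeps accumulating even when the weight is clipped to 0
w_cum, u, q = 0.0, 0.0, 0.0
for g in grads:
    u += eta * lam
    w_cum -= eta * g
    z = w_cum
    if w_cum > 0:
        w_cum = max(0.0, w_cum - (u + q))
    elif w_cum < 0:
        w_cum = min(0.0, w_cum + (u - q))
    q += w_cum - z
```

After these steps the clipped weight is still oscillating around a
small nonzero value, while the cumulative weight is exactly zero, which
at least matches the paper's claim of sparser solutions.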

Link to the paper:
http://www.aclweb.org/anthology-new/P/P09/P09-1054.pdf

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general