Hello,

I was re-reading Tsuruoka's paper, on which SGDClassifier bases its L1 regularization, and found this interesting post (as usual) by Bob Carpenter:

http://lingpipe-blog.com/2009/09/18/tsuruoka-tsujii-ananiadou-2009-stochastic-gradient-descent-training-for-l1-regularized-log-linear-models-with-cumulative-penalty/

He mentions a possible bug in the paper which, if confirmed, also affects SGDClassifier in scikit-learn: lazy updates that are still buffered towards the end of training may never be applied, so the L1 regularization routine needs to be run one more time after the main loop to make sure all buffered updates have been flushed. This means the solutions found by the current SGDClassifier may not be as sparse as they could be.

As an aside, I'm not sure I fully grasp the difference between what the authors call "SGD-L1 (Clipped + Lazy-Update)" and the proposed cumulative penalty method. In the former, since a weight doesn't take part in the inner product until its feature is actually activated in an instance, the updates can simply be buffered until they are actually needed. In the cumulative penalty case, they essentially do the same thing, i.e., they accumulate the penalties. The only difference I can see is that they keep accumulating the penalty even when the weight is clipped to 0 (line 21 in Figure 2). Their method obtains much sparser solutions than plain "SGD-L1 (Clipped + Lazy-Update)", so I guess I'm missing something. Any explanation would be greatly appreciated.

Link to the paper: http://www.aclweb.org/anthology-new/P/P09/P09-1054.pdf

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
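For concreteness, here is a minimal sketch of the cumulative penalty update from the paper's Figure 2, including the extra post-loop flush Bob Carpenter suggests. The squared loss, the function name `sgd_l1_cumulative`, and the hyperparameter names (`alpha`, `eta`) are my own illustrative assumptions, not the paper's or scikit-learn's:

```python
import numpy as np

def sgd_l1_cumulative(X, y, alpha=0.01, eta=0.1, n_epochs=5, seed=0):
    """SGD with the cumulative L1 penalty (sketch of Tsuruoka et al. 2009,
    Figure 2), here applied to squared loss for simplicity."""
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    u = 0.0                    # total L1 penalty each weight *could* have received
    q = np.zeros(n_features)   # penalty each weight *has* actually received

    def apply_penalty(idx):
        # Clip towards zero by the outstanding cumulative amount, and keep
        # accumulating q even when the weight is clipped to 0 (the paper's
        # line 21): this is what distinguishes it from plain clipping.
        for i in idx:
            z = w[i]
            if w[i] > 0:
                w[i] = max(0.0, w[i] - (u + q[i]))
            elif w[i] < 0:
                w[i] = min(0.0, w[i] + (u - q[i]))
            q[i] += w[i] - z

    for _ in range(n_epochs):
        for j in rng.permutation(n_samples):
            u += eta * alpha                         # accrue per-step L1 budget
            grad = (np.dot(w, X[j]) - y[j]) * X[j]   # squared-loss gradient
            w -= eta * grad
            apply_penalty(np.nonzero(X[j])[0])       # lazy: active features only

    # The possible bug discussed above: features inactive near the end of
    # training may still carry buffered penalty, so flush it once more for
    # every weight after the main loop.
    apply_penalty(np.arange(n_features))
    return w
```

Without the final `apply_penalty` call over all features, a weight whose feature never appears again after some point keeps its stale value, which is why the learned solution can end up less sparse than intended.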
