Yes, the regularization term just adds a bunch of (theta_i)^2 terms. The partial derivative with respect to theta_i is simply 2*theta_i, since each of the other squared terms has zero derivative with respect to theta_i. So the regularization term just adds the weight vector itself to the gradient -- simples.
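Spelled out (with lambda as the regularization strength -- this is just the chain of definitions, nothing specific to the Spark code):

\[
R(\theta) = \lambda \sum_j \theta_j^2,
\qquad
\frac{\partial R}{\partial \theta_i} = 2\lambda\,\theta_i,
\qquad
\nabla_\theta \bigl( L(\theta) + R(\theta) \bigr)
  = \nabla_\theta L(\theta) + 2\lambda\,\theta .
\]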
... give or take a factor of 2. To be fair, there is some minor variation
in convention here: some put a factor of 1/2 in front of the L2
regularization term to absorb the 2 in the partial derivatives, for
tidiness. It doesn't matter in the sense that it's the same as using a
lambda half as large, but it does matter if you're trying to make
apples-to-apples comparisons with another implementation. See about
slide 20 here for some clear equations:
http://people.cs.umass.edu/~sheldon/teaching/2012fa/ml/files/lec7-annotated.pdf

And now I have basically the same question. I'm not sure I see how the
code in Updater implements L2 regularization. I see the
weights-minus-gradient part, but the division by the scalar doesn't look
right at first glance. It looks like the shrinkage term, but then there
should be a minus in there, and it ought to be a multiplier on the old
weights only? (I've sketched the two forms I mean in code below the
quoted thread.) Heh, if it's a slightly different definition, it would
really make Walrus's point!

On Thu, Jan 9, 2014 at 7:10 PM, Evan R. Sparks <[email protected]> wrote:
> Hi,
>
> The L2 update rule is derived from the derivative of the loss function
> with respect to the model weights - an L2 regularized loss function
> contains an additional additive term involving the weights. This paper
> provides some useful mathematical background:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.7377
>
> The code that computes the new L2 weight is here:
> https://github.com/apache/incubator-spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/optimization/Updater.scala#L90
>
> The compute function calculates the new weights based on the current
> weights and the gradient as computed at each step. Contrast it with the
> code in the SimpleUpdater class to get a sense for how the
> regularization parameter is incorporated - it's fairly simple.
>
> In general, though, I agree it makes sense to include a discussion of
> the algorithm and a reference to the specific version we implement in
> the scaladoc.
>
> - Evan
>
>
> On Thu, Jan 9, 2014 at 10:49 AM, Walrus theCat <[email protected]>
> wrote:
>>
>> No -- I'm not, and I appreciate the comment. What I'm looking for is
>> a specific mathematical formula that I can map to the source code.
>>
>> Personally, specifically, I'd like to see how the loss function gets
>> embedded into the w (gradient), in the case of the regularized and
>> unregularized operation.
>>
>> Looking through the source, the "loss history" makes sense to me, but
>> I can't see how that translates into the effect on the gradient.
>>
>>
>> On Thu, Jan 9, 2014 at 10:39 AM, Sean Owen <[email protected]> wrote:
>>>
>>> L2 regularization just means "regularizing by penalizing parameters
>>> whose L2 norm is large", and the L2 norm is just the length of the
>>> vector (the penalty uses its square). It's not something you would
>>> write an ML paper on any more than the vector dot product. Are you
>>> asking something else?
>>>
>>> On Thu, Jan 9, 2014 at 6:19 PM, Walrus theCat <[email protected]>
>>> wrote:
>>> > Thanks Christopher,
>>> >
>>> > I wanted to know if there was a specific paper this particular
>>> > codebase was based on. For instance, Weka cites papers in their
>>> > documentation.
>>> >
>>> >
>>> > On Wed, Jan 8, 2014 at 7:10 PM, Christopher Nguyen <[email protected]>
>>> > wrote:
>>> >>
>>> >> Walrus, given the question, this may be a good place for you to
>>> >> start. There's some good discussion there as well as links to
>>> >> papers.
>>> >>
>>> >> http://www.quora.com/Machine-Learning/What-is-the-difference-between-L1-and-L2-regularization
>>> >>
>>> >> Sent while mobile. Pls excuse typos etc.
>>> >>
>>> >> On Jan 8, 2014 2:24 PM, "Walrus theCat" <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> Can someone point me to the paper that algorithm is based on?
>>> >>>
>>> >>> Thanks
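To make the question concrete, here's a minimal sketch of the update
rules being compared. This is my own illustration, not the actual
Updater code: it uses plain Array[Double] instead of Spark's vector
types, assumes the (lambda/2)*||w||^2 convention for the penalty, and
the names UpdaterSketch / simpleStep / l2ExplicitStep / l2ProximalStep
are made up.

object UpdaterSketch {

  // Plain SGD step with no regularization (the SimpleUpdater idea):
  //   w_new = w - step * gradient
  def simpleStep(w: Array[Double], grad: Array[Double],
                 step: Double): Array[Double] =
    w.zip(grad).map { case (wi, gi) => wi - step * gi }

  // Explicit L2 step, using the (lambda/2)*||w||^2 convention so the
  // penalty's gradient is lambda*w. The penalty shows up as a shrink
  // multiplier on the OLD weights, with a minus in it:
  //   w_new = w * (1 - step * lambda) - step * gradient
  def l2ExplicitStep(w: Array[Double], grad: Array[Double],
                     step: Double, lambda: Double): Array[Double] =
    w.zip(grad).map { case (wi, gi) =>
      wi * (1.0 - step * lambda) - step * gi
    }

  // Proximal ("divide after the step") L2 form: take the plain gradient
  // step first, then shrink the RESULT by dividing by a scalar:
  //   w_new = (w - step * gradient) / (1 + step * lambda)
  // For small step*lambda this agrees with the explicit form to first
  // order, since 1 - x ~= 1 / (1 + x), but it is not identical.
  def l2ProximalStep(w: Array[Double], grad: Array[Double],
                     step: Double, lambda: Double): Array[Double] =
    w.zip(grad).map { case (wi, gi) =>
      (wi - step * gi) / (1.0 + step * lambda)
    }

  def main(args: Array[String]): Unit = {
    val w = Array(1.0, -2.0, 0.5)
    val g = Array(0.1, 0.3, -0.2)
    println(l2ExplicitStep(w, g, 0.1, 0.5).mkString(", "))
    println(l2ProximalStep(w, g, 0.1, 0.5).mkString(", "))
  }
}

If the Updater code really is doing the divide-by-(1 + step * lambda)
form, that's the proximal flavor: it would explain seeing a division
instead of a minus-sign shrink multiplier on the old weights, and it's
exactly the kind of convention difference that makes apples-to-apples
comparisons with other implementations tricky.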
