No -- I'm not, and I appreciate the comment. What I'm looking for is a specific mathematical formula that I can map to the source code.
Personally, specifically, I'd like to see how the loss function gets embedded into the update of the weight vector w (i.e., into the gradient), in both the regularized and unregularized cases. Looking through the source, the "loss history" makes sense to me, but I can't see how that translates into the effect on the gradient. (I've sketched the textbook form I have in mind below, after the quoted thread.)

On Thu, Jan 9, 2014 at 10:39 AM, Sean Owen <[email protected]> wrote:

> L2 regularization just means "regularizing by penalizing parameters
> whose L2 norm is large", and L2 norm just means squared length. It's
> not something you would write an ML paper on any more than the
> vector dot product is. Are you asking something else?
>
> On Thu, Jan 9, 2014 at 6:19 PM, Walrus theCat <[email protected]> wrote:
> > Thanks Christopher,
> >
> > I wanted to know if there was a specific paper this particular codebase
> > was based on. For instance, Weka cites papers in their documentation.
> >
> > On Wed, Jan 8, 2014 at 7:10 PM, Christopher Nguyen <[email protected]> wrote:
> >>
> >> Walrus, given the question, this may be a good place for you to start.
> >> There's some good discussion there as well as links to papers.
> >>
> >> http://www.quora.com/Machine-Learning/What-is-the-difference-between-L1-and-L2-regularization
> >>
> >> Sent while mobile. Pls excuse typos etc.
> >>
> >> On Jan 8, 2014 2:24 PM, "Walrus theCat" <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Can someone point me to the paper that algorithm is based on?
> >>>
> >>> Thanks
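For concreteness, here is the form I'm thinking of. This is only a sketch of standard stochastic gradient descent with and without an L2 penalty, assuming a step size \alpha_t and regularization parameter \lambda; it is not a claim about what this codebase actually implements.

Unregularized update for a per-example loss L(w; x_i, y_i):

\[
w_{t+1} = w_t - \alpha_t \, \nabla_w L(w_t; x_i, y_i)
\]

L2-regularized objective f(w) = L(w) + (\lambda / 2) \|w\|_2^2, whose penalty contributes an extra gradient term \lambda w, giving:

\[
w_{t+1} = w_t - \alpha_t \left( \nabla_w L(w_t; x_i, y_i) + \lambda w_t \right)
\]

In that picture, the "loss history" would simply be the recorded values of the objective at each iteration, while the update itself uses the gradient of that same objective. What I'd like to pin down is whether the code applies the \lambda w_t term in this form or with some different scaling.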
