No -- I'm not, and I appreciate the comment.  What I'm looking for is a
specific mathematical formula that I can map to the source code.

Specifically, I'd like to see how the loss function feeds into the
gradient used to update the weights w, in both the regularized and
unregularized cases.

Looking through the source, the "loss history" makes sense to me, but I
can't see how the loss translates into its effect on the gradient.
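To make the question concrete, here is my best guess at what I expect the
code to be doing, sketched in Scala.  The names below (computeGradient,
lambda, stepSize) are mine, not MLlib's, and I've assumed plain squared-error
loss; what I'm after is which formula the actual gradient classes implement.

    // A minimal sketch, assuming squared-error loss and an optional L2
    // penalty of 0.5 * lambda * ||w||^2.  None of these names come from MLlib.
    object GradientSketch {

      // Per-example loss:  L(w) = 0.5 * (w . x - y)^2
      // Its gradient:      dL/dw = (w . x - y) * x
      // With L2 regularization the objective adds 0.5 * lambda * ||w||^2,
      // which adds lambda * w to the gradient.
      def computeGradient(x: Array[Double], y: Double, w: Array[Double],
                          lambda: Double): Array[Double] = {
        val prediction = w.zip(x).map { case (wi, xi) => wi * xi }.sum  // w . x
        val error = prediction - y
        w.indices.map(i => error * x(i) + lambda * w(i)).toArray
      }

      // One gradient-descent update: w := w - stepSize * gradient
      def step(w: Array[Double], gradient: Array[Double],
               stepSize: Double): Array[Double] =
        w.indices.map(i => w(i) - stepSize * gradient(i)).toArray
    }

If lambda is zero this collapses to the plain unregularized update, which is
exactly the comparison I'm trying to trace in the code.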


On Thu, Jan 9, 2014 at 10:39 AM, Sean Owen <[email protected]> wrote:

> L2 regularization just means "regularizing by penalizing parameters
> whose L2 norm is large"; the penalty is typically the squared length of
> the parameter vector. It's not something you would write an ML paper on
> any more than you would on the vector dot product. Are you asking
> something else?
>
> On Thu, Jan 9, 2014 at 6:19 PM, Walrus theCat <[email protected]>
> wrote:
> > Thanks Christopher,
> >
> > I wanted to know if there was a specific paper this particular codebase
> was
> > based on.  For instance, Weka cites papers in their documentation.
> >
> >
> > On Wed, Jan 8, 2014 at 7:10 PM, Christopher Nguyen <[email protected]>
> wrote:
> >>
> >> Walrus, given the question, this may be a good place for you to start.
> >> There's some good discussion there as well as links to papers.
> >>
> >>
> >>
> http://www.quora.com/Machine-Learning/What-is-the-difference-between-L1-and-L2-regularization
> >>
> >> Sent while mobile. Pls excuse typos etc.
> >>
> >> On Jan 8, 2014 2:24 PM, "Walrus theCat" <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Can someone point me to the paper that algorithm is based on?
> >>>
> >>> Thanks
> >
> >
>
