What Sean says is just right, except that I was (telegraphically) getting
at a slightly different point with L_1:

On Wed, Feb 27, 2013 at 7:23 AM, Chris Harrington <[email protected]> wrote:

> Is L_1 regularization the same as manhattan distance?
>

The L_1 metric is Manhattan distance, yes.

L_1 regularization of k-means refers to something a little bit different.

The idea with regularization is that you add some sort of penalty to the
function you are optimizing.  This penalty pushes the optimization toward a
solution that you would prefer on grounds other than the original objective
alone.  Regularization often helps in solving underdetermined
systems where there are an infinite number of solutions and we have to pick
a preferred solution.
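
In code form, the shape of the idea is just this (a generic sketch, nothing
Mahout-specific; loss, penalty and lam are all placeholders):

def regularized_objective(theta, loss, penalty, lam):
    # The penalty term biases the optimizer toward solutions we prefer
    # on other grounds; lam controls how strongly it pushes.
    return loss(theta) + lam * penalty(theta)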

There isn't anything that says that you have to be optimizing the same kind
of function as the regularization.  Thus k-means, which is inherently
optimizing squared error, can quite reasonably be regularized with L_1 (the
sum of the absolute values of the centroids' coefficients).
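
Concretely, the objective I have in mind looks roughly like the sketch
below (plain Python, not Mahout code; lam is a hypothetical penalty
weight):

import numpy as np

def l1_regularized_kmeans_objective(points, centroids, assignments, lam):
    # Ordinary k-means term: squared distance of each point to its
    # assigned centroid.
    squared_error = sum(
        float(np.sum((p - centroids[a]) ** 2))
        for p, a in zip(points, assignments)
    )
    # L_1 penalty: sum of the absolute values of all centroid coefficients.
    l1_penalty = float(np.sum(np.abs(centroids)))
    return squared_error + lam * l1_penalty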

I haven't tried this at all seriously yet.  L_1 regularization tends to
help drive toward sparsity, but it is normally used in convex problems
where we can guarantee a findable global optimum.  The k-means problem,
however, is not convex, so adding the regularization may screw things up in
practice.  For text-like data, I have a strong intuition that the idealized
effect of L_1 should be very good, but the pragmatic effect may not be so
useful.
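
One way to see where the sparsity would come from (a back-of-the-envelope
sketch under the same assumptions as above, not something I have run): if
you minimize the per-cluster squared error plus the L_1 term, each
coordinate of the centroid comes out as a soft-thresholded mean instead of
the plain mean, so small coordinates get driven to exactly zero.

import numpy as np

def l1_centroid_update(cluster_points, lam):
    # Minimizing sum_i ||x_i - c||^2 + lam * ||c||_1 over c shrinks each
    # coordinate of the mean toward zero by lam / (2n); coordinates that
    # fall inside that threshold become exactly zero.
    n = len(cluster_points)
    mean = np.mean(cluster_points, axis=0)
    threshold = lam / (2.0 * n)
    return np.sign(mean) * np.maximum(np.abs(mean) - threshold, 0.0)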
