What Sean says is just right, except that I was (telegraphically) getting at a slightly different point with L_1:
On Wed, Feb 27, 2013 at 7:23 AM, Chris Harrington <[email protected]> wrote:

> Is L_1 regularization the same as manhattan distance?

L_1 metric is Manhattan distance, yes. L_1 regularization of k-means refers to something a little bit different.

The idea with regularization is that you add some sort of penalty to the function you are optimizing. This penalty pushes the optimization toward a solution that you would prefer on grounds other than the optimization alone. Regularization often helps in solving underdetermined systems where there are an infinite number of solutions and we have to pick a preferred one.

There isn't anything that says you have to be optimizing the same kind of function as the regularization. Thus k-means, which inherently optimizes squared error, can quite reasonably be regularized with L_1 (the sum of the absolute values of the centroids' coefficients). I haven't tried this at all seriously yet.

L_1 regularization tends to help drive toward sparsity, but it is normally used in convex problems where we can guarantee a findable global optimum. The k-means problem, however, is not convex, so adding the regularization may screw things up in practice. For text-like data, I have a strong intuition that the idealized effect of L_1 should be very good, but the pragmatic effect may not be so useful.
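
To make that concrete, here is a rough sketch (plain illustrative Python, not anything from Mahout; the function name, the lam parameter, and the soft-threshold update are my own framing) of what an L_1-regularized Lloyd iteration could look like. For a fixed assignment of points, the centroid that minimizes squared error plus lam times the L_1 norm is just the cluster mean soft-thresholded toward zero, which is where the sparsity would come from:

import numpy as np

def l1_regularized_kmeans(X, k, lam, n_iter=50, seed=0):
    # Lloyd-style iteration: assign each point to the nearest centroid by
    # squared error (as in plain k-means), then update each centroid as the
    # cluster mean shrunk toward zero by soft-thresholding.  The soft
    # threshold is the exact minimizer of
    #     sum_i ||x_i - c||^2 + lam * ||c||_1
    # over c for a fixed cluster.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # squared Euclidean distances, shape (n_points, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            mean = members.mean(axis=0)
            thresh = lam / (2.0 * len(members))
            # soft-thresholding drives small coefficients exactly to zero
            centroids[j] = np.sign(mean) * np.maximum(np.abs(mean) - thresh, 0.0)
    return centroids, labels

With lam = 0 this degenerates to ordinary Lloyd's algorithm; larger lam zeroes out more centroid coordinates, which is the sparsity effect I mean above. The non-convexity caveat still applies, of course: each centroid update is exact given the assignments, but the overall iteration can still land in a bad local optimum.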
