So if I'm understanding you correctly, simply put, I should investigate using L_1 as my distance measure when measuring vector distances within a cluster?
On 1 Mar 2013, at 16:24, Ted Dunning wrote:

> What Sean says is just right, except that I was (telegraphically) getting
> at a slightly different point with L_1:
>
> On Wed, Feb 27, 2013 at 7:23 AM, Chris Harrington <[email protected]> wrote:
>
>> Is L_1 regularization the same as manhattan distance?
>
> L_1 metric is manhattan distance, yes.
>
> L_1 regularization of k-means refers to something a little bit different.
>
> The idea with regularization is that you add some sort of penalty to the
> function you are optimizing. This penalty pushes the optimization toward a
> solution that you would prefer on some other grounds than just the
> optimization alone. Regularization often helps in solving underdetermined
> systems where there are an infinite number of solutions and we have to pick
> a preferred solution.
>
> There isn't anything that says that you have to be optimizing the same kind
> of function as the regularization. Thus k-means, which is inherently
> optimizing squared error, can quite reasonably be regularized with L_1 (sum
> of the absolute values of the centroids' coefficients).
>
> I haven't tried this at all seriously yet. L_1 regularization tends to
> help drive toward sparsity, but it is normally used in convex problems
> where we can guarantee a findable global optimum. The k-means problem,
> however, is not convex, so adding the regularization may screw things up in
> practice. For text-like data, I have a strong intuition that the idealized
> effect of L_1 should be very good, but the pragmatic effect may not be so
> useful.

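For what it's worth, here is a minimal sketch (not from this thread, and not Mahout code) of what L_1-regularized k-means could look like under the interpretation above: the assignment step still minimizes squared Euclidean distance, and the centroid update soft-thresholds the per-cluster mean, which is the closed-form minimizer of sum_i ||x_i - c||^2 + lam * ||c||_1. The function name l1_regularized_kmeans and the parameter lam are made up for illustration; as Ted notes, since the overall problem is not convex, there is no guarantee this behaves well in practice.

import numpy as np

def l1_regularized_kmeans(X, k, lam=1.0, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: cluster mean, then soft-threshold each coefficient,
        # which minimizes sum_i ||x_i - c||^2 + lam * ||c||_1 for that cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            m = members.mean(axis=0)
            shrink = lam / (2.0 * len(members))
            centroids[j] = np.sign(m) * np.maximum(np.abs(m) - shrink, 0.0)
    return centroids, labels

if __name__ == "__main__":
    # Toy example: three clusters, each offset along a different axis.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.3, (50, 10)) + np.eye(10)[i] for i in range(3)])
    centroids, labels = l1_regularized_kmeans(X, k=3, lam=5.0)
    print("nonzero coefficients per centroid:", (np.abs(centroids) > 1e-12).sum(axis=1))

Larger lam shrinks more centroid coefficients exactly to zero, which is the sparsity-driving effect mentioned above; with lam=0 the update reduces to ordinary Lloyd's k-means.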