So this is constraining the centroids themselves to be 0 in many dimensions... interesting. I assume this comes into play when you recompute centroids -- you're no longer minimizing squared distance alone, but squared distance plus the L_1 penalty. I'd be interested to hear your results, Chris, if you try it -- versus simply using the L_1 distance alone. I agree this does seem to make it all the more important to run it many times and pick the best clustering, given the local minima.
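For concreteness, here's roughly what I have in mind -- just a toy numpy sketch, not anything from Mahout, and the names (l1_regularized_kmeans, soft_threshold, the lam knob) are made up for illustration. The assignment step still uses plain squared Euclidean distance; only the centroid update changes, and the per-coordinate closed form of "squared error plus lam * |centroid|" turns out to be soft-thresholding the cluster mean, which is what zeroes out coordinates:

    import numpy as np

    def soft_threshold(v, t):
        # Closed-form minimizer of sum_i (x_i - c)^2 + lam*|c| per coordinate:
        # shrink the mean toward zero by t and clip at zero.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def l1_regularized_kmeans(X, k, lam, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
        for _ in range(iters):
            # Assignment: ordinary squared Euclidean distance to each centroid.
            d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Update: minimize squared error + lam * ||centroid||_1, i.e.
            # soft-threshold the cluster mean at lam / (2 * cluster size).
            for j in range(k):
                members = X[labels == j]
                if len(members) == 0:
                    continue
                centroids[j] = soft_threshold(members.mean(axis=0),
                                              lam / (2.0 * len(members)))
        return centroids, labels

Larger lam drives more centroid coordinates to exactly zero; lam = 0 reduces to ordinary Lloyd iterations. Whether that helps or hurts on real (non-convex) problems is exactly the open question.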
On Mon, Mar 4, 2013 at 7:18 PM, Ted Dunning <[email protected]> wrote:
> Always good to test, but I think that Euclidean distance with L_1
> regularization is probably more interesting.
>
> On Mon, Mar 4, 2013 at 12:00 PM, Chris Harrington <[email protected]> wrote:
>
> > So if I'm understanding you correctly, you're saying, simply put, that I should
> > investigate the use of L_1 as my distance measure when measuring
> > vector distance within a cluster?
> >
> >
> > On 1 Mar 2013, at 16:24, Ted Dunning wrote:
> >
> > > What Sean says is just right, except that I was (telegraphically) getting
> > > at a slightly different point with L_1:
> > >
> > > On Wed, Feb 27, 2013 at 7:23 AM, Chris Harrington <[email protected]> wrote:
> > >
> > >> Is L_1 regularization the same as Manhattan distance?
> > >>
> > >
> > > The L_1 metric is Manhattan distance, yes.
> > >
> > > L_1 regularization of k-means refers to something a little bit different.
> > >
> > > The idea with regularization is that you add some sort of penalty to the
> > > function you are optimizing. This penalty pushes the optimization toward a
> > > solution that you would prefer on grounds other than the
> > > optimization alone. Regularization often helps in solving underdetermined
> > > systems where there are an infinite number of solutions and we have to
> > > pick a preferred one.
> > >
> > > There isn't anything that says you have to be optimizing the same kind
> > > of function as the regularization. Thus k-means, which inherently
> > > optimizes squared error, can quite reasonably be regularized with L_1 (the
> > > sum of the absolute values of the centroids' coefficients).
> > >
> > > I haven't tried this at all seriously yet. L_1 regularization tends to
> > > help drive toward sparsity, but it is normally used in convex problems
> > > where we can guarantee a findable global optimum. The k-means problem,
> > > however, is not convex, so adding the regularization may screw things up
> > > in practice. For text-like data, I have a strong intuition that the
> > > idealized effect of L_1 should be very good, but the pragmatic effect may
> > > not be so useful.
> > >
