So if I'm understanding what you're saying correctly then, simply put, I
should investigate using L_1 as my distance measure when measuring vector
distances within a cluster?
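
In other words, an assignment step along these lines? (A rough numpy sketch
just to check my understanding; the function names are mine, not Mahout's
API.)

    import numpy as np

    def manhattan(a, b):
        # L_1 ("Manhattan") distance between two vectors
        return np.sum(np.abs(a - b))

    def euclidean_sq(a, b):
        # squared L_2 distance, which is what plain k-means minimizes
        return np.sum((a - b) ** 2)

    def assign(points, centroids, dist=manhattan):
        # assignment step: each point goes to its nearest centroid
        # under whichever distance function we plug in
        return np.array([np.argmin([dist(p, c) for c in centroids])
                         for p in points])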


On 1 Mar 2013, at 16:24, Ted Dunning wrote:

> What Sean says is just right, except that I was (telegraphically) getting
> at a slightly different point with L_1:
> 
> On Wed, Feb 27, 2013 at 7:23 AM, Chris Harrington <[email protected]> wrote:
> 
>> Is L_1 regularization the same as Manhattan distance?
>> 
> 
> L_1 metric is Manhattan distance, yes.
> 
> L_1 regularization of k-means refers to something a little bit different.
> 
> The idea with regularization is that you add some sort of penalty to the
> function you are optimizing.  This penalty pushes the optimization toward a
> solution that you would prefer on some other grounds than just the
> optimization alone.  Regularization often helps in solving underdetermined
> systems where there are an infinite number of solutions and we have to pick
> a preferred solution.
> 
> There isn't anything that says that you have to be optimizing the same kind
> of function as the regularization.  Thus k-means, which is inherently
> optimizing squared error, can quite reasonably be regularized with L_1 (sum
> of the absolute value of the centroids' coefficients).
> 
> I haven't tried this at all seriously yet.  L_1 regularization tends to
> help drive toward sparsity, but it is normally used in convex problems
> where we can guarantee a findable global optimum.  The k-means problem,
> however, is not convex so adding the regularization may screw things up in
> practice.  For text-like data, I have a strong intuition that the idealized
> effect of L_1 should be very good, but the pragmatic effect may not be so
> useful.
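
For my own notes, this is how I read the regularized objective (a rough
sketch only; lam and the function name are mine, not anything that exists in
Mahout):

    import numpy as np

    def l1_regularized_kmeans_cost(points, centroids, labels, lam):
        # ordinary k-means cost: within-cluster squared error ...
        sq_error = sum(np.sum((p - centroids[k]) ** 2)
                       for p, k in zip(points, labels))
        # ... plus an L_1 penalty on the centroid coefficients, which
        # pushes centroid entries toward zero, i.e. toward sparsity
        return sq_error + lam * np.sum(np.abs(centroids))

If I follow, the centroid update would then be a soft-thresholded mean rather
than the plain mean of the cluster members, which is where the sparsity would
come from. Correct me if I'm reading that wrong.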
