If you get near a good solution (which is what the streaming k-means tries
to make very likely), then the local convexity should not be spoiled by the
L_1 regularization.  Convexity is what underlies the guaranteed convergence
of SGD, and losing it is the reason that step-wise regression so often
fails: stepwise is equivalent to L_0 regularization, which breaks
convexity.  L_1 does nearly the same amount of good while leaving convexity
intact.
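
For concreteness, here is a minimal sketch of the penalized objective the
thread is discussing: the ordinary k-means squared-error cost plus an L_1
penalty on the centroid coefficients.  The regularization weight lam is an
assumption for illustration; the thread does not specify one.

    import numpy as np

    def l1_regularized_kmeans_cost(X, centroids, assignments, lam):
        """Squared-error k-means cost plus an L_1 penalty on the centroids.

        X           : (n, d) data matrix
        centroids   : (k, d) centroid matrix
        assignments : length-n array of cluster indices for each row of X
        lam         : assumed regularization weight (not from the thread)
        """
        # Ordinary k-means objective: squared Euclidean distance from each
        # point to its assigned centroid.  The distance measure stays
        # Euclidean; only the penalty term is L_1.
        squared_error = np.sum((X - centroids[assignments]) ** 2)
        # L_1 regularization: sum of the absolute values of the centroid
        # coefficients.
        l1_penalty = lam * np.sum(np.abs(centroids))
        return squared_error + l1_penalty

Near a good solution this objective is still locally convex in the
centroids, which is the point above; an L_0 penalty (a count of non-zero
coefficients) would not preserve that.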

On Mon, Mar 4, 2013 at 2:44 PM, Sean Owen <[email protected]> wrote:

> So this is constraining the centroids themselves to be 0 in many
> dimensions... interesting. I assume this comes into play when you move
> centroids -- you're not minimizing squared distance anymore but that plus
> an additional penalty. I'd be interested to hear your results, Chris, if
> you try it -- versus simply using the L_1 distance alone. I agree this
> does seem to make it all the more important to run it many times and pick
> the best clustering due to local minima.
>
>
> On Mon, Mar 4, 2013 at 7:18 PM, Ted Dunning <[email protected]> wrote:
>
> > Always good to test, but I think that Euclidean distance with L_1
> > regularization is probably more interesting.
> >
> > > On Mon, Mar 4, 2013 at 12:00 PM, Chris Harrington <[email protected]> wrote:
> >
> > > So if I'm understanding you correctly, simply put, I should
> > > investigate using L_1 as my distance measure when measuring vector
> > > distances within a cluster?
> > >
> > >
> > > On 1 Mar 2013, at 16:24, Ted Dunning wrote:
> > >
> > > > What Sean says is just right, except that I was (telegraphically)
> > > > getting at a slightly different point with L_1:
> > > >
> > > > On Wed, Feb 27, 2013 at 7:23 AM, Chris Harrington <[email protected]> wrote:
> > > >
> > > >> Is L_1 regularization the same as Manhattan distance?
> > > >>
> > > >
> > > > The L_1 metric is Manhattan distance, yes.
> > > >
> > > > L_1 regularization of k-means refers to something a little bit
> > > > different.
> > > >
> > > > The idea with regularization is that you add some sort of penalty
> > > > to the function you are optimizing.  This penalty pushes the
> > > > optimization toward a solution that you would prefer on some other
> > > > grounds than just the optimization alone.  Regularization often
> > > > helps in solving underdetermined systems where there are an
> > > > infinite number of solutions and we have to pick a preferred
> > > > solution.
> > > >
> > > > There isn't anything that says that you have to be optimizing the
> > > > same kind of function as the regularization.  Thus k-means, which
> > > > is inherently optimizing squared error, can quite reasonably be
> > > > regularized with L_1 (the sum of the absolute values of the
> > > > centroids' coefficients).
> > > >
> > > > I haven't tried this at all seriously yet.  L_1 regularization
> > > > tends to help drive toward sparsity, but it is normally used in
> > > > convex problems where we can guarantee a findable global optimum.
> > > > The k-means problem, however, is not convex, so adding the
> > > > regularization may screw things up in practice.  For text-like
> > > > data, I have a strong intuition that the idealized effect of L_1
> > > > should be very good, but the pragmatic effect may not be so
> > > > useful.
> > >
> > >
> >
>
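
A minimal sketch of the sparsity effect described in the quoted message
above, assuming the penalty is applied per cluster with an assumed weight
lam: under squared error plus an L_1 term, the optimal centroid for a
cluster is the cluster mean soft-thresholded coordinate-wise, so small
coefficients are driven to exactly zero.  This is the standard closed form
for L_1-penalized least squares, not something taken from the thread.

    import numpy as np

    def soft_threshold(v, t):
        """Elementwise soft-thresholding: shrink toward zero by t, clip at 0."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def l1_regularized_centroid(points, lam):
        """Centroid of one cluster under sum_i ||x_i - c||^2 + lam * ||c||_1.

        Minimizing coordinate-wise gives the cluster mean soft-thresholded
        by lam / (2 * m), where m is the number of points in the cluster.
        """
        m = len(points)
        mean = np.mean(points, axis=0)
        return soft_threshold(mean, lam / (2.0 * m))

    # Example: a "text-like" mean with many small coefficients; the small
    # coordinates are zeroed out, the large ones only shrink slightly.
    pts = np.array([[0.9, 0.02, 0.00, 1.1],
                    [1.1, 0.00, 0.03, 0.9]])
    print(l1_regularized_centroid(pts, lam=0.2))  # -> [0.95 0.   0.   0.95]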
