What about factorizing the matrix with SVD to get dense vectors?
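
A minimal sketch of that idea, in plain Python rather than Mahout, on made-up toy data. The matrix, the helper names (`cosine_distance`, `top_right_singular_vectors`) and the choice of k = 2 are all assumptions for illustration, and the subspace iteration below is only a stand-in for a real SVD solver. It shows two sparse rows that share no coordinates (cosine distance 1) ending up close together once projected onto the top singular directions.

```python
import math
import random


def cosine_distance(u, v):
    """1 - cosine similarity; treated as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def top_right_singular_vectors(rows, k, iters=100, seed=0):
    """Approximate the top-k right singular vectors of the matrix `rows`
    by orthogonal (subspace) iteration on A^T A with Gram-Schmidt."""
    n = len(rows[0])
    rng = random.Random(seed)
    basis = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        new_basis = []
        for v in basis:
            # w = A^T (A v)
            av = [sum(r[j] * v[j] for j in range(n)) for r in rows]
            w = [sum(rows[i][j] * av[i] for i in range(len(rows)))
                 for j in range(n)]
            # Orthogonalize against the vectors already accepted this round.
            for b in new_basis:
                proj = sum(x * y for x, y in zip(w, b))
                w = [x - proj * y for x, y in zip(w, b)]
            norm = math.sqrt(sum(x * x for x in w))
            new_basis.append([x / norm for x in w])
        basis = new_basis
    return basis


# Toy data (hypothetical): two groups of items over 8 features.
# Rows 0 and 1 have disjoint features but are linked through row 2;
# rows 3 and 4 are linked through row 5.
rows = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]

basis = top_right_singular_vectors(rows, k=2)

# Dense k-dimensional representation: each row projected onto the basis.
dense = [[sum(x * y for x, y in zip(r, b)) for b in basis] for r in rows]

print("raw   d(0,1):", cosine_distance(rows[0], rows[1]))
print("dense d(0,1):", cosine_distance(dense[0], dense[1]))
print("dense d(0,3):", cosine_distance(dense[0], dense[3]))
```

On this toy matrix the raw distance between rows 0 and 1 is exactly 1, the dense distance between them comes out near 0, and rows from different groups (0 and 3) stay near 1. Real data will not separate this cleanly, and k would need tuning.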

2011/10/19 Bae, Jae Hyeon <[email protected]>

> I am sorry, I confused distance and similarity. The distance between most
> pairs is 1 with CosineDistanceMeasure.
>
> 2011/10/19 Ted Dunning <[email protected]>
>
> > Distance between pairs is mostly zero?  This indicates a real problem. If
> > the pairs that you mean are pairs of examples, it isn't so bad, but pairs of
> > canopies should have nonzero distance.
> >
> > Or did you mean pairs of coordinates?
> >
> > Sent from my iPhone
> >
> > On Oct 19, 2011, at 8:36, "Bae, Jae Hyeon" <[email protected]> wrote:
> >
> > > Hi
> > >
> > > I am trying to cluster very sparse data. Canopy clustering generates so
> > > many canopies that it hits the GC overhead limit. I can change the
> > > parameters of canopy clustering, but distances between most pairs are 0,
> > > so changing the parameters does not have much effect. Even if I increase
> > > the -Xmx size, the large number of canopies will drive the single reducer
> > > of canopy clustering to the GC overhead limit.
> > >
> > > Could you suggest a better approach for this situation? I can try K-means
> > > clustering with a large K, and Locality Sensitive Hashing could be a good
> > > candidate, but I am not sure the Likelike implementation is robust and
> > > flexible enough to use.
> > >
> > > Thank you
> > >
> > > Best, Jae
> >
>
