Sorry, I mixed up distance and similarity. With CosineDistanceMeasure, the
distance between most pairs is 1, not 0.
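
To make the effect concrete, here is a minimal sketch. It assumes the distance is computed as 1 minus cosine similarity, which is consistent with the "mostly 1" observation. Two sparse vectors that share no nonzero coordinate have a dot product of 0, so their cosine distance is exactly 1. The canopy function is a hypothetical, simplified single-machine version of canopy clustering, not Mahout's MapReduce implementation. It shows that when every pairwise distance is 1 and T1, T2 < 1, no point ever falls inside another point's canopy, so each point becomes its own canopy.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity for sparse vectors stored as {index: value} dicts."""
    # Only indices present in both vectors contribute to the dot product.
    dot = sum(v * b[i] for i, v in a.items() if i in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def canopy(points, t1, t2, dist):
    """Simplified in-memory canopy clustering with thresholds t1 > t2.

    Returns a list of (center, members) pairs.
    """
    assert t1 > t2
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = dist(center, p)
            if d < t1:          # loosely close: joins this canopy
                members.append(p)
            if d >= t2:         # not tightly close: may still seed a new canopy
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies

# Usage: four sparse vectors with no shared nonzero coordinates.
docs = [{0: 1.0}, {1: 2.0}, {2: 1.0, 3: 1.0}, {4: 3.0}]
print(cosine_distance(docs[0], docs[1]))  # 1.0
print(len(canopy(docs, t1=0.9, t2=0.5, dist=cosine_distance)))  # 4
```

With very sparse data, most pairs look like these, so the number of canopies grows roughly with the number of input vectors, which is what overwhelms a single reducer.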

2011/10/19 Ted Dunning <[email protected]>

> Distance between pairs is mostly zero? This indicates a real problem. If
> the pairs you mean are pairs of examples, it isn't so bad, but pairs of
> canopies should have non-zero distance.
>
> Or did you mean pairs of coordinates?
>
> On Oct 19, 2011, at 8:36, "Bae, Jae Hyeon" <[email protected]> wrote:
>
> > Hi
> >
> > I am trying to cluster very sparse data. Canopy clustering generates so
> > many canopies that the job fails with "GC overhead limit exceeded". I can
> > change the canopy clustering parameters, but because the distances between
> > most pairs are 0, changing them does not help much. Even if I increase the
> > -Xmx size, the large number of canopies still drives the single reducer of
> > canopy clustering to the GC overhead limit.
> >
> > Could you suggest a better approach for this situation? I could try K-means
> > clustering with a large K, and Locality Sensitive Hashing could be a good
> > candidate, but I am not sure the Likelike implementation is robust and
> > flexible enough to use.
> >
> > Thank you
> >
> > Best, Jae
>
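
For the LSH option, a minimal sketch of the generic random-hyperplane scheme for cosine similarity (often called SimHash) is shown below. This is not Likelike's implementation, and the function names are hypothetical. Each random hyperplane contributes one bit recording which side of the plane a vector lies on; two vectors agree on a given bit with probability 1 - θ/π, where θ is the angle between them, so vectors with high cosine similarity tend to land in the same bucket without computing all pairwise distances.

```python
import random

def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian normal vectors of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """One bit per hyperplane: set if the sparse vector is on the non-negative side."""
    bits = 0
    for k, plane in enumerate(planes):
        # Sparse dot product: only the nonzero entries of vec are visited.
        dot = sum(v * plane[i] for i, v in vec.items())
        if dot >= 0:
            bits |= 1 << k
    return bits

def bucket(points, planes):
    """Group point indices by their signature."""
    buckets = {}
    for idx, vec in enumerate(points):
        buckets.setdefault(signature(vec, planes), []).append(idx)
    return buckets

# Usage: b points in the same direction as a, so their cosine distance is 0.
planes = random_hyperplanes(dim=5, n_bits=8, seed=42)
a = {0: 1.0, 2: 1.0}
b = {0: 2.0, 2: 2.0}
print(signature(a, planes) == signature(b, planes))  # True
print(bucket([a, b, {4: 3.0}], planes))
```

A single table like this misses near neighbors that differ in even one bit, so practical setups usually combine several independent tables (or bands of bits) and treat a shared bucket in any of them as a candidate pair.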
