Distances between pairs are mostly zero?  That indicates a real problem. If the
pairs you mean are pairs of examples, it isn't so bad, but pairs of canopies
should have non-zero distance.

Or did you mean pairs of coordinates?
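
One quick way to tell these cases apart is to measure how many pairs of
examples are actually at distance zero. The sketch below is only an
illustration, not Mahout code: it assumes each example is a sparse vector
stored as a {index: value} dict, uses Euclidean distance, and the names
sparse_distance and zero_pair_fraction are made up for this example. It
compares all pairs, so it is only meant for a small sample of the data.

```python
from itertools import combinations
import math


def sparse_distance(a, b):
    """Euclidean distance between two sparse vectors stored as {index: value} dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def zero_pair_fraction(vectors, eps=1e-12):
    """Fraction of unordered pairs whose distance is numerically zero."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    zero = sum(1 for a, b in pairs if sparse_distance(a, b) <= eps)
    return zero / len(pairs)


# Hypothetical toy sample: three identical examples and one distinct one.
examples = [{0: 1.0}, {0: 1.0}, {0: 1.0}, {5: 2.0}]
fraction = zero_pair_fraction(examples)
print(fraction)
```

If the fraction is high on a sample of examples, the data probably contains
many duplicates, and collapsing them before clustering may be worth trying.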

Sent from my iPhone

On Oct 19, 2011, at 8:36, "Bae, Jae Hyeon" <[email protected]> wrote:

> Hi
> 
> I am trying to cluster very sparse data. Canopy clustering generates so
> many canopies that it hits the GC overhead limit. I can change the
> parameters of canopy clustering, but distances between most pairs are 0, so
> changing the parameters does not have much effect. Even if I increase the
> -Xmx size, the large number of canopies will drive the single reducer of
> canopy clustering to the GC overhead limit.
> 
> Could you suggest a better approach for this situation? I can try K-means
> clustering with a large K, and Locality Sensitive Hashing could be a good
> candidate, but I am not sure the Likelike implementation is robust and
> flexible enough to use.
> 
> Thank you
> 
> Best, Jae
