Hi,

I am clustering 5 million vectors (200 dimensions each) on an 8-node cluster with 2 GB of memory per node, using CanopyDriver. The replication factor is 3.

The reduce phase of buildCluster is taking too long to finish.

How can I improve the performance?

Is it related to memory? If so, what configuration do you suggest? I cannot reduce the dimensionality of the vectors.

Thanks and Regards,
Paritosh Ranjan
