Hi, I am clustering 5 million vectors (200 dimensions each) on an 8-node cluster with 2 GB of memory per node, using CanopyDriver. The replication factor is 3.
The reduce phase of buildCluster is taking too long to finish. How can I improve the performance? Is it related to memory? If so, what configuration do you suggest? I cannot reduce the dimensionality of the vectors.
Thanks and Regards, Paritosh Ranjan
