Why not change the clusterID from int to long? I have a dataset of about 30 billion rows, and when I used createCanopyFromVectors in mean shift, the cluster ID (an int) was not big enough.

Second, in the MeanShiftCanopyCreatorMapper class, the setup() method computes nextCanopyId = ((1 << 31) / 50000) * (Integer.parseInt(parts[4]) % 50000); which leaves each map task only about 43,000 IDs. That is not big enough either: Hadoop's default block size is 64 MB, and a single block can easily contain more than 50,000 rows.
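To make the ID-space limit concrete, here is a minimal sketch (my own illustration, not Mahout's actual code) comparing the per-mapper ID range under the quoted int formula with a long-based variant. The class name, the 1 KB-per-row figure, and everything except the 50,000 divisor from the quoted line are assumptions.

// Sketch only (assumed names, not Mahout code): how many canopy IDs each mapper
// gets under the quoted int formula versus a long-based variant.
public class CanopyIdSpaceSketch {

  private static final int MAPPER_SLOTS = 50000; // divisor from the quoted formula

  public static void main(String[] args) {
    // In the quoted expression, 1 << 31 overflows to Integer.MIN_VALUE in Java,
    // so the per-mapper stride is actually negative; its magnitude is what matters here.
    int strideAsWritten = (1 << 31) / MAPPER_SLOTS;      // -42949
    int usableIntIds = Integer.MAX_VALUE / MAPPER_SLOTS; // 42949 IDs per mapper

    // With a long ID, the same partitioning scheme gives each mapper a huge range.
    long usableLongIds = Long.MAX_VALUE / MAPPER_SLOTS;  // ~1.8e14 IDs per mapper

    System.out.println("stride as written (int, overflowed): " + strideAsWritten);
    System.out.println("usable int IDs per mapper:           " + usableIntIds);
    System.out.println("usable long IDs per mapper:          " + usableLongIds);

    // Assuming rows of roughly 1 KB, a 64 MB block holds ~65,536 rows, which
    // already exceeds the ~43,000 int IDs available to a single mapper.
    System.out.println("rows in a 64 MB block at 1 KB/row:   " + (64L * 1024 * 1024 / 1024));
  }
}

Either way, the usable range per mapper is on the order of 43,000 IDs, so any split with more canopies than that will spill into another mapper's range; a long-based ID would remove that ceiling.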
