Hi,

Sorry, I sent this to the wrong mailing list. Please ignore it.
Thank you.

> Hi,
>
> I'm trying to do some text analysis using Mahout k-means (clustering),
> processing the data on Hadoop.
> --numClusters = 160
> --maxIter (-x) = 200
>
> My data is small, around 500 MB.
> I have 4 servers, each with 4 CPUs, and each TaskTracker is set to
> run a maximum of 4 tasks.
> When I run the Mahout job, I can see that there are at most 3 map
> tasks, so I guess I do not need to do any tuning on this at the
> moment.
>
> One iteration takes around 1.5 to 2 minutes to finish.
> I am not sure whether this is normal or considered slow; can anyone
> give me some advice on this?
>
> With x = 200, the whole analysis takes me around 6 hours
> (200 × 1.5 to 2 min ≈ 5 to 6.7 hours).
> Is this unavoidable?
> Does a bigger "x" mean the k-means job takes longer to finish?
>
> Is there any way to speed up Mahout k-means?
>
> Thank you.
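For reference, the iteration-count question above can be sketched with plain Lloyd's k-means in Python. This is not Mahout's code: the function `kmeans` and its `delta` parameter are hypothetical illustrations, with `delta` similar in spirit to a convergence threshold such as Mahout's convergence-delta option. Each loop iteration is one full pass over the data (in Mahout, one MapReduce job), so total time is roughly iterations × per-pass time, and `max_iter` is only an upper bound when the centroids stop moving earlier.

```python
import math
import random

def kmeans(points, k, max_iter, delta, seed=0):
    """Plain Lloyd's k-means on tuples (hypothetical sketch, not Mahout).

    Returns (centroids, iterations_run). Stops early once no centroid
    moves more than `delta` in a pass, otherwise after `max_iter` passes.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    for it in range(1, max_iter + 1):
        # Assignment step: one full pass over the data.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= delta:
            return centroids, it  # converged before reaching max_iter
    return centroids, max_iter
```

At the 1.5 to 2 minutes per iteration quoted above, 200 forced iterations cost 300 to 400 minutes; a run that converges after, say, 10 iterations would cost only 15 to 20 minutes. Whether real data converges that quickly depends on the data and the threshold chosen.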
