Hi,

I am running k-means clustering on a local Hadoop node with 16 cores (mapred-site.xml: https://gist.github.com/2962458).

Running seq2sparse on the input sequence files (originally 64k text documents of approx. 100 words each) uses all 16 cores when run over Hadoop/HDFS and takes about 20 minutes. Canopy is quick and gives me about 120 clusters. Running kmeans, however, takes ages because only one map task is launched (https://gist.github.com/2962436).

I am wondering what I might be doing wrong, since all cores are used by seq2sparse but not by kmeans. I tried adding these settings to the bin/mahout script:

    MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=16"
    MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.tasks=16"

but that did not help. Not using Hadoop at all, by setting MAHOUT_LOCAL, gives the same result.

Thanks for helping
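
P.S. One thing that may be worth checking is the size of the vector file that kmeans reads. As far as I understand, Hadoop launches one map task per input split, and with the newer FileInputFormat the split count comes from the file size, the block size and the min/max split size rather than from mapred.map.tasks. The sketch below is only my approximation of that rule, not Hadoop's actual code; the function name count_splits and the 40 MB file size are made up for illustration, and I am assuming the kmeans job uses that newer input format.

```python
# Rough model of how a FileInputFormat-style job turns one file into splits.
# Assumption: split size = max(min_size, min(max_size, block_size)), and the
# last chunk may be up to 10% larger than a split before it is cut again.
SPLIT_SLOP = 1.1


def count_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Return the approximate number of splits (map tasks) for one file.

    All sizes are in bytes. file_size is assumed to be greater than zero.
    """
    split_size = max(min_size, min(max_size, block_size))
    remaining = file_size
    splits = 0
    # Cut full-sized splits while clearly more than one split's worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


MB = 1024 * 1024
vector_file = 40 * MB   # hypothetical size of the tf-idf vector file
block = 64 * MB         # assumed HDFS block size

# A file smaller than one block yields a single split, hence one map task.
print(count_splits(vector_file, block))

# Capping the maximum split size at 1/16 of the file yields 16 splits.
print(count_splits(vector_file, block, max_size=vector_file // 16))
```

If this model is right, a vector file smaller than one HDFS block would explain the single map task, and lowering the maximum split size (I believe the property is mapred.max.split.size on this Hadoop version, but that is an assumption on my part) might be the knob to try instead of mapred.map.tasks.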
