Hi,

I am running k-means clustering on a local Hadoop node with 16 cores
(mapred-site.xml: https://gist.github.com/2962458).

Running seq2sparse on the input sequence files (originally 64k text
documents of approx. 100 words each) uses all 16 cores when running
over Hadoop/HDFS and takes about 20 min.

Canopy is quick and gives me about 120 clusters.

Running kmeans takes ages, as only one map task is launched
(https://gist.github.com/2962436).

I am wondering what I might be doing wrong, since all cores are used in
seq2sparse but not in kmeans.

I tried the following settings in the bin/mahout script:
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=16"
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.tasks=16"

but that did not help.

Running without Hadoop (by setting MAHOUT_LOCAL) gives the same result.
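
One possible explanation, which I have not verified on this setup: the number of map tasks may follow the number of input splits rather than mapred.map.tasks. The sketch below estimates the split count for a single input file. It assumes the rule used by Hadoop's new-API FileInputFormat, split size = max(min size, min(max size, block size)), with the last split allowed to be up to 10% larger. The function name estimate_splits and the 40 MiB file size are hypothetical, chosen only for illustration; whether the kmeans job honours a maximum split size setting such as mapred.max.split.size is also an assumption.

```shell
#!/usr/bin/env bash
# Rough estimate of the number of input splits (and so map tasks) for one
# non-empty input file. Hypothetical helper, not part of Mahout or Hadoop.
# Assumed rule: split = max(min_split, min(max_split, block_size)).
estimate_splits() {
  local file_bytes=$1 block_bytes=$2 max_split_bytes=$3 min_split_bytes=${4:-1}
  local split=$block_bytes
  (( max_split_bytes < split )) && split=$max_split_bytes
  (( min_split_bytes > split )) && split=$min_split_bytes

  local remaining=$file_bytes count=0
  # Integer form of "remaining / split > 1.1" (the assumed 10% slack).
  while (( remaining * 10 > split * 11 )); do
    remaining=$(( remaining - split ))
    count=$(( count + 1 ))
  done
  # Whatever is left over becomes the final split.
  (( remaining > 0 )) && count=$(( count + 1 ))
  echo "$count"
}

# Hypothetical 40 MiB vector file on a 64 MiB block size:
estimate_splits 41943040 67108864 67108864   # max split = block size -> 1
estimate_splits 41943040 67108864 2621440    # max split = 2.5 MiB    -> 16
```

If this is what happens here, a vector file smaller than one block would yield a single split, and so a single map task, regardless of mapred.map.tasks.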

Thanks for helping
