Has anyone run clustering (k-means) on EMR lately, per
https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+on+Elastic+MapReduce?
 

Here's what I ran, using the CLI:
./elastic-mapreduce -j j-31BXNQA7ATCCV \
  --jar s3://news-vecs/mahout-core-0.4-SNAPSHOT.job \
  --main-class org.apache.mahout.clustering.kmeans.KMeansDriver \
  --arg "--input" --arg "s3://news-vecs/part-out.vec" \
  --arg "--clusters" --arg s3://news-vecs/kmeans/clusters/ \
  --arg "--k" --arg 10 \
  --arg "--output" --arg s3://news-vecs/out/ \
  --arg "--distanceMeasure" --arg "org.apache.mahout.common.distance.CosineDistanceMeasure" \
  --arg "--convergenceDelta" --arg 0.001 \
  --arg "--overwrite" \
  --arg "--maxIter" --arg 50 \
  --arg "--clustering"
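Every driver option and every value has to go in its own --arg, while valueless switches such as --overwrite and --clustering take just one, so a long command like this is easy to mistype by hand. Here is a minimal sketch, in Python, of generating the same argument list from the option set above. The helper emr_jar_step is hypothetical, not part of Mahout or the EMR CLI, and it only builds and prints the command; it does not run it.

```python
import shlex


def emr_jar_step(job_flow, jar, main_class, options):
    """Build the elastic-mapreduce argv for a custom-jar step.

    Hypothetical helper for illustration only. `options` is a list of
    (flag, value) tuples; use value=None for switches like --overwrite
    that take no value.
    """
    argv = ["./elastic-mapreduce", "-j", job_flow,
            "--jar", jar, "--main-class", main_class]
    for flag, value in options:
        # Each driver flag is passed through as its own --arg ...
        argv += ["--arg", flag]
        # ... and so is its value, when it has one.
        if value is not None:
            argv += ["--arg", str(value)]
    return argv


# Same KMeansDriver options as the command above.
kmeans_options = [
    ("--input", "s3://news-vecs/part-out.vec"),
    ("--clusters", "s3://news-vecs/kmeans/clusters/"),
    ("--k", 10),
    ("--output", "s3://news-vecs/out/"),
    ("--distanceMeasure",
     "org.apache.mahout.common.distance.CosineDistanceMeasure"),
    ("--convergenceDelta", 0.001),
    ("--overwrite", None),
    ("--maxIter", 50),
    ("--clustering", None),
]

argv = emr_jar_step(
    "j-31BXNQA7ATCCV",
    "s3://news-vecs/mahout-core-0.4-SNAPSHOT.job",
    "org.apache.mahout.clustering.kmeans.KMeansDriver",
    kmeans_options,
)

# Print a shell-safe, copy-pasteable version of the command.
print(" ".join(shlex.quote(a) for a in argv))
```

Keeping the options as (flag, value) pairs makes it easy to check that nothing was dropped or mispaired before submitting the step.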

It seems to run, but it doesn't appear to do anything useful, and the output
directory (s3://news-vecs/out/) is definitely not created.

Anyone have insight?

Thanks,
Grant
