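Add the -cl argument to your kmeans command. Clustering the input points is not the default behavior, so without -cl you only get the clusters-N directories and no point-to-cluster mapping. The wiki mentions this, but so many people stumble over it that I'm considering changing the default.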
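As a rough sketch, here is the command from the message below with -cl appended and nothing else changed. The KMEANS_CMD variable and the PATH guard are my own additions for illustration, so the snippet only prints the command on a machine without Mahout. I'm assuming the same paths and environment as in the original message.

```shell
#!/bin/sh
# Original invocation from the message, with -cl appended so that the
# input points are assigned to clusters after the final iteration.
# KMEANS_CMD is a hypothetical helper variable, not part of Mahout.
KMEANS_CMD="mahout kmeans \
  -i hdfs://localhost/mahout_input/ip \
  -o hdfs://localhost/mahout_output/output_kmeans_07_29/ \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  -cd 1.0 \
  -c hdfs://localhost/mahout_input/centroids_07_29 \
  -k 5 -x 5 \
  -cl"

# Show the command that would be run.
echo "$KMEANS_CMD"

# Run it only when mahout is actually on the PATH (assumption: same
# Cloudera VM setup as in the original message).
if command -v mahout >/dev/null 2>&1; then
  $KMEANS_CMD
  # List the output folder again; a clustered-points directory should
  # now appear next to the clusters-N directories.
  hadoop fs -ls /mahout_output/output_kmeans_07_29
fi
```

After the run, the listing should show one more directory alongside clusters-1 through clusters-5, holding the mapping between input points and cluster IDs.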

-----Original Message-----
From: Abhik Banerjee [mailto:[email protected]]
Sent: Friday, July 29, 2011 11:33 AM
To: [email protected]
Subject: Doubt regarding the kmeans clustering results on mahout

Hi,

I am new to Mahout and I tried to run the kmeans clustering using Mahout on a Cloudera VM (with Hadoop installed on it). I ran it using the command:

root@cloudera-vm:/map_reduce_samples# mahout kmeans -i hdfs://localhost/mahout_input/ip -o hdfs://localhost/mahout_output/output_kmeans_07_29/ -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -cd 1.0 -c hdfs://localhost/mahout_input/centroids_07_29 -k 5 -x 5

It gives me the output folder in the directory shown below. I can see the individual cluster directories using clusterdump, but I am unable to see the folder named ClusteredPoints, which gives a mapping between the original points and the cluster IDs. Am I missing something?

This is how the output folder looks:

root@cloudera-vm:/map_reduce_samples# hadoop fs -ls /mahout_output/output_kmeans_07_29
Found 5 items
drwxr-xr-x - root supergroup 0 2011-07-29 11:23 /mahout_output/output_kmeans_07_29/clusters-1
drwxr-xr-x - root supergroup 0 2011-07-29 11:23 /mahout_output/output_kmeans_07_29/clusters-2
drwxr-xr-x - root supergroup 0 2011-07-29 11:23 /mahout_output/output_kmeans_07_29/clusters-3
drwxr-xr-x - root supergroup 0 2011-07-29 11:23 /mahout_output/output_kmeans_07_29/clusters-4
drwxr-xr-x - root supergroup 0 2011-07-29 11:23 /mahout_output/output_kmeans_07_29/clusters-5

PS: When I ran the Java version of the cluster apples example from the book, it created 3 folders for the clusters and the ClusteredPoints folder containing the mappings.

Any help shall be greatly appreciated.

Thanks and Regards,
Abhik Banerjee
