Hi,
I managed to run the k-means algorithm on a Cloudera VM, using the help
provided on the wiki and the forum. I got my output and am now trying
to use clusterdump to analyze the result.
(I believe I asked for 5 iterations, but it seems to have formed only 4
clusters, which I am also curious about. This is the command I ran:)
mahout kmeans -i hdfs://localhost/mahout_input/ip -o
hdfs://localhost/mahout_output/output_kmeans_07_29_1/ -dm
org.apache.mahout.common.distance.EuclideanDistanceMeasure -cd 1.0 -c
hdfs://localhost/mahout_input/centroids_07_29_1 -k 5 -x 5 -cl
After k-means completed on the Hadoop Cloudera VM, I ran this command:
mahout clusterdump --seqFileDir
hdfs://localhost/mahout_output/output_kmeans_07_29_1/clusters-5/part-r-00000
--pointsDir hdfs://localhost/mahout_output/output_kmeans_07_29_1/clusteredPoints
--output kmeans_07_29_1_cl5.tx
When I look into the text file, I see a structure like this:
CL-99871{n=10157 c=[186:12.229, 189:9.343, 212:2.716] r=[186:7.803, 189:8.054, 212:4.686]}
Weight: Point:
1.0: 1.161.199.19 = [186:22.000, 189:32.000]
1.0: 1.161.204.226 = [186:9.000, 189:11.000]
1.0: 1.170.149.79 = [186:18.000, 189:10.000]
1.0: 1.175.137.84 = [186:23.000, 189:8.000]
1.0: 1.176.27.109 = [186:7.000, 189:9.000, 212:3.000]
1.0: 1.177.175.26 = [186:12.000, 189:12.000]
1.0: 1.197.208.25 = [186:26.000]
1.0: 1.212.176.27 = [186:11.000, 189:1.000]
1.0: 1.212.176.28 = [186:11.000, 189:6.000]
1.0: 1.22.160.35 = [186:17.000, 189:6.000]
1.0: 1.230.123.81 = [186:18.000, 189:4.000]
I can figure out the first part, as explained in the wiki: the name is
CL-99871, the number of points is n=10157, the cluster center is the
c=[ ] vector, and the radius is the r=[ ] vector.
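
To show how I am reading that header line, here is a small Python sketch. It is my own code, not part of Mahout: parse_sparse and parse_cluster_header are hypothetical helper names, and the layout NAME{n=COUNT c=[index:value, ...] r=[index:value, ...]} is only what I infer from the output above.

```python
import re

# Assumed layout, inferred from the dump above (not from Mahout's source):
#   NAME{n=COUNT c=[index:value, ...] r=[index:value, ...]}
HEADER = re.compile(
    r"^(?P<name>[^{]+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}$"
)

def parse_sparse(text):
    """Turn '186:12.229, 189:9.343' into {186: 12.229, 189: 9.343}."""
    pairs = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        index, value = item.split(":")
        pairs[int(index)] = float(value)
    return pairs

def parse_cluster_header(line):
    """Split one cluster header line into name, point count, center and radius."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError("line does not look like a cluster header: %r" % line)
    return {
        "name": match.group("name"),
        "n": int(match.group("n")),
        "center": parse_sparse(match.group("c")),
        "radius": parse_sparse(match.group("r")),
    }

header = parse_cluster_header(
    "CL-99871{n=10157 c=[186:12.229, 189:9.343, 212:2.716] "
    "r=[186:7.803, 189:8.054, 212:4.686]}"
)
print(header["name"], header["n"], header["center"], header["radius"])
```

Note that even the center and the radius list only three index:value pairs here, not 288.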
I don't understand how the later part is structured. The IP addresses
are the names of the data points I wanted to cluster, but what do the
vector values mean? If they are the vectors of those points, I am not
sure why they are only 2-dimensional, since my original data points
had 288 dimensions for each IP address.
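
One guess I would like confirmed: each entry might be an index:value pair, with every dimension whose value is zero simply left out. If that is right, a point line could be expanded back to 288 dimensions as in the Python sketch below. This is only my assumption about the format; parse_point_line and to_dense are hypothetical helpers of mine, and I am also assuming the indices are zero-based.

```python
DIMENSIONS = 288  # dimensionality of my original vectors

def parse_point_line(line):
    """Split 'WEIGHT: NAME = [index:value, ...]' into its three parts."""
    weight_part, rest = line.split(": ", 1)
    name, vector_part = rest.split(" = ", 1)
    entries = {}
    for item in vector_part.strip().strip("[]").split(","):
        item = item.strip()
        if not item:
            continue
        index, value = item.split(":")
        entries[int(index)] = float(value)
    return float(weight_part), name.strip(), entries

def to_dense(entries, size=DIMENSIONS):
    """Fill in zeros for every index that the dump did not print (assumption)."""
    dense = [0.0] * size
    for index, value in entries.items():
        dense[index] = value  # assumes zero-based indices
    return dense

weight, name, entries = parse_point_line(
    "1.0: 1.176.27.109 = [186:7.000, 189:9.000, 212:3.000]"
)
dense = to_dense(entries)
print(name, weight, len(dense), entries)
```

Under that reading, 1.176.27.109 would be a 288-dimensional point with non-zero values only at indices 186, 189 and 212.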
Thanks for all the help,
Abhik