Hello,
This is my first email to the mahout-user list.
I am trying to do some clustering with Mahout and I have a question concerning
the cluster center and cluster radius.
For testing, I clustered 10 points using the KMeansClusterer:
points:
[13.000, 4455.000]
[13.000, 5101.000]
[13.000, 333.000]
[13.000, 3412.000]
[13.000, 823.000]
[13.000, 238.000]
[13.000, 951.000]
[ 9.000, 311.000]
[ 9.000, 970.000]
[10.000, 2885.000]
This is the method I am using:
clusters = KMeansClusterer.clusterPoints(points, initial_clusters, measure, 10,
0.001);
initial_clusters holds 2 points picked at random from the points above, and
measure is an EuclideanDistanceMeasure.
And this is the result of the converged clusters VL-0 and VL-1:
VL-0{n=6 c=[11.667, 604.333] r=[1.886, 315.059]}
VL-1{n=4 c=[12.250, 3963.250] r=[1.299, 866.428]}
If I understand this output correctly, n is the number of points assigned to
the cluster, c is the cluster center, and r is the radius of the cluster.
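The reading of n and c can be cross-checked by hand. Below is a minimal sketch
in plain Java (no Mahout classes), assuming VL-0 holds the six points whose
second coordinate is below 1000; that membership is a guess from the data, not
something the output states. The class and method names are made up for
illustration. It also computes the per-dimension population standard deviation
for comparison with r; whether Mahout derives r this way is an assumption here,
not something confirmed from the library.

```java
import java.util.Locale;

public class ClusterStats {
    // Assumed members of VL-0 (a guess): the six points with second coordinate < 1000.
    static final double[][] ASSUMED_VL0 = {
        {13, 333}, {13, 823}, {13, 238}, {13, 951}, {9, 311}, {9, 970}
    };

    // Per-dimension arithmetic mean of the points.
    static double[] mean(double[][] pts) {
        double[] m = new double[pts[0].length];
        for (double[] p : pts) {
            for (int d = 0; d < m.length; d++) {
                m[d] += p[d];
            }
        }
        for (int d = 0; d < m.length; d++) {
            m[d] /= pts.length;
        }
        return m;
    }

    // Per-dimension population standard deviation (divides by n, not n - 1).
    static double[] populationStd(double[][] pts) {
        double[] m = mean(pts);
        double[] s = new double[m.length];
        for (double[] p : pts) {
            for (int d = 0; d < s.length; d++) {
                s[d] += (p[d] - m[d]) * (p[d] - m[d]);
            }
        }
        for (int d = 0; d < s.length; d++) {
            s[d] = Math.sqrt(s[d] / pts.length);
        }
        return s;
    }

    public static void main(String[] args) {
        double[] c = mean(ASSUMED_VL0);
        double[] r = populationStd(ASSUMED_VL0);
        System.out.printf(Locale.ROOT, "n=%d c=[%.3f, %.3f] std=[%.3f, %.3f]%n",
                ASSUMED_VL0.length, c[0], c[1], r[0], r[1]);
    }
}
```

If the printed values agree with the VL-0 line, that would suggest r is a
per-dimension spread around the center rather than a bound that every member
has to fall inside.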
So every point belongs to either cluster 0 or cluster 1. You can even guess
which points belong to which cluster, but I am confused by the calculated
cluster center and cluster radius:
For example, [ 9.000, 970.000] should belong to cluster 0, but 9.000 < 9.781
(11.667 - 1.886) and 970.000 > 919.392 (604.333 + 315.059). So the point is
outside the range center +/- radius of cluster 0 in both dimensions. It
obviously does not belong to cluster 1 either, yet all 10 points are assigned
to a cluster (n=6 plus n=4). Can someone please tell me where my mistake is?
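The range check above can be written out for every candidate point. Below is a
minimal sketch in plain Java (not Mahout API), using c and r copied from the
VL-0 line and assuming the six points with a second coordinate below 1000 are
its members. The class and method names are made up for illustration. It tests
each point against the box c +/- r per dimension, which is the interpretation
of r used in this mail and may not be what Mahout intends.

```java
import java.util.Arrays;

public class RangeCheck {
    // Center and radius copied from the VL-0 output line.
    static final double[] CENTER = {11.667, 604.333};
    static final double[] RADIUS = {1.886, 315.059};

    // Assumed members of VL-0 (a guess from the data, not taken from Mahout output).
    static final double[][] CANDIDATES = {
        {13, 333}, {13, 823}, {13, 238}, {13, 951}, {9, 311}, {9, 970}
    };

    // True if the point lies within center +/- radius in every dimension.
    static boolean insideBox(double[] p, double[] c, double[] r) {
        for (int d = 0; d < p.length; d++) {
            if (Math.abs(p[d] - c[d]) > r[d]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (double[] p : CANDIDATES) {
            System.out.println(Arrays.toString(p) + " inside c +/- r: "
                    + insideBox(p, CENTER, RADIUS));
        }
    }
}
```

With these numbers, [ 9.000, 970.000] is not the only point that fails the
check, so the box c +/- r cannot be the rule that decides cluster membership.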
Greetings, Immo