Hi Christoph,
Thanks for your reply!
The cluster assignment is pretty much what I want to do:
I have some points that I want to cluster. That is what I use
KMeansClusterer.clusterPoints(...) for. Unfortunately, this method does not
give me an item-to-cluster map. The only result I get is a
List<List<Cluster>> object (this should be the clusters of every iteration),
and each Cluster object only carries n, c (center) and
r (radius), as stated below.
That is why I want to work out myself which points belong to which cluster.
My problem now is that not all points lie within the calculated range of
the clusters.
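
One way to build such an item-to-cluster map by hand is to assign every point to the center it is closest to. The following is only a minimal sketch in plain Java: it does not use any Mahout classes, the class and method names (NearestCenterAssignment, distance, nearest) are made up for illustration, and the centers are simply the c values of the two converged clusters from this thread.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: nearest-center assignment
// using plain Euclidean distance on double[] vectors.
class NearestCenterAssignment {

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the center that is closest to the given point. */
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        for (int k = 1; k < centers.length; k++) {
            if (distance(point, centers[k]) < distance(point, centers[best])) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Centers of the converged clusters VL-0 and VL-1 from this thread.
        double[][] centers = { {11.667, 604.333}, {12.250, 3963.250} };
        // The ten test points from the original message.
        double[][] points = {
            {13, 4455}, {13, 5101}, {13, 333}, {13, 3412}, {13, 823},
            {13, 238}, {13, 951}, {9, 311}, {9, 970}, {10, 2885}
        };
        for (double[] p : points) {
            System.out.println(Arrays.toString(p) + " -> cluster " + nearest(p, centers));
        }
    }
}
```

This only looks at the distance to c and ignores r entirely, so a point gets a cluster even when it lies outside c +/- r.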
These are the converged clusters from the List<List<Cluster>> object:
>> VL-0{n=6 c=[11.667, 604.333] r=[1.886, 315.059]}
>> VL-1{n=4 c=[12.250, 3963.250] r=[1.299, 866.428]}
The point
>> [ 9.000, 970.000]
should fit into one of the clusters, but it does not. I wonder how this can
happen, or whether I have misunderstood something completely.
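
A quick way to probe what r actually measures is to recompute the center and the per-dimension spread of the points that appear to form cluster 0. The sketch below is plain Java with made-up names (ClusterSpread, mean, populationStdDev). It rests on two assumptions that are not confirmed by the Mahout sources: that cluster 0 holds the six points with the smaller second coordinate, and that r might be a per-dimension population standard deviation rather than a maximum extent.

```java
import java.util.Arrays;

// Hypothetical check, not part of Mahout: per-dimension mean and
// population standard deviation of a set of points.
class ClusterSpread {

    /** Per-dimension arithmetic mean. */
    static double[] mean(double[][] pts) {
        double[] m = new double[pts[0].length];
        for (double[] p : pts) {
            for (int i = 0; i < m.length; i++) {
                m[i] += p[i] / pts.length;
            }
        }
        return m;
    }

    /** Per-dimension population standard deviation (divides by n, not n - 1). */
    static double[] populationStdDev(double[][] pts) {
        double[] m = mean(pts);
        double[] s = new double[m.length];
        for (double[] p : pts) {
            for (int i = 0; i < s.length; i++) {
                double d = p[i] - m[i];
                s[i] += d * d / pts.length;
            }
        }
        for (int i = 0; i < s.length; i++) {
            s[i] = Math.sqrt(s[i]);
        }
        return s;
    }

    public static void main(String[] args) {
        // Assumed members of VL-0: the six points with the smaller second coordinate.
        double[][] cluster0 = {
            {13, 333}, {13, 823}, {13, 238}, {13, 951}, {9, 311}, {9, 970}
        };
        System.out.println("mean   = " + Arrays.toString(mean(cluster0)));
        System.out.println("stddev = " + Arrays.toString(populationStdDev(cluster0)));
    }
}
```

For these six points the computed mean and standard deviation agree with the reported c and r of VL-0 to three decimals. If r is a standard deviation, a point such as [ 9.000, 970.000] lying just outside c +/- r would be expected rather than a sign of a wrong assignment.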
Could you please tell me what you mean by "use the clustering flag"?
This is the method in detail from the KMeansClusterer class. I don't see how to
set any flags.
public static List<List<Cluster>> clusterPoints(
    Iterable<org.apache.mahout.math.Vector> points,
    List<Cluster> clusters,
    DistanceMeasure measure,
    int maxIter,
    double distanceThreshold) {
  // compiled code
  throw new RuntimeException("Compiled Code");
}
Thanks for your help,
Immo
On Jul 26, 2011, at 3:57 PM, Christoph Brücke wrote:
> Hi Immo,
>
> Did you run an extra cluster assignment at the end? I ask because KMeans works
> in two phases: in the first, all points are assigned to a cluster, and in the
> second, the cluster centroids are recalculated based on that
> assignment. So my idea is that you could use the clustering flag in order to
> get a final cluster assignment.
> I didn't do the math though, this is just an educated guess.
>
> Hope this helps,
> Christoph
>
>
> On 26.07.2011 at 15:05, Immo Micus wrote:
>
>> Hello,
>>
>> This is my first email to the mahout-user list.
>> I am trying to do some clustering with Mahout and I have a question
>> concerning the cluster center and cluster radius.
>>
>> For testing, I clustered 10 points using the KMeansClusterer:
>>
>> points:
>> [13.000, 4455.000]
>> [13.000, 5101.000]
>> [13.000, 333.000]
>> [13.000, 3412.000]
>> [13.000, 823.000]
>> [13.000, 238.000]
>> [13.000, 951.000]
>> [ 9.000, 311.000]
>> [ 9.000, 970.000]
>> [10.000, 2885.000]
>>
>> This is the method I am using:
>>
>> clusters = KMeansClusterer.clusterPoints(points, initial_clusters, measure,
>> 10, 0.001);
>>
>> initial_clusters are 2 of the points above, chosen at random; measure is
>> EuclideanDistanceMeasure.
>>
>>
>> And this is the result of the converged clusters VL-0 and VL-1:
>>
>> VL-0{n=6 c=[11.667, 604.333] r=[1.886, 315.059]}
>> VL-1{n=4 c=[12.250, 3963.250] r=[1.299, 866.428]}
>>
>> If I understand this output correctly, n is the number of points
>> assigned to the cluster, c is the cluster center and r is the radius of the
>> cluster.
>> So every point belongs to either cluster 0 or cluster 1. You can
>> even guess which points belong to which cluster, but I am confused by the
>> calculated cluster center and cluster radius:
>> For example, [ 9.000, 970.000] should belong to cluster 0, but 9.000 <
>> 9.781 [11.667 - 1.886] and 970.000 > 919.392 [604.333 + 315.059]. The point
>> is not within range of cluster 0 and it obviously does not belong to cluster 1,
>> yet all 10 points are assigned to clusters. Can someone please tell me where
>> the mistake is?
>>
>>
>> greetings, Immo
>
> Christoph Brücke