Hi Christoph,

Thanks for your reply!

The cluster assignment is pretty much what I want to do:

I have some points that I want to cluster. That's what I use
KMeansClusterer.clusterPoints(...) for. Unfortunately, this method does not
provide an item-to-cluster map. The only result I get is a
List<List<Cluster>> object (this should be the clusters of every iteration).
And the Cluster object only carries the information n, c (center) and
r (radius), as shown below.

That is why I want to work out on my own which points belong to which cluster.
Now I have the problem that not all points are within the calculated range of
the clusters.
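
A minimal sketch of such a manual assignment, in plain Java without any Mahout
classes (the class, field and method names are made up for illustration). It
assigns each point to the center with the smallest Euclidean distance, using
the two converged centers and the ten test points from this thread. It assumes
that nearest-center assignment is what a final k-means assignment step would do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NearestCenterAssignment {

    // Converged centers of VL-0 and VL-1 as reported in this thread.
    static final double[][] CENTERS = { {11.667, 604.333}, {12.250, 3963.250} };

    // The ten test points from the original message.
    static final double[][] POINTS = {
        {13.0, 4455.0}, {13.0, 5101.0}, {13.0, 333.0}, {13.0, 3412.0},
        {13.0, 823.0}, {13.0, 238.0}, {13.0, 951.0}, {9.0, 311.0},
        {9.0, 970.0}, {10.0, 2885.0}
    };

    // Euclidean distance between two vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Index of the center closest to the given point.
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        for (int k = 1; k < centers.length; k++) {
            if (euclidean(point, centers[k]) < euclidean(point, centers[best])) {
                best = k;
            }
        }
        return best;
    }

    // Map from cluster index to the points assigned to that cluster.
    static Map<Integer, List<double[]>> assign(double[][] points, double[][] centers) {
        Map<Integer, List<double[]>> result = new LinkedHashMap<>();
        for (double[] p : points) {
            result.computeIfAbsent(nearest(p, centers), k -> new ArrayList<>()).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<double[]>> clusters = assign(POINTS, CENTERS);
        for (Map.Entry<Integer, List<double[]>> e : clusters.entrySet()) {
            System.out.println("cluster " + e.getKey() + ": " + e.getValue().size() + " points");
        }
    }
}
```

With these inputs the split is six points for center 0 and four for center 1,
which agrees with the reported n values; the point [9.000, 970.000] lands in
cluster 0.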

These are the converged clusters from the List<List<Cluster>> object:
>> VL-0{n=6 c=[11.667, 604.333] r=[1.886, 315.059]}
>> VL-1{n=4 c=[12.250, 3963.250] r=[1.299, 866.428]}

The point
>> [  9.000,   970.000] 
should fit into one of the clusters, but it does not. I wonder how this can
happen, or whether I have misunderstood something completely.
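
One way to probe what r actually measures is to recompute simple statistics
for the six points nearest to the reported center of VL-0. The sketch below
(plain Java, hypothetical class and method names, no Mahout classes) computes
the per-dimension mean and the per-dimension population standard deviation,
i.e. dividing by n and not by n - 1.

```java
import java.util.Arrays;

public class RadiusCheck {

    // The six test points that lie nearest to the reported center of VL-0.
    static final double[][] VL0_POINTS = {
        {13.0, 333.0}, {13.0, 823.0}, {13.0, 238.0},
        {13.0, 951.0}, {9.0, 311.0}, {9.0, 970.0}
    };

    // Per-dimension mean of the points.
    static double[] mean(double[][] pts) {
        double[] m = new double[pts[0].length];
        for (double[] p : pts) {
            for (int i = 0; i < m.length; i++) {
                m[i] += p[i];
            }
        }
        for (int i = 0; i < m.length; i++) {
            m[i] /= pts.length;
        }
        return m;
    }

    // Per-dimension population standard deviation (divides by n).
    static double[] populationStd(double[][] pts) {
        double[] m = mean(pts);
        double[] std = new double[m.length];
        for (double[] p : pts) {
            for (int i = 0; i < m.length; i++) {
                double d = p[i] - m[i];
                std[i] += d * d;
            }
        }
        for (int i = 0; i < std.length; i++) {
            std[i] = Math.sqrt(std[i] / pts.length);
        }
        return std;
    }

    public static void main(String[] args) {
        System.out.println("mean = " + Arrays.toString(mean(VL0_POINTS)));
        System.out.println("std  = " + Arrays.toString(populationStd(VL0_POINTS)));
    }
}
```

Rounded to three decimals, the mean and standard deviation agree with the
reported c=[11.667, 604.333] and r=[1.886, 315.059]. This suggests, as an
inference from the numbers and not something confirmed in this thread, that r
is a per-dimension standard deviation and not a hard boundary, in which case
points outside c ± r would be expected.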

Could you please tell me what you mean by "use the clustering flag"?
  
This is the method in detail from the KMeansClusterer class. I don't see how to
set any flags.

public static List<List<Cluster>> clusterPoints(
        Iterable<org.apache.mahout.math.Vector> points,
        List<Cluster> clusters,
        DistanceMeasure measure,
        int maxIter,
        double distanceThreshold) {
    // compiled code
    throw new RuntimeException("Compiled Code");
}

Thanks for your help,
Immo

On Jul 26, 2011, at 3:57 PM, Christoph Brücke wrote:

> Hi Immo,
> 
> Did you run an extra cluster assignment at the end? KMeans works in two
> phases: in the first, all points are assigned to a cluster; in the second,
> the cluster centroids are recalculated based on that assignment. So my idea
> is that you could use the clustering flag to get a final cluster assignment.
> I didn't do the math though; this is just an educated guess.
> 
> Hope this helps,
> Christoph
> 
> 
> On 26.07.2011 at 15:05, Immo Micus wrote:
> 
>> Hello,
>> 
>> This is my first email to the mahout-user list.
>> I am trying to do some clustering with Mahout and I have a question
>> concerning the cluster center and cluster radius.
>> 
>> For testing, I clustered 10 points using the KMeansClusterer:
>> 
>> points:
>> [13.000, 4455.000] 
>> [13.000, 5101.000] 
>> [13.000,   333.000] 
>> [13.000, 3412.000] 
>> [13.000,   823.000] 
>> [13.000,   238.000]
>> [13.000,   951.000] 
>> [  9.000,   311.000] 
>> [  9.000,   970.000] 
>> [10.000, 2885.000]
>> 
>> This is the method I am using:
>> 
>> clusters = KMeansClusterer.clusterPoints(points, initial_clusters, measure, 
>> 10, 0.001);
>> 
>> initial_clusters are 2 points chosen at random from the points above;
>> measure is EuclideanDistanceMeasure.
>> 
>> 
>> And this is the result of the converged clusters VL-0 and VL-1:
>> 
>> VL-0{n=6 c=[11.667, 604.333] r=[1.886, 315.059]}
>> VL-1{n=4 c=[12.250, 3963.250] r=[1.299, 866.428]}
>> 
>> If I understand this output correctly, n is the number of points assigned
>> to the cluster, c is the cluster center and r is the radius of the cluster.
>> So every point belongs to either cluster 0 or cluster 1. You can actually
>> guess which points belong to which cluster, but I am confused by the
>> calculated cluster center and cluster radius:
>> For example, [9.000, 970.000] should belong to cluster 0, but 9.000 <
>> 9.781 [11.667 - 1.886] and 970.000 > 919.392 [604.333 + 315.059]. The point
>> is not in range of the cluster, and it obviously does not belong to
>> cluster 1, yet all 10 points are assigned to clusters. Can someone please
>> tell me where the mistake is?
>> 
>> 
>> greetings, Immo
>> 
> 
> Christoph Brücke
> [email protected]
> 
> 
> 
