Hi,
I have a large number of articles clustered by k-means.
For new articles as they come in, the Mahout in Action book says I need to
"use canopy clustering to assign it to the cluster whose centroid is closest
based on a very small distance threshold".
I'm not sure how to add the canopies for new articles to the existing clusters.
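
To make sure I'm describing the step correctly, here is how I understand it,
as a minimal sketch in plain Java. It uses double[] arrays instead of Mahout
vectors, and the names NearestCentroid, assign and threshold are my own, not
anything from Mahout. Returning -1 when nothing is close enough is just my
assumption about what should happen with an article that fits no cluster.

```java
import java.util.List;

/** Plain-Java sketch of nearest-centroid assignment; no Mahout classes used. */
final class NearestCentroid {

    /** Euclidean distance; fails fast if the two vectors differ in length. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "vector lengths differ: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the index of the closest centroid that lies within the
     * threshold, or -1 if no centroid is that close (hypothetical convention).
     */
    static int assign(List<double[]> centroids, double[] article, double threshold) {
        int best = -1;
        double bestDistance = threshold;
        for (int i = 0; i < centroids.size(); i++) {
            double d = distance(centroids.get(i), article);
            if (d <= bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centroids =
            List.of(new double[] {0.0, 0.0}, new double[] {5.0, 5.0});
        // Close to the second centroid, so it is assigned to index 1.
        System.out.println(assign(centroids, new double[] {4.5, 5.0}, 1.0));
        // Not within the threshold of any centroid, so the result is -1.
        System.out.println(assign(centroids, new double[] {2.5, 2.5}, 1.0));
    }
}
```

In this sketch the distance only makes sense when both vectors have the same
length, so I assume the new articles would have to be vectorized the same way
as the batch articles.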
 
I'm keeping the clusters from the batch articles in a list of Cluster, like this:
List<Cluster> clusters = new ArrayList<Cluster>(); 
 
For the new article canopies, I'm trying the following to measure the distance,
but I get an error like this: "Required cardinality 11981 but got 77372"
Text key = new Text();
Canopy value = new Canopy();
DistanceMeasure measure = new EuclideanDistanceMeasure();
while (reader.next(key, value)) {
    for (int i = 0; i < clusters.size(); i++) {
        double d = measure.distance(clusters.get(i).getCenter(), value.getCenter());
    }
}
 
Is this the right way to compare cluster centroids with new canopies, or did I
misunderstand something?
Can you please help me get the online news clustering working?
Thank you very much!
