Hi
I have a large number of articles clustered with k-means.
For new articles that come in, the Mahout in Action book says I need to "use
canopy clustering to assign it to the cluster whose centroid is closest based
on a very small distance threshold".
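A minimal sketch of the assignment rule in that quote, written in plain Java with double[] vectors instead of Mahout's Vector and Cluster classes. NearestCentroid, assign, and maxDistance are hypothetical names. Returning -1 when nothing is close enough is an assumed convention, not something the book or Mahout specifies:

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Mahout): assigns a new article vector to the
 * cluster whose centroid is closest, but only within a distance threshold.
 */
final class NearestCentroid {

    /** Euclidean distance; both vectors must have the same cardinality (length). */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "Required cardinality " + a.length + " but got " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the index of the closest centroid, or -1 (assumed convention) if
     * even the closest one is farther away than maxDistance.
     */
    static int assign(List<double[]> centroids, double[] doc, double maxDistance) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double d = euclidean(centroids.get(i), doc);
            if (d < bestDistance) {   // keep the nearest centroid seen so far
                bestDistance = d;
                best = i;
            }
        }
        return bestDistance <= maxDistance ? best : -1;
    }
}
```

The -1 result leaves it to the caller to decide what happens to an article that is not near any existing centroid, for example seeding a new cluster. Whether that matches the book's workflow is an assumption.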
I'm not sure how to add the new articles' canopies to the existing clusters.
So far I'm storing the clusters from the batch run in a list of Cluster objects like this:
List<Cluster> clusters = new ArrayList<Cluster>();
For the new article canopies, I'm trying the following to measure the distance,
but I get this error: "Required cardinality 11981 but got 77372"
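    Text key = new Text();
    Canopy value = new Canopy();
    DistanceMeasure measure = new EuclideanDistanceMeasure();
    // reader: reads the new articles' canopies as (Text key, Canopy value) pairs
    while (reader.next(key, value)) {
      for (int i = 0; i < clusters.size(); i++) {
        // distance between existing centroid i and the new canopy's center
        double d = measure.distance(clusters.get(i).getCenter(),
            value.getCenter());
      }
    }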
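"Required cardinality 11981 but got 77372" means the two vectors passed to the distance measure have different sizes. One plausible cause, assumed here, is that the new articles were vectorized with their own dictionary rather than the one from the batch run. The sketch below shows the idea in plain Java: vectorize a new article against the batch's fixed term-to-index dictionary so its length matches the centroids. FixedDictionaryVectorizer and the raw term-frequency weighting are hypothetical and do not reproduce Mahout's seq2sparse output:

```java
import java.util.Map;

/**
 * Hypothetical sketch (not Mahout's seq2sparse): turns a tokenized article into a
 * term-frequency vector using a FIXED term->index dictionary from the batch run,
 * so every new vector has the same cardinality as the existing centroids.
 * Assumes the dictionary's indices run from 0 to dictionary.size() - 1.
 */
final class FixedDictionaryVectorizer {

    static double[] vectorize(String[] tokens, Map<String, Integer> dictionary) {
        // Cardinality comes from the batch dictionary, not the new article's vocabulary.
        double[] vector = new double[dictionary.size()];
        for (String token : tokens) {
            Integer index = dictionary.get(token);
            if (index != null) {      // terms unseen in the batch are dropped
                vector[index] += 1.0; // raw term frequency; no TF-IDF weighting here
            }
        }
        return vector;
    }
}
```

Dropping unseen terms means vocabulary that appears only in new articles is ignored until the dictionary is rebuilt. That trade-off is what keeps the new vectors comparable with the old centroids.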
Is this how to compare cluster centroids with new canopies, or did I
misunderstand something?
Could you please help me get the online news clustering working?
Thank you very much!