KMeans does not use the key in its mapper, only the VectorWritable value. But 
you can create NamedVectors in your upstream processing, putting the IDs in 
the name and the Vectors in the delegate. The NamedVectors will flow through 
the clustering step into the clusteredPoints directory. You will have to write 
your own clustering step if you want output other than the 
WeightedVectorWritables.
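
If all you need is the list of IDs per cluster, a small post-processing pass 
over clusteredPoints may be enough. The sketch below only shows the grouping 
logic. ClusteredPoint and ClusterIdGrouper are hypothetical stand-ins, not 
Mahout classes: I am assuming you have already read each record and pulled 
out the cluster ID from the key and the name from the NamedVector inside the 
value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one clusteredPoints record: the cluster ID
// taken from the key, and the name carried by the NamedVector in the value.
final class ClusteredPoint {
    final int clusterId;
    final String vectorName;

    ClusteredPoint(int clusterId, String vectorName) {
        this.clusterId = clusterId;
        this.vectorName = vectorName;
    }
}

// Hypothetical helper (not part of Mahout): collects vector names per cluster.
final class ClusterIdGrouper {

    // Returns cluster ID -> names of the vectors assigned to that cluster,
    // keeping the names in the order the records were read.
    static Map<Integer, List<String>> groupNamesByCluster(Iterable<ClusteredPoint> points) {
        Map<Integer, List<String>> byCluster = new TreeMap<>();
        for (ClusteredPoint p : points) {
            byCluster.computeIfAbsent(p.clusterId, k -> new ArrayList<>()).add(p.vectorName);
        }
        return byCluster;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<ClusteredPoint> points = Arrays.asList(
            new ClusteredPoint(7, "doc-1"),
            new ClusteredPoint(3, "doc-2"),
            new ClusteredPoint(7, "doc-3"));
        for (Map.Entry<Integer, List<String>> e : groupNamesByCluster(points).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a real job the ClusteredPoint objects would come from iterating the 
clusteredPoints SequenceFiles. That reading code is left out here because it 
depends on your Mahout and Hadoop versions.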

-----Original Message-----
From: Eshwaran Vijaya Kumar [mailto:[email protected]] 
Sent: Friday, August 12, 2011 11:44 AM
To: [email protected]
Subject: Mahout KMeans Output 

I am using KMeans as part of a long pipeline. Suppose I give KMeans a 
SequenceFile with IntWritable keys and VectorWritable values, where the keys 
are IDs for the Vectors. Is there a utility or an option to get KMeans to 
output the IDs that belong to each cluster rather than the 
WeightedVectorWritable bean? 

Thanks
Esh
