KMeans does not use the key in its mapper, only the VectorWritable value. You can, however, create NamedVectors in your upstream processing, putting each ID in the name and the Vector in the delegate. The NamedVectors will flow through the clustering step into the clusteredPoints directory, wrapped in WeightedVectorWritables (WVWs). If you want output other than the WVWs, you will have to write your own clustering step.
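To make the idea concrete, here is a rough sketch in plain Java. It does not use the Mahout classes: NamedVec is a hypothetical stand-in for NamedVector (a name plus a delegate), and the list of (cluster ID, vector) pairs is an assumed stand-in for the records you would read back from clusteredPoints. The helper names tagWithId, point, and idsByCluster are made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ClusterIdSketch {

    // Hypothetical stand-in for Mahout's NamedVector: a name plus a delegate vector.
    static final class NamedVec {
        final String name;
        final double[] delegate;

        NamedVec(String name, double[] delegate) {
            this.name = name;
            this.delegate = delegate;
        }
    }

    // Upstream step: store the record ID in the vector's name, since the key is ignored.
    static NamedVec tagWithId(int id, double[] values) {
        return new NamedVec(Integer.toString(id), values);
    }

    // Assumed shape of one clustered point: the cluster ID paired with the named vector.
    static Map.Entry<Integer, NamedVec> point(int clusterId, NamedVec vector) {
        return new SimpleEntry<Integer, NamedVec>(clusterId, vector);
    }

    // Downstream step: recover the IDs from the names and group them by cluster ID.
    static Map<Integer, List<String>> idsByCluster(List<Map.Entry<Integer, NamedVec>> clusteredPoints) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Map.Entry<Integer, NamedVec> p : clusteredPoints) {
            result.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue().name);
        }
        return result;
    }

    // Small made-up data set: three vectors assigned to two clusters.
    static Map<Integer, List<String>> demo() {
        List<Map.Entry<Integer, NamedVec>> clusteredPoints = new ArrayList<>();
        clusteredPoints.add(point(0, tagWithId(101, new double[] {1.0, 1.1})));
        clusteredPoints.add(point(1, tagWithId(205, new double[] {9.0, 8.7})));
        clusteredPoints.add(point(0, tagWithId(102, new double[] {0.9, 1.2})));
        return idsByCluster(clusteredPoints);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {0=[101, 102], 1=[205]}
    }
}
```

In a real pipeline the same two steps would wrap the NamedVector in a VectorWritable when writing the input SequenceFile, and read the name back from the vector inside each WVW when scanning clusteredPoints.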
-----Original Message-----
From: Eshwaran Vijaya Kumar [mailto:[email protected]]
Sent: Friday, August 12, 2011 11:44 AM
To: [email protected]
Subject: Mahout KMeans Output

I am using KMeans as part of a long pipeline. Suppose I give Kmeans a SequenceFile containing Key as IntWritable and value as VectorWritable where the Keys are IDs for the Vectors, is there a utility or an option to get KMeans to spit out the IDs that belong to a cluster rather than the WeightedVectorWritable bean?

Thanks
Esh
