Excellent. NamedVectors would do the job. Thanks.

On Aug 12, 2011, at 12:09 PM, Jeff Eastman wrote:
> KMeans does not use the key in its mapper, only the VectorWritable value. But
> you can create NamedVectors in your upstream processing and put the IDs in
> the name and the Vectors in the delegate. The NVs will flow through the
> clustering step into the clusteredPoints directory. You will have to write
> your own clustering step if you want a different output than the WVWs.
>
> -----Original Message-----
> From: Eshwaran Vijaya Kumar [mailto:[email protected]]
> Sent: Friday, August 12, 2011 11:44 AM
> To: [email protected]
> Subject: Mahout KMeans Output
>
> I am using KMeans as part of a long pipeline. Suppose I give Kmeans a
> SequenceFile containing Key as IntWritable and value as VectorWritable where
> the Keys are IDs for the Vectors, is there a utility or an option to get
> KMeans to spit out the IDs that belong to a cluster rather than the
> WeightedVectorWritable bean?
>
> Thanks
> Esh
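
For anyone finding this thread later, here is a rough, library-free sketch of the idea Jeff describes: the ID rides along as the name, the clustering step only ever looks at the delegate, and the IDs can be grouped per cluster afterwards. The class `Named` and the methods `nearest` and `idsByCluster` are hypothetical stand-ins written for illustration, not Mahout's NamedVector, KMeans, or clusteredPoints reader. The nearest-centroid assignment is a toy substitute for the real clustering step, and the centroids are made-up values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NamedVectorSketch {

    // Hypothetical stand-in for a named vector: the ID lives in the name,
    // the numeric data lives in the delegate.
    static final class Named {
        final String name;
        final double[] delegate;

        Named(String name, double[] delegate) {
            this.name = name;
            this.delegate = delegate;
        }
    }

    // Toy assignment step: index of the closest centroid by squared
    // Euclidean distance. It reads only the numeric values, never the name.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - centroids[c][i];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Because the name travels with the vector, the IDs can be collected
    // per cluster once each point has been assigned.
    static Map<Integer, List<String>> idsByCluster(List<Named> points, double[][] centroids) {
        Map<Integer, List<String>> out = new TreeMap<>();
        for (Named p : points) {
            int cluster = nearest(p.delegate, centroids);
            out.computeIfAbsent(cluster, k -> new ArrayList<>()).add(p.name);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}}; // assumed, for illustration
        List<Named> points = Arrays.asList(
                new Named("7", new double[] {0.1, 0.2}),
                new Named("42", new double[] {9.8, 10.1}),
                new Named("13", new double[] {0.3, -0.1}));
        System.out.println(idsByCluster(points, centroids)); // prints {0=[7, 13], 1=[42]}
    }
}
```

In a real pipeline the grouping would presumably happen in a small post-processing job over the clusteredPoints output, pulling the name back out of each vector found inside the WVWs.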
