+1 Me too. If there aren't already unit tests that guarantee this, we need to add them. This capability is too important for the API to leave unguaranteed.
-----Original Message-----
From: Blake Lemoine [mailto:[email protected]]
Sent: Saturday, August 13, 2011 4:46 PM
To: [email protected]
Subject: Re: Mahout KMeans Output

If it stopped working, I would feel confident calling that a bug. The KMeans algorithm should forward the vectors in an "as is" manner.

On Aug 13, 2011 6:30 PM, "Lance Norskog" <[email protected]> wrote:
> "The NVs will flow through the clustering step into the
> clusteredPoints directory." Be careful about this part. It is hard to
> guarantee that this will always work, and will keep working as classes
> evolve.
>
> On Fri, Aug 12, 2011 at 12:35 PM, Eshwaran Vijaya Kumar
> <[email protected]> wrote:
>> Excellent. NamedVectors would do the job. Thanks.
>> On Aug 12, 2011, at 12:09 PM, Jeff Eastman wrote:
>>
>>> KMeans does not use the key in its mapper, only the VectorWritable value. But you can create NamedVectors in your upstream processing and put the IDs in the name and the vectors in the delegate. The NVs will flow through the clustering step into the clusteredPoints directory. You will have to write your own clustering step if you want a different output than the WVWs (WeightedVectorWritables).
>>>
>>> -----Original Message-----
>>> From: Eshwaran Vijaya Kumar [mailto:[email protected]]
>>> Sent: Friday, August 12, 2011 11:44 AM
>>> To: [email protected]
>>> Subject: Mahout KMeans Output
>>>
>>> I am using KMeans as part of a long pipeline. Suppose I give KMeans a SequenceFile containing keys as IntWritable and values as VectorWritable, where the keys are IDs for the vectors. Is there a utility or an option to get KMeans to spit out the IDs that belong to a cluster rather than the WeightedVectorWritable bean?
>>>
>>> Thanks
>>> Esh
>
> --
> Lance Norskog
> [email protected]
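For anyone landing on this thread, here is a minimal, dependency-free sketch of the pattern Jeff describes: move each record's ID out of the (ignored) key and into the vector's name, so a clustering step that reads only the value still forwards the ID "as is". Assumptions, loudly: `NamedPoint` is a hypothetical stand-in for Mahout's `NamedVector` (not the Mahout API), and `assignIds` is a single nearest-centroid assignment pass, a simplification rather than Mahout's full iterative KMeans job. In real Mahout code the upstream step would wrap each vector as `new NamedVector(vector, id)` before writing it out, and, per Jeff, the name should still be recoverable from the vector inside each WeightedVectorWritable in clusteredPoints; as Lance warns, that behavior is exactly what a unit test should pin down.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: carry record IDs in the vector "name" so they survive
 * a clustering step that ignores keys. NamedPoint is NOT Mahout's NamedVector.
 */
public class ClusterIdsSketch {

    /** Stand-in for a NamedVector: an ID carried alongside the vector values. */
    static final class NamedPoint {
        final String name;
        final double[] values;

        NamedPoint(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * One nearest-centroid assignment pass. Each point object is forwarded
     * unchanged, so its name (the ID) is still available after assignment
     * and can be grouped by cluster index.
     */
    static Map<Integer, List<String>> assignIds(List<NamedPoint> points, double[][] centroids) {
        Map<Integer, List<String>> idsByCluster = new LinkedHashMap<>();
        for (int c = 0; c < centroids.length; c++) {
            idsByCluster.put(c, new ArrayList<>());
        }
        for (NamedPoint p : points) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double dist = squaredDistance(p.values, centroids[c]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            idsByCluster.get(best).add(p.name);
        }
        return idsByCluster;
    }

    public static void main(String[] args) {
        // Upstream step: what used to be the IntWritable key becomes the vector name.
        List<NamedPoint> points = List.of(
                new NamedPoint("101", new double[] {0.0, 0.1}),
                new NamedPoint("102", new double[] {5.0, 5.2}),
                new NamedPoint("103", new double[] {0.2, 0.0}));
        double[][] centroids = { {0.0, 0.0}, {5.0, 5.0} };
        // prints {0=[101, 103], 1=[102]}
        System.out.println(assignIds(points, centroids));
    }
}
```

The design choice is the one Lance flags: the IDs only survive because the clustering step passes the value object through untouched, so any refactoring that copies or re-wraps vectors would silently drop the names.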
