"The NVs will flow through the clustering step into the
clusteredPoints directory." Be careful about relying on this. It is
hard to guarantee that it will always work, or that it will keep
working as the classes evolve.
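
One way to guard against that is to check, wherever the clustered
points are read back, that each vector still carries a name, and to
fail loudly if it does not. A minimal sketch of the idea follows. It
uses stand-in classes (Vec, NamedVec, idsByCluster are hypothetical
names made up for illustration), not Mahout's own NamedVector or
WeightedVectorWritable, and a plain nearest-centroid assignment in
place of the real KMeans job:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in types, NOT Mahout's classes: they only mimic a vector that
// carries its own ID through a step that ignores the record key.
class NamedVectorSketch {

    /** Plain vector with no identity attached. */
    static class Vec {
        final double[] values;
        Vec(double[] values) { this.values = values; }
    }

    /** Vector that carries its ID in a name field. */
    static class NamedVec extends Vec {
        final String name;
        NamedVec(String name, double[] values) {
            super(values);
            this.name = name;
        }
    }

    /** Index of the centroid with the smallest squared Euclidean distance. */
    static int nearest(Vec p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0.0;
            for (int i = 0; i < p.values.length; i++) {
                double diff = p.values[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Groups IDs by cluster; throws if any point arrived without a name. */
    static Map<Integer, List<String>> idsByCluster(List<Vec> points, double[][] centroids) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Vec p : points) {
            if (!(p instanceof NamedVec)) {
                // The name was dropped somewhere upstream, so the ID is gone.
                throw new IllegalStateException("vector has no name; its ID cannot be recovered");
            }
            String id = ((NamedVec) p).name;
            result.computeIfAbsent(nearest(p, centroids), k -> new ArrayList<>()).add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        List<Vec> points = new ArrayList<>();
        points.add(new NamedVec("17", new double[] {1.0, 0.5}));
        points.add(new NamedVec("42", new double[] {0.5, 1.5}));
        points.add(new NamedVec("99", new double[] {9.0, 11.0}));
        System.out.println(idsByCluster(points, centroids));
    }
}
```

In a real pipeline the equivalent check would be an instanceof test on
the vector pulled out of each clusteredPoints record. A small test
that runs the clustering step on a handful of named vectors and
asserts the names come out the other side would catch a regression
after an upgrade.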

On Fri, Aug 12, 2011 at 12:35 PM, Eshwaran Vijaya Kumar
<[email protected]> wrote:
> Excellent. NamedVectors would do the job. Thanks.
> On Aug 12, 2011, at 12:09 PM, Jeff Eastman wrote:
>
>> KMeans does not use the key in its mapper, only the VectorWritable value. 
>> But you can create NamedVectors in your upstream processing and put the IDs 
>> in the name and the Vectors in the delegate. The NVs will flow through the 
>> clustering step into the clusteredPoints directory. You will have to write 
>> your own clustering step if you want a different output than the WVWs.
>>
>> -----Original Message-----
>> From: Eshwaran Vijaya Kumar [mailto:[email protected]]
>> Sent: Friday, August 12, 2011 11:44 AM
>> To: [email protected]
>> Subject: Mahout KMeans Output
>>
>> I am using KMeans as part of a long pipeline. Suppose I give KMeans a
>> SequenceFile containing Key as IntWritable and value as VectorWritable where 
>> the Keys are IDs for the Vectors, is there a utility or an option to get 
>> KMeans to spit out the IDs that belong to a cluster rather than the 
>> WeightedVectorWritable bean?
>>
>> Thanks
>> Esh
>
>



-- 
Lance Norskog
[email protected]
