Andrew,

This feature was available prior to Mahout 0.7 (clustering supported 
NamedVectors) and was broken later. While this may not be fixed in the 
soon-to-be-released Mahout 0.8, there is an open JIRA for it, 
https://issues.apache.org/jira/browse/MAHOUT-1030, which has been targeted for 0.9. 
Please feel free to submit a patch if you would like to take a shot at it.

Suneel




________________________________
 From: Andrew Musselman <[email protected]>
To: [email protected] 
Sent: Friday, July 5, 2013 3:05 PM
Subject: Preserve contents of keys after running k-means
 

Hi list

We are trying to do some k-means clustering and are wondering if there's an
easy way to preserve the contents of the keys for the input records.

E.g.

12345: (0,3,79,80)
98765: (1,4,98,90)

where the vectors being clustered are the tuples and the keys are record ids.

When we run clusterdump with pointsDir specified, we get the vectors but
not the keys. We're looking at NamedVector as a path to a solution, as
well as at a mapping file between ordered integers and the ids, kept in
input order.
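For the mapping-file route, a minimal sketch in plain Java might look like the following. It assumes each record is given a dense integer index in input order, and that this integer (rather than the original id) is what travels through clustering, so the clustered points can be translated back afterwards. The class name IdIndex and the tab-separated "index<TAB>id" file format are hypothetical illustrations, not part of Mahout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not a Mahout class): assigns each record id a dense
 * integer index in input order and persists the mapping as "index\tid" lines.
 */
final class IdIndex {
    private final List<String> ids = new ArrayList<>();          // index -> id
    private final Map<String, Integer> indexById = new HashMap<>(); // id -> index

    /** Returns the existing index for this id, or assigns the next free one. */
    public int indexOf(String id) {
        Integer existing = indexById.get(id);
        if (existing != null) {
            return existing;
        }
        int next = ids.size();
        ids.add(id);
        indexById.put(id, next);
        return next;
    }

    /** Reverse lookup, e.g. after clustering: integer key -> original id. */
    public String idAt(int index) {
        return ids.get(index);
    }

    public int size() {
        return ids.size();
    }

    /** Writes one "index\tid" line per record, in index order. */
    public void writeTo(Writer out) throws IOException {
        for (int i = 0; i < ids.size(); i++) {
            out.write(i + "\t" + ids.get(i) + "\n");
        }
        out.flush();
    }

    /** Rebuilds the mapping from lines produced by writeTo, checking the order. */
    public static IdIndex readFrom(BufferedReader in) throws IOException {
        IdIndex idx = new IdIndex();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            int expected = Integer.parseInt(parts[0]);
            if (idx.indexOf(parts[1]) != expected) {
                throw new IOException("mapping file out of order at index " + expected);
            }
        }
        return idx;
    }
}
```

With the example records above, "12345" would map to 0 and "98765" to 1. The trade-off against NamedVector is that the mapping lives outside the vector files and must be kept in sync with them, but it does not depend on clustering preserving vector names.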

Thanks for any advice.

Best
Andrew
