Andrew,

This feature was available prior to Mahout 0.7 (clustering had support for NamedVector) and was broken later. While this may not be fixed in the soon-to-be-released Mahout 0.8, there is an open JIRA for it that has been targeted for 0.9: https://issues.apache.org/jira/browse/MAHOUT-1030

Please feel free to submit a patch if you would like to take a shot at it.
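
Until that is fixed, the mapping-file idea from the original message could be sketched roughly as below. This is only an illustration in plain Python, not Mahout code: the names `build_index` and `relabel` are hypothetical, the `assignments` dict is a stand-in for real clustering output, and it assumes the cluster assignments can be matched back to the ordinal key given to each input record.

```python
# Sample records taken from the question: (record id, feature tuple).
records = [("12345", (0, 3, 79, 80)), ("98765", (1, 4, 98, 90))]


def build_index(records):
    """Assign each record an ordinal key; return (vectors, index_to_id)."""
    vectors = {}
    index_to_id = {}
    for i, (record_id, features) in enumerate(records):
        vectors[i] = features        # what would be handed to clustering
        index_to_id[i] = record_id   # what would be written to the mapping file
    return vectors, index_to_id


def relabel(assignments, index_to_id):
    """Translate {ordinal key: cluster} back to {record id: cluster}."""
    return {index_to_id[i]: cluster for i, cluster in assignments.items()}


vectors, index_to_id = build_index(records)

# Stand-in for the clustering result (assumption: ordinal key -> cluster id).
assignments = {0: 0, 1: 1}

labeled = relabel(assignments, index_to_id)
```

The mapping is kept separate from the vectors so the clustering input stays purely numeric; the ids are joined back only after the run.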
Suneel

________________________________
From: Andrew Musselman <[email protected]>
To: [email protected]
Sent: Friday, July 5, 2013 3:05 PM
Subject: Preserve contents of keys after running k-means

Hi list

We are trying to do some k-means clustering and are wondering if there's an easy way to preserve the contents of the keys for the input records.

E.g.

12345: (0,3,79,80)
98765: (1,4,98,90)

where the vectors being clustered are the tuples and the keys are some id.

When we run clusterdump with pointsDir specified we have the vectors but not the keys.

We're looking at NamedVector as a path to this solution, as well as looking at a mapping file between ordered integers and the ids in order.

Thanks for any advice.

Best
Andrew
