Andrew,

That is a pretty clever solution.
I think that you can get by with a simpler solution by noting how the internal IDs are assigned (sequentially, I think).

On Fri, Jul 5, 2013 at 1:53 PM, Andrew Musselman <[email protected]> wrote:

> So how are people working around this without patching 0.7? Downgrading to
> 0.6?
>
> We're on a cluster where we don't have admin rights to patch Mahout.
>
> Our dumb idea now is to hash the concatenated values of each vector and
> pair that up with our original ids, then run another process on the points
> results to hash the results, then join up on hash value to pull id together
> with cluster #.
>
> Anyone have a nicer solution to this at hand?
>
>
> On Fri, Jul 5, 2013 at 1:02 PM, Suneel Marthi <[email protected]> wrote:
>
> > Andrew,
> >
> > This feature was available prior to Mahout 0.7 (clustering had support for
> > Named Vectors) and was broken later. While this may not be fixed in the
> > soon to be Mahout 0.8, there is a JIRA that's open for this -
> > https://issues.apache.org/jira/browse/MAHOUT-1030 that's been targeted
> > for 0.9. Please feel free to submit a patch if you would like to take a
> > shot at it.
> >
> > Suneel
> >
> > ________________________________
> > From: Andrew Musselman <[email protected]>
> > To: [email protected]
> > Sent: Friday, July 5, 2013 3:05 PM
> > Subject: Preserve contents of keys after running k-means
> >
> > Hi list
> >
> > We are trying to do some k-means clustering and are wondering if there's an
> > easy way to preserve the contents of the keys for the input records.
> >
> > E.g.
> >
> > 12345: (0,3,79,80)
> > 98765: (1,4,98,90)
> >
> > where the vectors being clustered are the tuples and the keys are some id.
> >
> > When we run clusterdump with pointsDir specified we have the vectors but
> > not the keys. We're looking at NamedVector as a path to this solution, as
> > well as looking at a mapping file between ordered integers and the ids in
> > order.
> >
> > Thanks for any advice.
> >
> > Best
> > Andrew
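
For reference, the hash-and-join workaround described in the quoted message could look roughly like the sketch below. It is plain Python over in-memory tuples and does not call Mahout at all. The function names, the SHA-1 choice, and the cluster numbers 7 and 2 are all made up for illustration. It also assumes the vector values come back from the points output unchanged, so that the same values always hash to the same key.

```python
import hashlib
from collections import defaultdict


def vector_key(values):
    """Hash the concatenated values of a vector into a stable join key.

    Values are normalized to float first so that 3 and 3.0 hash alike.
    """
    text = ",".join(repr(float(v)) for v in values)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_id_index(records):
    """Map each vector hash to the original ids that share that vector."""
    index = defaultdict(list)
    for record_id, values in records:
        index[vector_key(values)].append(record_id)
    return index


def join_clusters(records, clustered_points):
    """Join (cluster, vector) pairs back to the original ids by hash."""
    index = build_id_index(records)
    assignments = {}
    for cluster_id, values in clustered_points:
        for record_id in index.get(vector_key(values), []):
            assignments[record_id] = cluster_id
    return assignments


# Input records from the original question: id -> vector.
records = [
    ("12345", (0, 3, 79, 80)),
    ("98765", (1, 4, 98, 90)),
]

# Hypothetical clustering output: (cluster number, vector), keys lost.
clustered_points = [
    (7, (1.0, 4.0, 98.0, 90.0)),
    (2, (0.0, 3.0, 79.0, 80.0)),
]

result = join_clusters(records, clustered_points)
print(result)
```

Two records with identical vectors hash to the same key, so the index keeps a list of ids per hash. Identical points are assigned to the same cluster, so each of those ids still gets the right cluster number.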
