Re: Preserve contents of keys after running k-means

Andrew Musselman Fri, 05 Jul 2013 13:55:05 -0700

So how are people working around this without patching 0.7?  Downgrading to
0.6?


We're on a cluster where we don't have admin rights to patch Mahout.

Our dumb idea now is to hash the concatenated values of each vector and
pair that up with our original ids. We would then run another process over
the clustered points output to hash those vectors the same way, and join on
the hash value to bring each id together with its cluster number.

Does anyone have a nicer solution to this at hand?
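
A minimal sketch of that join in plain Java, assuming the original (id, vector) pairs and the clustered points (cluster number, vector) have already been read into memory. The `ClusteredPoint` record and the method names are hypothetical, not Mahout API. The sketch uses the concatenated values themselves as the join key instead of a hash of them, which avoids hash collisions. Two caveats remain: both sides must format the values identically, and identical vectors with different ids cannot be told apart, so every matching id receives the cluster.

```java
import java.util.*;

public class HashJoinSketch {
    // Hypothetical record for one clustered point as parsed from the points output.
    record ClusteredPoint(int clusterId, double[] values) {}

    // Join key: the vector's values concatenated into one string.
    static String joinKey(double[] values) {
        return Arrays.toString(values);
    }

    static Map<String, Integer> joinIdsToClusters(Map<String, double[]> originals,
                                                  List<ClusteredPoint> points) {
        // Index original ids by join key; identical vectors share one key.
        Map<String, List<String>> idsByKey = new HashMap<>();
        for (Map.Entry<String, double[]> e : originals.entrySet()) {
            idsByKey.computeIfAbsent(joinKey(e.getValue()), k -> new ArrayList<>())
                    .add(e.getKey());
        }
        // For each clustered point, attach its cluster number to every matching id.
        Map<String, Integer> clusterById = new HashMap<>();
        for (ClusteredPoint p : points) {
            for (String id : idsByKey.getOrDefault(joinKey(p.values()), List.of())) {
                clusterById.put(id, p.clusterId());
            }
        }
        return clusterById;
    }

    public static void main(String[] args) {
        Map<String, double[]> originals = new LinkedHashMap<>();
        originals.put("12345", new double[]{0, 3, 79, 80});
        originals.put("98765", new double[]{1, 4, 98, 90});
        // Illustrative cluster numbers only.
        List<ClusteredPoint> points = List.of(
                new ClusteredPoint(7, new double[]{0, 3, 79, 80}),
                new ClusteredPoint(2, new double[]{1, 4, 98, 90}));
        System.out.println(joinIdsToClusters(originals, points).get("12345")); // prints 7
    }
}
```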



On Fri, Jul 5, 2013 at 1:02 PM, Suneel Marthi <[email protected]> wrote:

> Andrew,
>
> This feature was available prior to Mahout 0.7 (clustering had support for
> NamedVectors) and was broken later. While this may not be fixed in the
> soon-to-be-released Mahout 0.8, there is an open JIRA for it,
> https://issues.apache.org/jira/browse/MAHOUT-1030, which is targeted
> for 0.9. Please feel free to submit a patch if you would like to take a
> shot at it.
>
> Suneel
>
>
>
>
> ________________________________
>  From: Andrew Musselman <[email protected]>
> To: [email protected]
> Sent: Friday, July 5, 2013 3:05 PM
> Subject: Preserve contents of keys after running k-means
>
>
> Hi list,
>
> We are trying to do some k-means clustering and are wondering if there's an
> easy way to preserve the contents of the keys for the input records.
>
> E.g.
>
> 12345: (0,3,79,80)
> 98765: (1,4,98,90)
>
> where the vectors being clustered are the tuples and the keys are IDs.
>
> When we run clusterdump with pointsDir specified, we get the vectors but
> not the keys. We're looking at NamedVector as a path to a solution, as
> well as at a mapping file from sequential integers to the ids in input
> order.
>
> Thanks for any advice.
>
> Best
> Andrew
>
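
A minimal sketch of the mapping-file idea from the original question, in plain Java: each id gets a sequential integer in input order, the pairs are written as `index<TAB>id` lines, and the file is read back into a lookup. The class name, method names, and file layout are hypothetical. This only helps if the integer survives into the clustering output (or the output order matches the input order); nothing in the thread confirms either for 0.7.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class IdMappingFile {
    // Writes one "index<TAB>id" line per original id, numbering ids in input order.
    static void writeMapping(List<String> idsInOrder, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < idsInOrder.size(); i++) {
            lines.add(i + "\t" + idsInOrder.get(i));
        }
        Files.write(file, lines);
    }

    // Reads the mapping back into an index -> id lookup.
    static Map<Integer, String> readMapping(Path file) throws IOException {
        Map<Integer, String> idByIndex = new HashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] parts = line.split("\t", 2);
            idByIndex.put(Integer.parseInt(parts[0]), parts[1]);
        }
        return idByIndex;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("id-mapping", ".tsv");
        writeMapping(List.of("12345", "98765"), file);
        Map<Integer, String> idByIndex = readMapping(file);
        System.out.println(idByIndex.get(1)); // prints 98765
        Files.delete(file);
    }
}
```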
