Andrew,

That is a pretty clever solution.

I think you can get by with something simpler by taking advantage of how
the internal ids are assigned (sequentially, I think).
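
Roughly what I have in mind, as a minimal Python sketch rather than anything
Mahout-specific. It assumes (and I have not verified this) that the internal
ids in the points output are 0-based and follow the order of the input
records. The names build_id_map and attach_ids are made up for illustration,
and the (internal index, cluster) pairs stand in for whatever you parse out of
the points output.

```python
# Sketch only: assumes internal ids are 0-based and follow input order,
# which is an unverified assumption about how the clustering job numbers points.

def build_id_map(original_ids):
    """Map the assumed sequential internal index to the original record id."""
    return {index: original_id for index, original_id in enumerate(original_ids)}


def attach_ids(clustered_points, id_map):
    """Turn (internal_index, cluster_id) pairs into {original_id: cluster_id}."""
    return {id_map[index]: cluster_id for index, cluster_id in clustered_points}


# Original ids, in exactly the order the vectors were written to the input.
original_ids = ["12345", "98765"]
id_map = build_id_map(original_ids)

# Hypothetical parsed points output: (internal index, assigned cluster).
clustered_points = [(0, 7), (1, 3)]
assignments = attach_ids(clustered_points, id_map)
print(assignments)
```

If the ids turn out not to be sequential, the mapping is simply wrong, so it is
worth checking a few known records by hand first.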


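For comparison, the hash-and-join workaround from the quoted message below
could look something like this. Again a sketch under assumptions: the helper
names are hypothetical, the vectors are assumed to come back from clustering
with exactly the same values they went in with, and the (cluster, vector)
pairs stand in for the parsed points output.

```python
import hashlib
from collections import defaultdict


def vector_key(vector):
    """Hash the concatenated vector values into a stable join key."""
    # Convert to float first so that 3 and 3.0 produce the same key.
    text = ",".join(repr(float(value)) for value in vector)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def index_ids_by_vector(records):
    """Group original ids by vector hash; identical vectors share a key."""
    ids_by_key = defaultdict(list)
    for record_id, vector in records:
        ids_by_key[vector_key(vector)].append(record_id)
    return ids_by_key


def join_clusters(ids_by_key, clustered):
    """Join (cluster_id, vector) pairs back to the original ids by hash."""
    result = {}
    for cluster_id, vector in clustered:
        for record_id in ids_by_key.get(vector_key(vector), []):
            result[record_id] = cluster_id
    return result


records = [("12345", (0, 3, 79, 80)), ("98765", (1, 4, 98, 90))]
# Hypothetical parsed points output: (assigned cluster, vector).
clustered = [(2, (1.0, 4.0, 98.0, 90.0)), (5, (0.0, 3.0, 79.0, 80.0))]
joined = join_clusters(index_ids_by_vector(records), clustered)
print(joined)
```

Records with identical vectors end up under the same key, which is harmless
here because identical points are assigned to the same cluster anyway.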

On Fri, Jul 5, 2013 at 1:53 PM, Andrew Musselman <[email protected]> wrote:

> So how are people working around this without patching 0.7?  Downgrading to
> 0.6?
>
> We're on a cluster where we don't have admin rights to patch Mahout.
>
> Our dumb idea now is to hash the concatenated values of each vector and
> pair that up with our original ids, then run another process on the points
> results to hash the results, then join up on hash value to pull id together
> with cluster #.
>
> Anyone have a nicer solution to this at hand?
>
>
>
> On Fri, Jul 5, 2013 at 1:02 PM, Suneel Marthi <[email protected]> wrote:
>
> > Andrew,
> >
> > This feature was available prior to Mahout 0.7 (clustering had support for
> > Named Vectors) and was broken later. While this may not be fixed in the
> > soon to be Mahout 0.8, there is a JIRA that's open for this -
> > https://issues.apache.org/jira/browse/MAHOUT-1030 that's been targeted
> > for 0.9. Please feel free to submit a patch if you would like to take a
> > shot at it.
> >
> > Suneel
> >
> >
> >
> >
> > ________________________________
> >  From: Andrew Musselman <[email protected]>
> > To: [email protected]
> > Sent: Friday, July 5, 2013 3:05 PM
> > Subject: Preserve contents of keys after running k-means
> >
> >
> > Hi list
> >
> > We are trying to do some k-means clustering and are wondering if there's an
> > easy way to preserve the contents of the keys for the input records.
> >
> > E.g.
> >
> > 12345: (0,3,79,80)
> > 98765: (1,4,98,90)
> >
> > where the vectors being clustered are the tuples and the keys are some id.
> >
> > When we run clusterdump with pointsDir specified we have the vectors but
> > not the keys.  We're looking at NamedVector as a path to this solution, as
> > well as looking at a mapping file between ordered integers and the ids in
> > order.
> >
> > Thanks for any advice.
> >
> > Best
> > Andrew
> >
>
