I am glad I asked.  You just saved me a whole lot of trouble!

On Fri, Jun 22, 2012 at 9:00 AM, Sean Owen <[email protected]> wrote:

> Just hashing is almost surely fine. I'd XOR 64 bit chunks of the UUID
> to make a 64-bit value. The probability of collision at this size is
> vanishingly small, and collisions do little damage anyway.
>
> Note that in the Hadoop jobs the longs are hashed down to ints anyway!
>
> On Fri, Jun 22, 2012 at 3:43 PM, Jonathan Hodges <[email protected]> wrote:
> > I have some input data I don’t control where the user IDs are in UUID
> > format. The UUIDs are larger than the long type I need for Mahout.  Is
> > there a best practice for converting this type of data?
> >
> >
> > Since our set is less than 10 million unique users I was thinking about
> > chaining together a few MR jobs to convert the user UUIDs to unique
> > sequential longs.  Before going through the trouble I thought I would ask
> > the community for ideas as I am still very new to Mahout.
> >
> >
> > Thanks in advance.
> >
> >
> > -Jonathan
>
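
For reference, the XOR suggestion quoted above might look roughly like the sketch below. The class and method names (UuidToLong, toLong) are made up for illustration and are not part of Mahout; the sketch only assumes the IDs parse as standard java.util.UUID values.

```java
import java.util.UUID;

// Hypothetical helper, not a Mahout API: folds a 128-bit UUID into a 64-bit long.
public class UuidToLong {

    // XOR the high 64 bits with the low 64 bits, as suggested in the thread.
    // Distinct UUIDs can collide, but for roughly 10 million IDs spread over
    // a 64-bit space this should be very unlikely.
    public static long toLong(UUID uuid) {
        return uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
    }

    public static void main(String[] args) {
        // High half is 1, low half is 2, so the folded value is 1 XOR 2.
        UUID id = UUID.fromString("00000000-0000-0001-0000-000000000002");
        System.out.println(toLong(id)); // prints 3
    }
}
```

The mapping is deterministic, so the same UUID always yields the same long and no lookup table or extra MR job is needed. It is one-way, though: if the original UUIDs are needed when reading results, a separate long-to-UUID mapping has to be kept.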
