I am glad I asked. You just saved me a whole lot of trouble!
On Fri, Jun 22, 2012 at 9:00 AM, Sean Owen <[email protected]> wrote:
> Just hashing is almost surely fine. I'd XOR 64 bit chunks of the UUID
> to make a 64-bit value. The probability of collision at this size is
> vanishingly small, and collisions do little damage anyway.
>
> note that in the Hadoop jobs the longs are hashed down to ints anyway!
>
> On Fri, Jun 22, 2012 at 3:43 PM, Jonathan Hodges <[email protected]> wrote:
> > I have some input data I don’t control where the user IDs are UUID format.
> > The UUIDs are larger than the long type I need for Mahout. Is there a best
> > practice converting this type of data?
> >
> > Since our set is less than 10 million unique users I was thinking about
> > chaining together a few MR jobs to convert the user UUIDs to unique
> > sequential longs. Before going through the trouble I thought I would ask
> > the community for ideas as I am still very new to Mahout.
> >
> > Thanks in advance.
> >
> > -Jonathan
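
For reference, here is a minimal sketch of the XOR approach suggested in the quoted reply, assuming the IDs arrive as standard UUID strings that java.util.UUID can parse. The class name UuidHash and the method toLong are made up for illustration and are not part of Mahout.

```java
import java.util.UUID;

// Hypothetical helper, not a Mahout class: folds a 128-bit UUID into one long.
final class UuidHash {

    // XOR the two 64-bit halves of the UUID to get a single 64-bit value.
    static long toLong(UUID uuid) {
        return uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
    }

    public static void main(String[] args) {
        // Example input; any well-formed UUID string works here.
        UUID userId = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        System.out.println(userId + " -> " + toLong(userId));
    }
}
```

The result may be negative, and the mapping is one-way, so a separate lookup from long back to UUID would be needed if the original IDs must be recovered later.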
