Yep, just hash to a long, from UUID or String or whatever. The occasional collision does not cause a real problem. If you mix the tastes of two users or items once in a billion times, the overall results will hardly be different.
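
Here is a rough sketch of the idea in plain Java, including the lookup table you need to get from the long back to the original UUID. It is not Mahout code: UuidIdMapper and its method names are made up for illustration, and using MD5 and keeping the first 8 bytes is only one reasonable choice of hash, assumed here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of Mahout.
final class UuidIdMapper {
    // long ID -> original UUID, so results can be reported in UUID form.
    private final Map<Long, UUID> reverse = new HashMap<>();

    // Hash the UUID's string form with MD5 and keep the first 8 bytes as a long.
    // (Assumed hash choice; any well-mixed 64-bit hash would do.)
    static long toLongId(UUID uuid) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(uuid.toString().getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Hash the UUID and remember the reverse mapping.
    long register(UUID uuid) {
        long id = toLongId(uuid);
        reverse.put(id, uuid);
        return id;
    }

    // Returns the original UUID, or null if the ID was never registered.
    UUID toUuid(long id) {
        return reverse.get(id);
    }
}
```

With millions of users the in-memory map is probably still fine, but you may prefer to keep the mapping in a file or database table instead.
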
You have to maintain the reverse mapping, of course. Look at the IDMigrator class for a little help there.

You can rewrite to use UUID or String, but believe me, it will be an immense amount of change and make things much slower. It used to work this way for recommenders in about 2006, and the Object overhead and GC pressure were by far the bottleneck. That's why it's all long now.

On Wed, Aug 1, 2012 at 9:29 PM, Matt Mitchell <[email protected]> wrote:
> Question about dealing with UUIDs as Mahout user IDs. I'm considering
> ways to deal with these values:
>
> 1. use getLeastSignificantBits
> 2. re-map to a database auto-increment number (this would take very
> long time to do?)
> 3. customize mahout so that it accepts UUIDs as user IDs
>
> Any feedback here? If I went with #3 (seems the safest) how would I do
> this and, what are the consequences?
>
> The user count is in the millions.
>
> Thanks!
>
