Thanks Sean! That all makes sense. Would you mind recommending a hashing function for this? Is there something in Mahout I could use?
- Matt

On Wed, Aug 1, 2012 at 4:34 PM, Sean Owen <[email protected]> wrote:
> Yep, just hash to a long, from UUID or String or whatever. The occasional
> collision does not cause a real problem. If you mix the tastes of two users
> or items once in a billion times, the overall results will hardly be
> different.
>
> You have to maintain the reverse mapping of course. Look at the IDMigrator
> class for a little help there.
>
> You can rewrite to use UUID or String, but believe me, it will be an
> immense amount of change and make things much slower. It used to work this
> way for recommenders in about 2006 and the Object overhead and GC pressure
> was by far the bottleneck. That's why it's all long now.
>
> On Wed, Aug 1, 2012 at 9:29 PM, Matt Mitchell <[email protected]> wrote:
>
>> Question about dealing with UUIDs as Mahout user IDs. I'm considering
>> ways to deal with these values:
>>
>> 1. use getLeastSignificantBits
>> 2. re-map to a database auto-increment number (this would take very
>> long time to do?)
>> 3. customize mahout so that it accepts UUIDs as user IDs
>>
>> Any feedback here? If I went with #3 (seems the safest) how would I do
>> this and, what are the consequences?
>>
>> The user count is in the millions.
>>
>> Thanks!
>>
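
For illustration, the approach Sean describes (hash each UUID to a long and keep the reverse mapping) might look roughly like the sketch below in plain Java. This is only a sketch under assumptions: the class UuidToLong and its methods are hypothetical names and are not part of Mahout, MD5 is just one reasonable choice of hash, and the in-memory HashMap stands in for whatever store would really hold the reverse mapping for millions of users. Mahout's own IDMigrator classes are worth checking first, as they may already cover this.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper, not part of Mahout: maps UUIDs to long IDs by hashing
// and remembers the reverse mapping so results can be translated back.
public class UuidToLong {

    // Reverse mapping from hashed long ID back to the original UUID.
    // Assumption: held in memory here; a database table may suit real data better.
    private final Map<Long, UUID> reverse = new HashMap<>();

    // Hash the 16 bytes of the UUID with MD5 and keep the first 8 bytes as a long.
    public static long hash(UUID uuid) {
        ByteBuffer in = ByteBuffer.allocate(16);
        in.putLong(uuid.getMostSignificantBits());
        in.putLong(uuid.getLeastSignificantBits());
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(in.array());
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Convert a UUID to its long ID and record the reverse mapping.
    public long toLongId(UUID uuid) {
        long id = hash(uuid);
        reverse.put(id, uuid);
        return id;
    }

    // Look up the UUID previously mapped to this long ID, or null if unknown.
    public UUID toUuid(long id) {
        return reverse.get(id);
    }

    public static void main(String[] args) {
        UuidToLong mapper = new UuidToLong();
        UUID user = UUID.randomUUID();
        long id = mapper.toLongId(user);
        System.out.println(user + " -> " + id);
        System.out.println(id + " -> " + mapper.toUuid(id));
    }
}
```

The long returned by toLongId is what would be passed to the recommender as the user ID, and toUuid translates recommended IDs back. If two UUIDs ever collide, the later one overwrites the earlier in the reverse map, which is the rare mix-up Sean describes as harmless.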
