Hello Matt,

On 01.08.2012, at 22:40, Matt Mitchell wrote:
> Thanks Sean! That all makes sense. Would you mind recommending a
> hashing function for this? Is there something in Mahout I could use?

The following class uses a String-to-long mapping based on a MemoryIDMigrator:
https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java

Internally, Mahout uses part of the MD5 hash of the string. This can, for example, be expressed directly in SQL:

cast(conv(substring(md5([column name]), 1, 16),16,10) as signed)

The Javadoc can be found here:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/model/IDMigrator.html

/Manuel

>
> - Matt
>
> On Wed, Aug 1, 2012 at 4:34 PM, Sean Owen <[email protected]> wrote:
>> Yep, just hash to a long, from UUID or String or whatever. The occasional
>> collision does not cause a real problem. If you mix the tastes of two users
>> or items once in a billion times, the overall results will hardly be
>> different.
>>
>> You have to maintain the reverse mapping of course. Look at the IDMigrator
>> class for a little help there.
>>
>> You can rewrite to use UUID or String, but believe me, it will be an
>> immense amount of change and make things much slower. It used to work this
>> way for recommenders in about 2006 and the Object overhead and GC pressure
>> was by far the bottleneck. That's why it's all long now.
>>
>> On Wed, Aug 1, 2012 at 9:29 PM, Matt Mitchell <[email protected]> wrote:
>>
>>> Question about dealing with UUIDs as Mahout user IDs. I'm considering
>>> ways to deal with these values:
>>>
>>> 1. use getLeastSignificantBits
>>> 2. re-map to a database auto-increment number (this would take a very
>>> long time to do?)
>>> 3. customize Mahout so that it accepts UUIDs as user IDs
>>>
>>> Any feedback here? If I went with #3 (seems the safest) how would I do
>>> this and, what are the consequences?
>>>
>>> The user count is in the millions.
>>>
>>> Thanks!
>>>

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B
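
P.S. To make the hashing idea from this thread concrete, here is a minimal standalone sketch in plain Java. It is not Mahout code: the class name UuidIdMapper and its methods are hypothetical, made up for illustration. It assumes the long ID is built from the first 8 bytes of the MD5 digest (the same 16 hex digits the SQL substring above selects) and keeps the reverse mapping in an in-memory map, roughly the role a MemoryIDMigrator plays.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of Mahout.
class UuidIdMapper {

    // Reverse mapping from the hashed long ID back to the original string ID.
    private final Map<Long, String> reverse = new HashMap<>();

    // Builds a long from the first 8 bytes of the MD5 digest of the UTF-8 string.
    static long hash(String value) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long h = 0L;
            for (int i = 0; i < 8; i++) {
                // Mask to avoid sign extension of negative bytes.
                h = (h << 8) | (md5[i] & 0xFFL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hashes the string ID and remembers it so it can be looked up later.
    long toLongId(String stringId) {
        long id = hash(stringId);
        reverse.put(id, stringId);
        return id;
    }

    // Returns the original string ID, or null if this long ID was never stored.
    String toStringId(long id) {
        return reverse.get(id);
    }

    public static void main(String[] args) {
        UuidIdMapper mapper = new UuidIdMapper();
        String uuid = UUID.randomUUID().toString();
        long id = mapper.toLongId(uuid);
        System.out.println(uuid + " -> " + id + " -> " + mapper.toStringId(id));
    }
}
```

With millions of users an in-memory map like this may be too large, in which case the reverse mapping would have to live in a database table instead. If two strings ever collide, the later one simply overwrites the earlier one in the map, which matches Sean's point that a rare collision is tolerable.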
