Hello Matt,

On 01.08.2012, at 22:40, Matt Mitchell wrote:

> Thanks Sean! That all makes sense. Would you mind recommending a
> hashing function for this? Is there something in Mahout I could use?

The following class uses a String-to-long mapping based on Mahout's MemoryIDMigrator:

https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java

Internally, Mahout uses the first 64 bits of an MD5 hash. This can, for example,
be computed directly in SQL (MySQL syntax) by taking the first 16 hex digits of the digest:

cast(conv(substring(md5([column name]), 1, 16),16,10) as signed)
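A minimal Java sketch of the same idea, assuming the string IDs are encoded as UTF-8 (for plain ASCII IDs such as UUID strings this should give the same bytes the database hashes). It reads the first 8 bytes of the MD5 digest as a big-endian signed long, which is what the SQL expression above computes, and keeps a reverse map so the original IDs can be recovered. The class and method names (`Md5LongIds`, `toLongId`, `register`, `toStringId`) are hypothetical and not part of the Mahout API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not Mahout API: String ID -> long via an MD5 prefix,
// plus an in-memory reverse mapping (the hash itself is one-way).
class Md5LongIds {

    private final Map<Long, String> reverse = new HashMap<>();

    /** Returns the first 8 bytes of MD5(id), read big-endian, as a signed long. */
    static long toLongId(String id) {
        MessageDigest md5;
        try {
            // A fresh instance per call, because MessageDigest is not thread-safe.
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] digest = md5.digest(id.getBytes(StandardCharsets.UTF_8));
        long result = 0L;
        for (int i = 0; i < 8; i++) {
            // Shift in one byte at a time; & 0xFF avoids sign extension.
            result = (result << 8) | (digest[i] & 0xFF);
        }
        return result;
    }

    /** Hashes the ID and remembers the reverse mapping. */
    long register(String id) {
        long longId = toLongId(id);
        reverse.put(longId, id);
        return longId;
    }

    /** Looks up the original string ID, or null if it was never registered. */
    String toStringId(long longId) {
        return reverse.get(longId);
    }
}
```

Usage: call `register(uuid.toString())` for each user while loading the data, pass the returned long to Mahout, and map recommended IDs back with `toStringId`. With millions of users the reverse map lives in memory, so a persistent store may be preferable.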

Javadoc can be found here:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/model/IDMigrator.html

/Manuel

> 
> - Matt
> 
> On Wed, Aug 1, 2012 at 4:34 PM, Sean Owen <[email protected]> wrote:
>> Yep, just hash to a long, from UUID or String or whatever. The occasional
>> collision does not cause a real problem. If you mix the tastes of two users
>> or items once in a billion times, the overall results will hardly be
>> different.
>> 
>> You have to maintain the reverse mapping of course. Look at the IDMigrator
>> class for a little help there.
>> 
>> You can rewrite to use UUID or String, but believe me, it will be an
>> immense amount of change and make things much slower. It used to work this
>> way for recommenders in about 2006 and the Object overhead and GC pressure
>> was by far the bottleneck. That's why it's all long now.
>> 
>> On Wed, Aug 1, 2012 at 9:29 PM, Matt Mitchell <[email protected]> wrote:
>> 
>>> Question about dealing with UUIDs as Mahout user IDs. I'm considering
>>> ways to deal with these values:
>>> 
>>> 1. use getLeastSignificantBits
>>> 2. re-map to a database auto-increment number (this would take a very
>>> long time to do?)
>>> 3. customize mahout so that it accepts UUIDs as user IDs
>>> 
>>> Any feedback here? If I went with #3 (seems the safest), how would I do
>>> this, and what are the consequences?
>>> 
>>> The user count is in the millions.
>>> 
>>> Thanks!
>>> 

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B
