The issue is that user and item IDs may be longs, but they are used as indexes into a vector, and those indexes are ints. This job hashes each long ID down to an int and stores the mapping, so it can be reversed at the end. For the reverse mapping, in the case of a collision, the lowest long key wins. The hash also has the nice property of being the identity mapping for values <= Integer.MAX_VALUE.
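For illustration, here is a minimal sketch of that idea — the class and method names (IdIndexSketch, idToIndex, resolveCollision) are hypothetical stand-ins for this explanation, not necessarily the exact Mahout implementation:

```java
// Illustrative sketch only -- names here are hypothetical, not Mahout's actual code.
public class IdIndexSketch {

    // Fold a long ID into a non-negative int index. For 0 <= id <= Integer.MAX_VALUE
    // the upper 32 bits are zero, so the result is the id itself (identity mapping).
    static int idToIndex(long id) {
        return 0x7FFFFFFF & (int) (id ^ (id >>> 32));
    }

    // Reverse-mapping collision rule: if several long IDs hash to the same int
    // index, the lowest long key wins, so the reverse map is deterministic.
    static long resolveCollision(long[] idsWithSameIndex) {
        long min = Long.MAX_VALUE;
        for (long id : idsWithSameIndex) {
            min = Math.min(min, id);
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(idToIndex(42L));                           // identity: 42
        System.out.println(resolveCollision(new long[]{9L, 3L, 7L})); // lowest wins: 3
    }
}
```

The reducer (and combiner) in the job essentially plays the role of resolveCollision: for each int index key it keeps only the minimum long ID seen.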
Sean

On Wed, Aug 25, 2010 at 11:01 PM, Stanley Ipkiss <[email protected]> wrote:
>
> In RecommenderJob.java (org.apache.mahout.cf.taste.hadoop.item), what is the
> primary purpose of the first map reduce job? This is the one that I am
> talking about -
>
> if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>   Job itemIDIndex = prepareJob(
>       inputPath, itemIDIndexPath, TextInputFormat.class,
>       ItemIDIndexMapper.class, VarIntWritable.class, VarLongWritable.class,
>       ItemIDIndexReducer.class, VarIntWritable.class, VarLongWritable.class,
>       SequenceFileOutputFormat.class);
>   itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
>   itemIDIndex.waitForCompletion(true);
> }
>
> It seems to me that the mapper just outputs int based keys for the item/user
> long ids, and the reducer just finds the least user/item id within each
> index. Do we just want to find the lowest id in our complete dataset, for
> which we end up spinning a complete map reduce job?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/1st-MapReduce-job-in-RecommenderJob-java-org-apache-mahout-cf-taste-hadoop-item-tp1342081p1342081.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
