The issue is that user and item IDs may be longs, but they are used as indexes into a vector, and those indexes are ints. This job hashes each long ID down to an int and stores the mapping, so it can be reversed at the end. For the reverse mapping, in the case of a collision, the lowest long key wins. The hash also has the nice property of being the identity mapping for values <= Integer.MAX_VALUE.
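For illustration, here is a minimal sketch of that idea — the class and method names (IdIndexSketch, idToIndex, resolveCollision) are hypothetical stand-ins for this explanation, not necessarily the exact Mahout implementation:

```java
// Illustrative sketch only -- names here are hypothetical, not Mahout's actual code.
public class IdIndexSketch {

    // Fold a long ID into a non-negative int index. For 0 <= id <= Integer.MAX_VALUE
    // the upper 32 bits are zero, so the result is the id itself (identity mapping).
    static int idToIndex(long id) {
        return 0x7FFFFFFF & (int) (id ^ (id >>> 32));
    }

    // Reverse-mapping collision rule: if several long IDs hash to the same int
    // index, the lowest long key wins, so the reverse map is deterministic.
    static long resolveCollision(long[] idsWithSameIndex) {
        long min = Long.MAX_VALUE;
        for (long id : idsWithSameIndex) {
            min = Math.min(min, id);
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(idToIndex(42L));                           // identity: 42
        System.out.println(resolveCollision(new long[]{9L, 3L, 7L})); // lowest wins: 3
    }
}
```

The reducer (and combiner) in the job essentially plays the role of resolveCollision: for each int index key it keeps only the minimum long ID seen.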
Sean

On Wed, Aug 25, 2010 at 11:01 PM, Stanley Ipkiss <[email protected]> wrote:
>
> In RecommenderJob.java (org.apache.mahout.cf.taste.hadoop.item), what is the
> primary purpose of the first map reduce job? This is the one that I am
> talking about -
>
> if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>   Job itemIDIndex = prepareJob(
>       inputPath, itemIDIndexPath, TextInputFormat.class,
>       ItemIDIndexMapper.class, VarIntWritable.class, VarLongWritable.class,
>       ItemIDIndexReducer.class, VarIntWritable.class, VarLongWritable.class,
>       SequenceFileOutputFormat.class);
>   itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
>   itemIDIndex.waitForCompletion(true);
> }
>
> It seems to me that the mapper just outputs int based keys for the item/user
> long ids, and the reducer just finds the least user/item id within each
> index. Do we just want to find the lowest id in our complete dataset, for
> which we end up spinning a complete map reduce job?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/1st-MapReduce-job-in-RecommenderJob-java-org-apache-mahout-cf-taste-hadoop-item-tp1342081p1342081.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
