Hi Stanley,

The IDs for items are expected to be longs in the input data. When we
convert the input to vectors, these longs have to be mapped to ints.
The first MapReduce job you're asking about stores that mapping so that
the ints can be translated back to the original long IDs later in the
final recommendation step.
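
A minimal standalone sketch of the idea (assumption: this mirrors what
Mahout's TasteHadoopUtils.idToIndex does conceptually, not necessarily its
exact implementation):

```java
// Sketch: folding a long item ID into a non-negative int index.
// Assumption: approximates the idea behind TasteHadoopUtils.idToIndex,
// not necessarily the exact Mahout implementation.
public class IdIndexSketch {

    // Fold the long into an int (like Long.hashCode) and clear the sign bit
    // so the index is always non-negative.
    static int idToIndex(long itemID) {
        return 0x7FFFFFFF & (int) (itemID ^ (itemID >>> 32));
    }

    public static void main(String[] args) {
        long itemID = 9876543210L;  // a long ID as it appears in the input
        int index = idToIndex(itemID);
        // The first MR job persists (index -> itemID) pairs so the final
        // recommendation step can translate indices back to original IDs.
        System.out.println(index + " -> " + itemID);
    }
}
```

Since different longs can collide on the same int index, the reducer has
to pick one representative per index; keeping the smallest long is
presumably just a deterministic way to do that.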

--sebastian


2010/8/26 Stanley Ipkiss <[email protected]>

>
> In RecommenderJob.java (org.apache.mahout.cf.taste.hadoop.item), what is
> the
> primary purpose of the first map reduce job? This is the one that I am
> talking about -
>
>    if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>      Job itemIDIndex = prepareJob(
>        inputPath, itemIDIndexPath, TextInputFormat.class,
>        ItemIDIndexMapper.class, VarIntWritable.class, VarLongWritable.class,
>        ItemIDIndexReducer.class, VarIntWritable.class, VarLongWritable.class,
>        SequenceFileOutputFormat.class);
>      itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
>      itemIDIndex.waitForCompletion(true);
>    }
>
> It seems to me that the mapper just outputs int-based keys for the
> item/user long IDs, and the reducer just finds the least user/item ID
> within each index. Do we just want to find the lowest ID in our complete
> dataset, for which we end up spinning up a complete MapReduce job?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/1st-MapReduce-job-in-RecommenderJob-java-org-apache-mahout-cf-taste-hadoop-item-tp1342081p1342081.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>