Hi all,
I'm trying to get https://issues.apache.org/jira/browse/MAHOUT-106 running in
Mahout 0.4 and Hadoop 0.20.2. However, I'm stuck at the point where a job with
multiple input paths and mappers is created, as shown in the code below.

    MultipleInputs.addInputPath(psz,
        new Path(sumSUQStarPath).makeQualified(fsPsz),
        SequenceFileInputFormat.class,
        Psz.PszSumSUQStarMapper.class);
    MultipleInputs.addInputPath(psz,
        new Path(sumUQStarPath).makeQualified(fsPsz),
        SequenceFileInputFormat.class,
        Psz.PszSumUQStarMapper.class);
    prepareJobConfWithMultipleInputs(psz,
        pszNextPath,
        VarIntWritable.class,
        LongFloatWritable.class,
        Psz.PszReducer.class,
        VarLongWritable.class,
        IntFloatWritable.class,
        SequenceFileOutputFormat.class);
    JobClient.runJob(psz);

I'm not quite sure how this should be written for the current APIs.
AbstractJob's current prepareJob method can handle multiple input paths via
org.apache.hadoop.fs.Path, but I'm not sure how to handle the extra mapper.
Any help would be appreciated.
Thanks,
Alan