Hi Alan,

AFAIK there's no elegant solution; you have to create a mapper that can somehow differentiate the inputs (you may need to add some kind of identifier to your data) and apply different mapping logic accordingly.
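One way to tell the inputs apart (just a sketch, not the only option) is to check the path of the current input split in setup(). This uses the new org.apache.hadoop.mapreduce API; the key/value types and the "sumSUQStar" path fragment below are placeholders you'd replace with your own:

      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileSplit;

      public class DispatchingMapper extends Mapper<LongWritable,Text,Text,Text> {

        private boolean firstInput;

        @Override
        protected void setup(Context ctx) throws IOException, InterruptedException {
          /* decide once per task which of the two inputs this split belongs to */
          String path = ((FileSplit) ctx.getInputSplit()).getPath().toString();
          firstInput = path.contains("sumSUQStar"); /* hypothetical path fragment */
        }

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          if (firstInput) {
            /* mapping logic for the first input */
          } else {
            /* mapping logic for the second input */
          }
        }
      }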

You can have a look at RecommenderJob to see how to safely build the combined input paths:

      /* necessary to make this job (having a combined input path) work on Amazon S3 */
      Configuration partialMultiplyConf = partialMultiply.getConfiguration();
      FileSystem fs = FileSystem.get(tempDirPath.toUri(), partialMultiplyConf);
      prePartialMultiplyPath1 = prePartialMultiplyPath1.makeQualified(fs);
      prePartialMultiplyPath2 = prePartialMultiplyPath2.makeQualified(fs);
      FileInputFormat.setInputPaths(partialMultiply, prePartialMultiplyPath1, prePartialMultiplyPath2);
      partialMultiply.waitForCompletion(true);
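
Adapted to your job, the same pattern would look roughly like this. This is only a sketch: it assumes psz is a new-API org.apache.hadoop.mapreduce.Job (your snippet uses the old JobConf/JobClient API) and that tempDirPath stands in for your temp directory:

      Configuration pszConf = psz.getConfiguration();
      FileSystem fs = FileSystem.get(tempDirPath.toUri(), pszConf);
      Path qualifiedSumSUQStarPath = new Path(sumSUQStarPath).makeQualified(fs);
      Path qualifiedSumUQStarPath = new Path(sumUQStarPath).makeQualified(fs);
      /* one job, both qualified paths, a single mapper that tells the inputs apart */
      FileInputFormat.setInputPaths(psz, qualifiedSumSUQStarPath, qualifiedSumUQStarPath);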

--sebastian


On 29.11.2010 18:54, Alan Said wrote:
Hi all,
I'm trying to get https://issues.apache.org/jira/browse/MAHOUT-106 running in 
Mahout 0.4 and Hadoop 0.20.2. However, I'm stuck at a point where a job with 
multiple input paths and mappers is created, as shown in the code below.

     MultipleInputs.addInputPath(psz, new Path(sumSUQStarPath).makeQualified(fsPsz),
         SequenceFileInputFormat.class, Psz.PszSumSUQStarMapper.class);
     MultipleInputs.addInputPath(psz, new Path(sumUQStarPath).makeQualified(fsPsz),
         SequenceFileInputFormat.class, Psz.PszSumUQStarMapper.class);

     prepareJobConfWithMultipleInputs(psz,
                                          pszNextPath,
                                          VarIntWritable.class,
                                          LongFloatWritable.class,
                                          Psz.PszReducer.class,
                                          VarLongWritable.class,
                                          IntFloatWritable.class,
                                          SequenceFileOutputFormat.class);
     JobClient.runJob(psz);

I'm not quite sure how this should be written for the current APIs.
AbstractJob's current prepareJob method can handle multiple input paths via 
org.apache.hadoop.fs.Path, but I'm not sure how to deal with the extra mapper.

Any help would be appreciated.

Thanks,
Alan


