Hi,

I wrote a program which creates Map-Reduce jobs in an iterative fashion as
follows:


while (true) {
    JobConf conf2 = new JobConf(getConf(), graphMining.class);
    conf2.setJobName("sid");
    conf2.setMapperClass(mapperMiner.class);
    conf2.setReducerClass(reducerMiner.class);
    conf2.setInputFormat(SequenceFileInputFormat.class);
    conf2.setOutputFormat(SequenceFileOutputFormat.class);
    conf2.setMapOutputKeyClass(Text.class);
    conf2.setMapOutputValueClass(MapWritable.class);
    conf2.setOutputKeyClass(Text.class);
    conf2.setOutputValueClass(BytesWritable.class);
    conf2.setNumMapTasks(Integer.parseInt(args[3]));
    conf2.setNumReduceTasks(Integer.parseInt(args[4]));
    FileInputFormat.addInputPath(conf2, new Path(input));
    FileOutputFormat.setOutputPath(conf2, new Path(output));

    RunningJob job = JobClient.runJob(conf2);
}


Now, I want the first job that gets created to write something to the
distributed cache, and the jobs created in subsequent iterations to read
from the distributed cache.

I came to know that the DistributedCache.addCacheFile() method is
deprecated, and the documentation suggests using the Job.addCacheFile()
method, which is specific to each job.
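For reference, here is a minimal sketch of the new-API approach the documentation points to: with org.apache.hadoop.mapreduce.Job you do get a handle on the job before submission, so you can attach cache files directly. The cache path and class names are illustrative assumptions, not code from my program.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheSketch {
    public static void run(Configuration conf) throws Exception {
        // New (mapreduce) API: the Job object is created before submission,
        // so cache files can be added to it directly.
        Job job = Job.getInstance(conf, "sid");
        job.setJarByClass(CacheSketch.class);

        // Hypothetical HDFS path written by an earlier job; the "#cache"
        // fragment gives the file a local symlink name in each task.
        job.addCacheFile(new URI("/user/sid/firstJobOutput/part-r-00000#cache"));

        // waitForCompletion() submits the job and blocks, replacing
        // JobClient.runJob(conf2) from the old API.
        job.waitForCompletion(true);
    }
}
```

Tasks can then list the cached files via context.getCacheFiles() in the mapper's setup() method.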

But I am unable to get a handle on the currently running job, since
JobClient.runJob(conf2) submits the job internally.


How can I make the content written by the first job in this while loop
available, via the distributed cache, to the jobs created in later
iterations of the loop?
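In case it helps frame the question: with the old (mapred) API I am using, the static DistributedCache.addCacheFile(URI, Configuration) takes the JobConf, so in principle it can be called per-iteration on conf2 before JobClient.runJob(). A sketch of what I am trying to achieve, with the shared path and termination condition as placeholder assumptions:

```java
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class IterativeSketch {
    public void drive(String input, String output) throws Exception {
        // Hypothetical file that the first iteration's job writes to HDFS.
        URI shared = new URI(output + "/iter0/part-00000");

        for (int i = 0; ; i++) {
            JobConf conf2 = new JobConf(getConf(), graphMining.class);
            // ... mapper/reducer/format settings as in the loop above ...

            if (i > 0) {
                // Later iterations pull the first job's output through the
                // distributed cache; this must be set before submission.
                DistributedCache.addCacheFile(shared, conf2);
            }

            RunningJob job = JobClient.runJob(conf2);

            if (converged(job)) break; // hypothetical termination check
        }
    }
}
```

Inside the mapper, the cached files would then be read via DistributedCache.getLocalCacheFiles(conf) in configure(JobConf). What I am unsure about is whether this per-JobConf use of the deprecated class is the recommended pattern, or whether there is a cleaner way.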
