Hi, I wrote a program which creates Map-Reduce jobs in an iterative fashion as follows:
while (true) {
    JobConf conf2 = new JobConf(getConf(), graphMining.class);
    conf2.setJobName("sid");
    conf2.setMapperClass(mapperMiner.class);
    conf2.setReducerClass(reducerMiner.class);
    conf2.setInputFormat(SequenceFileInputFormat.class);
    conf2.setOutputFormat(SequenceFileOutputFormat.class);
    conf2.setOutputValueClass(BytesWritable.class);
    conf2.setMapOutputKeyClass(Text.class);
    conf2.setMapOutputValueClass(MapWritable.class);
    conf2.setOutputKeyClass(Text.class);
    conf2.setNumMapTasks(Integer.parseInt(args[3]));
    conf2.setNumReduceTasks(Integer.parseInt(args[4]));
    FileInputFormat.addInputPath(conf2, new Path(input));
    FileOutputFormat.setOutputPath(conf2, new Path(output));

    // submit the job for this iteration and block until it completes
    RunningJob job = JobClient.runJob(conf2);
}

Now, I want the first job that gets created to write something to the distributed cache, and the jobs created after the first one to read it from the distributed cache. I came to know that the DistributedCache.addCacheFile() method is deprecated, and the documentation suggests using the per-job Job.addCacheFile() method instead. However, I am unable to get a handle on the currently running job, because JobClient.runJob(conf2) submits the job internally. How can I make the content written by the first job in this while loop available, via the distributed cache, to the jobs created in later iterations of the loop?
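To make the intent concrete, here is a rough sketch of the flow I have in mind, still using the old JobConf/JobClient API from my code above. The HDFS path /user/sid/shared/part-00000 and the firstIteration flag are just placeholders, not my actual code; I am assuming the first job writes its shared data to that path before later iterations try to attach it:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

boolean firstIteration = true;                                   // placeholder flag for the first pass
Path sharedFile = new Path("/user/sid/shared/part-00000");       // hypothetical HDFS path written by the first job

while (true) {
    JobConf conf2 = new JobConf(getConf(), graphMining.class);
    // ...same mapper/reducer/format/path configuration as above...

    if (!firstIteration) {
        // attach the file produced by the first job to this job's distributed cache
        // (old, deprecated API, but it works directly on the JobConf)
        DistributedCache.addCacheFile(sharedFile.toUri(), conf2);
    }

    JobClient.runJob(conf2);   // blocks until this iteration's job finishes

    firstIteration = false;
}

If I switched to the new org.apache.hadoop.mapreduce API, I believe the equivalent would be to build a Job from the configuration, call job.addCacheFile(uri) on it, and then submit it with job.waitForCompletion(true) instead of JobClient.runJob().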