Hi,

This morning I noticed one more weird thing: when I run the MapReduce job using this utility, it does not show up in the JobTracker web UI. Does anyone have any idea why? Please help.

Thanks.
Regards,
Anand.C

From: Chandra Mohan, Ananda Vel Murugan [mailto:[email protected]]
Sent: Monday, November 04, 2013 7:32 PM
To: [email protected]
Subject: Running map reduce programmatically is unusually slow

Hi,

I have written a small utility to run a MapReduce job programmatically. My aim is to run the job without using the hadoop shell script, because I plan to call this utility from another application.

Following is the code that runs the MapReduce job. I have bundled this Java class into a jar (remotemr.jar). The actual MapReduce job is bundled in another jar (mapreduce.jar).

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class RemoteMapreduce {

        public static void main(String[] args)
                throws IOException, InterruptedException, ClassNotFoundException {
            String inputPath = args[0];
            String outputPath = args[1];
            String specFilePath = args[2];

            // Load the cluster configuration files explicitly.
            Configuration config = new Configuration();
            config.addResource(new Path("/opt/hadoop-1.0.2/bin/core-site.xml"));
            config.addResource(new Path("/opt/hadoop-1.0.2/bin/hdfs-site.xml"));

            JobConf jobConf = new JobConf(config);
            // Scratch directory for Hadoop's temporary files.
            jobConf.set("hadoop.tmp.dir", "/tmp/hadoop-ananda/");
            // Jar containing the actual job classes (mapper etc.).
            jobConf.setJar("/home/ananda/mapreduce.jar");
            jobConf.setMapperClass(Myjob.MapClass.class);

            // Input/output locations, key/value types and formats.
            SequenceFileInputFormat.setInputPaths(jobConf, new Path(inputPath));
            TextOutputFormat.setOutputPath(jobConf, new Path(outputPath));
            jobConf.setMapOutputKeyClass(Text.class);
            jobConf.setMapOutputValueClass(Text.class);
            jobConf.setInputFormat(SequenceFileInputFormat.class);
            jobConf.setOutputFormat(TextOutputFormat.class);
            jobConf.setOutputKeyClass(Text.class);
            jobConf.setOutputValueClass(Text.class);

            // Custom parameter read by the job.
            jobConf.set("specPath", specFilePath);
            jobConf.setUser("ananda");

            Job job1 = new Job(jobConf);

            // Submit the job without waiting for it to complete.
            JobClient jc = new JobClient(jobConf);
            jc.submitJob(jobConf);

            /* JobControl ctrl = new JobControl("dar");
            ctrl.addJob(job1);
            ctrl.run(); */

            System.out.println("Job launched!");
        }
    }

I run it as follows:

    java -cp <all hadoop jars needed for the job>:/home/ananda/mapreduce.jar:/home/Ananda/remotemr.jar RemoteMapreduce <inputpath> <outputpath> <specpath>

It runs without any errors, but it takes longer than when I run the same job using the hadoop shell script.

One more thing: all three paths must be fully qualified HDFS paths, i.e. hdfs://<hostname>:<port>/<path>. If I give partial paths, as I do with the hadoop shell script, I get "input path not found" errors.

Am I doing anything wrong? Please help.

Thanks

Regards,
Anand.C
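The symptoms (job missing from the JobTracker UI, slower runs, partial paths not found) are consistent with the client not picking up the cluster settings, though this is only one possible explanation to check. In Hadoop 1.x, if `mapred.job.tracker` resolves to its default `local`, the job runs in-process (LocalJobRunner) instead of on the JobTracker; and if `fs.default.name` resolves to its default `file:///`, a partial path is qualified against the local file system rather than HDFS. Note that the utility loads core-site.xml and hdfs-site.xml but not mapred-site.xml. Printing `jobConf.get("mapred.job.tracker")` and `jobConf.get("fs.default.name")` shows what the client actually sees. The sketch below is a simplified, pure-Java model of the path-qualification rule (it is NOT Hadoop's own code; `PathQualifier`, the `namenode:9000` address and the working directories are hypothetical):

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Simplified model (hypothetical, not Hadoop's implementation) of how a
 * Hadoop 1.x client qualifies a user-supplied path against fs.default.name.
 */
public class PathQualifier {

    // Hadoop 1.x default values when no site files override them.
    static final String DEFAULT_FS = "file:///";
    static final String DEFAULT_JOB_TRACKER = "local";

    /**
     * Qualifies {@code path} against {@code defaultFs}. Paths that already
     * carry a scheme are returned unchanged; relative paths are first
     * resolved against {@code workingDir}.
     */
    static String qualify(String defaultFs, String workingDir, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p.toString(); // fully qualified, e.g. hdfs://host:port/...
        }
        URI fs = URI.create(defaultFs);
        String absolute = p.getPath().startsWith("/")
                ? p.getPath()
                : workingDir + "/" + p.getPath();
        try {
            // Borrow scheme and authority (host:port) from the default FS.
            return new URI(fs.getScheme(), fs.getAuthority(), absolute, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** With mapred.job.tracker unset or "local", the job does not go to a JobTracker. */
    static boolean runsLocally(String jobTracker) {
        return jobTracker == null || "local".equals(jobTracker);
    }

    public static void main(String[] args) {
        // Cluster config loaded: relative path lands in the HDFS home dir.
        System.out.println(qualify("hdfs://namenode:9000", "/user/ananda", "input"));
        // prints hdfs://namenode:9000/user/ananda/input

        // Defaults only: the same partial path points at the local disk.
        System.out.println(qualify(DEFAULT_FS, "/home/ananda", "input"));
        // prints file:/home/ananda/input

        System.out.println(runsLocally(DEFAULT_JOB_TRACKER));
        // prints true
    }
}
```

If the second case matches what the utility does, the "input path not found" errors for partial paths would follow, since the input would be looked up on the local disk rather than in HDFS.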
