Hi everybody,

running a MapReduce job is not easy at all, especially if third-party jars 
are involved...
a good help is this article by Cloudera: 
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

i still cannot use the -libjars argument to run an MR job with third-party 
jars, as described in the first option.
for some reason it does not work for me... the tasks fail with a 
java.lang.ClassNotFoundException; the classes of the third-party lib are not found
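One common cause of exactly this symptom (not confirmed to be yours) is that -libjars is a generic option: it is only honored when the driver parses its arguments through ToolRunner/GenericOptionsParser. If main() just builds a Job directly, -libjars is silently ignored and the tasks fail with ClassNotFoundException. A minimal driver sketch, assuming a hypothetical class name and input/output arguments:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: -libjars is consumed by GenericOptionsParser,
// which ToolRunner invokes before run() sees the remaining args.
public class HBaseReader extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the Configuration that GenericOptionsParser
        // has already populated (including the jars from -libjars).
        Job job = new Job(getConf(), "hbase-reader");
        job.setJarByClass(HBaseReader.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips -libjars (and the other generic options)
        // from args before passing the rest to run().
        System.exit(ToolRunner.run(new Configuration(), new HBaseReader(), args));
    }
}
```

With a driver like this, the invocation would be e.g. ./bin/hadoop jar /tmp/my.jar package.HBaseReader -libjars /path/to/thirdparty.jar in out (the paths here are placeholders).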

the second option, including the referenced JARs in the lib subdirectory of the 
submittable JAR, actually works fine for me, starting the job from the shell like this: 
./bin/hadoop jar /tmp/my.jar package.HBaseReader
not the most elegant way, but it finally works

now i would like to start MR jobs from my web application running on Tomcat. 
is there an elegant way to do this with third-party jars?

the third option described in the article is to install the jars on every 
TaskTracker, which is IMHO not the best either, like the second...
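For submitting from inside Tomcat, one approach (a sketch under assumptions, not a definitive answer) is to do programmatically what -libjars does on the command line: stage the third-party jars on HDFS and add them to the task classpath via the DistributedCache, then submit the job without blocking the servlet thread. The class name, staging directory, and jar list below are all made up for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical helper for submitting a job from a webapp: copy the
// third-party jars to HDFS and put them on the task classpath, which
// is roughly what -libjars does internally.
public class JobSubmitter {

    public static Job submit(Configuration conf, String[] libJars)
            throws Exception {
        FileSystem fs = FileSystem.get(conf);
        for (String jar : libJars) {
            Path local = new Path(jar);
            // assumed staging directory on HDFS
            Path remote = new Path("/tmp/libs/" + local.getName());
            fs.copyFromLocalFile(false, true, local, remote);
            DistributedCache.addFileToClassPath(remote, conf);
        }
        Job job = new Job(conf, "from-tomcat");
        job.setJarByClass(JobSubmitter.class);
        // ... set mapper, reducer, input and output paths here ...
        job.submit(); // returns immediately, unlike waitForCompletion()
        return job;
    }
}
```

The returned Job can be polled with job.isComplete() / job.isSuccessful() from the web application instead of blocking on waitForCompletion().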


the second question: at the moment i use TextOutputFormat as the output 
format, which creates a file like part-r-00000 in the specified DFS directory,
so i can read it on the shell using ./bin/hadoop fs -cat /tmp/requests/part-r-00000

how can i get the path to this output file after my job has finished, so i can 
process it further? is there another way to collect the results of an MR job? a 
text file is good for humans, but IMHO parsing a text file for results is not 
the preferable way...
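One way to look at it: the output directory is simply whatever path was passed to FileOutputFormat.setOutputPath(), and each reducer writes one part-r-NNNNN file into it, so rather than guessing a file name you can list the directory through the FileSystem API after the job finishes. A sketch (class name and processing logic are placeholders):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical reader: lists the job's output directory on HDFS and
// streams every part-r-NNNNN file, skipping markers like _SUCCESS/_logs.
public class OutputReader {

    public static void readResults(Configuration conf, Path outputDir)
            throws Exception {
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(outputDir)) {
            if (status.isDir()
                    || !status.getPath().getName().startsWith("part-")) {
                continue; // not a reducer output file
            }
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(status.getPath())));
            String line;
            while ((line = reader.readLine()) != null) {
                // TextOutputFormat writes "key<TAB>value" per line
                System.out.println(line);
            }
            reader.close();
        }
    }
}
```

If parsing text is the real objection, SequenceFileOutputFormat is an alternative: it writes binary key/value pairs that can be read back with a SequenceFile.Reader, with no string parsing involved.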

thanks in advance
andre

PS:
versions:
 - Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
 - hadoop-0.20.2-CDH3B4
 - hbase-0.90.1-CDH3B4
 - zookeeper-3.3.2-CDH3B4
