Take a look at Yahoo's Oozie; it's fairly trivial to build a workflow for a map
reduce job and submit it via the web service for processing. It's also a lot
easier than using ProcessBuilder.
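For reference, a minimal workflow.xml sketch of such a map-reduce action, following Oozie's documented workflow schema (the workflow name, the package1.MyMapper/MyReducer classes, and the ${jobTracker}/${nameNode} parameters are placeholders to be filled in for your cluster):

```xml
<workflow-app name="mr-from-webapp" xmlns="uri:oozie:workflow:0.1">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>package1.MyMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>package1.MyReducer</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>MR job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The web app then only needs to POST the job properties to the Oozie web service; no Hadoop classpath juggling in Tomcat itself.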

Jon.

On 24 Jun 2011, at 22:47, Andre Reiter <a.rei...@web.de> wrote:

> Hi Doug,
> 
> thanks a lot for your reply
> the point is clear how to create a job instance and to configure it using the 
> TableMapReduceUtil.initTableMapperJob
> actually our job is working just perfectly, even the third party libs are 
> simple to import using TableMapReduceUtil.addDependencyJars
> 
> the problem is about starting the MR job...
> 
> at the moment we do it this way:
> - set HADOOP_CLASSPATH with hbase, zookeeper, and all third party jars
> - execute "./bin/hadoop jar /tmp/map_reduce_v1.jar package1.MRDriver1"
> 
> that works like a charm, the question is now: how to start the job from our 
> web application running on tomcat?
> 
> one option may be to fork a new process, like this:
> ProcessBuilder pb = new ProcessBuilder("/opt/hadoop/bin/hadoop", "jar", 
> "/tmp/map_reduce_v1.jar", "package1.MRDriver1");
> ...
> // configure ProcessBuilder
> Process p = pb.start();
> 
> but this does not seem to be very elegant to us... does it?
> 
> so how do we start a job from a running app, in the same process, without forking?
> 
> andre
> 
> 
> 
> Doug Meil wrote:
>> 
>> Hi there-
>> 
>> Take a look at this for starters...
>> 
>> http://hbase.apache.org/book.html#mapreduce
>> 
>> 
>> if you do job.waitForCompletion(true); it will execute synchronously (the 
>> boolean only controls progress logging; waitForCompletion always blocks).  
>> For fire and forget, use job.submit() instead.  A simple pattern is to spin 
>> off a thread where it executes job.waitForCompletion(true) and then you can 
>> pick up the results.
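A self-contained sketch of that spin-off-a-thread pattern; the hypothetical runJob() below stands in for the blocking job.waitForCompletion(true) call of a real Hadoop driver:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncJobLauncher {

    // Stand-in for the blocking Hadoop call; in a real driver this would
    // configure a Job and return job.waitForCompletion(true).
    static boolean runJob() throws InterruptedException {
        Thread.sleep(100);   // pretend the MR job is running
        return true;         // pretend it succeeded
    }

    // The calling (e.g. servlet) thread returns immediately; a pool thread
    // blocks on the job and the Future carries the completion status.
    public static Future<Boolean> launch(ExecutorService pool) {
        return pool.submit(AsyncJobLauncher::runJob);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Boolean> result = launch(pool);
        System.out.println("job submitted, doing other work...");
        System.out.println("job succeeded: " + result.get()); // pick up the result
        pool.shutdown();
    }
}
```

In a web app the ExecutorService would typically be created once and shut down when the context is destroyed, rather than per request.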
>> 
>> 
>> -----Original Message-----
>> From: Andre Reiter [mailto:a.rei...@web.de]
>> Sent: Friday, June 24, 2011 12:41 AM
>> To: user@hbase.apache.org
>> Subject: Re: Running MapReduce from a web application
>> 
>> Hi everybody,
>> 
>> no suggestions about these questions?
>> how to submit a MR job from my application, instead of manually from a shell 
>> using ./bin/hadoop jar ... ?
>> 
>> best regards
>> andre
>> 
>> 
>> 
>> Andre Reiter wrote:
>>> now i would like to start MR jobs from my web application running on 
>>> tomcat, is there an elegant way to do it?
>>> 
>>> the second question: at the moment i use TextOutputFormat as the
>>> output format, which creates a file in the specified dfs directory:
>>> part-r-00000, so i can read it using ./bin/hadoop fs -cat
>>> /tmp/requests/part-r-00000 on the shell
>>> 
>>> how can i get the path to this output file after my job is finished, to 
>>> process it further... is there another way to collect the results of a MR 
>>> job? a text file is good for humans, but IMHO parsing a text file for 
>>> results is not the preferable way...
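For what it's worth, TextOutputFormat writes one "key<TAB>value" line per record, so the part-r-00000 file can at least be consumed programmatically rather than via the shell. A minimal sketch: the parser takes any Reader, and in a real app it would wrap a stream opened from the DFS output path (the sample input below is made up):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReducerOutputParser {

    // Parse "key<TAB>value" lines as produced by TextOutputFormat
    // (tab is its default key/value separator).
    public static Map<String, String> parse(Reader src) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        BufferedReader in = new BufferedReader(src);
        for (String line; (line = in.readLine()) != null; ) {
            int tab = line.indexOf('\t');
            if (tab >= 0) {
                records.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical contents of a part-r-00000 file
        String sample = "url1\t42\nurl2\t7\n";
        System.out.println(parse(new StringReader(sample)));
    }
}
```

If parsing text still feels too fragile, writing the reduce output back into an HBase table (or using a binary output format such as SequenceFileOutputFormat) avoids the text round-trip entirely.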
>>> 
>>> thanks in advance
>>> andre
>>> 
>>> PS:
>>> versions:
>>> - Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
>>> - hadoop-0.20.2-CDH3B4
>>> - hbase-0.90.1-CDH3B4
>>> - zookeeper-3.3.2-CDH3B4
>> 
>> 
> 
> 
