You can chain MR jobs with Oozie, but I would suggest using Cascading, Pig or Hive. You can do this in a couple lines of code, I suspect. Two MapReduce jobs should not pose any kind of challenge with the right tools.
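As a rough illustration of the chained data flow described in the quoted thread (Mapper1 emits <MyObjectId, MyObject>, Mapper2 consumes those pairs directly), here is a minimal plain-Java sketch. It only models the data flow, not a real job: in actual Hadoop code you would register each stage with ChainMapper (covered in the Yahoo tutorial Justin linked) so the framework runs them inside one map task. All class, field, and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChainSketch {
    // Hypothetical custom object passed between mappers.
    static class MyObject {
        final String id;
        String payload;
        MyObject(String id, String payload) { this.id = id; this.payload = payload; }
    }

    // Mapper1: process each MyObject, emit <MyObjectId, MyObject>.
    static Map<String, MyObject> mapper1(List<MyObject> input) {
        Map<String, MyObject> out = new LinkedHashMap<>();
        for (MyObject o : input) {
            o.payload = o.payload.trim();   // stand-in for real processing
            out.put(o.id, o);
        }
        return out;
    }

    // Mapper2: consume Mapper1's <id, object> pairs directly,
    // with no intermediate Accumulo table in between.
    static Map<String, MyObject> mapper2(Map<String, MyObject> input) {
        Map<String, MyObject> out = new LinkedHashMap<>();
        for (Map.Entry<String, MyObject> e : input.entrySet()) {
            e.getValue().payload = e.getValue().payload.toUpperCase();
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<MyObject> input = new ArrayList<>();
        input.add(new MyObject("a1", "  foo "));
        input.add(new MyObject("a2", " bar  "));
        // Chain the stages: List<MyObject> -> Mapper1 -> Mapper2.
        Map<String, MyObject> result = mapper2(mapper1(input));
        for (Map.Entry<String, MyObject> e : result.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().payload);
        }
    }
}
```

The point of the sketch is that nothing forces an Accumulo write between mapper stages; the pairs flow straight from one map function to the next, which is exactly what ChainMapper gives you inside a single task.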
On Monday, March 4, 2013, Sandy Ryza wrote:

> Hi Aji,
>
> Oozie is a mature project for managing MapReduce workflows.
> http://oozie.apache.org/
>
> -Sandy
>
> On Mon, Mar 4, 2013 at 8:17 AM, Justin Woody <[email protected]> wrote:
>
>> Aji,
>>
>> Why don't you just chain the jobs together?
>> http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
>>
>> Justin
>>
>> On Mon, Mar 4, 2013 at 11:11 AM, Aji Janis <[email protected]> wrote:
>> > Russell, thanks for the link.
>> >
>> > I am interested in finding a solution (if one is out there) where Mapper1
>> > outputs a custom object and Mapper2 can use that as input. One way to do
>> > this, obviously, is by writing to Accumulo, in my case. But is there
>> > another solution for this:
>> >
>> > List<MyObject> ----> Input to Job
>> >
>> > MyObject ---> Input to Mapper1 (process MyObject) ----> Output <MyObjectId, MyObject>
>> >
>> > <MyObjectId, MyObject> are Input to Mapper2 ... and so on
>> >
>> > Ideas?
>> >
>> > On Mon, Mar 4, 2013 at 10:00 AM, Russell Jurney <[email protected]> wrote:
>> >>
>> >> http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
>> >>
>> >> AccumuloStorage for Pig comes with Accumulo. The easiest way would be to try it.
>> >>
>> >> Russell Jurney http://datasyndrome.com
>> >>
>> >> On Mar 4, 2013, at 5:30 AM, Aji Janis <[email protected]> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I have a MR job design with a flow like this: Mapper1 -> Mapper2 ->
>> >> Mapper3 -> Reducer1. Mapper1's input is an Accumulo table. M1's output
>> >> goes to M2, and so on. Finally, the Reducer writes output to Accumulo.
>> >>
>> >> Questions:
>> >>
>> >> 1) Has anyone tried something like this before? Are there any workflow
>> >> control APIs (in or outside of Hadoop) that can help me set up the job
>> >> like this, or am I limited to using Quartz for this?
>> >> 2) If both M2 and M3 needed to write some data to the same two tables
>> >> in Accumulo, is it possible to do so? Are there any good Accumulo
>> >> MapReduce jobs you can point me to? Blogs/pages that I can use for
>> >> reference (starting point/best practices)?
>> >>
>> >> Thank you in advance for any suggestions!
>> >>
>> >> Aji

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
