Aji,

Why don't you just chain the jobs together?
http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
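To make the idea concrete, here is a minimal, Hadoop-free sketch of what chaining the mappers amounts to: Mapper1 emits <MyObjectId, MyObject> pairs and Mapper2 consumes exactly those pairs as its input. The names `MyObject`/`mapper1`/`mapper2` follow the thread, but the in-memory maps and the trim/length "processing" are invented stand-ins for illustration only; in a real job you would wire this up with ChainMapper or by running sequential jobs where job 1's output path is job 2's input path.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChainSketch {

    // Stand-in for the custom object discussed in the thread.
    static class MyObject {
        final String id;
        String payload;
        MyObject(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Mapper1: process each MyObject, emit <MyObjectId, MyObject>.
    static Map<String, MyObject> mapper1(List<MyObject> input) {
        Map<String, MyObject> out = new LinkedHashMap<>();
        for (MyObject o : input) {
            o.payload = o.payload.trim(); // placeholder "processing" step
            out.put(o.id, o);
        }
        return out;
    }

    // Mapper2: takes Mapper1's <MyObjectId, MyObject> output as its input.
    static Map<String, Integer> mapper2(Map<String, MyObject> input) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, MyObject> e : input.entrySet()) {
            out.put(e.getKey(), e.getValue().payload.length());
        }
        return out;
    }

    public static void main(String[] args) {
        List<MyObject> job = Arrays.asList(
                new MyObject("a", "  hello "),
                new MyObject("b", "world"));
        // Chain: Mapper1's output is Mapper2's input.
        Map<String, Integer> result = mapper2(mapper1(job));
        System.out.println(result); // {a=5, b=5}
    }
}
```

In Hadoop itself the same wiring is what `ChainMapper.addMapper(...)` does within a single task, with each mapper's output key/value classes matching the next mapper's input classes.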
Justin

On Mon, Mar 4, 2013 at 11:11 AM, Aji Janis <[email protected]> wrote:
> Russell, thanks for the link.
>
> I am interested in finding a solution (if one is out there) where Mapper1
> outputs a custom object and Mapper2 can use that as input. One way to do
> this is obviously by writing to Accumulo, in my case. But is there another
> solution for this:
>
> List<MyObject> ----> Input to Job
>
> MyObject ---> Input to Mapper1 (process MyObject) ----> Output <MyObjectId,
> MyObject>
>
> <MyObjectId, MyObject> are Input to Mapper2 ... and so on
>
> Ideas?
>
> On Mon, Mar 4, 2013 at 10:00 AM, Russell Jurney <[email protected]>
> wrote:
>>
>> http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
>>
>> AccumuloStorage for Pig comes with Accumulo. The easiest way would be to
>> try it.
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Mar 4, 2013, at 5:30 AM, Aji Janis <[email protected]> wrote:
>>
>> Hello,
>>
>> I have an MR job design with a flow like this: Mapper1 -> Mapper2 ->
>> Mapper3 -> Reducer1. Mapper1's input is an Accumulo table. M1's output
>> goes to M2, and so on. Finally, the Reducer writes its output to Accumulo.
>>
>> Questions:
>>
>> 1) Has anyone tried something like this before? Are there any workflow
>> control APIs (in or outside of Hadoop) that can help me set up a job like
>> this, or am I limited to using Quartz for this?
>>
>> 2) If both M2 and M3 needed to write some data to the same two tables in
>> Accumulo, is it possible to do so? Are there any good Accumulo MapReduce
>> jobs you can point me to? Blogs/pages that I can use for reference
>> (starting point/best practices)?
>>
>> Thank you in advance for any suggestions!
>>
>> Aji
