You can chain MR jobs with Oozie, but I would suggest using Cascading, Pig or Hive. You can do this in a couple lines of code, I suspect. Two MapReduce jobs should not pose any kind of challenge with the right tools.
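As a rough illustration of the chained data flow described in the quoted thread (Mapper1 emits <MyObjectId, MyObject>, Mapper2 consumes those pairs directly), here is a minimal plain-Java sketch. It only models the data flow, not a real job: in actual Hadoop code you would register each stage with ChainMapper (covered in the Yahoo tutorial Justin linked) so the framework runs them inside one map task. All class, field, and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChainSketch {
    // Hypothetical custom object passed between mappers.
    static class MyObject {
        final String id;
        String payload;
        MyObject(String id, String payload) { this.id = id; this.payload = payload; }
    }

    // Mapper1: process each MyObject, emit <MyObjectId, MyObject>.
    static Map<String, MyObject> mapper1(List<MyObject> input) {
        Map<String, MyObject> out = new LinkedHashMap<>();
        for (MyObject o : input) {
            o.payload = o.payload.trim();   // stand-in for real processing
            out.put(o.id, o);
        }
        return out;
    }

    // Mapper2: consume Mapper1's <id, object> pairs directly,
    // with no intermediate Accumulo table in between.
    static Map<String, MyObject> mapper2(Map<String, MyObject> input) {
        Map<String, MyObject> out = new LinkedHashMap<>();
        for (Map.Entry<String, MyObject> e : input.entrySet()) {
            e.getValue().payload = e.getValue().payload.toUpperCase();
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<MyObject> input = new ArrayList<>();
        input.add(new MyObject("a1", "  foo "));
        input.add(new MyObject("a2", " bar  "));
        // Chain the stages: List<MyObject> -> Mapper1 -> Mapper2.
        Map<String, MyObject> result = mapper2(mapper1(input));
        for (Map.Entry<String, MyObject> e : result.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().payload);
        }
    }
}
```

The point of the sketch is that nothing forces an Accumulo write between mapper stages; the pairs flow straight from one map function to the next, which is exactly what ChainMapper gives you inside a single task.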
On Monday, March 4, 2013, Sandy Ryza wrote:

> Hi Aji,
>
> Oozie is a mature project for managing MapReduce workflows.
> http://oozie.apache.org/
>
> -Sandy
>
> On Mon, Mar 4, 2013 at 8:17 AM, Justin Woody <[email protected]> wrote:
>
>> Aji,
>>
>> Why don't you just chain the jobs together?
>> http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
>>
>> Justin
>>
>> On Mon, Mar 4, 2013 at 11:11 AM, Aji Janis <[email protected]> wrote:
>> > Russell, thanks for the link.
>> >
>> > I am interested in finding a solution (if one is out there) where Mapper1
>> > outputs a custom object and Mapper2 can use that as input. One way to do
>> > this, obviously, is by writing to Accumulo, in my case. But is there
>> > another solution for this:
>> >
>> > List<MyObject> ----> Input to Job
>> >
>> > MyObject ---> Input to Mapper1 (process MyObject) ----> Output <MyObjectId, MyObject>
>> >
>> > <MyObjectId, MyObject> are Input to Mapper2 ... and so on
>> >
>> > Ideas?
>> >
>> > On Mon, Mar 4, 2013 at 10:00 AM, Russell Jurney <[email protected]> wrote:
>> >>
>> >> http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
>> >>
>> >> AccumuloStorage for Pig comes with Accumulo. The easiest way would be to try it.
>> >>
>> >> Russell Jurney http://datasyndrome.com
>> >>
>> >> On Mar 4, 2013, at 5:30 AM, Aji Janis <[email protected]> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I have a MR job design with a flow like this: Mapper1 -> Mapper2 ->
>> >> Mapper3 -> Reducer1. Mapper1's input is an Accumulo table. M1's output
>> >> goes to M2, and so on. Finally, the Reducer writes output to Accumulo.
>> >>
>> >> Questions:
>> >>
>> >> 1) Has anyone tried something like this before? Are there any workflow
>> >> control APIs (in or outside of Hadoop) that can help me set up the job
>> >> like this, or am I limited to using Quartz for this?
>> >> 2) If both M2 and M3 needed to write some data to the same two tables
>> >> in Accumulo, is it possible to do so? Are there any good Accumulo
>> >> MapReduce jobs you can point me to? Blogs/pages that I can use for
>> >> reference (starting point/best practices)?
>> >>
>> >> Thank you in advance for any suggestions!
>> >>
>> >> Aji

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
