Aji,

Why don't you just chain the jobs together?
http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
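To make the idea concrete, here is a minimal, Hadoop-free sketch of what chaining the mappers amounts to: Mapper1 emits <MyObjectId, MyObject> pairs and Mapper2 consumes exactly those pairs as its input. The names `MyObject`/`mapper1`/`mapper2` follow the thread, but the in-memory maps and the trim/length "processing" are invented stand-ins for illustration only; in a real job you would wire this up with ChainMapper or by running sequential jobs where job 1's output path is job 2's input path.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChainSketch {

    // Stand-in for the custom object discussed in the thread.
    static class MyObject {
        final String id;
        String payload;
        MyObject(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Mapper1: process each MyObject, emit <MyObjectId, MyObject>.
    static Map<String, MyObject> mapper1(List<MyObject> input) {
        Map<String, MyObject> out = new LinkedHashMap<>();
        for (MyObject o : input) {
            o.payload = o.payload.trim(); // placeholder "processing" step
            out.put(o.id, o);
        }
        return out;
    }

    // Mapper2: takes Mapper1's <MyObjectId, MyObject> output as its input.
    static Map<String, Integer> mapper2(Map<String, MyObject> input) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, MyObject> e : input.entrySet()) {
            out.put(e.getKey(), e.getValue().payload.length());
        }
        return out;
    }

    public static void main(String[] args) {
        List<MyObject> job = Arrays.asList(
                new MyObject("a", "  hello "),
                new MyObject("b", "world"));
        // Chain: Mapper1's output is Mapper2's input.
        Map<String, Integer> result = mapper2(mapper1(job));
        System.out.println(result); // {a=5, b=5}
    }
}
```

In Hadoop itself the same wiring is what `ChainMapper.addMapper(...)` does within a single task, with each mapper's output key/value classes matching the next mapper's input classes.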
Justin

On Mon, Mar 4, 2013 at 11:11 AM, Aji Janis <[email protected]> wrote:
> Russell, thanks for the link.
>
> I am interested in finding a solution (if one is out there) where Mapper1
> outputs a custom object and Mapper2 can use that as input. One way to do
> this is obviously by writing to Accumulo, in my case. But is there another
> solution for this:
>
> List<MyObject> ----> Input to Job
>
> MyObject ---> Input to Mapper1 (process MyObject) ----> Output <MyObjectId,
> MyObject>
>
> <MyObjectId, MyObject> are Input to Mapper2 ... and so on
>
> Ideas?
>
> On Mon, Mar 4, 2013 at 10:00 AM, Russell Jurney <[email protected]>
> wrote:
>>
>> http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
>>
>> AccumuloStorage for Pig comes with Accumulo. The easiest way would be to
>> try it.
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Mar 4, 2013, at 5:30 AM, Aji Janis <[email protected]> wrote:
>>
>> Hello,
>>
>> I have an MR job design with a flow like this: Mapper1 -> Mapper2 ->
>> Mapper3 -> Reducer1. Mapper1's input is an Accumulo table. M1's output
>> goes to M2, and so on. Finally, the Reducer writes its output to Accumulo.
>>
>> Questions:
>>
>> 1) Has anyone tried something like this before? Are there any workflow
>> control APIs (in or outside of Hadoop) that can help me set up a job like
>> this, or am I limited to using Quartz for this?
>>
>> 2) If both M2 and M3 needed to write some data to the same two tables in
>> Accumulo, is it possible to do so? Are there any good Accumulo MapReduce
>> jobs you can point me to? Blogs/pages that I can use for reference
>> (starting point/best practices)?
>>
>> Thank you in advance for any suggestions!
>>
>> Aji
