Re: Question about MapReduce

Jean-Marc Spaggiari Fri, 02 Nov 2012 11:57:26 -0700

Hi Shrijeet,

Helped a lot! Thanks!


Now, the only thing I need to know is the best place to put my
JAR on the server. Should I put it in the Hadoop lib directory? Or
somewhere in the HBase directory structure?

Thanks,

JM

2012/10/29, Shrijeet Paliwal <[email protected]>:
> In line.
>
> On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
>> I'm replying to myself ;)
>>
>> I found the "cleanup" and "setup" methods in the TableMapper class. So I
>> think those are the methods I was looking for. I will init the
>> HTablePool there. Please let me know if I'm wrong.
>>
>> Now, I still have a few other questions.
>>
>> 1) context.getCurrentValue() can throw an InterruptedException, but
>> when can this occur? Is there a timeout on the Mapper side? Or is it if
>> the region goes down while the job is running?
>>
>
> You do not need to call context.getCurrentValue(). The 'value' argument to
> the map method[1] has the information you are looking for.
>
>
>> 2) How can I pass parameters to the map method? Can I use
>> job.getConfiguration().set to add some properties there, and get them
>> back with context.getConfiguration().get?
>>
>
> Yes, that's how it is done.
>
>
>> 3) What's the best way to log results/exceptions/traces from the map
>> method?
>>
>
> In most cases, you'll have mapper and reducer classes as nested static
> classes within some enclosing class. You can get a handle to the Logger from
> the enclosing class and do your usual LOG.info, LOG.warn yada yada.
>
> Hope it helps.
>
> [1] map(KEYIN key, *VALUEIN value*, Context context)
>
>>
>> I will search on my side, but some help would be welcome because it
>> seems there is not much documentation once we start to dig a bit :(
>>
>> JM
>>
>> 2012/10/27, Jean-Marc Spaggiari <[email protected]>:
>> > Hi,
>> >
>> > I'm thinking about my first MapReduce class and I have some questions.
>> >
>> > The goal of it will be to move some rows from one table to another one
>> > based on the timestamp only.
>> >
>> > Since this is pretty new for me, I'm starting from the RowCounter
>> > class to have a baseline.
>> >
>> > There are a few things I will have to update. First, the
>> > createSubmittableJob method to get a timestamp range instead of a key
>> > range, and "play" with the parameters. This part is fine.
>> >
>> > Next, I need to update the map method, and this is where I have some
>> > questions.
>> >
>> > I'm able to find the timestamp of all the cf:c from the
>> > context.getCurrentValue() method, that's fine. Now, my concern is on
>> > the way to get access to the table to store this field, and the table
>> > to delete it. Should I instantiate an HTable for the source table, and
>> > execute a delete on it, then do an insert on another HTable
>> > instance?  Should I use an HTablePool? Also, since I’m already on the
>> > row, can’t I just mark it as deleted instead of calling a new HTable?
>> >
>> > Also, instead of calling the delete and put one by one, I would like
>> > to put them on a list and execute it only when it’s over 10 members.
>> > How can I make sure that at the end of the job, this is flushed? Else,
>> > I will lose some operations. Is there a kind of “dispose” method
>> > called on the region when the job is done?
>> >
>> > Thanks,
>> >
>> > JM
>> >
>>
>
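
For the batching question in the original message (queue the puts and deletes, send them once enough have piled up, and make sure nothing is lost when the task ends), here is a minimal sketch of the pattern in plain Java. It assumes the buffer is created in setup(), fed from map(), and flushed one last time from cleanup(), as discussed above. BatchBuffer and its sink callback are hypothetical names for illustration only, not part of HBase or Hadoop; in a real mapper the sink would be whatever call sends the batch to the table. The threshold of 10 is taken loosely from the original message; here a batch is sent as soon as it reaches 10 entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not an HBase or Hadoop class): queues operations and
 * hands them to a sink in batches.
 */
final class BatchBuffer<T> {
    private final int threshold;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int threshold, Consumer<List<T>> sink) {
        this.threshold = threshold;
        this.sink = sink;
    }

    /** Intended to be called from map(): queue one operation, flush when the batch is full. */
    void add(T op) {
        pending.add(op);
        if (pending.size() >= threshold) {
            flush();
        }
    }

    /** Intended to be called from cleanup(): send whatever is still queued. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Hand the sink a copy so clearing the buffer does not affect it.
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        // Stand-in sink that only records each batch it receives.
        List<List<String>> sent = new ArrayList<>();
        BatchBuffer<String> buffer = new BatchBuffer<>(10, sent::add);

        for (int i = 0; i < 25; i++) {
            buffer.add("row-" + i);   // what map() would do for each row
        }
        buffer.flush();               // what cleanup() would do at the end

        System.out.println(sent.size() + " batches sent");
    }
}
```

The point of the final flush() is that the last, partially filled batch would otherwise never reach the sink when the task finishes.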
