Re: Question about MapReduce

Shrijeet Paliwal Fri, 02 Nov 2012 12:07:06 -0700

JM,

I personally would choose to put it in neither the Hadoop lib directory nor
the HBase lib directory. Put it in your application's own install directory
instead.


Then you can set the HADOOP_CLASSPATH variable to include your jar (along
with the HBase jars, HBase's dependencies, and whatever dependencies your
program needs), and launch the job with the 'hadoop jar' command.

An example[1]:

Set classpath:
export HADOOP_CLASSPATH=`hbase classpath`:mycool.jar:mycooldependency.jar

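Run the following to launch your job:
hadoop jar mycool.jar hbase.experiments.MyCoolProgram \
    -Dmapred.running.map.limit=50 \
    -Dmapred.map.tasks.speculative.execution=false aCommandLineArg

The -D options become configuration properties only if MyCoolProgram runs
through ToolRunner (or otherwise uses GenericOptionsParser); if it does not,
they are passed to main() as ordinary arguments.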


Did I get your question right?

[1] In this example, `hbase classpath` expands to HBase's full classpath, so
all the HBase jars and their dependencies end up on HADOOP_CLASSPATH.



On Fri, Nov 2, 2012 at 11:56 AM, Jean-Marc Spaggiari <[email protected]> wrote:

> Hi Shrijeet,
>
> Helped a lot! Thanks!
>
> Now, the only thing I still need to know is where the best place to put
> my JAR on the server is. Should I put it in the Hadoop lib directory? Or
> somewhere in the HBase structure?
>
> Thanks,
>
> JM
>
> 2012/10/29, Shrijeet Paliwal <[email protected]>:
> > In line.
> >
> > On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> >> I'm replying to myself ;)
> >>
> >> I found the "setup" and "cleanup" methods in the TableMapper class, so I
> >> think those are the methods I was looking for. I will initialize the
> >> HTablePool there. Please let me know if I'm wrong.
> >>
> >> Now, I still have a few other questions.
> >>
> >> 1) context.getCurrentValue() can throw an InterruptedException, but
> >> when can this occur? Is there a timeout on the Mapper side? Or is it
> >> when the region goes down while the job is running?
> >>
> >
> > You do not need to call context.getCurrentValue(). The 'value' argument
> > to the map method[1] has the information you are looking for.
> >
> >
> >> 2) How can I pass parameters to the map method? Can I use
> >> job.getConfiguration().put to add some properties there, and get them
> >> back with context.getConfiguration().get?
> >>
> >
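> > Yes, that's how it is done, except that the setter is set(), not put():
> > call job.getConfiguration().set(name, value) in the driver before the job
> > is submitted, and read it back with context.getConfiguration().get(name)
> > in the mapper.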
> >
> >
> >> 3) What's the best way to log results/exceptions/traces from the map
> >> method?
> >>
> >
> > In most cases, you'll have the mapper and reducer as static nested
> > classes within some enclosing class. You can get a handle to the Logger
> > of the enclosing class and make your usual LOG.info, LOG.warn, etc. calls.
> >
> > Hope it helps.
> >
> > [1] map(KEYIN key, *VALUEIN value*, Context context)
> >
> >>
> >> I will keep searching on my side, but some help would be welcome,
> >> because there doesn't seem to be much documentation once you start to
> >> dig a bit :(
> >>
> >> JM
> >>
> >> 2012/10/27, Jean-Marc Spaggiari <[email protected]>:
> >> > Hi,
> >> >
> >> > I'm thinking about my first MapReduce class and I have some questions.
> >> >
> >> > Its goal will be to move some rows from one table to another based
> >> > only on their timestamp.
> >> >
> >> > Since this is pretty new to me, I'm starting from the RowCounter
> >> > class as a baseline.
> >> >
> >> > There are a few things I will have to update. First, the
> >> > createSubmittableJob method, so that it takes a timestamp range instead
> >> > of a key range, and "play" with the parameters. This part is fine.
> >> >
> >> > Next, I need to update the map method, and this is where I have some
> >> > questions.
> >> >
> >> > I'm able to find the timestamps of all the cf:c cells from the
> >> > context.getCurrentValue() method, so that's fine. My concern now is
> >> > how to get access to the table where I store this field, and to the
> >> > table I delete it from. Should I instantiate an HTable for the source
> >> > table and execute a delete on it, then do an insert on another HTable
> >> > instance? Should I use an HTablePool? Also, since I'm already on the
> >> > row, can't I just mark it as deleted instead of going through a new
> >> > HTable?
> >> >
> >> > Also, instead of issuing the deletes and puts one by one, I would like
> >> > to collect them in a list and execute it only once it holds more than
> >> > 10 operations. How can I make sure that this list is flushed at the end
> >> > of the job? Otherwise, I will lose some operations. Is there some kind
> >> > of "dispose" method called on the region when the job is done?
> >> >
> >> > Thanks,
> >> >
> >> > JM
> >> >
> >>
> >
>
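On the setup()/cleanup() and "dispose" questions in the thread: in the org.apache.hadoop.mapreduce API that TableMapper extends, setup() runs once before a task's first map() call and cleanup() runs once after its last one (with TableInputFormat there is typically one map task per region), so cleanup() is the natural place to flush a partially filled batch. The following is a plain-Java sketch of that pattern that assumes nothing from Hadoop or HBase: MiniMapper and BatchingMapper are hypothetical stand-ins that only mimic the setup/map/cleanup order, and plain strings stand in for Put and Delete objects.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Dependency-free sketch: buffer operations in map(), flush the remainder in cleanup(). */
public class BufferedMapperSketch {

    /** Hypothetical stand-in for a Mapper's lifecycle: setup once, map per record, cleanup once. */
    abstract static class MiniMapper<T> {
        protected void setup() {}

        protected abstract void map(T record);

        protected void cleanup() {}

        public final void run(Iterator<T> records) {
            setup();
            while (records.hasNext()) {
                map(records.next());
            }
            cleanup(); // called once, after the last record of this task
        }
    }

    /** Hypothetical mapper that batches "operations" (strings here, Put/Delete in a real job). */
    static class BatchingMapper extends MiniMapper<String> {
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();
        // Every batch that was "sent"; stands in for calls that write to the table.
        final List<List<String>> flushedBatches = new ArrayList<>();

        BatchingMapper(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        protected void map(String rowKey) {
            buffer.add(rowKey);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        @Override
        protected void cleanup() {
            flush(); // send the final partial batch so no operation is lost
        }

        private void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            flushedBatches.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    /** Runs `records` synthetic row keys through a BatchingMapper and returns the flushed batches. */
    public static List<List<String>> runDemo(int records, int batchSize) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            input.add("row" + i);
        }
        BatchingMapper mapper = new BatchingMapper(batchSize);
        mapper.run(input.iterator());
        return mapper.flushedBatches;
    }

    public static void main(String[] args) {
        // 23 records with a batch size of 10
        for (List<String> batch : runDemo(23, 10)) {
            System.out.println(batch.size());
        }
    }
}
```

Running main prints 10, 10 and 3 on separate lines: two full batches flushed from map() and the leftover batch flushed from cleanup(). In a real mapper, flush() would hand the buffered operations to the table (for example HTable.put(List<Put>) and HTable.delete(List<Delete>)), and cleanup() would also close the table. Alternatively, HTable can buffer puts itself after setAutoFlush(false), in which case cleanup() would call flushCommits().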
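On passing parameters (question 2) and the timestamp-based move described in the first message: the driver stores values as strings in the job configuration (in Hadoop, job.getConfiguration().set(...) or setLong(...) before submitting the job), and each task reads them back once, usually in setup(), with context.getConfiguration().get(...) or getLong(name, default). The sketch below models that flow in plain Java, assuming a HashMap as a stand-in for Configuration; the property names moverows.start.ts and moverows.end.ts and the class names are hypothetical. It also uses the half-open [start, end) convention, which matches HBase's Scan.setTimeRange(minStamp, maxStamp), where the maximum is exclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Dependency-free sketch: a time range travels from driver to mapper as string properties. */
public class TimeRangeParamSketch {
    // Hypothetical property names; any unique strings would do.
    public static final String START_KEY = "moverows.start.ts";
    public static final String END_KEY = "moverows.end.ts";

    /** Driver side: values are stored as strings, like Configuration.set(name, value). */
    public static Map<String, String> buildJobConf(long startTs, long endTs) {
        Map<String, String> conf = new HashMap<>();
        conf.put(START_KEY, Long.toString(startTs));
        conf.put(END_KEY, Long.toString(endTs));
        return conf;
    }

    /** Task side: parse the values once (as in setup()), then use them for every record. */
    public static class TimeRangeMapper {
        private long start;
        private long end;

        public void setup(Map<String, String> conf) {
            // Missing keys fall back to "everything", similar to getLong(name, default).
            start = Long.parseLong(conf.getOrDefault(START_KEY, "0"));
            end = Long.parseLong(conf.getOrDefault(END_KEY, Long.toString(Long.MAX_VALUE)));
        }

        /** Half-open range: start is inclusive, end is exclusive. */
        public boolean inRange(long timestamp) {
            return timestamp >= start && timestamp < end;
        }

        /** Keeps only the cell timestamps that fall inside the configured range. */
        public List<Long> selectTimestamps(List<Long> cellTimestamps) {
            List<Long> selected = new ArrayList<>();
            for (long ts : cellTimestamps) {
                if (inRange(ts)) {
                    selected.add(ts);
                }
            }
            return selected;
        }
    }

    public static void main(String[] args) {
        TimeRangeMapper mapper = new TimeRangeMapper();
        mapper.setup(buildJobConf(1000L, 2000L));
        System.out.println(mapper.selectTimestamps(Arrays.asList(999L, 1000L, 1500L, 2000L)));
    }
}
```

Running main prints [1000, 1500]: 999 falls before the range, and 2000 is excluded because the end is exclusive. For this particular job it is probably simpler to pass the range to Scan.setTimeRange() in createSubmittableJob, so the region servers skip out-of-range cells before they reach map(); the configuration route is still how other settings, such as the destination table name, would get into the mapper.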
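On logging (question 3): the pattern Shrijeet describes is a single static logger on the enclosing job class that the nested mapper uses directly. The sketch below uses java.util.logging so it runs without extra jars; Hadoop and HBase code of that era typically used Apache Commons Logging (LogFactory.getLog(...)) instead. In a real job, either one writes to the per-task logs (viewable from the JobTracker web UI), not to the console where you ran 'hadoop jar'. MoveRowsJob and MoveRowsMapper are hypothetical names, and the mapper's work is reduced to counting rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical job class: one static logger shared by its nested mapper. */
public class MoveRowsJob {
    static final Logger LOG = Logger.getLogger(MoveRowsJob.class.getName());

    /** Hypothetical static nested mapper; it logs through the enclosing class's LOG. */
    public static class MoveRowsMapper {
        private int moved = 0;

        public void map(String rowKey, long timestamp) {
            try {
                if (rowKey.isEmpty()) {
                    throw new IllegalArgumentException("empty row key");
                }
                moved++;
                // Per-row detail at FINE; the message is only built if FINE is enabled.
                LOG.fine(() -> "moved row " + rowKey + " @ " + timestamp);
            } catch (IllegalArgumentException e) {
                // Log the exception with its stack trace and keep processing other rows.
                LOG.log(Level.WARNING, "skipping bad row", e);
            }
        }

        public void cleanup() {
            // One summary line per task.
            LOG.info("rows moved by this task: " + moved);
        }
    }

    /** Feeds a few rows through the mapper and returns every record that reached LOG's handlers. */
    public static List<LogRecord> runDemo() {
        List<LogRecord> captured = new ArrayList<>();
        Handler capture = new Handler() {
            @Override public void publish(LogRecord record) { captured.add(record); }
            @Override public void flush() {}
            @Override public void close() {}
        };
        LOG.setLevel(Level.INFO); // pin the level so the demo does not depend on logging.properties
        LOG.addHandler(capture);
        try {
            MoveRowsMapper mapper = new MoveRowsMapper();
            mapper.map("row1", 1000L);
            mapper.map("", 2000L); // takes the WARNING path
            mapper.map("row3", 3000L);
            mapper.cleanup();
        } finally {
            LOG.removeHandler(capture);
        }
        return captured;
    }

    public static void main(String[] args) {
        for (LogRecord r : runDemo()) {
            System.out.println(r.getLevel() + ": " + r.getMessage());
        }
    }
}
```

runDemo() captures two records: a WARNING carrying the exception for the empty row key, and the INFO summary from cleanup(); the per-row FINE messages fall below INFO and are not emitted. For aggregate results such as row counts, a job counter (RowCounter increments one through context.getCounter(...)) is often more convenient than parsing logs.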
