Yep, you got my question exactly right. I just tried it and it's working perfectly!
Thanks a lot! I now have a lot to play with.

JM

2012/11/2, Shrijeet Paliwal <[email protected]>:
> JM,
>
> I personally would choose to put it in neither the hadoop libs nor the
> hbase libs. Have them go to your application's own install directory.
>
> Then you can set the variable HADOOP_CLASSPATH to include your jar (also
> include the hbase jars, hbase dependencies, and any dependencies your
> program needs), and to execute, fire the 'hadoop jar' command.
>
> An example[1]:
>
> Set the classpath:
> export HADOOP_CLASSPATH=`hbase classpath`:mycool.jar:mycooldependency.jar
>
> Fire the following to launch your job:
> hadoop jar mycool.jar hbase.experiments.MyCoolProgram
> -Dmapred.running.map.limit=50
> -Dmapred.map.tasks.speculative.execution=false aCommandLineArg
>
> Did I get your question right?
>
> [1] In the example I gave, `hbase classpath` gets you set up with all the
> hbase jars.
>
>
> On Fri, Nov 2, 2012 at 11:56 AM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
>> Hi Shrijeet,
>>
>> Helped a lot! Thanks!
>>
>> Now, the only thing I need is to know the best place to put my
>> JAR on the server. Should I put it in the hadoop lib directory? Or
>> somewhere in the HBase structure?
>>
>> Thanks,
>>
>> JM
>>
>> 2012/10/29, Shrijeet Paliwal <[email protected]>:
>> > In line.
>> >
>> > On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari <
>> > [email protected]> wrote:
>> >
>> >> I'm replying to myself ;)
>> >>
>> >> I found the "cleanup" and "setup" methods on the TableMapper class,
>> >> so I think those are the methods I was looking for. I will init the
>> >> HTablePool there. Please let me know if I'm wrong.
>> >>
>> >> Now, I still have a few other questions.
>> >>
>> >> 1) context.getCurrentValue() can throw an InterruptedException, but
>> >> when can this occur? Is there a timeout on the Mapper side? Or is it
>> >> when the region is going down while the job is running?
>> >>
>> >
>> > You do not need to call context.getCurrentValue().
>> > The 'value' argument to the map method[1] has the information you are
>> > looking for.
>> >
>> >
>> >> 2) How can I pass parameters to the map method? Can I use
>> >> job.getConfiguration().set() to add some properties there, and get
>> >> them back with context.getConfiguration().get()?
>> >>
>> >
>> > Yes, that's how it is done.
>> >
>> >
>> >> 3) What's the best way to log results/exceptions/traces from the map
>> >> method?
>> >>
>> >
>> > In most cases, you'll have mapper and reducer classes as nested static
>> > classes within some enclosing class. You can get a handle to the Logger
>> > from the enclosing class and do your usual LOG.info, LOG.warn, yada
>> > yada.
>> >
>> > Hope it helps.
>> >
>> > [1] map(KEYIN key, *VALUEIN value*, Context context)
>> >
>> >>
>> >> I will search on my side, but some help would be welcome because it
>> >> seems there is not much documentation once you start to dig a bit :(
>> >>
>> >> JM
>> >>
>> >> 2012/10/27, Jean-Marc Spaggiari <[email protected]>:
>> >> > Hi,
>> >> >
>> >> > I'm thinking about my first MapReduce class and I have some
>> >> > questions.
>> >> >
>> >> > The goal of it will be to move some rows from one table to another
>> >> > based on the timestamp only.
>> >> >
>> >> > Since this is pretty new to me, I'm starting from the RowCounter
>> >> > class to have a baseline.
>> >> >
>> >> > There are a few things I will have to update. First, the
>> >> > createSubmittableJob method, to take a timestamp range instead of
>> >> > a key range, and "play" with the parameters. This part is fine.
>> >> >
>> >> > Next, I need to update the map method, and this is where I have
>> >> > some questions.
>> >> >
>> >> > I'm able to find the timestamp of all the cf:c cells from the
>> >> > context.getCurrentValue() method; that's fine. Now, my concern is
>> >> > how to get access to the table to store this field, and the table
>> >> > to delete it from.
>> >> > Should I instantiate an HTable for the source table
>> >> > and execute the delete on it, then do an insert on another HTable
>> >> > instance? Should I use an HTablePool? Also, since I'm already on
>> >> > the row, can't I just mark it as deleted instead of calling a new
>> >> > HTable?
>> >> >
>> >> > Also, instead of calling the delete and put one by one, I would
>> >> > like to put them in a list and execute it only when it's over 10
>> >> > members. How can I make sure that at the end of the job, this is
>> >> > flushed? Otherwise, I will lose some operations. Is there a kind
>> >> > of "dispose" method called on the region when the job is done?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > JM
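The batch-and-flush idea from that last question can be sketched in plain Java. This is an illustrative stand-in, not the HBase API: `BufferedOps` and `BATCH_SIZE` are made-up names, and the `Consumer` takes the place of a real `HTable.put(List<Put>)` or `HTable.delete(List<Delete>)` call. The point it demonstrates is that the mapper's `cleanup()` is the "dispose" hook the question asks about: it must call `flush()` one last time so the final partial batch is not lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BufferedOps {
    static final int BATCH_SIZE = 10; // illustrative threshold from the thread

    private final List<String> buffer = new ArrayList<>();
    private final Consumer<List<String>> flusher; // stands in for HTable batch calls
    int flushes = 0;

    BufferedOps(Consumer<List<String>> flusher) {
        this.flusher = flusher;
    }

    // Called once per record, i.e. from map(): buffer the operation and
    // flush automatically whenever a full batch has accumulated.
    void add(String op) {
        buffer.add(op);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Also called from cleanup() so the trailing partial batch is written.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(buffer));
        buffer.clear();
        flushes++;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        BufferedOps ops = new BufferedOps(sent::addAll);
        for (int i = 0; i < 25; i++) {
            ops.add("op" + i);   // the map() phase: 25 operations
        }
        ops.flush();             // the cleanup() phase: final partial batch
        System.out.println(sent.size() + " ops in " + ops.flushes + " flushes");
        // prints "25 ops in 3 flushes"
    }
}
```

Without the final `flush()` in `cleanup()`, the last five operations above would be silently dropped, which is exactly the data-loss scenario the question worries about.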

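For the parameter-passing exchange earlier in the thread (question 2 and the "Yes, that's how it is done" answer), a minimal sketch of the round trip through the job configuration. The property name `my.timestamp.min` and the class names are made up for illustration; the `Configuration`, `Job`, and `Mapper` APIs are the standard Hadoop ones, so this assumes the Hadoop jars on the compile classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class ParamPassingSketch {

    // Driver side: set the value on the job's configuration before submitting.
    static void configure(Job job) {
        job.getConfiguration().set("my.timestamp.min", "1351000000000");
    }

    // Mapper side: read it back from the context, typically once in setup()
    // rather than on every map() call.
    static class MyMapper extends Mapper<Object, Object, Object, Object> {
        private long minTimestamp;

        @Override
        protected void setup(Context context) {
            Configuration conf = context.getConfiguration();
            minTimestamp = conf.getLong("my.timestamp.min", 0L);
        }
    }
}
```

Reading the parameter in `setup()` matches the earlier advice about `setup`/`cleanup` on the mapper: the configuration lookup happens once per task instead of once per row.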