That was my initial plan too, but I was wondering whether there was any other best practice for the delete. So I will go that way.
Thanks,

JM

2012/11/2, Shrijeet Paliwal <[email protected]>:
> Not sure what exactly is happening in your job. But in one of the delete
> jobs I wrote, I was creating an instance of HTable in the setup method of my
> mapper:
>
> delTab = new HTable(conf, conf.get(TABLE_NAME));
>
> And performing the delete in the map() call using delTab. So no, you do not
> have access to the table directly *usually*.
>
> -Shrijeet
>
> On Fri, Nov 2, 2012 at 12:47 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>
>> Sorry, one last question.
>>
>> In the map method, I have access to the row using the values
>> parameter. Now, based on the value content, I might want to delete it.
>> Do I have access to the table directly from one of the parameters? Or
>> should I call the delete using an HTableInterface from my pool?
>>
>> Thanks,
>>
>> JM
>>
>> 2012/11/2, Jean-Marc Spaggiari <[email protected]>:
>> > Yep, you perfectly got my question.
>> >
>> > I just tried it and it's working perfectly!
>> >
>> > Thanks a lot! I now have a lot to play with.
>> >
>> > JM
>> >
>> > 2012/11/2, Shrijeet Paliwal <[email protected]>:
>> >> JM,
>> >>
>> >> I personally would choose to put it in neither the hadoop libs nor the
>> >> hbase libs. Have them go to your application's own install directory.
>> >>
>> >> Then you could set the variable HADOOP_CLASSPATH to include your jar
>> >> (also include the hbase jars, hbase dependencies, and the dependencies
>> >> your program needs), and to execute, fire the 'hadoop jar' command.
>> >>
>> >> An example[1]:
>> >>
>> >> Set the classpath:
>> >> export HADOOP_CLASSPATH=`hbase classpath`:mycool.jar:mycooldependency.jar
>> >>
>> >> Fire the following to launch your job:
>> >> hadoop jar mycool.jar hbase.experiments.MyCoolProgram
>> >> -Dmapred.running.map.limit=50
>> >> -Dmapred.map.tasks.speculative.execution=false aCommandLineArg
>> >>
>> >> Did I get your question right?
>> >>
>> >> [1] In the example I gave, `hbase classpath` gets you set with all the
>> >> hbase jars.
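Shrijeet's setup()/map() pattern, fleshed out as a minimal sketch. This assumes the pre-1.0 HBase client API that was current in 2012 (HTable constructed directly; later releases deprecate it in favour of Connection/Table). The TABLE_NAME config key and the shouldDelete() criterion are made-up placeholders, not part of the thread.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

public class DeleteMapper extends TableMapper<ImmutableBytesWritable, Result> {

  // Hypothetical config key naming the table to delete from.
  public static final String TABLE_NAME = "delete.job.table.name";

  private HTable delTab;

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    // One HTable per map task, opened once in setup().
    delTab = new HTable(conf, conf.get(TABLE_NAME));
  }

  @Override
  protected void map(ImmutableBytesWritable key, Result value, Context context)
      throws IOException {
    if (shouldDelete(value)) {
      // Delete the current row through the table handle, as described above.
      delTab.delete(new Delete(key.get()));
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    delTab.close(); // release the connection when the task ends
  }

  private boolean shouldDelete(Result value) {
    return false; // placeholder: inspect the row's cells and decide here
  }
}
```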
>> >> On Fri, Nov 2, 2012 at 11:56 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>> >>
>> >>> Hi Shrijeet,
>> >>>
>> >>> Helped a lot! Thanks!
>> >>>
>> >>> Now, the only thing I need is to know the best place to put my
>> >>> JAR on the server. Should I put it in the hadoop lib directory? Or
>> >>> somewhere in the HBase structure?
>> >>>
>> >>> Thanks,
>> >>>
>> >>> JM
>> >>>
>> >>> 2012/10/29, Shrijeet Paliwal <[email protected]>:
>> >>> > In line.
>> >>> >
>> >>> > On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>> >>> >
>> >>> >> I'm replying to myself ;)
>> >>> >>
>> >>> >> I found the "cleanup" and "setup" methods in the TableMapper class.
>> >>> >> So I think those are the methods I was looking for. I will init the
>> >>> >> HTablePool there. Please let me know if I'm wrong.
>> >>> >>
>> >>> >> Now, I still have a few other questions.
>> >>> >>
>> >>> >> 1) context.getCurrentValue() can throw an InterruptedException, but
>> >>> >> when can this occur? Is there a timeout on the Mapper side? Or is it
>> >>> >> if the region is going down while the job is running?
>> >>> >>
>> >>> >
>> >>> > You do not need to call context.getCurrentValue(). The 'value'
>> >>> > argument to the map method[1] has the information you are looking for.
>> >>> >
>> >>> >
>> >>> >> 2) How can I pass parameters to the map method? Can I use
>> >>> >> job.getConfiguration().set to add some properties there, and get them
>> >>> >> back with context.getConfiguration().get?
>> >>> >>
>> >>> >
>> >>> > Yes, that's how it is done.
>> >>> >
>> >>> >
>> >>> >> 3) What's the best way to log results/exceptions/traces from the map
>> >>> >> method?
>> >>> >>
>> >>> >
>> >>> > In most cases, you'll have mapper and reducer classes as nested
>> >>> > static classes within some enclosing class.
>> >>> > You can get a handle to the Logger from the enclosing class and do
>> >>> > your usual LOG.info, LOG.warn, yada yada.
>> >>> >
>> >>> > Hope it helps.
>> >>> >
>> >>> > [1] map(KEYIN key, *VALUEIN value*, Context context)
>> >>> >
>> >>> >>
>> >>> >> I will search on my side, but some help would be welcome because it
>> >>> >> seems there is not much documentation once we start to dig a bit :(
>> >>> >>
>> >>> >> JM
>> >>> >>
>> >>> >> 2012/10/27, Jean-Marc Spaggiari <[email protected]>:
>> >>> >> > Hi,
>> >>> >> >
>> >>> >> > I'm thinking about my first MapReduce class and I have some
>> >>> >> > questions.
>> >>> >> >
>> >>> >> > The goal of it will be to move some rows from one table to
>> >>> >> > another one based on the timestamp only.
>> >>> >> >
>> >>> >> > Since this is pretty new for me, I'm starting from the RowCounter
>> >>> >> > class to have a baseline.
>> >>> >> >
>> >>> >> > There are a few things I will have to update. First, the
>> >>> >> > createSubmittableJob method, to take a timestamp range instead of
>> >>> >> > a key range, and "play" with the parameters. This part is fine.
>> >>> >> >
>> >>> >> > Next, I need to update the map method, and this is where I have
>> >>> >> > some questions.
>> >>> >> >
>> >>> >> > I'm able to find the timestamp of all the cf:c from the
>> >>> >> > context.getCurrentValue() method, that's fine. Now, my concern is
>> >>> >> > on the way to get access to the table to store this field, and the
>> >>> >> > table to delete it from. Should I instantiate an HTable for the
>> >>> >> > source table, and execute a delete on it, then do an insert on
>> >>> >> > another HTable instance? Should I use an HTablePool? Also, since
>> >>> >> > I'm already on the row, can't I just mark it as deleted instead of
>> >>> >> > calling a new HTable?
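The two answers above, passing parameters through the job Configuration and logging through a Logger on the enclosing class, can be sketched together. This is a hedged sketch against the Hadoop/HBase APIs of that era; the property name myjob.min.timestamp and the class names are invented for illustration, not from the thread.

```java
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class MoveRowsJob {

  // Logger declared on the enclosing class; the nested static mapper reuses it.
  private static final Log LOG = LogFactory.getLog(MoveRowsJob.class);

  static final String MIN_TS_KEY = "myjob.min.timestamp"; // invented property name

  public static class MoveMapper extends TableMapper<ImmutableBytesWritable, Result> {
    private long minTs;

    @Override
    protected void setup(Context context) {
      // Read the parameter back on the task side from the Configuration.
      minTs = context.getConfiguration().getLong(MIN_TS_KEY, 0L);
      LOG.info("Moving rows with timestamp >= " + minTs);
    }

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException {
      // ... compare the row's cell timestamps against minTs and act on it ...
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "move-rows");
    // Set the parameter before submitting; every task sees a copy of it.
    job.getConfiguration().setLong(MIN_TS_KEY, Long.parseLong(args[0]));
    // ... TableMapReduceUtil.initTableMapperJob(...), then job.waitForCompletion(true) ...
  }
}
```

Anything logged with LOG.info/LOG.warn ends up in the per-task logs, viewable from the job tracker's web UI.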
>> >>> >> >
>> >>> >> > Also, instead of calling the delete and put one by one, I would
>> >>> >> > like to put them in a list and execute it only when it's over 10
>> >>> >> > members. How can I make sure that at the end of the job, this is
>> >>> >> > flushed? Else, I will lose some operations. Is there a kind of
>> >>> >> > "dispose" method called on the region when the job is done?
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> >
>> >>> >> > JM
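One way to realize the batching idea from the question above: buffer the pending mutations in the mapper and flush both when the buffer reaches 10 entries and once more in cleanup(), which is exactly the "dispose"-style hook TableMapper provides (it runs once per task, after the last map() call). A sketch against the old HTable API; the target_table name and the makePut() helper are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

public class BatchingMapper extends TableMapper<ImmutableBytesWritable, Result> {

  private static final int BATCH_SIZE = 10;
  private HTable target;
  private final List<Put> pending = new ArrayList<Put>();

  @Override
  protected void setup(Context context) throws IOException {
    target = new HTable(context.getConfiguration(), "target_table"); // hypothetical table
  }

  @Override
  protected void map(ImmutableBytesWritable key, Result value, Context context)
      throws IOException {
    pending.add(makePut(key, value));
    if (pending.size() >= BATCH_SIZE) {
      flush(); // ship a full batch
    }
  }

  private void flush() throws IOException {
    target.put(pending); // HTable.put(List<Put>) sends the batch in one call
    pending.clear();
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    flush();        // push the partial batch so no operations are lost
    target.close();
  }

  private Put makePut(ImmutableBytesWritable key, Result value) {
    // hypothetical: copy the cells you want into a Put for the target table
    return new Put(key.get());
  }
}
```

Note that the HTable of that era also had its own client-side write buffer (setAutoFlush(false) plus flushCommits()), which achieves much the same thing without a hand-rolled list.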
