I'm replying to myself ;) I found the "cleanup" and "setup" methods in the TableMapper class, so I think those are the methods I was looking for. I will init the HTablePool in setup. Please let me know if I'm wrong.
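A minimal sketch of that lifecycle in plain Java, with no Hadoop or HBase dependency, applied to the batching idea from the original message quoted below. It assumes the framework calls setup once before the first row, map once per row, and cleanup once after the last row. FakeTable and BatchingMapperSketch are hypothetical stand-ins invented for illustration, not HBase classes; in a real job the handle would be obtained in setup(Context) and released in cleanup(Context).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the destination table (not an HBase class).
class FakeTable {
    final List<String> written = new ArrayList<>();
    int flushCalls = 0;

    // Pretends to send one batch of operations to the table.
    void writeBatch(List<String> rows) {
        written.addAll(rows);
        flushCalls++;
    }
}

// Hypothetical class mirroring the mapper lifecycle:
// setup() once, map() once per row, cleanup() once at the end.
class BatchingMapperSketch {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private FakeTable target;

    BatchingMapperSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Acquire the shared resource once, before any row is seen.
    void setup(FakeTable table) {
        this.target = table;
    }

    // Buffer the operation and flush only when the batch is full.
    void map(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Flush whatever is left so a partial batch is not lost.
    void cleanup() {
        flush();
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        target.writeBatch(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With a batch size of 10 and 25 rows, the sketch writes two full batches from map and the remaining 5 rows from cleanup, which is the part that would otherwise be lost.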
Now, I still have a few other questions.

1) context.getCurrentValue() can throw an InterruptedException, but when can this occur? Is there a timeout on the Mapper side? Or does it happen if the region goes down while the job is running?

2) How can I pass parameters to the map method? Can I use job.getConfiguration().put to add some properties there, and get them back with context.getConfiguration().get?

3) What's the best way to log results/exceptions/traces from the map method?

I will search on my side, but some help would be welcome because there does not seem to be much documentation once we start to dig a bit :(

JM

2012/10/27, Jean-Marc Spaggiari <[email protected]>:
> Hi,
>
> I'm thinking about my first MapReduce class and I have some questions.
>
> The goal of it will be to move some rows from one table to another one
> based on the timestamp only.
>
> Since this is pretty new for me, I'm starting from the RowCounter
> class to have a baseline.
>
> There are a few things I will have to update. First, the
> createSubmittableJob method to get a timestamp range instead of a key
> range, and "play" with the parameters. This part is fine.
>
> Next, I need to update the map method, and this is where I have some
> questions.
>
> I'm able to find the timestamp of all the cf:c from the
> context.getCurrentValue() method, that's fine. Now, my concern is on
> the way to get access to the table to store this field, and the table
> to delete it. Should I instantiate an HTable for the source table, and
> execute a delete on it, then do an insert on another HTable
> instance? Should I use an HTablePool? Also, since I’m already on the
> row, can’t I just mark it as deleted instead of calling a new HTable?
>
> Also, instead of calling the delete and put one by one, I would like
> to put them on a list and execute it only when it’s over 10 members.
> How can I make sure that at the end of the job, this is flushed? Else,
> I will lose some operations. Is there a kind of “dispose” method
> called on the region when the job is done?
>
> Thanks,
>
> JM
>
