One more question about MapReduce. One of my servers is slower than the others. I don't have any time constraint for the job to finish.
But I'm getting this message: "Task attempt_201211122318_0014_m_000021_0 failed to report status for 601 seconds. Killing!" Where can I change this timeout to something like 1800 seconds? Is it in the mapred-site.xml file? If so, which property should I set?

Thanks,

JM

2012/11/2, Jean-Marc Spaggiari <[email protected]>:
> That was my initial plan too, but I was wondering if there was any
> other best practice for the delete. So I will go that way.
>
> Thanks,
>
> JM
>
> 2012/11/2, Shrijeet Paliwal <[email protected]>:
>> I'm not sure what exactly is happening in your job, but in one of the
>> delete jobs I wrote, I created an instance of HTable in the setup
>> method of my mapper:
>>
>> delTab = new HTable(conf, conf.get(TABLE_NAME));
>>
>> and performed the delete in the map() call using delTab. So no, you do
>> not have access to the table directly *usually*.
>>
>>
>> -Shrijeet
>>
>>
>> On Fri, Nov 2, 2012 at 12:47 PM, Jean-Marc Spaggiari <
>> [email protected]> wrote:
>>
>>> Sorry, one last question.
>>>
>>> In the map method, I have access to the row through the value
>>> parameter. Now, based on the value's content, I might want to delete it.
>>> Do I have access to the table directly from one of the parameters? Or
>>> should I call the delete using an HTableInterface from my pool?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>> 2012/11/2, Jean-Marc Spaggiari <[email protected]>:
>>> > Yep, you perfectly got my question.
>>> >
>>> > I just tried it and it's working perfectly!
>>> >
>>> > Thanks a lot! I now have a lot to play with.
>>> >
>>> > JM
>>> >
>>> > 2012/11/2, Shrijeet Paliwal <[email protected]>:
>>> >> JM,
>>> >>
>>> >> I personally would choose to put it in neither the Hadoop libs nor
>>> >> the HBase libs.
>>> >> Have them go to your application's own install directory.
>>> >>
>>> >> Then you could set the variable HADOOP_CLASSPATH to have your jar
>>> >> (also include the hbase jars, hbase dependencies, and the
>>> >> dependencies your program needs),
>>> >> and to execute, fire the 'hadoop jar' command.
>>> >>
>>> >> An example[1]:
>>> >>
>>> >> Set the classpath:
>>> >> export HADOOP_CLASSPATH=`hbase classpath`:mycool.jar:mycooldependency.jar
>>> >>
>>> >> Fire the following to launch your job:
>>> >> hadoop jar mycool.jar hbase.experiments.MyCoolProgram
>>> >> -Dmapred.running.map.limit=50
>>> >> -Dmapred.map.tasks.speculative.execution=false aCommandLineArg
>>> >>
>>> >>
>>> >> Did I get your question right?
>>> >>
>>> >> [1] In the example I gave, `hbase classpath` gets you set up with
>>> >> all the hbase jars.
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Nov 2, 2012 at 11:56 AM, Jean-Marc Spaggiari <
>>> >> [email protected]> wrote:
>>> >>
>>> >>> Hi Shrijeet,
>>> >>>
>>> >>> That helped a lot! Thanks!
>>> >>>
>>> >>> Now, the only thing I need to know is the best place to put my
>>> >>> JAR on the server. Should I put it in the hadoop lib directory? Or
>>> >>> somewhere in the HBase structure?
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> JM
>>> >>>
>>> >>> 2012/10/29, Shrijeet Paliwal <[email protected]>:
>>> >>> > Inline.
>>> >>> >
>>> >>> > On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari <
>>> >>> > [email protected]> wrote:
>>> >>> >
>>> >>> >> I'm replying to myself ;)
>>> >>> >>
>>> >>> >> I found the "cleanup" and "setup" methods on the TableMapper
>>> >>> >> class. So I think those are the methods I was looking for. I
>>> >>> >> will init the HTablePool there. Please let me know if I'm wrong.
>>> >>> >>
>>> >>> >> Now, I still have a few other questions.
>>> >>> >>
>>> >>> >> 1) context.getCurrentValue() can throw an InterruptedException,
>>> >>> >> but when can this occur? Is there a timeout on the Mapper side?
>>> >>> >> Or is it if
>>> >>> >> the region is going down while the job is running?
>>> >>> >>
>>> >>> >
>>> >>> > You do not need to call context.getCurrentValue(). The 'value'
>>> >>> > argument to the map method[1] has the information you are
>>> >>> > looking for.
>>> >>> >
>>> >>> >
>>> >>> >> 2) How can I pass parameters to the map method? Can I use
>>> >>> >> job.getConfiguration().set to add some properties there, and get
>>> >>> >> them back with context.getConfiguration().get?
>>> >>> >>
>>> >>> >
>>> >>> > Yes, that's how it is done.
>>> >>> >
>>> >>> >
>>> >>> >> 3) What's the best way to log results/exceptions/traces from the
>>> >>> >> map method?
>>> >>> >>
>>> >>> >
>>> >>> > In most cases, you'll have mapper and reducer classes as nested
>>> >>> > static classes within some enclosing class. You can get a handle
>>> >>> > to the Logger from the enclosing class and do your usual
>>> >>> > LOG.info, LOG.warn, yada yada.
>>> >>> >
>>> >>> > Hope it helps.
>>> >>> >
>>> >>> > [1] map(KEYIN key, *VALUEIN value*, Context context)
>>> >>> >
>>> >>> >>
>>> >>> >> I will search on my side, but some help would be welcome because
>>> >>> >> it seems there is not much documentation once you start to dig a
>>> >>> >> bit :(
>>> >>> >>
>>> >>> >> JM
>>> >>> >>
>>> >>> >> 2012/10/27, Jean-Marc Spaggiari <[email protected]>:
>>> >>> >> > Hi,
>>> >>> >> >
>>> >>> >> > I'm thinking about my first MapReduce class and I have some
>>> >>> >> > questions.
>>> >>> >> >
>>> >>> >> > The goal will be to move some rows from one table to another
>>> >>> >> > one based on the timestamp only.
>>> >>> >> >
>>> >>> >> > Since this is pretty new for me, I'm starting from the
>>> >>> >> > RowCounter class to have a baseline.
>>> >>> >> >
>>> >>> >> > There are a few things I will have to update.
>>> >>> >> > First, the
>>> >>> >> > createSubmittableJob method, to take a timestamp range instead
>>> >>> >> > of a key range, and to play with the parameters. This part is
>>> >>> >> > fine.
>>> >>> >> >
>>> >>> >> > Next, I need to update the map method, and this is where I
>>> >>> >> > have some questions.
>>> >>> >> >
>>> >>> >> > I'm able to find the timestamp of all the cf:c cells from the
>>> >>> >> > context.getCurrentValue() method, that's fine. Now, my concern
>>> >>> >> > is how to get access to the table to store this field, and
>>> >>> >> > the table to delete it from. Should I instantiate an HTable
>>> >>> >> > for the source table and execute a delete on it, then do an
>>> >>> >> > insert on another HTable instance? Should I use an HTablePool?
>>> >>> >> > Also, since I'm already on the row, can't I just mark it as
>>> >>> >> > deleted instead of calling a new HTable?
>>> >>> >> >
>>> >>> >> > Also, instead of calling the delete and put one by one, I
>>> >>> >> > would like to put them in a list and execute it only when it's
>>> >>> >> > over 10 members. How can I make sure that at the end of the
>>> >>> >> > job, this is flushed? Otherwise, I will lose some operations.
>>> >>> >> > Is there a kind of "dispose" method called on the region when
>>> >>> >> > the job is done?
>>> >>> >> >
>>> >>> >> > Thanks,
>>> >>> >> >
>>> >>> >> > JM
>>> >>> >> >
>>> >>> >>
>>> >>> >
>>> >>>
>>> >>
>>> >
>>>
>>
>
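[Editor's note on the timeout question that opens the thread: the "failed to report status for 601 seconds" message corresponds to the MRv1 (Hadoop 1.x) property mapred.task.timeout, which defaults to 600000 milliseconds. A minimal mapred-site.xml fragment for an 1800-second timeout would look like the sketch below; it can also be set per job on the command line with -Dmapred.task.timeout=1800000.]

```xml
<property>
  <name>mapred.task.timeout</name>
  <!-- Value is in milliseconds: 1800000 ms = 1800 s.
       0 disables the timeout entirely. -->
  <value>1800000</value>
</property>
```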
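[Editor's note: the pieces discussed in the thread, opening HTable instances in setup(), deleting and inserting from map(), batching the operations in lists, and flushing the remainder when the job ends, fit together in one mapper. The sketch below is an illustration against the 0.94-era HBase client API; the configuration keys source.table, target.table, and cutoff.ts, the batch size of 10, and the class name are all hypothetical, not from the thread.]

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

public class MoveByTimestampMapper
    extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {

  private static final Log LOG = LogFactory.getLog(MoveByTimestampMapper.class);
  private static final int BATCH_SIZE = 10;  // flush threshold from the thread

  private HTable source;
  private HTable target;
  private final List<Put> puts = new ArrayList<Put>();
  private final List<Delete> deletes = new ArrayList<Delete>();

  @Override
  protected void setup(Context context) throws IOException {
    // Table names are read back from the job configuration,
    // as discussed in the thread (hypothetical key names).
    source = new HTable(context.getConfiguration(),
        context.getConfiguration().get("source.table"));
    target = new HTable(context.getConfiguration(),
        context.getConfiguration().get("target.table"));
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    long cutoff = context.getConfiguration().getLong("cutoff.ts", 0L);
    Put put = new Put(row.get());
    boolean move = false;
    for (KeyValue kv : value.raw()) {   // every cell of the current row
      if (kv.getTimestamp() < cutoff) {
        put.add(kv);                    // carry the cell over as-is
        move = true;
      }
    }
    if (!move) {
      return;
    }
    puts.add(put);
    deletes.add(new Delete(row.get()));
    if (puts.size() >= BATCH_SIZE) {
      flush();
    }
  }

  private void flush() throws IOException {
    target.put(puts);        // insert into the destination first,
    source.delete(deletes);  // then remove from the source
    puts.clear();
    deletes.clear();
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // cleanup() is the "dispose" hook asked about in the thread:
    // flush whatever is still buffered, then release the tables.
    flush();
    LOG.info("Flushed final batch, closing tables");
    source.close();
    target.close();
  }
}
```

[The sketch requires the HBase and Hadoop client jars on the classpath and a running cluster; it is meant to show the shape of the solution, not a tested job.]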
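[Editor's note on question 2, passing parameters: as confirmed in the thread, values set on the job's Configuration in the driver come back from context.getConfiguration() in the mapper. One detail: the Configuration setter is set()/setLong(), not put(). A standalone sketch with a made-up key name, showing only the round trip through a Configuration object:]

```java
import org.apache.hadoop.conf.Configuration;

public class ConfParamDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Driver side: job.getConfiguration().setLong(...) works the same way.
    conf.setLong("cutoff.ts", 1351900800000L);
    // Mapper side: context.getConfiguration() hands back the job's
    // Configuration, so the value is read with the matching getter.
    long cutoff = conf.getLong("cutoff.ts", 0L);
    System.out.println(cutoff);
  }
}
```

[Requires hadoop-core on the classpath; in a real job the two halves live in the driver and the mapper, not in one main method.]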
