One more question about MapReduce. One of my servers is slower than the others. I don't have any time constraint for the job to finish.
But I'm getting this message: "Task attempt_201211122318_0014_m_000021_0 failed to report status for 601 seconds. Killing!" Where can I change this timeout to something like 1800 seconds? Is it in the mapred-site.xml file? If so, which property should I set?

Thanks,

JM

2012/11/2, Jean-Marc Spaggiari <[email protected]>:
> That was my initial plan too, but I was wondering if there was any
> other best practice for the delete. So I will go that way.
>
> Thanks,
>
> JM
>
> 2012/11/2, Shrijeet Paliwal <[email protected]>:
>> I'm not sure what exactly is happening in your job, but in one of the
>> delete jobs I wrote, I created an instance of HTable in the setup
>> method of my mapper:
>>
>> delTab = new HTable(conf, conf.get(TABLE_NAME));
>>
>> and performed the delete in the map() call using delTab. So no, you do
>> not have access to the table directly *usually*.
>>
>>
>> -Shrijeet
>>
>>
>> On Fri, Nov 2, 2012 at 12:47 PM, Jean-Marc Spaggiari <
>> [email protected]> wrote:
>>
>>> Sorry, one last question.
>>>
>>> In the map method, I have access to the row through the value
>>> parameter. Now, based on the value's content, I might want to delete it.
>>> Do I have access to the table directly from one of the parameters? Or
>>> should I call the delete using an HTableInterface from my pool?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>> 2012/11/2, Jean-Marc Spaggiari <[email protected]>:
>>> > Yep, you perfectly got my question.
>>> >
>>> > I just tried it and it's working perfectly!
>>> >
>>> > Thanks a lot! I now have a lot to play with.
>>> >
>>> > JM
>>> >
>>> > 2012/11/2, Shrijeet Paliwal <[email protected]>:
>>> >> JM,
>>> >>
>>> >> I personally would choose to put it in neither the Hadoop libs nor
>>> >> the HBase libs.
>>> >> Have them go to your application's own install directory.
>>> >>
>>> >> Then you could set the variable HADOOP_CLASSPATH to have your jar
>>> >> (also include the hbase jars, hbase dependencies, and the
>>> >> dependencies your program needs),
>>> >> and to execute, fire the 'hadoop jar' command.
>>> >>
>>> >> An example[1]:
>>> >>
>>> >> Set the classpath:
>>> >> export HADOOP_CLASSPATH=`hbase classpath`:mycool.jar:mycooldependency.jar
>>> >>
>>> >> Fire the following to launch your job:
>>> >> hadoop jar mycool.jar hbase.experiments.MyCoolProgram
>>> >> -Dmapred.running.map.limit=50
>>> >> -Dmapred.map.tasks.speculative.execution=false aCommandLineArg
>>> >>
>>> >>
>>> >> Did I get your question right?
>>> >>
>>> >> [1] In the example I gave, `hbase classpath` gets you set up with
>>> >> all the hbase jars.
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Nov 2, 2012 at 11:56 AM, Jean-Marc Spaggiari <
>>> >> [email protected]> wrote:
>>> >>
>>> >>> Hi Shrijeet,
>>> >>>
>>> >>> That helped a lot! Thanks!
>>> >>>
>>> >>> Now, the only thing I need to know is the best place to put my
>>> >>> JAR on the server. Should I put it in the hadoop lib directory? Or
>>> >>> somewhere in the HBase structure?
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> JM
>>> >>>
>>> >>> 2012/10/29, Shrijeet Paliwal <[email protected]>:
>>> >>> > Inline.
>>> >>> >
>>> >>> > On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari <
>>> >>> > [email protected]> wrote:
>>> >>> >
>>> >>> >> I'm replying to myself ;)
>>> >>> >>
>>> >>> >> I found the "cleanup" and "setup" methods on the TableMapper
>>> >>> >> class. So I think those are the methods I was looking for. I
>>> >>> >> will init the HTablePool there. Please let me know if I'm wrong.
>>> >>> >>
>>> >>> >> Now, I still have a few other questions.
>>> >>> >>
>>> >>> >> 1) context.getCurrentValue() can throw an InterruptedException,
>>> >>> >> but when can this occur? Is there a timeout on the Mapper side?
>>> >>> >> Or is it if
>>> >>> >> the region is going down while the job is running?
>>> >>> >>
>>> >>> >
>>> >>> > You do not need to call context.getCurrentValue(). The 'value'
>>> >>> > argument to the map method[1] has the information you are
>>> >>> > looking for.
>>> >>> >
>>> >>> >
>>> >>> >> 2) How can I pass parameters to the map method? Can I use
>>> >>> >> job.getConfiguration().set to add some properties there, and get
>>> >>> >> them back with context.getConfiguration().get?
>>> >>> >>
>>> >>> >
>>> >>> > Yes, that's how it is done.
>>> >>> >
>>> >>> >
>>> >>> >> 3) What's the best way to log results/exceptions/traces from the
>>> >>> >> map method?
>>> >>> >>
>>> >>> >
>>> >>> > In most cases, you'll have mapper and reducer classes as nested
>>> >>> > static classes within some enclosing class. You can get a handle
>>> >>> > to the Logger from the enclosing class and do your usual
>>> >>> > LOG.info, LOG.warn, yada yada.
>>> >>> >
>>> >>> > Hope it helps.
>>> >>> >
>>> >>> > [1] map(KEYIN key, *VALUEIN value*, Context context)
>>> >>> >
>>> >>> >>
>>> >>> >> I will search on my side, but some help would be welcome because
>>> >>> >> it seems there is not much documentation once you start to dig a
>>> >>> >> bit :(
>>> >>> >>
>>> >>> >> JM
>>> >>> >>
>>> >>> >> 2012/10/27, Jean-Marc Spaggiari <[email protected]>:
>>> >>> >> > Hi,
>>> >>> >> >
>>> >>> >> > I'm thinking about my first MapReduce class and I have some
>>> >>> >> > questions.
>>> >>> >> >
>>> >>> >> > The goal will be to move some rows from one table to another
>>> >>> >> > one based on the timestamp only.
>>> >>> >> >
>>> >>> >> > Since this is pretty new for me, I'm starting from the
>>> >>> >> > RowCounter class to have a baseline.
>>> >>> >> >
>>> >>> >> > There are a few things I will have to update.
>>> >>> >> > First, the
>>> >>> >> > createSubmittableJob method, to take a timestamp range instead
>>> >>> >> > of a key range, and to play with the parameters. This part is
>>> >>> >> > fine.
>>> >>> >> >
>>> >>> >> > Next, I need to update the map method, and this is where I
>>> >>> >> > have some questions.
>>> >>> >> >
>>> >>> >> > I'm able to find the timestamp of all the cf:c cells from the
>>> >>> >> > context.getCurrentValue() method, that's fine. Now, my concern
>>> >>> >> > is how to get access to the table to store this field, and
>>> >>> >> > the table to delete it from. Should I instantiate an HTable
>>> >>> >> > for the source table and execute a delete on it, then do an
>>> >>> >> > insert on another HTable instance? Should I use an HTablePool?
>>> >>> >> > Also, since I'm already on the row, can't I just mark it as
>>> >>> >> > deleted instead of calling a new HTable?
>>> >>> >> >
>>> >>> >> > Also, instead of calling the delete and put one by one, I
>>> >>> >> > would like to put them in a list and execute it only when it's
>>> >>> >> > over 10 members. How can I make sure that at the end of the
>>> >>> >> > job, this is flushed? Otherwise, I will lose some operations.
>>> >>> >> > Is there a kind of "dispose" method called on the region when
>>> >>> >> > the job is done?
>>> >>> >> >
>>> >>> >> > Thanks,
>>> >>> >> >
>>> >>> >> > JM
>>> >>> >> >
>>> >>> >>
>>> >>> >
>>> >>>
>>> >>
>>> >
>>>
>>
>
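[Editor's note on the timeout question that opens the thread: the "failed to report status for 601 seconds" message corresponds to the MRv1 (Hadoop 1.x) property mapred.task.timeout, which defaults to 600000 milliseconds. A minimal mapred-site.xml fragment for an 1800-second timeout would look like the sketch below; it can also be set per job on the command line with -Dmapred.task.timeout=1800000.]

```xml
<property>
  <name>mapred.task.timeout</name>
  <!-- Value is in milliseconds: 1800000 ms = 1800 s.
       0 disables the timeout entirely. -->
  <value>1800000</value>
</property>
```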
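[Editor's note: the pieces discussed in the thread, opening HTable instances in setup(), deleting and inserting from map(), batching the operations in lists, and flushing the remainder when the job ends, fit together in one mapper. The sketch below is an illustration against the 0.94-era HBase client API; the configuration keys source.table, target.table, and cutoff.ts, the batch size of 10, and the class name are all hypothetical, not from the thread.]

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

public class MoveByTimestampMapper
    extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {

  private static final Log LOG = LogFactory.getLog(MoveByTimestampMapper.class);
  private static final int BATCH_SIZE = 10;  // flush threshold from the thread

  private HTable source;
  private HTable target;
  private final List<Put> puts = new ArrayList<Put>();
  private final List<Delete> deletes = new ArrayList<Delete>();

  @Override
  protected void setup(Context context) throws IOException {
    // Table names are read back from the job configuration,
    // as discussed in the thread (hypothetical key names).
    source = new HTable(context.getConfiguration(),
        context.getConfiguration().get("source.table"));
    target = new HTable(context.getConfiguration(),
        context.getConfiguration().get("target.table"));
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    long cutoff = context.getConfiguration().getLong("cutoff.ts", 0L);
    Put put = new Put(row.get());
    boolean move = false;
    for (KeyValue kv : value.raw()) {   // every cell of the current row
      if (kv.getTimestamp() < cutoff) {
        put.add(kv);                    // carry the cell over as-is
        move = true;
      }
    }
    if (!move) {
      return;
    }
    puts.add(put);
    deletes.add(new Delete(row.get()));
    if (puts.size() >= BATCH_SIZE) {
      flush();
    }
  }

  private void flush() throws IOException {
    target.put(puts);        // insert into the destination first,
    source.delete(deletes);  // then remove from the source
    puts.clear();
    deletes.clear();
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // cleanup() is the "dispose" hook asked about in the thread:
    // flush whatever is still buffered, then release the tables.
    flush();
    LOG.info("Flushed final batch, closing tables");
    source.close();
    target.close();
  }
}
```

[The sketch requires the HBase and Hadoop client jars on the classpath and a running cluster; it is meant to show the shape of the solution, not a tested job.]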
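[Editor's note on question 2, passing parameters: as confirmed in the thread, values set on the job's Configuration in the driver come back from context.getConfiguration() in the mapper. One detail: the Configuration setter is set()/setLong(), not put(). A standalone sketch with a made-up key name, showing only the round trip through a Configuration object:]

```java
import org.apache.hadoop.conf.Configuration;

public class ConfParamDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Driver side: job.getConfiguration().setLong(...) works the same way.
    conf.setLong("cutoff.ts", 1351900800000L);
    // Mapper side: context.getConfiguration() hands back the job's
    // Configuration, so the value is read with the matching getter.
    long cutoff = conf.getLong("cutoff.ts", 0L);
    System.out.println(cutoff);
  }
}
```

[Requires hadoop-core on the classpath; in a real job the two halves live in the driver and the mapper, not in one main method.]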
