Andrew, thanks for your answer! I think you are right. The node has only 15G of memory. We configured the RS to run with a 12G heap, and then configured 4 mappers and 4 reducers on each node, each using 2G of memory (*mapred.child.java.opts* = -Xmx2048m). So that probably caused the RS to be killed by the OOM killer.

I have another question. If I change the mappers/reducers per node to 1 and lower mapred.child.java.opts to 512M, I think that will prevent the RS from being killed due to OOM. But will that cause a Java heap space problem for the mapreduce job when the WALPlayer reducer runs? I saw a post recommending that mapred.child.java.opts be increased to fix Java heap space errors in mapreduce jobs:
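The over-commit described above can be checked with simple arithmetic. A minimal sketch, using only the figures quoted in this thread (15G RAM, 12G RegionServer heap, 4 map + 4 reduce slots at -Xmx2048m each; daemon heaps like DataNode are ignored, so real pressure is even higher):

```python
# Memory-budget sketch for the node described in the thread.
# All figures come from the emails; daemon heaps (DataNode etc.)
# are left out, so the true aggregate is even larger.

ram_gb = 15                # physical RAM on the node
regionserver_heap_gb = 12  # RS -Xmx
map_slots = 4
reduce_slots = 4
task_heap_gb = 2           # mapred.child.java.opts = -Xmx2048m

task_total_gb = (map_slots + reduce_slots) * task_heap_gb
aggregate_gb = regionserver_heap_gb + task_total_gb

print(f"task heaps: {task_total_gb} GB")
print(f"aggregate:  {aggregate_gb} GB vs {ram_gb} GB RAM")
print("over budget" if aggregate_gb > ram_gb else "fits")
```

With these numbers the aggregate is 28G against 15G of RAM, which is consistent with the kernel OOM killer targeting the RegionServer.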
http://stackoverflow.com/questions/8464048/out-of-memory-error-in-hadoop

Thanks
Tian-Ying

On Tue, Jul 22, 2014 at 9:14 AM, Andrew Purtell <[email protected]> wrote:

> Accidentally hit send too soon.
>
> A good rule of thumb is that the aggregate of all Java heaps (daemons like
> DataNode, RegionServer, NodeManager, etc. + the max allowed number of
> mapreduce tasks * task heap setting) should fit into available RAM.
>
> If you don't have enough available RAM, then you need to take steps to
> reduce resource consumption. Limit the allowed number of concurrent
> mapreduce tasks, reduce the heap size specified in
> 'mapred.child.java.opts', or both.
>
> On Tue, Jul 22, 2014 at 9:12 AM, Andrew Purtell <[email protected]> wrote:
>
> > You need to better manage the colocation of the mapreduce runtime. In
> > other words, you are allowing mapreduce to grab too many node resources,
> > resulting in activation of the kernel's OOM killer.
> >
> > A good rule of thumb is the aggregate of all Java heaps (daemons like
> > DataNode, RegionServer, NodeManager, etc. + the max allowed number of
> > mapreduce tasks * task heap setting). Reduce the allowed mapreduce task
> > concurrency.
> >
> > On Tue, Jul 22, 2014 at 8:15 AM, Tianying Chang <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I was running WALPlayer to output HFiles for a future bulk load. There
> > > are 6200 hlogs, and the total size is about 400G.
> > >
> > > The mapreduce job finished, but I saw two bad things:
> > >
> > > 1. More than half of the RSs died. I checked the syslog; it seems they
> > > were killed by OOM. They also had a very high CPU spike for the whole
> > > time WALPlayer was running:
> > >
> > >     cpu user usage of 84.4% matches resource limit [cpu user usage>70.0%]
> > >
> > > 2. The mapreduce job also had Java heap space failures. My job set the
> > > heap as 2G (*mapred.child.java.opts* = -Xmx2048m).
> > >
> > > Does this mean WALPlayer cannot support this load with this kind of
> > > setting?
> > >
> > > Thanks
> > > Tian-Ying
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
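For reference, one way Andrew's advice could be put into config is below. This is a hedged sketch for an MR1/TaskTracker-era cluster (which matches the use of mapred.child.java.opts in this thread); the slot counts and the -Xmx512m value are only illustrative, taken from Tian-Ying's proposal, and should be tuned against the node's 15G budget:

```xml
<!-- mapred-site.xml sketch (MR1/TaskTracker era). Illustrative values:
     1 map + 1 reduce slot per node at -Xmx512m keeps
     12G (RS heap) + 2 x 0.5G (tasks) well under 15G of RAM. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

Whether 512M is enough for the WALPlayer reducer itself is a separate question and depends on the job; the trade-off is exactly the one Tian-Ying raises above.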
