Andrew, thanks for your answer! I think you are right. The node has only 15G of memory. We configured the RS to run with a 12G heap, and then configured 4 mappers and 4 reducers on each node, each using 2G of memory (*mapred.child.java.opts* = -Xmx2048m). So that probably caused the RS to be killed by the OOM killer.

I have another question. If I change the mappers/reducers per node to 1 and lower mapred.child.java.opts to 512M, I think that will prevent the RS from being killed due to OOM. But will that cause a Java heap space problem for the mapreduce job when the WALPlayer reducer runs? I saw a post recommending that mapred.child.java.opts be increased to fix Java heap space errors in mapreduce jobs:
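The over-commit described above can be checked with simple arithmetic. A minimal sketch, using only the figures quoted in this thread (15G RAM, 12G RegionServer heap, 4 map + 4 reduce slots at -Xmx2048m each; daemon heaps like DataNode are ignored, so real pressure is even higher):

```python
# Memory-budget sketch for the node described in the thread.
# All figures come from the emails; daemon heaps (DataNode etc.)
# are left out, so the true aggregate is even larger.

ram_gb = 15                # physical RAM on the node
regionserver_heap_gb = 12  # RS -Xmx
map_slots = 4
reduce_slots = 4
task_heap_gb = 2           # mapred.child.java.opts = -Xmx2048m

task_total_gb = (map_slots + reduce_slots) * task_heap_gb
aggregate_gb = regionserver_heap_gb + task_total_gb

print(f"task heaps: {task_total_gb} GB")
print(f"aggregate:  {aggregate_gb} GB vs {ram_gb} GB RAM")
print("over budget" if aggregate_gb > ram_gb else "fits")
```

With these numbers the aggregate is 28G against 15G of RAM, which is consistent with the kernel OOM killer targeting the RegionServer.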
http://stackoverflow.com/questions/8464048/out-of-memory-error-in-hadoop

Thanks
Tian-Ying

On Tue, Jul 22, 2014 at 9:14 AM, Andrew Purtell <[email protected]> wrote:

> Accidentally hit send too soon.
>
> A good rule of thumb is that the aggregate of all Java heaps (daemons like
> DataNode, RegionServer, NodeManager, etc. + the max allowed number of
> mapreduce tasks * task heap setting) should fit into available RAM.
>
> If you don't have enough available RAM, then you need to take steps to
> reduce resource consumption. Limit the allowed number of concurrent
> mapreduce tasks, reduce the heap size specified in
> 'mapred.child.java.opts', or both.
>
> On Tue, Jul 22, 2014 at 9:12 AM, Andrew Purtell <[email protected]> wrote:
>
> > You need to better manage the colocation of the mapreduce runtime. In
> > other words, you are allowing mapreduce to grab too many node resources,
> > resulting in activation of the kernel's OOM killer.
> >
> > A good rule of thumb is the aggregate of all Java heaps (daemons like
> > DataNode, RegionServer, NodeManager, etc. + the max allowed number of
> > mapreduce tasks * task heap setting). Reduce the allowed mapreduce task
> > concurrency.
> >
> > On Tue, Jul 22, 2014 at 8:15 AM, Tianying Chang <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I was running WALPlayer to output HFiles for a future bulk load. There
> > > are 6200 hlogs, and the total size is about 400G.
> > >
> > > The mapreduce job finished, but I saw two bad things:
> > >
> > > 1. More than half of the RSs died. I checked the syslog; it seems they
> > > were killed by OOM. They also had a very high CPU spike for the whole
> > > time WALPlayer was running:
> > >
> > >     cpu user usage of 84.4% matches resource limit [cpu user usage>70.0%]
> > >
> > > 2. The mapreduce job also had Java heap space failures. My job set the
> > > heap as 2G (*mapred.child.java.opts* = -Xmx2048m).
> > >
> > > Does this mean WALPlayer cannot support this load with this kind of
> > > setting?
> > >
> > > Thanks
> > > Tian-Ying
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
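For reference, one way Andrew's advice could be put into config is below. This is a hedged sketch for an MR1/TaskTracker-era cluster (which matches the use of mapred.child.java.opts in this thread); the slot counts and the -Xmx512m value are only illustrative, taken from Tian-Ying's proposal, and should be tuned against the node's 15G budget:

```xml
<!-- mapred-site.xml sketch (MR1/TaskTracker era). Illustrative values:
     1 map + 1 reduce slot per node at -Xmx512m keeps
     12G (RS heap) + 2 x 0.5G (tasks) well under 15G of RAM. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

Whether 512M is enough for the WALPlayer reducer itself is a separate question and depends on the job; the trade-off is exactly the one Tian-Ying raises above.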
