Accidentally hit send too soon. A good rule of thumb is that the
aggregate of all Java heaps (daemons like DataNode, RegionServer,
NodeManager, etc., plus the max allowed number of concurrent mapreduce
tasks * the per-task heap setting) should fit into the RAM available on
the node.

If you don't have enough available RAM, then you need to take steps to
reduce resource consumption: limit the allowed number of concurrent
mapreduce tasks, reduce the heap size specified in
'mapred.child.java.opts', or do both.
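For a concrete illustration of that budget (hypothetical numbers, not
from Tian-Ying's cluster), take a node with 32G of RAM:

  DataNode heap                            1G
  RegionServer heap                        8G
  NodeManager heap                         1G
  12 concurrent tasks * 2G (-Xmx2048m)    24G
  --------------------------------------------
  total Java heap                         34G  >  32G RAM

That oversubscription is exactly what invites the kernel's OOM killer.
Dropping to 8 concurrent tasks at 1G each brings the task footprint to
8G and the total to 18G, which fits comfortably.

As a sketch of both knobs in mapred-site.xml (assuming classic MR1
property names; on YARN the analogous caps are
yarn.nodemanager.resource.memory-mb and mapreduce.{map,reduce}.memory.mb):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>    <!-- at most 4 concurrent map tasks per node -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>    <!-- at most 2 concurrent reduce tasks per node -->
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>    <!-- smaller per-task heap -->
  </property>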
On Tue, Jul 22, 2014 at 9:12 AM, Andrew Purtell <[email protected]> wrote:

> You need to better manage the colocation of the mapreduce runtime. In
> other words, you are allowing mapreduce to grab too many node resources,
> resulting in activation of the kernel's OOM killer.
>
> A good rule of thumb is the aggregate of all Java heaps (daemons like
> DataNode, RegionServer, NodeManager, etc. + the max allowed number of
> mapreduce tasks * task heap setting). Reduce the allowed mapreduce task
> concurrency.
>
>
> On Tue, Jul 22, 2014 at 8:15 AM, Tianying Chang <[email protected]> wrote:
>
>> Hi,
>>
>> I was running WALPlayer to output HFiles for a future bulk load. There
>> are 6200 hlogs, and the total size is about 400G.
>>
>> The mapreduce job finished, but I saw two bad things:
>>
>> 1. More than half of the RegionServers died. I checked the syslog; it
>> seems they were killed by the OOM killer. They also had very high CPU
>> spikes the whole time WALPlayer was running:
>>
>>   cpu user usage of 84.4% matches resource limit [cpu user usage>70.0%]
>>
>> 2. The mapreduce job also had Java heap space failures. My job set the
>> task heap to 2G:
>>
>>   mapred.child.java.opts = -Xmx2048m
>>
>> Does this mean WALPlayer cannot support this load with this kind of
>> setting?
>>
>> Thanks,
>> Tian-Ying

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
