It may be a kernel bug. See https://bugs.centos.org/print_bug_page.php?bug_id=7770
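
Until the kernel is updated, the workaround described in the quoted message (using DefaultLCEResourcesHandler instead of CgroupsLCEResourcesHandler) would look roughly like this in yarn-site.xml on each NodeManager. This is only a sketch: it assumes the stock Hadoop 2.7 property name and class package, so check it against your distribution before applying it.

```xml
<!-- yarn-site.xml (sketch; assumes stock Hadoop 2.7 names) -->
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <!-- Swapped from ...util.CgroupsLCEResourcesHandler as a workaround -->
  <value>org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler</value>
</property>
```

With this handler, containers are no longer placed in cgroups, so CPU limits are not enforced. The NodeManagers need a restart for the change to take effect.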

From: spe...@outlook.com
To: user@hadoop.apache.org
Subject: System auto reboot When MR runs
Date: Sun, 24 Apr 2016 13:07:31 +0000

Hi,

I'm using Hadoop 2.7 with cgroups enabled on Red Hat 7.1. When I run large MR jobs, some NodeManager machines reboot automatically. If I use DefaultLCEResourcesHandler instead of CgroupsLCEResourcesHandler, the MR jobs run fine.

/var/crash/127.0.0.1-2016.04.23-21:52:08/vmcore-dmesg.txt looks like this:

CPU: 29 PID: 63957 Comm: java Not tainted 3.10.0-229.el7.x86_64 #1
... ...
[15770.097168] Call Trace:
[15770.097536] [<ffffffff810afe39>] ? pick_next_task_fair+0x129/0x1d0
[15770.097905] [<ffffffff81608b97>] __schedule+0x127/0x7c0
[15770.098271] [<ffffffff81609259>] schedule+0x29/0x70
[15770.098633] [<ffffffff810d2293>] futex_wait_queue_me+0xd3/0x130
[15770.098992] [<ffffffff810d2e09>] futex_wait+0x179/0x280
[15770.099353] [<ffffffff8101b983>] ? native_sched_clock+0x13/0x80
[15770.099698] [<ffffffff8101b9f9>] ? sched_clock+0x9/0x10
[15770.100057] [<ffffffff810addfe>] ? sched_slice.isra.51+0x5e/0xc0
[15770.100419] [<ffffffff810ad7b8>] ? __enqueue_entity+0x78/0x80
[15770.100783] [<ffffffff810d4e9e>] do_futex+0xfe/0x5b0
[15770.101143] [<ffffffff810a8f44>] ? wake_up_new_task+0x104/0x160
[15770.101496] [<ffffffff810d53d0>] SyS_futex+0x80/0x180
[15770.101852] [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b

Any suggestions would be appreciated.

Thanks