[
https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198560#comment-14198560
]
Nathan Roberts commented on YARN-2809:
--------------------------------------
Stack trace:
{noformat}
[<ffffffff8150d4a8>] ? panic+0xa7/0x16f
[<ffffffff815116d4>] ? oops_end+0xe4/0x100
[<ffffffff81046bfb>] ? no_context+0xfb/0x260
[<ffffffff81449058>] ? dev_hard_start_xmit+0x308/0x530
[<ffffffff81046e85>] ? __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff812773a9>] ? cpumask_next_and+0x29/0x50
[<ffffffff81046f53>] ? bad_area_nosemaphore+0x13/0x20
[<ffffffff810476b1>] ? __do_page_fault+0x321/0x480
[<ffffffff81056881>] ? update_curr+0xe1/0x1f0
[<ffffffff81065905>] ? enqueue_entity+0x125/0x410
[<ffffffff810524e3>] ? set_next_buddy+0x43/0x50
[<ffffffff810570e0>] ? check_preempt_wakeup+0x1c0/0x260
[<ffffffff81065ceb>] ? enqueue_task_fair+0xfb/0x100
[<ffffffff8105230c>] ? check_preempt_curr+0x7c/0x90
[<ffffffff815135fe>] ? do_page_fault+0x3e/0xa0
[<ffffffff815109b5>] ? page_fault+0x25/0x30
[<ffffffff81056b19>] ? update_cfs_shares+0x29/0x170
[<ffffffff81065363>] ? dequeue_entity+0x113/0x2e0
[<ffffffff810664da>] ? dequeue_task_fair+0x6a/0x130
[<ffffffff81055ebe>] ? dequeue_task+0x8e/0xb0
[<ffffffff81055f03>] ? deactivate_task+0x23/0x30
[<ffffffff8150dc99>] ? thread_return+0x127/0x76e
[<ffffffff810e6e1e>] ? call_rcu+0xe/0x10
[<ffffffff8107196f>] ? release_task+0x33f/0x4b0
[<ffffffff81073837>] ? do_exit+0x5b7/0x870
[<ffffffff81073b48>] ? do_group_exit+0x58/0xd0
[<ffffffff81088e36>] ? get_signal_to_deliver+0x1f6/0x460
[<ffffffff8100a265>] ? do_signal+0x75/0x800
[<ffffffff810dc675>] ? __audit_syscall_exit+0x265/0x290
[<ffffffff8100aa80>] ? do_notify_resume+0x90/0xc0
[<ffffffff8100b341>] ? int_signal+0x12/0x17
{noformat}
What's happening is that CgroupsLCEResourcesHandler attempts to delete the
cgroup before all the tasks within the cgroup have exited (explained later). It
retries every 20ms until the removal succeeds or a timeout (default 1
second) expires. Sometimes these attempts hit a race within the kernel where
the last task has not completely finished tearing down, yet is far enough
along that the cgroup can be removed. This leaves a NULL pointer behind,
which results in the panic.
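For readers unfamiliar with the retry behavior described above, here is a
minimal sketch of that kind of loop. It is not the actual
CgroupsLCEResourcesHandler code: the class name, method name, and parameters
are made up for illustration, and it assumes that removing a cgroup amounts to
deleting its directory.

```java
import java.io.File;

// Hypothetical illustration only; not the Hadoop implementation.
final class CgroupDeleteRetry {

    /**
     * Tries to remove the directory every intervalMs milliseconds until the
     * removal succeeds or roughly timeoutMs milliseconds have elapsed.
     * Returns true if the directory was removed.
     */
    static boolean deleteWithRetry(File cgroupDir, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        do {
            // On a cgroup filesystem this maps to rmdir, which the kernel
            // refuses while it still considers tasks attached to the cgroup.
            if (cgroupDir.delete()) {
                return true;
            }
            Thread.sleep(intervalMs);
        } while (System.nanoTime() < deadline);
        return false;
    }
}
```

With the values mentioned above this would be called as
deleteWithRetry(dir, 20, 1000). Note that nothing in this loop checks whether
tasks are still present before attempting the removal, which is what exposes
the race.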
The kernel has been fixed and most recent distributions will have the fix.
However, there are older kernel versions out there that would benefit from a
simple workaround. The proposed workaround is to wait until the "tasks" file
within the cgroup is empty, and then delay a small amount of time before
attempting to delete the cgroup.
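A rough sketch of what such a workaround could look like follows. This is not
the patch itself: the class, method names, and the poll, settle, and timeout
parameters are assumptions chosen for illustration. The only details taken
from the description above are the "tasks" file, the wait for it to become
empty, and the short delay before deleting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not the actual YARN-2809 patch.
final class CgroupWorkaroundSketch {

    /** Returns true if the cgroup's "tasks" file lists no task ids. */
    static boolean tasksFileIsEmpty(Path cgroupDir) throws IOException {
        byte[] raw = Files.readAllBytes(cgroupDir.resolve("tasks"));
        return new String(raw, StandardCharsets.UTF_8).trim().isEmpty();
    }

    /**
     * Polls the "tasks" file every pollMs milliseconds. Once it is empty,
     * sleeps settleMs milliseconds and then attempts the removal. Gives up
     * and returns false, without attempting removal, if the file is still
     * non-empty after roughly timeoutMs milliseconds.
     */
    static boolean deleteWhenEmpty(Path cgroupDir, long pollMs, long settleMs,
                                   long timeoutMs)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!tasksFileIsEmpty(cgroupDir)) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(pollMs);
        }
        // Small delay so the last task can finish tearing down in the kernel.
        Thread.sleep(settleMs);
        return cgroupDir.toFile().delete();
    }
}
```

The settle delay is a heuristic: it narrows the window for the race but does
not close it, so a suitable value would have to be found by testing on
affected kernels.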
One question is why there are still tasks in the cgroup. I don't have a
complete answer here and some of the details may be slightly off, but I do know
the following: the process tree within a mapreduce cgroup looks like
"bash -c" -> "java ...".
When map or reduce processing is complete, the AM is informed, and it in turn
informs the NM so that the container can be torn down. A SIGTERM is sent to the
session (bash is the session leader). bash exits much more quickly than
everything else, so its parent (container-executor) gets a SIGCHLD and starts
cleaning up. That cleanup includes removing the cgroup, which gets us into the
race described above.
> Implement workaround for linux kernel panic when removing cgroup
> ----------------------------------------------------------------
>
> Key: YARN-2809
> URL: https://issues.apache.org/jira/browse/YARN-2809
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.6.0
> Environment: RHEL 6.4
> Reporter: Nathan Roberts
> Assignee: Nathan Roberts
>
> Some older versions of linux have a bug that can cause a kernel panic when
> the LCE attempts to remove a cgroup. It is a race condition so it's a bit
> rare but on a few thousand node cluster it can result in a couple of panics
> per day.
> This is the commit that likely (haven't verified) fixes the problem in linux:
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267
> Details will be added in comments.