[jira] [Commented] (YARN-8929) DefaultOOMHandler should only pick running containers to kill upon oom events

Haibo Chen (JIRA) Mon, 22 Oct 2018 12:31:10 -0700


    [ https://issues.apache.org/jira/browse/YARN-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659526#comment-16659526 ]


Haibo Chen commented on YARN-8929:
----------------------------------

The patch does three things:

1) To kill a container, read its cgroup.procs file instead of its tasks file to
determine the processes that belong to the container. The tasks file lists
every thread ID as well, whereas cgroup.procs lists only process IDs, which is
what we need here.
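A minimal Java sketch of reading PIDs from a container's cgroup.procs file
follows. The class and method names are hypothetical and not the patch's code,
and the assumption that each container has its own cgroup directory containing
cgroup.procs is labeled in the comments. The cgroup v1 documentation notes that
the cgroup.procs list is not guaranteed to be sorted or free of duplicates, so
the sketch sorts and de-duplicates it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class CgroupProcsReader {
    /**
     * Parses the contents of a cgroup.procs file: one process (thread-group) ID
     * per line. Unlike the tasks file, thread IDs of non-leader threads are not
     * listed. The list may be unsorted or contain duplicates, so a TreeSet is
     * used to sort and de-duplicate it.
     */
    public static List<Integer> parsePids(String content) {
        TreeSet<Integer> pids = new TreeSet<>();
        for (String line : content.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                pids.add(Integer.parseInt(trimmed));
            }
        }
        return new ArrayList<>(pids);
    }

    /**
     * Reads <containerCgroupDir>/cgroup.procs. The per-container directory
     * layout is an assumption for illustration.
     */
    public static List<Integer> readPids(Path containerCgroupDir) throws IOException {
        return parsePids(Files.readString(containerCgroupDir.resolve("cgroup.procs")));
    }
}
```

Each returned PID could then be signalled to kill the container's processes.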

2) Avoid killing a container that is not running yet, because killing a
non-running container won't release any memory to relieve the OOM condition.
Note that, from the NM's perspective, a container is running as soon as it is
sent to the container executor to launch, so the NM may consider a container
running even though its process has not started yet.

3) Always make sure at least one container is killed successfully. Killing a
running container may fail (one reason is mentioned in 2)), so
DefaultOOMHandler now keeps trying to kill containers until one is killed
successfully, ensuring that some memory is released.
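The retry behavior in point 3 amounts to walking the candidates in priority
order and stopping at the first successful kill. A minimal sketch, assuming a
hypothetical tryKill callback that returns true only when the container's
processes were actually signalled:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class KillUntilOneSucceeds {
    /**
     * Tries candidates in the given order and returns the first one whose kill
     * succeeds. tryKill is a hypothetical callback, not a YARN API.
     */
    public static <C> Optional<C> killOne(List<C> orderedCandidates, Predicate<C> tryKill) {
        for (C candidate : orderedCandidates) {
            if (tryKill.test(candidate)) {
                return Optional.of(candidate);
            }
            // Kill failed (e.g. the process has not started yet); try the next one.
        }
        // No container could be killed; the caller decides how to handle this.
        return Optional.empty();
    }
}
```

Returning an Optional lets the caller tell "killed one" apart from "every
attempt failed" without relying on exceptions.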

> DefaultOOMHandler should only pick running containers to kill upon oom events
> -----------------------------------------------------------------------------
>
>                 Key: YARN-8929
>                 URL: https://issues.apache.org/jira/browse/YARN-8929
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.2.0
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
>         Attachments: YARN-8929.00.patch
>
>
> DefaultOOMHandler currently sorts all known containers by execution type
> primarily and by start time secondarily.
> However, it does not check whether a container is running. Killing a
> non-running container will not release any memory, and hence won't get us out
> of the under-OOM state.
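The ordering described in the issue (execution type first, start time second)
can be sketched as a Comparator. The local ExecutionType enum stands in for
YARN's own ExecutionType to keep the example self-contained, and the Candidate
record is hypothetical. Both sort directions (OPPORTUNISTIC before GUARANTEED,
most recently started first within a type) are assumptions for illustration.
Note that nothing in this ordering checks whether a container is running, which
is the gap the issue describes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OomKillOrder {
    // Stand-in for YARN's two execution types.
    public enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    /** Hypothetical candidate: container ID, execution type and start time in ms. */
    public record Candidate(String id, ExecutionType type, long startTimeMs) {}

    /**
     * Sorts by execution type first (false < true, so OPPORTUNISTIC comes first)
     * and by start time second (latest start first). Returns a new list.
     */
    public static List<Candidate> killOrder(List<Candidate> containers) {
        List<Candidate> sorted = new ArrayList<>(containers);
        sorted.sort(Comparator
                .comparing((Candidate c) -> c.type() != ExecutionType.OPPORTUNISTIC)
                .thenComparing(Comparator.comparingLong(Candidate::startTimeMs).reversed()));
        return sorted;
    }
}
```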



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
