[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299606#comment-15299606 ]

sandflee commented on YARN-4599:
--------------------------------

Thanks [~kasha],
bq. Any metrics on how often the NM has to intervene?
It depends on how busy the NM is and how users size their container memory. In our 
cluster of about 5000 NMs, the average NM resource allocation rate is 90%, while the 
actual memory usage rate is about 60%; most users request more memory than their 
containers really use. Without this improvement we saw several thousand container 
OOM kill events; with the fix, no more than ten.

bq. how long does it take for the NM to unblock containers once everything is 
paused?
We haven't paid much attention to this. The NM usually handles the OOM event very 
quickly; most of the time is probably spent by the kernel freeing page cache / 
container memory, but we didn't collect statistics on it.

bq. Oh, and any interest in contributing to do this natively in YARN?
Yes, I'd like to update our patch within one or two weeks.


> Set OOM control for memory cgroups
> ----------------------------------
>
>                 Key: YARN-4599
>                 URL: https://issues.apache.org/jira/browse/YARN-4599
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.9.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroup enforcement support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.
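For the description above, a minimal sketch of what "explicitly set OOM control" could look like on cgroup v1: writing "1" to the container cgroup's memory.oom_control sets oom_kill_disable so tasks are paused rather than killed when they exceed their limit, with memory.swappiness as the swap-based alternative the description mentions. The /sys/fs/cgroup/memory/hadoop-yarn path and the container id are placeholders, not the actual NM layout:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Illustrative sketch only: disable the kernel OOM killer for one
 * container's memory cgroup (cgroup v1) so its tasks are paused instead
 * of killed when they go over their limit. Paths are assumptions.
 */
public class OomControlSketch {

  // Hypothetical cgroup v1 mount point and YARN hierarchy prefix.
  private static final String MEMORY_CGROUP_ROOT =
      "/sys/fs/cgroup/memory/hadoop-yarn";

  /** Write "1" to memory.oom_control to set oom_kill_disable for the cgroup. */
  public static void disableOomKiller(String containerId) throws IOException {
    Path oomControl =
        Paths.get(MEMORY_CGROUP_ROOT, containerId, "memory.oom_control");
    Files.write(oomControl, "1".getBytes(StandardCharsets.US_ASCII));
  }

  /**
   * The alternative mentioned in the description: lower memory.swappiness
   * for the cgroup. Only meaningful on hosts that actually have swap.
   */
  public static void setSwappiness(String containerId, int value)
      throws IOException {
    Path swappiness =
        Paths.get(MEMORY_CGROUP_ROOT, containerId, "memory.swappiness");
    Files.write(swappiness,
        Integer.toString(value).getBytes(StandardCharsets.US_ASCII));
  }

  public static void main(String[] args) throws IOException {
    // Requires root and an existing cgroup; the container id is made up.
    disableOomKiller("container_1453000000000_0001_01_000002");
  }
}
{code}

With oom_kill_disable set, an over-limit cgroup reports under_oom until the NM (or the kernel, by reclaiming page cache) frees enough memory, which is the intervention discussed in the comment above.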
