Sandor Molnar created AMBARI-23831:
--------------------------------------

             Summary: Ambari YARN Changes needed to enable CGroups + CPU 
Scheduling + LinuxContainerExecutor in both secure & Unsecure clusters
                 Key: AMBARI-23831
                 URL: https://issues.apache.org/jira/browse/AMBARI-23831
             Project: Ambari
          Issue Type: Task
          Components: ambari-server
            Reporter: Sandor Molnar
            Assignee: Sandor Molnar
             Fix For: 2.7.0


The following changes should be implemented:

1) For both secure and non-secure clusters:
 - Use LinuxContainerExecutor:
{code:java}
"yarn.nodemanager.container-executor.class" => 
"org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor"

"yarn.nodemanager.linux-container-executor.resources-handler.class" => 
"org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler"

"yarn.nodemanager.linux-container-executor.cgroups.mount" => true (assume admin 
won't mount cgroup ahead)
{code}
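As a quick sketch, the updates above can be expressed as a plain Python dict (the helper name {{lce_properties}} is illustrative, not an Ambari API):

```python
def lce_properties(mount_cgroups=True):
    """Sketch: yarn-site.xml updates for switching to LinuxContainerExecutor.
    The helper name is illustrative, not an Ambari API."""
    return {
        "yarn.nodemanager.container-executor.class":
            "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
        "yarn.nodemanager.linux-container-executor.resources-handler.class":
            "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler",
        # Assume the admin has not pre-mounted cgroups, so YARN mounts them itself.
        "yarn.nodemanager.linux-container-executor.cgroups.mount": str(mount_cgroups).lower(),
    }
```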

 - Properly set up the permissions of container-executor / container-executor.cfg 
(use the permissions currently used in secure mode).
 - Further changes:
{code:java}
"yarn.nodemanager.resource.memory.enabled"
// the default value is false, we need to set to true here to enable the 
cgroups based memory monitoring.


"yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage"
// the default value is 90.0f, which means in memory congestion case, the 
container can still keep/reserve 90% resource for its claimed value. It cannot 
be set to above 100 or set as negative value.

"yarn.nodemanager.resource.memory.cgroups.swappiness"
// The percentage that memory can be swapped or not. default value is 0, which 
means container memory cannot be swapped out. If not set, linux cgroup setting 
by default set to 60 which means 60% of memory can potentially be swapped out 
when system memory is not enough.

"yarn.nodemanager.linux-container-executor.group" set to Unix group of the 
NodeManager which should match the setting in “container-executor.cfg” (hadoop 
for ambari?).
{code}
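The range constraints described above can be sketched as a small validation helper (hypothetical function, not Ambari code; the defaults of 90.0 and 0 follow the property descriptions):

```python
def memory_cgroup_properties(soft_limit_percentage=90.0, swappiness=0):
    """Sketch: build and range-check the cgroups memory properties.
    Hypothetical helper; the ranges follow the property descriptions above."""
    # soft-limit-percentage cannot be set above 100 or to a negative value.
    if not 0 <= soft_limit_percentage <= 100:
        raise ValueError("soft-limit-percentage must be within [0, 100]")
    # swappiness is a percentage as well; 0 disables swapping of container memory.
    if not 0 <= swappiness <= 100:
        raise ValueError("swappiness must be within [0, 100]")
    return {
        "yarn.nodemanager.resource.memory.enabled": "true",
        "yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage": str(soft_limit_percentage),
        "yarn.nodemanager.resource.memory.cgroups.swappiness": str(swappiness),
    }
```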

 - For cgroups limits:
{code:java}
"yarn.nodemanager.resource.percentage-physical-cpu-limit" - 
this setting lets you limit the cpu usage of all YARN containers. It sets a 
hard upper limit on the cumulative CPU usage of the containers. For example, if 
set to 60, the combined CPU usage of all YARN containers will not exceed 60%. 
The yarn by default value is 100.

"yarn.nodemanager.resource.cpu-vcores" - number of vcores can be assign to yarn 
containers, default value is 8 for yarn, but ambari should set a proper value 
in considering of NM size, etc.

"yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage" - 
CGroups allows cpu usage limits to be hard or soft. When this setting is true, 
containers cannot use more CPU usage than allocated even if spare CPU is 
available. This ensures that containers can only use CPU that they were 
allocated. When set to false, containers can use spare CPU if available. It 
should be noted that irrespective of whether set to true or false, at no time 
can the combined CPU usage of all containers exceed the value specified in 
“yarn.nodemanager.resource.percentage-physical-cpu-limit”.
Talked with [~skumpf], we run into kernel panic when set hard limit before, so 
we should know there is risk to set this to true. May need a documentation? 
{code}
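To make the strict-mode behavior concrete, a bit of illustrative arithmetic (not Ambari code, and a simplifying assumption: strict resource usage caps each container at its vcore share of the node-wide limit):

```python
def strict_cpu_cap(physical_cpu_limit, container_vcores, nm_vcores):
    """Illustrative arithmetic only: the hard CPU cap (as % of physical CPU)
    for one container under strict-resource-usage, assuming the cap is the
    container's vcore share of the node-wide percentage limit."""
    return physical_cpu_limit * container_vcores / nm_vcores

# Example: node limited to 60% physical CPU, container holding 2 of 8 vcores
# gets a hard cap of 15% of physical CPU.
```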

2) For non-secure clusters (this needs to be done when moving from secure to 
non-secure):
 - In container-executor.cfg: remove "yarn" from the banned users 
({{banned.users}}) and set {{min.user.id}} to 50.
 - In yarn-site.xml, change:
{code:java}
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=true
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=yarn
{code}
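A sketch of the container-executor.cfg edit for the non-secure direction (a dict is used as a stand-in for the file's key=value lines; the helper name is hypothetical, not an Ambari API):

```python
def to_nonsecure(cfg):
    """Sketch: adjust container-executor.cfg entries for a non-secure cluster.
    `cfg` is a dict stand-in for the file's key=value lines; not an Ambari API."""
    # Remove "yarn" from banned.users so containers can run as the yarn user.
    banned = [u for u in cfg.get("banned.users", "").split(",") if u and u != "yarn"]
    cfg["banned.users"] = ",".join(banned)
    cfg["min.user.id"] = "50"
    return cfg
```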

3) When moving from non-secure to secure:
 - In container-executor.cfg:
Add the "yarn" user to the banned users ({{banned.users}}).
Set {{min.user.id}} to the existing Ambari default (IIRC it's 1000).
 - Revert the following yarn-site.xml configs to:
{code:java}
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=false
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=nobody
{code}
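And the reverse direction as a matching sketch (same dict stand-in for container-executor.cfg; the 1000 default is the value recalled above, and the helper name is hypothetical):

```python
def to_secure(cfg, default_min_user_id="1000"):
    """Sketch: revert container-executor.cfg entries for a secure cluster.
    Dict stand-in for the file; `default_min_user_id` follows the note above."""
    banned = [u for u in cfg.get("banned.users", "").split(",") if u]
    if "yarn" not in banned:
        banned.append("yarn")  # containers must not run as the yarn user
    cfg["banned.users"] = ",".join(banned)
    cfg["min.user.id"] = default_min_user_id
    return cfg
```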
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
