[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570037#comment-14570037
 ] 

Sidharta Seethana commented on YARN-2194:
-----------------------------------------

There are two separate issues here:

* The container-executor binary invocation uses ‘,’ as the separator when 
supplying a list of paths, which breaks when a path itself contains ‘,’.
* cpu and cpuacct are mounted together by default on RHEL7.
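
To illustrate the first issue, here is a minimal Java sketch of what happens when a path containing ‘,’ goes through a comma-separated list. The class and helper names ({{SeparatorDemo}}, {{joinPaths}}, {{splitPaths}}) and the sample path below the mount point are hypothetical, for illustration only; this is not the actual container-executor code.

```java
import java.util.Arrays;
import java.util.List;

class SeparatorDemo {
    // Hypothetical helper: builds one argument by joining paths with ','.
    static String joinPaths(List<String> paths) {
        return String.join(",", paths);
    }

    // Hypothetical helper: the receiving side naively splits on ','.
    static List<String> splitPaths(String arg) {
        return Arrays.asList(arg.split(","));
    }

    public static void main(String[] args) {
        // Assumed example path under the combined cpu,cpuacct mount point.
        List<String> sent = Arrays.asList(
            "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_01/tasks");

        List<String> received = splitPaths(joinPaths(sent));

        System.out.println("sent " + sent.size()
            + " path(s), receiver sees " + received.size());
        for (String p : received) {
            System.out.println("  " + p);
        }
    }
}
```

The single path comes back as two fragments, neither of which is a valid path, so any separator that can legally appear inside a path has this problem.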

Now, for the latter issue: in {{CgroupsLCEResourcesHandler}}, the following 
steps occur:

* If the {{yarn.nodemanager.linux-container-executor.cgroups.mount}} switch is 
enabled, the ‘cpu’ controller is explicitly mounted at the specified path. 
* Irrespective of the state of the switch, the {{/proc/mounts}} file (possibly 
updated by the previous step) is then parsed to determine the mount 
locations of the various cgroup controllers. This parsing code seems to be 
correct even if cpu and cpuacct are mounted in one location.
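
The parsing step can be sketched roughly as follows. This is a simplified illustration, not the code in {{CgroupsLCEResourcesHandler}}: the class and method names are hypothetical, and the sample lines are an assumed example of what {{/proc/mounts}} might contain on a system with cpu and cpuacct co-mounted. The idea is that the controller names appear in the mount options field, so the lookup does not depend on the mount point's name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class MountTableSketch {
    // Hypothetical helper: returns the mount point of the first cgroup
    // filesystem whose options list contains the given controller name.
    static Optional<String> findControllerMount(List<String> mountLines,
                                                String controller) {
        for (String line : mountLines) {
            // Fields: device, mount point, fs type, options, dump, pass.
            String[] f = line.trim().split("\\s+");
            if (f.length < 4 || !f[2].equals("cgroup")) {
                continue; // not a cgroup mount
            }
            // Options are comma-separated; controller names are among them.
            if (Arrays.asList(f[3].split(",")).contains(controller)) {
                return Optional.of(f[1]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed sample content, for illustration only.
        List<String> lines = Arrays.asList(
            "tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0",
            "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0",
            "cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0");

        System.out.println("cpu     -> " + findControllerMount(lines, "cpu"));
        System.out.println("cpuacct -> " + findControllerMount(lines, "cpuacct"));
    }
}
```

With input like this, both cpu and cpuacct resolve to the same mount point, which is why co-mounting by itself need not break the lookup.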

So the thing we need to fix is the separator issue, and then we should be good. 
The important thing to remember is that there are *two* cgroups implementation 
classes ({{CgroupsLCEResourcesHandler}} and {{CGroupsHandlerImpl}}). 
Hopefully this duplication will be addressed soon (YARN-3542); otherwise we 
risk divergence. 


> Cgroups cease to work in RHEL7
> ------------------------------
>
>                 Key: YARN-2194
>                 URL: https://issues.apache.org/jira/browse/YARN-2194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>            Priority: Critical
>         Attachments: YARN-2194-1.patch, YARN-2194-2.patch, YARN-2194-3.patch
>
>
> In RHEL7, the CPU controller is named "cpu,cpuacct". The comma in the 
> controller name leads to container launch failure. 
> RHEL7 deprecates libcgroup and recommends the use of systemd. However, 
> systemd has certain shortcomings as identified in this JIRA (see comments). 
> This JIRA only fixes the failure, and doesn't try to use systemd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
