[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483412#comment-13483412 ]

Andrew Ferguson commented on YARN-3:
------------------------------------

Thanks for the review [~vinodkv]. I'll post an updated patch on YARN-147. 
There's a lot of food for thought here (design questions), so here are some 
comments:

bq. yarn.nodemanager.linux-container-executor.cgroups.mount has different 
defaults in code and in yarn-default.xml

Yeah -- personally, I think the default should be false, since it's not clear 
what a sensible default mount path would be. I had changed the line in the code 
in response to Tucu's comment [1], but I'm changing it back to false, since 
true doesn't seem sensible to me without such a path. If anyone in the 
community has a good default mount path, we can certainly change the default to 
true in both the code and yarn-default.xml :-/
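
For concreteness, here is a minimal sketch of the pattern: the constant and 
method names below are mine (only the property name comes from this 
discussion), and the point is simply that the in-code default has to agree 
with the <value> in yarn-default.xml, or behavior depends on which one gets 
read:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class CgroupsMountConfig {
  // Property name from this discussion; the boolean default here must match
  // the <value> entry in yarn-default.xml.
  static final String CGROUPS_MOUNT =
      "yarn.nodemanager.linux-container-executor.cgroups.mount";
  static final boolean DEFAULT_CGROUPS_MOUNT = false; // keep in sync with XML

  static boolean shouldMountCgroups(Configuration conf) {
    return conf.getBoolean(CGROUPS_MOUNT, DEFAULT_CGROUPS_MOUNT);
  }
}
{code}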

bq. Can you explain this? Is this sleep necessary? Depending on its importance, 
we'll need to fix the following ID check; AMs don't always have an ID of one.

The sleep is necessary because the LCE sometimes reports that the container has 
exited even though the AM process has not yet terminated. Since the process is 
still running, we can't remove the cgroup yet, so the code sleeps briefly 
before removing it.
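
To make the intent concrete, here is a rough sketch of the delete-with-delay 
idea (the retry bound and sleep length are placeholders of mine, not what the 
patch uses):

{code:java}
import java.io.File;

public class CgroupCleanupSketch {
  // The LCE can report container exit slightly before the process is gone,
  // and removing a cgroup directory that still has live tasks fails. So we
  // retry the delete a few times, sleeping briefly between attempts.
  static boolean deleteCgroup(File cgroupDir) throws InterruptedException {
    final int maxAttempts = 5;     // placeholder bound
    final long sleepMs = 500;      // placeholder delay
    for (int i = 0; i < maxAttempts; i++) {
      if (cgroupDir.delete()) {    // succeeds once the cgroup has no tasks
        return true;
      }
      Thread.sleep(sleepMs);       // give the lingering process time to exit
    }
    return false;                  // caller decides whether this is fatal
  }
}
{code}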

Since the AM doesn't always have an ID of 1, what do you suggest I use to 
determine whether the container holds the AM? If there isn't a good rule, the 
code can simply always sleep before removing the cgroup.

bq. container-executor.c: If a mount-point is already mounted, mount gives an 
EBUSY error, so mount_cgroup() will need to be fixed to support remounts (e.g. 
on NM restarts). We could unmount the cgroup fs on shutdown, but that isn't 
always guaranteed.

Great catch, thanks! I've made this non-fatal. Now the NM will attempt to 
re-mount the cgroup, print a message that it can't because it's already 
mounted, and then proceed exactly as in the case where the cluster admin has 
already mounted the cgroups.
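
The actual change lives in container-executor.c, but the idea, sketched here in 
Java for brevity, is to treat "already mounted" as a benign condition (the 
/proc/mounts parsing below is my illustration, not the patch itself):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CgroupMountCheck {
  // If the target path already has a cgroup filesystem on it, a remount
  // attempt fails with EBUSY; that case should just be logged and skipped,
  // exactly as when the admin pre-mounted the controllers.
  static boolean isCgroupMounted(String mountPath) throws IOException {
    for (String line : Files.readAllLines(Paths.get("/proc/mounts"))) {
      String[] f = line.split("\\s+"); // device, mountpoint, fstype, opts...
      if (f.length >= 3 && f[1].startsWith(mountPath)
          && f[2].equals("cgroup")) {
        return true;
      }
    }
    return false;
  }
}
{code}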

bq. Not sure of the benefit of configurable 
yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just 
always mount to a path that it creates and owns? Similar comment for the 
hierarchy-prefix.

For the hierarchy-prefix, this needs to be configurable: in the scenario where 
the admin creates the cgroups in advance, the NM doesn't have the privileges to 
create its own hierarchy.

For the mount-path, this is a good question. Linux distributions mount the 
cgroup controllers in various locations, so I thought it was better to keep it 
configurable; I figured it would be confusing if the OS had already mounted 
some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then the NM 
started mounting additional controllers in /path/nm/owns/cgroup/.
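
As an illustration of the confusion this avoids (the exact property names below 
are assumptions based on this discussion), pointing the NM at the distro's 
existing mount point keeps everything under one tree:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class CgroupsPathExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Reuse the OS's existing mount point rather than creating a second tree:
    conf.set("yarn.nodemanager.linux-container-executor.cgroups.mount-path",
        "/sys/fs/cgroup");
    // Hierarchy pre-created by the admin and writable by the NM
    // (property name assumed for illustration):
    conf.set("yarn.nodemanager.linux-container-executor.cgroups.hierarchy-prefix",
        "hadoop-yarn");
  }
}
{code}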

bq. CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple 
places - updateCgroup() and createCgroup(). In the latter, if cgroups are 
enabled and we can't create the file, isn't that a critical error?

I'm fine either way. What would people prefer to see? Is it better to launch a 
container even if we can't enforce the limits, or better to prevent the 
container from launching? Happy to make the necessary quick change.
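
To make the two options concrete, here is a sketch (the flag and method names 
are mine) of how createCgroup() could surface the choice instead of swallowing 
the error:

{code:java}
import java.io.File;
import java.io.IOException;

public class CgroupCreateSketch {
  // strict = fail the container launch if we can't set up enforcement;
  // lenient = launch anyway, just without resource limits.
  static void createCgroup(File cgroupDir, boolean strict) throws IOException {
    if (!cgroupDir.mkdirs()) {
      String msg = "Failed to create cgroup at " + cgroupDir;
      if (strict) {
        throw new IOException(msg);  // critical error: abort the launch
      }
      System.err.println("WARN: " + msg + "; launching without limits");
    }
  }
}
{code}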

bq. Make ResourcesHandler top level. I'd like to merge the ContainersMonitor 
functionality with this so as to monitor/enforce memory limits also. 
ContainersMonitor is top-level; we should make ResourcesHandler top-level too, 
so that other platforms don't need to create this type-hierarchy all over again 
when they wish to implement some or all of this functionality.

If I'm reading this correctly, yes, that is what I first wanted to do when I 
started this patch (see the discussion at the top of this YARN-3 thread, the 
early patches for MAPREDUCE-4334, and the current YARN-4). However, it seems we 
have decided to go another way.



thank you,
Andrew


[1] 
https://issues.apache.org/jira/browse/YARN-147?focusedCommentId=13470926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13470926
                
> Add support for CPU isolation/monitoring of containers
> ------------------------------------------------------
>
>                 Key: YARN-3
>                 URL: https://issues.apache.org/jira/browse/YARN-3
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun C Murthy
>            Assignee: Andrew Ferguson
>         Attachments: mapreduce-4334-design-doc.txt, 
> mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch
>
>

