[
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483825#comment-13483825
]
Vinod Kumar Vavilapalli commented on YARN-3:
--------------------------------------------
bq. I think the default should be false since it's not clear what a sensible
default mount path is.
+1
bq. the sleep is necessary as sometimes the LCE reports that the container has
exited, even though the AM process has not terminated. hence, because the
process is still running, we can't remove the cgroup yet; therefore, the code
sleeps briefly.
That doesn't sound right. The LCE launches a shell which in turn launches the JVM,
so I'd think neither of them should return earlier than the JVM. We need more
information, but we can postpone this to a follow-up ticket.
bq. since the AM doesn't always have the ID of 1, what do you suggest I do to
determine whether the container has the AM or not? if there isn't a good rule,
the code can just always sleep before removing the cgroup.
We will need to augment the containerID with AM-OR-NOT information, which is a
bigger change. We can defer this to another ticket.
bq. great catch! thanks! I've made this non-fatal. now, the NM will attempt to
re-mount the cgroup, will print a message that it can't do that because it's
mounted, and everything will work, because it will simply work as in the case
where the cluster admin has already mounted the cgroups.
Sure.
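The non-fatal re-mount behaviour described above could be sketched roughly like this; the mount point, controller name, and log messages are illustrative assumptions, not the patch's actual code:

```shell
#!/bin/sh
# Rough sketch of the non-fatal re-mount path: the mount point and
# controller name here are examples only.
MOUNT_POINT=/cgroup/cpu
CONTROLLER=cpu

if grep -q " ${MOUNT_POINT} " /proc/mounts; then
  # Admin (or the OS) mounted it already: just log and carry on,
  # exactly as in the pre-mounted-cgroups case.
  echo "${CONTROLLER} already mounted at ${MOUNT_POINT}; continuing"
else
  # Attempt the mount; a failure here is logged rather than treated as fatal.
  mkdir -p "${MOUNT_POINT}" 2>/dev/null || true
  mount -t cgroup -o "${CONTROLLER}" "${CONTROLLER}" "${MOUNT_POINT}" 2>/dev/null \
    || echo "could not mount ${CONTROLLER} at ${MOUNT_POINT}; continuing anyway"
fi
```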
bq. for the hierarchy-prefix, this needs to be configurable since, in the
scenario where the admin creates the cgroups in advance, the NM doesn't have
privileges to create its own hierarchy.
Oh, yeah. You are right, we should document this in the description, saying that
if the cgroups are mounted in advance, the hierarchy-prefix should reflect what
is already mounted or else the NM may fail.
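For the pre-mounted scenario, the admin-side setup might look roughly like the following; the controller path, the "hadoop-yarn" prefix, and the "yarn" user are all illustrative assumptions, and the prefix must match whatever hierarchy-prefix the NM is configured with:

```shell
#!/bin/sh
# Hypothetical admin-side setup for pre-mounted cgroups. The NM runs
# unprivileged, so the admin creates the hierarchy and hands it over;
# the prefix must match the NM's hierarchy-prefix setting.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup/cpu}   # where the OS mounted the controller
PREFIX=${PREFIX:-hadoop-yarn}                    # must match the NM config
NM_USER=${NM_USER:-yarn}                         # user the NM runs as

if [ -d "${CGROUP_ROOT}" ] && mkdir -p "${CGROUP_ROOT}/${PREFIX}" 2>/dev/null; then
  # Give the unprivileged NM ownership so it can create per-container groups.
  chown "${NM_USER}" "${CGROUP_ROOT}/${PREFIX}" 2>/dev/null \
    || echo "chown failed; create user ${NM_USER} first"
else
  echo "cannot create ${CGROUP_ROOT}/${PREFIX}; is the controller mounted?"
fi
```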
bq. for the mount-path, this is a good question. Linux distributions mount the
cgroup controllers in various locations, so I thought it was better to keep it
configurable, since I figured it would be confusing if the OS had already
mounted some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then
the NM started mounting additional controllers in /path/nm/owns/cgroup/.
Makes sense now. Let's add some version of this too to the config description.
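Since distributions do differ here (some mount controllers under /cgroup/, newer ones under /sys/fs/cgroup/), a quick way to see what is already mounted is to read /proc/mounts:

```shell
#!/bin/sh
# List any cgroup mounts the OS (or the admin) has already set up;
# the locations vary by distribution, e.g. /cgroup vs. /sys/fs/cgroup.
grep cgroup /proc/mounts || echo "no cgroup controllers currently mounted"
```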
bq. is it better to launch a container even if we can't enforce the limits? or
is it better to prevent the container from launching? happy to make the
necessary quick change.
I think it is a fatal error if the admin wanted to use cgroups and, for some
reason, the NM cannot enforce them.
bq. if I'm reading this correctly, yes, that is what I first wanted to do when
I started this patch (see discussions at the top of this YARN-3 thread, the
early patches for MAPREDUCE-4334, and the current YARN-4). however, it seems we
have decided to go another way.
Just read the whole discussion on this ticket. I think we took the shorter
route, which I believe is not the right long-term solution. That can partly be
attributed to the original patch doing so many things. Let's discuss this in a
separate JIRA.
> Add support for CPU isolation/monitoring of containers
> ------------------------------------------------------
>
> Key: YARN-3
> URL: https://issues.apache.org/jira/browse/YARN-3
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun C Murthy
> Assignee: Andrew Ferguson
> Attachments: mapreduce-4334-design-doc.txt,
> mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch,
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch,
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch,
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch,
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch,
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch
>
>
--
This message is automatically generated by JIRA.