[
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483825#comment-13483825
]
Vinod Kumar Vavilapalli commented on YARN-3:
--------------------------------------------
bq. I think the default should be false since it's not clear what a sensible
default mount path is.
+1
bq. the sleep is necessary as sometimes the LCE reports that the container has
exited, even though the AM process has not terminated. hence, because the
process is still running, we can't remove the cgroup yet; therefore, the code
sleeps briefly.
That doesn't sound right. The LCE launches a shell which in turn launches the JVM,
so I'd think neither of them should return earlier than the JVM. We need more
information, but we can postpone this to a follow-up ticket.
bq. since the AM doesn't always have the ID of 1, what do you suggest I do to
determine whether the container has the AM or not? if there isn't a good rule,
the code can just always sleep before removing the cgroup.
We will need to augment the containerID with AM-OR-NOT information, which is a
bigger change. We can defer this to another ticket.
bq. great catch! thanks! I've made this non-fatal. now, the NM will attempt to
re-mount the cgroup, will print a message that it can't do that because it's
mounted, and everything will work, because it will simply work as in the case
where the cluster admin has already mounted the cgroups.
Sure.
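The non-fatal re-mount behaviour described above could be sketched roughly like this; the mount point, controller name, and log messages are illustrative assumptions, not the patch's actual code:

```shell
#!/bin/sh
# Rough sketch of the non-fatal re-mount path: the mount point and
# controller name here are examples only.
MOUNT_POINT=/cgroup/cpu
CONTROLLER=cpu

if grep -q " ${MOUNT_POINT} " /proc/mounts; then
  # Admin (or the OS) mounted it already: just log and carry on,
  # exactly as in the pre-mounted-cgroups case.
  echo "${CONTROLLER} already mounted at ${MOUNT_POINT}; continuing"
else
  # Attempt the mount; a failure here is logged rather than treated as fatal.
  mkdir -p "${MOUNT_POINT}" 2>/dev/null || true
  mount -t cgroup -o "${CONTROLLER}" "${CONTROLLER}" "${MOUNT_POINT}" 2>/dev/null \
    || echo "could not mount ${CONTROLLER} at ${MOUNT_POINT}; continuing anyway"
fi
```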
bq. for the hierarchy-prefix, this needs to be configurable since, in the
scenario where the admin creates the cgroups in advance, the NM doesn't have
privileges to create its own hierarchy.
Oh, yeah. You are right, we should document this in the description, saying that
if the cgroups are mounted in advance, the hierarchy-prefix should reflect what
is already mounted or else the NM may fail.
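For the pre-mounted scenario, the admin-side setup might look roughly like the following; the controller path, the "hadoop-yarn" prefix, and the "yarn" user are all illustrative assumptions, and the prefix must match whatever hierarchy-prefix the NM is configured with:

```shell
#!/bin/sh
# Hypothetical admin-side setup for pre-mounted cgroups. The NM runs
# unprivileged, so the admin creates the hierarchy and hands it over;
# the prefix must match the NM's hierarchy-prefix setting.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup/cpu}   # where the OS mounted the controller
PREFIX=${PREFIX:-hadoop-yarn}                    # must match the NM config
NM_USER=${NM_USER:-yarn}                         # user the NM runs as

if [ -d "${CGROUP_ROOT}" ] && mkdir -p "${CGROUP_ROOT}/${PREFIX}" 2>/dev/null; then
  # Give the unprivileged NM ownership so it can create per-container groups.
  chown "${NM_USER}" "${CGROUP_ROOT}/${PREFIX}" 2>/dev/null \
    || echo "chown failed; create user ${NM_USER} first"
else
  echo "cannot create ${CGROUP_ROOT}/${PREFIX}; is the controller mounted?"
fi
```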
bq. for the mount-path, this is a good question. Linux distributions mount the
cgroup controllers in various locations, so I thought it was better to keep it
configurable, since I figured it would be confusing if the OS had already
mounted some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then
the NM started mounting additional controllers in /path/nm/owns/cgroup/.
Makes sense now. Let's add some version of this too to the config description.
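Since distributions do differ here (some mount controllers under /cgroup/, newer ones under /sys/fs/cgroup/), a quick way to see what is already mounted is to read /proc/mounts:

```shell
#!/bin/sh
# List any cgroup mounts the OS (or the admin) has already set up;
# the locations vary by distribution, e.g. /cgroup vs. /sys/fs/cgroup.
grep cgroup /proc/mounts || echo "no cgroup controllers currently mounted"
```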
bq. is it better to launch a container even if we can't enforce the limits? or
is it better to prevent the container from launching? happy to make the
necessary quick change.
I think it is a fatal error if the admin wanted to use cgroups and, for some
reason, the NM cannot enforce them.
bq. if I'm reading this correctly, yes, that is what I first wanted to do when
I started this patch (see discussions at the top of this YARN-3 thread, the
early patches for MAPREDUCE-4334, and the current YARN-4). however, it seems we
have decided to go another way.
Just read the whole discussion on this ticket. I think we took the shorter
route, which I believe is not the right long-term solution. That can partly be
attributed to the original patch doing so many things. Let's discuss this in a
separate JIRA.
> Add support for CPU isolation/monitoring of containers
> ------------------------------------------------------
>
> Key: YARN-3
> URL: https://issues.apache.org/jira/browse/YARN-3
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun C Murthy
> Assignee: Andrew Ferguson
> Attachments: mapreduce-4334-design-doc.txt,
> mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch,
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch,
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch,
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch,
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch,
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch
>
>
--
This message is automatically generated by JIRA.