[
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483412#comment-13483412
]
Andrew Ferguson commented on YARN-3:
------------------------------------
Thanks for the review [~vinodkv]. I'll post an updated patch on YARN-147.
There's a lot of food for thought here (design questions), so here are some
comments:
bq. yarn.nodemanager.linux-container-executor.cgroups.mount has different
defaults in code and in yarn-default.xml
Yeah -- personally, I think the default should be false, since it's not clear
what a sensible default mount path is. I had changed the line in the code in
response to Tucu's comment [1], but I'm changing it back to false, since true
doesn't seem sensible to me. If anyone in the community has a sensible default
mount path, then we can certainly change the default to true in both the code
and yarn-default.xml. :-/
bq. Can you explain this? Is this sleep necessary. Depending on its importance,
we'll need to fix the following Id check, AMs don't always have ID equaling one.
The sleep is necessary because the LCE sometimes reports that the container has
exited even though the AM process has not yet terminated. Since the process is
still running, we can't remove the cgroup yet, so the code sleeps briefly.
Since the AM doesn't always have an ID of 1, what do you suggest I do to
determine whether the container holds the AM or not? If there isn't a good
rule, the code can simply always sleep before removing the cgroup.
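To make the "sleep before removing the cgroup" idea concrete, here is a minimal sketch in Java. The class name, method name, and the deadline/sleep values are all illustrative, not YARN's actual code; the point is only that the kernel refuses to rmdir a cgroup directory while tasks are still attached, so deletion is retried with brief sleeps:

```java
import java.io.File;

public class CgroupCleanup {
    // Hypothetical sketch: retry deleting a container's cgroup directory,
    // since rmdir fails while any task is still attached to the cgroup.
    // deadlineMs and sleepMs are illustrative values, not YARN's settings.
    public static boolean deleteWithRetry(File cgroupDir, long deadlineMs,
                                          long sleepMs)
            throws InterruptedException {
        long start = System.currentTimeMillis();
        while (System.currentTimeMillis() - start < deadlineMs) {
            // Succeeds once the last process in the cgroup has exited.
            if (!cgroupDir.exists() || cgroupDir.delete()) {
                return true;
            }
            Thread.sleep(sleepMs);  // brief sleep before retrying
        }
        return false;  // caller decides whether to log or give up
    }
}
```

With a rule like this, always sleeping/retrying works for every container, AM or not, so no AM-ID check is needed.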
bq. container-executor.c: If a mount-point is already mounted, mount gives a
EBUSY error, mount_cgroup() will need to be fixed to support remounts (for e.g.
on NM restarts). We could unmount cgroup fs on shutdown but that isn't always
guaranteed.
Great catch! Thanks! I've made this non-fatal. Now the NM will attempt to
re-mount the cgroup, print a message that it can't because it's already
mounted, and carry on -- everything works just as in the case where the
cluster admin has mounted the cgroups in advance.
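One way to sketch the non-fatal remount behavior from the Java side (the actual mount happens in container-executor.c, so this is an illustrative helper, not YARN's code): check a /proc/mounts-style listing first, and if the path is already a cgroup mount, reuse it instead of treating EBUSY as fatal:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CgroupMountCheck {
    // Hypothetical sketch: scan a /proc/mounts-style listing and report
    // whether `path` is already mounted with filesystem type "cgroup".
    // If so, a remount attempt's EBUSY can be treated as non-fatal.
    public static boolean isCgroupMounted(BufferedReader procMounts,
                                          String path) throws IOException {
        String line;
        while ((line = procMounts.readLine()) != null) {
            // /proc/mounts format: device mountpoint fstype options dump pass
            String[] fields = line.split("\\s+");
            if (fields.length >= 3 && fields[1].equals(path)
                    && fields[2].equals("cgroup")) {
                return true;
            }
        }
        return false;
    }
}
```

In real use the reader would wrap /proc/mounts itself; a StringReader makes the logic easy to exercise in isolation.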
bq. Not sure of the benefit of configurable
yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just
always mount to a path that it creates and owns? Similar comment for the
hierarchy-prefix.
For the hierarchy-prefix, this needs to be configurable: in the scenario where
the admin creates the cgroups in advance, the NM doesn't have the privileges
to create its own hierarchy.
For the mount-path, this is a good question. Linux distributions mount the
cgroup controllers in various locations, so I thought it was better to keep it
configurable; I figured it would be confusing if the OS had already mounted
some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then the
NM started mounting additional controllers in /path/nm/owns/cgroup/.
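To make the two setups concrete, here is a hedged yarn-site.xml sketch. The property names come from this patch's discussion; the values are illustrative only:

```xml
<!-- Case 1: the admin has pre-mounted/pre-created the cgroups;
     the NM only uses them and never mounts anything itself. -->
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>false</value>
</property>

<!-- Case 2: ask the NM to mount the controllers itself, at a path
     that matches where this distribution keeps its cgroup mounts. -->
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
  <value>/sys/fs/cgroup</value>
</property>
```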
bq. CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple
places - updateCgroup() and createCgroup(). In the latter, if cgroups are
enabled and we can't create the file, is it a critical error?
I'm fine either way -- what would people prefer to see? Is it better to launch
a container even if we can't enforce the limits, or better to prevent the
container from launching? Happy to make the necessary quick change.
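The two options above can be sketched as a single policy hook. This is purely illustrative (the class, method, and flag are hypothetical, not CgroupsLCEResourcesHandler's API): strict mode aborts the launch when a cgroup limit can't be written, lenient mode logs and lets the container run unenforced:

```java
public class CgroupWritePolicy {
    // Hypothetical sketch of the two policies under discussion: when
    // writing a cgroup limit fails, either abort the container launch
    // (strict) or warn and continue without enforcement (lenient).
    public static boolean handleWriteFailure(boolean strict, String param,
                                             Exception cause) {
        if (strict) {
            // Critical error: refuse to launch without enforced limits.
            throw new IllegalStateException(
                "Cannot enforce " + param + "; refusing to launch container",
                cause);
        }
        // Lenient: log and let the container launch unenforced.
        System.err.println("WARN: cannot enforce " + param
            + "; launching container without limits: " + cause.getMessage());
        return false;
    }
}
```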
bq. Make ResourcesHandler top level. I'd like to merge the ContainersMonitor
functionality with this so as to monitor/enforce memory limits also.
ContainersMonitor is top-level; we should make ResourcesHandler also top-level
so that other platforms don't need to create this type-hierarchy all over again
when they wish to implement some or all of this functionality.
If I'm reading this correctly: yes, that is what I first wanted to do when I
started this patch (see the discussion at the top of this YARN-3 thread, the
early patches for MAPREDUCE-4334, and the current YARN-4). However, it seems
we have decided to go another way.
thank you,
Andrew
[1]
https://issues.apache.org/jira/browse/YARN-147?focusedCommentId=13470926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13470926
> Add support for CPU isolation/monitoring of containers
> ------------------------------------------------------
>
> Key: YARN-3
> URL: https://issues.apache.org/jira/browse/YARN-3
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun C Murthy
> Assignee: Andrew Ferguson
> Attachments: mapreduce-4334-design-doc.txt,
> mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch,
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch,
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch,
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch,
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch,
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch
>
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira