[
https://issues.apache.org/jira/browse/YARN-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310276#comment-16310276
]
Miklos Szegedi commented on YARN-7693:
--------------------------------------
Thank you for the reply, [~yangjiandan].
+0 on the approach of adding a separate monitor class for this. I think it is
useful to be able to change the monitor.
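For reference, here is a minimal sketch of the wiring I would expect inside
ContainerManagerImpl. The config key is made up for illustration, and it
assumes implementations keep the (ContainerExecutor, AsyncDispatcher, Context)
constructor shape of ContainersMonitorImpl; the patch would define the real
key and contract.
{code:java}
// Hypothetical sketch: resolve the monitor class from configuration
// instead of hard-coding "new ContainersMonitorImpl(...)".
Class<? extends ContainersMonitor> monitorClass = conf.getClass(
    "yarn.nodemanager.containers-monitor.class", // assumed key name
    ContainersMonitorImpl.class,
    ContainersMonitor.class);
// Assumes the implementation exposes ContainersMonitorImpl's
// (ContainerExecutor, AsyncDispatcher, Context) constructor.
ContainersMonitor containersMonitor = monitorClass
    .getConstructor(ContainerExecutor.class, AsyncDispatcher.class,
        Context.class)
    .newInstance(exec, dispatcher, context);
{code}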
In terms of the feature you described, I have some suggestions that you may
want to consider.
First of all, please consider using the JIRA sub-task feature to tie this into
a larger project. How about doing this as part of YARN-1747, or even better,
YARN-1011?
You may want to leverage the option to simply turn off the current cgroups
memory enforcement using the configuration added in YARN-7064. That change also
handles monitoring resource utilization using cgroups.
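For example (the exact property name here is from memory, so treat it as an
assumption and check yarn-default.xml for the authoritative key):
{code:java}
// Sketch: keep cgroups-based monitoring but relax strict memory
// enforcement. Normally this belongs in yarn-site.xml; it is set
// programmatically here only to keep the example compact. The key
// name is an assumption, not verified against YARN-7064.
Configuration conf = new YarnConfiguration();
conf.setBoolean("yarn.nodemanager.resource.memory.enforced", false);
{code}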
bq. 1) Separate containers into two different group Opportunistic_Group and
Guaranteed_Group under hadoop-yarn
The reason why it is useful to have a single cgroup hadoop-yarn for all
containers is that you can apply a single policy and control the OOM killer for
all of them. I would be happy to look at the actual code, but adjusting two
different cgroups may add too much complexity. It is especially problematic in
the case of promotion. When an opportunistic container is promoted to
guaranteed, you need to move it to the other cgroup, but that requires heavy
lifting from the kernel and takes significant time. See
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt for details.
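To make the promotion cost concrete, here is a minimal sketch (hypothetical
class and paths) of what the move would involve: writing every PID of the
container into the target cgroup's cgroup.procs. Per the memory.txt document
above, the memory charges only follow the tasks if
memory.move_charge_at_immigrate is set on the target, and that recharge walks
the tasks' page tables, which is the expensive part.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class CgroupMoveSketch {
  /** Move all tasks of a container from one memory cgroup to another. */
  public static void moveContainer(Path fromCgroup, Path toCgroup)
      throws IOException {
    List<String> pids =
        Files.readAllLines(fromCgroup.resolve("cgroup.procs"));
    for (String pid : pids) {
      // Each write migrates one thread group; with
      // memory.move_charge_at_immigrate enabled on the target, the
      // kernel also recharges the pages, which is the slow part.
      Files.write(toCgroup.resolve("cgroup.procs"),
          (pid + "\n").getBytes(), StandardOpenOption.APPEND);
    }
  }
}
{code}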
bq. 2) Monitor system resource utilization and dynamically adjust resource of
Opportunistic_Group
The concern here is that dynamic adjustment does not work in the current
implementation either, because it is too slow to respond in extreme cases.
Please check out YARN-6677, YARN-4599 and YARN-1014. The idea there is to
disable the OOM killer on hadoop-yarn, as you also suggested, so that the
kernel notifies us when the system runs out of memory. YARN can then decide
which container to preempt or adjust the soft limit while the containers are
paused. The preemption unblocks the containers. Please let us know if you have
time and would like to contribute.
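As a rough illustration of the pause-and-resolve idea (a sketch with a
hypothetical class, not the YARN-1014 implementation, which uses an
eventfd-based native listener rather than polling):
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OomControlSketch {
  /** Disable the kernel OOM killer for the given memory cgroup. */
  public static void disableOomKiller(Path memCgroup) throws IOException {
    Files.write(memCgroup.resolve("memory.oom_control"), "1\n".getBytes());
  }

  /**
   * While this returns true, the kernel keeps the cgroup's tasks paused,
   * giving YARN time to preempt a container or raise the limits.
   */
  public static boolean isUnderOom(Path memCgroup) throws IOException {
    // memory.oom_control reads back lines like "under_oom 1".
    for (String line :
        Files.readAllLines(memCgroup.resolve("memory.oom_control"))) {
      if (line.startsWith("under_oom")) {
        return line.trim().endsWith("1");
      }
    }
    return false;
  }
}
{code}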
bq. 3) Kill container only when adjust resource fail for given times
I absolutely agree with this. A sudden spike in CPU usage should not trigger
immediate preemption. In the case of memory, I am not sure how much you can
adjust, though. My understanding is that the basic design of opportunistic
containers is that they never affect the performance of guaranteed ones, but
using I/O for swapping would do exactly that. How would you reduce memory usage
without preempting?
> ContainersMonitor support configurable
> --------------------------------------
>
> Key: YARN-7693
> URL: https://issues.apache.org/jira/browse/YARN-7693
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Jiandan Yang
> Assignee: Jiandan Yang
> Priority: Minor
> Attachments: YARN-7693.001.patch, YARN-7693.002.patch
>
>
> Currently ContainersMonitor has only one default implementation,
> ContainersMonitorImpl.
> After introducing Opportunistic Containers, ContainersMonitor needs to monitor
> system metrics and even dynamically adjust Opportunistic and Guaranteed
> resources in the cgroup, so another ContainersMonitor may need to be
> implemented.
> Currently, ContainerManagerImpl instantiates ContainersMonitorImpl directly
> with new, so ContainersMonitor needs to be configurable.