[
https://issues.apache.org/jira/browse/YARN-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100800#comment-17100800
]
Zbigniew Baranowski commented on YARN-10252:
--------------------------------------------
I am not sure what is causing the shadedclient test failures; this patch does
not touch any client-related code.
> Allow adjusting vCore weight in CPU cgroup strict mode
> ------------------------------------------------------
>
> Key: YARN-10252
> URL: https://issues.apache.org/jira/browse/YARN-10252
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 3.2.1
> Reporter: Zbigniew Baranowski
> Priority: Major
> Attachments: YARN-10252.patch, YARN-10252.patch, YARN.patch
>
>
> Currently, with CPU cgroup strict mode enabled on the NodeManager, when CPU
> resources are overcommitted (e.g. 8 vCores on a 4-core machine), the amount
> of CPU time that a container gets for each requested vCore is
> automatically scaled down with the formula: vCoreCPUTime =
> totalPhysicalCoresOnNM / coresConfiguredForNM. So a container will be
> throttled on CPU even if there are spare cores available on the NM (e.g.
> with 8 vCores configured on a 4-core machine, a container that asked for
> 2 vCores will effectively be allowed to use only one physical core). The
> same happens if the CPU resource cap is enabled (via
> yarn.nodemanager.resource.percentage-physical-cpu-limit): in this case,
> totalCoresOnNode (= coresOnNode * percentage-physical-cpu-limit) is scaled
> down by the cap. So, for example, if the cap is 80%, a container that asked
> for 2 vCores will be allowed to use at most the equivalent of 1.6 physical
> cores, regardless of the current NM load.
> Both situations may lead to underuse of available resources.
> In some cases, an administrator may want to overcommit resources because
> applications statically over-allocate resources without fully using
> them. With the current scaling, this causes all containers to slow down,
> which is not the intention.
> Therefore, it would be very useful if administrators had control over how
> vCores are mapped to CPU time on NodeManagers in strict mode when CPU
> resources are overcommitted and/or physical-cpu-limit is enabled.
> This could potentially be done with a parameter like
> yarn.nodemanager.resource.strict-vcore-weight that controls the vCore to
> pCore time mapping. E.g. a value of 1 means a one-to-one mapping, 1.2 means
> that a single vCore can use up to 120% of a physical core (this can be
> handy for PySpark users), and -1 (the default) disables the feature and
> uses auto-scaling.
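
To make the arithmetic in the description concrete, here is a minimal sketch
of the mapping it describes. It is not the NodeManager code: the function
and parameter names are hypothetical, and treating every non-positive weight
as "disabled" is an assumption (the proposal only specifies -1).

```python
def cores_per_vcore(physical_cores, configured_vcores,
                    cpu_limit_percent=100.0, strict_vcore_weight=-1.0):
    """Return the physical-core equivalent granted to one vCore.

    Hypothetical helper that mirrors the formula quoted in the issue:
    vCoreCPUTime = totalPhysicalCoresOnNM / coresConfiguredForNM,
    with the physical core count first scaled by the CPU cap.
    """
    if strict_vcore_weight > 0:
        # Proposed behaviour: a fixed vCore-to-pCore weight set by the admin.
        return strict_vcore_weight
    # Current behaviour (auto-scaling). Assumption: any non-positive weight
    # disables the feature, not only the documented default of -1.
    usable_cores = physical_cores * cpu_limit_percent / 100.0
    return usable_cores / configured_vcores


def container_core_limit(requested_vcores, **node_settings):
    """Physical-core equivalent a container may use in strict mode."""
    return requested_vcores * cores_per_vcore(**node_settings)


# Overcommit example from the issue: 8 vCores on a 4-core machine.
overcommitted = container_core_limit(
    2, physical_cores=4, configured_vcores=8)

# Cap example from the issue: 80% limit, vCores equal to physical cores.
capped = container_core_limit(
    2, physical_cores=4, configured_vcores=4, cpu_limit_percent=80.0)

# Proposed weight of 1.2 on the same overcommitted node.
weighted = container_core_limit(
    2, physical_cores=4, configured_vcores=8, strict_vcore_weight=1.2)
```

Under these assumptions the first two calls reproduce the figures in the
description (one physical core and 1.6 physical cores), while the third gives
the container up to 2.4 physical cores regardless of the overcommit ratio.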