[ 
https://issues.apache.org/jira/browse/YARN-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100800#comment-17100800
 ] 

Zbigniew Baranowski commented on YARN-10252:
--------------------------------------------

Not sure what the reason for the shadedclient test failures is; this patch 
does not touch any client-related code.

> Allow adjusting vCore weight in CPU cgroup strict mode
> ------------------------------------------------------
>
>                 Key: YARN-10252
>                 URL: https://issues.apache.org/jira/browse/YARN-10252
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.2.1
>            Reporter: Zbigniew Baranowski
>            Priority: Major
>         Attachments: YARN-10252.patch, YARN-10252.patch, YARN.patch
>
>
> Currently, with CPU cgroup strict mode enabled on the NodeManager, when CPU 
> resources are overcommitted (e.g. 8 vCores on a 4-core machine), the amount 
> of CPU time that a container gets for each requested vCore is 
> automatically scaled down with the formula: vCoreCPUTime = 
> totalPhysicalCoresOnNM / coresConfiguredForNM. So a container is 
> throttled on CPU even if there are spare cores available on the NM (e.g. 
> with 8 vCores configured on a 4-core machine, a container that asked for 
> 2 vCores is effectively allowed to use only one physical core). The same 
> happens if the CPU resource cap is enabled (via 
> yarn.nodemanager.resource.percentage-physical-cpu-limit): in this case, 
> totalCoresOnNode (= coresOnNode * percentage-physical-cpu-limit) is scaled 
> down by the cap. So, for example, if the cap is 80%, a container that asked 
> for 2 vCores will be allowed to use at most the equivalent of 1.6 physical 
> cores, regardless of the current NM load.
> Both situations may lead to underuse of available resources. 
> In some cases, an administrator may want to overcommit resources because 
> applications statically over-allocate resources without fully using 
> them. In strict mode this slows all containers down, which is not the 
> intention. 
> Therefore it would be very useful if administrators had control over how 
> vCores are mapped to CPU time on NodeManagers in strict mode when CPU 
> resources are overcommitted and/or physical-cpu-limit is enabled.
> This could potentially be done with a parameter like 
> yarn.nodemanager.resource.strict-vcore-weight that controls the vCore to 
> pCore time mapping. E.g. a value of 1 means one-to-one mapping, 1.2 means 
> that a single vCore can have up to 120% of a physical core (this can be 
> handy for PySpark users), and -1 (the default) disables the feature and 
> uses auto-scaling.
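
The arithmetic in the description above can be sketched as follows. This is a
minimal illustration only, not the NodeManager implementation or the attached
patch: the function names and arguments are hypothetical, and treating any
non-positive weight as "feature disabled" is an assumption (the description
only defines -1).

```python
def cpu_per_vcore(physical_cores, configured_vcores,
                  cpu_limit_percent=100.0, strict_vcore_weight=-1.0):
    """Physical-core equivalent granted per requested vCore (hypothetical helper).

    strict_vcore_weight models the proposed
    yarn.nodemanager.resource.strict-vcore-weight setting.
    """
    if configured_vcores <= 0:
        raise ValueError("configured_vcores must be positive")
    # Proposed behaviour: a positive weight fixes the vCore-to-pCore mapping.
    if strict_vcore_weight > 0:
        return strict_vcore_weight
    # Auto-scaling as described in the issue: physical cores, reduced by the
    # percentage cap, divided by the vCores configured for the NodeManager.
    usable_cores = physical_cores * cpu_limit_percent / 100.0
    return usable_cores / configured_vcores


def container_cpu_cap(requested_vcores, physical_cores, configured_vcores,
                      cpu_limit_percent=100.0, strict_vcore_weight=-1.0):
    """Maximum physical-core equivalent a container may use in strict mode."""
    return requested_vcores * cpu_per_vcore(
        physical_cores, configured_vcores,
        cpu_limit_percent, strict_vcore_weight)


if __name__ == "__main__":
    # 8 vCores configured on a 4-core machine, container asks for 2 vCores.
    print(container_cpu_cap(2, 4, 8))
    # 80% cap on a node where vCores equal physical cores.
    print(container_cpu_cap(2, 4, 4, cpu_limit_percent=80.0))
    # Same overcommitted node with the proposed weight set to 1.2.
    print(container_cpu_cap(2, 4, 8, strict_vcore_weight=1.2))
```

The first two calls reproduce the two examples from the description (one
physical core, and the equivalent of 1.6 physical cores). The third shows how
a weight of 1.2 would decouple the per-container cap from the overcommit
ratio. The sketch does not clamp the result to the number of physical cores.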



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
