[ https://issues.apache.org/jira/browse/YARN-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Szucs resolved YARN-11733.
--------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

> Fix the order of updating CPU controls with cgroup v1
> -----------------------------------------------------
>
>                 Key: YARN-11733
>                 URL: https://issues.apache.org/jira/browse/YARN-11733
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Peter Szucs
>            Assignee: Peter Szucs
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> After YARN-11674 (Update CpuResourceHandler implementation for cgroup v2 
> support) the order of updating the cpu.cfs_period_us and cpu.cfs_quota_us 
> controls has changed, which can cause the error below when launching 
> containers with CPU limits on cgroup v1:
> {code:java}
> PrintWriter unable to write to 
> /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us
> with value: 112500
> {code}
>  
> *Reproduction:*
> I set CPU limits for cgroups in yarn-site.xml:
> {code:java}
> yarn.nodemanager.resource.percentage-physical-cpu-limit: 90
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage: true
> {code}
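> For reference, these are the standard yarn-site.xml property names; the snippet below just restates the values above in the usual XML form:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
>   <value>90</value>
> </property>
> <property>
>   <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
>   <value>true</value>
> </property>
> {code}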
> After that, the limits were applied to the hadoop-yarn root hierarchy:
> {code:java}
> [root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_period_us
> 1000000
> [root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_quota_us
> 900000
> {code}
> When I tried to launch a container it gave me the following error:
> {code:java}
> PrintWriter unable to write to 
> /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us
> with value: 112500
> {code}
> This is because the container cgroup still has the default cfs_period_us of 
> 100 000, so a cfs_quota_us of 112 500 would mean 112.5% of a CPU, exceeding 
> the 90% limit defined at the parent level (900 000 / 1 000 000). If I create 
> a test cgroup manually and try to update this control, it likewise only lets 
> me go up to 90 000 (90% of the default 100 000 period):
> {code:java}
> [root@pszucs-test-2 hadoop-yarn]# cat test/cpu.cfs_period_us
> 100000
> [root@pszucs-test-2 hadoop-yarn]# echo "90001" > test/cpu.cfs_quota_us
> -bash: echo: write error: Invalid argument
> [root@pszucs-test-2 hadoop-yarn]# echo "90000" > test/cpu.cfs_quota_us{code}
>  
> *Solution:*
> The cause of this issue is that the cfs_period_us control gets the default 
> value of 100 000 when a new cgroup is created, but when YARN calculates the 
> quota, it assumes a period of 1 000 000. Because of this we need to update 
> cpu.cfs_period_us before cpu.cfs_quota_us, to keep the ratio between the two 
> values consistent and not exceed the limit defined at the parent level.
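>
> A minimal, hypothetical sketch of this write order (plain java.nio.file writes 
> against an assumed test cgroup path, not the actual YARN resource handler code 
> from the patch):
> {code:java}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> 
> public class CpuCfsWriteOrder {
> 
>     // Hypothetical helper: writes cpu.cfs_period_us before cpu.cfs_quota_us so
>     // that the quota/period ratio never exceeds the limit enforced by the
>     // parent cgroup while the new cgroup still has the default 100000 period.
>     static void applyCpuLimits(Path containerCgroup, long periodUs, long quotaUs)
>             throws IOException {
>         write(containerCgroup.resolve("cpu.cfs_period_us"), periodUs);
>         write(containerCgroup.resolve("cpu.cfs_quota_us"), quotaUs);
>     }
> 
>     private static void write(Path control, long value) throws IOException {
>         Files.write(control, Long.toString(value).getBytes());
>     }
> 
>     public static void main(String[] args) throws IOException {
>         // Values from the report: with the period of 1000000 written first, a
>         // quota of 112500 is only 11.25% of a CPU and stays below the parent's
>         // 90% limit. The cgroup path here is just an example.
>         applyCpuLimits(Paths.get("/var/cgroupv1/cpu/hadoop-yarn/test"),
>                 1_000_000L, 112_500L);
>     }
> }
> {code}
> Writing the two controls in the opposite order would hit the same write error 
> as in the manual test above, because the quota would be validated against the 
> default 100 000 period.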



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
