[ https://issues.apache.org/jira/browse/YARN-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Szucs resolved YARN-11733.
--------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

> Fix the order of updating CPU controls with cgroup v1
> -----------------------------------------------------
>
>                 Key: YARN-11733
>                 URL: https://issues.apache.org/jira/browse/YARN-11733
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Peter Szucs
>            Assignee: Peter Szucs
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> After YARN-11674 (Update CpuResourceHandler implementation for cgroup v2
> support), the order in which the cpu.cfs_period_us and cpu.cfs_quota_us
> controls are updated has changed. On cgroup v1 this can cause the following
> error when launching containers with CPU limits:
> {code:java}
> PrintWriter unable to write to
> /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us
> with value: 112500{code}
>
> *Reproduction:*
> I set the following CPU limits for cgroups in yarn-site.xml:
> {code:java}
> yarn.nodemanager.resource.percentage-physical-cpu-limit: 90
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage: true{code}
> After that, the limits were applied to the root hadoop-yarn cgroup:
> {code:java}
> [root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_period_us
> 1000000
> [root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_quota_us
> 900000
> {code}
> When I tried to launch a container, it failed with the following error:
> {code:java}
> PrintWriter unable to write to
> /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us
> with value: 112500{code}
> This happens because the cfs_quota_us value of 112 500 makes the container
> exceed the limit defined at the higher level.
> If I create a test cgroup manually and update this control, the kernel only
> accepts values up to 90 000, which is the same 0.9 ratio as the parent's for
> the default period of 100 000:
> {code:java}
> [root@pszucs-test-2 hadoop-yarn]# cat test/cpu.cfs_period_us
> 100000
> [root@pszucs-test-2 hadoop-yarn]# echo "90001" > test/cpu.cfs_quota_us
> -bash: echo: write error: Invalid argument
> [root@pszucs-test-2 hadoop-yarn]# echo "90000" > test/cpu.cfs_quota_us{code}
>
> *Solution:*
> The cause of this issue is that the cfs_period_us control gets the default
> value of 100 000 when a new cgroup is created, but YARN calculates the quota
> using a period of 1 000 000. If cpu.cfs_quota_us is written first, the quota
> of 112 500 is checked against the default period, i.e. as 1.125 CPUs instead
> of the intended 0.1125, which exceeds the parent's 0.9. Therefore
> cpu.cfs_period_us must be updated before cpu.cfs_quota_us, so that the ratio
> between the two values is the intended one and stays within the limit
> defined at the parent level.
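The ordering argument can be replayed with a small simulation. This is a minimal Java sketch under stated assumptions, not YARN's code: `CpuCgroupModel` and `CfsOrderDemo` are hypothetical names, and the model encodes only the cgroup v1 behavior shown in the report. A new cgroup starts with cpu.cfs_period_us = 100 000 and an unlimited cpu.cfs_quota_us of -1, and a write is rejected if the child's quota/period ratio would exceed its parent's. Ratios are compared as fixed-point integers, an assumption loosely modeled on the kernel; the real kernel check is more involved because it covers the whole hierarchy. No files under /var/cgroupv1 are touched.

```java
// Hypothetical, simplified model of a cgroup v1 "cpu" directory. It simulates
// only the CFS bandwidth rule seen in the report; it touches no real files.
final class CpuCgroupModel {
    static final long UNLIMITED = -1L;              // default cpu.cfs_quota_us
    static final long DEFAULT_PERIOD_US = 100_000L; // default cpu.cfs_period_us
    private static final int SHIFT = 20;            // fixed-point shift (assumption)

    private final CpuCgroupModel parent;            // null for the top of the model
    private long periodUs = DEFAULT_PERIOD_US;
    private long quotaUs = UNLIMITED;

    CpuCgroupModel(CpuCgroupModel parent) {
        this.parent = parent;
    }

    // quota/period as a fixed-point number; an unlimited quota counts as infinite.
    private static long ratio(long periodUs, long quotaUs) {
        return quotaUs == UNLIMITED ? Long.MAX_VALUE : (quotaUs << SHIFT) / periodUs;
    }

    // Bandwidth available here: an unlimited cgroup inherits its parent's.
    long effectiveRatio() {
        long own = ratio(periodUs, quotaUs);
        return parent == null ? own : Math.min(own, parent.effectiveRatio());
    }

    // Applies both values, or returns false (standing in for the kernel's EINVAL)
    // if a limited quota would give this cgroup more bandwidth than its parent.
    private boolean tryWrite(long newPeriodUs, long newQuotaUs) {
        if (parent != null && newQuotaUs != UNLIMITED
                && ratio(newPeriodUs, newQuotaUs) > parent.effectiveRatio()) {
            return false;
        }
        periodUs = newPeriodUs;
        quotaUs = newQuotaUs;
        return true;
    }

    boolean writePeriod(long us) { return tryWrite(us, quotaUs); } // like echo > cpu.cfs_period_us
    boolean writeQuota(long us) { return tryWrite(periodUs, us); } // like echo > cpu.cfs_quota_us
}

final class CfsOrderDemo {
    // Failing order: the quota is checked while the period is still 100 000.
    static boolean quotaThenPeriod(CpuCgroupModel cg, long periodUs, long quotaUs) {
        return cg.writeQuota(quotaUs) && cg.writePeriod(periodUs);
    }

    // Order the fix asks for: period first, so quota/period has the intended ratio.
    static boolean periodThenQuota(CpuCgroupModel cg, long periodUs, long quotaUs) {
        return cg.writePeriod(periodUs) && cg.writeQuota(quotaUs);
    }

    public static void main(String[] args) {
        CpuCgroupModel hadoopYarn = new CpuCgroupModel(null); // values from the report
        hadoopYarn.writePeriod(1_000_000);
        hadoopYarn.writeQuota(900_000);

        System.out.println("quota first:  "
                + quotaThenPeriod(new CpuCgroupModel(hadoopYarn), 1_000_000, 112_500));
        System.out.println("period first: "
                + periodThenQuota(new CpuCgroupModel(hadoopYarn), 1_000_000, 112_500));
    }
}
```

Running `CfsOrderDemo.main` prints `quota first:  false` followed by `period first: true`. Writing the period first is safe in this model because, while the quota is still -1, the child simply inherits the parent's bandwidth, so only the final pair (112 500 / 1 000 000 = 0.1125) is checked against the parent's 0.9. The model also reproduces the manual test: on a fresh child, a quota of 90 001 is rejected and 90 000 is accepted.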