Ferenc Erdelyi created YARN-11774:
-------------------------------------

             Summary: DominantResourceCalculator - Used Resources Percentage Metrics is Incorrect
                 Key: YARN-11774
                 URL: https://issues.apache.org/jira/browse/YARN-11774
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: yarn-service
            Reporter: Ferenc Erdelyi
         Attachments: DominantResourceCalculator_repro1_metrics.png

The issue occurs when using the Dominant Resource Calculator.

Reproduction steps:
- Create two queues: root.a and root.b.
- Submit a vcores-heavy application to root.a and a memory-heavy application to root.b.
- Make sure the applications have started running, then navigate to YARN UI1 and check the Used Resources percentage.

We expect the percentage to be based on the given resource-heavy value. E.g. if it is vcores, then we take the ratio of the used vcores value to the effective vcores value and multiply it by 100 to get the %. In some cases the calculation is incorrect. See the screenshot.

!DominantResourceCalculator_repro1_metrics.png!

root.a queue
{code:java}
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -shell_command 'while true; echo Timestamp: \"\"\$(date +%Y-%m-%d\ %H:%M:%S)\"\"; do sleep 3600; done' -jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar --num_containers 2 --master_memory 1024 --master_vcores 5 --container_memory 1024 --container_vcores 5 -queue root.a
{code}

root.b queue
{code:java}
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -shell_command 'while true; echo Timestamp: \"\"\$(date +%Y-%m-%d\ %H:%M:%S)\"\"; do sleep 3600; done' -jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar --num_containers 2 --master_memory 5120 --master_vcores 1 --container_memory 5120 --container_vcores 1 -queue root.b
{code}

Observation:
The VCores "Used Capacity" percentage calculation is not intuitive. Out of a queue capacity of 5 vcores, we used 11 vcores (over the queue capacity). Intuitively, based on my understanding, we expect the percentage to be calculated as 11/5*100=220, but we get a different value: 206.3.

For the memory "Used Capacity" calculation, I was not able to confirm the issue; however, it seems to occur from time to time.
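
For reference, the expected calculation described in the observation can be written out as a short sketch. The class and method names below (UsedCapacityExample, expectedUsedPercent) are hypothetical and are not part of YARN; the sketch only restates the arithmetic from this report (used / effective * 100) and makes no claim about how the scheduler derives the value shown in the UI.

```java
// Hypothetical helper, not part of YARN: it only restates the arithmetic
// the reporter expects for a single resource type.
public class UsedCapacityExample {

    /** Expected "Used Capacity" percentage: used / effective * 100. */
    static double expectedUsedPercent(long used, long effective) {
        if (effective <= 0) {
            throw new IllegalArgumentException("effective capacity must be positive");
        }
        // Multiply first so the division is done in floating point.
        return 100.0 * used / effective;
    }

    public static void main(String[] args) {
        long usedVcores = 11;      // vcores in use, taken from the report
        long effectiveVcores = 5;  // effective vcores of the queue, taken from the report
        double reported = 206.3;   // percentage shown in the UI, taken from the report

        double expected = expectedUsedPercent(usedVcores, effectiveVcores);
        System.out.printf("expected=%.1f%% reported=%.1f%% gap=%.1f points%n",
                expected, reported, expected - reported);
    }
}
```

With the numbers from the report, the sketch shows the gap between the intuitive value and the value shown in the UI, which may help when comparing against what the scheduler actually computes.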