[ https://issues.apache.org/jira/browse/YARN-11834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Syed Shameerur Rahman updated YARN-11834:
-----------------------------------------
Description:

It was noted that in a Hadoop 3.4.1 YARN deployment, a Spark application was stuck in the ACCEPTED state even though the cluster had enough resources.

*Steps to replicate*

1. Launch a YARN cluster with a total capacity of at least 1.59 TB of memory and 660 vCores.

2. Apply the following properties:

*{{capacity-scheduler}}*
{{"yarn.scheduler.capacity.node-locality-delay": "-1",}}
{{"yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",}}
{{"yarn.scheduler.capacity.schedule-asynchronously.enable": "true"}}

*{{yarn-site}}*
{{"yarn.log-aggregation-enable": "true",}}
{{"yarn.log-aggregation.retain-check-interval-seconds": "300",}}
{{"yarn.log-aggregation.retain-seconds": "-1",}}
{{"yarn.scheduler.capacity.max-parallel-apps": "1"}}

3. Submit multiple Spark jobs that each launch a large number of containers. For example:

{{spark-example --conf spark.dynamicAllocation.enabled=false --num-executors 2000 --driver-memory 1g --executor-memory 1g --executor-cores 1 SparkPi 1000}}

*Observations*

On analysing the logs, the following was observed:

When Application 1 completes, there is a period during which its resource requests are still being processed, or "honored", by the scheduler. During this transition period, the following sequence can occur:

1. Application 1 completes and releases its resources.
2. The scheduler is still processing some older allocation requests for Application 1.
3. While processing these stale requests, the *cul.canAssign flag* for the user is set to false. Refer to [Link #1|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1670] and [Link #2|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1268].
4. Application 2 (which is new) tries to get resources.
5. The scheduler checks the user's cul.canAssign flag, finds that it is false (because of the [cache implementation|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1241]), and denies resources to Application 2.
6. Application 2 remains in the ACCEPTED state despite available resources.

This race condition occurs because the user's resource usage state (tracked in the cached user-limit object, {{CachedUserLimit}} / {{cul}}) is not properly reset or synchronized between the completion of one application and the scheduling of another.
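To make the caching behaviour easier to follow, here is a minimal, self-contained sketch of the pattern described above. It is not the actual AbstractLeafQueue code: the class {{UserLimitCacheSketch}}, the method {{schedulingPass}}, and the sizes used are invented for illustration; only the idea of a per-scheduling-pass cached user-limit entry whose {{canAssign}} flag, once flipped to false, is reused for every later application of the same user in that pass comes from the observations above.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the per-scheduling-pass user-limit cache described in the
 * observations. All names are illustrative; this is not AbstractLeafQueue.
 */
public class UserLimitCacheSketch {

  /** Simplified stand-in for the cached per-user limit entry ("cul"). */
  static class CachedUserLimit {
    final long userLimitMb;     // user limit computed once for this pass
    boolean canAssign = true;   // once false, stays false for the whole pass

    CachedUserLimit(long userLimitMb) {
      this.userLimitMb = userLimitMb;
    }
  }

  /**
   * One scheduling pass over the queue's applications. All applications of
   * the same user share a single CachedUserLimit entry during the pass.
   */
  static void schedulingPass(String user, long userLimitMb, long usedByUserMb,
                             long app1StaleRequestMb, long app2RequestMb) {
    Map<String, CachedUserLimit> userLimits = new HashMap<>();

    // Application 1: a stale request that is still being processed even
    // though the application has already finished and released its containers.
    CachedUserLimit cul = userLimits.computeIfAbsent(user,
        u -> new CachedUserLimit(userLimitMb));
    if (usedByUserMb + app1StaleRequestMb > cul.userLimitMb) {
      cul.canAssign = false;  // user-limit check failed; result is cached
    }
    System.out.println("App 1 (stale request): canAssign=" + cul.canAssign);

    // Application 2: a new application of the same user, considered later in
    // the same pass. The cached canAssign=false short-circuits its user-limit
    // check even though the user limit now has plenty of headroom.
    CachedUserLimit cached = userLimits.get(user);
    if (cached != null && !cached.canAssign) {
      System.out.println(
          "App 2 skipped because of cached canAssign=false -> stays ACCEPTED");
      return;
    }
    System.out.println("App 2 would be allocated normally, request="
        + app2RequestMb + " MB");
  }

  public static void main(String[] args) {
    // App 1's leftover 8 GB request trips the 4 GB user limit and poisons the
    // cache; App 2's modest 1 GB request is then denied in the same pass.
    schedulingPass("alice", 4096, 0, 8192, 1024);
  }
}
{code}

Running the {{main}} method prints that Application 2 is skipped solely because of the cached flag, mirroring the ACCEPTED-state hang observed in the cluster.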
> [Capacity Scheduler] Application Stuck In ACCEPTED State due to Race Condition
> -------------------------------------------------------------------------------
>
>                 Key: YARN-11834
>                 URL: https://issues.apache.org/jira/browse/YARN-11834
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.4.0, 3.4.1
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org