[ 
https://issues.apache.org/jira/browse/YARN-11834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated YARN-11834:
-----------------------------------------
    Description: 
It was noted that in a Hadoop 3.4.1 YARN deployment, a Spark application was 
stuck in the ACCEPTED state even though the cluster had enough resources.

 

*Steps to replicate*

1. Launch a YARN cluster with a total capacity of at least 1.59 TB of memory and 660 or more vCores.

2. Apply the following properties (a compact restatement of these keys is sketched right after the steps below):

*{{capacity-scheduler}}*

*{{{}"yarn.scheduler.capacity.node-locality-delay": "-1", 
"yarn.scheduler.capacity.resource-calculator": 
"org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"{}}}{{{},{}}}*
{*}{{"}}{*}{*}{{}}}}{*}{{{}*{\{{}yarn.scheduler.capacity.schedule-asynchronously.enable*{}}}{*}{{"
 : "true"}}{*}

 

*{{yarn-site}}*

*{{"yarn.log-aggregation-enable": "true",}}*
*{{{}"yarn.log-aggregation.retain-check-interval-seconds": "300", 
"yarn.log-aggregation.retain-seconds": "-1", 
"yarn.scheduler.capacity.max-parallel-apps": "1"{}}}{{{{}}{}}}*

3. Submit multiple Spark jobs that launch a large number of containers. For 
example:

{{spark-example --conf spark.dynamicAllocation.enabled=false --num-executors 
2000 --driver-memory 1g --executor-memory 1g --executor-cores 1 SparkPi 1000}}
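
For reference, below is a minimal sketch restating the same keys and values through Hadoop's {{Configuration}} API. This is only an illustrative restatement, not part of the reproduction: the reproduction applies these properties via capacity-scheduler.xml and yarn-site.xml (or the cluster's configuration mechanism), and the class name {{ReproConfigSketch}} is made up for this example.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative restatement of the reproduction properties. In a real cluster
// these values live in capacity-scheduler.xml and yarn-site.xml.
public class ReproConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // capacity-scheduler
    conf.set("yarn.scheduler.capacity.node-locality-delay", "-1");
    conf.set("yarn.scheduler.capacity.resource-calculator",
        "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator");
    conf.set("yarn.scheduler.capacity.schedule-asynchronously.enable", "true");

    // yarn-site
    conf.set("yarn.log-aggregation-enable", "true");
    conf.set("yarn.log-aggregation.retain-check-interval-seconds", "300");
    conf.set("yarn.log-aggregation.retain-seconds", "-1");
    conf.set("yarn.scheduler.capacity.max-parallel-apps", "1");

    // With max-parallel-apps = 1, only one application runs at a time, which
    // makes the stuck-in-ACCEPTED symptom easy to observe.
    System.out.println(conf.get("yarn.scheduler.capacity.max-parallel-apps"));
  }
}
{code}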

 

*Observations*

On analyzing the logs, the following observations were made:

When Application 1 completes, there's a period where its resource requests are 
still being processed or "honored" by the scheduler. During this transition 
period, the following sequence could occur:

1. Application 1 completes and releases its resources
2. The scheduler is still processing some older allocation requests for 
Application 1
3. During this processing, the *cul.canAssign flag* for the user is set to 
false. Refer to [Link #1|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1670] 
and [Link #2|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1268]
4. Application 2 (which is new) tries to get resources
5. The scheduler checks the user's cul.canAssign flag, finds it's false (due to 
[cache 
implementation|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1241]),
 and denies resources to Application 2
6. Application 2 remains in ACCEPTED state despite available resources

This race condition occurs because the user's resource usage state (tracked in 
the cached user-limit object, {{CachedUserLimit}}) isn't properly reset or synchronized 
between the completion of one application and the scheduling of another.
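
To make the sequence above concrete, here is a simplified, hypothetical sketch of the caching pattern involved. The names loosely mirror {{CachedUserLimit}} / {{cul.canAssign}} in AbstractLeafQueue, but this is not the actual scheduler code: it only shows how a {{canAssign = false}} verdict, cached while Application 1's stale requests are drained, can later be reused to skip assignments for Application 2.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical illustration of the cached user-limit check.
// Not the real CapacityScheduler code; it only mirrors the cul.canAssign idea.
public class UserLimitCacheSketch {

  /** Per-user entry cached for the duration of one scheduling pass. */
  static class CachedUserLimit {
    volatile boolean canAssign = true;
  }

  private final Map<String, CachedUserLimit> cache = new HashMap<>();

  /** Returns whether the scheduler would try to assign a container for this user. */
  boolean canAssignTo(String user, boolean userOverLimitRightNow) {
    CachedUserLimit cul = cache.computeIfAbsent(user, u -> new CachedUserLimit());
    if (!cul.canAssign) {
      // Cached verdict wins: a new application of the same user is skipped
      // even though the cluster has since freed the resources.
      return false;
    }
    if (userOverLimitRightNow) {
      // Set while Application 1's leftover allocation requests are processed.
      cul.canAssign = false;
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    UserLimitCacheSketch sketch = new UserLimitCacheSketch();
    // Application 1's stale request is evaluated while the user still looks
    // over its limit, poisoning the cached entry for this pass.
    System.out.println(sketch.canAssignTo("alice", true));   // false
    // Application 2 arrives after resources were released, but the cached
    // canAssign=false is consulted first, so it stays in ACCEPTED.
    System.out.println(sketch.canAssignTo("alice", false));  // false
  }
}
{code}

In the real scheduler the cached entry only lives for a single scheduling iteration, so hitting the bug requires the race described above; the sketch merely compresses that window into two calls.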



> [Capacity Scheduler] Application Stuck In ACCEPTED State due to Race Condition
> ------------------------------------------------------------------------------
>
>                 Key: YARN-11834
>                 URL: https://issues.apache.org/jira/browse/YARN-11834
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.4.0, 3.4.1
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
