[ 
https://issues.apache.org/jira/browse/YARN-11843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012047#comment-18012047
 ] 

ASF GitHub Bot commented on YARN-11843:
---------------------------------------

hadoop-yetus commented on PR #7855:
URL: https://github.com/apache/hadoop/pull/7855#issuecomment-3153783066

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  21m 10s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m 30s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  trunk passed with JDK 
Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 56s |  |  trunk passed with JDK 
Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 55s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 59s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  trunk passed with JDK 
Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  trunk passed with JDK 
Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   1m 59s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  43m 37s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  |  the patch passed with JDK 
Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  the patch passed with JDK 
Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 44s | 
[/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7855/1/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt)
 |  
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 24 unchanged - 0 fixed = 25 total (was 24)  |
   | +1 :green_heart: |  mvnsite  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  the patch passed with JDK 
Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  the patch passed with JDK 
Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   1m 58s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  42m  4s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 116m 21s | 
[/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7855/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt)
 |  hadoop-yarn-server-resourcemanager in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 284m 30s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.51 ServerAPI=1.51 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7855/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/7855 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 7447aefaac5f 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 
07:33:10 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / cc6bc5fbade22da6b88033c31f3fc8996b737262 |
   | Default Java | Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7855/1/testReport/ |
   | Max. process+thread count | 927 (vs. ulimit of 5500) |
   | modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7855/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Fix potential deadlock when auto-correction of container allocation is enabled
> ------------------------------------------------------------------------------
>
>                 Key: YARN-11843
>                 URL: https://issues.apache.org/jira/browse/YARN-11843
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 3.5.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>              Labels: pull-request-available
>
> The feature introduced in YARN-11702 has a potential deadlock issue. When 
> enabled, it can cause deadlock when holding application-level write locks 
> while trying to acquire queue-level write locks.
>  
> Root Cause:
>  - autoCorrectContainerAllocation is called while holding application-level 
> write locks
>  - It directly calls completedContainer() which requires queue-level write 
> locks
> {code:java}
> CapacityScheduler#allocate
>    --> ...
>        application.getWriteLock().lock();   //1. requires app writeLock!!!
>        try{
>            ...
>            AbstractYarnScheduler#autoCorrectContainerAllocation
>               --> AbstractYarnScheduler#completedContainer 
>                   --> AbstractYarnScheduler#completedContainerInternal
>                       --> AbstractLeafQueue#completedContainer
>                            writeLock.lock()  //2. requires queue writeLock!!!
>                            try{
>                                ...
>                                FiCaSchedulerApp#containerCompleted
>                                    //3. requires app writeLock!!!
>                            }finally{
>                                writeLock.unlock();
>                            }
>        }finally{
>            application.getWriteLock().unlock();
>        }{code}
>  - This violates lock hierarchy and creates deadlock scenarios, since 
> AbstractYarnScheduler#completedContainer could be called from another thread 
> during normal container completion operations.
>  
> Solution:
> Replace direct completedContainer() calls with asyncContainerRelease() in 
> autoCorrectContainerAllocation method.
> Before:
> {code:java}
> completedContainer(rmContainer, ...); // Direct call causes deadlock {code}
> After:
> {code:java}
> asyncContainerRelease(rmContainer); // Async call avoids deadlock {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to