[ 
https://issues.apache.org/jira/browse/YARN-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768105#comment-17768105
 ] 

ASF GitHub Bot commented on YARN-11573:
---------------------------------------

hadoop-yetus commented on PR #6098:
URL: https://github.com/apache/hadoop/pull/6098#issuecomment-1731798334

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m 38s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m  1s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 52s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 53s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 57s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 58s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 29s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 44s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 56s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 27s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 103m  2s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 248m 46s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6098/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6098 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c6c5ace082c1 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ffc0cc9cb155be99f075f8125376ee475debee7b |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6098/3/testReport/ |
   | Max. process+thread count | 898 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6098/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Add config option to make container allocation prefer nodes without reserved 
> containers
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-11573
>                 URL: https://issues.apache.org/jira/browse/YARN-11573
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Minor
>              Labels: pull-request-available
>
> Applications can get stuck when the container allocation logic does not 
> consider further nodes, but only nodes that already have reserved containers.
> This behavior can even block new AMs from being allocated on nodes, so the 
> applications never reach the RUNNING state.
> A jira that mentions the same thing is YARN-9598:
> {quote}Nodes which have been reserved should be skipped when iterating 
> candidates in RegularContainerAllocator#allocate, otherwise scheduler may 
> generate allocation or reservation proposal on these node which will always 
> be rejected in FiCaScheduler#commonCheckContainerAllocation.
> {quote}
> Since that jira implements two other points, I decided to create this one and 
> implement the third point separately.
> h2. Notes:
> 1. FiCaSchedulerApp#commonCheckContainerAllocation will log this:
> {code:java}
> Trying to allocate from reserved container in async scheduling mode
> {code}
> in case RegularContainerAllocator creates a reservation proposal for a node 
> that already has a reserved container.
> 2. A better way is to prevent generating an AM container (or even a normal 
> container) allocation proposal on a node that already has a reservation, as 
> long as there are still more nodes to check in the preferred node set. 
> Completely preventing task containers from being allocated to such worker 
> nodes could limit the downscaling ability that we currently have.
> h2. 3. CALL HIERARCHY
> 1. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#nodeUpdate
> 2. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.api.records.NodeId,
>  boolean)
> 3. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.CandidateNodeSet<org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode>,
>  boolean)
> 3.1. This is the place where it is decided whether to call 
> allocateContainerOnSingleNode or allocateContainersOnMultiNodes
> 4. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersOnMultiNodes
> 5. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateOrReserveNewContainers
> 6. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue#assignContainers
> 7. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractParentQueue#assignContainersToChildQueues
> 8. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractLeafQueue#assignContainers
> 9. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#assignContainers
> 10. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainers
> 11. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#allocate
> 12. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#tryAllocateOnNode
> 13. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainersOnNode
> 14. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignNodeLocalContainers
> 15. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainer
> As an example, the following line is logged:
> {code:java}
> 2023-08-23 17:44:08,129 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator:
>  assignContainers: node=<host> application=application_1692304118418_3151 
> priority=0 pendingAsk=<per-allocation-resource=<memory:5632, 
> vCores:1>,repeat=1> type=OFF_SWITCH
> {code}
> h2. 4. DETAILS OF RegularContainerAllocator#allocate
> [Method 
> definition|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L826-L896]
> 4.1. Defining the ordered list of nodes to allocate containers on: 
> [LINK|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L851-L852]
> {code:java}
>     Iterator<FiCaSchedulerNode> iter = schedulingPS.getPreferredNodeIterator(
>         candidates);
> {code}
> 4.2. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.AppPlacementAllocator#getPreferredNodeIterator
> 4.3. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.MultiNodeSortingManager#getMultiNodeSortIterator
>  
> ([LINK|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/MultiNodeSortingManager.java#L114-L180])
> In this method, the MultiNodeLookupPolicy is resolved 
> [here|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/MultiNodeSortingManager.java#L142-L143]
> 4.4. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.MultiNodeSorter#getMultiNodeLookupPolicy
> 4.5. This is where the MultiNodeLookupPolicy implementation of 
> getPreferredNodeIterator is invoked
> h2. 5. GOING UP THE CALL HIERARCHY UNTIL 
> CapacityScheduler#allocateOrReserveNewContainers
> 1. CSAssignment is created 
> [here|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L1797-L1801]
>  in method: CapacityScheduler#allocateOrReserveNewContainers
> 2. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#submitResourceCommitRequest
> 3. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#tryCommit
> 4. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#accept
> 5. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#commonCheckContainerAllocation
> --> This returns false and logs this line:
> {code:java}
> 2023-08-23 17:44:08,130 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
>  Trying to allocate from reserved container in async scheduling mode
> {code}
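> Roughly, the rejected case looks like the snippet below. This is a paraphrase 
> of the relevant check in FiCaSchedulerApp#commonCheckContainerAllocation for 
> clarity, not the exact trunk code: a proposal on a node that already holds a 
> reservation is only accepted when it allocates from that very reservation.
> {code:java}
> // Paraphrase (not the exact trunk code) of the check that produces the
> // DEBUG line quoted above.
> RMContainer reservedOnNode =
>     schedulerContainer.getSchedulerNode().getReservedContainer();
> if (reservedOnNode != null
>     && allocation.getAllocateFromReservedContainer() == null) {
>   // The node is reserved, but this proposal is a fresh allocation rather
>   // than one made from that reservation, so the commit is rejected.
>   LOG.debug("Trying to allocate from reserved container in async scheduling mode");
>   return false;
> }
> {code}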
> h2. PROPOSED FIX
> In method: RegularContainerAllocator#allocate
> There's a loop that iterates over candidate nodes: 
> [https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L853-L895]
> We need to skip the nodes that already have a reservation; example code:
> {code:java}
> if (reservedContainer == null) {
>   // Do not schedule if there are any reservations to fulfill on the node
>   if (node.getReservedContainer() != null) {
>     LOG.debug("Skipping scheduling on node {} since it has already been"
>         + " reserved by {}", node.getNodeID(),
>         node.getReservedContainer().getContainerId());
>     ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
>         activitiesManager, node, application, schedulerKey,
>         ActivityDiagnosticConstant.NODE_HAS_BEEN_RESERVED);
>     continue;
>   }
> {code}
> NOTE: This code block is copied from [^YARN-9598.001.patch#file-5]
> h2. More notes for the implementation
> 1. This new behavior needs to be hidden behind a feature flag (CS config); 
> see the sketch after this list.
> In my understanding, [^YARN-9598.001.patch#file-5] skips all nodes with 
> reservations, regardless of whether the container is an AM container or a 
> task container.
> 2. Only skip a node with an existing reservation if there are more nodes 
> left to process in the iterator.
> 3. Add a test case to cover this scenario.
>  
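> A minimal sketch of how notes 1 and 2 could fit together inside 
> RegularContainerAllocator#allocate. The config key 
> "yarn.scheduler.capacity.skip-nodes-with-reserved-containers" and the helper 
> method name are hypothetical placeholders, not the final names decided by the 
> patch:
> {code:java}
> // Sketch only: the property name below is a placeholder, not the final
> // CS config key introduced by the patch.
> private static final String SKIP_NODES_WITH_RESERVED_CONTAINERS =
>     "yarn.scheduler.capacity.skip-nodes-with-reserved-containers";
>
> // Returns true when the node should be skipped: the feature flag is on,
> // the node already holds a reserved container, and the iterator still has
> // other candidates left to try (note 2).
> private boolean shouldSkipReservedNode(CapacitySchedulerConfiguration csConf,
>     FiCaSchedulerNode node, Iterator<FiCaSchedulerNode> iter) {
>   boolean skipEnabled =
>       csConf.getBoolean(SKIP_NODES_WITH_RESERVED_CONTAINERS, false);
>   return skipEnabled
>       && node.getReservedContainer() != null
>       && iter.hasNext();
> }
> {code}
> The helper would be called at the top of the loop over candidate nodes shown 
> in the PROPOSED FIX section, continuing to the next node when it returns true.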



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
