[ https://issues.apache.org/jira/browse/YARN-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768105#comment-17768105 ]
ASF GitHub Bot commented on YARN-11573:
---------------------------------------
hadoop-yetus commented on PR #6098:
URL: https://github.com/apache/hadoop/pull/6098#issuecomment-1731798334
:confetti_ball: **+1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 50s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 48m 38s | | trunk passed |
| +1 :green_heart: | compile | 1m 1s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 52s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 53s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 57s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 55s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 45s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 58s | | trunk passed |
| +1 :green_heart: | shadedclient | 39m 29s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 48s | | the patch passed |
| +1 :green_heart: | compile | 0m 54s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 54s | | the patch passed |
| +1 :green_heart: | compile | 0m 46s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 46s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 44s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 48s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 43s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 39s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 56s | | the patch passed |
| +1 :green_heart: | shadedclient | 39m 27s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 103m 2s | | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
| | | 248m 46s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6098/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6098 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c6c5ace082c1 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ffc0cc9cb155be99f075f8125376ee475debee7b |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6098/3/testReport/ |
| Max. process+thread count | 898 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6098/3/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
> Add config option to make container allocation prefer nodes without reserved containers
> ----------------------------------------------------------------------------------------
>
> Key: YARN-11573
> URL: https://issues.apache.org/jira/browse/YARN-11573
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Minor
> Labels: pull-request-available
>
> Applications can get stuck when the container allocation logic does not
> consider additional nodes, but only nodes that have reserved containers.
> This behavior can even prevent new AMs from being allocated on nodes, so
> the applications never reach the RUNNING state.
> A Jira that mentions the same issue is YARN-9598:
> {quote}Nodes which have been reserved should be skipped when iterating
> candidates in RegularContainerAllocator#allocate, otherwise scheduler may
> generate allocation or reservation proposal on these node which will always
> be rejected in FiCaScheduler#commonCheckContainerAllocation.
> {quote}
> Since that Jira implements two other points, I decided to create this one
> and implement the third point separately.
> h2. Notes:
> 1. FiCaSchedulerApp#commonCheckContainerAllocation will log this:
> {code:java}
> Trying to allocate from reserved container in async scheduling mode
> {code}
> when RegularContainerAllocator creates a reservation proposal for a node
> that already has a reserved container.
> 2. A better way is to prevent generating an AM container (or even a normal
> container) allocation proposal on a node if the node already has a
> reservation and there are still more nodes to check in the preferred node
> set. Completely preventing task containers from being allocated to worker
> nodes with reservations could limit the downscaling ability we currently have.
> h2. 3. CALL HIERARCHY
> 1. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#nodeUpdate
> 2. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.api.records.NodeId, boolean)
> 3. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.CandidateNodeSet<org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode>, boolean)
> 3.1. This is the place where it is decided whether to call allocateContainerOnSingleNode or allocateContainersOnMultiNodes
> 4. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersOnMultiNodes
> 5. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateOrReserveNewContainers
> 6. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue#assignContainers
> 7. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractParentQueue#assignContainersToChildQueues
> 8. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractLeafQueue#assignContainers
> 9. org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#assignContainers
> 10. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainers
> 11. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#allocate
> 12. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#tryAllocateOnNode
> 13. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainersOnNode
> 14. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignNodeLocalContainers
> 15. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainer
> As an example, the following line is logged:
> {code:java}
> 2023-08-23 17:44:08,129 DEBUG
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator:
> assignContainers: node=<host> application=application_1692304118418_3151
> priority=0 pendingAsk=<per-allocation-resource=<memory:5632,
> vCores:1>,repeat=1> type=OFF_SWITCH
> {code}
> h2. 4. DETAILS OF RegularContainerAllocator#allocate
> [Method definition|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L826-L896]
> 4.1. Defining the ordered list of nodes to allocate containers on:
> [LINK|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L851-L852]
> {code:java}
> Iterator<FiCaSchedulerNode> iter = schedulingPS.getPreferredNodeIterator(
> candidates);
> {code}
> 4.2. org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.AppPlacementAllocator#getPreferredNodeIterator
> 4.3. org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.MultiNodeSortingManager#getMultiNodeSortIterator ([LINK|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/MultiNodeSortingManager.java#L114-L180])
> In this method, the MultiNodeLookupPolicy is resolved [here|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/MultiNodeSortingManager.java#L142-L143].
> 4.4. org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.MultiNodeSorter#getMultiNodeLookupPolicy
> 4.5. This is where getPreferredNodeIterator of the resolved MultiNodeLookupPolicy implementation is invoked (see the sketch below).
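> To make this concrete, here is a minimal sketch (not code from the patch) of how a lookup policy could order candidate nodes so that nodes without reserved containers come first. The class and helper method names are hypothetical; only FiCaSchedulerNode#getReservedContainer is an existing accessor:
> {code:java}
> import java.util.Collection;
> import java.util.Comparator;
> import java.util.Iterator;
>
> import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode;
>
> // Hypothetical helper: orders candidate nodes so that nodes without a
> // reserved container are visited first by the allocator.
> public final class PreferUnreservedNodesSketch {
>
>   // false (no reservation) sorts before true (has a reservation).
>   private static final Comparator<FiCaSchedulerNode> PREFER_UNRESERVED =
>       Comparator.comparing(
>           (FiCaSchedulerNode node) -> node.getReservedContainer() != null);
>
>   public static Iterator<FiCaSchedulerNode> preferredNodeIterator(
>       Collection<FiCaSchedulerNode> candidates) {
>     return candidates.stream().sorted(PREFER_UNRESERVED).iterator();
>   }
> }
> {code}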
> h2. 5. GOING UP THE CALL HIERARCHY UNTIL CapacityScheduler#allocateOrReserveNewContainers
> 1. A CSAssignment is created [here|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L1797-L1801] in CapacityScheduler#allocateOrReserveNewContainers.
> 2. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#submitResourceCommitRequest
> 3. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#tryCommit
> 4. org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#accept
> 5. org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#commonCheckContainerAllocation
> --> This returns false and logs this line:
> {code:java}
> 2023-08-23 17:44:08,130 DEBUG
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
> Trying to allocate from reserved container in async scheduling mode
> {code}
> h2. PROPOSED FIX
> In RegularContainerAllocator#allocate, there is a loop that iterates over the candidate nodes:
> [https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L853-L895]
> We need to skip nodes that already have a reservation; example code:
> {code:java}
> if (reservedContainer == null) {
>   // reservedContainer == null means this is a fresh allocation attempt,
>   // not an attempt to fulfill an existing reservation of this app.
>   // Do not schedule if there are any reservations to fulfill on the node.
>   if (node.getReservedContainer() != null) {
>     LOG.debug("Skipping scheduling on node {} since it has already been"
>         + " reserved by {}", node.getNodeID(),
>         node.getReservedContainer().getContainerId());
>     ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
>         activitiesManager, node, application, schedulerKey,
>         ActivityDiagnosticConstant.NODE_HAS_BEEN_RESERVED);
>     continue;
>   }
>   // ... rest of the allocation logic for the node
> {code}
> NOTE: This code block is copied from [^YARN-9598.001.patch#file-5]
> h2. More notes for the implementation
> 1. This new behavior needs to be hidden behind a feature flag (a CS config); see the sketch after this list. In my understanding, [^YARN-9598.001.patch#file-5] skips all nodes with reservations, regardless of whether the container is an AM container or a task container.
> 2. Only skip a node with an existing reservation if the iterator still has more nodes to process.
> 3. Add a test case to cover this scenario.
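> A minimal sketch of how notes 1 and 2 could combine inside the candidate-node loop of RegularContainerAllocator#allocate. The config key name, the csConf variable, and the surrounding loop shape are assumptions for illustration, not the actual patch:
> {code:java}
> // Hypothetical CS config key; the real name would be decided in the patch.
> boolean skipReservedNodes = csConf.getBoolean(
>     "yarn.scheduler.capacity.skip-nodes-with-reserved-containers", false);
>
> while (iter.hasNext()) {
>   FiCaSchedulerNode node = iter.next();
>   // Note 1: act only when the feature flag is enabled.
>   // Note 2: skip only if more candidate nodes remain, so the application
>   // is not starved when every candidate node carries a reservation.
>   if (skipReservedNodes && node.getReservedContainer() != null
>       && iter.hasNext()) {
>     continue;
>   }
>   // ... existing allocation logic for this node ...
> }
> {code}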
>