[
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862471#comment-15862471
]
ASF GitHub Bot commented on YARN-6163:
--------------------------------------
Github user kambatla commented on a diff in the pull request:
https://github.com/apache/hadoop/pull/192#discussion_r100673210
--- Diff:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
---
@@ -1106,6 +1111,97 @@ boolean isStarvedForFairShare() {
return !Resources.isNone(fairshareStarvation);
}
+ /**
+ * Helper method for {@link #getStarvedResourceRequests()}:
+ * Given a map of visited {@link ResourceRequest}s, it checks if
+ * {@link ResourceRequest} 'rr' has already been visited. The map is updated
+ * to reflect visiting 'rr'.
+ */
+ private static boolean checkAndMarkRRVisited(
+ Map<Priority, List<Resource>> visitedRRs, ResourceRequest rr) {
+ Priority priority = rr.getPriority();
+ Resource capability = rr.getCapability();
+ if (visitedRRs.containsKey(priority)) {
+ List<Resource> rrList = visitedRRs.get(priority);
+ if (rrList.contains(capability)) {
--- End diff --
Yeah, it looks like there is indeed a bug here.
Consider an app that asks for one container each on two nodes on the same rack:
- If this code encounters either of the node-local requests first, it ignores
the other node-local request and the rack requests. Ignoring the other
node-local request is undesired.
- If this code encounters the rack-local request first, it ignores the
node-local requests. This is desired.
Maybe, on encountering a node-local request, we should mark the rack and
ANY as "visited". What do we do when we encounter rack or ANY first? Let me
think more about this.
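
The scenario above can be reproduced with a minimal sketch. It assumes the
visited check keys only on priority and capability, as the visible diff lines
suggest; the rest of the real method is not shown in the diff. The types here
(VisitedSketch, Request) are hypothetical stand-ins, not the Hadoop
Priority/Resource/ResourceRequest classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; NOT the Hadoop Priority/Resource/ResourceRequest classes.
final class VisitedSketch {

  static final class Request {
    final int priority;
    final String capability;    // e.g. "1024MB,1vcore"
    final String resourceName;  // node name, rack name, or "*" for ANY

    Request(int priority, String capability, String resourceName) {
      this.priority = priority;
      this.capability = capability;
      this.resourceName = resourceName;
    }
  }

  // Returns true if (priority, capability) was already seen; otherwise records it.
  // Assumption: resourceName is not part of the key, mirroring the diff lines shown.
  static boolean checkAndMarkVisited(
      Map<Integer, List<String>> visited, Request rr) {
    List<String> seen =
        visited.computeIfAbsent(rr.priority, p -> new ArrayList<>());
    if (seen.contains(rr.capability)) {
      return true;
    }
    seen.add(rr.capability);
    return false;
  }

  // Walks the requests in order and keeps those not yet "visited".
  static List<String> selectUnvisited(List<Request> requests) {
    Map<Integer, List<String>> visited = new HashMap<>();
    List<String> selected = new ArrayList<>();
    for (Request rr : requests) {
      if (!checkAndMarkVisited(visited, rr)) {
        selected.add(rr.resourceName);
      }
    }
    return selected;
  }
}
```

With requests for node1, node2, rack1 and "*" in that order, all at the same
priority and capability, only node1 is selected; the node2 request is dropped,
which is the undesired case. With rack1 first, only rack1 is selected.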
> FS Preemption is a trickle for severely starved applications
> ------------------------------------------------------------
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Affects Versions: 2.9.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is
> marked starved. This marking happens only on the update call, which runs
> every 500ms. As a result, a severely starved application takes a very long
> time to reach its fair share through preemption.
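
To put the trickle in numbers, here is a rough sketch. It assumes exactly one
RR is handled per update pass and that the 500ms interval is fixed; the class
and method names are hypothetical and do not come from the FairScheduler code.

```java
// Hypothetical helper; illustrates the lower bound implied by the description.
final class StarvationTrickle {
  // Update interval stated in the issue description.
  static final long UPDATE_INTERVAL_MS = 500;

  // Minimum time to consider all pending RRs at one RR per update pass.
  static long minMillisToConsider(int pendingRequests) {
    return pendingRequests * UPDATE_INTERVAL_MS;
  }
}
```

Under these assumptions, an application starved for 1000 RRs needs at least
500000 ms (over 8 minutes) just for every RR to be considered once.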