[
https://issues.apache.org/jira/browse/YARN-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901075#comment-14901075
]
Wangda Tan commented on YARN-4189:
----------------------------------
[~xinxianyin],
Thanks for looking at the doc. However, I don't think the approach in the doc
should reduce utilization:
Assume we limit the maximum waiting time for each container to X sec, and the
average container execution time is Y sec. It will be fine if X << Y.
In my mind, X is a value close to the node heartbeat interval, and Y ranges
from minutes to hours.
I don't have any data to prove whether this is true; we need to run some
benchmark tests before using it in practice.
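To put rough numbers on the X << Y argument, here is a minimal Python sketch. It assumes a purely time-based rule (a request may fall back to a less-local node once it has waited X seconds, decided at the next heartbeat) and the pessimistic case in which the slot sits idle for the whole wait. `relax_time` and `worst_case_idle_fraction` are hypothetical helpers, not YARN APIs, and X / (X + Y) is only an upper bound under that assumption, not a measurement.

```python
import math


def relax_time(max_wait_s: float, heartbeat_interval_s: float = 1.0) -> float:
    """Seconds until a request may fall back to a less-local node.

    Hypothetical time-based rule: wait at most max_wait_s (X), but only decide
    when a node heartbeats, assumed to happen every heartbeat_interval_s after
    the request is made (so at least one interval always passes).
    """
    beats = max(1, math.ceil(max_wait_s / heartbeat_interval_s))
    return beats * heartbeat_interval_s


def worst_case_idle_fraction(max_wait_s: float, avg_exec_s: float) -> float:
    """Share of a slot's time lost if it idles for the whole wait (X)
    before each container, which then runs for avg_exec_s (Y)."""
    if max_wait_s < 0 or avg_exec_s <= 0:
        raise ValueError("need max_wait_s >= 0 and avg_exec_s > 0")
    return max_wait_s / (max_wait_s + avg_exec_s)


if __name__ == "__main__":
    x = relax_time(1.0)  # X close to a 1 s heartbeat interval
    for y in (60.0, 600.0, 3600.0):  # Y from a minute to an hour
        print(f"X={x:.0f}s, Y={y:.0f}s -> {worst_case_idle_fraction(x, y):.2%} idle")
```

Because `relax_time` depends only on X and the heartbeat interval, the wait in this model does not change with how many nodes happen to be free. Whether the real overhead stays this small still has to be confirmed by benchmark tests.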
> Capacity Scheduler : Improve location preference waiting mechanism
> ------------------------------------------------------------------
>
> Key: YARN-4189
> URL: https://issues.apache.org/jira/browse/YARN-4189
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-4189 design v1.pdf
>
>
> There are some issues with the current Capacity Scheduler implementation of
> delay scheduling:
> *1) Waiting time to allocate each container highly depends on cluster
> availability*
> Currently, an app can only increase its missed-opportunity count when a node
> has available resources AND the app gets traversed by the scheduler. There are
> many situations in which an app doesn't get traversed by the scheduler, for
> example:
> A cluster has 2 racks (rack1 and rack2), each with 40 nodes.
> Node-locality-delay=40. An application prefers rack1.
> Node-heartbeat-interval=1s.
> If there are 2 nodes available on rack1, the delay to allocate one container
> = 40 sec.
> If there are 20 nodes available on rack1, the delay to allocate one container
> = 2 sec.
> *2) It could violate scheduling policies (Fifo/Priority/Fair)*
> Assume a cluster is highly utilized, an app (app1) with higher priority
> wants locality, and another app (app2) with lower priority doesn't care
> about locality. When a node heartbeats with available resources, app1
> decides to wait, so app2 gets the available slot. This should be
> considered a bug that we need to fix.
> The same problem could happen when we use FIFO/Fair queue policies.
> Another, similar problem is related to preemption: suppose the preemption
> policy preempts some resources from queue-A for queue-B (queue-A is
> over-satisfied and queue-B is under-satisfied). If queue-B is still waiting
> out the node-locality-delay, queue-A gets the resources back, and in the next
> round the preemption policy could preempt these resources from queue-A again.
> This JIRA aims to solve these problems.
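Problems 1 and 2 above both come from the count-based "skip and record a missed opportunity" rule. The following toy Python model reproduces them. It is a simplified illustration, not CapacityScheduler code: `App`, `offer_slot` and `seconds_until_relaxed` are hypothetical names, every traversal that misses the preferred host counts as one missed opportunity, the counter never resets, and each available node is assumed to heartbeat once per interval without ever being the app's preferred host.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class App:
    name: str
    priority: int                   # larger value is considered first
    preferred_host: Optional[str]   # None: the app accepts any node
    missed_opportunities: int = 0


def offer_slot(apps: List[App], host: str, node_locality_delay: int) -> Optional[str]:
    """Offer one free slot on `host` to apps in priority order.

    Toy count-based rule: an app whose preferred host is elsewhere records a
    missed opportunity on every traversal and only accepts the slot once the
    count reaches node_locality_delay. Returns the name of the app that takes
    the slot, or None if every app skips it.
    """
    for app in sorted(apps, key=lambda a: a.priority, reverse=True):
        elsewhere = app.preferred_host is not None and app.preferred_host != host
        if not elsewhere:
            return app.name  # preference satisfied, or no preference at all
        app.missed_opportunities += 1  # this traversal counts as a miss
        if app.missed_opportunities >= node_locality_delay:
            return app.name  # waited long enough: accept the non-preferred host
    return None


def seconds_until_relaxed(node_locality_delay: int, available_nodes: int,
                          heartbeat_interval_s: float = 1.0) -> float:
    """Problem 1: wall-clock wait under the rule above, assuming each available
    node heartbeats once per interval and none is the preferred host."""
    if available_nodes <= 0:
        return math.inf  # the app is never traversed, so it waits forever
    app = App("app", priority=1, preferred_host="preferred-node")
    rounds = 0
    while True:
        rounds += 1
        for i in range(available_nodes):
            if offer_slot([app], f"rack1-node{i}", node_locality_delay):
                return rounds * heartbeat_interval_s


if __name__ == "__main__":
    # Problem 1: same delay setting, very different waits.
    for n in (4, 20):
        print(n, "available nodes ->", seconds_until_relaxed(40, n), "s")
    # Problem 2: higher-priority app1 waits, so lower-priority app2 gets the slot.
    app1 = App("app1", priority=10, preferred_host="node-a")
    app2 = App("app2", priority=1, preferred_host=None)
    print(offer_slot([app1, app2], "node-b", node_locality_delay=40))  # prints app2
```

With the issue's parameters (node-locality-delay=40, 1 s heartbeats), `seconds_until_relaxed(40, 20)` returns 2.0, matching the 20-node figure, and the wait in this model grows as fewer nodes are available.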
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)