[ https://issues.apache.org/jira/browse/YARN-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901075#comment-14901075 ]

Wangda Tan commented on YARN-4189:
----------------------------------

[~xinxianyin],

Thanks for looking at the doc. However, I think the approach in the doc 
shouldn't reduce utilization:

Assume we limit the maximum waiting time for each container to X sec, and the 
average container execution time is Y sec. It should be fine if X << Y.

In my mind, X is a value close to the node heartbeat interval, and Y ranges from 
minutes to hours.

I don't have any data to prove whether this holds; we need to do some 
benchmark tests before using it in practice.
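The X << Y argument can be sanity-checked with a quick back-of-the-envelope sketch. The numbers below are illustrative assumptions (not measurements or values from the doc); the point is only that when the wait bound X is near a heartbeat interval and execution time Y is minutes to hours, the worst-case time lost to waiting is a small fraction:

```python
def waiting_overhead(x_wait_sec, y_run_sec):
    # Worst case: a container waits X sec before starting, then runs
    # for Y sec, so the fraction of its lifetime lost to waiting is
    # X / (X + Y).
    return x_wait_sec / (x_wait_sec + y_run_sec)

# X near a 1 s node heartbeat interval, Y from minutes to hours
# (illustrative values only):
for y in (60, 600, 3600):
    print(f"Y={y}s -> worst-case overhead {waiting_overhead(1, y):.2%}")
```

Even at Y = 60 s the worst-case overhead is under 2%, and it shrinks as Y grows, which is the intuition behind requiring X << Y.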

> Capacity Scheduler : Improve location preference waiting mechanism
> ------------------------------------------------------------------
>
>                 Key: YARN-4189
>                 URL: https://issues.apache.org/jira/browse/YARN-4189
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-4189 design v1.pdf
>
>
> There are some issues with the current Capacity Scheduler implementation of 
> delay scheduling:
> *1) Waiting time to allocate each container highly depends on cluster 
> availability*
> Currently, an app can only increase its missed-opportunity count when a node has 
> available resources AND the app gets traversed by the scheduler. There are many 
> cases in which an app doesn’t get traversed by the scheduler, for example:
> A cluster has 2 racks (rack1/2), each rack has 40 nodes. 
> Node-locality-delay=40. An application prefers rack1. 
> Node-heartbeat-interval=1s.
> Assume there are 2 nodes available on rack1; the delay to allocate one 
> container = 40 sec.
> If there are 20 nodes available on rack1, the delay to allocate one container = 
> 2 sec.
> *2) It could violate scheduling policies (Fifo/Priority/Fair)*
> Assume a cluster is highly utilized. An app (app1) has higher priority and 
> wants locality; another app (app2) has lower priority but doesn’t care about 
> locality. When a node heartbeats with available resources, app1 decides to 
> wait, so app2 gets the available slot. This should be considered a bug that we 
> need to fix.
> The same problem could happen when we use FIFO/Fair queue policies.
> Another similar problem is related to preemption: the preemption policy 
> preempts some resources from queue-A for queue-B (queue-A is over-satisfied 
> and queue-B is under-satisfied), but queue-B is still waiting out the 
> node-locality-delay, so queue-A gets the resources back. In the next round, 
> the preemption policy could preempt these resources from queue-A again.
> This JIRA targets solving these problems.
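The delay arithmetic in point 1 of the description can be sketched with a simplified model. This is an assumption-laden illustration, not the scheduler's actual accounting (the function name and the "one missed opportunity per declined heartbeat from an available node" rule are mine, and the real counting may differ, as the description's own numbers suggest):

```python
def allocation_delay_sec(node_locality_delay, available_nodes,
                         heartbeat_interval_sec=1.0):
    # Simplified model: each available node heartbeats once per interval,
    # and every heartbeat the app declines adds one missed opportunity.
    # The app must accumulate node_locality_delay missed opportunities
    # before relaxing its locality preference.
    opportunities_per_sec = available_nodes / heartbeat_interval_sec
    return node_locality_delay / opportunities_per_sec

# node-locality-delay=40, 1 s heartbeats, as in the description:
print(allocation_delay_sec(40, 20))  # 20 available nodes on rack1 -> 2.0
print(allocation_delay_sec(40, 2))   # only 2 available nodes -> 20.0
```

Whatever the exact counting, the delay is inversely proportional to how many available nodes happen to traverse the app, which is the "highly depends on cluster availability" problem the JIRA describes.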



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
