[ 
https://issues.apache.org/jira/browse/YARN-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729749#comment-14729749
 ] 

Wangda Tan commented on YARN-4059:
----------------------------------

[~jlowe], thanks for your comments:
bq. Wondering if we should scale down the delay for allocation based on how 
full the cluster appears to be
I don't think this handles the case where an app wants only a small portion of 
the cluster. High overall cluster utilization doesn't necessarily mean the 
portion the app asked for is also highly utilized.

Would the following plan be acceptable to you?

Instead of resetting missed-opportunity (or missed-time) every time we get a 
new container at the expected locality, we deduct a fixed amount from the total 
missed-time. For example:
Assume node-local-delay is set to 5 sec.
If the AM waits 20 sec to get a node-local container, we set missed-time to 
20 - 5 = 15 sec. Until missed-time drops below 5 sec, the app accepts whatever 
the RM allocates instead of being picky. This approach takes into account the 
accumulated waiting time of a given app (maybe per priority as well). If an 
app has already waited a long time, it can get containers allocated quickly as 
soon as any resources become available.
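
To make the proposal concrete, here is a minimal sketch of the deduct-instead-of-reset bookkeeping. It is not taken from any patch on this issue: the class name LocalityDelayTracker and its methods are hypothetical, and the real scheduler may well track missed opportunities by heartbeat count rather than seconds.

```python
from dataclasses import dataclass


@dataclass
class LocalityDelayTracker:
    """Hypothetical per-app (or per-priority) missed-time bookkeeping."""

    node_local_delay: float   # seconds, e.g. 5.0 (assumed unit)
    missed_time: float = 0.0  # accumulated seconds spent waiting

    def record_wait(self, seconds: float) -> None:
        # Accumulate time the app spent waiting for a node-local container.
        self.missed_time += seconds

    def can_relax_locality(self) -> bool:
        # While missed-time has not dropped below the delay, the app
        # accepts whatever the RM allocates instead of being picky.
        return self.missed_time >= self.node_local_delay

    def on_container_allocated(self) -> None:
        # Deduct one delay period instead of resetting to zero,
        # so accumulated waiting time still counts for later requests.
        self.missed_time = max(0.0, self.missed_time - self.node_local_delay)


tracker = LocalityDelayTracker(node_local_delay=5.0)
tracker.record_wait(20.0)         # AM waited 20 sec
tracker.on_container_allocated()  # missed-time becomes 20 - 5 = 15 sec
print(tracker.missed_time, tracker.can_relax_locality())
```

With these numbers the app keeps accepting non-local containers for three more allocations (15, 10, 5 sec) and becomes picky again once missed-time reaches 0.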

> Preemption should delay assignments back to the preempted queue
> ---------------------------------------------------------------
>
>                 Key: YARN-4059
>                 URL: https://issues.apache.org/jira/browse/YARN-4059
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Chang Li
>            Assignee: Chang Li
>         Attachments: YARN-4059.2.patch, YARN-4059.3.patch, YARN-4059.patch
>
>
> When preempting containers from a queue it can take a while for the other 
> queues to fully consume the resources that were freed up, due to delays 
> waiting for better locality, etc. Those delays can cause the resources to be 
> assigned back to the preempted queue, and then the preemption cycle continues.
> We should consider adding a delay, either based on node heartbeat counts or 
> time, to avoid granting containers to a queue that was recently preempted. 
> The delay should be sufficient to cover the cycles of the preemption monitor, 
> so we won't try to assign containers in-between preemption events for a queue.
> The worst-case scenario for assigning freed resources to other queues is when 
> all the other queues want no locality. No locality means only one container 
> is assigned per heartbeat, so we need to wait for the entire cluster to 
> heartbeat in as many times as the number of containers that could run on a 
> single node.
> So the "penalty time" for a queue should be the max of either the preemption 
> monitor cycle time or the amount of time it takes to allocate the cluster 
> with one container per heartbeat. Guessing this will be somewhere around 2 
> minutes.
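
The penalty-time estimate in the description above can be written out as a small calculation. This is only a sketch under stated assumptions: the function names are hypothetical, the input values are illustrative rather than measured, and it assumes every node heartbeats once per interval and receives exactly one container per heartbeat.

```python
def worst_case_allocation_time(heartbeat_interval_s: float,
                               containers_per_node: int) -> float:
    # One container per node heartbeat: filling a node that can hold N
    # containers takes N heartbeat rounds of the whole cluster.
    return heartbeat_interval_s * containers_per_node


def queue_penalty_time(monitor_cycle_s: float,
                       heartbeat_interval_s: float,
                       containers_per_node: int) -> float:
    # Penalty is the larger of the preemption monitor cycle and the
    # worst-case time to allocate the cluster.
    return max(monitor_cycle_s,
               worst_case_allocation_time(heartbeat_interval_s,
                                          containers_per_node))


# Illustrative values only (assumptions, not taken from the issue).
print(queue_penalty_time(monitor_cycle_s=3.0,
                         heartbeat_interval_s=1.0,
                         containers_per_node=40))
```

Under these assumptions the allocation term dominates whenever a node can hold more containers than the monitor cycle spans heartbeats.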



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
