Wangda Tan commented on YARN-4059:

Thanks [~lichangleo].

I agree with [~jlowe]; we probably need to modify the scheduler and the 
preemption policy together to handle this better. I posted a few points on 
YARN-4108.

For the locality wait issue, I think it is caused more by how we calculate 
missed-opportunity than by the preemption policy. Currently, missed-opportunity 
is updated only when an application is accessed by the scheduler. If a cluster 
is highly utilized, e.g. 99% of nodes are occupied, missed-opportunity can 
increase very slowly. IIRC, [~jlowe] mentioned this in other JIRAs. 
Maybe we need to change heartbeat-based counting to time-based counting. 
[~jlowe], would it be acceptable to you if we added an option for CS to choose 
between heartbeat-based counting and time-based counting?
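
To make the two counting modes concrete, here is a minimal sketch of what such 
an option could look like. It is not CapacityScheduler code: the class, enum, 
method names, and both thresholds are hypothetical and exist only to show how 
the modes differ.

```java
/** Hypothetical per-application locality-delay tracker; not CapacityScheduler code. */
final class LocalityDelaySketch {
    /** Assumed option: which counter decides when locality may be relaxed. */
    enum CountingMode { HEARTBEAT_BASED, TIME_BASED }

    private final CountingMode mode;
    private final long missedOpportunityThreshold; // used in HEARTBEAT_BASED mode
    private final long waitThresholdMillis;        // used in TIME_BASED mode
    private long missedOpportunities = 0;
    private long waitStartMillis;

    LocalityDelaySketch(CountingMode mode, long missedOpportunityThreshold,
                        long waitThresholdMillis, long nowMillis) {
        this.mode = mode;
        this.missedOpportunityThreshold = missedOpportunityThreshold;
        this.waitThresholdMillis = waitThresholdMillis;
        this.waitStartMillis = nowMillis;
    }

    /** Called only when the scheduler looks at this application and skips it. */
    void onSchedulingOpportunityMissed() {
        missedOpportunities++;
    }

    /** Called when a container is assigned; both counters start over. */
    void onContainerAssigned(long nowMillis) {
        missedOpportunities = 0;
        waitStartMillis = nowMillis;
    }

    /** Whether the application has waited long enough to accept worse locality. */
    boolean canRelaxLocality(long nowMillis) {
        if (mode == CountingMode.HEARTBEAT_BASED) {
            // Advances only as fast as the scheduler visits the application.
            return missedOpportunities >= missedOpportunityThreshold;
        }
        // Advances with the clock, even if the application is rarely visited.
        return nowMillis - waitStartMillis >= waitThresholdMillis;
    }
}
```

In a nearly full cluster the heartbeat-based counter barely moves because the 
application is rarely visited, while the time-based check still passes once 
the configured wait has elapsed.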

> Preemption should delay assignments back to the preempted queue
> ---------------------------------------------------------------
>                 Key: YARN-4059
>                 URL: https://issues.apache.org/jira/browse/YARN-4059
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Chang Li
>            Assignee: Chang Li
>         Attachments: YARN-4059.2.patch, YARN-4059.3.patch, YARN-4059.patch
> When preempting containers from a queue it can take a while for the other 
> queues to fully consume the resources that were freed up, due to delays 
> waiting for better locality, etc. Those delays can cause the resources to be 
> assigned back to the preempted queue, and then the preemption cycle continues.
> We should consider adding a delay, either based on node heartbeat counts or 
> time, to avoid granting containers to a queue that was recently preempted. 
> The delay should be sufficient to cover the cycles of the preemption monitor, 
> so we won't try to assign containers in-between preemption events for a queue.
> Worst-case scenario for assigning freed resources to other queues is when all 
> the other queues want no locality. No locality means only one container is 
> assigned per heartbeat, so we need to wait for the entire cluster to 
> heartbeat as many times as the number of containers that could run on a 
> single node.
> So the "penalty time" for a queue should be the max of either the preemption 
> monitor cycle time or the amount of time it takes to allocate the cluster 
> with one container per heartbeat. Guessing this will be somewhere around 2 
> minutes.
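
The penalty-time estimate in the description above can be written out as a 
small calculation. This is only a sketch under stated assumptions: the class 
and method names are hypothetical, and it assumes every node heartbeats once 
per interval and receives exactly one container per heartbeat.

```java
/** Hypothetical helper for the penalty-time estimate; not part of YARN. */
final class PreemptionPenaltySketch {
    /**
     * Time to fill the cluster at one container per node heartbeat:
     * each node needs as many heartbeats as it can hold containers.
     */
    static long fillTimeMillis(long heartbeatIntervalMillis, int maxContainersPerNode) {
        return heartbeatIntervalMillis * maxContainersPerNode;
    }

    /** Penalty time: the larger of the monitor cycle and the cluster fill time. */
    static long penaltyMillis(long monitorCycleMillis,
                              long heartbeatIntervalMillis,
                              int maxContainersPerNode) {
        return Math.max(monitorCycleMillis,
                fillTimeMillis(heartbeatIntervalMillis, maxContainersPerNode));
    }
}
```

With inputs picked only for illustration (a 1000 ms heartbeat, 120 containers 
per node, and a 3000 ms monitor cycle), penaltyMillis(3000, 1000, 120) is 
120000 ms, i.e. 2 minutes.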

This message was sent by Atlassian JIRA
