[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034376#comment-15034376
 ] 

Wangda Tan commented on YARN-4390:
----------------------------------

Thanks for sharing your thoughts, [~curino]!

I agree with most of what you said: fixing large imbalances is more important 
than doing micro corrections. That is enough when a cluster is large and 
resource requests are almost homogeneous; the current PCPP handles such cases 
quite well.

But in other cases the existing PCPP does not work well, for example:
# Resource requests from different queues are very heterogeneous: some requests 
need only 1G of memory, and some need 32G.
# Hard locality is required (for example SLIDER-82).
I have seen a lot of excessive preemption on a customer's cluster with several 
hundred nodes and heterogeneous requests.
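
To make the heterogeneous case concrete, here is a minimal sketch of a 
size-blind preemption pass. It is a toy model, not PCPP code: the function 
names, node names, and the assumption of one 1G container per node are all 
made up for illustration.

```python
# Hypothetical model, not actual PCPP code: names and sizes are assumptions.

def size_blind_preempt(containers, demand_gb):
    """Pick containers until the total freed memory covers the demand,
    ignoring which node each container lives on."""
    picked, freed = [], 0
    for node, size_gb in containers:
        if freed >= demand_gb:
            break
        picked.append((node, size_gb))
        freed += size_gb
    return picked

def largest_free_block(picked):
    """Largest amount of memory freed on any single node."""
    per_node = {}
    for node, size_gb in picked:
        per_node[node] = per_node.get(node, 0) + size_gb
    return max(per_node.values(), default=0)

# Eight 1G containers, one per node; another queue wants a single 8G container.
containers = [(f"node{i}", 1) for i in range(8)]
picked = size_blind_preempt(containers, demand_gb=8)

# The totals match the demand, but no single node can host the 8G container.
print(len(picked), largest_free_block(picked))
```

In this toy setup eight containers are preempted, yet the largest block freed 
on any one node is still 1G, so the 8G request cannot be placed and the 
preemption was wasted.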

So I'm proposing an approach in YARN-4108 which combines the two: large 
imbalances will be calculated by the preemption monitor, and micro corrections 
will be handled by the scheduler's allocation logic. I've uploaded a doc and a 
POC patch to YARN-4108; please kindly review.

> Consider container request size during CS preemption
> ----------------------------------------------------
>
>                 Key: YARN-4390
>                 URL: https://issues.apache.org/jira/browse/YARN-4390
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.0.0, 2.8.0, 2.7.3
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
