Karthik Kambatla commented on YARN-3633:

Thanks for reporting this, Rohit. 

Extrapolating the example, suppose the applications' containers also need 3 
GB each. With the proposed change, the AMs would come up but would not be able 
to run any containers. Note that this is only an issue on a cluster where a 
single AM fills up the am-share; is this likely to happen on larger clusters 
and production deployments?
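
To make the extrapolation concrete, here is a rough sketch of the per-queue 
arithmetic. It assumes, purely for illustration, that each queue is held to 
its 5 GB fair share; the function name is hypothetical and is not 
FairScheduler code.

```python
def containers_that_fit(fair_share_gb, am_gb, container_gb):
    """Count task containers that fit in a queue's fair share once its AM runs.

    Assumption: the queue cannot grow beyond its fair share.
    """
    headroom_gb = max(0.0, fair_share_gb - am_gb)  # memory left after the AM
    return int(headroom_gb // container_gb)


# Numbers from the example: 5 GB fair share, 3 GB AM, 3 GB containers.
print(containers_that_fit(5, 3, 3))
```

Under that assumption each queue has 2 GB left after its AM starts, which is 
less than one 3 GB container.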

The scheduler could detect that there aren't enough resources to run 
applications from all the queues at once and run them in some order instead, 
but that would violate the fairness policies.

> With Fair Scheduler, cluster can logjam when there are too many queues
> ----------------------------------------------------------------------
>                 Key: YARN-3633
>                 URL: https://issues.apache.org/jira/browse/YARN-3633
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Rohit Agarwal
>            Assignee: Rohit Agarwal
>            Priority: Critical
> It's possible to logjam a cluster by submitting many applications at once in 
> different queues.
> For example, let's say there is a cluster with 20GB of total memory. Let's 
> say 4 users submit applications at the same time. The fair share of each 
> queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 
> 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the 
> cluster logjams. Nothing gets scheduled even when 20GB of resources are 
> available.
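
The arithmetic in the description above can be sketched as follows. This is 
only an illustration of the numbers quoted in the issue, assuming equal fair 
shares across queues; the function names are hypothetical and do not 
correspond to FairScheduler internals.

```python
def max_am_resource_gb(cluster_gb, num_queues, max_am_share):
    """AM memory limit per queue, assuming an equal fair share for each queue."""
    fair_share_gb = cluster_gb / num_queues
    return fair_share_gb * max_am_share


def am_can_start(am_gb, cluster_gb, num_queues, max_am_share):
    """True if a single AM of am_gb fits under the queue's AM limit."""
    return am_gb <= max_am_resource_gb(cluster_gb, num_queues, max_am_share)


# Numbers from the issue: 20 GB cluster, 4 queues, maxAMShare 0.5, 3 GB AMs.
limit = max_am_resource_gb(20, 4, 0.5)
print(limit, am_can_start(3, 20, 4, 0.5))
```

With these numbers the per-queue AM limit is 2.5 GB, so a 3 GB AM is rejected 
in every queue.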

This message was sent by Atlassian JIRA
