Sandy Ryza commented on YARN-1913:

Thanks for the updated patch, Wei.  For queues, maxAMShare should be defined as 
a fraction of the queue's fair share, not maxShare.  The majority of queues are 
configured with infinite maxResources, so a fraction of maxShare would impose 
no limit on them.  We need to be careful with the fair-share approach, though, 
as fair shares can change when queues are created dynamically.

I think it might make sense to only allow the queue-level maxAMShare on leaf 
queues for the moment.  I can't think of a strong reason somebody would want to 
set it on a parent queue, and doing this would allow us to avoid the complex 
logic in MaxRunningAppsEnforcer, and merely enforce the AM max share by 
checking in AppSchedulable.assignContainer.  This is also what the Capacity 
Scheduler has at the moment.
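
To make the suggestion concrete, here is a rough sketch of the kind of 
leaf-queue check I have in mind.  The class and method names are hypothetical 
and are not taken from the patch or from the existing scheduler code, and it 
only looks at memory for brevity.  The limit is computed from the queue's 
current fair share each time, so it moves when the fair share does.

```java
// Hypothetical sketch only: these names are illustrative and do not come from
// the YARN-1913 patch or from the Fair Scheduler source.
final class AmShareCheck {
    private AmShareCheck() {}

    /**
     * Returns true if a leaf queue can start one more AM without its total AM
     * memory exceeding maxAMShare * fairShare.
     *
     * @param fairShareMb the queue's current fair share, in MB
     * @param amUsageMb   memory already held by running AMs in the queue, in MB
     * @param amDemandMb  memory requested by the AM about to be launched, in MB
     * @param maxAMShare  fraction of the fair share that AMs may occupy (0..1)
     */
    static boolean canRunAppAM(long fairShareMb, long amUsageMb,
                               long amDemandMb, double maxAMShare) {
        // Recomputed on every call, because the fair share can change when
        // queues are created dynamically.
        long limitMb = (long) (fairShareMb * maxAMShare);
        return amUsageMb + amDemandMb <= limitMb;
    }
}
```

A check like this would be called before assigning an AM container, for 
example from AppSchedulable.assignContainer, and the AM would simply be skipped 
for that scheduling pass if it returns false.  Note that if the fair share 
drops, AMs that are already running can sit above the new limit; the check 
only blocks new ones.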

> With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
> ------------------------------------------------------------------------------
>                 Key: YARN-1913
>                 URL: https://issues.apache.org/jira/browse/YARN-1913
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.3.0
>            Reporter: bc Wong
>            Assignee: Wei Yan
>         Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
> YARN-1913.patch, YARN-1913.patch
> It's possible to deadlock a cluster by submitting many applications at once, 
> so that all cluster resources are taken up by AMs.
> One solution is for the scheduler to limit resources taken up by AMs, as a 
> percentage of total cluster resources, via a "maxApplicationMasterShare" 
> config.

This message was sent by Atlassian JIRA