[ https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014304#comment-14014304 ]
Sandy Ryza commented on YARN-1913:
----------------------------------

Thanks for the updated patch Wei.

For queues, maxAMShare should be defined as a fraction of the queue's fair share, not maxShare.  The majority of queues are configured with infinite maxResources.  We need to be careful with this, as fair shares can change when queues are created dynamically.

I think it might make sense to only allow the queue-level maxAMShare on leaf queues for the moment.  I can't think of a strong reason somebody would want to set it on a parent queue, and doing this would allow us to avoid the complex logic in MaxRunningAppsEnforcer, and merely enforce the AM max share by checking in AppSchedulable.assignContainer.  This is also what the Capacity Scheduler has at the moment.

> With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
> ------------------------------------------------------------------------------
>
>                 Key: YARN-1913
>                 URL: https://issues.apache.org/jira/browse/YARN-1913
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.3.0
>            Reporter: bc Wong
>            Assignee: Wei Yan
>         Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch
>
>
> It's possible to deadlock a cluster by submitting many applications at once, and have all cluster resources taken up by AMs.
> One solution is for the scheduler to limit resources taken up by AMs, as a percentage of total cluster resources, via a "maxApplicationMasterShare" config.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
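
The leaf-queue check suggested in the comment above could look roughly like the sketch below. It is illustrative only and assumes memory-only accounting in MB. The class and method names (AmShareCheck, exceedsMaxAMShare) and the convention that a negative maxAMShare disables the check are hypothetical; none of them are taken from the patch or from the Fair Scheduler code.

```java
// Hypothetical sketch, not the YARN-1913 patch: decide whether starting one
// more AM would push a leaf queue past maxAMShare of its current fair share.
public class AmShareCheck {

    /**
     * @param amUsageMb   memory already held by AMs in this leaf queue (MB)
     * @param amRequestMb memory requested by the AM container being considered (MB)
     * @param fairShareMb the queue's current fair share (MB); it can change as
     *                    queues are created, so it is read at check time
     * @param maxAMShare  fraction of the fair share AMs may use; a negative
     *                    value disables the check (assumed convention)
     * @return true if the new AM should not be started right now
     */
    static boolean exceedsMaxAMShare(long amUsageMb, long amRequestMb,
                                     long fairShareMb, double maxAMShare) {
        if (maxAMShare < 0) {
            return false; // check disabled
        }
        double limitMb = fairShareMb * maxAMShare;
        return amUsageMb + amRequestMb > limitMb;
    }

    public static void main(String[] args) {
        // Queue with an 8 GB fair share and maxAMShare = 0.5, so AMs may use up to 4 GB.
        System.out.println(exceedsMaxAMShare(1024, 1024, 8192, 0.5));
        System.out.println(exceedsMaxAMShare(4096, 1024, 8192, 0.5));
    }
}
```

Because the limit is computed from the fair share on every call, a queue with infinite maxResources still gets a finite AM limit, which is the reason given above for preferring fair share over maxShare.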