[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209986#comment-14209986 ]
Jason Lowe commented on YARN-2604:
----------------------------------

bq. Actually, I wonder if we should add a config to specify either (a) a particular number of NMs after which this behavior kicks in or (b) a minimum/floor value for the configurable maximum

For the restart case this sounds a lot like YARN-2001 where we needed some kind of threshold to control when the RM started making scheduling decisions based on AM requests. This is a similar situation -- we don't want to make scheduling decisions too early before we have a good idea about the cluster. That uses a config, specific to work-preserving restart, to wait for 10s by default before acting on requests. We could do something similar here, either using the value directly or adding a "safe mode" config for the RM (and maybe tying one value to the other config by default since they are similar concepts).

> Scheduler should consider max-allocation-* in conjunction with the largest node
> -------------------------------------------------------------------------------
>
>                 Key: YARN-2604
>                 URL: https://issues.apache.org/jira/browse/YARN-2604
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.5.1
>            Reporter: Karthik Kambatla
>            Assignee: Robert Kanter
>         Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch
>
>
> If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
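The idea discussed in this thread can be sketched as follows. The scheduler clamps the configured max-allocation-* values to what the largest registered node offers, but only after an initial "safe mode" wait like the 10s wait YARN-2001 uses for work-preserving restart. This is a minimal, hypothetical Python sketch, not the YARN-2604 patch. `Resource`, `effective_max_allocation`, `is_satisfiable` and `safe_mode_wait_ms` are illustrative names. Taking the largest memory and the largest vcores independently across nodes is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Resource:
    # Hypothetical two-dimensional resource, loosely modeled on YARN memory + vcores.
    memory_mb: int
    vcores: int


def effective_max_allocation(
    configured_max: Resource,
    node_capacities: Iterable[Resource],
    ms_since_rm_start: int,
    safe_mode_wait_ms: int = 10_000,  # assumed default, mirroring the 10s YARN-2001 wait
) -> Resource:
    """Return the largest container the scheduler should accept (sketch).

    While still inside the safe-mode wait, or before any node has registered,
    trust the configured maximum. Afterwards, clamp each dimension to the
    largest value seen on any registered node.
    """
    nodes = list(node_capacities)
    if ms_since_rm_start < safe_mode_wait_ms or not nodes:
        return configured_max
    # Assumption: each dimension is maximized independently across nodes.
    largest_memory = max(n.memory_mb for n in nodes)
    largest_vcores = max(n.vcores for n in nodes)
    return Resource(
        memory_mb=min(configured_max.memory_mb, largest_memory),
        vcores=min(configured_max.vcores, largest_vcores),
    )


def is_satisfiable(request: Resource, max_alloc: Resource) -> bool:
    # Reject requests that exceed the effective maximum in any dimension.
    return (request.memory_mb <= max_alloc.memory_mb
            and request.vcores <= max_alloc.vcores)
```

Suppose the configured maximum is 16384 MB / 8 vcores and the nodes offer 8192 MB / 4 vcores and 12288 MB / 6 vcores. The function then returns the configured maximum for the first 10s and 12288 MB / 6 vcores afterwards. A 14336 MB request is therefore rejected up front instead of hanging forever. Because the two dimensions are maximized independently, the result can combine values from different nodes. For example, nodes of 8192 MB / 8 vcores and 12288 MB / 4 vcores yield 12288 MB / 8 vcores, which neither node can host. A real scheduler would have to decide whether that approximation is acceptable.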