Jason Lowe commented on YARN-2604:

bq. Actually, I wonder if we should add a config to specify either (a) a 
particular number of NMs after which this behavior kicks in or (b) a 
minimum/floor value for the configurable maximum

For the restart case this sounds a lot like YARN-2001, where we needed some kind
of threshold to control when the RM started making scheduling decisions based
on AM requests.  This is a similar situation: we don't want to make scheduling
decisions too early, before we have a good idea about the cluster.  YARN-2001
uses a config, specific to work-preserving restart, to wait 10s by default
before acting on requests.  We could do something similar here, either reusing
that value directly or adding a "safe mode" config for the RM (and perhaps
having one config default to the other's value, since they are similar concepts).
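A rough sketch of how the options above could combine, to make the proposal concrete. This is not the YARN-2604 patch or the YARN scheduler API. `Resource`, `effective_max_allocation` and all of its parameters are hypothetical. The sketch assumes the RM leaves "safe mode" once either enough NMs have registered or the wait period has elapsed, and that the floor is applied after clamping to the largest node, without ever exceeding the configured maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical two-dimensional resource: memory in MB and virtual cores."""
    memory_mb: int
    vcores: int


def componentwise_min(a: Resource, b: Resource) -> Resource:
    return Resource(min(a.memory_mb, b.memory_mb), min(a.vcores, b.vcores))


def componentwise_max(a: Resource, b: Resource) -> Resource:
    return Resource(max(a.memory_mb, b.memory_mb), max(a.vcores, b.vcores))


def effective_max_allocation(
    configured_max: Resource,   # the max-allocation-* settings
    largest_node: Resource,     # capability of the largest registered NM
    floor: Resource,            # hypothetical minimum for the effective maximum
    registered_nms: int,
    min_nms: int,               # hypothetical NM-count threshold, option (a)
    elapsed_ms: int,            # time since the RM started
    wait_ms: int,               # hypothetical "safe mode" wait, like YARN-2001
) -> Resource:
    """Return the maximum allocation a scheduler would enforce (sketch only)."""
    # Assumption: safe mode ends when EITHER condition is met.
    in_safe_mode = registered_nms < min_nms and elapsed_ms < wait_ms
    if in_safe_mode:
        # Too early to judge the cluster, so keep the configured maximum.
        return configured_max
    clamped = componentwise_min(configured_max, largest_node)
    # Option (b): never drop below the floor, and never exceed the config.
    return componentwise_min(configured_max, componentwise_max(clamped, floor))
```

Using OR means a small cluster that never reaches the NM threshold still starts enforcing the clamp once the wait expires; requiring both conditions would be the more conservative choice.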

> Scheduler should consider max-allocation-* in conjunction with the largest node
> -------------------------------------------------------------------------------
>                 Key: YARN-2604
>                 URL: https://issues.apache.org/jira/browse/YARN-2604
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.5.1
>            Reporter: Karthik Kambatla
>            Assignee: Robert Kanter
>         Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch
> If the scheduler max-allocation-* values are larger than the resources 
> available on the largest node in the cluster, an application requesting 
> resources between the two values will be accepted by the scheduler but the 
> requests will never be satisfied. The app essentially hangs forever. 
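A small illustration of the failure mode described in the issue. This is not scheduler code: `Resource` and `classify_request` are hypothetical, and the sketch assumes that "never satisfied" means the request exceeds the total capability of every node, not just the resources currently free on it.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Resource:
    """Hypothetical two-dimensional resource: memory in MB and virtual cores."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


def classify_request(
    request: Resource,
    configured_max: Resource,
    node_capabilities: Iterable[Resource],
) -> str:
    # Validation only checks the configured max-allocation-* values.
    if not request.fits_in(configured_max):
        return "rejected"
    # The request passed validation; can any node ever host it?
    if any(request.fits_in(cap) for cap in node_capabilities):
        return "schedulable"
    # Between the largest node and max-allocation-*: accepted, never satisfied.
    return "accepted-but-unsatisfiable"
```

With a configured maximum of 16384 MB / 16 vcores and nodes of at most 8192 MB / 8 vcores, a 12288 MB request lands in the last category, which is exactly the hang described above.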

This message was sent by Atlassian JIRA
