Hadoop QA commented on YARN-1857:

{color:red}-1 overall{color}.  Here are the results of testing the latest 
  against trunk revision 9196db9.

    {color:green}+1 @author{color}.  The patch does not contain any @author 

    {color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in 


    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5311//console

This message is automatically generated.

> CapacityScheduler headroom doesn't account for other AM's running
> -----------------------------------------------------------------
>                 Key: YARN-1857
>                 URL: https://issues.apache.org/jira/browse/YARN-1857
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.3.0
>            Reporter: Thomas Graves
>            Assignee: Chen He
>            Priority: Critical
>         Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
> YARN-1857.patch, YARN-1857.patch, YARN-1857.patch
> Its possible to get an application to hang forever (or a long time) in a 
> cluster with multiple users.  The reason why is that the headroom sent to the 
> application is based on the user limit but it doesn't account for other 
> Application masters using space in that queue.  So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full because the 
> other space is being used by application masters from other users.  
> For instance if you have a cluster with 1 queue, user limit is 100%, you have 
> multiple users submitting applications.  One very large application by user 1 
> starts up, runs most of its maps and starts running reducers. other users try 
> to start applications and get their application masters started but not 
> tasks.  The very large application then gets to the point where it has 
> consumed the rest of the cluster resources with all reduces.  But at this 
> point it needs to still finish a few maps.  The headroom being sent to this 
> application is only based on the user limit (which is 100% of the cluster 
> capacity) its using lets say 95% of the cluster for reduces and then other 5% 
> is being used by other users running application masters.  The MRAppMaster 
> thinks it still has 5% so it doesn't know that it should kill a reduce in 
> order to run a map.  
> This can happen in other scenarios also.  Generally in a large cluster with 
> multiple queues this shouldn't cause a hang forever but it could cause the 
> application to take much longer.

This message was sent by Atlassian JIRA

Reply via email to