Rohit Agarwal created YARN-3415:
-----------------------------------

             Summary: Non-AM containers can be counted towards amResourceUsage 
of a fairscheduler queue
                 Key: YARN-3415
                 URL: https://issues.apache.org/jira/browse/YARN-3415
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 2.6.0
            Reporter: Rohit Agarwal


We encountered this problem while running a Spark cluster. The amResourceUsage 
for a queue became artificially high, and the cluster then deadlocked because 
the maxAMShare constraint kicked in and no new AM could be admitted to the 
cluster.

I have described the problem in detail here: 
https://github.com/apache/spark/pull/5233#issuecomment-87160289

In summary: the condition for adding a container's memory towards 
amResourceUsage is fragile. It depends on the number of live containers 
belonging to the app. We saw that the Spark AM went down without explicitly 
releasing its requested containers, and then the memory of one of those 
containers was counted towards amResourceUsage.
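
To make the failure mode concrete, here is a minimal sketch of a check that 
depends only on the live-container count. It is a simplified model, not the 
actual FairScheduler source: the class name, the method names and the container 
sizes (1024 MB and 8192 MB) are all made up for illustration, and it does not 
model how AM usage is released.

```java
// Simplified model of the fragile accounting; NOT the real FairScheduler code.
// All names and memory sizes here are hypothetical.
class AmUsageSketch {
    int liveContainers = 0;     // live containers of one app attempt
    int amResourceUsageMb = 0;  // memory the queue attributes to AMs

    // Fragile rule: a container is treated as the AM whenever it is
    // the only live container of the app after allocation.
    void allocate(int memoryMb) {
        liveContainers++;
        if (liveContainers == 1) {
            amResourceUsageMb += memoryMb;
        }
    }

    // A container finished; only the live count changes in this model.
    void complete() {
        liveContainers--;
    }

    // Replays the sequence from the report with made-up sizes.
    static int replayIncident() {
        AmUsageSketch app = new AmUsageSketch();
        app.allocate(1024);  // the real AM container, charged correctly
        app.complete();      // AM goes down; its container requests stay pending
        app.allocate(8192);  // a pending non-AM request is granted and charged as an AM
        return app.amResourceUsageMb;
    }

    public static void main(String[] args) {
        System.out.println(replayIncident());
    }
}
```

In this model the second allocation is again the only live container, so its 
memory is added to the AM usage even though it is not an AM.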

cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
