Rohit Agarwal created YARN-3415:

             Summary: Non-AM containers can be counted towards amResourceUsage 
of a fairscheduler queue
                 Key: YARN-3415
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 2.6.0
            Reporter: Rohit Agarwal

We encountered this problem while running a Spark cluster. The amResourceUsage 
for a queue became artificially high, and the cluster then deadlocked because 
the maxAMShare constraint kicked in and no new AM got admitted to the 

I have described the problem in detail here:

In summary: the condition for adding a container's memory towards 
amResourceUsage is fragile. It depends on the number of live containers 
belonging to the app. We saw that the Spark AM went down without explicitly 
releasing its requested containers, and then the memory of one of those 
containers was counted towards amResource.
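
To illustrate the failure mode described above, here is a minimal sketch in
Java. It is a toy model, not the actual FairScheduler code: the class names
(AmUsageSketch, Queue, App), the fixed memory cap standing in for maxAMShare,
and the assumption that AM usage is not released when the AM container exits
are all hypothetical simplifications chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the accounting problem; these are NOT
// the real Hadoop classes or method names.
class AmUsageSketch {

    /** Toy queue that tracks memory attributed to AMs, in MB. */
    static class Queue {
        // Assumption: a fixed cap standing in for maxAMShare * fair share.
        final int maxAmMemoryMb;
        int amResourceUsageMb = 0;

        Queue(int maxAmMemoryMb) {
            this.maxAmMemoryMb = maxAmMemoryMb;
        }

        /** Admission check: may a new AM of this size start? */
        boolean canRunAm(int amMemoryMb) {
            return amResourceUsageMb + amMemoryMb <= maxAmMemoryMb;
        }
    }

    /** Toy application holding the sizes of its live containers, in MB. */
    static class App {
        final Queue queue;
        final List<Integer> liveContainersMb = new ArrayList<>();

        App(Queue queue) {
            this.queue = queue;
        }

        void allocate(int memoryMb) {
            liveContainersMb.add(memoryMb);
            // Fragile condition: the first live container is assumed
            // to be the AM, whatever it actually is.
            if (liveContainersMb.size() == 1) {
                queue.amResourceUsageMb += memoryMb;
            }
        }

        /**
         * The AM goes down but its outstanding requests stay pending.
         * Assumption: AM usage is not released at this point in the model.
         */
        void amExitsWithoutReleasingRequests() {
            liveContainersMb.clear();
        }
    }

    public static void main(String[] args) {
        Queue queue = new Queue(4096);
        App app = new App(queue);

        app.allocate(1024);                    // the real AM container
        System.out.println("AM usage after AM start: " + queue.amResourceUsageMb);

        app.amExitsWithoutReleasingRequests(); // live container count drops to 0
        app.allocate(8192);                    // a pending non-AM request is served
        System.out.println("AM usage after non-AM allocation: " + queue.amResourceUsageMb);
        System.out.println("Can admit a 1024 MB AM: " + queue.canRunAm(1024));
    }
}
```

In this sketch the 8192 MB non-AM container is the only live container when
it is allocated, so it is added to the queue's AM usage (1024 + 8192 = 9216
MB), which exceeds the 4096 MB cap and blocks any further AM from being
admitted.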

cc - [~sandyr]
