[jira] [Created] (YARN-10378) When NM goes down and comes back up, PC allocation tags are not removed for completed containers

Tarun Parimi (Jira) Wed, 29 Jul 2020 23:51:17 -0700

Tarun Parimi created YARN-10378:
-----------------------------------

             Summary: When NM goes down and comes back up, PC allocation tags 
are not removed for completed containers
                 Key: YARN-10378
                 URL: https://issues.apache.org/jira/browse/YARN-10378
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler
    Affects Versions: 3.1.1, 3.2.0
            Reporter: Tarun Parimi
            Assignee: Tarun Parimi



We are using placement constraints with anti-affinity in an application, along 
with a node label. The application requests two containers with anti-affinity 
on a node label that contains only two nodes.

So the two containers are allocated one per node, which satisfies the 
anti-affinity constraint.

When one NodeManager goes down for some time, the RM marks the node as lost 
and kills all containers on that node.

The AM will now have one pending container request, since the previous 
container got killed.

When the NodeManager comes back up after some time, the pending container is 
not allocated on that node again, and the application waits forever for that 
container.

If the ResourceManager is restarted, this issue disappears and the container 
gets allocated on the NodeManager which came back up recently.

This seems to be caused by the allocation tags not being removed.

The allocation tag is added for the container 
container_e68_1595886973474_0005_01_000003:
{code:java}
2020-07-28 17:02:04,091 DEBUG constraint.AllocationTagsManager 
(AllocationTagsManager.java:addContainer(355)) - Added 
container=container_e68_1595886973474_0005_01_000003 with tags=[hbase]
{code}
However, the allocation tag is not removed when the container 
container_e68_1595886973474_0005_01_000003 is released. No equivalent DEBUG 
message is logged for removing tags, which indicates that the tags are not 
being removed. While the tag remains, the scheduler will not allocate another 
container with that tag on the same node, which results in the issue observed.
{code:java}
2020-07-28 17:19:34,353 DEBUG scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:updateCompletedContainers(1038)) - Container 
FINISHED: container_e68_1595886973474_0005_01_000003
2020-07-28 17:19:34,353 INFO  scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:completedContainer(669)) - Container 
container_e68_1595886973474_0005_01_000003 completed with event FINISHED, but 
corresponding RMContainer doesn't exist.
{code}
The missing tag removal seems to be due to the changes made in YARN-8511. That 
change removes the tags only after the NM confirms that the container is 
released. However, in our scenario this does not happen, so the tag is never 
removed until the RM is restarted.

Reverting YARN-8511 fixes this particular issue, and the tags are removed. 
But this is not a valid solution, since the problem that YARN-8511 solves is 
also real. We need a solution that fixes this issue without breaking 
YARN-8511.
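
To make the effect of a leftover tag concrete, here is a minimal, 
self-contained sketch of the scenario described above. It is a toy model and 
not the actual AllocationTagsManager code: the class StaleTagSketch, its 
methods, and the node names node1 and node2 are all hypothetical, and the 
anti-affinity check is assumed to be a simple "no container with this tag on 
the node" rule.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of per-node allocation tags. This is NOT the YARN
 * AllocationTagsManager; every name in this class is hypothetical.
 */
public class StaleTagSketch {
    // node -> (tag -> number of containers on that node carrying the tag)
    private final Map<String, Map<String, Integer>> tagsByNode = new HashMap<>();

    /** Record a tag when a container is allocated on a node. */
    void addTag(String node, String tag) {
        tagsByNode.computeIfAbsent(node, n -> new HashMap<>())
                  .merge(tag, 1, Integer::sum);
    }

    /** Drop one occurrence of a tag when a container completion is processed. */
    void removeTag(String node, String tag) {
        Map<String, Integer> tags = tagsByNode.get(node);
        if (tags == null) {
            return;
        }
        // Decrement the count; returning null deletes the entry at zero.
        tags.computeIfPresent(tag, (t, count) -> count > 1 ? count - 1 : null);
    }

    /** Assumed anti-affinity rule: eligible only if no container has the tag. */
    boolean satisfiesAntiAffinity(String node, String tag) {
        return tagsByNode
            .getOrDefault(node, Collections.<String, Integer>emptyMap())
            .getOrDefault(tag, 0) == 0;
    }

    public static void main(String[] args) {
        StaleTagSketch tags = new StaleTagSketch();
        // Two containers tagged "hbase", one per node.
        tags.addTag("node1", "hbase");
        tags.addTag("node2", "hbase");

        // node2 is lost and its container is killed, but the tag stays behind,
        // so node2 is still rejected for the pending request.
        System.out.println(tags.satisfiesAntiAffinity("node2", "hbase")); // false

        // Once the stale tag is removed, node2 becomes eligible again.
        tags.removeTag("node2", "hbase");
        System.out.println(tags.satisfiesAntiAffinity("node2", "hbase")); // true
    }
}
```

In this model the only way to unblock node2 is to remove the tag or discard 
the whole map, which is consistent with the observation that the issue 
disappears after an RM restart.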



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
