[ 
https://issues.apache.org/jira/browse/YARN-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi updated YARN-10378:
--------------------------------
    Description: 
We are using placement constraints (anti-affinity) in an application along with 
a node label. The application requests two containers with anti-affinity on a 
node label that contains only two nodes.

So the two containers are allocated on the two nodes, one on each node, 
satisfying anti-affinity.
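
For reference, the request described above can be expressed roughly as below 
through the placement constraints API (a sketch assuming Hadoop 3.1+ and an 
already registered AMRMClient; the "hbase" allocation tag matches the logs 
further down, while the node label name, priority and container size are 
placeholder values):
{code:java}
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceSizing;
import org.apache.hadoop.yarn.api.records.SchedulingRequest;
import org.apache.hadoop.yarn.api.resource.PlacementConstraint;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.*;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.*;

static void requestAntiAffineContainers(AMRMClient<AMRMClient.ContainerRequest> amrmClient) {
  // Anti-affinity between "hbase"-tagged containers, restricted to the
  // node partition (label) the app runs on. "mylabel" is a placeholder.
  PlacementConstraint pc = build(and(
      targetIn(NODE, nodePartition("mylabel")),
      targetNotIn(NODE, allocationTag("hbase"))));

  SchedulingRequest req = SchedulingRequest.newBuilder()
      .allocationRequestId(1L)
      .priority(Priority.newInstance(1))
      .allocationTags(Collections.singleton("hbase"))
      .placementConstraintExpression(pc)
      // 2 containers; with only two nodes carrying the label, anti-affinity
      // forces one container per node.
      .resourceSizing(ResourceSizing.newInstance(2, Resource.newInstance(2048, 1)))
      .build();

  // The RM must have placement constraint handling enabled, e.g.
  // yarn.resourcemanager.placement-constraints.handler=scheduler.
  amrmClient.addSchedulingRequests(Collections.singletonList(req));
}
{code}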

When one NodeManager goes down for some time, the RM marks the node as LOST 
and kills all containers on that node.

The AM now has one pending container request, since the previous container was 
killed.

When the NodeManager comes back up after some time, the pending container is 
never allocated on that node again and the application waits forever for that 
container.

If the ResourceManager is restarted, the issue disappears and the container 
gets allocated on the NodeManager that recently came back up.

This seems to be an issue with the allocation tags not being removed.

The allocation tag is added for container 
container_e68_1595886973474_0005_01_000003:
{code:java}
2020-07-28 17:02:04,091 DEBUG constraint.AllocationTagsManager 
(AllocationTagsManager.java:addContainer(355)) - Added 
container=container_e68_1595886973474_0005_01_000003 with tags=[hbase]
{code}
However, the allocation tag is not removed when container 
container_e68_1595886973474_0005_01_000003 is released; there is no 
corresponding DEBUG message for tag removal. Since the tag stays on the node, 
the scheduler will not allocate on the same node due to anti-affinity, which 
results in the issue observed.
{code:java}
2020-07-28 17:19:34,353 DEBUG scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:updateCompletedContainers(1038)) - Container 
FINISHED: container_e68_1595886973474_0005_01_000003
2020-07-28 17:19:34,353 INFO  scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:completedContainer(669)) - Container 
container_e68_1595886973474_0005_01_000003 completed with event FINISHED, but 
corresponding RMContainer doesn't exist.
{code}
This seems to be caused by the change in YARN-8511, which removes the tags 
only after the NM confirms that the container has been released. In our 
scenario that confirmation never arrives, so the tag is never removed until an 
RM restart.
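
To make the ordering concrete, here is a toy, self-contained sketch of the 
suspected race. All class and method names below are made up for illustration; 
this is not the Hadoop code. The point is that if tag cleanup is deferred 
until the NM confirms the release (the YARN-8511 behaviour), and the RM has 
already forgotten the container when the node was LOST, the cleanup is skipped 
and the tag leaks:
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TagLeakDemo {
  // node -> allocation tags currently counted against it
  static Map<String, Set<String>> tagsPerNode = new HashMap<>();
  // containers the RM still knows about (stand-in for RMContainer state)
  static Set<String> liveRmContainers = new HashSet<>();

  static void allocate(String node, String container, String tag) {
    tagsPerNode.computeIfAbsent(node, n -> new HashSet<>()).add(tag);
    liveRmContainers.add(container);
  }

  // YARN-8511 style cleanup: only runs when the NM confirms the release,
  // and only if the RM still has the container.
  static void nmConfirmedRelease(String node, String container, String tag) {
    if (!liveRmContainers.remove(container)) {
      // "corresponding RMContainer doesn't exist" -> cleanup is skipped
      return;
    }
    tagsPerNode.get(node).remove(tag);
  }

  public static void main(String[] args) {
    allocate("node2", "container_000003", "hbase");

    // Node marked LOST: RM drops the container state, but the NM never
    // confirmed the release, so no tag cleanup has happened yet.
    liveRmContainers.remove("container_000003");

    // NM comes back and reports the container FINISHED; cleanup is skipped
    // because the container is already gone on the RM side.
    nmConfirmedRelease("node2", "container_000003", "hbase");

    // The stale tag still blocks anti-affine allocation on node2.
    System.out.println("tags on node2: " + tagsPerNode.get("node2")); // [hbase]
  }
}
{code}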

Reverting YARN-8511 fixes this particular issue and the tags get removed, but 
that is not a valid solution since the problem YARN-8511 solves is also real. 
We need a solution that fixes this issue without breaking YARN-8511.



> When NM goes down and comes back up, PC allocation tags are not removed for 
> completed containers
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10378
>                 URL: https://issues.apache.org/jira/browse/YARN-10378
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.2.0, 3.1.1
>            Reporter: Tarun Parimi
>            Assignee: Tarun Parimi
>            Priority: Major


