[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002587#comment-15002587 ]
Jason Lowe commented on YARN-4344:
----------------------------------
Ah yes, the non-work-preserving NM restart case. The code assumes that an NM
registering without any active apps might be a non-work-preserving NM
reconnecting, so we need to explicitly remove the node and add it back so that
the scheduler releases any containers it was tracking on that node.
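A minimal sketch of that remove-then-add handling, assuming a toy scheduler
that tracks only memory in MB. ToyScheduler, onReconnect and the node and
container IDs are hypothetical stand-ins, not YARN APIs, and the
work-preserving path is deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of RM-side reconnect handling;
// none of these classes or methods are real YARN APIs.
class ReconnectSketch {
    static final class ToyScheduler {
        final Map<String, Integer> capacityMb = new HashMap<>();      // per-node total the scheduler tracks
        final Map<String, List<String>> containers = new HashMap<>(); // containers believed to run on each node
        int clusterMb = 0;                                            // aggregate cluster capacity

        void addNode(String nodeId, int mb) {
            capacityMb.put(nodeId, mb);
            containers.put(nodeId, new ArrayList<>());
            clusterMb += mb;
        }

        // Removing a node releases every container tracked on it and subtracts
        // the capacity the scheduler itself recorded for that node.
        List<String> removeNode(String nodeId) {
            clusterMb -= capacityMb.remove(nodeId);
            return containers.remove(nodeId);
        }

        void allocate(String nodeId, String containerId) {
            containers.get(nodeId).add(containerId);
        }
    }

    // An NM that re-registers with no active apps may have restarted without
    // preserving work, so the node is removed (releasing stale containers)
    // and then added back with the capability it reported on reconnect.
    static List<String> onReconnect(ToyScheduler s, String nodeId, int reportedMb,
                                    boolean hasActiveApps) {
        if (hasActiveApps) {
            return List.of(); // work-preserving path not modeled in this sketch
        }
        List<String> released = s.removeNode(nodeId);
        s.addNode(nodeId, reportedMb);
        return released;
    }

    public static void main(String[] args) {
        ToyScheduler s = new ToyScheduler();
        s.addNode("nm1", 8192);
        s.addNode("nm2", 8192);
        s.allocate("nm1", "container_01");
        List<String> released = onReconnect(s, "nm1", 4096, false);
        System.out.println(released + " " + s.clusterMb); // [container_01] 12288
    }
}
```

Because the removal subtracts the capacity the scheduler recorded itself, the
cluster total stays consistent even though the node comes back smaller.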
At first I thought YARN-3802 had an inherent race: it assumes that the node
event will be processed before the node's capability is updated. That does turn
out to be true for the CapacityScheduler, but I think that is a bug in the
CapacityScheduler rather than in YARN-3802. Note that the node update path
appears to have the same issue: RMNodeImpl updates the node's capability
_before_ sending the node-updated event to the scheduler. So how does that case
work? It works because the CapacityScheduler's node-update handling doesn't
look at the resource recorded in the RMNode passed in the event. Instead, it
looks up the scheduler node by RMNodeId and references the total capability
tracked there. It seems to me the bug here is that the scheduler relies
directly on the RMNode in the event, rather than on the SchedulerNode, to
handle the capability calculation.
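The sketch below deterministically replays the unlucky interleaving described
above: the RM-side node's capability changes before the scheduler handles the
remove/add for that node. RmNode, SchedNode, Scheduler and reconnect are
hypothetical stand-ins for RMNode, SchedulerNode and the CapacityScheduler, not
real YARN classes, and a single memory value in MB stands in for a full
resource.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the ordering problem; these are stand-ins, not the
// real RMNodeImpl / SchedulerNode / CapacityScheduler classes.
class CapabilityRaceSketch {
    // Shared, mutable RM-side view of the node (stands in for RMNode).
    static final class RmNode {
        final String nodeId;
        int totalMb;
        RmNode(String nodeId, int totalMb) { this.nodeId = nodeId; this.totalMb = totalMb; }
    }

    // Scheduler-private record of the node (stands in for SchedulerNode).
    static final class SchedNode {
        final int totalMb;
        SchedNode(int totalMb) { this.totalMb = totalMb; }
    }

    static final class Scheduler {
        final boolean trustEventNode; // true = read the RMNode from the event (the bug)
        final Map<String, SchedNode> nodes = new HashMap<>();
        int clusterMb = 0;

        Scheduler(boolean trustEventNode) { this.trustEventNode = trustEventNode; }

        void nodeAdded(RmNode n) {
            nodes.put(n.nodeId, new SchedNode(n.totalMb));
            clusterMb += n.totalMb;
        }

        void nodeRemoved(RmNode n) {
            SchedNode tracked = nodes.remove(n.nodeId);
            // Buggy: the RMNode's capability may already have been changed.
            // Fixed: use the capability recorded when the node was added.
            clusterMb -= trustEventNode ? n.totalMb : tracked.totalMb;
        }
    }

    // The RM-side node is updated *before* the scheduler processes the
    // remove/add events for it.
    static int reconnect(Scheduler s, RmNode n, int newMb) {
        n.totalMb = newMb; // capability changed first
        s.nodeRemoved(n);  // scheduler handles removal afterwards
        s.nodeAdded(n);
        return s.clusterMb;
    }

    public static void main(String[] args) {
        for (boolean buggy : new boolean[] {true, false}) {
            Scheduler s = new Scheduler(buggy);
            RmNode nm1 = new RmNode("nm1", 8192);
            s.nodeAdded(nm1);
            s.nodeAdded(new RmNode("nm2", 8192));
            System.out.println((buggy ? "buggy: " : "fixed: ") + reconnect(s, nm1, 4096));
        }
    }
}
```

With two 8192 MB nodes and nm1 shrinking to 4096 MB, the buggy variant prints
16384 (it subtracts the new 4096 instead of the old 8192, so the cluster total
stays inflated), while the variant that consults its own tracked node prints the
expected 12288.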
We probably should have limited a lot of these scheduler events to carrying
just the RMNodeId rather than the full RMNode, to avoid the temptation to
examine the RMNode directly when handling the event. As seen here, the RMNode
can be "moving" while the scheduler is trying to examine it.
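A minimal sketch of ID-only scheduler events, under the assumption that the
add event carries an immutable snapshot of the capability taken at dispatch
time (the comment does not say where the capability should come from). The
event and scheduler types are hypothetical, not YARN classes; the point is that
a removal handler given only an ID has nothing to consult except the
scheduler's own bookkeeping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: removal events carry only the node's ID, so the
// handler cannot peek at a mutable RMNode and must use its own tracked state.
class IdOnlyEventsSketch {
    record NodeAddedEvent(String nodeId, int totalMb) {} // immutable capability snapshot (assumption)
    record NodeRemovedEvent(String nodeId) {}             // ID only

    static final class Scheduler {
        private final Map<String, Integer> trackedMb = new HashMap<>();
        private int clusterMb = 0;

        void handle(NodeAddedEvent e) {
            trackedMb.put(e.nodeId(), e.totalMb());
            clusterMb += e.totalMb();
        }

        void handle(NodeRemovedEvent e) {
            Integer mb = trackedMb.remove(e.nodeId());
            if (mb != null) { // ignore removals for nodes we never tracked
                clusterMb -= mb;
            }
        }

        int clusterMb() { return clusterMb; }
    }

    public static void main(String[] args) {
        Scheduler s = new Scheduler();
        s.handle(new NodeAddedEvent("nm1", 8192));
        s.handle(new NodeAddedEvent("nm2", 8192));
        // nm1 reconnects with a smaller capability: remove, then re-add.
        s.handle(new NodeRemovedEvent("nm1"));
        s.handle(new NodeAddedEvent("nm1", 4096));
        System.out.println(s.clusterMb()); // 12288
    }
}
```

Using records keeps the events immutable, so nothing the event carries can
change between dispatch and handling.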
> NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
> ------------------------------------------------------------------------------------------
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.1, 2.6.2
> Reporter: Varun Vasudev
> Assignee: Varun Vasudev
> Priority: Critical
> Attachments: YARN-4344.001.patch
>
>
> After YARN-3802, if an NM reconnects to the RM with changed capabilities,
> situations can arise where the overall cluster resource calculation is
> incorrect, leading to inconsistencies in scheduling.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)