[ 
https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002587#comment-15002587
 ] 

Jason Lowe commented on YARN-4344:
----------------------------------

Ah yes, the non-work-preserving NM restart case.  The code is assuming that an 
NM registering without any active apps might be a non-work-preserving NM 
reconnecting, so we need to explicitly remove the node and add it back in so 
the scheduler will release any containers that were being tracked on that node.

At first I thought YARN-3802 had an inherent race in it where it assumes that 
the node event will be processed before the capability is updated.  That turns 
out to be true for the CapacityScheduler, but I think that's a bug in the 
CapacityScheduler.  Note that the node update path appears to have the same 
issue -- RMNodeImpl updates the node's capability _before_ sending the 
scheduler node updated event.  So how can it work in that case?  It works 
because, for a node update, the CapacityScheduler isn't looking at the resource 
in the RMNode passed in the event.  Instead it's looking up the scheduler node 
based on the RMNodeId and then referencing the total capability tracked there.  
Seems to me the bug here is that, in the reconnect case, the scheduler is 
relying on the RMNode in the event directly rather than the SchedulerNode to 
handle the capability calculation.  
We probably should have limited a lot of these scheduler events to just having 
RMNodeId rather than the full RMNode to avoid the temptation to directly 
examine the RMNode when handling the event.  As seen here, the RMNode can be 
"moving" while the scheduler is trying to examine it.

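To make the race concrete, here is a minimal Java sketch.  Node and Scheduler 
are hypothetical, heavily simplified stand-ins for RMNode and the scheduler's 
own per-node bookkeeping, not the actual Hadoop classes, and the memory values 
are made up.  It only illustrates why subtracting the capability read from the 
"moving" node object can leave the cluster total wrong, while subtracting the 
value the scheduler recorded itself does not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types are real Hadoop classes.
class CapabilitySketch {

    /** Stand-in for RMNode: mutable, and changed by the RM outside the scheduler. */
    static final class Node {
        final String id;
        int memoryMb;

        Node(String id, int memoryMb) {
            this.id = id;
            this.memoryMb = memoryMb;
        }
    }

    /** Stand-in for a scheduler that keeps its own record of each node's capability. */
    static final class Scheduler {
        private final Map<String, Integer> trackedMb = new HashMap<>();
        private int clusterMb;

        void addNode(Node node) {
            trackedMb.put(node.id, node.memoryMb);
            clusterMb += node.memoryMb;
        }

        // Problematic: subtracts whatever the live node object reports right now.
        void removeNodeUsingEventNode(Node node) {
            trackedMb.remove(node.id);
            clusterMb -= node.memoryMb;
        }

        // Safer: subtracts the capability the scheduler recorded when it added the node.
        void removeNodeUsingTracked(String nodeId) {
            Integer recorded = trackedMb.remove(nodeId);
            if (recorded != null) {
                clusterMb -= recorded;
            }
        }

        int clusterMb() {
            return clusterMb;
        }
    }

    /** Simulates a reconnect where the capability changes before the remove is handled. */
    static int reconnect(boolean useTracked) {
        Node node = new Node("nm1", 8192);
        Scheduler scheduler = new Scheduler();
        scheduler.addNode(node);

        // The RM side updates the capability before the scheduler sees the events.
        node.memoryMb = 4096;

        if (useTracked) {
            scheduler.removeNodeUsingTracked(node.id);
        } else {
            scheduler.removeNodeUsingEventNode(node);
        }
        scheduler.addNode(node);
        return scheduler.clusterMb();
    }

    public static void main(String[] args) {
        System.out.println("event node: " + reconnect(false)); // 8192, but the node now has 4096
        System.out.println("tracked:    " + reconnect(true));  // 4096
    }
}
```

In this sketch the only node ends up with 4096 MB, yet the first variant 
reports 8192 MB for the cluster because it subtracted the new value instead of 
the one it had originally added.  Passing only an id in the remove call, as the 
second variant does, takes away the option of reading the live node at all.
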
> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4344
>                 URL: https://issues.apache.org/jira/browse/YARN-4344
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1, 2.6.2
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>            Priority: Critical
>         Attachments: YARN-4344.001.patch
>
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities, 
> the overall cluster resource calculation can be incorrect, leading to 
> inconsistencies in scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
