[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029396#comment-17029396 ]

Mikayla Konst commented on YARN-9011:
-------------------------------------

We experienced this exact same race condition recently (the resource manager 
sending a SHUTDOWN signal to the node manager because it received a heartbeat 
from the node manager *after* the HostDetails reference was updated, but 
*before* the node was transitioned to state DECOMMISSIONING).

I think this patch is a huge improvement over the previous behavior, but I 
think there is still a narrow race that can happen when refresh nodes is called 
multiple times in quick succession with the same set of nodes in the exclude 
file (a minimal sketch of the interleaving follows the list):
 # the lazy-loaded HostDetails reference is updated
 # the nodes are added to the gracefullyDecommissionableNodes set
 # the current HostDetails reference is updated
 # an event to update the node status to DECOMMISSIONING is added to the 
asynchronous event handler's queue, but hasn't been processed yet
 # refresh nodes is called a second time
 # the lazy-loaded HostDetails reference is updated
 # the gracefullyDecommissionableNodes set is cleared
 # the node manager heartbeats to the resource manager. It is not in state 
DECOMMISSIONING and not in the gracefullyDecommissionableNodes set, but it is 
an excluded node in the HostDetails, so it is sent a SHUTDOWN signal
 # the node is added back to the gracefullyDecommissionableNodes set
 # the event handler transitions the node to state DECOMMISSIONING at some 
point
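
Concretely, the hazard is the clear-then-repopulate pattern on a shared set. 
Here is a minimal, self-contained Java illustration of steps 7 and 9 racing 
with the heartbeat in step 8; the class and host names are made up for the 
example and are not the real NodesListManager/ResourceTrackerService code:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: the clear-then-add pattern from the list above.
// "host-1" stands in for the excluded node.
public class ClearThenAddRace {
    // Shared mutable set, analogous to gracefullyDecommissionableNodes.
    static final Set<String> decommissionable = ConcurrentHashMap.newKeySet();

    public static void main(String[] args) throws InterruptedException {
        decommissionable.add("host-1");      // first refresh nodes completed

        Thread secondRefresh = new Thread(() -> {
            decommissionable.clear();        // step 7: set is cleared
            // <-- a heartbeat arriving here misses "host-1"
            decommissionable.add("host-1");  // step 9: re-added too late
        });

        Thread heartbeat = new Thread(() -> {
            // Step 8: the node is excluded but not yet DECOMMISSIONING, so
            // set membership is the only thing preventing a SHUTDOWN signal.
            if (!decommissionable.contains("host-1")) {
                System.out.println("would send SHUTDOWN to host-1");
            }
        });

        secondRefresh.start();
        heartbeat.start();
        secondRefresh.join();
        heartbeat.join();
    }
}
{code}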

This would be fixed if you used an AtomicReference for your set of 
"gracefullyDecommissionableNodes" and swapped out the reference instead of 
clearing and repopulating the existing set, similar to how you handled the 
HostDetails.
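
A minimal sketch of that approach, with placeholder names rather than the 
actual NodesListManager fields:

{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the swap-the-reference idea; names are hypothetical.
public class DecommissionableNodes {
    private final AtomicReference<Set<String>> nodes =
        new AtomicReference<>(Collections.emptySet());

    // Called from refresh nodes: build the complete new set off to the
    // side, then publish it in one atomic step.
    public void replaceWith(Set<String> excludedHosts) {
        nodes.set(Collections.unmodifiableSet(new HashSet<>(excludedHosts)));
    }

    // Called from the heartbeat path.
    public boolean isGracefullyDecommissionable(String host) {
        return nodes.get().contains(host);
    }
}
{code}

Readers on the heartbeat path then never observe an empty or half-built set; 
they see either the pre-refresh set or the post-refresh set.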

Alternatively, instead of using an asynchronous event handler to update the 
state of the nodes to DECOMMISSIONING, you could update the state 
synchronously: grab a lock, update HostDetails, synchronously update the 
states of the nodes being gracefully decommissioned, then release the lock. 
When the resource tracker service receives a heartbeat and needs to decide 
whether a node should be shut down (i.e., whether it is an excluded node and 
whether it is in state DECOMMISSIONING), it would grab the same lock right 
before doing the check. Having the resource tracker service wait on a lock 
doesn't sound great, but the wait would likely be on the order of 
milliseconds, and only when refresh nodes is called.
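
A rough sketch of that synchronous alternative, again with placeholder names 
and types rather than the real RM classes:

{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names are the actual RM APIs.
public class SynchronousRefresh {
    private final ReentrantLock refreshLock = new ReentrantLock();
    private Set<String> excluded = new HashSet<>();
    private final Set<String> decommissioning = new HashSet<>();

    // refresh nodes: update the host list and the node states under one
    // lock, so no heartbeat can observe the intermediate state.
    public void refreshNodes(Set<String> newExcluded) {
        refreshLock.lock();
        try {
            excluded = new HashSet<>(newExcluded);
            for (String host : excluded) {
                decommissioning.add(host); // synchronous state transition
            }
        } finally {
            refreshLock.unlock();
        }
    }

    // Heartbeat path: take the same lock just for the check. The wait is
    // bounded by the refresh above, typically milliseconds.
    public boolean shouldShutDown(String host) {
        refreshLock.lock();
        try {
            return excluded.contains(host) && !decommissioning.contains(host);
        } finally {
            refreshLock.unlock();
        }
    }
}
{code}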

> Race condition during decommissioning
> -------------------------------------
>
>                 Key: YARN-9011
>                 URL: https://issues.apache.org/jira/browse/YARN-9011
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.1
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>             Fix For: 3.3.0, 3.2.2, 3.1.4
>
>         Attachments: YARN-9011-001.patch, YARN-9011-002.patch, 
> YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, 
> YARN-9011-006.patch, YARN-9011-007.patch, YARN-9011-008.patch, 
> YARN-9011-009.patch, YARN-9011-branch-3.1.001.patch, 
> YARN-9011-branch-3.2.001.patch
>
>
> During internal testing, we found a nasty race condition which occurs during 
> decommissioning.
> Node manager, incorrect behaviour:
> {noformat}
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 
> hostname:node-6.hostname.com
> {noformat}
> Node manager, expected behaviour:
> {noformat}
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: DECOMMISSIONING  node-6.hostname.com:8041 is ready to be 
> decommissioned
> {noformat}
> Note the two different messages from the RM ("Disallowed NodeManager" vs 
> "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an 
> inconsistent state of nodes while they're being updated:
> {noformat}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader 
> include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219}
>  exclude:{node-6.hostname.com}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully 
> decommission node node-6.hostname.com:8041 with state RUNNING
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: 
> node-6.hostname.com
> 2018-06-18 21:00:17,576 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node 
> node-6.hostname.com:8041 in DECOMMISSIONING.
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn     
> IP=172.26.22.115        OPERATION=refreshNodes  TARGET=AdminService     
> RESULT=SUCCESS
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve 
> original total capability: <memory:8192, vCores:8>
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING
> {noformat}
> When the decommissioning succeeds, there is no output logged from 
> {{ResourceTrackerService}}.


