[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961935#comment-16961935 ]

Zhankun Tang edited comment on YARN-9011 at 10/29/19 12:15 PM:
---------------------------------------------------------------

[~pbacsko], thanks for the explanation. After the offline sync-up, this "lazyLoaded" approach seems like a good way to go without locking the hostDetails. +1 from me. Thoughts? [~bibinchundatt]?

was (Author: tangzhankun):
[~pbacsko], thanks for the explanation. After the offline sync-up, this seems like a good way to go without locking the hostDetails. +1 from me. Thoughts? [~bibinchundatt]?

> Race condition during decommissioning
> -------------------------------------
>
>                 Key: YARN-9011
>                 URL: https://issues.apache.org/jira/browse/YARN-9011
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.1
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-9011-001.patch, YARN-9011-002.patch, YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, YARN-9011-006.patch, YARN-9011-007.patch
>
> During internal testing, we found a nasty race condition which occurs during decommissioning.
>
> Node manager, incorrect behaviour:
> {noformat}
> 2018-06-18 21:00:17,634 WARN org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting down.
> 2018-06-18 21:00:17,634 WARN org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 hostname:node-6.hostname.com
> {noformat}
> Node manager, expected behaviour:
> {noformat}
> 2018-06-18 21:07:37,377 WARN org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting down.
> 2018-06-18 21:07:37,377 WARN org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from ResourceManager: DECOMMISSIONING node-6.hostname.com:8041 is ready to be decommissioned
> {noformat}
> Note the two different messages from the RM ("Disallowed NodeManager" vs. "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an inconsistent state of nodes while they're being updated:
> {noformat}
> 2018-06-18 21:00:17,575 INFO org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219} exclude:{node-6.hostname.com}
> 2018-06-18 21:00:17,575 INFO org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully decommission node node-6.hostname.com:8041 with state RUNNING
> 2018-06-18 21:00:17,575 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: node-6.hostname.com
> 2018-06-18 21:00:17,576 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node node-6.hostname.com:8041 in DECOMMISSIONING.
> 2018-06-18 21:00:17,575 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn IP=172.26.22.115 OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS
> 2018-06-18 21:00:17,577 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve original total capability:
> 2018-06-18 21:00:17,577 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING
> {noformat}
> When the decommissioning succeeds, there is no output logged from {{ResourceTrackerService}}.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
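The interleaving visible in the logs above can be condensed into a small toy simulation. The class and member names below are illustrative stand-ins, not the actual YARN code; the point is only the ordering bug: the exclude list becomes visible to the heartbeat check before the node's asynchronous state transition to DECOMMISSIONING, so the check concludes "disallowed" instead of "decommissioning".

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the race window (hypothetical names, not YARN classes):
// refreshNodes() publishes the exclude list immediately, while the node
// state machine transitions to DECOMMISSIONING only later.
public class DecomRaceSketch {
    enum NodeState { RUNNING, DECOMMISSIONING }

    static Set<String> excluded = ConcurrentHashMap.newKeySet();
    static volatile NodeState state = NodeState.RUNNING;

    // Heartbeat-side check, mirroring the shape of the check in
    // ResourceTrackerService: excluded AND not decommissioning => shutdown.
    static String heartbeat(String host) {
        boolean valid = !excluded.contains(host);
        boolean decommissioning = (state == NodeState.DECOMMISSIONING);
        if (!valid && !decommissioning) {
            return "SHUTDOWN"; // the wrong outcome inside the race window
        }
        return "OK";
    }

    public static void main(String[] args) {
        // refreshNodes(): the exclude list is updated first...
        excluded.add("node-6.hostname.com");
        // ...and a heartbeat arrives before the async state transition:
        System.out.println(heartbeat("node-6.hostname.com")); // SHUTDOWN
        // Only later does the state transition complete:
        state = NodeState.DECOMMISSIONING;
        System.out.println(heartbeat("node-6.hostname.com")); // OK
    }
}
```

Shrinking the window is not enough; as the thread discusses, the heartbeat path needs some signal that the host is being gracefully decommissioned before the state transition lands.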
[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961621#comment-16961621 ]

Zhankun Tang edited comment on YARN-9011 at 10/29/19 11:54 AM:
---------------------------------------------------------------

[~pbacsko], thanks for the new patch. The idea looks good to me. Several comments:
1. Why do we need a "lazyLoaded"? I don't see any "hostDetails" difference between "getLazyLoadedHostDetails" and "getHostDetails".
2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in the method "isNodeInDecommissioning"? Because the "gracefulDecommissionableNodes" set will only be cleared after the refresh operation, it will always be scanned on every heartbeat, which seems unnecessary.

was (Author: tangzhankun):
[~pbacsko], thanks for the new patch. The idea looks good to me. Several comments:
1. Why do we need a lazy update? I don't see any "hostDetails" difference between "getLazyLoadedHostDetails" and "getHostDetails".
2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in the method "isNodeInDecommissioning"? Because the "gracefulDecommissionableNodes" set will only be cleared after the refresh operation, it will always be scanned on every heartbeat, which seems unnecessary.
[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961879#comment-16961879 ]

Peter Bacsko edited comment on YARN-9011 at 10/29/19 11:10 AM:
---------------------------------------------------------------

_"1. Why do we need a lazy update?"_

Please see the details in my comment above that I posted on 25th Sep: https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696

It is important that when you do a "lazy" refresh, you do not make the new changes visible to {{ResourceTrackerService}}. The problematic part of the code is this:
{noformat}
// 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is
// in decommissioning.
if (!this.nodesListManager.isValidNode(nodeId.getHost())
    && !isNodeInDecommissioning(nodeId)) {
  ...
{noformat}
If you perform a graceful decom, it is important that {{isNodeInDecommissioning()}} return true. However, it takes time for {{RMNodeImpl}} to go into the {{DECOMMISSIONING}} state; that's why this code is not fully reliable. Therefore, {{isValidNode()}} should only return false once we have already constructed the set of nodes that we want to decommission.

_"2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?"_

-No, we can't (well, we can, but it would be pointless). Decommissioning status only occurs when you refresh (reload) the exclusion/inclusion files; that is, we need to call {{NodesListManager.refreshNodes()}}. And that is the problem: during refresh, excludable nodes become visible almost immediately, but the fact that they're decommissionable does not.- I misunderstood this question. It's doable, see my comment below.

_"3. So it will always be scanned when heartbeat which seems not necessary."_

Scanning is necessary to avoid the race condition, but it isn't really a problem, for three reasons:
1. It happens only for those nodes which are excluded ({{isValid()}} is false).
2. We look up the node in a ConcurrentHashMap, which should be really fast.
3. Once {{RMNode}} reaches the {{DECOMMISSIONING}} state (which should happen pretty quickly from {{RUNNING}}), we no longer need the set.

*Edit*: even though it's not a huge problem, I agree that it can be enhanced; again, see below. I can imagine a small enhancement here: once the node has reached the {{DECOMMISSIONING}} state, we remove it from the set, making it smaller and smaller.

was (Author: pbacsko):
_"1. Why do we need a lazy update?"_

Please see the details in my comment above that I posted on 25th Sep: https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696

It is important that when you do a "lazy" refresh, you do not make the new changes visible to {{ResourceTrackerService}}. The problematic part of the code is this:
{noformat}
// 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is
// in decommissioning.
if (!this.nodesListManager.isValidNode(nodeId.getHost())
    && !isNodeInDecommissioning(nodeId)) {
  ...
{noformat}
If you perform a graceful decom, it is important that {{isNodeInDecommissioning()}} return true. However, it takes time for {{RMNodeImpl}} to go into the {{DECOMMISSIONING}} state; that's why this code is not fully reliable. Therefore, {{isValidNode()}} should only return false once we have already constructed the set of nodes that we want to decommission.

_"2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?"_

No, we can't (well, we can, but it would be pointless). Decommissioning status only occurs when you refresh (reload) the exclusion/inclusion files; that is, we need to call {{NodesListManager.refreshNodes()}}. And that is the problem: during refresh, excludable nodes become visible almost immediately, but the fact that they're decommissionable does not.

_"3. So it will always be scanned when heartbeat which seems not necessary."_

Scanning is necessary to avoid the race condition, but it isn't really a problem, for three reasons:
1. It happens only for those nodes which are excluded ({{isValid()}} is false).
2. We look up the node in a ConcurrentHashMap, which should be really fast.
3. Once {{RMNode}} reaches the {{DECOMMISSIONING}} state (which should happen pretty quickly from {{RUNNING}}), we no longer need the set.

I can imagine a small enhancement here: once the node has reached the {{DECOMMISSIONING}} state, we remove it from the set, making it smaller and smaller.
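The mechanism discussed in this thread (a concurrent set of gracefully decommissionable hosts consulted on the heartbeat path, shrunk once a node actually reaches DECOMMISSIONING) can be sketched roughly as below. The class and method names are illustrative, not the actual YARN patch:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the set-based idea (hypothetical names): the refresh path
// records hosts being gracefully decommissioned before the new exclude
// list becomes visible, so the heartbeat path can treat "excluded but in
// the set" as decommissioning even while the node still reports RUNNING.
public class GracefulDecomSet {
    enum NodeState { RUNNING, DECOMMISSIONING }

    private final Set<String> gracefulDecommissionableNodes =
        ConcurrentHashMap.newKeySet();

    // Called from the refresh path while the new host details are built.
    void markGracefullyDecommissionable(String host) {
        gracefulDecommissionableNodes.add(host);
    }

    // Heartbeat-side check: a RUNNING node in the set counts as
    // decommissioning; once it really reaches DECOMMISSIONING we drop it
    // from the set (the shrinking-set enhancement mentioned above).
    boolean isNodeInDecommissioning(String host, NodeState state) {
        if (state == NodeState.DECOMMISSIONING) {
            gracefulDecommissionableNodes.remove(host);
            return true;
        }
        return state == NodeState.RUNNING
            && gracefulDecommissionableNodes.contains(host);
    }
}
```

The set lookup is cheap (a ConcurrentHashMap-backed set), and it is only consulted for nodes that already failed the exclude-list check, which matches the cost argument made in the comment above.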
[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961897#comment-16961897 ]

Peter Bacsko edited comment on YARN-9011 at 10/29/19 11:03 AM:
---------------------------------------------------------------

OK, actually we always call the {{isGracefullyDecommissionableNode()}} method inside {{isNodeInDecommissioning()}}. We just have to slightly rearrange the order of the calls, like:
{noformat}
private boolean isNodeInDecommissioning(NodeId nodeId) {
  RMNode rmNode = this.rmContext.getRMNodes().get(nodeId);

  // State OK - early return.
  if (rmNode != null && rmNode.getState() == NodeState.DECOMMISSIONING) {
    return true;
  }

  // Graceful decom: wait until the node moves out of the RUNNING state.
  if (rmNode != null
      && this.nodesListManager.isGracefullyDecommissionableNode(rmNode)) {
    NodeState currentState = rmNode.getState();
    if (currentState == NodeState.RUNNING) {
      return true;
    }
  }

  return false;
}
{noformat}
This avoids the unnecessary invocation of {{nodesListManager.isGracefullyDecommissionableNode()}}.

was (Author: pbacsko):
OK, actually we always call the {{isGracefullyDecommissionableNode()}} method inside {{isNodeInDecommissioning()}}. We just have to slightly rearrange the order of the calls, like:
{noformat}
private boolean isNodeInDecommissioning(NodeId nodeId) {
  RMNode rmNode = this.rmContext.getRMNodes().get(nodeId);

  // State OK - early return.
  if (rmNode != null && rmNode.getState() == NodeState.DECOMMISSIONING) {
    return true;
  }

  // Graceful decom: wait until the node moves out of the RUNNING state.
  if (rmNode != null
      && this.nodesListManager.isGracefullyDecommissionableNode(rmNode)) {
    NodeState currentState = rmNode.getState();
    if (currentState == NodeState.RUNNING) {
      return true;
    }
  }

  return false;
}
{noformat}
This avoids the unnecessary invocation of {{nodesListManager.isGracefullyDecommissionableNode()}}.
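The ordering in the rearranged check can be exercised in isolation. Below, the RM context and node-list manager are replaced by plain parameters purely for illustration; only the decision logic is taken from the snippet above:

```java
import java.util.Set;

// Standalone rendering of the rearranged check: the cheap state comparison
// runs first, so the gracefulDecommissionableNodes lookup only happens for
// nodes that are still RUNNING.
public class DecomCheckOrder {
    enum NodeState { NEW, RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    static boolean isNodeInDecommissioning(NodeState state,
            Set<String> gracefullyDecommissionable, String host) {
        if (state == null) {
            return false; // unknown node
        }
        if (state == NodeState.DECOMMISSIONING) {
            return true;  // early return: no set lookup needed
        }
        // Graceful decom: a RUNNING node marked decommissionable counts too.
        return state == NodeState.RUNNING
            && gracefullyDecommissionable.contains(host);
    }
}
```

For any state other than RUNNING, the set is never consulted, which is exactly the saving the comment above describes.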
[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961879#comment-16961879 ] Peter Bacsko edited comment on YARN-9011 at 10/29/19 10:48 AM: --- _"1. Why do we need a lazy update?"_ Please see details in my comment above that I posted on 25th Sep: https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696 It is important that when you do a "lazy" refresh, you should not make the new changes visible to {{ResourceTrackerService}}. The problematic part of the code is this: {noformat} // 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is // in decommissioning. if (!this.nodesListManager.isValidNode(nodeId.getHost()) && !isNodeInDecommissioning(nodeId)) { ... {noformat} If you perform a graceful decom, it is important that {{isNodeInDecommissioning()}} return true. However, it takes time for {{RMAppImpl}} to go into {{DECOMMISSIONING}} state, that's why this code is not fully reliable. Therefore, {{isValidNode()}} should only return false when we already constructed a set of nodes that we want to decommission. _2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_ No, we can't (well, we can, but it would be pointless). Decomissioning status only occurs when you refresh (reload) the exclusion/inclusion files. That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the problem - during refresh, excludeable nodes become visible almost immediately, but not the fact that they're decomissionable. _3. So it will always be scanned when heartbeat which seems not necessary._ Scanning is necessary to avoid the race condition, but this isn't really a problem because of three things: 1. It happens only for those nodes which are excluded ({{isValid()}} is false) 2. 
We lookup inside a ConcurrentHashMap, which should be really fast 3. Once {{RMNode}} reaches {{DECOMMISSIONING}} state (which should happen pretty quickly from {{RUNNING}}), we no longer need the set. I can imagine a small enhancement here: once the node reached {{DECOMISSIONING}} state, we remove it from the set, making it smaller and smaller. was (Author: pbacsko): _"1. Why do we need a lazy update?"_ Please see details in my comment above that I posted on 25th Sep: https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696 It is important that when you do a "lazy" refresh, you should not make the new changes visible to {{ResourceTrackerService}}. The problematic part of the code is this: {noformat} // 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is // in decommissioning. if (!this.nodesListManager.isValidNode(nodeId.getHost()) && !isNodeInDecommissioning(nodeId)) { ... {noformat} If you perform a graceful decom, it is important that {{isNodeInDecommissioning()}} return true. However, it takes time for {{RMAppImpl}} to go into {{DECOMMISSIONING}} state, that's why this code is not fully reliable. Therefore, {{isValidNode()}} should only return false when we already constructed a set of nodes that we want to decommission. _2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_ No, we can't (well, we can, but it would be pointless). Decomissioning status only occurs when you refresh (reload) the exclusion/inclusion files. That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the problem - during refresh, excludeable nodes become visible almost immediately, but not the fact that they're decomissionable. _3. 
So it will always be scanned when heartbeat which seems not necessary._ Scanning is necessary to avoid the race condition, but this isn't really a problem because of two things: 1. It happens only for those nodes which are excluded ({{isValid()}} is false) 2. We lookup inside a ConcurrentHashMap, which should be really fast 3. Once {{RMNode}} reaches {{DECOMMISSIONING}} state (which should happen pretty quickly from {{RUNNING}}), we no longer need the set. I can imagine a small enhancement here: once the node reached {{DECOMISSIONING}} state, we remove it from the set, making it smaller and smaller. > Race condition during decommissioning > - > > Key: YARN-9011 > URL: https://issues.apache.org/jira/browse/YARN-9011 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.1.1 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major >
[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961879#comment-16961879 ] Peter Bacsko edited comment on YARN-9011 at 10/29/19 10:48 AM: --- _"1. Why do we need a lazy update?"_ Please see details in my comment above that I posted on 25th Sep: https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696 It is important that when you do a "lazy" refresh, you should not make the new changes visible to {{ResourceTrackerService}}. The problematic part of the code is this: {noformat} // 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is // in decommissioning. if (!this.nodesListManager.isValidNode(nodeId.getHost()) && !isNodeInDecommissioning(nodeId)) { ... {noformat} If you perform a graceful decom, it is important that {{isNodeInDecommissioning()}} return true. However, it takes time for {{RMAppImpl}} to go into {{DECOMMISSIONING}} state, that's why this code is not fully reliable. Therefore, {{isValidNode()}} should only return false when we already constructed a set of nodes that we want to decommission. _2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_ No, we can't (well, we can, but it would be pointless). Decomissioning status only occurs when you refresh (reload) the exclusion/inclusion files. That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the problem - during refresh, excludeable nodes become visible almost immediately, but not the fact that they're decomissionable. _3. So it will always be scanned when heartbeat which seems not necessary._ Scanning is necessary to avoid the race condition, but this isn't really a problem because of two things: 1. It happens only for those nodes which are excluded ({{isValid()}} is false) 2. 
We lookup inside a ConcurrentHashMap, which should be really fast 3. Once {{RMNode}} reaches {{DECOMMISSIONING}} state (which should happen pretty quickly from {{RUNNING}}), we no longer need the set. I can imagine a small enhancement here: once the node reached {{DECOMISSIONING}} state, we remove it from the set, making it smaller and smaller. was (Author: pbacsko): _"1. Why do we need a lazy update?"_ Please see details in my comment above that I posted on 25th Sep: https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696 It is important that when you do a "lazy" refresh, you should not make the new changes visible to {{ResourceTrackerService}}. The problematic part of the code is this: {noformat} // 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is // in decommissioning. if (!this.nodesListManager.isValidNode(nodeId.getHost()) && !isNodeInDecommissioning(nodeId)) { ... {noformat} If you perform a graceful decom, it is important that {{isNodeInDecommissioning()}} return true. However, it takes time for {{RMAppImpl}} to go into {{DECOMMISSIONING}} state, that's why this code is not fully reliable. Therefore, {{isValidNode()}} should only return false when we already constructed a set of nodes that we want to decommission. _2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_ No, we can't (well, we can, but it would be pointless). Decomissioning status only occurs when you refresh (reload) the exclusion/inclusion files. That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the problem - during refresh, excludeable nodes become visible almost immediately, but not the fact that they're decomissionable. _3. 
So it will always be scanned when heartbeat which seems not necessary._ Scanning is necessary to avoid the race condition, but this isn't really a problem because of two things: 1. It happens only for those nodes which are excluded ({{isValid()}} is false) 2. We lookup inside a ConcurrentHashMap, which should be really fast I can imagine an enhancement here: once the node reached {{DECOMISSIONING}} state, we remove it from the set, making it smaller and smaller. > Race condition during decommissioning > - > > Key: YARN-9011 > URL: https://issues.apache.org/jira/browse/YARN-9011 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.1.1 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9011-001.patch, YARN-9011-002.patch, > YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, >
[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961621#comment-16961621 ] Zhankun Tang edited comment on YARN-9011 at 10/29/19 2:49 AM: -- [~pbacsko], Thanks for the new patch. The idea looks good to me. Several comments: 1. Why do we need a lazy update? I don't see "hostDetails" differences between "getLazyLoadedHostDetails" and "getHostDetails". 2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"? Because The "gracefulDecommissionableNodes" will only be cleared after the refresh operation. So it will always be scanned when heartbeat which seems not necessary. was (Author: tangzhankun): [~pbacsko], Thanks for the new patch. The idea looks good to me. Several comments: 1. Why do we need a lazy update? I don't see "hostDetails" differences between "getLazyLoadedHostDetails" and "getHostDetails". 2. Could we check the "Decommissioning" status before "isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"? Because The "gracefulDecommissionableNodes" will only be cleared after the refresh operation. So it will always be scanned which seems not necessary. > Race condition during decommissioning > - > > Key: YARN-9011 > URL: https://issues.apache.org/jira/browse/YARN-9011 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.1.1 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9011-001.patch, YARN-9011-002.patch, > YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, > YARN-9011-006.patch, YARN-9011-007.patch > > > During internal testing, we found a nasty race condition which occurs during > decommissioning. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936842#comment-16936842 ] Bibin A Chundatt edited comment on YARN-9011 at 9/24/19 2:19 PM: -

{quote}But even if you have to wait, it's a very small tiny window which is probably just milliseconds{quote}
That depends on the time taken to process events. In large clusters we can't expect that to be milliseconds.

*Alternate approach*
NodesListManager is the source of the *GRACEFUL_DECOMMISSION* event, based on which the state transition of RMNodeImpl to DECOMMISSIONING happens. I think that, as per YARN-3212, this state prevents containers from getting killed during the DECOMMISSIONING period.
* We could maintain in NodesListManager the list of to-be-decommissioned nodes for which *GRACEFUL_DECOMMISSION* was fired.
* HostsFileReader would set the refreshed *HostDetails* only after the event is fired.
This way the HostsFileReader and the node state could stay in sync. Thoughts?
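The ordering proposed above could be sketched roughly as follows. This is a hedged illustration, not the actual NodesListManager code: all class, record, and method names here are hypothetical. The key idea is that the decommission event is fired *before* the refreshed host details are published, so a concurrent heartbeat never sees the new exclude list ahead of the node's state change being queued.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the proposed ordering (names illustrative):
// record the to-be-decommissioned nodes and fire GRACEFUL_DECOMMISSION
// first, then atomically publish the refreshed host details.
class NodesListRefresher {
  record HostDetails(Set<String> includes, Set<String> excludes) {}

  private final AtomicReference<HostDetails> hostDetails =
      new AtomicReference<>(new HostDetails(Set.of(), Set.of()));
  private final Set<String> pendingGracefulDecommission =
      ConcurrentHashMap.newKeySet();

  void refreshNodes(HostDetails refreshed, List<String> runningNodes) {
    for (String node : runningNodes) {
      if (refreshed.excludes().contains(node)) {
        // 1. Remember the node and fire the decommission event first.
        pendingGracefulDecommission.add(node);
        fireGracefulDecommissionEvent(node);
      }
    }
    // 2. Only now publish the refreshed view consulted by heartbeats.
    hostDetails.set(refreshed);
  }

  boolean isValidNode(String node) {
    return !hostDetails.get().excludes().contains(node);
  }

  protected void fireGracefulDecommissionEvent(String node) {
    // Placeholder: the real NodesListManager would dispatch an event here.
  }
}
```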
[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936576#comment-16936576 ] Peter Bacsko edited comment on YARN-9011 at 9/24/19 9:06 AM: -

[~tangzhankun] yes, that's correct. The problem is that {{RMNodeImpl}} does not immediately go to the {{DECOMMISSIONING}} state; you have to wait for the state machine to complete the transition on a different thread. I was thinking about alternative approaches, for example adding a new field to {{RMNodeImpl}} that stores the graceful decommissioning intent. It could be updated synchronously from {{NodesListManager}}, so it would be enough to have the {{synchronized}} block. But then you have to deal with this extra variable and update it whenever necessary, so I felt that right now this approach is safer.
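The rejected alternative could be sketched like this. This is purely illustrative (the field and method names are hypothetical, and this field does not exist in RMNodeImpl): the intent flag would be written synchronously during refresh, so the heartbeat path never races with the asynchronous state-machine transition.

```java
// Hypothetical sketch of the alternative described above (not the actual
// RMNodeImpl): store the graceful decommissioning intent directly on the
// node object, updated synchronously from NodesListManager.
class RMNodeImplSketch {
  // Written by the refresh path, read by the heartbeat thread;
  // volatile gives the needed visibility for a simple flag.
  private volatile boolean gracefulDecommissionIntent;

  // Called synchronously while refreshNodes() still holds its lock.
  void markGracefulDecommissionIntent() {
    gracefulDecommissionIntent = true;
  }

  // The heartbeat path can consult the intent even before the state
  // machine has transitioned the node to DECOMMISSIONING.
  boolean hasGracefulDecommissionIntent() {
    return gracefulDecommissionIntent;
  }

  // Must be cleared on recommission/refresh; this bookkeeping is the
  // "extra variable" burden that motivated rejecting the approach.
  void clearGracefulDecommissionIntent() {
    gracefulDecommissionIntent = false;
  }
}
```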
[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935731#comment-16935731 ] Peter Bacsko edited comment on YARN-9011 at 9/23/19 10:38 AM: --

[~adam.antal] so the problem is that {{ResourceTrackerService}} uses {{NodesListManager}} to determine which nodes are enabled. But sometimes it sees an inconsistent state: {{NodesListManager}} returns that a certain node is in the excluded list, but its state is not {{DECOMMISSIONING}}, so we have to wait for this state change.

First, adding {{synchronized}} blocks to {{NodesListManager}} is necessary: when you call {{isValidNode()}}, you have to wait until the XML (which contains the list of to-be-decommissioned nodes) is completely processed. However, if {{isValid()}} returns false, you don't know whether graceful decommissioning is going on. If it is, the state of {{RMNodeImpl}} is {{NodeState.DECOMMISSIONING}}. But the catch is that the state transition happens on a separate dispatcher thread, so you have to wait for it. Most of the time it's quick enough, but you can miss it; when that happens, RTS simply considers the node to be "disallowed" and orders a shutdown immediately.

So that's why I introduced a new class called {{DecommissioningNodesSyncer}}. If a node is selected for graceful decommissioning, it is added to a deque. When {{isValid()}} returns, we check if the node is included in this deque, then we wait for the state transition with a {{Condition}} object. Signaling comes from {{RMNode}} itself.

The change is a bit bigger than it should be because I modified constructors, so to avoid compilation problems, tests also had to be modified. An alternative to this is using a singleton {{DecommissioningNodesSyncer}}, but I just don't like it; I prefer dependency injection to singletons.
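The wait-and-signal mechanism described in this comment could be sketched as below. This is a hedged approximation of the idea, not the code from the attached patches; the method names and the timeout parameter are assumptions. Heartbeat threads wait on a {{Condition}} until the dispatcher thread signals that the node has transitioned to DECOMMISSIONING.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of the DecommissioningNodesSyncer idea (the real class is
// in the YARN-9011 patches; names and timeout handling here are assumed).
class DecommissioningNodesSyncer {
  private final Set<String> pendingNodes = ConcurrentHashMap.newKeySet();
  private final Lock lock = new ReentrantLock();
  private final Condition transitioned = lock.newCondition();

  // Called by NodesListManager when graceful decommission is requested.
  void markPending(String nodeId) {
    pendingNodes.add(nodeId);
  }

  // Called from the heartbeat path after isValid() returned false.
  // Returns true if the node reached DECOMMISSIONING within the timeout;
  // false means the node is genuinely disallowed (or the wait timed out).
  boolean awaitDecommissioning(String nodeId, long timeoutMs)
      throws InterruptedException {
    if (!pendingNodes.contains(nodeId)) {
      return false; // not gracefully decommissioning
    }
    lock.lock();
    try {
      long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (pendingNodes.contains(nodeId)) {
        if (nanos <= 0) {
          return false; // timed out waiting for the state machine
        }
        nanos = transitioned.awaitNanos(nanos);
      }
      return true;
    } finally {
      lock.unlock();
    }
  }

  // Called by the RMNode state machine once it reaches DECOMMISSIONING;
  // this is the "signaling comes from RMNode itself" part.
  void onTransitioned(String nodeId) {
    lock.lock();
    try {
      pendingNodes.remove(nodeId);
      transitioned.signalAll();
    } finally {
      lock.unlock();
    }
  }
}
```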