[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-10-29 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961935#comment-16961935
 ] 

Zhankun Tang edited comment on YARN-9011 at 10/29/19 12:15 PM:
---

[~pbacsko], thanks for the explanation. After the offline sync-up, this 
"lazyLoaded" approach seems like a good way to go without locking the hostDetails. 
+1 from me. Thoughts? [~bibinchundatt]?


was (Author: tangzhankun):
[~pbacsko], thanks for the explanation. After the offline sync-up, this seems 
like a good way to go without locking the hostDetails. +1 from me. Thoughts? 
[~bibinchundatt]?

> Race condition during decommissioning
> -
>
> Key: YARN-9011
> URL: https://issues.apache.org/jira/browse/YARN-9011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9011-001.patch, YARN-9011-002.patch, 
> YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, 
> YARN-9011-006.patch, YARN-9011-007.patch
>
>
> During internal testing, we found a nasty race condition which occurs during 
> decommissioning.
> Node manager, incorrect behaviour:
> {noformat}
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 
> hostname:node-6.hostname.com
> {noformat}
> Node manager, expected behaviour:
> {noformat}
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: DECOMMISSIONING  node-6.hostname.com:8041 is ready to be 
> decommissioned
> {noformat}
> Note the two different messages from the RM ("Disallowed NodeManager" vs 
> "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an 
> inconsistent state of nodes while they're being updated:
> {noformat}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader 
> include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219}
>  exclude:{node-6.hostname.com}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully 
> decommission node node-6.hostname.com:8041 with state RUNNING
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: 
> node-6.hostname.com
> 2018-06-18 21:00:17,576 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node 
> node-6.hostname.com:8041 in DECOMMISSIONING.
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn 
> IP=172.26.22.115OPERATION=refreshNodes  TARGET=AdminService 
> RESULT=SUCCESS
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve 
> original total capability: 
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING
> {noformat}
> When the decommissioning succeeds, there is no output logged from 
> {{ResourceTrackerService}}.






[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-10-29 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961935#comment-16961935
 ] 

Zhankun Tang edited comment on YARN-9011 at 10/29/19 12:14 PM:
---

[~pbacsko], thanks for the explanation. After the offline sync-up, this seems 
like a good way to go without locking the hostDetails. +1 from me. Thoughts? 
[~bibinchundatt]?


was (Author: tangzhankun):
[~pbacsko], thanks for the explanation. After the offline sync-up, this seems 
like a good lock-free way to go. +1 from me.




[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-10-29 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961621#comment-16961621
 ] 

Zhankun Tang edited comment on YARN-9011 at 10/29/19 11:54 AM:
---

[~pbacsko], thanks for the new patch. The idea looks good to me. A couple of 
comments:

1. Why do we need a "lazyLoaded"? I don't see any difference in "hostDetails" 
between "getLazyLoadedHostDetails" and "getHostDetails".
2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in the "isNodeInDecommissioning" method? 
Because the "gracefulDecommissionableNodes" set is only cleared after the 
refresh operation, it will be scanned on every heartbeat, which seems 
unnecessary.


was (Author: tangzhankun):
[~pbacsko], Thanks for the new patch. The idea looks good to me. Several 
comments:

1. Why do we need a lazy update? I don't see "hostDetails" differences between 
"getLazyLoadedHostDetails" and "getHostDetails".
2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"? Because 
The "gracefulDecommissionableNodes" will only be cleared after the refresh 
operation. So it will always be scanned when heartbeat which seems not 
necessary. 


[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-10-29 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961879#comment-16961879
 ] 

Peter Bacsko edited comment on YARN-9011 at 10/29/19 11:10 AM:
---

_"1. Why do we need a lazy update?"_

Please see details in my comment above that I posted on 25th Sep: 
https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696

It is important that when you do a "lazy" refresh, you should not make the new 
changes visible to {{ResourceTrackerService}}. The problematic part of the code 
is this:
{noformat}
// 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is
// in decommissioning.
if (!this.nodesListManager.isValidNode(nodeId.getHost())
&& !isNodeInDecommissioning(nodeId)) {
...
{noformat}
If you perform a graceful decom, it is important that 
{{isNodeInDecommissioning()}} returns true. However, it takes time for 
{{RMNodeImpl}} to go into the {{DECOMMISSIONING}} state, which is why this code 
is not fully reliable. Therefore, {{isValidNode()}} should only return false 
once we have already constructed the set of nodes that we want to decommission.
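
To make this concrete, here is a minimal sketch of the lazy-refresh idea (the 
class shape and helper names below are illustrative assumptions, not the actual 
patch):
{noformat}
import java.util.Collections;
import java.util.Set;

// Sketch only: keep two snapshots of the include/exclude lists. A "lazy"
// refresh parses the new lists into a separate snapshot without publishing it,
// so ResourceTrackerService (via isValidNode()) keeps seeing the old state
// until the graceful-decommission bookkeeping is done; only then is the new
// snapshot published.
public class LazyHostsSnapshot {

  public static class HostDetails {
    final Set<String> includes;
    final Set<String> excludes;
    HostDetails(Set<String> inc, Set<String> exc) {
      this.includes = inc;
      this.excludes = exc;
    }
  }

  private volatile HostDetails current =
      new HostDetails(Collections.emptySet(), Collections.emptySet());
  private volatile HostDetails lazyLoaded = current;

  // Step 1: parse the new lists, but do NOT make them visible yet.
  public void lazyRefresh(Set<String> newIncludes, Set<String> newExcludes) {
    lazyLoaded = new HostDetails(newIncludes, newExcludes);
  }

  // Step 2: publish only after the decommissionable set has been built.
  public void finishRefresh() {
    current = lazyLoaded;
  }

  public HostDetails getHostDetails() {           // what isValidNode() consults
    return current;
  }

  public HostDetails getLazyLoadedHostDetails() { // what the refresh logic uses
    return lazyLoaded;
  }
}
{noformat}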

_2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_

-No, we can't (well, we can, but it would be pointless). The decommissioning 
status only appears when you refresh (reload) the exclusion/inclusion files. 
That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the 
problem - during refresh, excludable nodes become visible almost immediately, 
but not the fact that they're decommissionable.-

I misunderstood this question. It's doable, see my comment below.

_3. So it will always be scanned on every heartbeat, which seems unnecessary._
 Scanning is necessary to avoid the race condition, but this isn't really a 
problem, for three reasons:
 1. It only happens for those nodes which are excluded ({{isValid()}} is false)
 2. We look up the node in a ConcurrentHashMap, which should be really fast
 3. Once {{RMNode}} reaches the {{DECOMMISSIONING}} state (which should happen 
pretty quickly from {{RUNNING}}), we no longer need the set.

*Edit*: even though it's not a huge problem, I agree that it can be enhanced; 
again, see below.

I can imagine a small enhancement here: once the node has reached the 
{{DECOMMISSIONING}} state, we remove it from the set, making it smaller and 
smaller.
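
As a sketch of that enhancement (purely illustrative, assuming the set is a 
{{ConcurrentHashMap}}-backed key set and the names below are made up):
{noformat}
// Sketch only: drop a node from the graceful-decommission set once its
// RUNNING -> DECOMMISSIONING transition has completed, so later heartbeats
// of already-transitioned nodes no longer need to consult the set.
private final Set<NodeId> gracefulDecommissionableNodes =
    ConcurrentHashMap.newKeySet();

public void onNodeStateChanged(RMNode rmNode) {
  if (rmNode.getState() == NodeState.DECOMMISSIONING) {
    gracefulDecommissionableNodes.remove(rmNode.getNodeID());
  }
}
{noformat}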


was (Author: pbacsko):
_"1. Why do we need a lazy update?"_

Please see details in my comment above that I posted on 25th Sep: 
https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696

It is important that when you do a "lazy" refresh, you should not make the new 
changes visible to {{ResourceTrackerService}}. The problematic part of the code 
is this:
{noformat}
// 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is
// in decommissioning.
if (!this.nodesListManager.isValidNode(nodeId.getHost())
&& !isNodeInDecommissioning(nodeId)) {
...
{noformat}
If you perform a graceful decom, it is important that 
{{isNodeInDecommissioning()}} returns true. However, it takes time for 
{{RMNodeImpl}} to go into the {{DECOMMISSIONING}} state, which is why this code 
is not fully reliable. Therefore, {{isValidNode()}} should only return false 
once we have already constructed the set of nodes that we want to decommission.

_2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_

No, we can't (well, we can, but it would be pointless). The decommissioning 
status only appears when you refresh (reload) the exclusion/inclusion files. 
That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the 
problem - during refresh, excludable nodes become visible almost immediately, 
but not the fact that they're decommissionable.

_3. So it will always be scanned on every heartbeat, which seems unnecessary._
 Scanning is necessary to avoid the race condition, but this isn't really a 
problem, for three reasons:
 1. It only happens for those nodes which are excluded ({{isValid()}} is false)
 2. We look up the node in a ConcurrentHashMap, which should be really fast
 3. Once {{RMNode}} reaches the {{DECOMMISSIONING}} state (which should happen 
pretty quickly from {{RUNNING}}), we no longer need the set.

I can imagine a small enhancement here: once the node has reached the 
{{DECOMMISSIONING}} state, we remove it from the set, making it smaller and 
smaller.

[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-10-29 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961897#comment-16961897
 ] 

Peter Bacsko edited comment on YARN-9011 at 10/29/19 11:03 AM:
---

Ok, actually we always call the {{isGracefullyDecommissionableNode()}} method 
inside {{isNodeInDecommissioning()}}.

We just have to slightly rearrange the order of the calls, like this:

{noformat}
  private boolean isNodeInDecommissioning(NodeId nodeId) {
    RMNode rmNode = this.rmContext.getRMNodes().get(nodeId);

    // state OK - early return
    if (rmNode != null &&
        rmNode.getState() == NodeState.DECOMMISSIONING) {
      return true;
    }

    // Graceful decom: wait until node moves out of RUNNING state.
    if (rmNode != null &&
        this.nodesListManager.isGracefullyDecommissionableNode(rmNode)) {
      NodeState currentState = rmNode.getState();

      if (currentState == NodeState.RUNNING) {
        return true;
      }
    }

    return false;
  }
{noformat}

This avoids the unnecessary invocation of 
{{nodesListManager.isGracefullyDecommissionableNode()}}.


was (Author: pbacsko):
Ok, actually we always call the {{isGracefullyDecommissionableNode()}} method 
inside {{isNodeInDecommissioning()}}.

We just have to slightly re-arrange the order of calls like:

{noformat}
  private boolean isNodeInDecommissioning(NodeId nodeId) {
RMNode rmNode = this.rmContext.getRMNodes().get(nodeId);

   // state OK - early return
if (rmNode != null &&
rmNode.getState() == NodeState.DECOMMISSIONING) {
  return true;
}

// Graceful decom: wait until node moves out of RUNNING state.
if (rmNode != null &&
this.nodesListManager.isGracefullyDecommissionableNode(rmNode)) {
  NodeState currentState = rmNode.getState();

  if (currentState == NodeState.RUNNING) {
return true;
  }
}

return false;
  }
{noformat}

This avoid the unnecessary invocation of 
{{nodesListManager.isGracefullyDecommissionableNode()}}.


[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-10-29 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961879#comment-16961879
 ] 

Peter Bacsko edited comment on YARN-9011 at 10/29/19 10:48 AM:
---

_"1. Why do we need a lazy update?"_

Please see details in my comment above that I posted on 25th Sep: 
https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696

It is important that when you do a "lazy" refresh, you should not make the new 
changes visible to {{ResourceTrackerService}}. The problematic part of the code 
is this:
{noformat}
// 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is
// in decommissioning.
if (!this.nodesListManager.isValidNode(nodeId.getHost())
&& !isNodeInDecommissioning(nodeId)) {
...
{noformat}
If you perform a graceful decom, it is important that 
{{isNodeInDecommissioning()}} returns true. However, it takes time for 
{{RMNodeImpl}} to go into the {{DECOMMISSIONING}} state, which is why this code 
is not fully reliable. Therefore, {{isValidNode()}} should only return false 
once we have already constructed the set of nodes that we want to decommission.

_2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_

No, we can't (well, we can, but it would be pointless). The decommissioning 
status only appears when you refresh (reload) the exclusion/inclusion files. 
That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the 
problem - during refresh, excludable nodes become visible almost immediately, 
but not the fact that they're decommissionable.

_3. So it will always be scanned on every heartbeat, which seems unnecessary._
 Scanning is necessary to avoid the race condition, but this isn't really a 
problem, for three reasons:
 1. It only happens for those nodes which are excluded ({{isValid()}} is false)
 2. We look up the node in a ConcurrentHashMap, which should be really fast
 3. Once {{RMNode}} reaches the {{DECOMMISSIONING}} state (which should happen 
pretty quickly from {{RUNNING}}), we no longer need the set.

I can imagine a small enhancement here: once the node has reached the 
{{DECOMMISSIONING}} state, we remove it from the set, making it smaller and 
smaller.


was (Author: pbacsko):
_"1. Why do we need a lazy update?"_

Please see details in my comment above that I posted on 25th Sep: 
https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696

It is important that when you do a "lazy" refresh, you should not make the new 
changes visible to {{ResourceTrackerService}}. The problematic part of the code 
is this:
{noformat}
// 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is
// in decommissioning.
if (!this.nodesListManager.isValidNode(nodeId.getHost())
&& !isNodeInDecommissioning(nodeId)) {
...
{noformat}
If you perform a graceful decom, it is important that 
{{isNodeInDecommissioning()}} returns true. However, it takes time for 
{{RMNodeImpl}} to go into the {{DECOMMISSIONING}} state, which is why this code 
is not fully reliable. Therefore, {{isValidNode()}} should only return false 
once we have already constructed the set of nodes that we want to decommission.

_2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_

No, we can't (well, we can, but it would be pointless). The decommissioning 
status only appears when you refresh (reload) the exclusion/inclusion files. 
That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the 
problem - during refresh, excludable nodes become visible almost immediately, 
but not the fact that they're decommissionable.

_3. So it will always be scanned on every heartbeat, which seems unnecessary._
 Scanning is necessary to avoid the race condition, but this isn't really a 
problem, for two reasons:
 1. It only happens for those nodes which are excluded ({{isValid()}} is false)
 2. We look up the node in a ConcurrentHashMap, which should be really fast
 3. Once {{RMNode}} reaches the {{DECOMMISSIONING}} state (which should happen 
pretty quickly from {{RUNNING}}), we no longer need the set.

I can imagine a small enhancement here: once the node has reached the 
{{DECOMMISSIONING}} state, we remove it from the set, making it smaller and 
smaller.


[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-10-29 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961879#comment-16961879
 ] 

Peter Bacsko edited comment on YARN-9011 at 10/29/19 10:48 AM:
---

_"1. Why do we need a lazy update?"_

Please see details in my comment above that I posted on 25th Sep: 
https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696

It is important that when you do a "lazy" refresh, you should not make the new 
changes visible to {{ResourceTrackerService}}. The problematic part of the code 
is this:
{noformat}
// 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is
// in decommissioning.
if (!this.nodesListManager.isValidNode(nodeId.getHost())
&& !isNodeInDecommissioning(nodeId)) {
...
{noformat}
If you perform a graceful decom, it is important that 
{{isNodeInDecommissioning()}} returns true. However, it takes time for 
{{RMNodeImpl}} to go into the {{DECOMMISSIONING}} state, which is why this code 
is not fully reliable. Therefore, {{isValidNode()}} should only return false 
once we have already constructed the set of nodes that we want to decommission.

_2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_

No, we can't (well, we can, but it would be pointless). The decommissioning 
status only appears when you refresh (reload) the exclusion/inclusion files. 
That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the 
problem - during refresh, excludable nodes become visible almost immediately, 
but not the fact that they're decommissionable.

_3. So it will always be scanned on every heartbeat, which seems unnecessary._
 Scanning is necessary to avoid the race condition, but this isn't really a 
problem, for two reasons:
 1. It only happens for those nodes which are excluded ({{isValid()}} is false)
 2. We look up the node in a ConcurrentHashMap, which should be really fast
 3. Once {{RMNode}} reaches the {{DECOMMISSIONING}} state (which should happen 
pretty quickly from {{RUNNING}}), we no longer need the set.

I can imagine a small enhancement here: once the node has reached the 
{{DECOMMISSIONING}} state, we remove it from the set, making it smaller and 
smaller.


was (Author: pbacsko):
_"1. Why do we need a lazy update?"_

Please see details in my comment above that I posted on 25th Sep: 
https://issues.apache.org/jira/browse/YARN-9011?focusedCommentId=16937696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937696

It is important that when you do a "lazy" refresh, you should not make the new 
changes visible to {{ResourceTrackerService}}. The problematic part of the code 
is this:
{noformat}
// 1. Check if it's a valid (i.e. not excluded) node, if not, see if it is
// in decommissioning.
if (!this.nodesListManager.isValidNode(nodeId.getHost())
&& !isNodeInDecommissioning(nodeId)) {
...
{noformat}
If you perform a graceful decom, it is important that 
{{isNodeInDecommissioning()}} returns true. However, it takes time for 
{{RMNodeImpl}} to go into the {{DECOMMISSIONING}} state, which is why this code 
is not fully reliable. Therefore, {{isValidNode()}} should only return false 
once we have already constructed the set of nodes that we want to decommission.

_2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"?_

No, we can't (well, we can, but it would be pointless). The decommissioning 
status only appears when you refresh (reload) the exclusion/inclusion files. 
That is, we need to call {{NodesListManager.refreshNodes()}}. And that is the 
problem - during refresh, excludable nodes become visible almost immediately, 
but not the fact that they're decommissionable.

_3. So it will always be scanned on every heartbeat, which seems unnecessary._
 Scanning is necessary to avoid the race condition, but this isn't really a 
problem, for two reasons:
 1. It only happens for those nodes which are excluded ({{isValid()}} is false)
 2. We look up the node in a ConcurrentHashMap, which should be really fast

I can imagine an enhancement here: once the node has reached the 
{{DECOMMISSIONING}} state, we remove it from the set, making it smaller and 
smaller.


[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-10-28 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961621#comment-16961621
 ] 

Zhankun Tang edited comment on YARN-9011 at 10/29/19 2:49 AM:
--

[~pbacsko], thanks for the new patch. The idea looks good to me. A couple of 
comments:

1. Why do we need a lazy update? I don't see any difference in "hostDetails" 
between "getLazyLoadedHostDetails" and "getHostDetails".
2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in the "isNodeInDecommissioning" method? 
Because the "gracefulDecommissionableNodes" set is only cleared after the 
refresh operation, it will be scanned on every heartbeat, which seems 
unnecessary.


was (Author: tangzhankun):
[~pbacsko], Thanks for the new patch. The idea looks good to me. Several 
comments:

1. Why do we need a lazy update? I don't see "hostDetails" differences between 
"getLazyLoadedHostDetails" and "getHostDetails".
2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"? Because 
The "gracefulDecommissionableNodes" will only be cleared after the refresh 
operation. So it will always be scanned which seems not necessary. 


[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-10-28 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961621#comment-16961621
 ] 

Zhankun Tang edited comment on YARN-9011 at 10/29/19 2:49 AM:
--

[~pbacsko], thanks for the new patch. The idea looks good to me. A couple of 
comments:

1. Why do we need a lazy update? I don't see any difference in "hostDetails" 
between "getLazyLoadedHostDetails" and "getHostDetails".
2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in the "isNodeInDecommissioning" method? 
Because the "gracefulDecommissionableNodes" set is only cleared after the 
refresh operation, it will always be scanned, which seems unnecessary.


was (Author: tangzhankun):
[~pbacsko], Thanks for the new patch. The idea looks good to me. Several 
comments:

1. Why do we need a lazy update? I don't see "hostDetails" differences between 
"getLazyLoadedHostDetails" and "getHostDetails".
2. Could we check the "Decommissioning" status before 
"isGracefullyDecommissionableNode" in method "isNodeInDecommissioning"? Because 
The "gracefulDecommissionableNodes" will only be cleared after the refresh 
operation. So it will always be executed which seems not necessary. 


[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-09-24 Thread Bibin A Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936842#comment-16936842
 ] 

Bibin A Chundatt edited comment on YARN-9011 at 9/24/19 2:19 PM:
-

{quote}
But even if you have to wait, it's a very small tiny window which is probably 
just milliseconds
{quote}
That depends on the time taken to process events. In large clusters we can't 
expect that to be milliseconds.

*Alternate approach*

NodesListManager is the source of the *GRACEFUL_DECOMMISSION* event, based on 
which the state transition of RMNodeImpl to DECOMMISSIONING happens. I think, 
as per YARN-3212, this state avoids containers getting killed during the 
DECOMMISSIONING period.

* We could maintain in NodesListManager the list of to-be-decommissioned nodes 
for which the *GRACEFUL_DECOMMISSION* event was fired.
* HostsFileReader would set the refreshed *HostDetails* only after the event is fired.

This way the HostsFileReader and the node state could stay in sync. Thoughts??
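
A rough sketch of the intended ordering (the method and field names below are 
purely illustrative, not an actual implementation):
{noformat}
// Sketch only: fire GRACEFUL_DECOMMISSION and remember the nodes *before*
// the refreshed include/exclude lists become visible to
// ResourceTrackerService via isValidNode().
void refreshNodesGracefully(Set<String> newIncludes, Set<String> newExcludes) {
  // 1. Build the refreshed view, but do not publish it yet.
  Set<String> refreshedExcludes = new HashSet<>(newExcludes);

  // 2. Remember the to-be-decommissioned hosts and fire the events first.
  for (String host : refreshedExcludes) {
    gracefulDecommissionableNodes.add(host);
    // fire the GRACEFUL_DECOMMISSION event for the corresponding RMNode here
  }

  // 3. Only now make the refreshed lists visible to isValidNode()
  //    (i.e. set the new HostDetails on the hosts reader).
  publishRefreshedHostDetails(newIncludes, refreshedExcludes);
}
{noformat}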



was (Author: bibinchundatt):
{quote}
But even if you have to wait, it's a very small tiny window which is probably 
just milliseconds
{quote}
That depends on the time taken to process events. In large clusters we can't 
expect that to be milliseconds.

*Alternate approach*

NodesListManager is the source of the *GRACEFUL_DECOMMISSION* event, based on 
which the state transition of RMNodeImpl to DECOMMISSIONING happens. I think, 
as per YARN-3212, this state avoids containers getting killed during the 
DECOMMISSIONING period.

* We could maintain in NodesListManager the list of to-be-decommissioned nodes 
for which the *GRACEFUL_DECOMMISSION* event was fired.
* HostsFileReader would set the refreshed *HostDetails* only after the event is fired.

This way the HostsFileReader and the node state could stay in sync. Thoughts??



[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-09-24 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936576#comment-16936576
 ] 

Peter Bacsko edited comment on YARN-9011 at 9/24/19 9:06 AM:
-

[~tangzhankun] yes, that's correct.

The problem is that {{RMNodeImpl}} does not immediately go to the 
{{DECOMMISSIONING}} state; you have to wait for the state machine to complete 
the transition on a different thread.

I was thinking about alternative approaches, for example adding a new field to 
{{RMNodeImpl}} that stores the graceful decommissioning intent and could be 
updated synchronously from {{NodesListManager}}, so it would be enough to have 
the {{synchronized}} block. But then you have to deal with this extra variable 
and update it when necessary, so I felt that right now this approach is safer.


was (Author: pbacsko):
[~tangzhankun] yes, that's correct.

The problem is that {{RMNodeImpl}} does not immediately go to the 
{{DECOMMISSIONING}} state; you have to wait for the state machine to complete 
the transition on a different thread.

I was thinking about alternative approaches, for example introducing a new 
field to {{RMNodeImpl}} that stores the graceful decommissioning intent and 
could be updated synchronously from {{NodesListManager}}, so it would be enough 
to have the {{synchronized}} block. But then you have to deal with this extra 
variable and update it when necessary, so I felt that right now this approach 
is safer.



[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-09-23 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935731#comment-16935731
 ] 

Peter Bacsko edited comment on YARN-9011 at 9/23/19 10:38 AM:
--

[~adam.antal] so the problem is that {{ResourceTrackerService}} uses 
{{NodesListManager}} to determine which nodes are enabled. But sometimes it sees 
an inconsistent state: {{NodesListManager}} returns that a certain node is in 
the excluded list, but its state is not {{DECOMMISSIONING}}. So we have to 
wait for this state change.

First, adding {{synchronized}} blocks to {{NodesListManager}} is necessary. 
When you call {{isValidNode()}} you have to wait until the XML (which contains 
the list of to-be-decommissioned nodes) is completely processed.

However, if {{isValid()}} returns false, you don't know if graceful 
decommissioning is going on. If it is, the state of {{RMNodeImpl}} is 
{{NodeState.DECOMMISSIONING}}. But the catch is that state transition happens 
on a separate dispatcher thread so you have to wait for it. Most of the time 
it's quick enough, but you can miss it. When that happens, RTS simply considers 
a node to be "disallowed" and orders a shutdown immediately.

So that's why I introduced a new class called {{DecommissioningNodesSyncer}}. 
If a node is selected for graceful decommissioning, it is added to a deque. 
When {{isValid()}} returns false, we check whether the node is included in this 
deque. Then we wait for the state transition with a {{Condition}} object. The 
signaling comes from {{RMNode}} itself.

The change is a bit bigger than it should be because I modified constructors, 
so to avoid compilation problems, tests also had to be modified. An alternative 
to this would be a singleton {{DecommissioningNodesSyncer}}, but I just don't 
like it. I prefer dependency injection to singletons.
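
To illustrate the mechanism, a simplified sketch (assumed names and signatures, 
not the actual class from the patch):
{noformat}
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.hadoop.yarn.api.records.NodeId;

// Sketch: NodesListManager registers nodes selected for graceful decommission
// in a deque; ResourceTrackerService waits on a Condition until RMNodeImpl
// signals that the RUNNING -> DECOMMISSIONING transition has completed.
public class DecommissioningNodesSyncerSketch {
  private final Deque<NodeId> pending = new ConcurrentLinkedDeque<>();
  private final Lock lock = new ReentrantLock();
  private final Condition transitioned = lock.newCondition();

  // NodesListManager: the node was selected for graceful decommission
  public void markForGracefulDecommission(NodeId nodeId) {
    pending.add(nodeId);
  }

  // ResourceTrackerService: bounded wait for the state transition
  public boolean awaitDecommissioning(NodeId nodeId, long timeoutMs)
      throws InterruptedException {
    lock.lock();
    try {
      long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (pending.contains(nodeId) && nanos > 0L) {
        nanos = transitioned.awaitNanos(nanos);
      }
      return !pending.contains(nodeId);
    } finally {
      lock.unlock();
    }
  }

  // RMNodeImpl: signal once the node has entered DECOMMISSIONING
  public void nodeReachedDecommissioning(NodeId nodeId) {
    pending.remove(nodeId);
    lock.lock();
    try {
      transitioned.signalAll();
    } finally {
      lock.unlock();
    }
  }
}
{noformat}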


was (Author: pbacsko):
[~adam.antal] so the problem is that {{ResourceTrackerService}} uses 
{{NodesListManager}} to determine which nodes are enabled or not. But sometimes 
it sees an inconsistent state: {{NodesListManager}} returns that a certain node 
is in the excluded list, but its state is not {{DECOMMISSIONING}}. So we have 
to wait for this state change.

First, adding {{synchronized}} blocks to {{NodesListManager}} is necessary. 
When you call {{isValidNode()}} you have to wait until the XML (which contains 
the list of to-be-decommissioned nodes) is completely processed.

However, if {{isValid()}} returns false, you don't know if graceful 
decommissioning is going on. If it is, the state of {{RMNodeImpl}} is 
{{NodeState.DECOMMISSIONING}}. But the catch is that state transition happens 
on a separate dispatcher thread so you have to wait for it. Most of the time 
it's quick enough, but you can miss it. When that happens, RTS simply considers 
a node to be "disallowed" and orders a shutdown immediately.

So that's why I introduced a new class called {{DecommissioningNodesSyncer}}. 
If a node is selected for graceful decommissioning, it is added to a deque. 
When {{isValid()}} returns false, we check whether the node is included in this 
deque. Then we wait for the state transition with a {{Condition}} object. The 
signaling comes from {{RMNode}} itself.

The change is a bit bigger than it should be because I modified constructors, 
so to avoid compilation problems, tests also had to be modified. An alternative 
to this would be a singleton {{DecommissioningNodesSyncer}}, but I just don't 
like it. I prefer dependency injection to singletons.


[jira] [Comment Edited] (YARN-9011) Race condition during decommissioning

2019-09-23 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935731#comment-16935731
 ] 

Peter Bacsko edited comment on YARN-9011 at 9/23/19 10:38 AM:
--

[~adam.antal] so the problem is that {{ResourceTrackerService}} uses 
{{NodesListManager}} to determine which nodes are enabled or not. But sometimes 
it sees an inconsistent state: {{NodesListManager}} returns that a certain node 
is in the excluded list, but its state is not {{DECOMMISSIONING}}. So we have 
to wait for this state change.

First, adding {{synchronized}} blocks to {{NodesListManager}} is necessary. 
When you call {{isValidNode()}} you have to wait until the XML (which contains 
the list of to-be-decommissioned nodes) is completely processed.

However, if {{isValid()}} returns false, you don't know if graceful 
decommissioning is going on. If it is, the state of {{RMNodeImpl}} is 
{{NodeState.DECOMMISSIONING}}. But the catch is that state transition happens 
on a separate dispatcher thread so you have to wait for it. Most of the time 
it's quick enough, but you can miss it. When that happens, RTS simply considers 
a node to be "disallowed" and orders a shutdown immediately.

So that's why I introduced a new class called {{DecommissioningNodesSyncer}}. 
If a node is selected for graceful decommissioning, it is added to a deque. 
When {{isValid()}} returns false, we check whether the node is included in this 
deque. Then we wait for the state transition with a {{Condition}} object. The 
signaling comes from {{RMNode}} itself.

The change is a bit bigger than it should be because I modified constructors, 
so to avoid compilation problems, tests also had to be modified. An alternative 
to this would be a singleton {{DecommissioningNodesSyncer}}, but I just don't 
like it. I prefer dependency injection to singletons.


was (Author: pbacsko):
[~adam.antal] so the problem is that {{ResourceTrackerService}} uses 
{{NodesListManager}} to determine which nodes are healthy or not. But sometimes 
it sees an inconsistent state: {{NodesListManager}} returns that a certain node 
is in the excluded list, but its state is not {{DECOMMISSIONING}}. So we have 
to wait for this state change.

First, adding {{synchronized}} blocks to {{NodesListManager}} is necessary. 
When you call {{isValidNode()}} you have to wait until the XML (which contains 
the list of to-be-decommissioned nodes) is completely processed.

However, if {{isValid()}} returns false, you don't know if graceful 
decommissioning is going on. If it is, the state of {{RMNodeImpl}} is 
{{NodeState.DECOMMISSIONING}}. But the catch is that state transition happens 
on a separate dispatcher thread so you have to wait for it. Most of the time 
it's quick enough, but you can miss it. When that happens, RTS simply considers 
a node to be "disallowed" and orders a shutdown immediately.

So that's why I introduced a new class called {{DecommissioningNodesSyncer}}. 
If a node is selected for graceful decommissioning, it is added to a deque. 
When {{isValid()}} returns false, we check whether the node is included in this 
deque. Then we wait for the state transition with a {{Condition}} object. The 
signaling comes from {{RMNode}} itself.

The change is a bit bigger than it should be because I modified constructors, 
so to avoid compilation problems, tests also had to be modified. An alternative 
to this would be a singleton {{DecommissioningNodesSyncer}}, but I just don't 
like it. I prefer dependency injection to singletons.
