[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted as in service, causing datanode allocation failure

2019-07-23 Thread He Xiaoqiao (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890873#comment-16890873 ]

He Xiaoqiao commented on HDFS-12820:


No problem, I will close this issue. Thanks [~zhangchen] for checking.

> Decommissioned datanode is counted as in service, causing datanode allocation failure
> --------------------------------------------------------------------------------------
>
> Key: HDFS-12820
> URL: https://issues.apache.org/jira/browse/HDFS-12820
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: block placement
>Affects Versions: 2.4.0
>Reporter: Gang Xie
>Priority: Major
>
> When allocating a datanode for a DFSClient write with load consideration
> enabled, the placement policy checks whether a datanode is overloaded by
> computing the average xceiver count over all in-service datanodes. But if a
> datanode is decommissioned and then becomes dead, it is still treated as in
> service, which makes the computed average load much higher than the real one,
> especially when the number of decommissioned datanodes is large. In our
> cluster of 180 datanodes, 100 were decommissioned, and the average load was
> 17. This failed all datanode allocations.
> {code:java}
> private void subtract(final DatanodeDescriptor node) {
>   capacityUsed -= node.getDfsUsed();
>   blockPoolUsed -= node.getBlockPoolUsed();
>   xceiverCount -= node.getXceiverCount();
>   // Note: this check skips nodes that are decommissioning or decommissioned.
>   if (!(node.isDecommissionInProgress() || node.isDecommissioned())) {
>     nodesInService--;
>     nodesInServiceXceiverCount -= node.getXceiverCount();
>     capacityTotal -= node.getCapacity();
>     capacityRemaining -= node.getRemaining();
>   } else {
>     capacityTotal -= node.getDfsUsed();
>   }
>   cacheCapacity -= node.getCacheCapacity();
>   cacheUsed -= node.getCacheUsed();
> }
> {code}






[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted as in service, causing datanode allocation failure

2019-07-23 Thread Chen Zhang (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890872#comment-16890872 ]

Chen Zhang commented on HDFS-12820:
---

Thanks [~hexiaoqiao] for your comments. My bad, I overlooked the order of the
subtract operation and the status change.

So I think we can resolve this issue as "not an issue"?







[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted as in service, causing datanode allocation failure

2019-07-23 Thread He Xiaoqiao (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890858#comment-16890858 ]

He Xiaoqiao commented on HDFS-12820:


[~zhangchen] IIUC, {{nodesInService}} and the other attributes are already
subtracted when decommission is triggered:
{code:java}
  synchronized void startDecommission(final DatanodeDescriptor node) {
    if (!node.isAlive()) {
      LOG.info("Dead node {} is decommissioned immediately.", node);
      node.setDecommissioned();
    } else {
      stats.subtract(node);      // node is still in service here
      node.startDecommission();
      stats.add(node);           // node is now DECOMMISSION_IN_PROGRESS
    }
  }
{code}
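
To make the ordering concrete, here is a minimal, self-contained sketch of the invariant ({{ToyNode}}/{{ToyStats}} are hypothetical types made up for illustration, not the real HDFS classes): the node leaves {{nodesInService}} when {{startDecommission}} runs, so a later {{subtract}} on death takes the not-in-service branch and never decrements the counter twice.
{code:java}
// Toy model of the subtract -> state change -> add ordering; names are
// hypothetical and only the counter logic mirrors the discussion above.
enum AdminState { IN_SERVICE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

class ToyNode {
  AdminState state = AdminState.IN_SERVICE;
  boolean isInService() { return state == AdminState.IN_SERVICE; }
}

class ToyStats {
  int nodesInService;
  void add(ToyNode n)      { if (n.isInService()) nodesInService++; }
  void subtract(ToyNode n) { if (n.isInService()) nodesInService--; }
}

public class DecommissionOrdering {
  public static void main(String[] args) {
    ToyStats stats = new ToyStats();
    ToyNode node = new ToyNode();
    stats.add(node);                                  // node registers: count = 1

    stats.subtract(node);                             // still IN_SERVICE: count = 0
    node.state = AdminState.DECOMMISSION_IN_PROGRESS; // state changes after subtract
    stats.add(node);                                  // no longer in service: still 0

    stats.subtract(node);                             // node dies later: still 0
    System.out.println(stats.nodesInService);         // prints 0 -- no double count
  }
}
{code}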







[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted as in service, causing datanode allocation failure

2019-07-23 Thread Chen Zhang (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890847#comment-16890847 ]

Chen Zhang commented on HDFS-12820:
---

Hi [~jojochuang], I've checked the code on trunk, and I think this issue still
exists in the latest version.

If we decommission a datanode and then stop it, the {{nodesInService}} counter
in {{DatanodeStats}} is not decremented; see the following code:

{code:java}
synchronized void subtract(final DatanodeDescriptor node) {
  xceiverCount -= node.getXceiverCount();
  if (node.isInService()) { // a DECOMMISSIONED node does not count as isInService
capacityUsed -= node.getDfsUsed();
capacityUsedNonDfs -= node.getNonDfsUsed();
blockPoolUsed -= node.getBlockPoolUsed();
nodesInService--;
nodesInServiceXceiverCount -= node.getXceiverCount();
capacityTotal -= node.getCapacity();
capacityRemaining -= node.getRemaining();
cacheCapacity -= node.getCacheCapacity();
cacheUsed -= node.getCacheUsed();
  } else if (node.isDecommissionInProgress() ||
node.isEnteringMaintenance()) {
cacheCapacity -= node.getCacheCapacity();
cacheUsed -= node.getCacheUsed();
  }
  ...
}
{code}
So if we have a cluster of 100 nodes and we decommission and then stop 50 of
them, the {{nodesInService}} counter will still be 100. As a result,
{{stats.getInServiceXceiverAverage()}} returns only half of the real average
xceiver count, which causes most nodes to be considered overloaded in the
following code:
{code:java}
boolean excludeNodeByLoad(DatanodeDescriptor node) {
  final double maxLoad = considerLoadFactor *
      stats.getInServiceXceiverAverage(); // = nodesInServiceXceiverCount / nodesInService
  final int nodeLoad = node.getXceiverCount();
  if ((nodeLoad > maxLoad) && (maxLoad > 0)) {
logNodeIsNotChosen(node, NodeNotChosenReason.NODE_TOO_BUSY,
  "(load: " + nodeLoad + " > " + maxLoad + ")");
return true;
  }
  return false;
}
{code}
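
As a hypothetical illustration of the arithmetic (numbers invented; {{considerLoadFactor}} assumed to be 2.0, which I believe is the default), the sketch below shows how a stale {{nodesInService}} halves the computed average and turns every node busier than the true average into a "too busy" rejection:
{code:java}
// Made-up numbers demonstrating the skew described above.
public class XceiverSkewDemo {
  public static void main(String[] args) {
    int liveNodes = 50;                 // nodes actually serving traffic
    int staleNodesInService = 100;      // 50 dead nodes are still counted
    int xceiversPerLiveNode = 30;
    double considerLoadFactor = 2.0;    // assumed default factor

    int totalXceivers = liveNodes * xceiversPerLiveNode;      // 1500
    double realAverage = (double) totalXceivers / liveNodes;  // 30.0
    double skewedAverage =
        (double) totalXceivers / staleNodesInService;         // 15.0
    double maxLoad = considerLoadFactor * skewedAverage;      // 30.0

    // Any live node with more than 30 xceivers is rejected as too busy,
    // even though 30 is exactly the real average load of the cluster.
    System.out.printf("real=%.1f skewed=%.1f maxLoad=%.1f%n",
        realAverage, skewedAverage, maxLoad);
  }
}
{code}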
 







[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted as in service, causing datanode allocation failure

2017-11-20 Thread Gang Xie (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258981#comment-16258981 ]

Gang Xie commented on HDFS-12820:
-

Why don't we subtract {{nodesInService}} when the datanode completes
decommission and becomes dead? Is there any consideration behind this?







[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted as in service, causing datanode allocation failure

2017-11-20 Thread Gang Xie (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258975#comment-16258975 ]

Gang Xie commented on HDFS-12820:
-

And I believe this issue still exists in the latest version.







[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted as in service, causing datanode allocation failure

2017-11-19 Thread Gang Xie (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258834#comment-16258834 ]

Gang Xie commented on HDFS-12820:
-

Nice, let me check it out.



