Ming Ma created HDFS-8056:
-----------------------------

             Summary: Decommissioned dead nodes should continue to be counted as dead after NN restart
                 Key: HDFS-8056
                 URL: https://issues.apache.org/jira/browse/HDFS-8056
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Ming Ma

We had some offline discussion with [~andrew.wang] and [~cmccabe] about this. I am bringing it up here to get more input and to get a patch in place.

Dead nodes are tracked by {{DatanodeManager}}'s {{datanodeMap}}. However, after the NN restarts, nodes that were already dead before the restart won't be in {{datanodeMap}}. {{DatanodeManager}}'s {{getDatanodeListForReport}} will add those dead nodes, but not if they are in the exclude file.

{noformat}
    if (listDeadNodes) {
      for (InetSocketAddress addr : includedNodes) {
        if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
          continue;
        }
        // The remaining nodes are ones that are referenced by the hosts
        // files but that we do not know about, ie that we have never
        // heard from. Eg. an entry that is no longer part of the cluster
        // or a bogus entry was given in the hosts files
        //
        // If the host file entry specified the xferPort, we use that.
        // Otherwise, we guess that it is the default xfer port.
        // We can't ask the DataNode what it had configured, because it's
        // dead.
        DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
                .getAddress().getHostAddress(), addr.getHostName(), "",
                addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
                defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
        setDatanodeDead(dn);
        nodes.add(dn);
      }
    }
{noformat}

The issue here is that the JMX value for decommissioned dead nodes will be different after an NN restart. It might be better to make it consistent across NN restarts.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
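
To illustrate the inconsistency described in this issue, here is a minimal sketch that models the dead-node report with plain string sets. It is not the actual {{DatanodeManager}} code: the class {{DeadNodeReportSketch}}, the method {{deadNodeReport}}, and the host names {{dn1}} and {{dn2}} are hypothetical, and live nodes are left out for brevity. It only mirrors the quoted condition, where an included host is skipped if it is already known or if it is in the exclude file.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Simplified model of the dead-node report; not the real DatanodeManager. */
final class DeadNodeReportSketch {

    /**
     * Builds a dead-node report from three host sets (all names hypothetical).
     *
     * @param registeredDeadNodes dead nodes still tracked in memory (stands in for datanodeMap)
     * @param includedNodes       hosts listed in the include file
     * @param excludedNodes       hosts listed in the exclude file (decommissioned)
     */
    static List<String> deadNodeReport(Set<String> registeredDeadNodes,
                                       Set<String> includedNodes,
                                       Set<String> excludedNodes) {
        // Nodes that are still tracked in memory are reported directly.
        List<String> report = new ArrayList<>(registeredDeadNodes);
        for (String host : includedNodes) {
            // Mirrors the quoted condition: skip hosts already found or excluded.
            if (registeredDeadNodes.contains(host) || excludedNodes.contains(host)) {
                continue;
            }
            // Included, never heard from, and not excluded: report as dead.
            report.add(host);
        }
        return report;
    }

    public static void main(String[] args) {
        Set<String> included = new LinkedHashSet<>(List.of("dn1", "dn2"));
        Set<String> excluded = Set.of("dn2"); // dn2 was decommissioned, then died

        // Before restart: both dead nodes are still tracked in memory.
        Set<String> trackedBeforeRestart = new LinkedHashSet<>(List.of("dn1", "dn2"));
        System.out.println("before restart: "
                + deadNodeReport(trackedBeforeRestart, included, excluded));

        // After restart: nothing is tracked, so the excluded node is dropped.
        System.out.println("after restart:  "
                + deadNodeReport(new LinkedHashSet<>(), included, excluded));
    }
}
```

In this model the report contains {{dn1}} and {{dn2}} before the restart but only {{dn1}} afterwards, which is the kind of inconsistency this issue proposes to remove.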