[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943165#comment-16943165 ]

Wei-Chiu Chuang commented on HDFS-14527:

The patch applies cleanly to branch-3.2 as well, but it doesn't compile on branch-3.1. I'll provide a patch shortly.

> Stop all DataNodes may result in NN terminate
> ---------------------------------------------
>
>                 Key: HDFS-14527
>                 URL: https://issues.apache.org/jira/browse/HDFS-14527
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Major
>             Fix For: 3.3.0, 3.2.2
>
>         Attachments: HDFS-14527.001.patch, HDFS-14527.002.patch, HDFS-14527.003.patch, HDFS-14527.004.patch, HDFS-14527.005.patch
>
>
> If all DataNodes of a cluster are stopped, BlockPlacementPolicyDefault#chooseTarget
> may throw an ArithmeticException when calling #getMaxNodesPerRack. The runtime
> exception propagates to BlockManager's ReplicationMonitor thread, which then
> terminates the NN.
> The root cause is that BlockPlacementPolicyDefault#chooseTarget does not hold the
> global lock. If all DataNodes die between the {{clusterMap.getNumOfLeaves()}} check
> and the {{getMaxNodesPerRack}} call, {{getMaxNodesPerRack}} throws an
> {{ArithmeticException}}.
> {code:java}
> private DatanodeStorageInfo[] chooseTarget(int numOfReplicas,
>     Node writer,
>     List chosenStorage,
>     boolean returnChosenNodes,
>     Set excludedNodes,
>     long blocksize,
>     final BlockStoragePolicy storagePolicy,
>     EnumSet addBlockFlags,
>     EnumMap sTypes) {
>   if (numOfReplicas == 0 || clusterMap.getNumOfLeaves()==0) {
>     return DatanodeStorageInfo.EMPTY_ARRAY;
>   }
>   ..
>   int[] result = getMaxNodesPerRack(chosenStorage.size(), numOfReplicas);
>   ..
> }
> {code}
> The detailed log is as follows.
> {code:java}
> 2019-05-31 12:29:21,803 ERROR org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> java.lang.ArithmeticException: / by zero
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.getMaxNodesPerRack(BlockPlacementPolicyDefault.java:282)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:228)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:132)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:4533)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$1800(BlockManager.java:4493)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1954)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1830)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4453)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4388)
>         at java.lang.Thread.run(Thread.java:745)
> 2019-05-31 12:29:21,805 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> {code}
> To be honest, this is not a serious bug and it is not easy to reproduce: if all
> DataNodes are stopped and only the NameNode is alive, HDFS cannot serve data
> normally anyway, and we can only browse the directory tree. It is a corner case.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
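A minimal, self-contained sketch of the check-then-act race described in the issue, assuming a simplified topology whose leaf and rack counts are independent counters that other threads may change without a lock, and assuming a per-rack cap of the form (total - 1) / racks + 2. `ToyTopology`, `chooseTarget` and the `maxNodesPerRack*` methods are hypothetical stand-ins, not Hadoop APIs, and this is not the committed patch; the `Runnable` simply simulates all DataNodes dying between the emptiness check and the division.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model of the HDFS-14527 race. None of these names are
 * Hadoop APIs; they only illustrate the check-then-act pattern.
 */
public class PlacementRaceSketch {

  /** Shared counters that may change at any moment (no global lock held). */
  static final class ToyTopology {
    final AtomicInteger leaves = new AtomicInteger();
    final AtomicInteger racks = new AtomicInteger();

    void removeAllNodes() {
      leaves.set(0);
      racks.set(0);
    }
  }

  /** Assumed cap formula; re-reads the live rack count, which may now be 0. */
  static int maxNodesPerRackUnsafe(ToyTopology topo, int totalReplicas) {
    return (totalReplicas - 1) / topo.racks.get() + 2;
  }

  /** Same assumed formula, but snapshots the rack count once and guards it. */
  static int maxNodesPerRackGuarded(ToyTopology topo, int totalReplicas) {
    int racks = topo.racks.get(); // single read, so check and use agree
    if (racks <= 0) {
      return 0; // arbitrary safe default: no racks, nothing to place
    }
    return (totalReplicas - 1) / racks + 2;
  }

  /**
   * Simplified chooseTarget: "replicas" stands in for the total replica
   * count. The emptiness check and the division read the shared counters
   * at different times, which is the window the issue describes.
   */
  static int chooseTarget(ToyTopology topo, int replicas, boolean guarded,
      Runnable betweenCheckAndUse) {
    if (replicas == 0 || topo.leaves.get() == 0) {
      return 0; // early exit, analogous to returning EMPTY_ARRAY
    }
    betweenCheckAndUse.run(); // simulated DataNode loss inside the window
    return guarded
        ? maxNodesPerRackGuarded(topo, replicas)
        : maxNodesPerRackUnsafe(topo, replicas);
  }

  public static void main(String[] args) {
    ToyTopology topo = new ToyTopology();
    topo.leaves.set(3);
    topo.racks.set(2);
    try {
      chooseTarget(topo, 3, false, topo::removeAllNodes);
      System.out.println("no exception");
    } catch (ArithmeticException e) {
      System.out.println("unguarded: " + e.getMessage()); // "unguarded: / by zero"
    }
    topo.leaves.set(3);
    topo.racks.set(2);
    // Prints "guarded: 0": the zero rack count is caught before dividing.
    System.out.println("guarded: " + chooseTarget(topo, 3, true, topo::removeAllNodes));
  }
}
```

In the real NameNode the exception is not caught locally; it reaches the ReplicationMonitor thread, which (per the log above) exits the process. Reading the shared count once into a local and checking that local is what closes the window in this toy model.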
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857931#comment-16857931 ]

He Xiaoqiao commented on HDFS-14527:

Thanks [~elgoiri], [~ayushtkn] for the reviews and commit.
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857922#comment-16857922 ]

Hudson commented on HDFS-14527:

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16694 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16694/])
HDFS-14527. Stop all DataNodes may result in NN terminate. Contributed (inigoiri: rev 944adc61b1830388d520d4052fc7eb6c7ba2790d)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRedundancyMonitor.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyRackFaultTolerant.java
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857914#comment-16857914 ]

Íñigo Goiri commented on HDFS-14527:

Thanks [~hexiaoqiao] for the patch and [~ayushtkn] for the review.
Committed to trunk.
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857910#comment-16857910 ]

Íñigo Goiri commented on HDFS-14527:

Committing to trunk soon.
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857905#comment-16857905 ]

Ayush Saxena commented on HDFS-14527:

No more from my side.
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857890#comment-16857890 ]

Íñigo Goiri commented on HDFS-14527:

+1 on [^HDFS-14527.005.patch].
[~ayushtkn], any further comments?
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857262#comment-16857262 ]

He Xiaoqiao commented on HDFS-14527:

Thanks [~elgoiri]. I checked the failed unit tests and ran {{TestDataNodeHotSwapVolumes}} and {{TestDFSZKFailoverController}} locally; both passed. {{TestWebHdfsTimeouts}} and {{TestBalancer}} hit timeout exceptions, and I do not think they are related to this patch.
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856852#comment-16856852 ]

Íñigo Goiri commented on HDFS-14527:

[^HDFS-14527.005.patch] looks good.
I think the unit tests are unrelated; do you mind making a quick check on them?
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856437#comment-16856437 ] Hadoop QA commented on HDFS-14527:
--
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 24m 27s | trunk passed |
| +1 | compile | 6m 8s | trunk passed |
| +1 | checkstyle | 1m 47s | trunk passed |
| +1 | mvnsite | 3m 13s | trunk passed |
| +1 | shadedclient | 16m 45s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 18s | trunk passed |
| +1 | javadoc | 0m 52s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 1s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| +1 | checkstyle | 0m 35s | the patch passed |
| +1 | mvnsite | 1m 4s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 35s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 15s | the patch passed |
| +1 | javadoc | 0m 49s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 112m 52s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 46s | The patch does not generate ASF License warnings. |
| | | 184m 30s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController |
| | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.balancer.TestBalancer |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14527 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12970906/HDFS-14527.005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a2b211898a13 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cd17cc2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26900/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26900/testReport/ |
| Max. process+thread count | 3855 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| |
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856309#comment-16856309 ] He Xiaoqiao commented on HDFS-14527:
[^HDFS-14527.005.patch] fixes the checkstyle warnings and renames {{chooseTargetFuturn}} to {{chooseTargetFuture}}. I checked the failed unit tests; they seem unrelated to this issue.
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856219#comment-16856219 ] Íñigo Goiri commented on HDFS-14527:
Just one minor comment: it looks like {{chooseTargetFuturn}} should be {{chooseTargetFuture}}, right?
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856181#comment-16856181 ] Hadoop QA commented on HDFS-14527:
--
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 3s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 29m 41s | trunk passed |
| +1 | compile | 1m 9s | trunk passed |
| +1 | checkstyle | 0m 49s | trunk passed |
| +1 | mvnsite | 1m 24s | trunk passed |
| +1 | shadedclient | 15m 37s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 26s | trunk passed |
| +1 | javadoc | 1m 0s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 15s | the patch passed |
| +1 | compile | 1m 13s | the patch passed |
| +1 | javac | 1m 13s | the patch passed |
| -0 | checkstyle | 0m 44s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 40 unchanged - 0 fixed = 42 total (was 40) |
| +1 | mvnsite | 1m 19s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 12s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 28s | the patch passed |
| +1 | javadoc | 0m 55s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 128m 51s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 205m 45s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestReencryption |
| | hadoop.hdfs.TestReconstructStripedFile |
| | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
| | hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier |
| | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
| | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks |
| | hadoop.hdfs.server.namenode.TestListCorruptFileBlocks |

|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14527 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12970870/HDFS-14527.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux cb7207dc3cc6 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 580b639 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle |
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856027#comment-16856027 ] He Xiaoqiao commented on HDFS-14527:
[~elgoiri], [~ayushtkn] Thanks for your quick reviews.
{quote}though the scenario seems quite difficult to happen at production{quote}
Yes, you are right. As mentioned in the description, it is not easy to reproduce, but I did encounter this problem. I just submitted another patch, [^HDFS-14527.004.patch], following all the suggestions above. Thanks again.
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855975#comment-16855975 ] Hadoop QA commented on HDFS-14527:
--
| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 33s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 20s | trunk passed |
| +1 | compile | 1m 20s | trunk passed |
| +1 | checkstyle | 0m 53s | trunk passed |
| +1 | mvnsite | 1m 25s | trunk passed |
| +1 | shadedclient | 15m 17s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 22s | trunk passed |
| +1 | javadoc | 1m 2s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 12s | the patch passed |
| +1 | compile | 1m 7s | the patch passed |
| +1 | javac | 1m 7s | the patch passed |
| +1 | checkstyle | 0m 47s | the patch passed |
| +1 | mvnsite | 1m 18s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 53s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 32s | the patch passed |
| +1 | javadoc | 1m 1s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 107m 45s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 176m 10s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14527 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12970848/HDFS-14527.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 5a7d38f32f30 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 827a847 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26894/testReport/ |
| Max. process+thread count | 2937 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26894/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855923#comment-16855923 ] Íñigo Goiri commented on HDFS-14527:
Just to clarify what [~ayushtkn] is proposing for the exception, instead of:
{code}
try {
  chooseTargetFuturn.get();
} catch (ArithmeticException ae) {
  fail("It meets RuntimeException since there are no DataNodes!");
}
{code}
It should be:
{code}
try {
  chooseTargetFuturn.get();
} catch (ExecutionException ee) {
  throw ee.getCause();
}
{code}
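Why catching {{ArithmeticException}} never fires here: {{Future#get()}} does not rethrow the task's exception directly; it wraps it in {{java.util.concurrent.ExecutionException}} and keeps the original as the cause. The standalone sketch below shows this with a plain {{ExecutorService}} and no HDFS classes; the class and method names ({{FutureExceptionDemo}}, {{whatGetThrows}}, {{getUnwrapped}}) are hypothetical. Note that {{throw ee.getCause()}} rethrows a {{Throwable}}, so the enclosing test method has to declare {{throws Throwable}}.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo: how Future#get() reports an exception thrown by the task. */
public class FutureExceptionDemo {

  /** Submits a task that divides by zero and describes what Future#get() throws. */
  static String whatGetThrows() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      int zero = 0;
      Callable<Integer> task = () -> 1 / zero; // throws ArithmeticException
      Future<Integer> future = pool.submit(task);
      try {
        future.get();
        return "no exception";
      } catch (ArithmeticException ae) {
        // Not reached: get() never rethrows the task's exception as-is.
        return "ArithmeticException";
      } catch (ExecutionException ee) {
        // The original exception is preserved as the cause.
        return "ExecutionException caused by " + ee.getCause().getClass().getSimpleName();
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return "interrupted";
      }
    } finally {
      pool.shutdownNow();
    }
  }

  /** The proposed pattern: unwrap so the real failure surfaces in a test report. */
  static <T> T getUnwrapped(Future<T> future) throws Throwable {
    try {
      return future.get();
    } catch (ExecutionException ee) {
      throw ee.getCause();
    }
  }

  public static void main(String[] args) {
    // prints "ExecutionException caused by ArithmeticException"
    System.out.println(whatGetThrows());
  }
}
```

With {{getUnwrapped}}, a failing task makes the test fail with the original {{ArithmeticException}} and its stack trace, which is more useful to whoever lands on the failure than a generic {{fail(...)}} message.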
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855903#comment-16855903 ] Ayush Saxena commented on HDFS-14527: - Thanx [~hexiaoqiao] for the patch, though the scenario seems quite difficult to happen at production, but still technically it can. * Well if we pull {{numRacks}} calculation above the {{clusterSize}} calculation, same logic shall work? The present approach is correct too. Just if we want to keep the same logic. {code:java} try { chooseTargetFuturn.get(); } catch (ArithmeticException ae) { fail("It meets RuntimeException since there are no DataNodes!"); } {code} * This isn't working, This throws {{java.util.concurrent.ExecutionException}} the cause for which is {{ArithmeticException}} . May be you can let the exception surface in case of failure, rather than having this. Would be better for someone landing up with this failed. * For the Test do you need 6 Dn's? can go away with 2 too, I guess, might save some time. * For the miniDfsCluster, instead having finally to close at end you may use try with resources. Somewhat like this. {code:java} try(MiniDFSCluster miniCluster = new MiniDFSCluster.Builder(conf).racks(racks) .hosts(hosts).numDataNodes(hosts.length).build()){ {code} * Can make the name better little generic as compared to existing {{TestRedundancyMonitorChooseTarget}} may be related to BPP, and put a javadoc on the test explaining the scenario, So, that the test class may be used by someone else too in future. 
> Stop all DataNodes may result in NN terminate
>
> Key: HDFS-14527
> URL: https://issues.apache.org/jira/browse/HDFS-14527
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-14527.001.patch, HDFS-14527.002.patch, HDFS-14527.003.patch
>
> If we stop all DataNodes of a cluster, BlockPlacementPolicyDefault#chooseTarget may get an ArithmeticException when calling #getMaxNodesPerRack. The runtime exception propagates to BlockManager's ReplicationMonitor thread, which then terminates the NN.
> The root cause is that BlockPlacementPolicyDefault#chooseTarget does not hold the global lock, so if all DataNodes die between {{clusterMap.getNumOfLeaves()}} and {{getMaxNodesPerRack}}, it hits an {{ArithmeticException}} while invoking {{getMaxNodesPerRack}}.
> {code:java}
> private DatanodeStorageInfo[] chooseTarget(int numOfReplicas,
>     Node writer,
>     List<DatanodeStorageInfo> chosenStorage,
>     boolean returnChosenNodes,
>     Set<Node> excludedNodes,
>     long blocksize,
>     final BlockStoragePolicy storagePolicy,
>     EnumSet<AddBlockFlag> addBlockFlags,
>     EnumMap<StorageType, Integer> sTypes) {
>   if (numOfReplicas == 0 || clusterMap.getNumOfLeaves() == 0) {
>     return DatanodeStorageInfo.EMPTY_ARRAY;
>   }
>   ..
>   int[] result = getMaxNodesPerRack(chosenStorage.size(), numOfReplicas);
>   ..
> }
> {code}
> Detailed logs are shown below.
> {code:java}
> 2019-05-31 12:29:21,803 ERROR org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> java.lang.ArithmeticException: / by zero
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.getMaxNodesPerRack(BlockPlacementPolicyDefault.java:282)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:228)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:132)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:4533)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$1800(BlockManager.java:4493)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1954)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1830)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4453)
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4388)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-05-31 12:29:21,805 INFO
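The root cause in the quoted description is a check-then-act race: {{chooseTarget}} checks that the topology is non-empty, but the division inside {{getMaxNodesPerRack}} reads the topology again later, without the global lock. A minimal self-contained sketch of that window and of one way to close it (read the rack count once into a local and guard the division with that same value). {{Topology}}, {{unsafeMaxNodesPerRack}}, {{safeMaxNodesPerRack}} and the {{inBetween}} hook are hypothetical, the per-rack formula is only illustrative, and this is not the committed patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RackLimitRace {

  /** Hypothetical lock-free view of the cluster; DataNodes can disappear at any time. */
  static final class Topology {
    final AtomicInteger leaves;
    final AtomicInteger racks;

    Topology(int leaves, int racks) {
      this.leaves = new AtomicInteger(leaves);
      this.racks = new AtomicInteger(racks);
    }

    void removeAllNodes() {
      leaves.set(0);
      racks.set(0);
    }
  }

  /**
   * Check-then-act without a lock: the emptiness check and the division read
   * the topology at different times. The hook simulates "all DataNodes died
   * in between".
   */
  static int unsafeMaxNodesPerRack(Topology t, int totalReplicas, Runnable inBetween) {
    if (t.leaves.get() == 0) {
      return 0;
    }
    inBetween.run();
    return (totalReplicas - 1) / t.racks.get() + 2; // re-reads racks, which may now be 0
  }

  /**
   * Reads the rack count once and guards the division with that same local,
   * so a concurrent change cannot slip between the check and the use.
   */
  static int safeMaxNodesPerRack(Topology t, int totalReplicas, Runnable inBetween) {
    int numOfRacks = t.racks.get(); // single snapshot
    inBetween.run();                // later changes no longer affect the division
    if (numOfRacks <= 1 || totalReplicas <= 1) {
      return totalReplicas;         // no per-rack cap to compute
    }
    return (totalReplicas - 1) / numOfRacks + 2;
  }

  public static void main(String[] args) {
    Topology a = new Topology(6, 2);
    try {
      unsafeMaxNodesPerRack(a, 3, a::removeAllNodes);
    } catch (ArithmeticException e) {
      System.out.println("unsafe: " + e.getMessage()); // unsafe: / by zero
    }
    Topology b = new Topology(6, 2);
    System.out.println("safe: " + safeMaxNodesPerRack(b, 3, b::removeAllNodes)); // safe: 3
  }
}
```

The unsafe variant fails with "/ by zero" once the hook empties the topology, while the safe variant keeps dividing by the non-zero snapshot it already read; if the nodes vanish before the read, the {{<= 1}} guard returns early instead of dividing.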
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855822#comment-16855822 ] He Xiaoqiao commented on HDFS-14527:

Thanks [~elgoiri] for your detailed reviews. Uploaded [^HDFS-14527.003.patch] to address your review comments. Pending Jenkins. Further reviews are welcome. Thanks again.

> Stop all DataNodes may result in NN terminate
>
> Key: HDFS-14527
> URL: https://issues.apache.org/jira/browse/HDFS-14527
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-14527.001.patch, HDFS-14527.002.patch, HDFS-14527.003.patch

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854844#comment-16854844 ] Íñigo Goiri commented on HDFS-14527:

Thanks [~hexiaoqiao] for [^HDFS-14527.002.patch]:
* Can we get rid of the deprecated Whitebox usage? It may need some refactoring.
* Can we fix the checkstyle issues?
* Can we use lambdas for the submit operations?
* Sleeping 500 ms is kind of arbitrary; is there any condition we can wait for?
* Not a big fan of catching an exception just to call fail(). What is the advantage over letting the full exception go through and fail the test?
* Do a direct import of GenericTestUtils.DelayAnswer.
* Add an overview of what the test is doing.
* Make the capitalization of the comments consistent.

> Stop all DataNodes may result in NN terminate
>
> Key: HDFS-14527
> URL: https://issues.apache.org/jira/browse/HDFS-14527
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-14527.001.patch, HDFS-14527.002.patch
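On the review point about the arbitrary 500 ms sleep: a latch lets the test wait for the exact moment the background task reaches the interesting point, change the shared state, and only then let the task continue. A minimal plain-Java sketch using {{java.util.concurrent.CountDownLatch}} and a lambda submit; {{LatchInsteadOfSleep}}, {{runWithPause}} and the {{AtomicInteger}} standing in for the rack count are hypothetical and not taken from the patch. Inside Hadoop's own test suite, {{GenericTestUtils.DelayAnswer}} (mentioned in the review) serves a similar purpose for methods stubbed with Mockito.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchInsteadOfSleep {

  /**
   * Starts a task that pauses at a known point, changes shared state while the
   * task is paused, then lets it resume. Returns the value the task read after
   * resuming.
   */
  static int runWithPause(AtomicInteger racks) {
    CountDownLatch reachedPause = new CountDownLatch(1);
    CountDownLatch resume = new CountDownLatch(1);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      // Lambda submit: the block body returns a value, so this is a Callable<Integer>.
      Future<Integer> result = pool.submit(() -> {
        reachedPause.countDown(); // signal: "between the check and the use"
        resume.await();           // block until the test has made its change
        return racks.get();       // observe the state the test set up
      });
      // Wait for an explicit condition instead of sleeping a fixed 500 ms.
      if (!reachedPause.await(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("task never reached the pause point");
      }
      racks.set(0);               // e.g. "all DataNodes are gone"
      resume.countDown();
      return result.get(10, TimeUnit.SECONDS);
    } catch (InterruptedException | ExecutionException | TimeoutException e) {
      throw new IllegalStateException("paused task did not complete", e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(runWithPause(new AtomicInteger(2))); // 0
  }
}
```

Because the change happens strictly after the task signals and strictly before it resumes, the outcome no longer depends on how long the machine takes to schedule the background thread.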
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854807#comment-16854807 ] Hadoop QA commented on HDFS-14527:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 7s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs generated 2 new + 475 unchanged - 0 fixed = 477 total (was 475) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 46s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 40 unchanged - 0 fixed = 43 total (was 40) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 22s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m 23s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.datanode.TestDataNodeLifeline |
| | hadoop.hdfs.TestEncryptionZonesWithKMS |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14527 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12970688/HDFS-14527.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1e1d58c81c9c 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 59719dc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/26888/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs.txt |
| checkstyle |
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854629#comment-16854629 ] He Xiaoqiao commented on HDFS-14527:

[~elgoiri], [^HDFS-14527.002.patch] adds a unit test and fixes this issue. Please take a look.

> Stop all DataNodes may result in NN terminate
>
> Key: HDFS-14527
> URL: https://issues.apache.org/jira/browse/HDFS-14527
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-14527.001.patch, HDFS-14527.002.patch
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853190#comment-16853190 ] Íñigo Goiri commented on HDFS-14527:

Can we add a unit test reproducing the issue?

> Stop all DataNodes may result in NN terminate
>
> Key: HDFS-14527
> URL: https://issues.apache.org/jira/browse/HDFS-14527
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-14527.001.patch
[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate
[ https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852911#comment-16852911 ] Hadoop QA commented on HDFS-14527:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 50s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 56s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 32s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14527 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12970452/HDFS-14527.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8f2831b763a9 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 52128e3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26877/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26877/testReport/ |
| Max. process+thread count | 4872 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output |