[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895596#comment-16895596 ] Wei-Chiu Chuang commented on HDFS-14429:
----------------------------------------

+1 [~caiyicong] it looks like you've done a lot of study of decommission and maintenance mode. I'm glad to have you contribute the fix. Committing it now.

> Block remain in COMMITTED but not COMPLETE cause by Decommission
>
>                 Key: HDFS-14429
>                 URL: https://issues.apache.org/jira/browse/HDFS-14429
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.9.2
>            Reporter: Yicong Cai
>            Assignee: Yicong Cai
>            Priority: Major
>         Attachments: HDFS-14429.01.patch, HDFS-14429.02.patch, HDFS-14429.03.patch, HDFS-14429.branch-2.01.patch, HDFS-14429.branch-2.02.patch
>
> In the following scenario, the block will remain in the COMMITTED but not COMPLETE state and cannot be closed properly:
> # The client writes block bk1 to three DataNodes (dn1/dn2/dn3).
> # bk1 is completely written to all three DataNodes; each DataNode finalizes the block successfully, queues it in the incremental block report (IBR), and waits to report to the NameNode.
> # The client commits bk1 after receiving the ACK.
> # Before the DataNodes have sent the IBR, all three nodes dn1/dn2/dn3 enter Decommissioning.
> # The DataNodes send the IBR, but the block cannot be completed normally.
> Then it leads to the following related exceptions:
> {panel:title=Exception}
> 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= minimum = 1) in file xxx
> 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - IPC Server handler 499 on 8020, call Call#122552 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet: xxx
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
> {panel}
> This will also cause the scenario described in HDFS-12747.
> The root cause is that addStoredBlock does not consider the case where the replicas are in Decommission.
> This problem needs to be fixed in the same way as HDFS-11499.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
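The counting issue described above can be summarized in a minimal, self-contained sketch. The class and method names below are simplified stand-ins for illustration, not the real org.apache.hadoop.hdfs.server.blockmanagement API:

```java
// Illustrative model of the replica-counting fix; these names are
// hypothetical stand-ins, not actual HDFS classes.
public class ReplicaCount {
    public final int live;            // replicas on in-service DataNodes
    public final int decommissioning; // replicas on decommissioning DataNodes

    public ReplicaCount(int live, int decommissioning) {
        this.live = live;
        this.decommissioning = decommissioning;
    }

    // Pre-fix behavior: only live replicas satisfy minimum replication, so a
    // block whose replicas are all on decommissioning nodes (the scenario in
    // this issue) can never move from COMMITTED to COMPLETE.
    public static boolean canCompleteBefore(ReplicaCount num, int minReplication) {
        return num.live >= minReplication;
    }

    // Post-fix behavior (in the spirit of HDFS-11499): decommissioning
    // replicas still hold valid, finalized data, so they count as usable.
    public static boolean canCompleteAfter(ReplicaCount num, int minReplication) {
        return num.live + num.decommissioning >= minReplication;
    }
}
```

With minimum replication 1 and all three replicas on decommissioning nodes, the pre-fix check never succeeds while the post-fix check does, matching the "COMMITTED but not COMPLETE" symptom in the log above.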
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875357#comment-16875357 ] Yicong Cai commented on HDFS-14429:
---

[~jojochuang] Before this fix, a block whose replicas were all decommissioning could never be completed, so the redundancy check was never performed. After the fix, the redundancy check runs and updateNeededReconstructions is invoked. Replicas in maintenance count as effective, but decommissioning replicas do not, so a neededReconstruction.update can drive curReplicas negative.

{code:java}
// handle low redundancy/extra redundancy
short fileRedundancy = getExpectedRedundancyNum(storedBlock);
if (!isNeededReconstruction(storedBlock, num, pendingNum)) {
  neededReconstruction.remove(storedBlock, numCurrentReplica,
      num.readOnlyReplicas(), num.outOfServiceReplicas(), fileRedundancy);
} else {
  // Perform update
  updateNeededReconstructions(storedBlock, curReplicaDelta, 0);
}
{code}

{code:java}
if (!hasEnoughEffectiveReplicas(block, repl, pendingNum)) {
  neededReconstruction.update(block, repl.liveReplicas() + pendingNum,
      repl.readOnlyReplicas(), repl.outOfServiceReplicas(),
      curExpectedReplicas, curReplicasDelta, expectedReplicasDelta);
}
{code}

{code:java}
synchronized void update(BlockInfo block, int curReplicas,
    int readOnlyReplicas, int outOfServiceReplicas,
    int curExpectedReplicas,
    int curReplicasDelta, int expectedReplicasDelta) {
  // oldReplicas can go negative here
  int oldReplicas = curReplicas - curReplicasDelta;
  int oldExpectedReplicas = curExpectedReplicas - expectedReplicasDelta;
  int curPri = getPriority(block, curReplicas, readOnlyReplicas,
      outOfServiceReplicas, curExpectedReplicas);
  int oldPri = getPriority(block, oldReplicas, readOnlyReplicas,
      outOfServiceReplicas, oldExpectedReplicas);
  if (NameNode.stateChangeLog.isDebugEnabled()) {
    NameNode.stateChangeLog.debug("LowRedundancyBlocks.update " + block +
        " curReplicas " + curReplicas +
        " curExpectedReplicas " + curExpectedReplicas +
        " oldReplicas " + oldReplicas +
        " oldExpectedReplicas " + oldExpectedReplicas +
        " curPri " + curPri + " oldPri " + oldPri);
  }
  // oldPri is mostly correct, but not always. If not found with oldPri,
  // other levels will be searched until the block is found & removed.
  remove(block, oldPri, oldExpectedReplicas);
  if (add(block, curPri, curExpectedReplicas)) {
    NameNode.blockStateChangeLog.debug(
        "BLOCK* NameSystem.LowRedundancyBlock.update: {} has only {} " +
        "replicas and needs {} replicas so is added to " +
        "neededReconstructions at priority level {}", block, curReplicas,
        curExpectedReplicas, curPri);
  }
}
{code}
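The negative count described above can be reproduced in isolation. The sketch below mirrors only the single line `int oldReplicas = curReplicas - curReplicasDelta;` from LowRedundancyBlocks.update, with made-up values; the wrapper class is hypothetical:

```java
// Minimal sketch of the delta arithmetic in LowRedundancyBlocks.update().
// If curReplicas counts only effective (live/maintenance) replicas while
// the delta reflects a replica just reported on a decommissioning node,
// the reconstructed old count goes negative.
public class DeltaSketch {
    // Mirrors: int oldReplicas = curReplicas - curReplicasDelta;
    public static int oldReplicas(int curReplicas, int curReplicasDelta) {
        return curReplicas - curReplicasDelta;
    }
}
```

For example, with all replicas on decommissioning nodes the effective curReplicas is 0, yet the just-added replica contributed a delta of +1, so the computed old count is -1.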
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875132#comment-16875132 ] Wei-Chiu Chuang commented on HDFS-14429:
---

[~hexiaoqiao] thanks for the reviews. The test and fix LGTM, with one question I'm still looking for an answer to. I'm a little confused about the exact scenario this fixes -- decommission, maintenance mode, or both? One of the lines counts decommissioned+decommissioning nodes, but the other line also counts in-maintenance nodes.
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873435#comment-16873435 ] He Xiaoqiao commented on HDFS-14429:
---

Thanks [~caiyicong], I tried to verify locally and it runs as expected. I believe TestDecommission#testAllocAndIBRWhileDecommission covers this corner case sufficiently. The failed unit test does not seem related to this patch. +1 for [^HDFS-14429.03.patch]. [~jojochuang] would you mind taking another review?
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871159#comment-16871159 ] Hadoop QA commented on HDFS-14429:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 39s | trunk passed |
| +1 | compile | 0m 55s | trunk passed |
| +1 | checkstyle | 0m 36s | trunk passed |
| +1 | mvnsite | 1m 3s | trunk passed |
| +1 | shadedclient | 12m 3s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 2m 6s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warning. |
| +1 | javadoc | 0m 48s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 59s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 35s | the patch passed |
| +1 | mvnsite | 0m 58s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 6s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 13s | the patch passed |
| +1 | javadoc | 0m 43s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 81m 13s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 134m 24s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.diskbalancer.TestDiskBalancer |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14429 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972706/HDFS-14429.03.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ac272ea014b6 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/27053/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27053/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results |
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871079#comment-16871079 ] Hadoop QA commented on HDFS-14429:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-2 Compile Tests ||
| +1 | mvninstall | 12m 52s | branch-2 passed |
| +1 | compile | 0m 55s | branch-2 passed with JDK v1.7.0_95 |
| +1 | compile | 0m 46s | branch-2 passed with JDK v1.8.0_212 |
| +1 | checkstyle | 0m 32s | branch-2 passed |
| +1 | mvnsite | 0m 58s | branch-2 passed |
| +1 | findbugs | 2m 4s | branch-2 passed |
| +1 | javadoc | 1m 12s | branch-2 passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 42s | branch-2 passed with JDK v1.8.0_212 |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 51s | the patch passed |
| +1 | compile | 0m 47s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 47s | the patch passed |
| +1 | compile | 0m 49s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 0m 49s | the patch passed |
| +1 | checkstyle | 0m 28s | the patch passed |
| +1 | mvnsite | 0m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 8s | the patch passed |
| +1 | javadoc | 1m 5s | the patch passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 42s | the patch passed with JDK v1.8.0_212 |
|| || || || Other Tests ||
| -1 | unit | 62m 10s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 93m 7s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation |
| | hadoop.hdfs.TestSecureEncryptionZoneWithKMS |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:da67579 |
| JIRA Issue | HDFS-14429 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972704/HDFS-14429.branch-2.02.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a1b02391690a 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871032#comment-16871032 ] Yicong Cai commented on HDFS-14429:
---

Thanks [~hexiaoqiao] for reviewing my patch. I have addressed the three issues you mentioned (a/b/c).
trunk: [^HDFS-14429.03.patch]
branch-2: [^HDFS-14429.branch-2.02.patch]

d. Do we need to add {{pendingNum}} when calculating numUsableReplicas?
No need to add pendingNum: a block can only enter COMPLETE once it is FINALIZED and reaches the minimum replication, and a pending block is not a FINALIZED block.
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870846#comment-16870846 ] He Xiaoqiao commented on HDFS-14429:
---

Thanks [~caiyicong] for your report and patch; this is a good catch. I verified and tested this patch locally, and it works well. Just minor comments on [^HDFS-14429.02.patch]:
a. Please fix the checkstyle issues; refer to https://builds.apache.org/job/PreCommit-HDFS-Build/27046/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
b. Some annotation labels on TestDecommission#testAllocAndIBRWhileDecommission may be unnecessary.
c. The InterruptedException may also be unnecessary in testAllocAndIBRWhileDecommission, since no logic there throws this exception.
d. Do we need to add {{pendingNum}} when calculating numUsableReplicas?
{code:java}
int numUsableReplicas = num.liveReplicas() +
    num.decommissioning() + num.liveEnteringMaintenanceReplicas();
{code}
Thanks again.
> This problem needs to be fixed like HDFS-11499. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870607#comment-16870607 ] Hadoop QA commented on HDFS-14429: --
-1 overall.
0 reexec (21m 16s): Docker mode activated.
Prechecks: +1 @author (no @author tags); +1 test4tests (the patch appears to include 1 new or modified test file).
branch-2 compile tests: +1 mvninstall (16m 32s), compile with JDK v1.7.0_95 (0m 50s) and JDK v1.8.0_212 (0m 46s), checkstyle (0m 31s), mvnsite (0m 57s), findbugs (1m 52s), javadoc with JDK v1.7.0_95 (1m 7s) and JDK v1.8.0_212 (0m 43s).
Patch compile tests: +1 mvninstall (0m 49s), compile/javac with JDK v1.7.0_95 (0m 46s) and JDK v1.8.0_212 (0m 43s), mvnsite (0m 50s), whitespace, findbugs (1m 51s), javadoc with JDK v1.7.0_95 (1m 1s) and JDK v1.8.0_212 (0m 39s); -0 checkstyle (0m 26s): hadoop-hdfs-project/hadoop-hdfs: the patch generated 20 new + 159 unchanged - 0 fixed = 179 total (was 159).
Other tests: -1 unit (71m 16s): hadoop-hdfs in the patch failed; +1 asflicense (0m 29s): the patch does not generate ASF License warnings. Total runtime: 125m 42s.
Failed junit tests: hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys, hadoop.hdfs.server.namenode.ha.TestEditLogTailer, hadoop.hdfs.server.namenode.TestNamenodeCapacityReport, hadoop.hdfs.server.balancer.TestBalancerRPCDelay, hadoop.hdfs.server.datanode.TestDirectoryScanner.
Report/Notes:
Docker: Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:da67579
JIRA Issue: HDFS-14429
JIRA Patch URL: https://issues.apache.org/jira/secure/attachment/12972620/HDFS-14429.branch-2.01.patch
Optional Tests: dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname: Linux 2b1a32e77838 3.13.0-153-generic #203-Ubuntu SMP
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870593#comment-16870593 ] Hadoop QA commented on HDFS-14429: --
-1 overall.
0 reexec (0m 39s): Docker mode activated.
Prechecks: +1 @author (no @author tags); +1 test4tests (the patch appears to include 2 new or modified test files).
trunk compile tests: +1 mvninstall (19m 17s), compile (1m 0s), checkstyle (0m 44s), mvnsite (1m 6s), shadedclient (13m 33s), javadoc (0m 48s); -1 findbugs (1m 58s): hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warning.
Patch compile tests: +1 mvninstall (0m 59s), compile/javac (0m 56s), mvnsite (1m 1s), whitespace, shadedclient (12m 44s), findbugs (2m 5s), javadoc (0m 47s); -0 checkstyle (0m 38s): hadoop-hdfs-project/hadoop-hdfs: the patch generated 18 new + 135 unchanged - 0 fixed = 153 total (was 135).
Other tests: -1 unit (106m 57s): hadoop-hdfs in the patch failed; +1 asflicense (0m 33s): the patch does not generate ASF License warnings. Total runtime: 165m 42s.
Failed junit tests: hadoop.hdfs.server.diskbalancer.TestDiskBalancer, hadoop.hdfs.server.datanode.TestDirectoryScanner.
Report/Notes:
Docker: Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e
JIRA Issue: HDFS-14429
JIRA Patch URL: https://issues.apache.org/jira/secure/attachment/12972616/HDFS-14429.02.patch
Optional Tests: dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname: Linux 2d54115f24c2 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool: maven
Personality: /testptch/patchprocess/precommit/personality/provided.sh
git revision: trunk / b28ddb2
maven: version: Apache Maven 3.3.9
Default Java: 1.8.0_212
findbugs: v3.1.0-RC1
findbugs: https://builds.apache.org/job/PreCommit-HDFS-Build/27045/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
checkstyle: https://builds.apache.org/job/PreCommit-HDFS-Build/27045/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
unit |
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870579#comment-16870579 ] Yicong Cai commented on HDFS-14429: --- Provided the branch-2 [^HDFS-14429.branch-2.01.patch] and trunk [^HDFS-14429.02.patch] patches. [~jojochuang]
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866891#comment-16866891 ] Yicong Cai commented on HDFS-14429: --- Okay, I'll provide relevant test cases.
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866307#comment-16866307 ] Wei-Chiu Chuang commented on HDFS-14429: This is a nasty data race. [~caiyicong] would it be possible to supply a test case? Something similar to the test I added in HDFS-10240.
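For intuition about the race while a real MiniDFSCluster test is pending, the ordering hazard can be modeled outside Hadoop. The sketch below is a toy under a simplified completion rule; canComplete, NodeState, and the countDecommissioning switch are invented for illustration and are not HDFS APIs:

```java
import java.util.List;

// Toy model (invented names; not Hadoop code). It shows why an IBR that
// arrives only after every replica holder has entered decommissioning
// leaves the block stuck in COMMITTED when just live replicas are counted
// as usable.
public class DecommissionRaceModel {
    public enum NodeState { IN_SERVICE, DECOMMISSIONING }

    // Simplified rule: a committed block may transition to COMPLETE once
    // the number of usable reported replicas reaches minReplication.
    public static boolean canComplete(List<NodeState> reportedHolders,
                                      int minReplication,
                                      boolean countDecommissioning) {
        int usable = 0;
        for (NodeState s : reportedHolders) {
            if (s == NodeState.IN_SERVICE
                || (countDecommissioning && s == NodeState.DECOMMISSIONING)) {
                usable++;
            }
        }
        return usable >= minReplication;
    }

    public static void main(String[] args) {
        // Steps 4-5 of the reported scenario: all three DNs start
        // decommissioning before their IBRs for bk1 reach the NameNode.
        List<NodeState> holders = List.of(NodeState.DECOMMISSIONING,
                NodeState.DECOMMISSIONING, NodeState.DECOMMISSIONING);
        System.out.println(canComplete(holders, 1, false)); // false: stuck COMMITTED
        System.out.println(canComplete(holders, 1, true));  // true: can COMPLETE
    }
}
```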
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864798#comment-16864798 ] Wei-Chiu Chuang commented on HDFS-14429: [~daryn] mind taking a look? It seems [~caiyicong] fixed HDFS-12747, which you reported a while ago.
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834616#comment-16834616 ] Hadoop QA commented on HDFS-14429: --
-1 overall.
0 reexec (7m 47s): Docker mode activated.
Prechecks: +1 @author (no @author tags); -1 test4tests: the patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch, and list the manual steps performed to verify it.
trunk compile tests: +1 mvninstall (18m 38s), compile (1m 0s), checkstyle (0m 44s), mvnsite (1m 3s), shadedclient (13m 22s), findbugs (2m 24s), javadoc (0m 50s).
Patch compile tests: +1 mvninstall (1m 7s), compile/javac (1m 1s), mvnsite (1m 5s), whitespace, shadedclient (12m 29s), findbugs (2m 12s), javadoc (0m 46s); -0 checkstyle (0m 39s): hadoop-hdfs-project/hadoop-hdfs: the patch generated 1 new + 111 unchanged - 0 fixed = 112 total (was 111).
Other tests: -1 unit (79m 1s): hadoop-hdfs in the patch failed; +1 asflicense (0m 31s): the patch does not generate ASF License warnings. Total runtime: 144m 31s.
Failed junit tests: hadoop.hdfs.web.TestWebHdfsTimeouts, hadoop.hdfs.server.namenode.TestDecommissioningStatus.
Report/Notes:
Docker: Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e
JIRA Issue: HDFS-14429
JIRA Patch URL: https://issues.apache.org/jira/secure/attachment/12968029/HDFS-14429.01.patch
Optional Tests: dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname: Linux 37cd1a8e5eba 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool: maven
Personality: /testptch/patchprocess/precommit/personality/provided.sh
git revision: trunk / 49e1292
maven: version: Apache Maven 3.3.9
Default Java: 1.8.0_191
findbugs: v3.1.0-RC1
checkstyle: https://builds.apache.org/job/PreCommit-HDFS-Build/26763/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
unit |