[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378010#comment-16378010 ] Chao Sun commented on HDFS-13145: - Thanks [~xkrogen] for the review and [~shv] for committing the patch. :) > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Affects Versions: 3.0.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Fix For: 3.0.2 > > Attachments: HDFS-13145.000.patch, HDFS-13145.001.patch > > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490) > 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > {code} > This is because without the dummy batch, the {{commitTxnId}} will lag behind > the {{endTxId}}, which caused the check in {{openForWrite}} to fail: > {code} > List streams = new ArrayList(); > journalSet.selectInputStreams(streams, segmentTxId, true, false); > if (!streams.isEmpty()) { > String error = String.format("Cannot start writing at txid %s " + > "when there is a stream available for read: %s", > segmentTxId, streams.get(0)); > IOUtils.cleanupWithLogger(LOG, > streams.toArray(new EditLogInputStream[0])); > throw new IllegalStateException(error); > } > {code} > In our environment, this can be reproduced pretty consistently, which will > leave the cluster with no running namenodes. Even though we are using a 2.8.2 > backport, I believe the same issue also
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377846#comment-16377846 ] Hudson commented on HDFS-13145: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13722 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13722/]) HDFS-13145. SBN crash when transition to ANN with in-progress edit (shv: rev ae290a4bb4e514e2fe9b40d28426a7589afe2a3f) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Affects Versions: 3.0.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Fix For: 3.0.2 > > Attachments: HDFS-13145.000.patch, HDFS-13145.001.patch > > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490) > 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > {code} > This is because without the dummy batch, the {{commitTxnId}} will lag behind > the {{endTxId}}, which caused the check in {{openForWrite}} to fail: > {code} > List streams = new ArrayList(); > journalSet.selectInputStreams(streams, segmentTxId, true, false); > if (!streams.isEmpty()) { > String error = String.format("Cannot start writing at txid %s " + > "when there is a stream
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377785#comment-16377785 ] Konstantin Shvachko commented on HDFS-13145: +1 Will commit in a bit. > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Affects Versions: 3.0.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13145.000.patch, HDFS-13145.001.patch > > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490) > 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > {code} > This is because without the dummy batch, the {{commitTxnId}} will lag behind > the {{endTxId}}, which caused the check in {{openForWrite}} to fail: > {code} > List streams = new ArrayList(); > journalSet.selectInputStreams(streams, segmentTxId, true, false); > if (!streams.isEmpty()) { > String error = String.format("Cannot start writing at txid %s " + > "when there is a stream available for read: %s", > segmentTxId, streams.get(0)); > IOUtils.cleanupWithLogger(LOG, > streams.toArray(new EditLogInputStream[0])); > throw new IllegalStateException(error); > } > {code} > In our environment, this can be reproduced pretty consistently, which will > leave the cluster with no running namenodes. Even though we are using a 2.8.2 > backport, I believe the same issue also exist in 3.0.x. -- This message was sent by Atlassian
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375442#comment-16375442 ] genericqa commented on HDFS-13145: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}127m 4s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}178m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | | | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13145 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911876/HDFS-13145.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux cc8ff5e5c254 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1e84e46 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23187/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23187/testReport/ | | Max. process+thread count | 2838 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375351#comment-16375351 ] Chao Sun commented on HDFS-13145: - Thank you for the review [~xkrogen]! I've attached patch v1 addressing the comments. Also I used {{verifyEdits()}} to check the selected input streams, which seems a better choice. > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13145.000.patch, HDFS-13145.001.patch > > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490) > 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > {code} > This is because without the dummy batch, the {{commitTxnId}} will lag behind > the {{endTxId}}, which caused the check in {{openForWrite}} to fail: > {code} > List streams = new ArrayList(); > journalSet.selectInputStreams(streams, segmentTxId, true, false); > if (!streams.isEmpty()) { > String error = String.format("Cannot start writing at txid %s " + > "when there is a stream available for read: %s", > segmentTxId, streams.get(0)); > IOUtils.cleanupWithLogger(LOG, > streams.toArray(new EditLogInputStream[0])); > throw new IllegalStateException(error); > } > {code} > In our environment, this can be reproduced pretty consistently, which will > leave the cluster with no running namenodes. Even though we are using a
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375221#comment-16375221 ] genericqa commented on HDFS-13145: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 51s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}186m 20s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestBlocksScheduledCounter | | | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13145 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911816/HDFS-13145.000.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 212343408dbf 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 68ce193 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23178/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23178/testReport/ | | Max. process+thread count | 2980 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375217#comment-16375217 ] genericqa commented on HDFS-13145: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 42s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}124m 40s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}172m 11s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13145 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911818/HDFS-13145.000.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f59f4dca3303 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 68ce193 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23179/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23179/testReport/ | | Max. process+thread count | 3619 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23179/console | | Powered by | Apache Yetus
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375172#comment-16375172 ] Erik Krogen commented on HDFS-13145: v0 LGTM. Simple fix. Verified that the test fails without your change. Considering that HDFS-10519 is in 3.0, I'm thinking we should target this for branch-3.0 and up? > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13145.000.patch > > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490) > 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > {code} > This is because without the dummy batch, the {{commitTxnId}} will lag behind > the {{endTxId}}, which caused the check in {{openForWrite}} to fail: > {code} > List streams = new ArrayList(); > journalSet.selectInputStreams(streams, segmentTxId, true, false); > if (!streams.isEmpty()) { > String error = String.format("Cannot start writing at txid %s " + > "when there is a stream available for read: %s", > segmentTxId, streams.get(0)); > IOUtils.cleanupWithLogger(LOG, > streams.toArray(new EditLogInputStream[0])); > throw new IllegalStateException(error); > } > {code} > In our environment, this can be reproduced pretty consistently, which will > leave the cluster with no running namenodes. Even though we are using a 2.8.2 > backport, I believe
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372103#comment-16372103 ] Erik Krogen commented on HDFS-13145: Agreed, SGTM. > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490) > 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > {code} > This is because without the dummy batch, the {{commitTxnId}} will lag behind > the {{endTxId}}, which caused the check in {{openForWrite}} to fail: > {code} > List streams = new ArrayList(); > journalSet.selectInputStreams(streams, segmentTxId, true, false); > if (!streams.isEmpty()) { > String error = String.format("Cannot start writing at txid %s " + > "when there is a stream available for read: %s", > segmentTxId, streams.get(0)); > IOUtils.cleanupWithLogger(LOG, > streams.toArray(new EditLogInputStream[0])); > throw new IllegalStateException(error); > } > {code} > In our environment, this can be reproduced pretty consistently, which will > leave the cluster with no running namenodes. Even though we are using a 2.8.2 > backport, I believe the same issue also exist in 3.0.x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372086#comment-16372086 ] Chao Sun commented on HDFS-13145: - Thanks [~xkrogen]! Very good points. I like the change on the if-statement which is even simpler than the solution I'm proposing. It seems we still need to resolve this JIRA rather than waiting for [HDFS-13150|https://issues.apache.org/jira/browse/HDFS-13150]? If that's so, I'll submit a patch on this. > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490) > 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > {code} > This is because without the dummy batch, the {{commitTxnId}} will lag behind > the {{endTxId}}, which caused the check in {{openForWrite}} to fail: > {code} > List streams = new ArrayList(); > journalSet.selectInputStreams(streams, segmentTxId, true, false); > if (!streams.isEmpty()) { > String error = String.format("Cannot start writing at txid %s " + > "when there is a stream available for read: %s", > segmentTxId, streams.get(0)); > IOUtils.cleanupWithLogger(LOG, > streams.toArray(new EditLogInputStream[0])); > throw new IllegalStateException(error); > } > {code} > In our environment, this can be reproduced pretty consistently, which will > leave the
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371954#comment-16371954 ] Erik Krogen commented on HDFS-13145: I agree that the issue is still possible even without the dummy sync. IMO the underlying issue is that the {{committedTxnId}} on JNs was not meant to be used for consistency, but rather as a sanity check: {code} /** * Lower-bound on the last committed transaction ID. This is not * depended upon for correctness, but acts as a sanity check * during the recovery procedures, and as a visibility mark * for clients reading in-progress logs. */ private BestEffortLongFile committedTxnId; {code} So it is not surprising that trying to use it for correctness causes issues. The design I am proposing for fast path (just finishing internal review now) no longer uses {{committedTxnId}} which is another benefit. I agree that we should not read in-progress edit logs while catching up for failover. However, actually, I think the problem is that the if-statement, instead of {code} if (onlyDurableTxns && inProgressOk) { {code} should be {code} if (onlyDurableTxns && inProgressOk && remoteLog.isInProgress()) { {code} In the case you described above, the remote log you are currently reading from is not actually in progress. It has been finalized by {code} // May need to recover editLog.recoverUnclosedStreams(); {code} In a finalized segment, we are sure that all transactions are committed. Thus we only need to do any modifications as a result of {{onlyDurableTxns}} _if_ the edit log in question is in-progress. > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371921#comment-16371921 ] Chao Sun commented on HDFS-13145: - Did more research on this issue. I think this issue happens in the following order: # Active NN exits right before flushing the dummy batch. Also, because of the abrupt exit (in our case, it exited with SIGNAL 15: SIGTERM, even when we call {{hadoop-daemon.sh stop namenode}}), it will not stop the active services, and most importantly, will not finalize the current log segment. As result, the {{committedTxnId}} on the remote journal will be less than the {{endTxId}}. # When the SBN take over, it will [catch up to the latest edits from old active|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L1214]. When selecting input stream, it will set the {{endTxId}} to be {{committedTxnId}} if the latter is smaller: {code:java} // If it's bounded by durable Txns, endTxId could not be larger // than committedTxnId. This ensures the consistency. if (onlyDurableTxns && inProgressOk) { endTxId = Math.min(endTxId, committedTxnId); if (endTxId < remoteLog.getStartTxId()) { LOG.warn("Found endTxId (" + endTxId + ") that is less than " + "the startTxId (" + remoteLog.getStartTxId() + ") - setting it to startTxId."); endTxId = remoteLog.getStartTxId(); } } {code} # After catching up all edits, the SBN [set the nextTnId to be endTxId + 1|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L1238]{{}} and call {{FSEditLog::openForWrite}}, which will select all input streams from remote journals using the {{nextTnId}} as the starting transaction ID. The result will be non-empty because of the issue in 1). # Exception is thrown because of 3). [~xkrogen]: I could be wrong but I think the step 1) could still happen even if we get rid of the dummy patch, unless we constantly making sure that the {{committedTxnId}} is up-to-date with the {{endTxId}}. One potential fix would be to *not* tail in-progress log when the SBN catches up the latest edits in failover. This seems to me benign.The change is simple and I've tested in our dev environment. > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) >
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363287#comment-16363287 ] Chao Sun commented on HDFS-13145: - Sounds great [~xkrogen]! Looking forward to the design. :) > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490) > 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > {code} > This is because without the dummy batch, the {{commitTxnId}} will lag behind > the {{endTxId}}, which caused the check in {{openForWrite}} to fail: > {code} > List streams = new ArrayList(); > journalSet.selectInputStreams(streams, segmentTxId, true, false); > if (!streams.isEmpty()) { > String error = String.format("Cannot start writing at txid %s " + > "when there is a stream available for read: %s", > segmentTxId, streams.get(0)); > IOUtils.cleanupWithLogger(LOG, > streams.toArray(new EditLogInputStream[0])); > throw new IllegalStateException(error); > } > {code} > In our environment, this can be reproduced pretty consistently, which will > leave the cluster with no running namenodes. Even though we are using a 2.8.2 > backport, I believe the same issue also exist in 3.0.x. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-13145) SBN crash when transition to ANN with in-progress edit tailing enabled
[ https://issues.apache.org/jira/browse/HDFS-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363272#comment-16363272 ] Erik Krogen commented on HDFS-13145: I noticed this as well and don't really think that the approach of sending a second "dummy" batch to update commitTxnId is the right one. It also gives additional work to the ANN which I would like to avoid. I am about to post a design for a "fast path" for an Observer to read from JNs which can reduce the lag time for a txn to appear on Observer after going through on the ANN down to a few ms. I have an idea for incorporating some logic to remove the necessity for this dummy batch. Details coming soon... > SBN crash when transition to ANN with in-progress edit tailing enabled > -- > > Key: HDFS-13145 > URL: https://issues.apache.org/jira/browse/HDFS-13145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > With edit log in-progress edit log tailing enabled, {{QuorumOutputStream}} > will send two batches to JNs, one normal edit batch followed by a dummy batch > to update the commit ID on JNs. > {code} > QuorumCallqcall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > numReadyTxns, data); > loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits"); > > // Since we successfully wrote this batch, let the loggers know. Any > future > // RPCs will thus let the loggers know of the most recent transaction, > even > // if a logger has fallen behind. > loggers.setCommittedTxId(firstTxToFlush + numReadyTxns - 1); > // If we don't have this dummy send, committed TxId might be one-batch > // stale on the Journal Nodes > if (updateCommittedTxId) { > QuorumCall fakeCall = loggers.sendEdits( > segmentTxId, firstTxToFlush, > 0, new byte[0]); > loggers.waitForWriteQuorum(fakeCall, writeTimeoutMs, "sendEdits"); > } > {code} > Between each batch, it will wait for the JNs to reach a quorum. However, if > the ANN crashes in between, then SBN will crash while transiting to ANN: > {code} > java.lang.IllegalStateException: Cannot start writing at txid 24312595802 > when there is a stream available for read: .. > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:329) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1196) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1839) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1707) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1622) > at > org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) > at > org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:851) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:794) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2490) > 2018-02-13 00:43:20,728 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > {code} > This is because without the dummy batch, the {{commitTxnId}} will lag behind > the {{endTxId}}, which caused the check in {{openForWrite}} to fail: > {code} > List streams = new ArrayList(); > journalSet.selectInputStreams(streams, segmentTxId, true, false); > if (!streams.isEmpty()) { > String error = String.format("Cannot start writing at txid %s " + > "when there is a stream available for read: %s", > segmentTxId, streams.get(0)); > IOUtils.cleanupWithLogger(LOG, >