[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278488#comment-17278488 ]

xuzq commented on HDFS-13609:
-----------------------------

Thanks [~xkrogen].
{quote}Why JN1 is lagging: You're saying this is happening because JN1 wrote some txns to its cache, but not onto disk. Can you elaborate on why this causes it to lag?
{quote}
JN1 is lagging because the running JN1 was restarted with the wrong _dfs.journalnode.edits.dir_.

On this question, I think there are some bugs:
# The cache does not reflect what is eventually written to disk in the journal.
# Setting _onlyDurableTxns_ to true in _selectRpcInputStreams_ is not correct, because we may have only a quorum of responses rather than a response from every journal, and the first response may not contain all of the edits. As a result, no edits can be tailed from the journals.
# _onlyDurableTxns_ is true in _editLogTailer.catchupDuringFailover()_, but false in _getFSImage().editLog.openForWrite(getEffectiveLayoutVersion())_ in FSNamesystem#startActiveServices(). This may cause the NameNode to crash when it fails over to active.

[~vagarychen] and [~shv], thanks.

> [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-13609
>                 URL: https://issues.apache.org/jira/browse/HDFS-13609
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>             Fix For: HDFS-12943, 3.3.0
>
>         Attachments: HDFS-13609-HDFS-12943.000.patch, HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch, HDFS-13609-HDFS-12943.003.patch, HDFS-13609-HDFS-12943.004.patch
>
>
> See HDFS-13150 for the full design.
> This JIRA is targeted at the NameNode-side changes to enable tailing in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are in the QuorumJournalManager.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278150#comment-17278150 ]

Erik Krogen commented on HDFS-13609:

Thanks for the detailed explanation and example [~xuzq_zander]! I see the issue now. I believe the logic in the snippet you shared is doing the right thing -- and is necessary for correctness. I guess there are two issues being discussed here:
# Why JN1 is lagging: You're saying this is happening because JN1 wrote some txns to its cache, but not onto disk. Can you elaborate on why this causes it to lag? It's been a long time since I looked at this code. Regardless, I agree that this is definitely a bug, and we should be doing whatever is necessary to keep the cache reflective of what eventually got written to disk. I don't know if the right approach is to write to the cache after the disk (this may cause performance issues?) or to invalidate the cache if the disk write fails.
# Why JN1 lagging is causing broader issues: We use {{loggers.waitForWriteQuorum}}, which returns as soon as it gets a quorum of responses, but in some cases we actually want to wait a bit longer to get more responses, for example in the case you described where JN1 keeps responding with a low txid. I actually have some memory of discussing this back in the implementation days with [~shv] and [~vagarychen] but don't remember the conclusion -- I think we were waiting to see if this became an issue in practice.

It sounds like (1) is a legitimate bug, and (2) is kind of a bug and kind of a performance/reliability enhancement. Unfortunately I'm no longer working in this area so I can't spend much time beyond providing high-level input, but these both sound like good areas for improvement. Perhaps [~shv] can provide input on whether we're seeing similar issues on our end and whether or not he remembers any discussions on these matters.
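To make the two cache-handling options from point (1) concrete, here is a minimal sketch. It is not the JournalNode code: {{JournalCacheSketch}}, {{DiskWriter}}, and both method names are hypothetical, and the cache is modelled as a plain sorted map from txid to bytes. It only shows how each ordering keeps the cache from holding a txn that never reached disk.

```java
import java.io.IOException;
import java.util.TreeMap;

public class JournalCacheSketch {
  /** Hypothetical stand-in for the durable (on-disk) edit log write. */
  public interface DiskWriter {
    void write(long txId, byte[] data) throws IOException;
  }

  // Hypothetical in-memory cache: txid -> serialized edit.
  private final TreeMap<Long, byte[]> cache = new TreeMap<>();
  private final DiskWriter disk;

  public JournalCacheSketch(DiskWriter disk) {
    this.disk = disk;
  }

  /** Option 1: populate the cache only after the disk write succeeded. */
  public void journalDiskFirst(long txId, byte[] data) throws IOException {
    disk.write(txId, data);   // throws before the cache is touched
    cache.put(txId, data);
  }

  /** Option 2: write to the cache first, invalidate it if the disk write fails. */
  public void journalCacheFirst(long txId, byte[] data) throws IOException {
    cache.put(txId, data);
    try {
      disk.write(txId, data);
    } catch (IOException e) {
      // Drop this txn and anything after it so the cache never runs ahead of disk.
      cache.tailMap(txId, true).clear();
      throw e;
    }
  }

  public int cachedTxnCount() {
    return cache.size();
  }

  public static void main(String[] args) {
    JournalCacheSketch jn = new JournalCacheSketch((txId, data) -> {
      throw new IOException("simulated disk failure");
    });
    try {
      jn.journalCacheFirst(1L, new byte[0]);
    } catch (IOException expected) {
      // The write failed; the cache entry has been invalidated.
    }
    System.out.println(jn.cachedTxnCount()); // prints 0
  }
}
```

Option 1 is simpler but puts the disk write on the path before the cache is usable, which is the performance concern raised above; option 2 keeps the original ordering at the cost of an explicit invalidation step.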
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277680#comment-17277680 ]

xuzq commented on HDFS-13609:
-----------------------------

Let me give one example to illustrate what I mean.

We have 5 journals, jn1 ~ jn5, and the Active writes edits like this:
|Txid|SuccessWriteJournalId|FailedJournalId|
|TxId1|jn2, jn3, jn4, jn5|jn1(write into cache, write disk failed)|
|TxId2|jn2, jn3, jn4, jn5| |
|TxId3|jn2, jn3, jn4, jn5| |
|TxId4|jn2, jn3, jn4, jn5| |
|TxId5|jn2, jn3, jn4, jn5| |

When we attempt to fail over the standby to active, the standby needs to catch up on all edits TxId1 ~ TxId5, starting from TxId1, and then change to active.

But before the failover, jn4 and jn5 are delayed, which results in responseCounts like (0(jn1), 5(jn2), 5(jn3)) in _editLogTailer.catchupDuringFailover()_.

The Standby NameNode expects to get all edits TxId1 ~ TxId5, but only gets txId1. TxId2 ~ TxId5 are not applied to the fsImage, and this causes the Standby NameNode to crash in _getFSImage().editLog.openForWrite()_.

I think we should use responseCounts(2) ~ responseCounts(4) to ensure that all edits can be caught up.

But the last edit in responseCounts(2) ~ responseCounts(4) may still be in the process of being written by the active, and may not be on a quorum of JNs yet. That would cause the Observer NameNode or Standby NameNode to tail edits that have not reached a quorum.

[~xkrogen] If you have any good ideas on this question, please tell me, thanks.
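The effect of the delayed JNs in this example can be reproduced with a few lines of arithmetic. The sketch below only applies the {{onlyDurableTxns}} branch of the {{maxAllowedTxns}} expression quoted elsewhere in this thread to plain integers; {{QuorumResponseScenario}} and {{durableTxnCount}} are hypothetical names, and {{majoritySize}} stands in for {{loggers.getMajoritySize()}}.

```java
import java.util.Arrays;

public class QuorumResponseScenario {
  /**
   * Applies responseCounts.get(responseCounts.size() - majoritySize) to a set of
   * per-JN txn counts. This is an illustration, not the QuorumJournalManager code.
   */
  public static int durableTxnCount(int[] txnCounts, int majoritySize) {
    int[] sorted = txnCounts.clone();
    Arrays.sort(sorted);
    return sorted[sorted.length - majoritySize];
  }

  public static void main(String[] args) {
    int majority = 3; // 5 JNs

    // Only jn1, jn2 and jn3 answered before the quorum call returned.
    System.out.println(durableTxnCount(new int[] {0, 5, 5}, majority));       // prints 0

    // jn4 answered as well.
    System.out.println(durableTxnCount(new int[] {0, 5, 5, 5}, majority));    // prints 5

    // All five JNs answered.
    System.out.println(durableTxnCount(new int[] {0, 5, 5, 5, 5}, majority)); // prints 5
  }
}
```

With exactly a quorum of responses the lagging jn1 decides the result; one additional response is enough for the same expression to return 5 while still only counting txns that are present on three JNs.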
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277637#comment-17277637 ]

xuzq commented on HDFS-13609:
-----------------------------

Thanks [~xkrogen] for the comment.

It is when {{onlyDurableTxns}} is true that we get {{responseCounts.get(0)}}.

In our production environment, one NameNode went down when we failed over to it, and we caught an exception like:
{code:java}
2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@57d3ac44
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969)
        at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
        at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:111)
        at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:5409)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:689)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240)
{code}
After looking at the code, I think _editLogTailer.catchupDuringFailover()_ could not catch up on all edits, which caused the check in _getFSImage().editLog.openForWrite()_ to fail.

When _{{onlyDurableTxns}}_ is true we get {{_responseCounts.get(0)_}}, so _editLogTailer.catchupDuringFailover()_ cannot catch up on all edits: one journal failed to write the edits to disk after writing them into its cache, and that journal's response is {{_responseCounts.get(0)_}}.
{quote}Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of.
{quote}
* This may cause *_editLogTailer.catchupDuringFailover()_ to be unable to catch up on all edits* when _maxAllowedTxns = {{responseCounts.get(0)}} = 0_.
* It may also cause doTailEdits to be unable to tail any edits.
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277640#comment-17277640 ]

xuzq commented on HDFS-13609:
-----------------------------

Thanks [~xkrogen],

It is when onlyDurableTxns is true that we get responseCounts.get(0).

In our production environment, one NameNode went down when we failed over to it, and we caught an exception like:
{code:java}
2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@57d3ac44
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969)
        at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
        at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:111)
        at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:5409)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:689)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240)
{code}
After looking at the code, I think _editLogTailer.catchupDuringFailover()_ could not catch up on all edits, which caused the check in _getFSImage().editLog.openForWrite()_ to fail.

One journal failed to write an edit to disk after writing it into its cache successfully. Because _onlyDurableTxns_ is true, we take _responseCounts.get(0)_, and the faulty journal's response is _responseCounts.get(0)_, so _editLogTailer.catchupDuringFailover()_ cannot catch up on all edits.
{quote}Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of.
{quote}
This may mean that we cannot tail any edits when the journal behind the first response is faulty.
* It may cause _editLogTailer.catchupDuringFailover()_ to be unable to catch up on all edits, so the NN crashes when changing to active.
* It may cause the Observer NameNode to be unable to serve read RPCs.
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277250#comment-17277250 ]

Erik Krogen commented on HDFS-13609:

Hi [~xuzq_zander], thanks for taking a look.
{quote}
when onlyDurableTxns is false, maxAllowedTxns = responseCounts.get(0)
{quote}
Correct me if I'm wrong but I think you have this backwards. If {{onlyDurableTxns}} is false, then {{maxAllowedTxns = highestTxnCount}}, which is {{responseCounts.get(2)}}.

It is when {{onlyDurableTxns}} is true that you get {{responseCounts.get(0)}}. In this case, we really do need to take the lowest of the returned values. Since we only got 3 responses, we can't make any assumptions about the other 2 JNs, so just assume they have 0 txns. We only want to take txns that have landed on a quorum of JNs (thus becoming durable). Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of. For example if we got back {{(5, 10, 20)}}, then only txns 1-5 are available on all 3 JNs we got responses from, so those are the only transactions we know are durable. Of course more _might_ be durable if they were persisted on the two JNs we didn't get responses from, but we don't know that.

Let me know if that clears things up.
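The {{(5, 10, 20)}} example can be checked with a small standalone sketch of the selection rule. This mirrors only the {{maxAllowedTxns}} expression quoted in this thread; the class {{DurableTxnExample}} is hypothetical, and {{majoritySize}} is a plain parameter standing in for {{loggers.getMajoritySize()}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DurableTxnExample {
  /**
   * Picks how many txns may be consumed, given the txn counts reported by the
   * JNs that responded. Illustration only, not the QuorumJournalManager code.
   */
  public static int maxAllowedTxns(List<Integer> txnCounts, int majoritySize,
      boolean onlyDurableTxns) {
    List<Integer> sorted = new ArrayList<>(txnCounts);
    Collections.sort(sorted);
    int highestTxnCount = sorted.get(sorted.size() - 1);
    // Durable: the count that at least majoritySize responders have reached.
    return !onlyDurableTxns ? highestTxnCount
        : sorted.get(sorted.size() - majoritySize);
  }

  public static void main(String[] args) {
    List<Integer> responses = Arrays.asList(5, 10, 20); // 3 of 5 JNs responded
    System.out.println(maxAllowedTxns(responses, 3, true));  // prints 5
    System.out.println(maxAllowedTxns(responses, 3, false)); // prints 20
  }
}
```

With three responses and a majority size of three, the durable branch indexes element 0 of the sorted list, which is why the lowest reported count bounds what can be read.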
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277005#comment-17277005 ]

xuzq commented on HDFS-13609:
-----------------------------

Hi [~xkrogen] and [~linyiqun], I have recently been studying *Consistent Reads from Standby Node*.
{code:java}
private void selectRpcInputStreams(Collection<EditLogInputStream> streams,
    long fromTxnId, boolean onlyDurableTxns) throws IOException {
  QuorumCall<AsyncLogger, GetJournaledEditsResponseProto> q =
      loggers.getJournaledEdits(fromTxnId, maxTxnsPerRpc);
  Map<AsyncLogger, GetJournaledEditsResponseProto> responseMap =
      loggers.waitForWriteQuorum(q, selectInputStreamsTimeoutMs,
          "selectRpcInputStreams");
  assert responseMap.size() >= loggers.getMajoritySize() :
      "Quorum call returned without a majority";

  List<Integer> responseCounts = new ArrayList<>();
  for (GetJournaledEditsResponseProto resp : responseMap.values()) {
    responseCounts.add(resp.getTxnCount());
  }
  Collections.sort(responseCounts);
  int highestTxnCount = responseCounts.get(responseCounts.size() - 1);
  ...
  // Cancel any outstanding calls to JN's.
  q.cancelCalls();

  int maxAllowedTxns = !onlyDurableTxns ? highestTxnCount :
      responseCounts.get(responseCounts.size() - loggers.getMajoritySize());
  if (maxAllowedTxns == 0) {
    LOG.debug("No new edits available in logs; requested starting from " +
        "ID " + fromTxnId);
    return;
  }
  ...
}
{code}
Maybe something is wrong in:
{code:java}
int maxAllowedTxns = !onlyDurableTxns ? highestTxnCount :
    responseCounts.get(responseCounts.size() - loggers.getMajoritySize());{code}
* Let's say we have 5 JournalNodes, and loggers.getMajoritySize() is 3.
* _loggers.getJournaledEdits(fromTxnId, maxTxnsPerRpc)_ only needs a quorum of results, so responseCounts.size() may be 3.
* When _onlyDurableTxns_ is false, _maxAllowedTxns_ = responseCounts.get(0).
* _responseCounts.get(0)_ may not be the expected quorum result, and it may not even contain any results from _fromTxnId_
** maybe one journal's disk is faulty, and it only wrote into its cache for _fromTxnId_

[~xkrogen] and [~linyiqun], if you have time, please take a look at this question, thanks.
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728479#comment-16728479 ]

Hudson commented on HDFS-13609:
-------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #15662 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15662/])
HDFS-13609. [SBN read] Edit Tail Fast Path Part 3: NameNode-side changes (shv: rev 00e99c65943e64fd696ec715cf21e851b93115f1)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLogger.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLoggerSet.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManagerUnit.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileInputStream.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523243#comment-16523243 ]

Erik Krogen commented on HDFS-13609:

Thanks [~shv] and [~linyiqun]!

[~csun]:
* You are right, according to [Oracle's conventions|http://www.oracle.com/technetwork/articles/java/index-137868.html]:
{quote}
Insert a blank comment line between the description and the list of tags, as shown.
{quote}
I was not aware of this, thanks for educating me.
* Thanks for the catch.

I have attached the v004 patch to document the final changes. Given how minor the v003 -> v004 patch change is (Chao's two whitespace comments), I just committed this based on the +1s on v003.
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523236#comment-16523236 ]

genericqa commented on HDFS-13609:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-13609 does not apply to HDFS-12943. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13609 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929150/HDFS-13609-HDFS-12943.004.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/24492/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16521272#comment-16521272 ]

Chao Sun commented on HDFS-13609:
---------------------------------

+1 as well. Looks great! I have two _tiny_ nits:
- Should we have an empty line between "mechanism optimized for low latency." and "@param streams The collection to store the return streams into." in the javadoc of the {{selectRpcInputStreams}} method?
- In {{hdfs-default.xml}}, there is an extra space after "via the RPC-based mechanism".
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520924#comment-16520924 ]

Yiqun Lin commented on HDFS-13609:
----------------------------------

Thanks for the explanation, [~xkrogen]. +1 for the fix.
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520851#comment-16520851 ]

Konstantin Shvachko commented on HDFS-13609:

Erik, the last patch looks good. +1
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520850#comment-16520850 ] genericqa commented on HDFS-13609:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} HDFS-12943 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 35m 9s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} HDFS-12943 passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 4m 2s{color} | {color:red} branch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} HDFS-12943 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 2m 27s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 6s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}150m 34s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13609 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12928807/HDFS-13609-HDFS-12943.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 81c7fa97f970 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-12943 / 292ccdc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24485/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24485/testReport/ |
| Max. process+thread count | 3085 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520734#comment-16520734 ] Erik Krogen commented on HDFS-13609:

Thanks for looking at {{BackupImage}}, [~shv]! I had no idea what was happening there :) I have removed this extra parameter altogether; now on the QJM {{optimizeLatency == inProgressOk}}. This reduced the scope of changes significantly.

[~linyiqun], thanks for taking a look! I agree with you on the Precondition check; I have incorporated it into v003. For your second comment, it should be {{ < 0 }}. A return value of {{highestTxnCount == 0}} is expected behavior if you have read all available edits and continue to request more; this is not an error situation, whereas a value {{ < 0 }} is an error. Let me know if you disagree.

Uploaded v003 patch incorporating all of [~shv] and [~linyiqun]'s comments.
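As a rough illustration of the distinction drawn in the comment above, the following sketch classifies the highest transaction count seen among the JournalNode responses. The class, enum, and method names are hypothetical and are not taken from the patch; only the {{ < 0 }} versus {{ == 0 }} semantics come from the discussion.

```java
/** Hypothetical helper illustrating the discussion; not code from the patch. */
final class TxnCountCheck {

  /** Possible interpretations of the highest transaction count seen. */
  enum Outcome { ERROR, CAUGHT_UP, HAS_EDITS }

  /**
   * Classifies the highest transaction count returned by any JournalNode.
   * A negative count means no valid response was received (an error);
   * zero means the reader has already consumed all available edits.
   */
  static Outcome classify(int highestTxnCount) {
    if (highestTxnCount < 0) {
      return Outcome.ERROR;
    } else if (highestTxnCount == 0) {
      return Outcome.CAUGHT_UP;
    }
    return Outcome.HAS_EDITS;
  }

  public static void main(String[] args) {
    System.out.println(classify(-1)); // ERROR
    System.out.println(classify(0));  // CAUGHT_UP
    System.out.println(classify(25)); // HAS_EDITS
  }
}
```

Treating zero as its own non-error case is what lets a tailer that has caught up keep polling without raising an exception on every request.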
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520115#comment-16520115 ] Yiqun Lin commented on HDFS-13609:

Hi [~xkrogen], I have recently been learning about *Consistent Reads from Standby Node*. I just reviewed this patch and have two comments:

* It looks like we should do a Precondition check when getting the {{maxTxnsPerRpc}} value from the configuration. If an invalid max-txns value is configured (0 or < 0), no edits data will be returned.
* {code:java}
private void selectRpcInputStreams(Collection streams,
    long fromTxnId, boolean onlyDurableTxns) throws IOException {
  ...
  int highestTxnCount = responseCounts.get(responseCounts.size() - 1);
  if (LOG.isDebugEnabled() || highestTxnCount < 0) {
    ...
    msg.append(">");
    if (highestTxnCount < 0) {
      throw new IOException("Did not get any valid JournaledEdits " +
          "responses: " + msg);
    } else {
      LOG.debug(msg.toString());
    }
  }
  ...
}
{code}
Is {{highestTxnCount < 0}} accurate here? It seems {{highestTxnCount <= 0}} is right. A txnCount of 0 can be returned by {{JournaledEditsCache#retrieveEdits}} (for example, when requestedStartTxn > highestTxnId).
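The Precondition check suggested in the comment above could look roughly like the sketch below. To stay self-contained it uses {{java.util.Properties}} as a stand-in for Hadoop's configuration object and a plain {{IllegalArgumentException}} instead of a Preconditions utility; the class and method names are hypothetical, the key is the one named elsewhere in this issue, and the default of 5000 is only the value proposed in this thread.

```java
import java.util.Properties;

/** Hypothetical sketch; java.util.Properties stands in for Hadoop's configuration. */
final class MaxTxnsConfig {
  // Config key as named in the discussion on this issue.
  static final String MAX_TXNS_KEY = "dfs.ha.tail-edits.qjm.rpc.max-txns";
  // Illustrative default only; 5000 is the value proposed in this thread.
  static final int MAX_TXNS_DEFAULT = 5000;

  /** Reads the per-RPC transaction limit and rejects zero or negative values. */
  static int readMaxTxnsPerRpc(Properties conf) {
    int maxTxnsPerRpc = Integer.parseInt(
        conf.getProperty(MAX_TXNS_KEY, String.valueOf(MAX_TXNS_DEFAULT)));
    if (maxTxnsPerRpc <= 0) {
      throw new IllegalArgumentException(
          MAX_TXNS_KEY + " must be positive, but was " + maxTxnsPerRpc);
    }
    return maxTxnsPerRpc;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(MAX_TXNS_KEY, "0");
    try {
      readMaxTxnsPerRpc(conf);
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Failing fast at configuration time surfaces the misconfiguration directly, rather than as a tailer that silently never receives edits.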
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519923#comment-16519923 ] Konstantin Shvachko commented on HDFS-13609:

Very comprehensive analysis, [~xkrogen]. I was looking at {{BackupImage.tryConvergeJournalSpool()}} and remembered some details. In the BackupNode context, the spool is an edits file to which the BackupNode writes transactions received from the NN while it is checkpointing the image, because it cannot apply them to its in-memory state while writing the image. After completing the image write, the BN reads the edits it saved; this is called {{convergeJournalSpool()}}. Since the BN uses only EditLogFileStreams, the optimization parameter should be completely ignored there. In any case, I ran {{TestBackupNode}} with both {{optimizeLatency = true}} and {{false}}, and it passed. I think it is safe to use the single parameter in this case.
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517672#comment-16517672 ] Erik Krogen commented on HDFS-13609:

Thanks for the review [~shv]!
# I agree that this would be much cleaner. In many cases, {{inProgressOk}} will be equivalent to {{optimizeLatency}}. However, there are a few cases where this is not currently true:
** {{FSEditLog#openForWrite()}} - It is using {{selectInputStreams}} to confirm that no one else is writing new transactions. It seems fine to allow this to use the RPC mechanism.
** {{BootstrapStandby#checkLogsAvailableForRead()}} - It is confirming that a range of transaction IDs is available. It seems fine to allow this to use the RPC mechanism.
** {{NameNodeRpcServer#getEventBatchList()}} - Serves ranges of transactions for the INotify feature. It seems fine (actually, desirable) to let this use the RPC mechanism. However, on a slightly unrelated note, one portion of this will need to be changed to work properly in a read-from-standby environment. Filed HDFS-13689 for this.
** {{NameNode#copyEditLogSegmentsToSharedDir()}} - This is only called from {{NameNode#initializeSharedEdits()}}, i.e. under a separate startup flag for the NameNode. I don't think it's necessary to optimize for this situation.
** {{BackupImage#tryConvergeJournalSpool()}} - This code is doing some sketchy things and making assumptions about the returned streams that will not be true when using the RPC mechanism. We need to prevent this from using the RPC mechanism, but given that this is only for the BackupNode, I recommend we avoid adding a new API / parameter just for this situation and disable the RPC mechanism on the BackupNode entirely. I instead propose that we add a way for the BackupNode to disable RPC reads on the {{QuorumJournalManager}}. This could take the form of an undocumented config parameter or, my preference, a static method {{QuorumJournalManager.disableRPCJournalStreams()}} which the BackupNode can call.
If you agree that we can handle {{BackupImage}} as I described, I think I can remove this new parameter and limit the scope of the change.
# Agreed. I will fix this in the next patch.
# I thought more about why an operator might want to change this config. I can imagine situations in which I would want to increase it: when the RPC response time from the JournalNodes is high and the number of transactions per second is very high (say, a very heavy write workload). But I can't think of a reason to lower it; this is more about setting a sanity-check upper bound. This makes me think we should (a) raise the default limit to 5000, since even with an RPC round-trip time of 100ms, which is quite high, this would allow 50k transactions per second, and (b) make it undocumented as you described. I will incorporate this into the next patch.
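The 50k figure in the comment above follows from simple arithmetic, sketched below under the assumption that the tailer has one RPC outstanding at a time and each response carries at most the configured number of transactions. The class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope calculation; not code from the patch. */
final class TailThroughput {

  /**
   * Upper bound on transactions tailed per second, assuming one RPC is
   * outstanding at a time and each returns at most maxTxnsPerRpc edits.
   */
  static long maxTxnsPerSecond(int maxTxnsPerRpc, long rpcRoundTripMillis) {
    if (maxTxnsPerRpc <= 0 || rpcRoundTripMillis <= 0) {
      throw new IllegalArgumentException("Both arguments must be positive");
    }
    // 1000 ms per second divided by the round-trip time gives RPCs per second.
    return maxTxnsPerRpc * 1000L / rpcRoundTripMillis;
  }

  public static void main(String[] args) {
    // 5000 txns per RPC at a 100 ms round trip: 10 RPCs/s * 5000 = 50000.
    System.out.println(maxTxnsPerSecond(5000, 100));
  }
}
```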
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514540#comment-16514540 ] Konstantin Shvachko commented on HDFS-13609:

It looks like the shadedclient failures are also present in trunk. I wouldn't worry about them here.
# Looking at the patch, I see that a lot of the changes are related to adding the new parameter {{boolean optimizeLatency}} to {{LogsPurgeable.selectInputStreams()}}, which in turn affected the {{JournalManager}} interface. The parameter is actively used only in {{QuorumJournalManager}}; in all other implementations it is ignored. In the {{QuorumJournalManager.selectInputStreams()}} implementation you require {{optimizeLatency}} to be the same as {{inProgressOk}}, except when {{optimizeLatency == false && inProgressOk == true}}. But in the latter case {{optimizeLatency}} is ignored. So my main question is: can we simply use {{inProgressOk}} as an indicator to optimize for latency and drop the {{optimizeLatency}} parameter? This should simplify the changes a lot.
# In {{hdfs-default.xml}}, rephrase "This will also enable tailing of edit logs via" -> "This enables tailing of edit logs via". That way you clarify it.
# Should {{dfs.ha.tail-edits.qjm.rpc.max-txns}} be a public or an undocumented config parameter? I see there is a bunch of "Change with caution" properties in {{hdfs-default.xml}}. This is exactly why we keep them undocumented.
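The relationship between the two flags described in point 1 of the comment above can be enumerated with a small sketch. This is only one reading of the review comment, not code from the patch, and the class and method names are hypothetical: the flags must agree, except that {{inProgressOk == true}} with {{optimizeLatency == false}} is also accepted.

```java
/** Hypothetical illustration of the flag combinations described in the review. */
final class FlagCombinations {

  /**
   * Returns true when the pair is accepted under the rule described in the
   * review: the flags match, or inProgressOk is true while optimizeLatency
   * is false. Only optimizeLatency without inProgressOk is rejected.
   */
  static boolean isAllowed(boolean inProgressOk, boolean optimizeLatency) {
    return inProgressOk || !optimizeLatency;
  }

  public static void main(String[] args) {
    boolean[] values = {false, true};
    for (boolean inProgressOk : values) {
      for (boolean optimizeLatency : values) {
        System.out.println("inProgressOk=" + inProgressOk
            + " optimizeLatency=" + optimizeLatency
            + " allowed=" + isAllowed(inProgressOk, optimizeLatency));
      }
    }
  }
}
```

Three of the four combinations are accepted, so collapsing to a single {{inProgressOk}} flag gives up only the ability to request in-progress edits without the latency optimization.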
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513921#comment-16513921 ] Erik Krogen commented on HDFS-13609:

It looks like the shadedclient failure is due to the following error while running {{common-test-bats-driver}} within {{hadoop-common}}:
{code}
[exec] not ok 2 hadoop_stop_daemon_force_kill
[exec] # (in test file hadoop_stop_daemon.bats, line 43)
[exec] # `[ -f ${TMP}/pidfile ]' failed
[exec] # bindir: /testptch/hadoop/hadoop-common-project/hadoop-common/src/test/scripts
[exec] # sh: 0: Can't open /testptch/hadoop/hadoop-common-project/hadoop-common/src/test/scripts/process_with_sigterm_trap.sh
{code}
The error is identical across the branch and the patch so it must be unrelated. I don't know enough about this {{common-test-bats-driver}} to dig deeper...
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513314#comment-16513314 ] genericqa commented on HDFS-13609:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} HDFS-12943 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 32m 2s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} HDFS-12943 passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 3m 30s{color} | {color:red} branch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} HDFS-12943 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 2m 21s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 18s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 55s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13609 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927913/HDFS-13609-HDFS-12943.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 60ad511fa3dd 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-12943 / 292ccdc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24448/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24448/testReport/ |
| Max. process+thread count | 3591 (vs. ulimit of 1) |
| modules | C:
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513226#comment-16513226 ] Erik Krogen commented on HDFS-13609:

Noticed two issues with {{TestQuorumJournalManager}}. Uploaded v002 addressing these.
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513219#comment-16513219 ] Erik Krogen commented on HDFS-13609:

Just attached v001 patch applied on top of changes in HDFS-13607 / HDFS-13608. Also did some cleanup from v000 patch, and removed Java 8 functionality (lambdas / stream). Should be ready for review.
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487627#comment-16487627 ] Erik Krogen commented on HDFS-13609:

Attaching v000 patch but not marking patch available since this depends on HDFS-13607 / HDFS-13608