[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2021-02-03 Thread xuzq (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278488#comment-17278488
 ] 

xuzq commented on HDFS-13609:
-

Thanks [~xkrogen].
{quote}Why JN1 is lagging: You're saying this is happening because JN1 wrote 
some txns to its cache, but not onto disk. Can you elaborate on why this causes 
it to lag?
{quote}
JN1 was lagging because the running JN1 had been restarted with a wrong 
_dfs.journalnode.edits.dir_.

 

On this question, I think there are some bugs :(
 # The cache does not reflect what is eventually written to disk in the Journal.
 # Passing _onlyDurableTxns_ = true to _selectRpcInputStreams_ is not correct, 
because we may only have a quorum of responses rather than responses from all 
journals, and the first response may not contain the full edits. This can make 
it impossible to tail any edits from the journals.
 # _onlyDurableTxns_ is true in _editLogTailer.catchupDuringFailover()_, but 
false in _getFSImage().editLog.openForWrite(getEffectiveLayoutVersion())_ in 
FSNamesystem#startActiveServices(). This may cause the NameNode to crash when 
it fails over to active.

 

[~vagarychen] and [~shv],  thanks.

 

> [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via 
> RPC
> -
>
> Key: HDFS-13609
> URL: https://issues.apache.org/jira/browse/HDFS-13609
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: HDFS-12943, 3.3.0
>
> Attachments: HDFS-13609-HDFS-12943.000.patch, 
> HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch, 
> HDFS-13609-HDFS-12943.003.patch, HDFS-13609-HDFS-12943.004.patch
>
>
> See HDFS-13150 for the full design.
> This JIRA is targeted at the NameNode-side changes to enable tailing 
> in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are 
> in the QuorumJournalManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2021-02-03 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278150#comment-17278150
 ] 

Erik Krogen commented on HDFS-13609:


Thanks for the detailed explanation and example [~xuzq_zander]! I see now the 
issue. I believe the logic in the snippet you shared is doing the right thing 
-- and is necessary for correctness. I guess there are two issues being 
discussed here:

# Why JN1 is lagging: You're saying this is happening because JN1 wrote some 
txns to its cache, but not onto disk. Can you elaborate on why this causes it 
to lag? It's been a long time since I looked at this code. Regardless, I agree 
that this is definitely a bug, and we should be doing whatever is necessary to 
keep the cache reflective of what eventually got written to disk. I don't know 
if the right approach is to write to the cache after the disk (this may cause 
performance issues?) or to invalidate the cache if the disk write fails.
# Why JN1 lagging is causing broader issues: We use 
{{loggers.waitForWriteQuorum}} which returns as soon as it gets a quorum of 
responses, but in some cases we actually want to wait a bit longer to get more 
responses, for example in the case you described where JN1 keeps responding 
with a low txid. I actually have some memory of discussing this back in the 
implementation days with [~shv] and [~vagarychen] but don't remember the 
conclusion -- I think we were waiting to see if this became an issue in 
practice.
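The two options raised in item 1 can be sketched with a toy in-memory model. This is not the real JournalNode code: {{ToyJournal}}, {{DiskWriter}} and the method names are hypothetical, and the only point is that the cache should never advertise a txid the disk does not have.

```java
import java.io.IOException;
import java.util.TreeMap;

/** Toy model of a journal with an in-memory edit cache; not the real JournalNode code. */
class ToyJournal {
  /** Hypothetical stand-in for the durable write path. */
  interface DiskWriter {
    void write(long txId, byte[] edit) throws IOException;
  }

  private final TreeMap<Long, byte[]> cache = new TreeMap<>();
  private final DiskWriter disk;

  ToyJournal(DiskWriter disk) {
    this.disk = disk;
  }

  /** Option A: write to disk first, populate the cache only on success. */
  void journalDiskFirst(long txId, byte[] edit) throws IOException {
    disk.write(txId, edit);   // throws before the cache is touched
    cache.put(txId, edit);
  }

  /** Option B: write to the cache first, roll the entry back if the disk write fails. */
  void journalCacheFirst(long txId, byte[] edit) throws IOException {
    cache.put(txId, edit);
    try {
      disk.write(txId, edit);
    } catch (IOException e) {
      cache.remove(txId);     // invalidate so the cache matches the disk
      throw e;
    }
  }

  /** Number of cached txns with id >= fromTxId, i.e. what a tailing RPC could serve. */
  int cachedTxnCount(long fromTxId) {
    return cache.tailMap(fromTxId, true).size();
  }

  public static void main(String[] args) {
    ToyJournal jn = new ToyJournal((txId, edit) -> {
      throw new IOException("simulated disk failure");
    });
    try {
      jn.journalCacheFirst(1L, new byte[] {0});
    } catch (IOException expected) {
      // The failed write must not be visible through the cache.
    }
    System.out.println(jn.cachedTxnCount(1L)); // prints 0
  }
}
```

Option A keeps the cache trivially consistent but puts the disk write ahead of cache visibility, which is where the performance concern comes from. Option B keeps the current ordering and pays only on the failure path.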

It sounds like (1) is a legitimate bug, and (2) is kind of a bug and kind of a 
performance/reliability enhancement. Unfortunately I'm no longer working in 
this area so I can't spend much time beyond providing high-level input, but 
these both sound like good areas for improvement. Perhaps [~shv] can provide 
input on whether we're seeing similar issues on our end and whether or not he 
remembers any discussions on these matters.
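One way to picture the enhancement in (2) is a timing model in which the caller lingers for a short grace period after the quorum is complete. This is purely illustrative: {{graceMs}} is a hypothetical knob, and nothing in this thread confirms that QuorumJournalManager has such a setting.

```java
import java.util.Arrays;

/** Toy timing model for "wait for a quorum, then linger briefly for stragglers". */
class QuorumGraceModel {
  /**
   * @param arrivalMs response arrival times, one per JournalNode
   * @param majority  number of responses that forms a quorum
   * @param graceMs   hypothetical extra wait after the quorum is reached
   * @return how many responses are in hand when the wait ends
   */
  static int responsesCollected(long[] arrivalMs, int majority, long graceMs) {
    long[] sorted = arrivalMs.clone();
    Arrays.sort(sorted);
    long quorumAt = sorted[majority - 1];   // moment the quorum is complete
    long deadline = quorumAt + graceMs;
    int collected = 0;
    for (long t : sorted) {
      if (t <= deadline) {
        collected++;
      }
    }
    return collected;
  }

  public static void main(String[] args) {
    long[] arrivals = {2, 3, 4, 9, 40};     // jn1..jn5, in milliseconds
    System.out.println(responsesCollected(arrivals, 3, 0));   // prints 3
    System.out.println(responsesCollected(arrivals, 3, 10));  // prints 4
  }
}
```

With no grace period the caller proceeds on a bare quorum, so a single lagging responder decides the result. A small grace period trades a little latency for a better chance that a healthy fourth response is included.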




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2021-02-02 Thread xuzq (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277680#comment-17277680
 ] 

xuzq commented on HDFS-13609:
-

Here is an example to illustrate what I mean.

We have 5 journals, jn1 ~ jn5.

The Active writes edits as follows:
|Txid|SuccessWriteJournalId|FailedJournalId|
|TxId1|jn2, jn3, jn4, jn5|jn1 (written to cache, disk write failed)|
|TxId2|jn2, jn3, jn4, jn5| |
|TxId3|jn2, jn3, jn4, jn5| |
|TxId4|jn2, jn3, jn4, jn5| |
|TxId5|jn2, jn3, jn4, jn5| |

When we attempt to fail over the standby to active, the standby needs to catch 
up on all edits TxId1 ~ TxId5, starting from TxId1, and then change to active.

But before the failover, jn4 and jn5 respond with some delay, so responseCounts 
looks like (0(jn1), 5(jn2), 5(jn3)) during 
_editLogTailer.catchupDuringFailover()_.

The Standby NameNode expects to get all edits TxId1 ~ TxId5, but only gets 
TxId1. TxId2 ~ TxId5 are not applied to the FSImage.

This causes the Standby NameNode to crash in 
_getFSImage().editLog.openForWrite()_.

I think we should use responseCounts(2) ~ responseCounts(4) to ensure we can 
catch up on all edits.

But the last edit in responseCounts(2) ~ responseCounts(4) may still be in the 
middle of being written by the active, and may not be on a quorum of JNs.

That would cause the Observer NameNode or Standby NameNode to tail non-quorum 
edits.
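The failure sequence in this example can be replayed with a toy model. It assumes, as reported elsewhere in this thread, that the catch-up step applies only durable txns while the open-for-write check also sees non-durable ones. {{FailoverReplay}} and its parameters are hypothetical and only mimic the two steps; they are not NameNode APIs.

```java
/** Toy replay of the two failover steps described above; not the real NameNode code. */
class FailoverReplay {
  /**
   * @param lastAppliedTxId txid the standby has applied before failover
   * @param durableCount    txns offered when only durable txns are requested
   * @param highestCount    txns offered when non-durable txns are allowed too
   * @return the txid the new active would start writing at
   */
  static long failover(long lastAppliedTxId, int durableCount, int highestCount) {
    // Step 1 (catch-up): apply only what is known to be durable.
    long applied = lastAppliedTxId + durableCount;
    // Step 2 (open for write): anything still readable beyond `applied` is fatal.
    long stillReadable = (lastAppliedTxId + highestCount) - applied;
    long nextTxId = applied + 1;
    if (stillReadable > 0) {
      throw new IllegalStateException("Cannot start writing at txid " + nextTxId
          + " when there is a stream available for read");
    }
    return nextTxId;
  }

  public static void main(String[] args) {
    // Responses (0, 5, 5) from jn1..jn3: the lowest is 0, the highest is 5.
    try {
      failover(0, 0, 5);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    // If all three responders had reported 5, both steps would agree.
    System.out.println(failover(0, 5, 5)); // prints 6
  }
}
```

The crash comes from the two steps disagreeing about what is readable, not from either step alone.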

 

[~xkrogen] If you have any good ideas on this question, please tell me, 
thanks.

 

 

 




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2021-02-02 Thread xuzq (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277637#comment-17277637
 ] 

xuzq commented on HDFS-13609:
-

Thanks [~xkrogen] for the comment. It is when {{onlyDurableTxns}} is true that 
we get {{responseCounts.get(0)}}.

In our production environment, one NameNode went down when we failed it over 
to active, and we caught an exception like:

 
{code:java}
2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@57d3ac44
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969)
    at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:111)
    at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:5409)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:689)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240)
{code}
 

After looking at the code, I think _editLogTailer.catchupDuringFailover()_ 
could not catch up on all edits, which caused the check in 
_getFSImage().editLog.openForWrite()_ to fail.

When {{onlyDurableTxns}} is true we get {{responseCounts.get(0)}}, and 
_editLogTailer.catchupDuringFailover()_ could not catch up on all edits because 
one journal failed to write to disk after writing into its cache, and that 
journal's response was {{responseCounts.get(0)}}.

{quote}Thus since we only got 3 responses, we have to take the lowest txn that 
any of those responses are aware of.
{quote}
 * This may leave _editLogTailer.catchupDuringFailover()_ unable to catch up on 
all edits when _maxAllowedTxns_ = {{responseCounts.get(0)}} = 0.
 * It may also leave doTailEdits unable to tail any edits.

 

 




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2021-02-02 Thread xuzq (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277640#comment-17277640
 ] 

xuzq commented on HDFS-13609:
-

Thanks [~xkrogen]. It is when onlyDurableTxns is true that we get 
responseCounts.get(0).

In our production environment, one NameNode went down when we failed it over 
to active, and we caught an exception like:

 
{code:java}
2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@57d3ac44
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969)
    at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:111)
    at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:5409)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:689)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240)
{code}
 

After looking at the code, I think _editLogTailer.catchupDuringFailover()_ 
could not catch up on all edits, which caused the check in 
_getFSImage().editLog.openForWrite()_ to fail.

One journal failed when writing an edit to disk after the edit had been written 
into its cache successfully.

Because _onlyDurableTxns_ is true, we take _responseCounts.get(0)_, and the 
faulty journal's response is _responseCounts.get(0)_, so 
_editLogTailer.catchupDuringFailover()_ could not catch up on all edits.

{quote}Thus since we only got 3 responses, we have to take the lowest txn that 
any of those responses are aware of.
{quote}
This may mean we cannot tail any edits when the journal behind the first 
response is faulty.
 * It may leave _editLogTailer.catchupDuringFailover()_ unable to catch up on 
all edits, so the NN crashes when it changes to active.
 * It may leave the Observer NameNode unable to serve read RPCs.




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2021-02-02 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277250#comment-17277250
 ] 

Erik Krogen commented on HDFS-13609:


Hi [~xuzq_zander], thanks for taking a look.

{quote}
when onlyDurableTxns is false, maxAllowedTxns = responseCounts.get(0)
{quote}
Correct me if I'm wrong but I think you have this backwards. If 
{{onlyDurableTxns}} is false, then {{maxAllowedTxns = highestTxnCount}} which 
is {{responseCounts.get(2)}}

It is when {{onlyDurableTxns}} is true that you get {{responseCounts.get(0)}}. 
In this case, we really do need to take the lowest of the returned values. 
Since we only got 3 responses, we can't make any assumptions about the other 2 
JNs, so just assume they have 0 txns. We only want to take txns that have 
landed on a quorum of JNs (thus becoming durable). Thus since we only got 3 
responses, we have to take the lowest txn that any of those responses are aware 
of. For example if we got back {{(5, 10, 20)}}, then only txns 1-5 are 
available on all 3 JNs we got responses from, so those are the only 
transactions we know are durable. Of course more _might_ be durable if they 
were persisted on the two JNs we didn't get responses from, but we don't know 
that.

Let me know if that clears things up.
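The reasoning above can be checked by brute force: treat every JournalNode that did not respond as holding 0 txns, then find the longest prefix of txns held by a majority of all JNs. {{DurableQuorum}} and its method are hypothetical names for this sketch and are not part of QuorumJournalManager.

```java
/** Brute-force check of the "durable means present on a quorum" rule. */
class DurableQuorum {
  /**
   * @param responseCounts txn counts returned by the JNs that responded
   * @param totalJournals  total number of JNs; non-responders count as 0 txns
   * @return number of leading txns known to be on a majority of all JNs
   */
  static int durableTxnCount(int[] responseCounts, int totalJournals) {
    int majority = totalJournals / 2 + 1;
    int durable = 0;
    for (int k = 1; ; k++) {
      int holders = 0;
      for (int c : responseCounts) {
        if (c >= k) {
          holders++;            // this JN has at least k txns
        }
      }
      if (holders < majority) {
        break;                  // txn k is not provably on a quorum
      }
      durable = k;
    }
    return durable;
  }

  public static void main(String[] args) {
    // Three of five JNs responded with (5, 10, 20): only txns 1-5 are provably durable.
    System.out.println(durableTxnCount(new int[] {5, 10, 20}, 5)); // prints 5
    // One lagging responder in a bare quorum drags the result down to 0.
    System.out.println(durableTxnCount(new int[] {0, 5, 5}, 5));   // prints 0
  }
}
```

For a bare quorum of responses this is the same as taking the lowest returned count, which is what the sorted-index expression does.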




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2021-02-02 Thread xuzq (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277005#comment-17277005
 ] 

xuzq commented on HDFS-13609:
-

Hi [~xkrogen] and [~linyiqun], recently I have been studying *Consistent Reads 
from Standby Node*.

 
{code:java}
private void selectRpcInputStreams(Collection<EditLogInputStream> streams,
    long fromTxnId, boolean onlyDurableTxns) throws IOException {
  QuorumCall<AsyncLogger, GetJournaledEditsResponseProto> q =
      loggers.getJournaledEdits(fromTxnId, maxTxnsPerRpc);
  Map<AsyncLogger, GetJournaledEditsResponseProto> responseMap =
      loggers.waitForWriteQuorum(q, selectInputStreamsTimeoutMs,
          "selectRpcInputStreams");
  assert responseMap.size() >= loggers.getMajoritySize() :
      "Quorum call returned without a majority";

  List<Integer> responseCounts = new ArrayList<>();
  for (GetJournaledEditsResponseProto resp : responseMap.values()) {
    responseCounts.add(resp.getTxnCount());
  }
  Collections.sort(responseCounts);
  int highestTxnCount = responseCounts.get(responseCounts.size() - 1);
  ...
  // Cancel any outstanding calls to JN's.
  q.cancelCalls();

  int maxAllowedTxns = !onlyDurableTxns ? highestTxnCount :
      responseCounts.get(responseCounts.size() - loggers.getMajoritySize());
  if (maxAllowedTxns == 0) {
    LOG.debug("No new edits available in logs; requested starting from " +
        "ID " + fromTxnId);
    return;
  }
  ...
}
{code}
 

 

Maybe something is wrong in 
{code:java}
int maxAllowedTxns = !onlyDurableTxns ? highestTxnCount : 
responseCounts.get(responseCounts.size() - loggers.getMajoritySize());{code}
 * Let's say we have 5 JournalNodes, so loggers.getMajoritySize() is 3.
 * _loggers.getJournaledEdits(fromTxnId, maxTxnsPerRpc)_ only needs a quorum of 
results, so responseCounts.size() may be 3.
 * when _onlyDurableTxns_ is false, _maxAllowedTxns_ = responseCounts.get(0)
 * _responseCounts.get(0)_ may not be the expected quorum result, and it may not 
even contain any results from _fromTxnId_
 ** maybe one journal's disk is faulty, and it only wrote into its cache for 
_fromTxnId_
 

[~xkrogen] and [~linyiqun], if you have time, please take a look at this 
question, thanks.

 

 

 

 




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-12-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728479#comment-16728479
 ] 

Hudson commented on HDFS-13609:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #15662 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15662/])
HDFS-13609. [SBN read] Edit Tail Fast Path Part 3: NameNode-side changes (shv: 
rev 00e99c65943e64fd696ec715cf21e851b93115f1)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLogger.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLoggerSet.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManagerUnit.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileInputStream.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java





[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-25 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523243#comment-16523243
 ] 

Erik Krogen commented on HDFS-13609:


Thanks [~shv] and [~linyiqun]!

[~csun]:
* You are right, according to [Oracle's 
conventions|http://www.oracle.com/technetwork/articles/java/index-137868.html]:
{quote}
Insert a blank comment line between the description and the list of tags, as 
shown.
{quote}
I was not aware of this, thanks for educating me.
* Thanks for the catch.

I have attached v004 patch to document the final changes. Given how minor the 
v003 -> v004 patch change is (Chao's two whitespace comments), I just committed 
this based on the +1s on v003.




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-25 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523236#comment-16523236
 ] 

genericqa commented on HDFS-13609:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} | {color:red} HDFS-13609 does not apply to HDFS-12943. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13609 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929150/HDFS-13609-HDFS-12943.004.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/24492/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-23 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16521272#comment-16521272
 ] 

Chao Sun commented on HDFS-13609:
-

+1 as well. Looks great! I have two _tiny_ nits:
- Should we have an empty line between "mechanism optimized for low latency." 
and "@param streams The collection to store the return streams into." for the 
javadoc of {{selectRpcInputStreams}} method?
- In {{hdfs-default.xml}}, extra space after "via the RPC-based mechanism".




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-22 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520924#comment-16520924
 ] 

Yiqun Lin commented on HDFS-13609:
--

Thanks for the explanation, [~xkrogen]. +1 for the fix.




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-22 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520851#comment-16520851
 ] 

Konstantin Shvachko commented on HDFS-13609:


Erik, the last patch looks good. +1




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520850#comment-16520850
 ] 

genericqa commented on HDFS-13609:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 41s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || HDFS-12943 Compile Tests ||
| +1 | mvninstall | 35m 9s | HDFS-12943 passed |
| +1 | compile | 1m 4s | HDFS-12943 passed |
| +1 | checkstyle | 0m 16s | HDFS-12943 passed |
| +1 | mvnsite | 1m 12s | HDFS-12943 passed |
| -1 | shadedclient | 4m 2s | branch has errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 2s | HDFS-12943 passed |
| +1 | javadoc | 0m 50s | HDFS-12943 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 5s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 1m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 3s | The patch has no ill-formed XML file. |
| -1 | shadedclient | 2m 27s | patch has errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 12s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 95m 6s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 150m 34s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13609 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12928807/HDFS-13609-HDFS-12943.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 81c7fa97f970 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-12943 / 292ccdc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24485/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24485/testReport/ |
| Max. process+thread count | 3085 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-22 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520734#comment-16520734
 ] 

Erik Krogen commented on HDFS-13609:


Thanks for looking at {{BackupImage}} [~shv]! I had no idea what was happening 
there :) I have removed this extra parameter altogether; now on the QJM 
{{optimizeLatency == inProgressOK}}. This reduced the scope of changes 
significantly.

[~linyiqun], thanks for taking a look! I agree with you on the Precondition 
check; I have incorporated it into v003. For your second comment, it should be 
{{< 0}}. A return value of {{highestTxnCount == 0}} is expected behavior if you 
have read all available edits and continue to request more; this is not an 
error situation, while seeing a value {{< 0}} is an error. Let me know if you 
disagree.
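
To make the distinction between an empty result and an error concrete, here is a minimal sketch. It is not the {{QuorumJournalManager}} code: the class {{TxnCountCheck}}, the method {{highestTxnCount}}, and the convention that an invalid response is recorded as a negative count are assumptions made only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; not part of QuorumJournalManager. */
class TxnCountCheck {

  /**
   * Returns the highest transaction count among the journal responses.
   * Assumption of this sketch: a negative count marks an invalid response.
   */
  static int highestTxnCount(List<Integer> responseCounts) throws IOException {
    if (responseCounts.isEmpty()) {
      throw new IOException("No JournaledEdits responses received");
    }
    List<Integer> sorted = new ArrayList<>(responseCounts);
    Collections.sort(sorted);
    int highest = sorted.get(sorted.size() - 1);
    if (highest < 0) {
      // Every response was invalid: this is the error case.
      throw new IOException(
          "Did not get any valid JournaledEdits responses: " + sorted);
    }
    // 0 is a normal result: the reader has caught up and nothing new exists.
    return highest;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(highestTxnCount(Arrays.asList(0, 0, 0)));
    System.out.println(highestTxnCount(Arrays.asList(0, 3, 5)));
  }
}
```

With this shape, a caller that has read everything simply gets 0 back and can retry later, while only a wholly invalid set of responses raises an exception.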

Uploaded v003 patch incorporating all of [~shv] and [~linyiqun]'s comments.




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-22 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520115#comment-16520115
 ] 

Yiqun Lin commented on HDFS-13609:
--

Hi [~xkrogen], I have recently been learning about *Consistent Reads from Standby Node*. 
I just reviewed this patch and have two comments:
 * It looks like we should do a Precondition check when getting the {{maxTxnsPerRpc}} 
value from the configuration. If an invalid max-txns value is configured (0 or < 0), no 
edits will be returned.

 * 
{code:java}
private void selectRpcInputStreams(Collection<EditLogInputStream> streams,
    long fromTxnId, boolean onlyDurableTxns) throws IOException {
  ...

  int highestTxnCount = responseCounts.get(responseCounts.size() - 1);
  if (LOG.isDebugEnabled() || highestTxnCount < 0) {
    ...
    msg.append(">");
    if (highestTxnCount < 0) {
      throw new IOException("Did not get any valid JournaledEdits " +
          "responses: " + msg);
    } else {
      LOG.debug(msg.toString());
    }
  }
  ...
}
{code}
Is {{highestTxnCount < 0}} accurate here? It seems {{highestTxnCount <= 0}} would be 
right. The txnCount returned by {{JournaledEditsCache#retrieveEdits}} can be 0 
(for example, when requestedStartTxn > highestTxnId).
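
The first point, validating the configured value up front, could look roughly like the sketch below. Hadoop code commonly uses Guava's {{Preconditions.checkArgument}} for this; plain Java is used here only to keep the example self-contained. The class and method names are hypothetical and do not come from the patch.

```java
/** Hypothetical validation of a max-transactions-per-RPC setting. */
class MaxTxnsConfigCheck {

  /** Returns the value unchanged if it is positive; otherwise rejects it. */
  static int validateMaxTxnsPerRpc(int maxTxnsPerRpc) {
    if (maxTxnsPerRpc <= 0) {
      // Fail fast: a value of 0 or less would make every RPC return no edits.
      throw new IllegalArgumentException(
          "Max transactions per RPC must be greater than 0, but was "
              + maxTxnsPerRpc);
    }
    return maxTxnsPerRpc;
  }

  public static void main(String[] args) {
    System.out.println(validateMaxTxnsPerRpc(5000));
    try {
      validateMaxTxnsPerRpc(0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Rejecting the value when the journal manager is constructed surfaces a misconfiguration immediately, instead of as a standby that silently never receives edits.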




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-21 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519923#comment-16519923
 ] 

Konstantin Shvachko commented on HDFS-13609:


Very comprehensive analysis, [~xkrogen].
I was looking at {{BackupImage.tryConvergeJournalSpool()}} and remembered some 
details. The spool in the BackupNode context is an edit log where the BackupNode writes 
transactions received from the NN while checkpointing the image, because it cannot 
apply them to the in-memory state while writing the image. After completing the 
image write, the BN reads the edits it saved; this is called 
{{convergeJournalSpool()}}. Since the BN uses only EditLogFileStreams, the 
optimization parameter should be completely ignored there.
Anyway, I checked and ran {{TestBackupNode}} with both {{optimizeLatency = 
true}} and {{false}}, and it passed.
I think it is safe to use the single parameter in this case.




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-19 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517672#comment-16517672
 ] 

Erik Krogen commented on HDFS-13609:


Thanks for the review [~shv]!
# I agree that this would be much cleaner. In many cases, {{inProgressOk}} will 
be equivalent to {{optimizeLatency}}. However there are a few cases where this 
is not currently true: 
** {{FSEditLog#openForWrite()}} - It is using {{selectInputStreams}} to confirm 
that no one else is writing new transactions. It seems fine to allow this to 
use the RPC mechanism.
** {{BootstrapStandby#checkLogsAvailableForRead()}} - It is confirming that a 
range of transaction IDs are available. Seems fine to allow this to use the RPC 
mechanism.
** {{NameNodeRpcServer#getEventBatchList()}} - Serves ranges of transactions 
for INotify feature. Seems fine (actually, seems desirable) to let this use the 
RPC mechanism. However, on a slightly unrelated note, one portion of this will 
need to be changed to work properly in a read-from-standby environment... Filed 
HDFS-13689 for this.
** {{NameNode#copyEditLogSegmentsToSharedDir()}} - This is only called on 
{{NameNode#initializeSharedEdits()}}, i.e. a separate startup flag for the 
NameNode. I don't think it's necessary to optimize for this situation.
** {{BackupImage#tryConvergeJournalSpool()}} - This code is doing some sketchy 
things and making assumptions about the streams returned that will not be true 
when using the RPC mechanism. We need to prevent this from using the RPC 
mechanism, but given that this is only for the BackupNode, I recommend we avoid 
adding a new API / parameter just for this situation and disable the RPC 
mechanism on the BackupNode entirely. I instead propose that we add a way for 
the BackupNode to disable RPC reads on the {{QuorumJournalManager}}. This could 
take the form of an undocumented config parameter, or, my preference, add a 
static method {{QuorumJournalManager.disableRPCJournalStreams()}} which the 
BackupNode can call.
If you agree that we can handle {{BackupImage}} as I described, I think I can 
remove this new parameter and limit the scope of the change.
# Agreed. I will fix this in the next patch.
# I thought more about why an operator might want to change this config. I 
can imagine situations where I would want to increase it: if 
the RPC response time from the JournalNodes is high and 
the number of transactions per second is very high (say, a very heavy write 
workload). But I can't think of a reason to lower it; this is more about just 
setting a sanity-check upper bound. This makes me think we should (a) raise the 
default limit to 5000 -> even with an RPC round-trip time of 100ms, which is quite 
high, this would allow 50k transactions per second, (b) make it undocumented as 
you described. I will incorporate this into the next patch.
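
The arithmetic behind the 50k figure in point 3 can be checked with a small sketch. It assumes a single tailer issuing one RPC at a time, with every RPC returning a full batch; the class and method names are illustrative only.

```java
/** Back-of-the-envelope bound on edit tailing throughput. */
class TailThroughput {

  /**
   * Upper bound on transactions per second when RPCs are issued one after
   * another and each returns at most maxTxnsPerRpc transactions.
   */
  static long maxTxnsPerSecond(int maxTxnsPerRpc, long rpcRoundTripMillis) {
    if (maxTxnsPerRpc <= 0 || rpcRoundTripMillis <= 0) {
      throw new IllegalArgumentException("Both arguments must be positive");
    }
    return maxTxnsPerRpc * 1000L / rpcRoundTripMillis;
  }

  public static void main(String[] args) {
    // 5000 transactions per RPC at a 100 ms round trip.
    System.out.println(maxTxnsPerSecond(5000, 100)); // prints 50000
  }
}
```

A shorter round trip raises this bound proportionally, which is why the limit acts as a sanity-check ceiling rather than a value that would normally need lowering.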




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-15 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514540#comment-16514540
 ] 

Konstantin Shvachko commented on HDFS-13609:


It looks like the shadedclient failures are also in trunk. I wouldn't worry about 
them here.
 # Looking at the patch I see that a lot of changes are related to adding a new 
parameter {{boolean optimizeLatency}} to 
{{LogsPurgeable.selectInputStreams()}}, which in turn affected the 
{{JournalManager}} interface.
 The parameter is actively used only in {{QuorumJournalManager}}. In all other 
implementations it is ignored. In the {{QuorumJournalManager.selectInputStreams()}} 
implementation you require {{optimizeLatency}} to be the same as 
{{inProgressOk}} except when {{optimizeLatency == false && inProgressOk == 
true}}. But in the latter case {{optimizeLatency}} is ignored. So my main 
question is: can we simply use {{inProgressOk}} as an indicator to optimize for 
latency and drop the {{optimizeLatency}} parameter? This should simplify the 
changes a lot.
 # In {{hdfs-default.xml}}, rephrase "This will also enable tailing of edit logs 
via" -> "This enables tailing of edit logs via". That makes the description clearer.
 # Should {{dfs.ha.tail-edits.qjm.rpc.max-txns}} be a public or an undocumented 
config parameter? I see there is a bunch of "Change with caution" properties in 
{{hdfs-default.xml}}. This is exactly why we keep them undocumented.




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-15 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513921#comment-16513921
 ] 

Erik Krogen commented on HDFS-13609:


It looks like the shadedclient failure is due to the following error while 
running {{common-test-bats-driver}} within {{hadoop-common}}:
{code}
 [exec] not ok 2 hadoop_stop_daemon_force_kill
 [exec] # (in test file hadoop_stop_daemon.bats, line 43)
 [exec] #   `[ -f ${TMP}/pidfile ]' failed
 [exec] # bindir: 
/testptch/hadoop/hadoop-common-project/hadoop-common/src/test/scripts
 [exec] # sh: 0: Can't open 
/testptch/hadoop/hadoop-common-project/hadoop-common/src/test/scripts/process_with_sigterm_trap.sh
{code}
The error is identical across the branch and the patch so it must be unrelated. 
I don't know enough about this {{common-test-bats-driver}} to dig deeper...




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513314#comment-16513314
 ] 

genericqa commented on HDFS-13609:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || HDFS-12943 Compile Tests ||
| +1 | mvninstall | 32m 2s | HDFS-12943 passed |
| +1 | compile | 1m 3s | HDFS-12943 passed |
| +1 | checkstyle | 0m 14s | HDFS-12943 passed |
| +1 | mvnsite | 1m 11s | HDFS-12943 passed |
| -1 | shadedclient | 3m 30s | branch has errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 56s | HDFS-12943 passed |
| +1 | javadoc | 0m 48s | HDFS-12943 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 0m 52s | the patch passed |
| +1 | javac | 0m 52s | the patch passed |
| +1 | checkstyle | 0m 10s | the patch passed |
| +1 | mvnsite | 0m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| -1 | shadedclient | 2m 21s | patch has errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 55s | the patch passed |
| +1 | javadoc | 0m 42s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 75m 18s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 124m 55s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13609 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927913/HDFS-13609-HDFS-12943.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 60ad511fa3dd 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-12943 / 292ccdc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24448/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24448/testReport/ |
| Max. process+thread count | 3591 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-14 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513226#comment-16513226
 ] 

Erik Krogen commented on HDFS-13609:


Noticed two issues with {{TestQuorumJournalManager}}. Uploaded v002 addressing 
these.




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-14 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513219#comment-16513219
 ] 

Erik Krogen commented on HDFS-13609:


Just attached v001 patch applied on top of changes in HDFS-13607 / HDFS-13608. 
Also did some cleanup from v000 patch, and removed Java 8 functionality 
(lambdas / stream). Should be ready for review.




[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-05-23 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487627#comment-16487627
 ] 

Erik Krogen commented on HDFS-13609:


Attaching v000 patch but not marking patch available since this depends on 
HDFS-13607 / HDFS-13608
