[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-05-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531016#comment-17531016
 ] 

袁枫 commented on HDFS-16493:
---

I am confused by this code:
org/apache/hadoop/hdfs/qjournal/client/SegmentRecoveryComparator.java:86
{code:java}
return ComparisonChain.start()
    .compare(r1SeenEpoch, r2SeenEpoch)
    .compare(r1.getSegmentState().getEndTxId(),
        r2.getSegmentState().getEndTxId())
    .result();
{code}

Why does recovery pick the longest log in round 1? If it works this way, won't it pick jn3's log to sync from?
Why not use the quorum length instead? Do you know, [~liutongwei]?
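
For readers following the question, here is a minimal sketch of what the quoted ordering does, written with plain `java.util.Comparator` instead of Guava's `ComparisonChain`. `RecoveryResponse`, `RecoveryChoiceSketch`, and `best` are hypothetical names used only for illustration; they are not the real QJM classes, and the real comparator may apply other checks before these two.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for one JournalNode's recovery response.
final class RecoveryResponse {
    final String journalNode;
    final long seenEpoch;
    final long endTxId;

    RecoveryResponse(String journalNode, long seenEpoch, long endTxId) {
        this.journalNode = journalNode;
        this.seenEpoch = seenEpoch;
        this.endTxId = endTxId;
    }
}

final class RecoveryChoiceSketch {
    // Same two keys as the quoted chain: seen epoch first, then end txid.
    static final Comparator<RecoveryResponse> ORDER =
        Comparator.comparingLong((RecoveryResponse r) -> r.seenEpoch)
                  .thenComparingLong(r -> r.endTxId);

    // The "best" response is the maximum under that ordering.
    static RecoveryResponse best(List<RecoveryResponse> responses) {
        return Collections.max(responses, ORDER);
    }

    public static void main(String[] args) {
        RecoveryResponse jn1 = new RecoveryResponse("jn1", 1, 3);
        RecoveryResponse jn2 = new RecoveryResponse("jn2", 1, 3);
        RecoveryResponse jn3 = new RecoveryResponse("jn3", 1, 4);

        // All three reachable: the longer jn3 log wins within the same epoch.
        System.out.println(best(Arrays.asList(jn1, jn2, jn3)).journalNode);
        // jn3 partitioned away: only logs ending at txid 3 are visible.
        System.out.println(best(Arrays.asList(jn1, jn2)).endTxId);
    }
}
```

In this toy model the writer only picks jn3's log if jn3 answers; when jn3 is unreachable, as in the description, recovery ends at txid 3.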


> [SBN Read]When fast path tail enabled, standby or observer namenode may read 
> uncommitted data
> -
>
> Key: HDFS-16493
> URL: https://issues.apache.org/jira/browse/HDFS-16493
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, namanode
>Reporter: liutongwei
>Priority: Critical
> Attachments: exapmle.v1.patch
>
>
> Although fast path tail uses a quorum read to pull edit logs, it seems that it can 
> read uncommitted data in some corner cases.
> Here is an example. Suppose we have three JNs whose initial state is:
>  
> {code:java}
> epoch 1
> JN1 [1-3](in-progress)
> JN2 [1-3](in-progress)
> JN3 [1-4](in-progress)
> Note that in epoch 1, txids 1-3 were committed and txid 4 was not.
> {code}
> When a failover occurs, suppose the new writer cannot contact JN3 because of a network 
> partition, finishes the recovery stage, and writes a new txid 4 in epoch 2 
> whose value differs from JN3's.
>  
> {code:java}
> epoch 2
> JN1 [1-3](finalized) [4-4](in-progress)
> JN2 [1-3](finalized) [4-4](in-progress)
> JN3 [1-4](in-progress)
> Note that on JN3, the value of txid 4 differs from the other JNs.
> {code}
>  
> Now a reading namenode pulls edits and contacts JN3 and JN2, so it 
> gets a majority response. But the two logs have the same length and different 
> content, and there is no further information to tell which log is right. If we choose 
> JN3, we get metadata corruption.
> There is a test example patch [^example.patch] for running and debugging.
> To fix it, I think we should add the finalized state to 
> {{GetJournaledEditsResponseProto}}, so we can discard the faulty log.
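
The scenario in the description above can be reproduced in a toy model. This is only a sketch under stated assumptions: each reply is modeled as a list of edit payloads starting at txid 1, and `FastPathTailSketch` and `firstDivergentTxId` are hypothetical names, not part of the Hadoop code base or the attached patch. It shows that two replies from a majority can have the same length while disagreeing at txid 4, so a length-only comparison cannot tell them apart.

```java
import java.util.Arrays;
import java.util.List;

final class FastPathTailSketch {
    // Returns the first txid at which two replies disagree,
    // or -1 if they agree on their common prefix.
    // a.get(i) and b.get(i) both hold the payload of txid startTxId + i.
    static long firstDivergentTxId(long startTxId, List<String> a, List<String> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return startTxId + i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Epoch 2 state from the description: JN2 holds the new writer's txid 4,
        // JN3 still holds the uncommitted txid 4 from epoch 1.
        List<String> jn2 = Arrays.asList("op1", "op2", "op3", "op4-epoch2");
        List<String> jn3 = Arrays.asList("op1", "op2", "op3", "op4-epoch1");

        // Same length, so a length-only check sees agreement.
        System.out.println(jn2.size() == jn3.size());      // prints "true"
        // The contents nevertheless diverge at txid 4.
        System.out.println(firstDivergentTxId(1, jn2, jn3)); // prints "4"
    }
}
```

The sketch only detects the conflict. Deciding which side to discard needs extra state in the reply, which is what the proposal to add the finalized state is meant to provide.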



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-04-12 Thread liutongwei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521419#comment-17521419
 ] 

liutongwei commented on HDFS-16493:
---

[~Feng Yuan] Sorry for the mistake. I did not recheck the code when I created the patch. 
I have uploaded a new version.







[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-04-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521131#comment-17521131
 ] 

袁枫 commented on HDFS-16493:
---

[~liutongwei]
 Is there a mistake here?
1. 
{code:java}
failLoggerAtTxn(spies.get(1), 4);
failLoggerAtTxn(spies.get(2), 4);
{code}
Do these indicate jn2 and jn3?







[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-03-06 Thread liutongwei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502070#comment-17502070
 ] 

liutongwei commented on HDFS-16493:
---

Thanks [~xkrogen] for replying. 

 
{quote}Thanks for reporting liutongwei! I guess this is a continuation of your 
comment on HDFS-13150, is that correct?
{quote}
Yes. While I was reviewing the journalnode source code to track down 
[HDFS-16490|https://issues.apache.org/jira/browse/HDFS-16490], my doubts about 
fast path tail resurfaced.

After adding the test case [^example.patch], it seems that fast path tail does not return 
the same edit logs as the original tail process. 

 

[~shv], what's your opinion on this issue?







[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-03-04 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501435#comment-17501435
 ] 

Erik Krogen commented on HDFS-16493:


Thanks for reporting [~liutongwei]! I guess this is a continuation of [your 
comment on 
HDFS-13150|https://issues.apache.org/jira/browse/HDFS-13150?focusedCommentId=17408479=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17408479],
 is that correct?

As I said there, I don't personally have the bandwidth to dig deep into this, but 
from your detailed explanation, it does seem to be a valid issue. I will let 
[~shv] take a closer look.







[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data

2022-03-03 Thread liutongwei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501172#comment-17501172
 ] 

liutongwei commented on HDFS-16493:
---

[~shv] [~xkrogen], I have added a test case for the concern mentioned in 
https://issues.apache.org/jira/browse/HDFS-13150?focusedCommentId=17408479=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17408479.



