[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-23 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511061#comment-17511061
 ] 

Hudson commented on HBASE-26791:


Results for branch branch-2
[build #495 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console 
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495//console].


> Memstore flush fencing issue for SFT
> 
>
> Key: HBASE-26791
> URL: https://issues.apache.org/jira/browse/HBASE-26791
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile
>Affects Versions: 2.6.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> The scenario is the following:
>  # rs1 is flushing a file to S3 for region1
>  # rs1 loses its ZK lock
>  # region1 gets assigned to rs2
>  # rs2 opens region1
>  # rs1 completes the flush and updates the SFT file for region1
>  # rs2 has a different “version” of the SFT file for region1
> The flush should fail at the end, but the SFT file gets overwritten before 
> that, resulting in potential data loss.
>  
> Potential solutions include:
>  * Adding a timestamp to the tracker file names. This, together with creating 
> a new tracker file when an rs opens the region, would allow us to list the 
> available tracker files before an update and compare the timestamps found to 
> the one stored in memory, to verify the store still owns the latest tracker 
> file.
>  * Using the existing timestamp in the tracker file content. This would also 
> require us to create a new tracker file when a new rs opens the region, but 
> instead of listing the available tracker files, we could try to load and 
> de-serialize the last tracker file and compare the timestamp found in it to 
> the one stored in memory.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-22 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510794#comment-17510794
 ] 

Hudson commented on HBASE-26791:


Results for branch master
[build #545 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/545/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/545/General_20Nightly_20Build_20Report/]






(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/545/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/545/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-12 Thread Josh Elser (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505292#comment-17505292
 ] 

Josh Elser commented on HBASE-26791:


{quote}isn't the broader issue here the fact RS1 doesn't abort immediately upon 
the loss of its ZK lock? Shouldn't we rather ensure an RS abort is triggered 
and all ongoing operations (including any hstore flushes) are interrupted right 
away?
{quote}
Yes and no. In normal cases, yeah, we should just be able to interrupt the 
threads and expect them all to exit gracefully. However, once you start to 
consider JVM pauses and the like, we cannot count on one thread in the RS 
noticing that the RS lost its ZK lock, sending an interrupt to all other 
flush/compaction threads, and those threads noticing and acting on it before 
they do any damage.

If we can avoid it another way, there's value in that.

> Memstore flush fencing issue for SFT
> 
>
> Key: HBASE-26791
> URL: https://issues.apache.org/jira/browse/HBASE-26791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Duo Zhang
>Priority: Major


[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-10 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504364#comment-17504364
 ] 

Duo Zhang commented on HBASE-26791:
---

Will provide a PR soon.



[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-10 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504104#comment-17504104
 ] 

Duo Zhang commented on HBASE-26791:
---

{quote}
Using the existing timestamp in the tracker file content. This would also 
require us to create a new tracker file when a new rs opens the region, but 
instead of listing the available tracker files, we could try to load and 
de-serialize the last tracker file and compare the timestamp found in it to the 
one stored in memory.
{quote}

This approach does not solve the problem when two region servers want to write 
to the same file. Both region servers could load the timestamp, conclude that 
they are allowed to write the file, and then both write it. The final result 
depends on the file system implementation, and I do not think we should rely 
on that...
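
The race described here can be illustrated with a minimal sketch. It is not HBase code: `FakeFs`, `Writer`, the path `region1/f1` and the dictionary layout are hypothetical stand-ins, and the file system is assumed to be a plain last-writer-wins store. The interleaving is fixed by hand so that both servers run their timestamp check before either one writes.

```python
# Hypothetical in-memory stand-in for a file system; not HBase code.
class FakeFs:
    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, content):
        # Plain overwrite: the last writer wins (an assumption of this sketch).
        self.files[path] = content


class Writer:
    def __init__(self, name, fs, path):
        self.name = name
        self.fs = fs
        self.path = path
        # Timestamp kept in memory from when the tracker file was loaded.
        self.known_ts = fs.read(path)["ts"]

    def check(self):
        # Step 1: load the tracker file and compare timestamps.
        return self.fs.read(self.path)["ts"] == self.known_ts

    def commit(self, store_files, new_ts):
        # Step 2: overwrite the tracker file. Nothing ties this to step 1.
        self.fs.write(self.path, {"ts": new_ts, "files": store_files})
        self.known_ts = new_ts


fs = FakeFs()
fs.write("region1/f1", {"ts": 100, "files": ["a"]})

rs1 = Writer("rs1", fs, "region1/f1")
rs2 = Writer("rs2", fs, "region1/f1")

# Both servers check before either of them writes.
rs1_ok = rs1.check()
rs2_ok = rs2.check()
if rs2_ok:
    rs2.commit(["a", "b"], new_ts=101)
if rs1_ok:
    rs1.commit(["a", "c"], new_ts=102)

final = fs.read("region1/f1")
print(rs1_ok, rs2_ok, final["files"])  # prints: True True ['a', 'c']
```

Both checks pass, and the store file `b` recorded by rs2 disappears from the tracker file. Comparing a timestamp and then writing are two separate steps, so the check alone cannot act as a fence unless the file system offers an atomic conditional write.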

> Memstore flush fencing issue for SFT
> 
>
> Key: HBASE-26791
> URL: https://issues.apache.org/jira/browse/HBASE-26791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Priority: Major


[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-10 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504083#comment-17504083
 ] 

Duo Zhang commented on HBASE-26791:
---

{quote}
Whilst the proposed solutions would handle the pitfalls of File based SFT impl, 
isn't the broader issue here the fact RS1 doesn't abort immediately upon the 
loss of its ZK lock? Shouldn't we rather ensure an RS abort is triggered and 
all ongoing operations (including any hstore flushes) are interrupted right 
away?
{quote}

We have already handled this in the past. We write a compaction marker to the 
WAL before actually deleting any store files, so if an rs is dead, it will fail 
at this step and give up deleting the store files. If it fails after writing 
the compaction marker, the new region server will read the compaction marker 
and finish the compaction, i.e., delete the old store files. This does not 
introduce any problems either: the old region server will only notice, when it 
tries to delete them, that the store files have already been deleted. There 
were some corner cases where we could not read the compaction marker when 
opening a region; HBASE-20724 solved that problem.
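
As a rough illustration of the marker-before-delete ordering described above, here is a minimal sketch. It is not the HBase implementation: `Wal`, `FencedError`, `compact` and `replay_on_open` are hypothetical names, and the WAL is assumed to reject appends from any server that is no longer its owner.

```python
class FencedError(Exception):
    """Raised when a server that no longer owns the WAL tries to append."""


class Wal:
    # Hypothetical WAL: only the current owner may append.
    def __init__(self, owner):
        self.owner = owner
        self.entries = []

    def append(self, server, entry):
        if server != self.owner:
            raise FencedError(server)
        self.entries.append(entry)


def compact(server, wal, files, inputs, output, crash_after_marker=False):
    """Write the compaction marker first; delete inputs only afterwards."""
    try:
        wal.append(server, ("compaction", frozenset(inputs), output))
    except FencedError:
        return "gave up"      # dead server: no store file is deleted
    if crash_after_marker:
        return "crashed"      # marker is durable, inputs still present
    files.difference_update(inputs)
    return "done"


def replay_on_open(wal_entries, files):
    """The new server finishes any compaction whose marker it finds."""
    for kind, inputs, output in wal_entries:
        if kind == "compaction" and output in files:
            files.difference_update(inputs)


# Case 1: rs1 was fenced before it could write the marker.
files1 = {"hf1", "hf2", "hf3"}          # hf3 is the compaction output
wal1 = Wal(owner="rs2")                 # ownership already moved to rs2
result1 = compact("rs1", wal1, files1, {"hf1", "hf2"}, "hf3")

# Case 2: rs1 wrote the marker, then died before deleting the inputs.
files2 = {"hf1", "hf2", "hf3"}
wal2 = Wal(owner="rs1")
result2 = compact("rs1", wal2, files2, {"hf1", "hf2"}, "hf3",
                  crash_after_marker=True)
replay_on_open(wal2.entries, files2)

print(result1, sorted(files1))  # prints: gave up ['hf1', 'hf2', 'hf3']
print(result2, sorted(files2))  # prints: crashed ['hf3']
```

In both cases the set of store files ends up in a consistent state: either nothing was deleted, or the new server completed the deletion on behalf of the old one.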

File based SFT is actually a different problem. In the past, the only problem 
was that the old region server might delete store files, so the solution was 
either to prevent the RS from deleting the files, or to delete them ourselves. 
But in the current file based SFT implementation, we always overwrite the two 
track files (to avoid listing), so the 'dead' region server could mess up the 
track files and cause problems.

That's why I proposed earlier that the new region server should not reuse the 
old track files. Then the old region server will not cause any real problems 
even if it writes the track files.





[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-09 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503704#comment-17503704
 ] 

Duo Zhang commented on HBASE-26791:
---

I've talked with [~elserj] on Slack about this.

If we always overwrite the same set of track files, I do not think there is any 
way to fix this problem.

So I propose we solve the problem this way:

1. Include a timestamp/sequenceid in the track file name, which means that when 
opening a region, we need to list the track file directory (sad) to find the 
newest track file and load it.
2. To avoid generating too many track files, we only bump the 
timestamp/sequenceid when opening a region. So the open region steps will be:
  a. List the track file directory and load the newest track file. If there are 
two files with the same timestamp/sequenceid, compare the timestamps stored in 
the file content, just as we did before.
  b. Bump the timestamp/sequenceid to a value greater than the loaded 
timestamp/sequenceid, and use it in the new track file names.

In this way, the old rs will only overwrite the track files with the old 
timestamp/sequenceid, so it will not affect the new track files, and the 
problem is solved.

Note that the track file names will be something like f1-12345.filelist and 
f2-12345.filelist.
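
The open-region steps above can be sketched as follows. This is only an illustration of the proposal, not the eventual patch: the `Tracker` class, the dict used as a directory, the exact name pattern and the alternation between the `f1` and `f2` slots are all assumptions made for the example.

```python
import re

# Name pattern assumed from the example names in the comment.
TRACK_FILE = re.compile(r"^(f[12])-(\d+)\.filelist$")


class Tracker:
    def __init__(self, fs):
        self.fs = fs          # dict: file name -> {"ts": int, "files": list}
        self.seq_id = None
        self.files = []
        self.slot = 0         # alternate between the f1 and f2 slots

    def open_region(self):
        # Step a: list the directory and pick the newest track file,
        # ordering by sequence id first and by the stored timestamp second.
        candidates = []
        for name, content in self.fs.items():
            match = TRACK_FILE.match(name)
            if match:
                candidates.append((int(match.group(2)), content["ts"], name))
        newest_seq = 0
        if candidates:
            newest_seq, _, name = max(candidates)
            self.files = list(self.fs[name]["files"])
        # Step b: bump the sequence id past everything that was found.
        self.seq_id = newest_seq + 1

    def update(self, store_files, ts):
        # Only ever touches track files of this server's own generation.
        name = f"f{self.slot + 1}-{self.seq_id}.filelist"
        self.fs[name] = {"ts": ts, "files": list(store_files)}
        self.files = list(store_files)
        self.slot = 1 - self.slot


fs = {}
rs1 = Tracker(fs)
rs1.open_region()                 # seq id 1
rs1.update(["a"], ts=10)          # writes f1-1.filelist

rs2 = Tracker(fs)
rs2.open_region()                 # loads f1-1.filelist, bumps seq id to 2
rs2.update(["a", "b"], ts=20)     # writes f1-2.filelist

rs1.update(["a", "c"], ts=30)     # stale rs1 writes f2-1.filelist

rs3 = Tracker(fs)
rs3.open_region()
print(rs3.seq_id, rs3.files)      # prints: 3 ['a', 'b']
```

The late write from rs1 lands in a file with the old sequence id, so the next open still loads the list written by rs2, even though the stale file carries a newer timestamp in its content.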



[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-09 Thread Josh Elser (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503647#comment-17503647
 ] 

Josh Elser commented on HBASE-26791:


ICYMI [~zhangduo] 



[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-09 Thread Wellington Chevreuil (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503641#comment-17503641
 ] 

Wellington Chevreuil commented on HBASE-26791:
--

Whilst the proposed solutions would handle the pitfalls of File based SFT impl, 
isn't the broader issue here the fact RS1 doesn't abort immediately upon the 
loss of its ZK lock? Shouldn't we rather ensure an RS abort is triggered and 
all ongoing operations (including any hstore flushes) are interrupted right 
away?
