[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT
[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511061#comment-17511061 ]

Hudson commented on HBASE-26791:

Results for branch branch-2
[build #495 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/]:
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(x) {color:red}-1 source release artifact{color}
-- See build output for details.

(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/495//console].

> Memstore flush fencing issue for SFT
>
> Key: HBASE-26791
> URL: https://issues.apache.org/jira/browse/HBASE-26791
> Project: HBase
> Issue Type: Sub-task
> Components: HFile
> Affects Versions: 2.6.0, 3.0.0-alpha-3
> Reporter: Szabolcs Bukros
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
> The scenario is the following:
> # rs1 is flushing file to S3 for region1
> # rs1 loses ZK lock
> # region1 gets assigned to rs2
> # rs2 opens region1
> # rs1 completes flush and updates sft file for region1
> # rs2 has a different “version” of the sft file for region1
> The flush should fail at the end, but the SFT file gets overwritten before
> that, resulting in potential data loss.
>
> Potential solutions include:
> * Adding a timestamp to the tracker file names. This, together with creating
> a new tracker file when an rs opens the region, would allow us to list the
> available tracker files before an update and compare the found timestamps to
> the one stored in memory to verify the store still owns the latest tracker file.
> * Using the existing timestamp in the tracker file content. This would also
> require us to create a new tracker file when a new rs opens the region, but
> instead of listing the available tracker files, we could try to load and
> de-serialize the last tracker file and compare the timestamp found in it to
> the one stored in memory.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT
[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510794#comment-17510794 ]

Hudson commented on HBASE-26791:

Results for branch master
[build #545 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/545/]:
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/545/General_20Nightly_20Build_20Report/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/545/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/545/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT
[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505292#comment-17505292 ]

Josh Elser commented on HBASE-26791:

{quote}isn't the broader issue here the fact RS1 doesn't abort immediately upon the loss of its ZK lock? Shouldn't we rather ensure an RS abort is triggered and all ongoing operations (including any hstore flushes) are interrupted right away?
{quote}
Yes and no. In normal cases, yeah, we should just be able to interrupt the threads and expect them all to exit gracefully. However, when you start to consider JVM pauses and the like, there is no guarantee that one thread in the RS notices that we lost the RS lock, sends an interrupt to all other flush/compaction threads, and that those threads then notice and act on it in time. If we can avoid the problem another way, there's value in that.

[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT
[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504364#comment-17504364 ]

Duo Zhang commented on HBASE-26791:

Will provide a PR soon.

[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT
[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504104#comment-17504104 ]

Duo Zhang commented on HBASE-26791:

{quote}
Using the existing timestamp in the tracker file content. This would also require us to create a new tracker file when a new rs opens the region, but instead of listing the available tracker files, we could try to load and de-serialize the last tracker file and compare the timestamp found in it to the one stored in memory.
{quote}
This does not solve the problem when two region servers want to write to the same file. Both region servers could load the timestamp, each conclude that it is allowed to write the file, and then both write it. Depending on the file system implementation, the final result could differ, so I do not think we should rely on this...

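The race described in this comment is a check-then-write problem. The following is a minimal in-memory sketch of one unlucky interleaving, not HBase code: the class `CheckThenWriteRace`, the `TrackerFile` stand-in, and the timestamps and file names are all hypothetical and chosen only for illustration. It assumes a store where a write is an unconditional overwrite.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative in-memory model; all names here are hypothetical, not HBase APIs. */
final class CheckThenWriteRace {

  /** Stand-in for a tracker file that is always overwritten in place. */
  static final class TrackerFile {
    long timestamp;
    List<String> storeFiles;

    TrackerFile(long timestamp, List<String> storeFiles) {
      this.timestamp = timestamp;
      this.storeFiles = storeFiles;
    }
  }

  /** The check: does the stored timestamp still match what this server remembers? */
  static boolean stillOwns(TrackerFile file, long rememberedTimestamp) {
    return file.timestamp == rememberedTimestamp;
  }

  /** The write: an unconditional overwrite, with no compare-and-swap. */
  static void overwrite(TrackerFile file, long timestamp, List<String> storeFiles) {
    file.timestamp = timestamp;
    file.storeFiles = storeFiles;
  }

  /** Replays one unlucky interleaving and returns the final tracker content. */
  static TrackerFile replay() {
    TrackerFile file = new TrackerFile(100L, Arrays.asList("hfile-a"));
    long rs1Remembered = 100L;

    // rs1 checks ownership, then stalls (for example in a long JVM pause).
    boolean rs1CheckPassed = stillOwns(file, rs1Remembered);

    // Meanwhile rs2 opens the region and writes its own version of the file.
    overwrite(file, 200L, Arrays.asList("hfile-a", "hfile-rs2"));

    // rs1 resumes and acts on its stale check result, clobbering rs2's write.
    if (rs1CheckPassed) {
      overwrite(file, 101L, Arrays.asList("hfile-a", "hfile-rs1"));
    }
    return file;
  }

  public static void main(String[] args) {
    TrackerFile result = replay();
    System.out.println(result.timestamp + " " + result.storeFiles);
  }
}
```

In this interleaving the entry written by rs2 is gone from the final content, which is the outcome the comment warns about. Whether a real file system behaves this way depends on its overwrite semantics.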
[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT
[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504083#comment-17504083 ]

Duo Zhang commented on HBASE-26791:

{quote}
Whilst the proposed solutions would handle the pitfalls of File based SFT impl, isn't the broader issue here the fact RS1 doesn't abort immediately upon the loss of its ZK lock? Shouldn't we rather ensure an RS abort is triggered and all ongoing operations (including any hstore flushes) are interrupted right away?
{quote}
We have already handled this in the past. We write a compaction marker to the WAL before actually deleting any store files, so if an RS is dead, it will fail at this step and give up deleting the store files. If it fails after writing the compaction marker, the new region server will read the compaction marker and finish the compaction, i.e., delete the old store files. That does not introduce any problems either: the old region server will only notice, when it tries to delete, that the store files have already been deleted. There were some corner cases where we could not read the compaction marker when opening a region; HBASE-20724 solved that problem.

For file based SFT, it is actually a different problem. In the past, the only problem was that the old region server might delete the store files, so the solution was either to make sure the RS cannot delete the files, or to delete them ourselves. But in the current file based SFT implementation, we always overwrite the two track files (to avoid listing), so the 'dead' region server could mess up the track files and cause problems. That's why I proposed above that the new region server simply does not reuse the old track files; then the old region server will not introduce any real problems even if it writes the track files.

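The idea that a new region server never reuses the old track files can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `GenerationFencing`, the single `f1-<generation>.filelist` slot, and the generation numbers 100 and 200 are hypothetical and do not reflect the actual HBase implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative in-memory model; all names here are hypothetical, not HBase APIs. */
final class GenerationFencing {
  private static final String PREFIX = "f1-";
  private static final String SUFFIX = ".filelist";

  // Stand-in for the track file directory: file name -> content.
  private final Map<String, String> directory = new HashMap<>();

  /** A server only ever writes the file named after the generation it opened the region with. */
  void write(long generation, String content) {
    directory.put(PREFIX + generation + SUFFIX, content);
  }

  /** Opening the region loads the file with the highest generation in its name. */
  String loadNewest() {
    long newest = -1L;
    String content = null;
    for (Map.Entry<String, String> entry : directory.entrySet()) {
      String name = entry.getKey();
      long generation =
          Long.parseLong(name.substring(PREFIX.length(), name.length() - SUFFIX.length()));
      if (generation > newest) {
        newest = generation;
        content = entry.getValue();
      }
    }
    return content;
  }

  /** Replays the scenario from the issue and returns what a later open would load. */
  static String replay() {
    GenerationFencing store = new GenerationFencing();
    store.write(100L, "hfile-a");            // rs1 owns generation 100
    store.write(200L, "hfile-a,hfile-rs2");  // rs2 opens the region and uses generation 200
    store.write(100L, "hfile-a,hfile-rs1");  // stale rs1 finishes its flush late
    return store.loadNewest();
  }

  public static void main(String[] args) {
    System.out.println(replay());
  }
}
```

The late write from rs1 only touches the file of its own, older generation, so a later open still loads what rs2 wrote.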
[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT
[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503704#comment-17503704 ]

Duo Zhang commented on HBASE-26791:

I've talked with [~elserj] on slack about this.

If we always overwrite the same set of track files, I do not think there is a possible way to fix this problem. So I propose we solve the problem in this way:

1. Include a timestamp/sequenceid in the track file name, which means that when opening a region, we need to list the track file directory (sad) to find the newest track file and load it.
2. To avoid generating too many track files, we only need to bump the timestamp/sequenceid when opening a region.

So the open region steps will be:

a. List the track file directory and load the newest track file. If there are two files with the same timestamp/sequenceid, compare the timestamps stored in the file content, just as we have done before.
b. Bump the timestamp/sequenceid to a value greater than the loaded timestamp/sequenceid, and use this timestamp/sequenceid in the new track file names.

In this way, the old rs will only overwrite the track files with the old timestamp/sequenceid, so it will not affect the new track files, and the problem is solved.

Notice that the track file names will be something like f1-12345.filelist and f2-12345.filelist.

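The open-region steps proposed in this comment (list, pick the newest, bump) can be sketched as below. This is a sketch under assumptions rather than the eventual patch: the helper class `TrackFileNames`, the exact `f1-<id>.filelist` / `f2-<id>.filelist` pattern, and the choice to bump to the larger of the current time and the loaded id plus one are all illustrative.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper; not part of HBase. */
final class TrackFileNames {
  // Assumed naming scheme: f1-<id>.filelist and f2-<id>.filelist.
  private static final Pattern NAME = Pattern.compile("f([12])-(\\d+)\\.filelist");

  /** Step a: find the highest timestamp/sequenceid among the listed file names, if any. */
  static OptionalLong newestSeqId(List<String> names) {
    OptionalLong newest = OptionalLong.empty();
    for (String name : names) {
      Matcher m = NAME.matcher(name);
      if (!m.matches()) {
        continue; // ignore anything that is not a track file
      }
      long seqId = Long.parseLong(m.group(2));
      if (!newest.isPresent() || seqId > newest.getAsLong()) {
        newest = OptionalLong.of(seqId);
      }
    }
    return newest;
  }

  /** Step b: pick an id strictly greater than the loaded one (assumption: prefer wall-clock time). */
  static long nextSeqId(OptionalLong loaded, long nowMillis) {
    return loaded.isPresent() ? Math.max(nowMillis, loaded.getAsLong() + 1) : nowMillis;
  }

  /** Builds the name of one of the two track files for the given id. */
  static String trackFileName(int slot, long seqId) {
    return "f" + slot + "-" + seqId + ".filelist";
  }

  public static void main(String[] args) {
    List<String> listed =
        java.util.Arrays.asList("f1-12000.filelist", "f1-12345.filelist", "f2-12345.filelist");
    OptionalLong loaded = newestSeqId(listed);
    long next = nextSeqId(loaded, System.currentTimeMillis());
    System.out.println(trackFileName(1, next) + " " + trackFileName(2, next));
  }
}
```

Because the new names always carry a larger id than anything found on disk, a region server that opened the region earlier keeps writing under its old id and cannot touch the new files. The tie-break on timestamps stored in the file content (step a) is left out of this sketch.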
[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT
[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503647#comment-17503647 ]

Josh Elser commented on HBASE-26791:

ICYMI [~zhangduo]

[jira] [Commented] (HBASE-26791) Memstore flush fencing issue for SFT
[ https://issues.apache.org/jira/browse/HBASE-26791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503641#comment-17503641 ]

Wellington Chevreuil commented on HBASE-26791:

Whilst the proposed solutions would handle the pitfalls of File based SFT impl, isn't the broader issue here the fact RS1 doesn't abort immediately upon the loss of its ZK lock? Shouldn't we rather ensure an RS abort is triggered and all ongoing operations (including any hstore flushes) are interrupted right away?