Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]

2024-03-13 Thread via GitHub


Apache-HBase commented on PR #5713:
URL: https://github.com/apache/hbase/pull/5713#issuecomment-1996235622

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 35s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 10s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 51s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 20s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 55s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 51s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 51s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 17s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 232m  3s |  hbase-server in the patch passed.  
|
   |  |   | 256m 39s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5713 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 5d75f4d268c7 5.4.0-172-generic #190-Ubuntu SMP Fri Feb 2 
23:24:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / beafd33261 |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/testReport/
 |
   | Max. process+thread count | 4941 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]

2024-03-13 Thread via GitHub


Apache-HBase commented on PR #5713:
URL: https://github.com/apache/hbase/pull/5713#issuecomment-1996228026

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 43s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 52s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 38s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 26s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 39s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 36s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 223m 47s |  hbase-server in the patch passed.  
|
   |  |   | 247m  8s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5713 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux d9f3fe995387 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / beafd33261 |
   | Default Java | Temurin-1.8.0_352-b08 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/testReport/
 |
   | Max. process+thread count | 5683 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]

2024-03-13 Thread via GitHub


Apache-HBase commented on PR #5713:
URL: https://github.com/apache/hbase/pull/5713#issuecomment-1996213403

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 25s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 50s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 28s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 47s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 51s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 51s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 29s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 204m 26s |  hbase-server in the patch passed.  
|
   |  |   | 228m 18s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5713 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux bbf01b6c7a25 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / beafd33261 |
   | Default Java | Eclipse Adoptium-17.0.10+7 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/testReport/
 |
   | Max. process+thread count | 5324 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (HBASE-28385) Quota estimates are too optimistic for large scans

2024-03-13 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28385.
---
Fix Version/s: 3.0.0-beta-2
 Release Note: When hbase.quota.use.result.size.bytes is false, we will now 
estimate the amount of quota to grab for a scan based on the block bytes 
scanned by previous next() requests. This will increase throughput for large 
scans, which might prefer to wait a little longer for a larger portion of the 
quota.
   Resolution: Fixed

> Quota estimates are too optimistic for large scans
> --
>
> Key: HBASE-28385
> URL: https://issues.apache.org/jira/browse/HBASE-28385
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ray Mattingly
>Assignee: Ray Mattingly
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> Let's say you're running a table scan with a throttle of 100MB/sec per 
> RegionServer. Ideally your scans are going to pull down large results, often 
> containing hundreds or thousands of blocks.
> HBase will estimate each scan request as costing a single block of read 
> capacity, and if your quota is already exhausted then the server will 
> evaluate the backoff required for your estimated consumption (1 block) to 
> become available. This will often be ~1ms, causing your retries to be 
> effectively immediate.
> Obviously it will routinely take much longer than 1ms for 100MB of IO to 
> become available in the given configuration, so your retries will be destined 
> to fail. At worst this can saturate your server's RPC layer, and at best it 
> causes erroneous exhaustion of the client's retries.
> We should find a way to make these estimates a bit smarter for large scans.
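
A minimal sketch of the estimation idea described above, with hypothetical 
names (ScanQuotaEstimator, recordNextCall, estimateNextCost) that are not the 
actual HBASE-28385 API: size the quota grab for the next scan RPC from the 
block bytes scanned by previous next() calls, instead of assuming one block.

{code:java}
// Illustrative sketch only; class and method names are hypothetical.
public class ScanQuotaEstimator {
  // With no history yet, fall back to assuming a single 64 KB block.
  private static final long ONE_BLOCK_BYTES = 64 * 1024;
  private long totalBlockBytesScanned = 0;
  private long nextCalls = 0;

  /** Record the block bytes scanned by a completed next() request. */
  public void recordNextCall(long blockBytesScanned) {
    totalBlockBytesScanned += blockBytesScanned;
    nextCalls++;
  }

  /** Estimate the next request's cost as the mean of past requests. */
  public long estimateNextCost() {
    return nextCalls == 0 ? ONE_BLOCK_BYTES : totalBlockBytesScanned / nextCalls;
  }
}
{code}

With an estimate like this, a scan that has been reading ~100MB of blocks per 
next() waits for ~100MB of quota to become available, rather than retrying 
after ~1ms and failing again.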



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]

2024-03-13 Thread via GitHub


bbeaudreault merged PR #5713:
URL: https://github.com/apache/hbase/pull/5713


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]

2024-03-13 Thread via GitHub


Apache-HBase commented on PR #5713:
URL: https://github.com/apache/hbase/pull/5713#issuecomment-1995922218

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 37s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files 
found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any 
anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any 
@author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 57s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 28s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 34s |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 42s |  branch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   1m 33s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 46s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 29s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 29s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 37s |  hbase-server: The patch 
generated 0 new + 9 unchanged - 1 fixed = 9 total (was 10)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace 
issues.  |
   | +1 :green_heart: |  hadoopcheck  |   4m 48s |  Patch does not cause any 
errors with Hadoop 3.3.6.  |
   | +1 :green_heart: |  spotless  |   0m 43s |  patch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   1m 39s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 10s |  The patch does not generate 
ASF License warnings.  |
   |  |   |  28m 13s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5713 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti 
spotless checkstyle compile |
   | uname | Linux 50f02278a732 5.4.0-169-generic #187-Ubuntu SMP Thu Nov 23 
14:52:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / beafd33261 |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | Max. process+thread count | 77 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/console 
|
   | versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HBASE-28441) Update downloads.xml for 2.5.8

2024-03-13 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28441:
---

 Summary: Update downloads.xml for 2.5.8
 Key: HBASE-28441
 URL: https://issues.apache.org/jira/browse/HBASE-28441
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28441) Update downloads.xml for 2.5.8

2024-03-13 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28441.
-
Resolution: Fixed

> Update downloads.xml for 2.5.8
> --
>
> Key: HBASE-28441
> URL: https://issues.apache.org/jira/browse/HBASE-28441
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28260) Possible data loss in WAL after RegionServer crash

2024-03-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826851#comment-17826851
 ] 

Hudson commented on HBASE-28260:


Results for branch branch-2.5
[build #496 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/General%20Nightly%20Build%20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/JDK8%20Nightly%20Build%20Report%20%28Hadoop2%29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/JDK8%20Nightly%20Build%20Report%20%28Hadoop3%29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/JDK11%20Nightly%20Build%20Report%20%28Hadoop3%29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Possible data loss in WAL after RegionServer crash
> --
>
> Key: HBASE-28260
> URL: https://issues.apache.org/jira/browse/HBASE-28260
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Assignee: Charles Connell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2, 2.5.9
>
>
> We recently had a production incident:
>  # RegionServer crashes, but local DataNode lives on
>  # WAL lease recovery kicks in
>  # Namenode reconstructs the block during lease recovery (which results in a 
> new genstamp). It chooses the replica on the local DataNode as the primary.
>  # Local DataNode reconstructs the block, so NameNode registers the new 
> genstamp.
>  # Local DataNode and the underlying host dies, before the new block could be 
> replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no 
> replicas. The old replicas still remain, but are considered corrupt due to 
> GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks was 
> identical to the newly constructed and lost block. Further, the file in 
> question was only 1 block. So we downloaded one of those corrupt block files 
> and used {{hdfs dfs -put -f}} to force that block to replace the file in 
> hdfs. So in this case we had no actual data loss, but it could have happened 
> easily if the file was more than 1 block or the replicas weren't fully in 
> sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the 
> local datanode. We can use CreateFlag.NO_LOCAL_WRITE for this. Hat tip to 
> [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from 
> the local datanode, but avoiding writing there altogether would be better.
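
A minimal sketch of the CreateFlag usage described above, assuming an HDFS 
filesystem; the path, buffer/replication/block-size values, and class name are 
placeholders, not the actual HBase WAL writer code.

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NoLocalWalWriteExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path walPath = new Path("/hbase/WALs/example-wal"); // placeholder path
    // NO_LOCAL_WRITE is a hint asking the NameNode not to place a replica
    // on the writer's local DataNode.
    EnumSet<CreateFlag> flags =
      EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE, CreateFlag.NO_LOCAL_WRITE);
    try (FSDataOutputStream out = fs.create(walPath, FsPermission.getFileDefault(),
        flags, 4096, (short) 3, 128 * 1024 * 1024, null)) {
      out.writeBytes("wal entries would be written here");
    }
  }
}
{code}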



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826829#comment-17826829
 ] 

Andrew Kyle Purtell commented on HBASE-27826:
-

I started a design document. Find it in the issue links. Anyone who has this 
link can edit. [~prathyu6] [~zhangduo] [~wchevreuil] 

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface 
> method, requiring only one update of the manifest file in S3, bringing the 
> time requirement for this operation from O(n) down to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28419) Allow Action and Policies of ServerKillingMonkey to be configurable

2024-03-13 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HBASE-28419.
-
Fix Version/s: 4.0.0-alpha-1
   Resolution: Fixed

> Allow Action and Policies of ServerKillingMonkey to be configurable
> ---
>
> Key: HBASE-28419
> URL: https://issues.apache.org/jira/browse/HBASE-28419
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Pratyush Bhatt
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>
> Currently for ServerKillingMonkeyFactory, actions and policies have hardcoded 
> timeouts.
> {code:java}
>     Action[] actions1 = new Action[] {
>       new RestartRandomRsExceptMetaAction(6),
>       new RestartActiveMasterAction(5000),
>       // only allow 2 servers to be dead
>       new RollingBatchRestartRsAction(5000, 1.0f, 2, true),
>       new ForceBalancerAction(),
>       new GracefulRollingRestartRsAction(gracefulRollingRestartTSSLeepTime),
>       new RollingBatchSuspendResumeRsAction(rollingBatchSuspendRSSleepTime,
>           rollingBatchSuspendtRSRatio)
>     }; {code}
> and
> {code:java}
>     return new PolicyBasedChaosMonkey(properties, util,
>       new CompositeSequentialPolicy(new DoActionsOncePolicy(60 * 1000, 
> actions1),
>         new PeriodicRandomActionPolicy(60 * 1000, actions1)),
>       new PeriodicRandomActionPolicy(60 * 1000, actions2));
>   } {code}
> We should allow these to be configurable too.
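
A minimal sketch of the configurability being asked for, reading sleep times 
from the monkey properties with defaults; the property key and helper below 
are hypothetical, not the HBASE-28419 patch.

{code:java}
import java.util.Properties;

public final class MonkeyTimeouts {
  // Hypothetical property key; the default mirrors the hardcoded 5000 above.
  static final String RESTART_ACTIVE_MASTER_SLEEP = "restartactivemaster.sleeptime";

  static long getMillis(Properties props, String key, long defaultMillis) {
    return Long.parseLong(props.getProperty(key, String.valueOf(defaultMillis)));
  }
}
{code}

The factory would then construct actions like
{{new RestartActiveMasterAction(MonkeyTimeouts.getMillis(properties, 
MonkeyTimeouts.RESTART_ACTIVE_MASTER_SLEEP, 5000))}} instead of inlining the 
constant.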



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28419 Allow Action and Policies of ServerKillingMonkey to be configurable. [hbase]

2024-03-13 Thread via GitHub


jojochuang merged PR #5743:
URL: https://github.com/apache/hbase/pull/5743


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28419 Allow Action and Policies of ServerKillingMonkey to be configurable. [hbase]

2024-03-13 Thread via GitHub


jojochuang commented on PR #5743:
URL: https://github.com/apache/hbase/pull/5743#issuecomment-1995116868

   Thanks @ndimiduk, merging it to the master branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Wellington Chevreuil (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826794#comment-17826794
 ] 

Wellington Chevreuil commented on HBASE-27826:
--

{quote}

Like createLink(), deleteLink(), createReference(), deleteReference(), and so 
on. The SFT becomes responsible for listing the link and reference files among 
the store contents. Today we sometimes go directly to the filesystem for 
listing stores, still. We do direct filesystem access for making and 
discovering link and reference files. This is wrong. SFT should be the 
exclusive way we track and discover store contents.

{quote}

Yeah, I underestimated what splitFiles meant; I was thinking of it merely as 
creating refs on each of the daughter regions. I agree, we should make SFT the 
central point for these FS interactions.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface 
> method, requiring only one update of the manifest file in S3, bringing the 
> time requirement for this operation from O(n) down to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 4:16 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. The SFT becomes responsible 
for listing the link and reference files among the store contents. Today we 
sometimes go directly to the filesystem for listing stores, still. This is 
wrong! SFT should be the exclusive way we track and discover store contents. 

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. 
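
A minimal sketch of the batched SFT mutation interface described in the 
comment above; every name here (SftMutation, CreateLink, 
BatchingStoreFileTracker, commitBatch) is illustrative, not a committed HBase 
API.

{code:java}
import java.io.IOException;
import java.util.List;

/** Marker for a single change to the tracked store contents. */
interface SftMutation {}

/** A virtual link entry recorded only in the manifest, not a physical file. */
class CreateLink implements SftMutation {
  final String targetHFile;
  final String linkName;
  CreateLink(String targetHFile, String linkName) {
    this.targetHFile = targetHFile;
    this.linkName = linkName;
  }
}

interface BatchingStoreFileTracker {
  // One call commits every link/reference created by a split or merge with a
  // single manifest update, making this step O(1) in the number of store files.
  void commitBatch(List<SftMutation> mutations) throws IOException;
}
{code}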

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 4:17 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. The SFT becomes responsible 
for listing the link and reference files among the store contents. Today we 
sometimes go directly to the filesystem for listing stores, still. We do direct 
filesystem access for making and discovering link and reference files. This is 
wrong. SFT should be the exclusive way we track and discover store contents.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. The SFT becomes responsible 
for listing the link and reference files among the store contents. Today we 
sometimes go directly to the filesystem for listing stores, still. This is 
wrong! SFT should be the exclusive way we track and discover store contents. 

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of 

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 4:14 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine we gather all of the links and references we want to create into a 
list, and we submit them to SFT all at once, using an interface method that 
accepts an array or list of SFT mutations to perform in batch, so there is only 
one manifest update required, and then this aspect of splitting becomes O(1) in 
time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: 

[jira] [Created] (HBASE-28440) Add support for using mapreduce sort in HFileOutputFormat2

2024-03-13 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28440:
-

 Summary: Add support for using mapreduce sort in HFileOutputFormat2
 Key: HBASE-28440
 URL: https://issues.apache.org/jira/browse/HBASE-28440
 Project: HBase
  Issue Type: Improvement
  Components: backuprestore
Reporter: Bryan Beaudreault


Currently HFileOutputFormat2 uses CellSortReducer, which attempts to sort all 
of the cells of a row in memory using a TreeSet. There is a warning in the 
javadoc "If lots of columns per row, it will use lots of memory sorting." This 
can be problematic for WALPlayer, which uses HFileOutputFormat2. You could have 
a reasonably sized row which just gets lots of edits in the time period of the 
WALs being replayed, and that would cause an OOM. We are seeing this in some cases 
with incremental backups.

MapReduce has built-in sorting capabilities which are not limited to sorting in 
memory. It can spill to disk as necessary to sort very large datasets. We can 
get this capability in HFileOutputFormat2 with a couple changes:
 # Add support for a KeyOnlyCellComparable type as the map output key
 # When configured, use 
job.setSortComparatorClass(CellWritableComparator.class) and 
job.setReducerClass(PreSortedCellsReducer.class)
 # Update WALPlayer to have a mode which can output this new comparable instead 
of ImmutableBytesWritable

CellWritableComparator exists already for the Import job, so there is some 
prior art. 
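
A minimal sketch of the proposed job wiring; Job.setMapOutputKeyClass and 
Job.setSortComparatorClass are standard MapReduce API, while the comparator 
below is a simplified stand-in (raw byte ordering) for the 
CellWritableComparator named above, and a pass-through reducer would replace 
CellSortReducer.

{code:java}
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleSortedHFileJob {
  // Stand-in comparator: orders map output keys as raw bytes. A real
  // implementation would compare serialized cell keys with CellComparator.
  public static class RawCellKeyComparator extends WritableComparator {
    public RawCellKeyComparator() {
      super(BytesWritable.class, true);
    }
  }

  public static void configure(Job job) {
    // The shuffle sorts the keys, spilling to disk as needed, so the reducer
    // can stream cells in order instead of buffering a row in a TreeSet.
    job.setMapOutputKeyClass(BytesWritable.class);
    job.setSortComparatorClass(RawCellKeyComparator.class);
    // job.setReducerClass(...) would install the pass-through reducer here.
  }
}
{code}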



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:39 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine we gather all of the links and references we want to create into a 
list, and we submit them to SFT all at once, using an interface method that 
accepts an array or list of SFT mutations to perform in batch, so there is only 
one manifest update required, and then this aspect of splitting becomes O(1) in 
time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine we gather all of the links and references we want to create into a 
list, and we submit them to SFT all at once, using an interface method that 
accepts an array or list of SFT mutations to perform in batch, so there is only 
one manifest update required, and then this aspect of splitting becomes O(1) in 
time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the 

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:37 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine we gather all of the links and references we want to create into a 
list, and we submit them to SFT all at once, using an interface method that 
accepts an array or list of SFT mutations to perform in batch, so there is only 
one manifest update required, and then this aspect of splitting becomes O(1) in 
time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. 

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region.

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once using an interface 
method that accepts an array or list of SFT mutations to perform in batch, so 
there is only one manifest update required, and then this aspect of splitting 
becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> 

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:36 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. 

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region.

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once using an interface 
method that accepts an array or list of SFT mutations to perform in batch, so 
there is only one manifest update required, and then this aspect of splitting 
becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will create real link and reference files that maintain their current 
naming and structure, fully compatible with existing stores. This amounts to 
refactoring some of the code in HFileLink and ReferenceFile into 
DefaultStoreFileTracker. This is our current thinking.
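
And a sketch of the default tracker side, where the hypothetical createLink() 
above would stay a physical filesystem operation backed by today's HFileLink 
code (class and method shape invented for illustration):

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.io.HFileLink;

// Sketch only: the default tracker materializes a real back-reference file,
// preserving current naming and layout, by delegating to the existing
// HFileLink helper (one of several overloads in HBase 2.x).
class DefaultTrackerLinkSketch {
  boolean createLink(Configuration conf, FileSystem fs, Path dstFamilyPath,
      RegionInfo linkedRegion, String hfileName) throws IOException {
    return HFileLink.create(conf, fs, dstFamilyPath, linkedRegion, hfileName);
  }
}
{code}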

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links.

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region.

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once by some method like 
SFT.createLink(HFileLink links[]), so there is only one manifest update 
required, and then this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking. 

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and 

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:33 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links.

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we no longer wait up to a second or two to create 
each link or reference in the S3 bucket, which is what causes long offline 
times during splits proportional to the number of store files in the region.

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once by some method like 
SFT.createLink(HFileLink links[]), so there is only one manifest update 
required, and then this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will create real link and reference files that maintain their current 
naming and structure, fully compatible with existing stores. This amounts to 
refactoring some of the code in HFileLink and ReferenceFile into 
DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. 

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. 

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once by some method like 
SFT.createLink(HFileLink links[]), so there is only one manifest update 
required, and then this aspect of splitting becomes O(1) in time.

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:29 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. 

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we no longer wait up to a second or two to create 
each link or reference in the S3 bucket, which is what causes long offline 
times during splits proportional to the number of store files in the region.

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once by some method like 
SFT.createLink(HFileLink links[]), so there is only one manifest update 
required, and then this aspect of splitting becomes O(1) in time.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this. 

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. 

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE store file tracker interface 
> method, requiring only one update of the manifest file in S3, bringing the 
> time requirement for this operation down from O(n) to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell commented on HBASE-27826:
-

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. 

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE store file tracker interface 
> method, requiring only one update of the manifest file in S3, bringing the 
> time requirement for this operation down from O(n) to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Wellington Chevreuil (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826761#comment-17826761
 ] 

Wellington Chevreuil commented on HBASE-27826:
--

Thanks for the heads-up, [~zhangduo], and for picking this up, [~prathyu6]!

 

Summarising my understanding from the discussion:

1) We will define a splitFiles method in the StoreFileTracker interface, so that 
everywhere we currently do split logic (like in SplitTableRegionProcedure) would 
now delegate to the StoreFileTracker implementation (see the sketch below);

2) The DefaultStoreFileTracker implementation would still create actual ref/link 
files in the split daughter regions, whilst the FileBasedTracker impl would keep 
the link/ref in its metadata only.

3) We would need to change the format of the meta files of FileBasedTracker to 
include the parent region location for split daughters' "inherited" files.

 

Seems reasonable to me, looking forward to the design doc/initial PR.
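
For illustration, a rough sketch of what the splitFiles delegation in point 1 
might look like; all names below are hypothetical, not an agreed API:

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.regionserver.StoreFileInfo;
import org.apache.hadoop.hbase.util.Pair;

// Hypothetical SFT hook (sketch only): given the parent's store files and the
// split row, return the ref/link entries each daughter should start with. The
// default tracker would materialize these as real files, while the file based
// tracker would only record them in its manifest.
interface SplitAwareStoreFileTracker {
  Pair<List<StoreFileInfo>, List<StoreFileInfo>> splitFiles(
      List<StoreFileInfo> parentFiles, byte[] splitRow) throws IOException;
}
{code}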


> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE store file tracker interface 
> method, requiring only one update of the manifest file in S3, bringing the 
> time requirement for this operation down from O(n) to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Prathyusha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826069#comment-17826069
 ] 

Prathyusha edited comment on HBASE-27826 at 3/13/24 2:05 PM:
-

Sure [~zhangduo], and 
> And about the split time, how many store files do you have when testing? I 
> think we can gain more if there are more store files.
Yes, here I had a single region with 1 column family and 20 store files, and 
multithreading was enabled for creating ref/link files, but yes, it was a dev 
cluster with no other load on it.


was (Author: prathyu6):
Sure [~zhangduo] and 
> And about the split time, how many store files do you have when testing? I 
> think we can gain more if there are more store files.
Yes, here I had a single region with 1 column family and 20 store files, and 
multithreading was enabled creating ref/link files, but yes a dev cluster with 
no other load on it

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE store file tracker interface 
> method, requiring only one update of the manifest file in S3, bringing the 
> time requirement for this operation down from O(n) to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Prathyusha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826069#comment-17826069
 ] 

Prathyusha commented on HBASE-27826:


Sure [~zhangduo], and 
> And about the split time, how many store files do you have when testing? I 
> think we can gain more if there are more store files.
Yes, here I had a single region with 1 column family and 20 store files, and 
multithreading was enabled for creating ref/link files, but yes, it was a dev 
cluster with no other load on it.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE store file tracker interface 
> method, requiring only one update of the manifest file in S3, bringing the 
> time requirement for this operation down from O(n) to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28439) Remove ZooKeeper as a means of creating a client connection

2024-03-13 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-28439:


 Summary: Remove ZooKeeper as a means of creating a client 
connection
 Key: HBASE-28439
 URL: https://issues.apache.org/jira/browse/HBASE-28439
 Project: HBase
  Issue Type: Task
  Components: Client
Affects Versions: 4.0.0-alpha-1
Reporter: Nick Dimiduk


Following up on the discussion and decision around HBASE-23324, we will remove 
ZooKeeper as a point of entry for client connections.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28260) Possible data loss in WAL after RegionServer crash

2024-03-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825924#comment-17825924
 ] 

Hudson commented on HBASE-28260:


Results for branch branch-2
[build #1011 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Possible data loss in WAL after RegionServer crash
> --
>
> Key: HBASE-28260
> URL: https://issues.apache.org/jira/browse/HBASE-28260
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Assignee: Charles Connell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2, 2.5.9
>
>
> We recently had a production incident:
>  # RegionServer crashes, but local DataNode lives on
>  # WAL lease recovery kicks in
>  # Namenode reconstructs the block during lease recovery (which results in a 
> new genstamp). It chooses the replica on the local DataNode as the primary.
>  # Local DataNode reconstructs the block, so NameNode registers the new 
> genstamp.
>  # Local DataNode and the underlying host die before the new block could be 
> replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no 
> replicas. The old replicas still remain, but are considered corrupt due to 
> GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the lengths of the corrupt blocks were 
> identical to the newly constructed and lost block. Further, the file in 
> question was only 1 block. So we downloaded one of those corrupt block files 
> and used {{hdfs dfs -put -f}} to force that block to replace the file in 
> hdfs. So in this case we had no actual data loss, but it could have happened 
> easily if the file was more than 1 block or the replicas weren't fully in 
> sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the 
> local datanode. We can use CreateFlag.NO_LOCAL_WRITE for this. Hat tip to 
> [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from 
> the local datanode, but avoiding writing there altogether would be better.
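
For illustration, a minimal sketch of a WAL-style create call with this flag, 
using the plain FileSystem API (path, buffer size, and payload are 
placeholders, not the actual HBase patch):

{code:java}
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NoLocalWalWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path wal = new Path("/hbase/WALs/example-wal"); // placeholder path

    // CREATE plus NO_LOCAL_WRITE: ask the NameNode to skip the local DataNode
    // when choosing the write pipeline, so losing that host cannot strand a
    // freshly reconstructed block that has no remote replicas yet.
    EnumSet<CreateFlag> flags =
        EnumSet.of(CreateFlag.CREATE, CreateFlag.NO_LOCAL_WRITE);

    try (FSDataOutputStream out = fs.create(wal,
        FsPermission.getFileDefault(), flags,
        conf.getInt("io.file.buffer.size", 4096),
        fs.getDefaultReplication(wal), fs.getDefaultBlockSize(wal),
        null /* no progress callback */)) {
      out.writeBytes("example WAL payload");
    }
  }
}
{code}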



--
This message was sent by Atlassian Jira
(v8.20.10#820010)