Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]
Apache-HBase commented on PR #5713:
URL: https://github.com/apache/hbase/pull/5713#issuecomment-1996235622

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|:--------|:--------|
| +0 :ok: | reexec | 0m 35s | Docker mode activated. |
| -0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
||| _ Prechecks _ |
||| _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 3m 10s | master passed |
| +1 :green_heart: | compile | 0m 51s | master passed |
| +1 :green_heart: | shadedjars | 5m 20s | branch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 28s | master passed |
||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 55s | the patch passed |
| +1 :green_heart: | compile | 0m 51s | the patch passed |
| +1 :green_heart: | javac | 0m 51s | the patch passed |
| +1 :green_heart: | shadedjars | 5m 17s | patch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 27s | the patch passed |
||| _ Other Tests _ |
| +1 :green_heart: | unit | 232m 3s | hbase-server in the patch passed. |
| | | 256m 39s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/5713 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux 5d75f4d268c7 5.4.0-172-generic #190-Ubuntu SMP Fri Feb 2 23:24:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / beafd33261 |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/testReport/ |
| Max. process+thread count | 4941 (vs. ulimit of 3) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/console |
| versions | git=2.34.1 maven=3.8.6 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]
Apache-HBase commented on PR #5713:
URL: https://github.com/apache/hbase/pull/5713#issuecomment-1996228026

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|:--------|:--------|
| +0 :ok: | reexec | 0m 43s | Docker mode activated. |
| -0 :warning: | yetus | 0m 2s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
||| _ Prechecks _ |
||| _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 52s | master passed |
| +1 :green_heart: | compile | 0m 38s | master passed |
| +1 :green_heart: | shadedjars | 5m 38s | branch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 24s | master passed |
||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 26s | the patch passed |
| +1 :green_heart: | compile | 0m 39s | the patch passed |
| +1 :green_heart: | javac | 0m 39s | the patch passed |
| +1 :green_heart: | shadedjars | 5m 36s | patch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 22s | the patch passed |
||| _ Other Tests _ |
| +1 :green_heart: | unit | 223m 47s | hbase-server in the patch passed. |
| | | 247m 8s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/5713 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux d9f3fe995387 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / beafd33261 |
| Default Java | Temurin-1.8.0_352-b08 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/testReport/ |
| Max. process+thread count | 5683 (vs. ulimit of 3) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/console |
| versions | git=2.34.1 maven=3.8.6 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.
Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]
Apache-HBase commented on PR #5713:
URL: https://github.com/apache/hbase/pull/5713#issuecomment-1996213403

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|:--------|:--------|
| +0 :ok: | reexec | 0m 25s | Docker mode activated. |
| -0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
||| _ Prechecks _ |
||| _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 50s | master passed |
| +1 :green_heart: | compile | 0m 52s | master passed |
| +1 :green_heart: | shadedjars | 5m 28s | branch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 25s | master passed |
||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 47s | the patch passed |
| +1 :green_heart: | compile | 0m 51s | the patch passed |
| +1 :green_heart: | javac | 0m 51s | the patch passed |
| +1 :green_heart: | shadedjars | 5m 29s | patch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 24s | the patch passed |
||| _ Other Tests _ |
| +1 :green_heart: | unit | 204m 26s | hbase-server in the patch passed. |
| | | 228m 18s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/5713 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux bbf01b6c7a25 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / beafd33261 |
| Default Java | Eclipse Adoptium-17.0.10+7 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/testReport/ |
| Max. process+thread count | 5324 (vs. ulimit of 3) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/console |
| versions | git=2.34.1 maven=3.8.6 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.
[jira] [Resolved] (HBASE-28385) Quota estimates are too optimistic for large scans
[ https://issues.apache.org/jira/browse/HBASE-28385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault resolved HBASE-28385.
---------------------------------------
Fix Version/s: 3.0.0-beta-2
Release Note: When hbase.quota.use.result.size.bytes is false, we will now estimate the amount of quota to grab for a scan based on the block bytes scanned of previous next() requests. This will increase throughput for large scans which might prefer to wait a little longer for a larger portion of the quota.
Resolution: Fixed

> Quota estimates are too optimistic for large scans
> --------------------------------------------------
>
> Key: HBASE-28385
> URL: https://issues.apache.org/jira/browse/HBASE-28385
> Project: HBase
> Issue Type: Improvement
> Reporter: Ray Mattingly
> Assignee: Ray Mattingly
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
> Let's say you're running a table scan with a throttle of 100MB/sec per RegionServer. Ideally your scans are going to pull down large results, often containing hundreds or thousands of blocks.
> You will estimate each scan as costing a single block of read capacity, and if your quota is already exhausted then the server will evaluate the backoff required for your estimated consumption (1 block) to be available. This will often be ~1ms, causing your retries to basically be immediate.
> Obviously it will routinely take much longer than 1ms for 100MB of IO to become available in the given configuration, so your retries will be destined to fail. At worst this can cause a saturation of your server's RPC layer, and at best this causes erroneous exhaustion of the client's retries.
> We should find a way to make these estimates a bit smarter for large scans.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
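The release note's strategy — sizing a scan's quota grab from the block bytes scanned by earlier next() calls, instead of assuming a single block — can be sketched as a small self-contained estimator. The class and method names here are hypothetical; HBase's real implementation lives in its quota machinery and differs in detail:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not HBase's actual code: the first next() call is
// charged a small default, and subsequent calls are charged the average
// block-bytes-scanned of recent calls, so a scan that reads ~100MB per
// next() is made to wait for ~100MB of quota rather than ~1 block.
public class ScanQuotaEstimate {
  static final long DEFAULT_BLOCK_SIZE = 64 * 1024; // fallback for the first call

  private final Deque<Long> recent = new ArrayDeque<>();
  private final int window;

  public ScanQuotaEstimate(int window) {
    this.window = window;
  }

  /** Estimate of the quota (in bytes) to grab before the next next() call. */
  public long estimateNextCost() {
    if (recent.isEmpty()) {
      return DEFAULT_BLOCK_SIZE; // no history yet: assume one small block
    }
    long sum = 0;
    for (long b : recent) {
      sum += b;
    }
    return sum / recent.size(); // average of the recent observations
  }

  /** Record the block bytes actually scanned by a completed next() call. */
  public void record(long blockBytesScanned) {
    recent.addLast(blockBytesScanned);
    if (recent.size() > window) {
      recent.removeFirst(); // keep only the last `window` observations
    }
  }

  public static void main(String[] args) {
    ScanQuotaEstimate est = new ScanQuotaEstimate(3);
    System.out.println(est.estimateNextCost()); // 65536: first call, default
    est.record(100L * 1024 * 1024);             // a next() scanned ~100MB
    System.out.println(est.estimateNextCost()); // now ~100MB, so backoff becomes realistic
  }
}
```

With an estimate on this order, the server computes a backoff proportional to ~100MB of IO becoming available, rather than a ~1ms retry that is destined to fail.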
Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]
bbeaudreault merged PR #5713:
URL: https://github.com/apache/hbase/pull/5713
Re: [PR] HBASE-28385 make Scan estimates more realistic [hbase]
Apache-HBase commented on PR #5713:
URL: https://github.com/apache/hbase/pull/5713#issuecomment-1995922218

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|:--------|:--------|
| +0 :ok: | reexec | 0m 37s | Docker mode activated. |
||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
||| _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 57s | master passed |
| +1 :green_heart: | compile | 2m 28s | master passed |
| +1 :green_heart: | checkstyle | 0m 34s | master passed |
| +1 :green_heart: | spotless | 0m 42s | branch has no errors when running spotless:check. |
| +1 :green_heart: | spotbugs | 1m 33s | master passed |
||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 46s | the patch passed |
| +1 :green_heart: | compile | 2m 29s | the patch passed |
| +1 :green_heart: | javac | 2m 29s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 37s | hbase-server: The patch generated 0 new + 9 unchanged - 1 fixed = 9 total (was 10) |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | hadoopcheck | 4m 48s | Patch does not cause any errors with Hadoop 3.3.6. |
| +1 :green_heart: | spotless | 0m 43s | patch has no errors when running spotless:check. |
| +1 :green_heart: | spotbugs | 1m 39s | the patch passed |
||| _ Other Tests _ |
| +1 :green_heart: | asflicense | 0m 10s | The patch does not generate ASF License warnings. |
| | | 28m 13s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/5713 |
| Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile |
| uname | Linux 50f02278a732 5.4.0-169-generic #187-Ubuntu SMP Thu Nov 23 14:52:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / beafd33261 |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| Max. process+thread count | 77 (vs. ulimit of 3) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5713/8/console |
| versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.
[jira] [Created] (HBASE-28441) Update downloads.xml for 2.5.8
Andrew Kyle Purtell created HBASE-28441:
-------------------------------------------
Summary: Update downloads.xml for 2.5.8
Key: HBASE-28441
URL: https://issues.apache.org/jira/browse/HBASE-28441
Project: HBase
Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
[jira] [Resolved] (HBASE-28441) Update downloads.xml for 2.5.8
[ https://issues.apache.org/jira/browse/HBASE-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell resolved HBASE-28441.
-----------------------------------------
Resolution: Fixed

> Update downloads.xml for 2.5.8
> ------------------------------
>
> Key: HBASE-28441
> URL: https://issues.apache.org/jira/browse/HBASE-28441
> Project: HBase
> Issue Type: Task
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Minor
[jira] [Commented] (HBASE-28260) Possible data loss in WAL after RegionServer crash
[ https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826851#comment-17826851 ]

Hudson commented on HBASE-28260:
--------------------------------
Results for branch branch-2.5 [build #496 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/]:

(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color} -- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
(x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color} -- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/496/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.
(/) {color:green}+1 client integration test{color}

> Possible data loss in WAL after RegionServer crash
> --------------------------------------------------
>
> Key: HBASE-28260
> URL: https://issues.apache.org/jira/browse/HBASE-28260
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Assignee: Charles Connell
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2, 2.5.9
>
> We recently had a production incident:
> # RegionServer crashes, but local DataNode lives on
> # WAL lease recovery kicks in
> # Namenode reconstructs the block during lease recovery (which results in a new genstamp). It chooses the replica on the local DataNode as the primary.
> # Local DataNode reconstructs the block, so NameNode registers the new genstamp.
> # Local DataNode and the underlying host die, before the new block could be replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no replicas. The old replicas still remain, but are considered corrupt due to GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks was identical to the newly constructed and lost block. Further, the file in question was only 1 block. So we downloaded one of those corrupt block files and used {{hdfs dfs -put -f}} to force that block to replace the file in hdfs. So in this case we had no actual data loss, but it could have happened easily if the file was more than 1 block or the replicas weren't fully in sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the local datanode. We can use CreateFlag.NO_WRITE_LOCAL for this. Hat tip to [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from the local datanode, but avoiding writing there altogether would be better.
[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826829#comment-17826829 ]

Andrew Kyle Purtell commented on HBASE-27826:
---------------------------------------------
I started a design document. Find it in the issue links. Anyone who has this link can edit. [~prathyu6] [~zhangduo] [~wchevreuil]

> Region split and merge time while offline is O(n) with respect to number of store files
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.4
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed the split and merge table procedure implementations to indirect through the StoreFileTracker implementation when selecting HFiles to be merged or split, rather than directly listing those using file system APIs. It also changed the commit logic in HRegionFileSystem to add the link/ref files on resulting split or merged regions to the StoreFileTracker. However, the creation of a link file is still a filesystem operation and creating a “file” on S3 can take well over a second. If, for example, there are 20 store files in a region, which is not uncommon, after the region is taken offline for a split (or merge) it may require more than 20 seconds to create the link files before the results can be brought back online, creating a severe availability problem. Splits and merges are supposed to be fast, completing in less than a second, certainly less than a few seconds. This has been true when HFiles are stored on HDFS only because file creation operations there are nearly instantaneous.
> There are two issues, but both can be handled with modifications to the store file tracker interface and the file based store file tracker implementation.
> When the file based store file tracker is enabled, the HFile links should be virtual entities that only exist in the file manifest. We do not require physical files in the filesystem to serve as links now. That is the magic of this file tracker: the manifest file replaces requirements to list the filesystem.
> Then, when splitting or merging, the HFile links should be collected into a list and committed in one batch using a new FILE file tracker interface, requiring only one update of the manifest file in S3, bringing the time requirement for this operation to O(1) down from O(n).
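The batch-commit idea in the description — N link creations collapsing into one manifest update — can be sketched with a toy manifest. The interface and names below are hypothetical illustrations, not the actual StoreFileTracker API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposal: links/references are entries in a manifest, and
// a batch commit rewrites the manifest once, so N links cost one "S3 write"
// of the manifest file instead of N per-file creations.
public class BatchedManifest {
  private final List<String> entries = new ArrayList<>();
  private int manifestWrites = 0; // stands in for PUTs of the manifest object

  /** Commit a batch of link entries with a single manifest update. */
  public void commitLinks(List<String> links) {
    entries.addAll(links);
    manifestWrites++; // one write regardless of links.size(): O(1), not O(n)
  }

  public int manifestWrites() {
    return manifestWrites;
  }

  public int size() {
    return entries.size();
  }
}
```

Under this model, a split of a 20-store-file region costs one manifest write (sub-second) instead of 20 sequential S3 file creations (20+ seconds offline), which is the availability win the issue is after.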
[jira] [Resolved] (HBASE-28419) Allow Action and Policies of ServerKillingMonkey to be configurable
[ https://issues.apache.org/jira/browse/HBASE-28419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang resolved HBASE-28419.
-------------------------------------
Fix Version/s: 4.0.0-alpha-1
Resolution: Fixed

> Allow Action and Policies of ServerKillingMonkey to be configurable
> -------------------------------------------------------------------
>
> Key: HBASE-28419
> URL: https://issues.apache.org/jira/browse/HBASE-28419
> Project: HBase
> Issue Type: Improvement
> Components: test
> Reporter: Pratyush Bhatt
> Assignee: Wei-Chiu Chuang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
> Currently for ServerKillingMonkeyFactory, actions and policies have hardcoded timeouts.
> {code:java}
> Action[] actions1 = new Action[] {
>   new RestartRandomRsExceptMetaAction(6),
>   new RestartActiveMasterAction(5000),
>   // only allow 2 servers to be dead
>   new RollingBatchRestartRsAction(5000, 1.0f, 2, true),
>   new ForceBalancerAction(),
>   new GracefulRollingRestartRsAction(gracefulRollingRestartTSSLeepTime),
>   new RollingBatchSuspendResumeRsAction(rollingBatchSuspendRSSleepTime, rollingBatchSuspendtRSRatio)
> }; {code}
> and
> {code:java}
> return new PolicyBasedChaosMonkey(properties, util,
>   new CompositeSequentialPolicy(new DoActionsOncePolicy(60 * 1000, actions1),
>     new PeriodicRandomActionPolicy(60 * 1000, actions1)),
>   new PeriodicRandomActionPolicy(60 * 1000, actions2));
> } {code}
> We should allow these to be configurable too.
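The requested change amounts to reading the previously hardcoded sleep times from configuration, with the old constants as defaults. A minimal sketch follows; the property keys here are invented for illustration and are not the names HBase adopted:

```java
// Sketch only: pull chaos-monkey action timeouts from a Properties object
// instead of hardcoding them, falling back to the old constant when unset.
public class MonkeyConfig {
  public static long getLong(java.util.Properties props, String key, long defaultValue) {
    String v = props.getProperty(key);
    return v == null ? defaultValue : Long.parseLong(v.trim());
  }

  public static void main(String[] args) {
    java.util.Properties props = new java.util.Properties();
    props.setProperty("monkey.rolling.batch.restart.rs.sleep.time", "10000");

    // Previously hardcoded to 5000 in ServerKillingMonkeyFactory; now overridable:
    long rollingRestartSleep =
      getLong(props, "monkey.rolling.batch.restart.rs.sleep.time", 5000);
    long masterRestartSleep =
      getLong(props, "monkey.restart.active.master.sleep.time", 5000);

    System.out.println(rollingRestartSleep + " " + masterRestartSleep); // 10000 5000
  }
}
```

The actions would then be built as, e.g., `new RollingBatchRestartRsAction(rollingRestartSleep, 1.0f, 2, true)`, so operators can tune the chaos cadence per test run without recompiling.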
Re: [PR] HBASE-28419 Allow Action and Policies of ServerKillingMonkey to be configurable. [hbase]
jojochuang merged PR #5743:
URL: https://github.com/apache/hbase/pull/5743
Re: [PR] HBASE-28419 Allow Action and Policies of ServerKillingMonkey to be configurable. [hbase]
jojochuang commented on PR #5743:
URL: https://github.com/apache/hbase/pull/5743#issuecomment-1995116868

Thanks @ndimiduk, merging it to master branch.
[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826794#comment-17826794 ]

Wellington Chevreuil commented on HBASE-27826:
----------------------------------------------
{quote}Like createLink(), deleteLink(), createReference(), deleteReference(), and so on. The SFT becomes responsible for listing the link and reference files among the store contents. Today we sometimes go directly to the filesystem for listing stores, still. We do direct filesystem access for making and discovering link and reference files. This is wrong. SFT should be the exclusive way we track and discover store contents.{quote}

Yeah, I underestimated what splitFile meant; I was thinking of it merely as creating refs on each of the daughter regions. I agree, we should make SFT the central point for these FS interactions.

> Region split and merge time while offline is O(n) with respect to number of store files
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.4
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed the split and merge table procedure implementations to indirect through the StoreFileTracker implementation when selecting HFiles to be merged or split, rather than directly listing those using file system APIs. It also changed the commit logic in HRegionFileSystem to add the link/ref files on resulting split or merged regions to the StoreFileTracker. However, the creation of a link file is still a filesystem operation and creating a “file” on S3 can take well over a second. If, for example, there are 20 store files in a region, which is not uncommon, after the region is taken offline for a split (or merge) it may require more than 20 seconds to create the link files before the results can be brought back online, creating a severe availability problem. Splits and merges are supposed to be fast, completing in less than a second, certainly less than a few seconds. This has been true when HFiles are stored on HDFS only because file creation operations there are nearly instantaneous.
> There are two issues, but both can be handled with modifications to the store file tracker interface and the file based store file tracker implementation.
> When the file based store file tracker is enabled, the HFile links should be virtual entities that only exist in the file manifest. We do not require physical files in the filesystem to serve as links now. That is the magic of this file tracker: the manifest file replaces requirements to list the filesystem.
> Then, when splitting or merging, the HFile links should be collected into a list and committed in one batch using a new FILE file tracker interface, requiring only one update of the manifest file in S3, bringing the time requirement for this operation to O(1) down from O(n).
[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771 ]

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 4:16 PM:
----------------------------------------------------------------------
{quote}We will define a splitFiles method in StoreFileTracker interface{quote}

No. Split logic should remain in SplitTransaction. Breaking this encapsulation and diluting the split implementation does not seem like a good idea to me, but we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended to include management of reference and link files. References and links are aspects of maintaining a directory of store file contents. SFT is the appropriate place to make design changes (in my opinion). And once SFT is managing references and links, they do not need to be real files, they can be virtual concepts maintained in the manifest. So SFT gets new additional methods for adding and removing references and links. Like createLink(), deleteLink(), createReference(), deleteReference(), and so on. The SFT becomes responsible for listing the link and reference files among the store contents. Today we sometimes go directly to the filesystem for listing stores, still. This is wrong! SFT should be the exclusive way we track and discover store contents.

Once references and links are concepts managed by SFT, we can have the different SFT implementations optimize for their design cases. When using the FileBasedStoreFileTracker we would not wait for up to a second or two when creating each link or reference in the S3 bucket, causing long offline times during splits proportional to the number of store files in the region. Instead imagine links and references are entries in the manifest, not real files. We don't take the cost of creating files in the S3 bucket, we only update the manifest, and that can be optimized further. We can gather all of the links and references we want to create into a list, and we submit them to SFT all at once, using an interface method that accepts an array or list of SFT mutations to perform in batch, so there is only one manifest update required, and then this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. DefaultStoreFileTracker needs new methods for creating and managing links too, but they will be real link and reference files, they will maintain their current naming and structure, this will be fully compatible with existing stores. This amounts to refactoring some of the code in HFileLink and ReferenceFile into DefaultStoreFileTracker.

This is our current thinking. A design doc will help clarify the proposals and discussion.

was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface{quote}

No. Split logic should remain in SplitTransaction. Breaking this encapsulation and diluting the split implementation does not seem like a good idea to me, but we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended to include management of reference and link files. References and links are aspects of maintaining a directory of store file contents. SFT is the appropriate place to make design changes (in my opinion). And once SFT is managing references and links, they do not need to be real files, they can be virtual concepts maintained in the manifest. So SFT gets new additional methods for adding and removing references and links. Like createLink(), deleteLink(), createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the different SFT implementations optimize for their design cases. When using the FileBasedStoreFileTracker we would not wait for up to a second or two when creating each link or reference in the S3 bucket, causing long offline times during splits proportional to the number of store files in the region. Instead imagine links and references are entries in the manifest, not real files. We don't take the cost of creating files in the S3 bucket, we only update the manifest, and that can be optimized further. We can gather all of the links and references we want to create into a list, and we submit them to SFT all at once, using an interface method that accepts an array or list of SFT mutations to perform in batch, so there is only one manifest update required, and then this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. DefaultStoreFileTracker needs new methods for creating and managing links too, but they will be real link and reference files, they will maintain their current naming and structure, this will be fully compatible with existing stores.
[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771 ] Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 4:17 PM:
--
{quote}We will define a splitFiles method in StoreFileTracker interface{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation and diluting the split implementation does not seem like a good idea to me, but we could discuss it if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended to include management of reference and link files, because references and links are aspects of maintaining a directory of store file contents. SFT is the appropriate place to make design changes (in my opinion). And once SFT is managing references and links, they do not need to be real files; they can be virtual concepts maintained in the manifest. So SFT gets new methods for adding and removing references and links, like createLink(), deleteLink(), createReference(), deleteReference(), and so on, and the SFT becomes responsible for listing the link and reference files among the store contents. Today we still sometimes go directly to the filesystem for listing stores, and we do direct filesystem access for making and discovering link and reference files. This is wrong. SFT should be the exclusive way we track and discover store contents.

Once references and links are concepts managed by SFT, the different SFT implementations can optimize for their design cases. When using the FileBasedStoreFileTracker we would no longer wait up to a second or two to create each link or reference in the S3 bucket, which today causes offline times during splits proportional to the number of store files in the region. Instead, imagine links and references are entries in the manifest, not real files. We do not pay the cost of creating files in the S3 bucket; we only update the manifest, and that can be optimized further: gather all of the links and references we want to create into a list and submit them to SFT all at once, using an interface method that accepts an array or list of SFT mutations to perform in batch. Then only one manifest update is required, and this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. DefaultStoreFileTracker needs new methods for creating and managing links too, but they will be real link and reference files, keeping their current naming and structure, fully compatible with existing stores. This amounts to refactoring some of the code in HFileLink and ReferenceFile into DefaultStoreFileTracker.

This is our current thinking. A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.4
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed
> the split and merge table procedure implementations to indirect through the
> StoreFileTracker implementation when selecting HFiles to be merged or split,
> rather than directly listing those using file system APIs. It also changed
> the commit logic in HRegionFileSystem to add the link/ref files on resulting
> split or merged regions to the StoreFileTracker. However, the creation of a
> link file is still a filesystem operation and creating a "file" on S3 can
> take well over a second. If, for example, there are 20 store files in a
> region, which is not uncommon, after the region is taken offline for a split
> (or merge) it may require more than 20 seconds to create the link files
> before the results can be brought back online, creating a severe availability
> problem. Splits and merges are supposed to be fast, completing in less than a
> second, certainly less than a few seconds. This has been true when HFiles are
> stored on HDFS only because file creation operations there are nearly
> instantaneous.
> There are two issues but both can be handled with modifications to the store
> file tracker interface and the file based store file tracker implementation.
> When the file based store file tracker is enabled the HFile links should
> be virtual entities that only exist in the file manifest. We do not require
> physical files in the filesystem to serve as links now. That is the magic of
> this file tracker: the manifest file replaces requirements to list the
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a
> list and committed in one batch using a new FILE file tracker interface,
> requiring only one update of the manifest file in S3, bringing the time
> requirement for this operation to O(1) down from O(n).
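The batched-mutation idea above can be sketched in miniature. This is a hypothetical illustration, not actual HBase code: the StoreFileMutation type, the apply() method, and ManifestBackedTracker are invented names standing in for whatever batch interface the design doc would specify. The point it demonstrates is that one batched call means one manifest write, regardless of how many links or references the split creates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a batched SFT mutation (a link or reference entry).
record StoreFileMutation(String kind, String hfile) {}

// Hypothetical batch-capable tracker interface, not an actual HBase API.
interface BatchingStoreFileTracker {
  // One call commits the whole batch with a single manifest update.
  void apply(List<StoreFileMutation> mutations);

  int manifestUpdates();
}

class ManifestBackedTracker implements BatchingStoreFileTracker {
  private final List<StoreFileMutation> manifest = new ArrayList<>();
  private int updates = 0;

  @Override
  public void apply(List<StoreFileMutation> mutations) {
    manifest.addAll(mutations); // links/references are manifest entries, not real files
    updates++;                  // the single (expensive, e.g. S3) manifest write
  }

  @Override
  public int manifestUpdates() {
    return updates;
  }
}

public class BatchSplitSketch {
  public static void main(String[] args) {
    BatchingStoreFileTracker sft = new ManifestBackedTracker();
    List<StoreFileMutation> batch = new ArrayList<>();
    for (int i = 0; i < 20; i++) { // e.g. 20 store files in the parent region
      batch.add(new StoreFileMutation("reference", "hfile-" + i));
    }
    sft.apply(batch); // one manifest update: O(1) in the number of store files
    System.out.println("manifest updates: " + sft.manifestUpdates());
  }
}
```

With per-file creation on S3 the same split would issue 20 filesystem operations; here the count of expensive writes stays at one no matter how many store files the region holds.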
[jira] [Created] (HBASE-28440) Add support for using mapreduce sort in HFileOutputFormat2
Bryan Beaudreault created HBASE-28440:
-
Summary: Add support for using mapreduce sort in HFileOutputFormat2
Key: HBASE-28440
URL: https://issues.apache.org/jira/browse/HBASE-28440
Project: HBase
Issue Type: Improvement
Components: backuprestore
Reporter: Bryan Beaudreault

Currently HFileOutputFormat2 uses CellSortReducer, which attempts to sort all of the cells of a row in memory using a TreeSet. There is a warning in the javadoc: "If lots of columns per row, it will use lots of memory sorting." This can be problematic for WALPlayer, which uses HFileOutputFormat2. A reasonably sized row could receive many edits during the time period of the WALs being replayed, and that would cause an OOM. We are seeing this in some cases with incremental backups.

MapReduce has built-in sorting capabilities which are not limited to sorting in memory; it can spill to disk as necessary to sort very large datasets. We can get this capability in HFileOutputFormat2 with a couple of changes:
# Add support for a KeyOnlyCellComparable type as the map output key
# When configured, use job.setSortComparatorClass(CellWritableComparator.class) and job.setReducerClass(PreSortedCellsReducer.class)
# Update WALPlayer to have a mode which can output this new comparable instead of ImmutableBytesWritable

CellWritableComparator exists already for the Import job, so there is some prior art.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
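The proposal can be illustrated with a toy sketch. This is not HBase code: KeyOnlyCell and KeyOnlyCellComparator below are simplified stand-ins for the proposed KeyOnlyCellComparable and CellWritableComparator, and the explicit sort() call stands in for the shuffle-phase sort that MapReduce performs (spilling to disk as needed) once a sort comparator is configured on the job. The reducer then streams cells through in order instead of buffering an entire row in a TreeSet.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified stand-in for a key-only cell: row, qualifier, timestamp.
record KeyOnlyCell(String row, String qualifier, long timestamp) {}

// Stand-in for the proposed comparator: row, then qualifier, then timestamp
// descending (newer cells first, as in HBase's cell ordering).
class KeyOnlyCellComparator implements Comparator<KeyOnlyCell> {
  @Override
  public int compare(KeyOnlyCell a, KeyOnlyCell b) {
    int c = a.row().compareTo(b.row());
    if (c != 0) return c;
    c = a.qualifier().compareTo(b.qualifier());
    if (c != 0) return c;
    return Long.compare(b.timestamp(), a.timestamp());
  }
}

public class PreSortedCellsSketch {
  public static void main(String[] args) {
    List<KeyOnlyCell> cells = new ArrayList<>(List.of(
        new KeyOnlyCell("row1", "colB", 1L),
        new KeyOnlyCell("row1", "colA", 2L),
        new KeyOnlyCell("row1", "colA", 5L)));
    // In the real job, the framework performs this sort between map and
    // reduce, spilling to disk for large datasets; the reducer only has to
    // write cells out in the order they arrive.
    cells.sort(new KeyOnlyCellComparator());
    for (KeyOnlyCell cell : cells) {
      System.out.println(cell.row() + "/" + cell.qualifier() + "@" + cell.timestamp());
    }
  }
}
```

The memory bound of the reducer thus no longer depends on the number of cells per row, only on the framework's spill buffer configuration.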
[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771 ] Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:33 PM: -- {quote}We will define a splitFiles method in StoreFileTracker interface {quote} No. Split logic should remain in SplitTransaction. Breaking this encapsulation and diluting the split implementation does not seem like a good idea to me, but we could discuss it, if someone actually wants this. StoreFileTracker is a directory of store files. This concept is neatly extended to include management of reference and link files. And once SFT is managing references and links, they do not need to be real files, they can be virtual concepts maintained in the manifest. So SFT gets new additional methods for adding and removing references and links. Once references and links are virtual concepts when using the FileBasedStoreFileTracker, we do not wait for up to a second or two when creating each link or reference in the S3 bucket, causing long offline times during splits proportional to the number of store files in the region. We can further optimize by gathering all of the links and references we want to create into a list and submitting them to SFT all at once by some method like SFT.createLink(HFileLink links[]), so there is only one manifest update required, and then this aspect of splitting becomes O(1) in time. Regarding the DefaultStoreFileTracker, it maintains existing functionality. DefaultStoreFileTracker needs new methods for creating and managing links too, but they will be real link and reference files, they will maintain their current naming and structure, this will be fully compatible with existing stores. This amounts to refactoring some of the code in HFileLink and ReferenceFile into DefaultStoreFileTracker. This is our current thinking. A design doc will help clarify the proposals and discussion. 
was (Author: apurtell): {quote}We will define a splitFiles method in StoreFileTracker interface {quote} No. Split logic should remain in SplitTransaction. Breaking this encapsulation and diluting the split implementation does not seem like a good idea to me, but we could discuss it, if someone actually wants this. StoreFileTracker is a directory of store files. This concept is neatly extended to include management of reference and link files. And once SFT is managing references and links, they do not need to be real files, they can be virtual concepts maintained in the manifest. So SFT gets new additional methods for adding and removing references and links. Once references and links are virtual concepts when using the FileBasedStoreFileTracker, we do not wait for up to a second or two when creating each link or reference in the S3 bucket, causing long offline times during splits proportional to the number of store files in the region. We can further optimize by gathering all of the links and references we want to create into a list and submitting them to SFT all at once by some method like SFT.createLink(HFileLink links[]), so there is only one manifest update required, and then this aspect of splitting becomes O(1) in time. A design doc will help clarify the proposals and discussion. > Region split and merge time while offline is O(n) with respect to number of > store files > --- > > Key: HBASE-27826 > URL: https://issues.apache.org/jira/browse/HBASE-27826 > Project: HBase > Issue Type: Bug >Affects Versions: 2.5.4 >Reporter: Andrew Kyle Purtell >Priority: Major > > This is a significant availability issue when HFiles are on S3. = > HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed > the split and merge table procedure implementations to indirect through the > StoreFileTracker implementation when selecting HFiles to be merged or split, > rather than directly listing those using file system APIs. 
It also changed > the commit logic in HRegionFileSystem to add the link/ref files on resulting > split or merged regions to the StoreFileTracker. However, the creation of a > link file is still a filesystem operation and creating a “file” on S3 can > take well over a second. If, for example there are 20 store files in a > region, which is not uncommon, after the region is taken offline for a split > (or merge) it may require more than 20 seconds to create the link files > before the results can be brought back online, creating a severe availability > problem. Splits and merges are supposed to be fast, completing in less than a > second, certainly less than a few seconds. This has been true when HFiles are > stored on HDFS only because file creation operations there are nearly
[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771 ]

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:29 PM:

{quote}We will define a splitFiles method in StoreFileTracker interface{quote}

No. Split logic should remain in SplitTransaction. Breaking this encapsulation and diluting the split implementation does not seem like a good idea to me, but we could discuss it if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended to include management of reference and link files. And once SFT is managing references and links, they do not need to be real files; they can be virtual concepts maintained in the manifest. So SFT gets new methods for adding and removing references and links.

Once references and links are virtual concepts when using the FileBasedStoreFileTracker, we no longer wait up to a second or two to create each link or reference in the S3 bucket, which causes long offline times during splits, proportional to the number of store files in the region. We can further optimize by gathering all of the links and references we want to create into a list and submitting them to SFT all at once via some method like SFT.createLink(HFileLink links[]), so only one manifest update is required, and this aspect of splitting becomes O(1) in time.

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.4
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed the split and merge table procedure implementations to indirect through the StoreFileTracker implementation when selecting HFiles to be merged or split, rather than directly listing those using file system APIs. It also changed the commit logic in HRegionFileSystem to add the link/ref files on resulting split or merged regions to the StoreFileTracker. However, the creation of a link file is still a filesystem operation, and creating a "file" on S3 can take well over a second. If, for example, there are 20 store files in a region, which is not uncommon, after the region is taken offline for a split (or merge) it may require more than 20 seconds to create the link files before the results can be brought back online, creating a severe availability problem. Splits and merges are supposed to be fast, completing in less than a second, certainly less than a few seconds. This has been true when HFiles are stored on HDFS only because file creation operations there are nearly instantaneous.
> There are two issues, but both can be handled with modifications to the store file tracker interface and the file based store file tracker implementation.
> When the file based store file tracker is enabled, the HFile links should be virtual entities that only exist in the file manifest. We do not require physical files in the filesystem to serve as links now. That is the magic of this file tracker: the manifest file replaces requirements to list the filesystem.
> Then, when splitting or merging, the HFile links should be collected into a list and committed in one batch using a new FILE file tracker interface, requiring only one update of the manifest file in S3, bringing the time requirement for this operation down from O(n) to O(1).

-- This message was sent by Atlassian Jira (v8.20.10#820010)
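Andrew's batching proposal above (gather every link and reference into a list and commit them to the tracker in a single call) can be sketched in plain Java. This is a hypothetical model, not the real StoreFileTracker API: `ManifestTracker`, `LinkEntry`, and `createLinks` are illustrative names, and `persistManifest` stands in for the slow S3 manifest write.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of batching link creation into one manifest write.
// Names (ManifestTracker, LinkEntry) are illustrative, not HBase's real API.
class ManifestTracker {
    // A "link" is just a pointer to a store file living in another region.
    record LinkEntry(String sourceRegion, String hfileName) {}

    private final List<LinkEntry> manifest = new ArrayList<>();
    private int manifestWrites = 0; // each write models one slow S3 PUT

    // Naive approach: one manifest update per link -> O(n) S3 writes.
    void createLink(LinkEntry link) {
        manifest.add(link);
        persistManifest();
    }

    // Batched approach: all links in a single manifest update -> O(1) S3 writes.
    void createLinks(List<LinkEntry> links) {
        manifest.addAll(links);
        persistManifest();
    }

    private void persistManifest() {
        manifestWrites++; // stand-in for writing the manifest file to S3
    }

    int getManifestWrites() { return manifestWrites; }
    List<LinkEntry> getManifest() { return manifest; }
}
```

With 20 store files, the per-link loop pays 20 manifest writes (each potentially over a second on S3), while the batched call pays exactly one regardless of the number of links.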
[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771 ]

Andrew Kyle Purtell commented on HBASE-27826:

{quote}We will define a splitFiles method in StoreFileTracker interface{quote}

No. Split logic should remain in SplitTransaction. Breaking this encapsulation and diluting the split implementation does not seem like a good idea to me, but we could discuss it if someone actually wants this. StoreFileTracker is a directory of store files. This concept is neatly extended to include management of reference and link files. And once SFT is managing references and links, they do not need to be real files; they can be virtual concepts maintained in the manifest. So SFT gets new additional methods for adding and removing references and links. A design doc will help clarify the proposals and discussion.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826761#comment-17826761 ]

Wellington Chevreuil commented on HBASE-27826:

Thanks for the heads-up, [~zhangduo], and for picking this up, [~prathyu6]! Summarising my understanding from the discussion:
1) We will define a splitFiles method in the StoreFileTracker interface, so that everywhere we currently do split logic (like in SplitTableRegionProcedure) would now delegate to the StoreFileTracker implementation;
2) The DefaultStoreFileTracker implementation would still create actual ref/link files in the split daughter regions, whilst the FileBasedTracker impl would keep a link/ref in its metadata only;
3) We would need to change the format of the meta files of FileBasedTracker to include the parent region location for the split daughters' "inherited" files.

Seems reasonable to me, looking forward to the design doc/initial PR.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
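Wellington's third point — extending the FILE tracker's meta file format so a daughter region's "inherited" files record the parent region's location — could be modeled as below. The `TrackedFile` record and its fields are purely illustrative; the real FileBasedStoreFileTracker metadata has its own format.

```java
import java.util.Optional;

// Hypothetical sketch of a FILE-tracker manifest entry extended with the
// parent region for files a daughter region "inherits" after a split.
// Field and type names are illustrative, not HBase's actual format.
record TrackedFile(String hfileName, Optional<String> parentRegion) {

    // A file written by this region itself: no parent reference needed.
    static TrackedFile owned(String hfileName) {
        return new TrackedFile(hfileName, Optional.empty());
    }

    // A virtual link/reference inherited from the split parent: the manifest
    // records where the physical bytes live, so no physical link file ever
    // has to be created on S3.
    static TrackedFile inherited(String hfileName, String parentRegion) {
        return new TrackedFile(hfileName, Optional.of(parentRegion));
    }

    // Resolve the location to read the actual bytes from.
    String physicalPath() {
        return parentRegion.map(p -> p + "/" + hfileName)
                           .orElse("self/" + hfileName);
    }
}
```

Because the inherited entry carries the parent location, the daughter can open the parent's HFile directly from the manifest alone, which is what makes the links "virtual".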
[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826069#comment-17826069 ]

Prathyusha edited comment on HBASE-27826 at 3/13/24 2:05 PM:

Sure, [~zhangduo].

> And about the split time, how many store files do you have when testing? I think we can gain more if there are more store files.

Yes, here I had a single region with 1 column family and 20 store files, and multithreading was enabled for creating ref/link files, but yes, this was a dev cluster with no other load on it.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files
[ https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826069#comment-17826069 ]

Prathyusha commented on HBASE-27826:

Sure, [~zhangduo].

> And about the split time, how many store files do you have when testing? I think we can gain more if there are more store files.

Yes, here I had a single region with 1 column family and 20 store files, and multithreading was enabled for creating ref/link files, but yes, this was a dev cluster with no other load on it.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28439) Remove ZooKeeper as a means of creating a client connection
Nick Dimiduk created HBASE-28439:
Summary: Remove ZooKeeper as a means of creating a client connection
Key: HBASE-28439
URL: https://issues.apache.org/jira/browse/HBASE-28439
Project: HBase
Issue Type: Task
Components: Client
Affects Versions: 4.0.0-alpha-1
Reporter: Nick Dimiduk

Following up on the discussion and decision around HBASE-23324, we will remove ZooKeeper as a point of entry for client connections.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-28260) Possible data loss in WAL after RegionServer crash
[ https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825924#comment-17825924 ] Hudson commented on HBASE-28260: Results for branch branch-2 [build #1011 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/General_20Nightly_20Build_20Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/] (/) {color:green}+1 jdk11 hadoop3 checks{color} -- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1011/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Possible data loss in WAL after RegionServer crash > -- > > Key: HBASE-28260 > URL: https://issues.apache.org/jira/browse/HBASE-28260 > Project: HBase > Issue Type: Bug >Reporter: Bryan Beaudreault >Assignee: Charles Connell >Priority: Major > Labels: pull-request-available > Fix For: 2.6.0, 3.0.0-beta-2, 2.5.9 > > > We recently had a production incident: > # RegionServer crashes, but local DataNode lives on > # WAL lease recovery kicks in > # Namenode reconstructs the block during lease recovery (which results in a > new genstamp). It chooses the replica on the local DataNode as the primary. 
> # Local DataNode reconstructs the block, so NameNode registers the new genstamp.
> # Local DataNode and the underlying host died, before the new block could be replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no replicas. The old replicas still remain, but are considered corrupt due to GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks was identical to the newly constructed and lost block. Further, the file in question was only 1 block. So we downloaded one of those corrupt block files and used {{hdfs dfs -put -f}} to force that block to replace the file in hdfs. So in this case we had no actual data loss, but it could have happened easily if the file was more than 1 block or the replicas weren't fully in sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the local datanode. We can use CreateFlag.NO_WRITE_LOCAL for this. Hat tip to [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from the local datanode, but avoiding writing there altogether would be better.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
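The `CreateFlag.NO_WRITE_LOCAL` fix described in the issue can be illustrated with a toy pipeline-selection sketch in plain Java. This only models the idea — excluding the writer's own host from a block's replica candidates. The real logic lives in HDFS's block placement policy; `PipelineChooser` and `choosePipeline` are made-up names, not a Hadoop API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of what NO_WRITE_LOCAL means for pipeline selection: when the
// flag is set, the writer's own host is excluded from the candidate
// DataNodes for the block's replicas. Illustrative only; the real behavior
// is implemented inside HDFS, not in client code like this.
class PipelineChooser {
    static List<String> choosePipeline(List<String> dataNodes, String localHost,
                                       boolean noWriteLocal, int replication) {
        List<String> candidates = dataNodes.stream()
            .filter(dn -> !noWriteLocal || !dn.equals(localHost))
            .collect(Collectors.toList());
        // Take the first `replication` candidates (real HDFS also weighs
        // rack topology, load, and free space when choosing).
        return candidates.subList(0, Math.min(replication, candidates.size()));
    }
}
```

With the flag set, the WAL block's replicas land only on remote DataNodes, so losing the RegionServer's host cannot also take out a replica of its WAL blocks.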