[DISCUSS] Blockers for 2.6.1

2024-06-05 Thread Bryan Beaudreault
Hey all,

It's been 2 weeks since 2.6.0 was released. As discussed in the vote
thread, there were a few outstanding backup-related issues. I believe we've
made some progress on some of those.

I'd like to start compiling a list of important backup-related fixes to
target for the 2.6.1 release so that we can track progress. Can those of
you who are involved (Ray Mattingly, Nick Dimiduk, Dieter De Paepe & team,
and any others) please list any important jiras here?

With a list of jiras in hand, we can make sure blocker priorities &
fixVersions are set, and use that to track what we still need to resolve
before releasing.

Here's what I know of so far, let me know if I'm missing anything:

Not yet started:
- HBASE-28084: incremental backups should be forbidden after deleting
backups
- HBASE-28602: Incremental backup fails when WALs move
- HBASE-28462: (similar to ^, but in a different part of the code)
- HBASE-28538: BackupHFileCleaner is very expensive

Patch available:
- HBASE-28539: backup merging does not work when using cloud storage as
filesystem
- HBASE-28562: another possible failure cause for incremental backups +
possibly cause of overly big backup metadata

Resolved:
- HBASE-28502: backed up tables are not listed correctly in backup
metadata, which causes unreliable backup validation
- HBASE-28568: the set of tables included in incremental backups might be
too big


Re: [ANNOUNCE] New HBase committer Andor Molnár

2024-05-29 Thread Bryan Beaudreault
Congrats and welcome!

On Wed, May 29, 2024 at 5:06 PM Viraj Jasani  wrote:

> Congratulations and Welcome, Andor! Well deserved!!
>
>
> On Wed, May 29, 2024 at 7:36 AM Duo Zhang  wrote:
>
> > On behalf of the Apache HBase PMC, I am pleased to announce that
> > Andor Molnár(andor) has accepted the PMC's invitation to become a
> > committer on the project. We appreciate all of Andor Molnár's
> > generous contributions thus far and look forward to his continued
> > involvement.
> >
> > Congratulations and welcome, Andor Molnár!
> >
> > On behalf of the Apache HBase PMC, I am pleased to announce that Andor
> > Molnár has accepted our invitation to become a Committer on the Apache
> > HBase project. We thank Andor Molnár for his contributions to the HBase
> > project so far and look forward to him taking on more responsibilities
> > in the future.
> >
> > Welcome, Andor Molnár!
> >
>


[jira] [Created] (HBASE-28624) Docs around configuring backups can lead to unexpectedly disabling other features

2024-05-28 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28624:
-

 Summary: Docs around configuring backups can lead to unexpectedly 
disabling other features
 Key: HBASE-28624
 URL: https://issues.apache.org/jira/browse/HBASE-28624
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


In our documentation for enabling backups, we suggest that the user set the 
following:
{code:java}
<property>
  <name>hbase.master.logcleaner.plugins</name>
  <value>org.apache.hadoop.hbase.backup.master.BackupLogCleaner,...</value>
</property>

<property>
  <name>hbase.master.hfilecleaner.plugins</name>
  <value>org.apache.hadoop.hbase.backup.BackupHFileCleaner,...</value>
</property>
{code}
A naive user will set these without knowing what to do about the ",..." part. In 
doing so, they will unexpectedly disable all of the default cleaners we 
have. For example, here are the defaults:
{code:java}
<property>
  <name>hbase.master.logcleaner.plugins</name>
  <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveProcedureWALCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveMasterLocalStoreWALCleaner</value>
</property>

<property>
  <name>hbase.master.hfilecleaner.plugins</name>
  <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveMasterLocalStoreHFileCleaner</value>
</property>
{code}
This effectively disables support for hbase.master.logcleaner.ttl and 
hbase.master.hfilecleaner.ttl.

There exist methods BackupManager.decorateMasterConfiguration and 
BackupManager.decorateRegionServerConfiguration. They are currently javadoc'd 
as being for tests only, but I think we should call them in HMaster and 
HRegionServer. Then we would only require the user to set "hbase.backup.enable", 
which would greatly simplify our docs here.
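
A minimal sketch of the proposed wiring, assuming BackupManager's existing 
static signatures are kept as-is (the standalone main() is only to make the 
effect visible; the real change would invoke the decorate call during HMaster 
startup):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.backup.impl.BackupManager;

public class BackupDecorationSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Under this proposal, the only key a user would need to set:
    conf.setBoolean("hbase.backup.enable", true);
    // Appends the backup cleaners to the existing plugin lists rather than
    // replacing them, so the default TTL cleaners are preserved.
    BackupManager.decorateMasterConfiguration(conf);
    System.out.println(conf.get("hbase.master.logcleaner.plugins"));
  }
}
{code}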



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Make the AccessChecker configurable

2024-05-24 Thread Bryan Beaudreault
Here is the jira I was referring to where we added coprocessor hooks for
the cases you saw: https://issues.apache.org/jira/browse/HBASE-26268

On Fri, May 24, 2024 at 12:36 PM Bryan Beaudreault 
wrote:

> I agree with Duo that you'd want to write a new coprocessor. AccessChecker
> is just a class used by AccessController, a coprocessor that hooks almost
> every RPC action a user/system could make. You can provide a new
> OpenPolicyAgentCoprocessor which does similar. A user could decide to use
> AccessController or OpenPolicyAgentCoprocessor.
>
> You're right that there are a couple direct calls to AccessChecker, but we
> should wrap those with coprocessor hooks instead. I thought we actually did
> that recently.
>
> On Fri, May 24, 2024 at 11:54 AM Lars Francke 
> wrote:
>
>> Thanks Duo.
>>
>> Yeah, they do that but I believe there are some calls to AccessChecker
>> inside of the RegionServer that do not go through the Coprocessor but
>> use the AccessChecker directly mostly to check for Admin privileges
>> (for example when updating configuration) and we thought it'd be
>> useful to capture those as well.
>>
>> But...there's a very good chance we might be missing something as
>> well, yeah. So, I'm also happy to be told I'm wrong :)
>>
>>
>> On Fri, May 24, 2024 at 5:01 PM 张铎(Duo Zhang) 
>> wrote:
>> >
>> > Something like ranger?
>> >
>> > I think ranger just implements its own authorization by HBase
>> coprocessor
>> >
>> > https://github.com/apache/ranger/tree/master/hbase-agent
>> >
>> > > Lars Francke wrote on Fri, May 24, 2024 at 22:54:
>> > >
>> > > Hi,
>> > >
>> > > we'd like to implement a way to use authorization information from
>> > > Open Policy Agent[1]. We already do the same for HDFS, Trino and a few
>> > > other tools.
>> > >
>> > > It's been a while since I dug into the internals on this one but it
>> > > seems as if we're missing a piece that's needed and that is a plugin
>> > > point to change the actual implementation class for the AccessChecker.
>> > > We'd need to override that.
>> > >
>> > > Before I start working on it and open an issue I wanted to ask for
>> opinions.
>> > > We'd probably want to refactor AccessChecker to be an interface
>> > > instead of an actual class but that is also optional and can be
>> > > discussed later.
>> > >
>> > > For now I'd love to know if we're missing a plugin point that we can
>> > > use already today but it looks very hardcoded and if the idea of
>> > > making AccessChecker pluggable is a useful one we can pursue?
>> > >
>> > > Thanks,
>> > > Lars
>> > >
>> > > [1] <https://www.openpolicyagent.org/>
>>
>


Re: Make the AccessChecker configurable

2024-05-24 Thread Bryan Beaudreault
I agree with Duo that you'd want to write a new coprocessor. AccessChecker
is just a class used by AccessController, a coprocessor that hooks almost
every RPC action a user/system could make. You can provide a new
OpenPolicyAgentCoprocessor which does similar. A user could decide to use
AccessController or OpenPolicyAgentCoprocessor.

You're right that there are a couple direct calls to AccessChecker, but we
should wrap those with coprocessor hooks instead. I thought we actually did
that recently.
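
For illustration, a region-level check along those lines could look like the
sketch below. The nested OpaClient is a hypothetical stand-in for whatever
Open Policy Agent client would actually be used; the coprocessor types are the
same public API that AccessController is built on:

import java.io.IOException;
import java.util.List;
import java.util.Optional;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionObserver;
import org.apache.hadoop.hbase.security.AccessDeniedException;
import org.apache.hadoop.hbase.security.User;

public class OpenPolicyAgentCoprocessor implements RegionCoprocessor, RegionObserver {

  @Override
  public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
  }

  @Override
  public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx, Get get,
      List<Cell> results) throws IOException {
    // Real code would resolve the RPC caller the way AccessController does;
    // User.getCurrent() is a simplification for this sketch.
    User user = User.getCurrent();
    String table = ctx.getEnvironment().getRegionInfo().getTable().getNameAsString();
    if (!OpaClient.allow(user.getShortName(), "read", table)) {
      throw new AccessDeniedException("OPA denied read on " + table);
    }
  }

  // Hypothetical stand-in for an OPA client.
  private static final class OpaClient {
    static boolean allow(String user, String action, String resource) {
      return false; // a real implementation would query OPA's policy API
    }
  }
}

Such a class would be registered via hbase.coprocessor.region.classes, the same
way AccessController is today.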

On Fri, May 24, 2024 at 11:54 AM Lars Francke 
wrote:

> Thanks Duo.
>
> Yeah, they do that but I believe there are some calls to AccessChecker
> inside of the RegionServer that do not go through the Coprocessor but
> use the AccessChecker directly mostly to check for Admin privileges
> (for example when updating configuration) and we thought it'd be
> useful to capture those as well.
>
> But...there's a very good chance we might be missing something as
> well, yeah. So, I'm also happy to be told I'm wrong :)
>
>
> On Fri, May 24, 2024 at 5:01 PM 张铎(Duo Zhang) 
> wrote:
> >
> > Something like ranger?
> >
> > I think ranger just implements its own authorization by HBase coprocessor
> >
> > https://github.com/apache/ranger/tree/master/hbase-agent
> >
> > Lars Francke wrote on Fri, May 24, 2024 at 22:54:
> > >
> > > Hi,
> > >
> > > we'd like to implement a way to use authorization information from
> > > Open Policy Agent[1]. We already do the same for HDFS, Trino and a few
> > > other tools.
> > >
> > > It's been a while since I dug into the internals on this one but it
> > > seems as if we're missing a piece that's needed and that is a plugin
> > > point to change the actual implementation class for the AccessChecker.
> > > We'd need to override that.
> > >
> > > Before I start working on it and open an issue I wanted to ask for
> opinions.
> > > We'd probably want to refactor AccessChecker to be an interface
> > > instead of an actual class but that is also optional and can be
> > > discussed later.
> > >
> > > For now I'd love to know if we're missing a plugin point that we can
> > > use already today but it looks very hardcoded and if the idea of
> > > making AccessChecker pluggable is a useful one we can pursue?
> > >
> > > Thanks,
> > > Lars
> > >
> > > [1] 
>


Re: [ANNOUNCE] Apache HBase 2.6.0 is now available for download

2024-05-20 Thread Bryan Beaudreault
Haha yes, I noticed that…after sending. Hopefully the title and various
other mentions of 2.6.0 can suffice this time :)

On Mon, May 20, 2024 at 10:08 PM 张铎(Duo Zhang) 
wrote:

> Congratulations!
>
> But it seems you missed the replacement for the first '_version_'
> placeholder...
>
> Bryan Beaudreault wrote on Tue, May 21, 2024 at 00:44:
> >
> > The HBase team is happy to announce the immediate availability of HBase
> > _version_.
> >
> > Apache HBase™ is an open-source, distributed, versioned, non-relational
> > database.
> > Apache HBase gives you low latency random access to billions of rows with
> > millions of columns atop non-specialized hardware. To learn more about
> > HBase,
> > see https://hbase.apache.org/.
> >
> > HBase 2.6.0 is the 1st release in the HBase 2.6.x line, which aims to
> > improve the stability and reliability of HBase. This release includes
> > roughly
> > 560 resolved issues not covered by previous 2.x releases.
> >
> > Notable new features include:
> > - Built-in support for full and incremental backups
> > - Built-in support for TLS encryption and authentication
> > - Erasure Coding support
> > - Various improvements to Quotas
> >
> > The full list of issues can be found here:
> >
> > CHANGELOG: https://downloads.apache.org/hbase/2.6.0/CHANGES.md
> > RELEASENOTES: https://downloads.apache.org/hbase/2.6.0/RELEASENOTES.md
> >
> > or via our issue tracker:
> >   https://issues.apache.org/jira/projects/HBASE/versions/12350930
> >
> > To download please follow the links and instructions on our website:
> >
> > https://hbase.apache.org/downloads.html
> >
> > Questions, comments, and problems are always welcome at:
> >   dev@hbase.apache.org
> >   u...@hbase.apache.org
> >   user...@hbase.apache.org
> >
> > Thanks to all who contributed and made this release possible.
> >
> > Cheers,
> > The HBase Dev Team
>


[jira] [Resolved] (HBASE-28228) Release 2.6.0

2024-05-20 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28228.
---
Resolution: Done

2.6.0 has been released

> Release 2.6.0
> -
>
> Key: HBASE-28228
> URL: https://issues.apache.org/jira/browse/HBASE-28228
> Project: HBase
>  Issue Type: Umbrella
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28603) Finish 2.6.0 release

2024-05-20 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28603.
---
Resolution: Done

> Finish 2.6.0 release
> 
>
> Key: HBASE-28603
> URL: https://issues.apache.org/jira/browse/HBASE-28603
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Bryan Beaudreault
>Priority: Major
>
> # Release the artifacts on repository.apache.org
>  # Move the binaries from dist-dev to dist-release
>  # Add xml to download page (via HBASE-28236)
>  # Push tag 2.6.0RC4 as tag rel/2.6.0
>  # Release 2.6.0 on JIRA 
> [https://issues.apache.org/jira/projects/HBASE/versions/12353291]
>  # Add release data on [https://reporter.apache.org/addrelease.html?hbase]
>  # Send announcement email



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28236) Add 2.6.0 to downloads page

2024-05-20 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28236.
---
Resolution: Fixed

> Add 2.6.0 to downloads page
> ---
>
> Key: HBASE-28236
> URL: https://issues.apache.org/jira/browse/HBASE-28236
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28232) Add release manager for 2.6 in ref guide

2024-05-20 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28232.
---
Resolution: Fixed

> Add release manager for 2.6 in ref guide
> 
>
> Key: HBASE-28232
> URL: https://issues.apache.org/jira/browse/HBASE-28232
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[ANNOUNCE] Apache HBase 2.6.0 is now available for download

2024-05-20 Thread Bryan Beaudreault
The HBase team is happy to announce the immediate availability of HBase
_version_.

Apache HBase™ is an open-source, distributed, versioned, non-relational
database.
Apache HBase gives you low latency random access to billions of rows with
millions of columns atop non-specialized hardware. To learn more about
HBase,
see https://hbase.apache.org/.

HBase 2.6.0 is the 1st release in the HBase 2.6.x line, which aims to
improve the stability and reliability of HBase. This release includes
roughly
560 resolved issues not covered by previous 2.x releases.

Notable new features include:
- Built-in support for full and incremental backups
- Built-in support for TLS encryption and authentication
- Erasure Coding support
- Various improvements to Quotas

The full list of issues can be found here:

CHANGELOG: https://downloads.apache.org/hbase/2.6.0/CHANGES.md
RELEASENOTES: https://downloads.apache.org/hbase/2.6.0/RELEASENOTES.md

or via our issue tracker:
  https://issues.apache.org/jira/projects/HBASE/versions/12350930

To download please follow the links and instructions on our website:

https://hbase.apache.org/downloads.html

Questions, comments, and problems are always welcome at:
  dev@hbase.apache.org
  u...@hbase.apache.org
  user...@hbase.apache.org

Thanks to all who contributed and made this release possible.

Cheers,
The HBase Dev Team


[jira] [Created] (HBASE-28603) Finish 2.6.0 release

2024-05-17 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28603:
-

 Summary: Finish 2.6.0 release
 Key: HBASE-28603
 URL: https://issues.apache.org/jira/browse/HBASE-28603
 Project: HBase
  Issue Type: Sub-task
Reporter: Bryan Beaudreault


# Release the artifacts on repository.apache.org
 # Move the binaries from dist-dev to dist-release
 # Add xml to download page
 # Push tag 2.6.0RC4 as tag rel/2.6.0
 # Release 2.6.0 on JIRA 
[https://issues.apache.org/jira/projects/HBASE/versions/12353291]
 # Add release data on [https://reporter.apache.org/addrelease.html?hbase]
 # Send announcement email



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28237) Set version to 2.6.1-SNAPSHOT for branch-2.6

2024-05-17 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28237.
---
Resolution: Done

This is handled by automation so probably didn't need to be a jira

> Set version to 2.6.1-SNAPSHOT for branch-2.6
> 
>
> Key: HBASE-28237
> URL: https://issues.apache.org/jira/browse/HBASE-28237
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28233) Run ITBLL for branch-2.6

2024-05-17 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28233.
---
Resolution: Done

> Run ITBLL for branch-2.6
> 
>
> Key: HBASE-28233
> URL: https://issues.apache.org/jira/browse/HBASE-28233
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28235) Put up 2.6.0RC0

2024-05-17 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28235.
---
Resolution: Done

Ended up going to RC4, which has now passed

> Put up 2.6.0RC0
> ---
>
> Key: HBASE-28235
> URL: https://issues.apache.org/jira/browse/HBASE-28235
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28234) Set version as 2.6.0 in branch-2.6 in prep for first RC

2024-05-17 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28234.
---
Resolution: Done

> Set version as 2.6.0 in branch-2.6 in prep for first RC
> ---
>
> Key: HBASE-28234
> URL: https://issues.apache.org/jira/browse/HBASE-28234
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] The third release candidate for 2.6.0, RC4, is available

2024-05-17 Thread Bryan Beaudreault
With 4 binding +1, 1 non-binding +1, and no binding -1, the vote passes!
Thank you everyone for voting.

I want to give a special thanks to Dieter for raising his non-binding -1.
The vote is acknowledged, and we will try to prioritize reviewing & merging
the issues he raised so that we can do a 2.6.1 in the near future. As
mentioned, the issues are important but non-blocking because of the
experimental nature of backups and the necessity to get an initial release
out for all the other non-backup users waiting on 2.6.0. We've already
started making some progress on those issues since he raised them.

I'll try to get the release published shortly.

On Fri, May 17, 2024 at 9:49 AM Bryan Beaudreault 
wrote:

> Here's my own binding +1:
>
> * Signature: ok
>
> * Checksum : ok
>
> * Rat check (11.0.18): ok
>
>  - mvn clean apache-rat:check
>
> * Built from source (11.0.18): ok
>
>  - mvn clean install -DskipTests
>
> * Unit tests pass (11.0.18): ok
>
>  - mvn package -P runSmallTests
>
>
> I've also run this release through LTT and Chaos Monkey, and we've
> installed it on several load-bearing clusters without issues.
>
> On Tue, May 7, 2024 at 11:33 AM Nick Dimiduk  wrote:
>
>> +1 (binding)
>>
>> * Signature: ok
>> * Checksum : ok
>> * Rat check (1.8.0_412): ok
>>  - mvn clean apache-rat:check
>> * Built from source (1.8.0_412): ok
>>  - mvn clean install  -DskipTests
>> * Built from source (11.0.23): ok
>>  - mvn clean install -D hadoop.profile=3.0 -DskipTests
>> * Built from source (17.0.11): ok
>>  - mvn clean install -D hadoop.profile=3.0 -DskipTests
>> * Built from source (21.0.3): ok
>>  - mvn clean install -D hadoop.profile=3.0 -DskipTests
>> * Compatibility report: okay
>>  - I agree with Duo's assessment of the changes
>>  - filed HBASE-28573 to exclude the shaded package in the future
>> * Reviewed the existing issues with incremental backups as flagged by
>> Dieter.
>> * Exercised the staged repositories using github.com:
>> ndimiduk/hbase-downstreamer.git
>>  - (JDK8 + Hadoop2): ok
>>   - mvn -Dhbase.2.version=2.6.0 -Dhbase.staging.repository=
>> https://repository.apache.org/content/repositories/orgapachehbase-1542
>> -Dhadoop.2.version=2.10.2
>> clean package
>> -Dmaven.repo.local=$(pwd)/m2_repository clean package
>>  - (JDK11 + Hadoop3): ok
>>   - mvn -Dhbase.2.version=2.6.0-hadoop3 -Dhbase.staging.repository=
>> https://repository.apache.org/content/repositories/orgapachehbase-1543
>> -Dhadoop.3.version=3.3.5
>> -Dmaven.repo.local=$(pwd)/m2_repository clean
>> package
>> * Run LTT against local mode (JDK11)
>>  - ./bin/hbase ltt -num_keys 500 -write 3:512 -read 100 -multiput
>>  - web UI looks good
>>  - There's a race condition related to closing down zookeeper connections
>> when shutting down the local mode master process
>>
>>
>> On Mon, Apr 29, 2024 at 9:20 PM Bryan Beaudreault <
>> bbeaudrea...@apache.org>
>> wrote:
>>
>> > Please vote on this Apache hbase release candidate,
>> > hbase-2.6.0RC4
>> >
>> > The VOTE will remain open for at least 72 hours.
>> >
>> > [ ] +1 Release this package as Apache hbase 2.6.0
>> > [ ] -1 Do not release this package because ...
>> >
>> > The tag to be voted on is 2.6.0RC4:
>> >
>> >   https://github.com/apache/hbase/tree/2.6.0RC4
>> >
>> > This tag currently points to git reference
>> >
>> >   de99f8754135ea69adc39da48d2bc2b2710a5366
>> >
>> > The release files, including signatures, digests, as well as CHANGES.md
>> > and RELEASENOTES.md included in this RC can be found at:
>> >
>> >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC4/
>> >
>> > Maven artifacts are available in a staging repository at:
>> >
>> >
>> https://repository.apache.org/content/repositories/orgapachehbase-1542/
>> >
>> > Maven artifacts for hadoop3 are available in a staging repository at:
>> >
>> >
>> https://repository.apache.org/content/repositories/orgapachehbase-1543/
>> >
>> > Artifacts were signed with the 0x74EFF462 key which can be found in:
>> >
>> >   https://downloads.apache.org/hbase/KEYS
>> >
>> > To learn more about Apache hbase, please see
>> >
>> >   http://hbase.apache.org/
>> >
>> > Thanks,
>> > Your HBase Release Manager
>> >
>>
>


Re: [VOTE] The third release candidate for 2.6.0, RC4, is available

2024-05-17 Thread Bryan Beaudreault
Here's my own binding +1:

* Signature: ok

* Checksum : ok

* Rat check (11.0.18): ok

 - mvn clean apache-rat:check

* Built from source (11.0.18): ok

 - mvn clean install -DskipTests

* Unit tests pass (11.0.18): ok

 - mvn package -P runSmallTests


I've also run this release through LTT and Chaos Monkey, and we've installed
it on several load-bearing clusters without issues.

On Tue, May 7, 2024 at 11:33 AM Nick Dimiduk  wrote:

> +1 (binding)
>
> * Signature: ok
> * Checksum : ok
> * Rat check (1.8.0_412): ok
>  - mvn clean apache-rat:check
> * Built from source (1.8.0_412): ok
>  - mvn clean install  -DskipTests
> * Built from source (11.0.23): ok
>  - mvn clean install -D hadoop.profile=3.0 -DskipTests
> * Built from source (17.0.11): ok
>  - mvn clean install -D hadoop.profile=3.0 -DskipTests
> * Built from source (21.0.3): ok
>  - mvn clean install -D hadoop.profile=3.0 -DskipTests
> * Compatibility report: okay
>  - I agree with Duo's assessment of the changes
>  - filed HBASE-28573 to exclude the shaded package in the future
> * Reviewed the existing issues with incremental backups as flagged by
> Dieter.
> * Exercised the staged repositories using github.com:
> ndimiduk/hbase-downstreamer.git
>  - (JDK8 + Hadoop2): ok
>   - mvn -Dhbase.2.version=2.6.0 -Dhbase.staging.repository=
> https://repository.apache.org/content/repositories/orgapachehbase-1542
> -Dhadoop.2.version=2.10.2
> clean package
> -Dmaven.repo.local=$(pwd)/m2_repository clean package
>  - (JDK11 + Hadoop3): ok
>   - mvn -Dhbase.2.version=2.6.0-hadoop3 -Dhbase.staging.repository=
> https://repository.apache.org/content/repositories/orgapachehbase-1543
> -Dhadoop.3.version=3.3.5
> -Dmaven.repo.local=$(pwd)/m2_repository clean
> package
> * Run LTT against local mode (JDK11)
>  - ./bin/hbase ltt -num_keys 500 -write 3:512 -read 100 -multiput
>  - web UI looks good
>  - There's a race condition related to closing down zookeeper connections
> when shutting down the local mode master process
>
>
> On Mon, Apr 29, 2024 at 9:20 PM Bryan Beaudreault  >
> wrote:
>
> > Please vote on this Apache hbase release candidate,
> > hbase-2.6.0RC4
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase 2.6.0
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 2.6.0RC4:
> >
> >   https://github.com/apache/hbase/tree/2.6.0RC4
> >
> > This tag currently points to git reference
> >
> >   de99f8754135ea69adc39da48d2bc2b2710a5366
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC4/
> >
> > Maven artifacts are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1542/
> >
> > Maven artifacts for hadoop3 are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1543/
> >
> > Artifacts were signed with the 0x74EFF462 key which can be found in:
> >
> >   https://downloads.apache.org/hbase/KEYS
> >
> > To learn more about Apache hbase, please see
> >
> >   http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
> >
>


[jira] [Resolved] (HBASE-26625) ExportSnapshot tool failed to copy data files for tables with merge region

2024-05-16 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-26625.
---
Resolution: Fixed

I've merged the backport to branch-2.5 and added the next unreleased 2.5.x 
version to fixVersions

> ExportSnapshot tool failed to copy data files for tables with merge region
> --
>
> Key: HBASE-26625
> URL: https://issues.apache.org/jira/browse/HBASE-26625
> Project: HBase
>  Issue Type: Bug
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.5.9, 2.4.10, 3.0.0-alpha-3
>
>
> When exporting a snapshot for a table with merged regions, we found the 
> following exceptions:
> {code:java}
> 2021-12-24 17:14:41,563 INFO  [main] snapshot.ExportSnapshot: Finalize the 
> Snapshot Export
> 2021-12-24 17:14:41,589 INFO  [main] snapshot.ExportSnapshot: Verify snapshot 
> integrity
> 2021-12-24 17:14:41,683 ERROR [main] snapshot.ExportSnapshot: Snapshot export 
> failed
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Missing parent 
> hfile for: 043a9fe8aa7c469d8324956a57849db5.8e935527eb39a2cf9bf0f596754b5853 
> path=A/a=t42=8e935527eb39a2cf9bf0f596754b5853-043a9fe8aa7c469d8324956a57849db5
>     at 
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.concurrentVisitReferencedFiles(SnapshotReferenceUtil.java:232)
>     at 
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.concurrentVisitReferencedFiles(SnapshotReferenceUtil.java:195)
>     at 
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.verifySnapshot(SnapshotReferenceUtil.java:172)
>     at 
> org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.verifySnapshot(SnapshotReferenceUtil.java:156)
>     at 
> org.apache.hadoop.hbase.snapshot.ExportSnapshot.verifySnapshot(ExportSnapshot.java:851)
>     at 
> org.apache.hadoop.hbase.snapshot.ExportSnapshot.doWork(ExportSnapshot.java:1096)
>     at 
> org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:154)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at 
> org.apache.hadoop.hbase.util.AbstractHBaseTool.doStaticMain(AbstractHBaseTool.java:280)
>     at 
> org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:1144)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Dropping Java 8 support in HBase 3

2024-05-08 Thread Bryan Beaudreault
In my experience, there are a few notable areas where core refactoring is
happening. Most contributions don’t happen in those areas, and as a result
could be cleanly backported if not for gotchas like the HBaseTestingUtil
rename.

Anyway I agree that just adding a compile check is easier.

That said, I would still advocate for not diverging the jdk version from
branch-2. In my opinion almost all commits should be backported to
branch-2. The only exceptions are for specific incompatibile/unsafe 3.x
features. The reason for that is we don’t do major releases nearly often
enough, so backporting to branch-2 is the only way to get changes into
users' hands.

So if this change is going to make that much more difficult then personally
I’d prefer a more aggressive approach of bumping jdk for branch-2, or a
more conservative approach of not allowing new language features in
branch-3.

Overall I think more frequent smaller major releases would help us be more
agile, and aligns more with other modern projects I’ve seen.

On Tue, May 7, 2024 at 10:00 AM Istvan Toth 
wrote:

> I'd expect the automated backporting process to only work for fairly
> trivial patches which do not use protobuf, etc.
> More involved patches would need manual work anyway.
>
> If we want to make sure that everything compiles with JDK8, it's easier to
> just compile the master branch with JDK8 (along with 11/17),
> and fail the CI check if it doesn't.
>
> We need to find a balance between using the new Java features and keeping
> the workload manageable.
> We could keep compiling master with JDK8 for a year or two, and when
> activity on the 2.x branches tapers off, we could remove that restriction.
>
>
> On Tue, May 7, 2024 at 3:56 PM Andrew Purtell 
> wrote:
>
> > I also like the suggestion to have CI help us here too.
> >
> > > On May 7, 2024, at 9:42 AM, Bryan Beaudreault  >
> > wrote:
> > >
> > > I'm nervous about creating more big long-term divergences between the
> > > branches. Already I sometimes get caught up on HBaseTestingUtil vs
> > > HBaseTestingUtility. And we all know the burden of maintaining the old
> > > HTable impl.
> > >
> > > I'm not sure if this is a useful suggestion since it would require
> > someone
> > > to do a good deal of work, but I wonder if we could automate backport
> > > testing a bit. Our yetus checks already check the patch, maybe it could
> > > apply the patch to branch-2. This would increase the cost of master
> > branch
> > > PRs but maybe speed us up overall.
> > >
> > >> On Tue, May 7, 2024 at 9:21 AM 张铎(Duo Zhang) 
> > wrote:
> > >>
> > >> The problem is that, if we only compile and run tests on JDK11+, the
> > >> contributors may implicitly use some JDK11+ only features and
> > >> introduce difference when backporting to branch-2.x.
> > >>
> > >> Maybe a possible policy is that, once a patch should go into
> > >> branch-2.x too, before merging the master PR, we should make sure the
> > >> contributor open a PR for branch-2.x too, so we can catch the
> > >> differences between the 2 PRs, and whether to align them.
> > >>
> > >> WDYT?
> > >>
> > >> Thanks.
> > >>
> > >>> Andrew Purtell wrote on Tue, May 7, 2024 at 20:20:
> > >>>
> > >>> I don’t expect 2.x to wind down for up to several more years. We will
> > be
> > >>> still using it in production at my employer for a long time and I
> would
> > >>> continue my role as RM for 2.x as needed. HBase 3 is great but not GA
> > yet
> > >>> and then some users will want to wait one to a couple years before
> > >> adopting
> > >>> the new major version, especially if migration is not seamless. (We
> > even
> > >>> faced breaking changes in a minor upgrade from 2.4 to 2.5 that
> brought
> > >> down
> > >>> a cluster during a rolling upgrade, so there should be no expectation
> > of
> > >> a
> > >>> seamless upgrade.) My plan is to continue releasing 2.x until, like
> > with
> > >>> 1.x, the commits to branch-2 essentially stop, or until the PMC stops
> > >>> allowing release of the candidates.
> > >>>
> > >>> Perhaps we do not need to do a total ban on use of 11 features. We
> > should
> > >>> allow a case by case discussion. We can minimize their scope and even
> > >>> potentially offer multiversion support like we do with Unsafe access
> > >>> utilit

Re: [VOTE] The third release candidate for 2.6.0, RC4, is available

2024-05-07 Thread Bryan Beaudreault
Related, Dieter, I'm going to try to rally some review bandwidth on the
jiras you have linked. I really appreciate your team's efforts in reporting
and providing fixes for these issues. Some of them slipped past me, but I'm
now watching all of them and hope to get to them soon.

On Tue, May 7, 2024 at 10:22 AM Bryan Beaudreault 
wrote:

> Yes, at my employer we are running daily backups using the new system,
> including incremental backups, on ~130 clusters including 2 production
> clusters. We've had troubles with incremental backups on a few clusters,
> but most have been fine. We've also got end-to-end recurring acceptance
> tests which verify that we can create backups and restore from them. We
> don't use merging of incrementals, but that seems more like a nice-to-have
> relative to the core functionality of creating a backup and restoring from
> it.
>
> I agree that it would have been ideal to release with all bugs resolved.
> Part of the problem is lack of community adoption or development, since it
> had been stuck on 3.0-alpha for so long. My employer has pushed a bunch of
> fixes to backups, but they have all been along the lines of how we intend
> to run them (i.e. not looking at merges). So I appreciate having Dieter and
> team's help as well, and will continue to help review and merge fixes there
> as well as push releases.
>
> Backups may be a "flagship" feature, but it's also "experimental". So I think it's ok
> to release as-is given the core functionality works and there is much more
> waiting on 2.6.0 beyond just backups. We will release 2.6.1 shortly, after
> the issues have been resolved for backups.
>
> I believe we only need 3 binding votes, which we have (including my own).
> But I will keep it open a little longer in the hopes that other PMC weigh
> in with an official vote on this first minor release in a while.
>
> Thanks everyone!
>
> On Tue, May 7, 2024 at 9:54 AM Nick Dimiduk  wrote:
>
>> The new backups system is one of the "flagship" features for the 2.6
>> release line so it's a shame that these issues with incremental manifest
>> tracking remain. If I'm not mistaken, though, a full backup still works as
>> expected, so these bugs do not prevent taking a backup entirely. I agree
>> that they together are not enough to block the release.
>>
>> Thank you Dieter for raising the profile of the issues. I see that these
>> tickets don't have very many watchers, so it seems they've gone without
>> notice.
>>
>> On Tue, May 7, 2024 at 2:26 PM Andrew Purtell 
>> wrote:
>>
>> > This sounds very reasonable to me, especially with the promise of a
>> quick
>> > follow on release of 2.6.1.
>> >
>> > —
>> >
>> > I apologize for not voting on the 2.6 candidates. I will set up a VM
>> > somewhere where I can drive release candidate tests by phone for the
>> next
>> > time but did not think of this in advance.
>> >
>> >
>> > On Tue, May 7, 2024 at 6:54 AM Bryan Beaudreault <
>> bbeaudrea...@apache.org>
>> > wrote:
>> >
>> > > Thanks Dieter. I’m aware of these issues, but I don’t think they are
>> > > blockers. The idea with releasing backups was that it would be
>> > experimental
>> > > in 2.6, as a way to get the feature into more of the community’s hands
>> > and
>> > > increase development. So far that’s working, as evidenced by your
>> team’s
>> > > great work!
>> > >
>> > > 2.6.0 has been pending since late last year, but kept getting delayed
>> for
>> > > various reasons. I’m aware of community members who are waiting on
>> this
>> > for
>> > > their next upgrade and frustrated by the delays.
>> > >
>> > > It’s unfortunate to release a feature with known bugs, but given it’s
>> > > experimental and just one small part of the release I don’t think we
>> > should
>> > > delay further. It will take some time to review and merge the jiras
>> you
>> > > mention, as well as 1-2 others I’m aware of. I will be happy to
>> release
>> > > 2.6.1 as a quick follow up once these land.
>> > >
>> > > What does the PMC think?
>> > >
>> > > On Tue, May 7, 2024 at 5:13 AM Dieter De Paepe
>> > > > > >
>> > > wrote:
>> > >
>> > > > -1 non binding
>> > > >
>> > > > I've done some testing of the backup-restore feature the past days,
>> and
>> > > > there's still some issues that I think

Re: [VOTE] The third release candidate for 2.6.0, RC4, is available

2024-05-07 Thread Bryan Beaudreault
Yes, at my employer we are running daily backups using the new system,
including incremental backups, on ~130 clusters including 2 production
clusters. We've had troubles with incremental backups on a few clusters,
but most have been fine. We've also got end-to-end recurring acceptance
tests which verify that we can create backups and restore from them. We
don't use merging of incrementals, but that seems more like a nice-to-have
relative to the core functionality of creating a backup and restoring from
it.
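
For context, creating a full backup programmatically looks roughly like the
sketch below (assuming the BackupAdmin / BackupRequest API from the
hbase-backup module as it ships in 2.6.0; the table name and target path are
illustrative only):

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.backup.BackupAdmin;
import org.apache.hadoop.hbase.backup.BackupRequest;
import org.apache.hadoop.hbase.backup.BackupType;
import org.apache.hadoop.hbase.backup.impl.BackupAdminImpl;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FullBackupSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
        BackupAdmin admin = new BackupAdminImpl(conn)) {
      BackupRequest request = new BackupRequest.Builder()
          .withBackupType(BackupType.FULL)
          .withTableList(Collections.singletonList(TableName.valueOf("my_table")))
          .withTargetRootDir("hdfs:///backup-root") // illustrative path
          .build();
      // Returns the backup id, which is later used for restores and
      // as the ancestor for incremental backups.
      System.out.println("Created backup " + admin.backupTables(request));
    }
  }
}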

I agree that it would have been ideal to release with all bugs resolved.
Part of the problem is lack of community adoption or development, since it
had been stuck on 3.0-alpha for so long. My employer has pushed a bunch of
fixes to backups, but they have all been along the lines of how we intend
to run them (i.e. not looking at merges). So I appreciate having Dieter and
team's help as well, and will continue to help review and merge fixes there
as well as push releases.

Backups may be a "flagship" feature, but it's also "experimental". So I think it's ok
to release as-is given the core functionality works and there is much more
waiting on 2.6.0 beyond just backups. We will release 2.6.1 shortly, after
the issues have been resolved for backups.

I believe we only need 3 binding votes, which we have (including my own).
But I will keep it open a little longer in the hopes that other PMC weigh
in with an official vote on this first minor release in a while.

Thanks everyone!

On Tue, May 7, 2024 at 9:54 AM Nick Dimiduk  wrote:

> The new backups system is one of the "flagship" features for the 2.6
> release line so it's a shame that these issues with incremental manifest
> tracking remain. If I'm not mistaken, though, a full backup still works as
> expected, so these bugs do not prevent taking a backup entirely. I agree
> that they together are not enough to block the release.
>
> Thank you Dieter for raising the profile of the issues. I see that these
> tickets don't have very many watchers, so it seems they've gone without
> notice.
>
> On Tue, May 7, 2024 at 2:26 PM Andrew Purtell  wrote:
>
> > This sounds very reasonable to me, especially with the promise of a quick
> > follow on release of 2.6.1.
> >
> > —
> >
> > I apologize for not voting on the 2.6 candidates. I will set up a VM
> > somewhere where I can drive release candidate tests by phone for the next
> > time but did not think of this in advance.
> >
> >
> > On Tue, May 7, 2024 at 6:54 AM Bryan Beaudreault <
> bbeaudrea...@apache.org>
> > wrote:
> >
> > > Thanks Dieter. I’m aware of these issues, but I don’t think they are
> > > blockers. The idea with releasing backups was that it would be
> > experimental
> > > in 2.6, as a way to get the feature into more of the community’s hands
> > and
> > > increase development. So far that’s working, as evidenced by your
> team’s
> > > great work!
> > >
> > > 2.6.0 has been pending since late last year, but kept getting delayed
> for
> > > various reasons. I’m aware of community members who are waiting on this
> > for
> > > their next upgrade and frustrated by the delays.
> > >
> > > It’s unfortunate to release a feature with known bugs, but given it’s
> > > experimental and just one small part of the release I don’t think we
> > should
> > > delay further. It will take some time to review and merge the jiras you
> > > mention, as well as 1-2 others I’m aware of. I will be happy to release
> > > 2.6.1 as a quick follow up once these land.
> > >
> > > What does the PMC think?
> > >
> > > On Tue, May 7, 2024 at 5:13 AM Dieter De Paepe
> >  > > >
> > > wrote:
> > >
> > > > -1 non binding
> > > >
> > > > I've done some testing of the backup-restore feature the past days,
> and
> > > > there's still some issues that I think really should be solved for a
> > > first
> > > > release version.
> > > > PRs are available for all of these:
> > > >
> > > >
> > > >   *   HBASE-28539: backup merging does not work when using cloud
> > storage
> > > > as filesystem
> > > >   *   HBASE-28502: backed up tables are not listed correctly in
> backup
> > > > metadata, which causes unreliable backup validation
> > > >   *   HBASE-28568: the set of tables included in incremental backups
> > > might
> > > > be too big
> > > >   *   HBASE-28562: another possible failure cause for incremental
> > backups
> > > > + possibly cause

Re: [DISCUSS] Dropping Java 8 support in HBase 3

2024-05-07 Thread Bryan Beaudreault
> > >
> > > > > AFAIK spring 6 and spring-boot 3 have jumped to java17 directly,
> so if
> > > we
> > > > > want to upgrade, I also suggest that we jump to java 17 directly.
> > > > >
> > > > > While upgrading to java 17 can reduce our compatibility work on
> > > branch-3+,
> > > > > but consider the widely usage for java 8, I think we still need to
> > > support
> > > > > branch-2 for several years, then this will increase the
> compatibility
> > > work
> > > > > as the code between branch-3+ and branch-2.x will be more and more
> > > > > different.
> > > > >
> > > > > So for me, a workable solution is
> > > > >
> > > > > 1. We first claim that branch-3+ will move minimum java support to
> 11
> > > or
> > > > > 17.
> > > > > 2. Start to move the compilation to java 11 or 17, but still keep
> > > release
> > > > > version 8, and still keep the pre commit pipeline to run java 8,
> 11,
> > > 17, to
> > > > > minimum our compatibility work before we have the first 3.0.0
> release.
> > > > > 3. Cut branch-3.0 and release 3.0.0, so we have a 3.0.0 release,
> > > actually
> > > > > which can still run on java 8, so it will be easier for our users
> to
> > > > > upgrade to 3.x and reduce our pressure on maintaining branch-2,
> > > especially
> > > > > do not need to back port new features there.
> > > > > 4. Start to move the release version to 11 or 17 on branch-3+, and
> > > prepare
> > > > > for 3.1.0 release, which will be the real 11 or 17 only release.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > > Bryan Beaudreault wrote on Tue, Apr 30, 2024 at 02:54:
> > > > >
> > > > > > I am a huge +1 for dropping java8.
> > > > > >
> > > > > > One reason I would suggest going to 17 is that it seems so hard
> to
> > > change
> > > > > > these things given our long development cycle on major releases.
> > > There
> > > > > are
> > > > > > some nice language features in 17, but more importantly is that
> the
> > > > > initial
> > > > > > release of java11 was released 6 years ago and java17 released 3
> > > years.
> > > > > > Java21 is already released as well. So I could see java17 being
> > > widely
> > > > > > available enough that we could jump "in the middle" rather than
> to
> > > the
> > > > > > oldest LTS.
> > > > > >
> > > > > > I will say that we're already running java 21 on all of our
> > > hbase/hadoop
> > > > > in
> > > > > > prod (70 clusters, 7k regionservers). I know not every
> organization
> > > can
> > > > > be
> > > > > > that aggressive, and I wouldn't suggest jumping to 21 in the
> > > codebase.
> > > > > Just
> > > > > > pointing it out in terms of basic support already existing and
> being
> > > > > > stable.
> > > > > >
> > > > > > On Mon, Apr 29, 2024 at 2:33 PM Andrew Purtell <
> > > andrew.purt...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I also agree that mitigation of security problems in
> dependencies
> > > will
> > > > > be
> > > > > > > increasingly difficult, as we cannot expect our dependencies to
> > > > > continue
> > > > > > to
> > > > > > > support Java 8. They might, but as time goes on it is less
> likely.
> > > > > > >
> > > > > > > A minimum of Java 11 makes a lot of sense. This is where the
> > > center of
> > > > > > > gravity of the Java ecosystem is, probably.
> > > > > > >
> > > > > > > A minimum of 17 is aggressive and I don’t see the point unless
> > > there
> > > > > is a
> > > > > > > feature in 17 that we would like to base an improvement on.
> > > > > > >
> > > > > > > > On Apr 29, 2024, at 1:23 PM, chrajeshbab...@gmail.com wrote:
> > > > > > > >
> > > >

Re: [VOTE] The third release candidate for 2.6.0, RC4, is available

2024-05-07 Thread Bryan Beaudreault
Thanks Dieter. I’m aware of these issues, but I don’t think they are
blockers. The idea with releasing backups was that it would be experimental
in 2.6, as a way to get the feature into more of the community’s hands and
increase development. So far that’s working, as evidenced by your team’s
great work!

2.6.0 has been pending since late last year, but kept getting delayed for
various reasons. I’m aware of community members who are waiting on this for
their next upgrade and frustrated by the delays.

It’s unfortunate to release a feature with known bugs, but given it’s
experimental and just one small part of the release I don’t think we should
delay further. It will take some time to review and merge the jiras you
mention, as well as 1-2 others I’m aware of. I will be happy to release
2.6.1 as a quick follow up once these land.

What does the PMC think?

On Tue, May 7, 2024 at 5:13 AM Dieter De Paepe 
wrote:

> -1 non binding
>
> I've done some testing of the backup-restore feature the past days, and
> there's still some issues that I think really should be solved for a first
> release version.
> PRs are available for all of these:
>
>
>   *   HBASE-28539: backup merging does not work when using cloud storage
> as filesystem
>   *   HBASE-28502: backed up tables are not listed correctly in backup
> metadata, which causes unreliable backup validation
>   *   HBASE-28568: the set of tables included in incremental backups might
> be too big
>   *   HBASE-28562: another possible failure cause for incremental backups
> + possibly cause of overly big backup metadata
>
> I also feel HBASE-28084 is an important one, but there's no PR for that so
> far, so I'm fine with skipping that one for 2.6.0.
>
> Regards,
> Dieter
>
> On 2024/04/29 19:20:27 Bryan Beaudreault wrote:
> > Please vote on this Apache hbase release candidate,
> > hbase-2.6.0RC4
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase 2.6.0
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 2.6.0RC4:
> >
> > https://github.com/apache/hbase/tree/2.6.0RC4
> >
> > This tag currently points to git reference
> >
> > de99f8754135ea69adc39da48d2bc2b2710a5366
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> > https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC4/
> >
> > Maven artifacts are available in a staging repository at:
> >
> > https://repository.apache.org/content/repositories/orgapachehbase-1542/
> >
> > Maven artifacts for hadoop3 are available in a staging repository at:
> >
> > https://repository.apache.org/content/repositories/orgapachehbase-1543/
> >
> > Artifacts were signed with the 0x74EFF462 key which can be found in:
> >
> > https://downloads.apache.org/hbase/KEYS
> >
> > To learn more about Apache hbase, please see
> >
> > http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
> >
>
>


[VOTE] The third release candidate for 2.6.0, RC4, is available

2024-04-29 Thread Bryan Beaudreault
Please vote on this Apache hbase release candidate,
hbase-2.6.0RC4

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache hbase 2.6.0
[ ] -1 Do not release this package because ...

The tag to be voted on is 2.6.0RC4:

  https://github.com/apache/hbase/tree/2.6.0RC4

This tag currently points to git reference

  de99f8754135ea69adc39da48d2bc2b2710a5366

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:

  https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC4/

Maven artifacts are available in a staging repository at:

  https://repository.apache.org/content/repositories/orgapachehbase-1542/

Maven artifacts for hadoop3 are available in a staging repository at:

  https://repository.apache.org/content/repositories/orgapachehbase-1543/

Artifacts were signed with the 0x74EFF462 key which can be found in:

  https://downloads.apache.org/hbase/KEYS

To learn more about Apache hbase, please see

  http://hbase.apache.org/

Thanks,
Your HBase Release Manager


Re: [DISCUSS] Dropping Java 8 support in HBase 3

2024-04-29 Thread Bryan Beaudreault
I am a huge +1 for dropping java8.

One reason I would suggest going to 17 is that it seems so hard to change
these things given our long development cycle on major releases. There are
some nice language features in 17, but more importantly, the initial
release of java11 was 6 years ago and java17 was released 3 years ago.
Java21 is already released as well. So I could see java17 being widely
available enough that we could jump "in the middle" rather than to the
oldest LTS.

I will say that we're already running java 21 on all of our hbase/hadoop in
prod (70 clusters, 7k regionservers). I know not every organization can be
that aggressive, and I wouldn't suggest jumping to 21 in the codebase. Just
pointing it out in terms of basic support already existing and being stable.

On Mon, Apr 29, 2024 at 2:33 PM Andrew Purtell 
wrote:

> I also agree that mitigation of security problems in dependencies will be
> increasingly difficult, as we cannot expect our dependencies to continue to
> support Java 8. They might, but as time goes on it is less likely.
>
> A minimum of Java 11 makes a lot of sense. This is where the center of
> gravity of the Java ecosystem is, probably.
>
> A minimum of 17 is aggressive and I don’t see the point unless there is a
> feature in 17 that we would like to base an improvement on.
>
> > On Apr 29, 2024, at 1:23 PM, chrajeshbab...@gmail.com wrote:
> >
> > Hi!
> >
> > With 3.0 on the horizon, we could look into bumping the minimum required
> > Java version for HBase.
> >
> > The last discussion I could find was four years ago, when dropping Java 8
> > support was rejected.
> >
> > https://lists.apache.org/thread/ph8xry0x37cvjj89fp2jk1k48yb7gs46
> >
> > Now it's four years later, and the end of OpenJDK support for Java 8 and
> > 11 is much closer.
> > (Oracle public support is so short that I consider that irrelevant)
> >
> > Some critical dependencies (like Jetty) have ended even regular security
> > support for Java 8.
> >
> > By supporting Java 8 we are also limiting ourselves to using an already
> 10
> > year old Java release, ignoring any developments in the language.
> >
> > My take is that with the current dogmatic emphasis on CVE mitigation the
> > benefits of bumping the required JDK version outweigh the drawbacks even
> > for the legacy install base, especially as it's getting harder and harder
> > to be CVE free with Java 8.
> >
> > Furthermore, with RedHat dropping JDK11 support this year, I think we
> could
> > also consider bumping the minimum requirement straight to JDK 17.
> >
> > Hadoop is still on Java 8, but previously it has dropped Java 7 support
> in
> > a patch release, and I wouldn't be surprised if it dropped Java 8 in a
> > similar manner, so I would not put too much stock in that.
> >
> > What do you think ?
> >
> > Thanks,
> > Rajeshbabu.
>


Re: [VOTE] The second release candidate for 2.6.0 (RC3) is available

2024-04-29 Thread Bryan Beaudreault
Nightly build failures look real; the failing test is
TestNamespaceAuditor.testRegionMerge.

On Sun, Apr 28, 2024 at 9:10 PM 张铎(Duo Zhang)  wrote:

> HBASE-28554 has been merged and the flaky dashboard is back to normal
> for branch-2.6.
>
> There are still other problems for branch-2 but at least they are not
> blockers for branch-2.6.
>
> Please go ahead and cut RC4.
>
> Thanks.
>
> > Bryan Beaudreault wrote on Sun, Apr 28, 2024 at 22:47:
> >
> > I'm going to kick off the next RC once that is merged back to branch-2.6
> >
> > On Sun, Apr 28, 2024 at 9:44 AM 张铎(Duo Zhang) 
> wrote:
> >
> > > We need to get HBASE-28554 in before cutting the new RC.
> > >
> > > The PR is ready.
> > >
> > > https://github.com/apache/hbase/pull/5859
> > >
> > > PTAL.
> > >
> > > Thanks.
> > >
> > > 张铎(Duo Zhang) wrote on Thu, Apr 25, 2024 at 22:05:
> > > >
> > > > Thanks Istvan, I've pushed the addendum to all active branches.
> > > >
> > > > Bryan, sadly but need to put up a new RC...
> > > >
> > > > Istvan Toth wrote on Thu, Apr 25, 2024 at 12:51:
> > > > >
> > > > > I can merge https://github.com/apache/hbase/pull/5852 as soon as I
> > > get a
> > > > > review on it for the above issue.
> > > > >
> > > > > best regards
> > > > > Istvan
> > > > >
> > > > >
> > > > > On Thu, Apr 25, 2024 at 4:14 AM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> > > wrote:
> > > > >
> > > > > > HBASE-25818 introduced a breaking change, it removed the
> SCAN_FILTER
> > > > > > field, and introduced two new fields in
> > > > > > org.apache.hadoop.hbase.rest.Constants.
> > > > > >
> > > > > > But unfortunately, org.apache.hadoop.hbase.rest.Constants is
> > > IA.Public
> > > > > > so we can not remove its field without a deprecation cycle...
> > > > > >
> > > > > > > Bryan Beaudreault wrote on Thu, Apr 25, 2024 at 09:21:
> > > > > > >
> > > > > > > Please vote on this Apache hbase release candidate,
> > > > > > > hbase-2.6.0RC3
> > > > > > >
> > > > > > > The VOTE will remain open for at least 72 hours.
> > > > > > >
> > > > > > > [ ] +1 Release this package as Apache hbase 2.6.0
> > > > > > > [ ] -1 Do not release this package because ...
> > > > > > >
> > > > > > > The tag to be voted on is 2.6.0RC3:
> > > > > > >
> > > > > > >   https://github.com/apache/hbase/tree/2.6.0RC3
> > > > > > >
> > > > > > > This tag currently points to git reference
> > > > > > >
> > > > > > >   df3343989d02966752ce7562546619f86a36169a
> > > > > > >
> > > > > > > The release files, including signatures, digests, as well as
> > > CHANGES.md
> > > > > > > and RELEASENOTES.md included in this RC can be found at:
> > > > > > >
> > > > > > >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC3/
> > > > > > >
> > > > > > > Maven artifacts are available in a staging repository at:
> > > > > > >
> > > > > > >
> > > > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1540/
> > > > > > >
> > > > > > > Maven artifacts for hadoop3 are available in a staging
> repository
> > > at:
> > > > > > >
> > > > > > >
> > > > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1541/
> > > > > > >
> > > > > > > Artifacts were signed with the 0x74EFF462 key which can be
> found
> > > in:
> > > > > > >
> > > > > > >   https://downloads.apache.org/hbase/KEYS
> > > > > > >
> > > > > > > To learn more about Apache hbase, please see
> > > > > > >
> > > > > > >   http://hbase.apache.org/
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Your HBase Release Manager
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *István Tóth* | Sr. Staff Software Engineer
> > > > > *Email*: st...@cloudera.com
> > > > > cloudera.com <https://www.cloudera.com>
> > > > > [image: Cloudera] <https://www.cloudera.com/>
> > > > > [image: Cloudera on Twitter] <https://twitter.com/cloudera>
> [image:
> > > > > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > > Cloudera
> > > > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > > > --
> > > > > --
> > >
>


[jira] [Resolved] (HBASE-28482) Reverse scan with tags throws ArrayIndexOutOfBoundsException with DBE

2024-04-28 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28482.
---
Fix Version/s: 2.6.0
   2.4.18
   3.0.0-beta-2
   2.5.9
   Resolution: Fixed

Pushed to all active branches. Thanks for the follow-up fix here [~vineet.4008]!

> Reverse scan with tags throws ArrayIndexOutOfBoundsException with DBE
> -
>
> Key: HBASE-28482
> URL: https://issues.apache.org/jira/browse/HBASE-28482
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Reporter: Vineet Kumar Maheshwari
>Assignee: Vineet Kumar Maheshwari
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> Facing an ArrayIndexOutOfBoundsException when performing a reverse scan on a 
> table with 30K+ records in a single hfile.
> The exception happens when the block changes during a seekBefore call.
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>     at 
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1326)
>     at org.apache.hadoop.hbase.nio.SingleByteBuff.get(SingleByteBuff.java:213)
>     at 
> org.apache.hadoop.hbase.io.encoding.DiffKeyDeltaEncoder$DiffSeekerStateBufferedEncodedSeeker.decode(DiffKeyDeltaEncoder.java:431)
>     at 
> org.apache.hadoop.hbase.io.encoding.DiffKeyDeltaEncoder$DiffSeekerStateBufferedEncodedSeeker.decodeNext(DiffKeyDeltaEncoder.java:502)
>     at 
> org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.seekToKeyInBlock(BufferedDataBlockEncoder.java:1012)
>     at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$EncodedScanner.loadBlockAndSeekToKey(HFileReaderImpl.java:1605)
>     at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekBefore(HFileReaderImpl.java:719)
>     at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekBeforeAndSaveKeyToPreviousRow(StoreFileScanner.java:645)
>     at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRowWithoutHint(StoreFileScanner.java:570)
>     at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:506)
>     at 
> org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.next(ReversedKeyValueHeap.java:126)
>     at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:693)
>     at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:151){code}
>  
> Steps to reproduce:
> Create a table with DataBlockEncoding.DIFF and a block size of 1024, write some 
> 30K+ puts with setTTL, then do a reverse scan.
> {code:java}
> @Test
> public void testReverseScanWithDBEWhenCurrentBlockUpdates() throws IOException {
>   byte[] family = Bytes.toBytes("0");
>   Configuration conf = new Configuration(TEST_UTIL.getConfiguration());
>   conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 1);
>   try (Connection connection = ConnectionFactory.createConnection(conf)) {
>     testReverseScanWithDBE(connection, DataBlockEncoding.DIFF, family, 1024, 3);
>     for (DataBlockEncoding encoding : DataBlockEncoding.values()) {
>       testReverseScanWithDBE(connection, encoding, family, 1024, 3);
>     }
>   }
> }
>
> private void testReverseScanWithDBE(Connection conn, DataBlockEncoding encoding,
>     byte[] family, int blockSize, int maxRows) throws IOException {
>   LOG.info("Running test with DBE={}", encoding);
>   TableName tableName = TableName.valueOf(TEST_NAME.getMethodName() + "-" + encoding);
>   TEST_UTIL.createTable(TableDescriptorBuilder.newBuilder(tableName)
>     .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(family)
>       .setDataBlockEncoding(encoding).setBlocksize(blockSize).build())
>     .build(), null);
>   Table table = conn.getTable(tableName);
>   byte[] val1 = new byte[10];
>   byte[] val2 = new byte[10];
>   Bytes.random(val1);
>   Bytes.random(val2);
>   for (int i = 0; i < maxRows; i++) {
>     table.put(new Put(Bytes.toBytes(i)).addColumn(family, Bytes.toBytes(1), val1)
>       .addColumn(family, Bytes.toBytes(2), val2).setTTL(600_000));
>   }
>   TEST_UTIL.flush(table.getName());
>   Scan scan = new Scan();
>   scan.setReversed(true);
>   try (ResultScanner scanner = table.getScanner(scan)) {
>     for (int i = maxRows - 1; i >= 0; i--) {
>       Result row = scanner.next();
>       assertEquals(2, row.size());
>       Cell cell1 = row.getColumnLatestCell(family, Bytes.toBytes(1));
>       assertTrue(CellU

Re: [VOTE] The second release candidate for 2.6.0 (RC3) is available

2024-04-28 Thread Bryan Beaudreault
I'm going to kick off the next RC once that is merged back to branch-2.6

On Sun, Apr 28, 2024 at 9:44 AM 张铎(Duo Zhang)  wrote:

> We need to get HBASE-28554 in before cutting the new RC.
>
> The PR is ready.
>
> https://github.com/apache/hbase/pull/5859
>
> PTAL.
>
> Thanks.
>
> 张铎(Duo Zhang)  于2024年4月25日周四 22:05写道:
> >
> > Thanks Istvan, I've pushed the addendum to all active branches.
> >
> > Bryan, sadly but need to put up a new RC...
> >
> > Istvan Toth  于2024年4月25日周四 12:51写道:
> > >
> > > I can merge https://github.com/apache/hbase/pull/5852 as soon as I
> get a
> > > review on it for the above issue.
> > >
> > > best regards
> > > Istvan
> > >
> > >
> > > On Thu, Apr 25, 2024 at 4:14 AM 张铎(Duo Zhang) 
> wrote:
> > >
> > > > HBASE-25818 introduced a breaking change, it removed the SCAN_FILTER
> > > > field, and introduced two new fields in
> > > > org.apache.hadoop.hbase.rest.Constants.
> > > >
> > > > But unfortunately, org.apache.hadoop.hbase.rest.Constants is
> IA.Public
> > > > so we can not remove its field without a deprecation cycle...
> > > >
> > > > Bryan Beaudreault  于2024年4月25日周四 09:21写道:
> > > > >
> > > > > Please vote on this Apache hbase release candidate,
> > > > > hbase-2.6.0RC3
> > > > >
> > > > > The VOTE will remain open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this package as Apache hbase 2.6.0
> > > > > [ ] -1 Do not release this package because ...
> > > > >
> > > > > The tag to be voted on is 2.6.0RC3:
> > > > >
> > > > >   https://github.com/apache/hbase/tree/2.6.0RC3
> > > > >
> > > > > This tag currently points to git reference
> > > > >
> > > > >   df3343989d02966752ce7562546619f86a36169a
> > > > >
> > > > > The release files, including signatures, digests, as well as
> CHANGES.md
> > > > > and RELEASENOTES.md included in this RC can be found at:
> > > > >
> > > > >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC3/
> > > > >
> > > > > Maven artifacts are available in a staging repository at:
> > > > >
> > > > >
> > > >
> https://repository.apache.org/content/repositories/orgapachehbase-1540/
> > > > >
> > > > > Maven artifacts for hadoop3 are available in a staging repository
> at:
> > > > >
> > > > >
> > > >
> https://repository.apache.org/content/repositories/orgapachehbase-1541/
> > > > >
> > > > > Artifacts were signed with the 0x74EFF462 key which can be found
> in:
> > > > >
> > > > >   https://downloads.apache.org/hbase/KEYS
> > > > >
> > > > > To learn more about Apache hbase, please see
> > > > >
> > > > >   http://hbase.apache.org/
> > > > >
> > > > > Thanks,
> > > > > Your HBase Release Manager
> > > >
> > >
> > >
> > > --
> > > *István Tóth* | Sr. Staff Software Engineer
> > > *Email*: st...@cloudera.com
> > > cloudera.com <https://www.cloudera.com>
> > > [image: Cloudera] <https://www.cloudera.com/>
> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > --
> > > --
>


[VOTE] The second release candidate for 2.6.0 (RC3) is available

2024-04-24 Thread Bryan Beaudreault
Please vote on this Apache hbase release candidate,
hbase-2.6.0RC3

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache hbase 2.6.0
[ ] -1 Do not release this package because ...

The tag to be voted on is 2.6.0RC3:

  https://github.com/apache/hbase/tree/2.6.0RC3

This tag currently points to git reference

  df3343989d02966752ce7562546619f86a36169a

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:

  https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC3/

Maven artifacts are available in a staging repository at:

  https://repository.apache.org/content/repositories/orgapachehbase-1540/

Maven artifacts for hadoop3 are available in a staging repository at:

  https://repository.apache.org/content/repositories/orgapachehbase-1541/

Artifacts were signed with the 0x74EFF462 key which can be found in:

  https://downloads.apache.org/hbase/KEYS

To learn more about Apache hbase, please see

  http://hbase.apache.org/

Thanks,
Your HBase Release Manager


[jira] [Resolved] (HBASE-28255) Correcting spelling errors or annotations with non-standard spelling

2024-04-23 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28255.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   2.5.8
   Resolution: Fixed

Looks like this one was never resolved. I added what I think are the correct 
fixVersions.

> Correcting spelling errors or annotations with non-standard spelling
> 
>
> Key: HBASE-28255
> URL: https://issues.apache.org/jira/browse/HBASE-28255
> Project: HBase
>  Issue Type: Improvement
>Reporter: mazhengxuan
>Priority: Minor
>  Labels: documentation
> Fix For: 2.6.0, 3.0.0-beta-2, 2.5.8
>
>
> Fix some spelling errors and comments with non-standard spelling, as pointed 
> out by Typo.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Looking to create next RC for 2.6.0 this week

2024-04-22 Thread Bryan Beaudreault
Please let me know if you have any blockers


[jira] [Created] (HBASE-28538) BackupHFileCleaner.loadHFileRefs is very expensive

2024-04-19 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28538:
-

 Summary: BackupHFileCleaner.loadHFileRefs is very expensive
 Key: HBASE-28538
 URL: https://issues.apache.org/jira/browse/HBASE-28538
 Project: HBase
  Issue Type: Bug
  Components: backuprestore
Reporter: Bryan Beaudreault


I noticed some odd CPU spikes on the hmasters of one of our clusters. It turns 
out the cluster had been getting lots of bulkloads (30k), and processing them was 
expensive. The method scans hbase and then parses the paths. Surprisingly, the 
parsing is more expensive than reading hbase, with the vast majority of time 
spent in org/apache/hadoop/fs/Path..

We should see if this can be optimized. Attaching profile.
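
A minimal sketch of one possible optimization (the helper below is hypothetical, 
not the existing BackupHFileCleaner code): extract file names with plain string 
operations and compare them in a HashSet, instead of constructing an 
org.apache.hadoop.fs.Path per reference.
{code:java}
import java.util.HashSet;
import java.util.Set;

public final class HFileRefNames {
  private HFileRefNames() {
  }

  /** Extracts the last path component without constructing a Path object. */
  static String fileName(String fullPath) {
    int idx = fullPath.lastIndexOf('/');
    return idx < 0 ? fullPath : fullPath.substring(idx + 1);
  }

  /** Builds the set of referenced file names once, for O(1) lookups later. */
  static Set<String> toNameSet(Iterable<String> refPaths) {
    Set<String> names = new HashSet<>();
    for (String ref : refPaths) {
      names.add(fileName(ref));
    }
    return names;
  }
}
{code}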



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28183) It's impossible to re-enable the quota table if it gets disabled

2024-04-07 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28183.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   2.5.9
   Resolution: Fixed

Pushed to branch-2.5+. Thanks for the contribution [~chandrasekhar.k]!

> It's impossible to re-enable the quota table if it gets disabled
> 
>
> Key: HBASE-28183
> URL: https://issues.apache.org/jira/browse/HBASE-28183
> Project: HBase
>  Issue Type: Bug
>    Reporter: Bryan Beaudreault
>Assignee: Chandra Sekhar K
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2, 2.5.9
>
>
> HMaster.enableTable tries to read the quota table. If you disable the quota 
> table, this fails. So then it's impossible to re-enable it. The only solution 
> I can find is to delete the table at this point, so that it gets recreated at 
> startup, but this results in losing any quotas you had defined.  We should 
> fix enableTable to not check quotas if the table in question is hbase:quota.
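
For reference, a minimal sketch of the guard the fix implies (class and method 
names here are hypothetical, not the actual HMaster code):
{code:java}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.quotas.QuotaTableUtil;

final class EnableTableQuotaGuard {
  private EnableTableQuotaGuard() {
  }

  /**
   * Skip the quota lookup when enabling hbase:quota itself, since reading
   * quotas would require the very table we are trying to enable.
   */
  static boolean shouldCheckQuota(TableName tableName) {
    return !QuotaTableUtil.QUOTA_TABLE_NAME.equals(tableName);
  }
}
{code}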



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28483) Merge of incremental backups fails on bulkloaded Hfiles

2024-04-06 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28483.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   Resolution: Fixed

Pushed to branch-2.6+. Thanks for the report and fix [~thomas.sarens]!

> Merge of incremental backups fails on bulkloaded Hfiles
> ---
>
> Key: HBASE-28483
> URL: https://issues.apache.org/jira/browse/HBASE-28483
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Affects Versions: 2.6.0, 4.0.0-alpha-1
>Reporter: thomassarens
>Assignee: thomassarens
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
> Attachments: TestIncrementalBackupMergeWithBulkLoad.java
>
>
> The merge of incremental backups fails in case one of the backups contains a 
> bulk-loaded HFile and the others don't. See the test in the attachments 
> based on
> {code:java}
> org/apache/hadoop/hbase/backup/TestBackupRestoreWithModifications.java{code}
> that reproduces the exception when useBulkLoad is set to true 
> [^TestIncrementalBackupMergeWithBulkLoad.java].
> This exception occurs in the call to `HFileRecordReader#initialize` as it 
> tries to read a directory path as an HFile. I'll see if I can create a patch 
> on master to fix this.
> {code:java}
> 2024-04-04T14:55:15,462 INFO  LocalJobRunner Map Task Executor #0 {} 
> mapreduce.HFileInputFormat$HFileRecordReader(95): Initialize 
> HFileRecordReader for 
> hdfs://localhost:34093/user/thomass/backupIT/backup_1712235269368/default/table-true/eaeb223066c24d3e77a2ee6987e30cb3/0
> 2024-04-04T14:55:15,482 WARN  [Thread-1429 {}] 
> mapred.LocalJobRunner$Job(590): job_local1854345815_0018
> java.lang.Exception: java.io.FileNotFoundException: Path is not a file: 
> /user/thomass/backupIT/backup_1712235269368/default/table-true/eaeb223066c24d3e77a2ee6987e30cb3/0
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:90)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2124)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:769)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:460)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1213)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1089)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1012)
> at java.base/java.security.AccessController.doPrivileged(Native Method)
> at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3026)
>  
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492) 
> ~[hadoop-mapreduce-client-common-3.3.5.jar:?]
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552) 
> ~[hadoop-mapreduce-client-common-3.3.5.jar:?]
> Caused by: java.io.FileNotFoundException: Path is not a file: 
> /user/thomass/backupIT/backup_1712235269368/default/table-true/eaeb223066c24d3e77a2ee6987e30cb3/0
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:90)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2124)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:769)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServe

[jira] [Resolved] (HBASE-28460) Full backup restore fails for empty HFiles

2024-04-02 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28460.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
 Assignee: Dieter De Paepe
   Resolution: Fixed

Thanks for the contribution [~dieterdp_ng]! Pushed to branch-2.6+

> Full backup restore fails for empty HFiles
> --
>
> Key: HBASE-28460
> URL: https://issues.apache.org/jira/browse/HBASE-28460
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Affects Versions: 2.6.0, 4.0.0-alpha-1
>Reporter: Dieter De Paepe
>Assignee: Dieter De Paepe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> A full backup restore fails if the backup contains an empty HFile, for 
> example when all data has been deleted from a table and full compaction has 
> run. There are several issues:
>  * HFiles are read in `RestoreTool` to determine the first/last key, but this 
> fails for empty HFiles
>  * In `RestoreTool`, table creation also incorrectly assumes the region 
> contains keys
>  * In `MapReduceRestoreJob`, the tool incorrectly assumes that a bulkload 
> with no loaded entries is an error.
> Example stacktrace:
> {code:java}
> 24/03/21 18:38:09 ERROR org.apache.hadoop.hbase.backup.util.BackupUtils: 
> java.util.NoSuchElementException: No value present
> java.util.NoSuchElementException: No value present
>   at java.base/java.util.Optional.get(Optional.java:143)
>   at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.generateBoundaryKeys(RestoreTool.java:440)
>   at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.checkAndCreateTable(RestoreTool.java:493)
>   at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.createAndRestoreTable(RestoreTool.java:351)
>   at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.fullRestoreTable(RestoreTool.java:211)
>   at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restoreImages(RestoreTablesClient.java:151)
>   at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restore(RestoreTablesClient.java:229)
>   at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.execute(RestoreTablesClient.java:265)
>   at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.restore(BackupAdminImpl.java:518)
>   at 
> org.apache.hadoop.hbase.backup.RestoreDriver.parseAndRun(RestoreDriver.java:176)
>   at 
> org.apache.hadoop.hbase.backup.RestoreDriver.doWork(RestoreDriver.java:216)
>   at 
> org.apache.hadoop.hbase.backup.RestoreDriver.run(RestoreDriver.java:252)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>   at 
> org.apache.hadoop.hbase.backup.RestoreDriver.main(RestoreDriver.java:224)
> 24/03/21 18:38:09 ERROR org.apache.hadoop.hbase.backup.RestoreDriver: Error 
> while running restore backup
> java.lang.IllegalStateException: Cannot restore hbase table
>   at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.createAndRestoreTable(RestoreTool.java:360)
>   at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.fullRestoreTable(RestoreTool.java:211)
>   at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restoreImages(RestoreTablesClient.java:151)
>   at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restore(RestoreTablesClient.java:229)
>   at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.execute(RestoreTablesClient.java:265)
>   at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.restore(BackupAdminImpl.java:518)
>   at 
> org.apache.hadoop.hbase.backup.RestoreDriver.parseAndRun(RestoreDriver.java:176)
>   at 
> org.apache.hadoop.hbase.backup.RestoreDriver.doWork(RestoreDriver.java:216)
>   at 
> org.apache.hadoop.hbase.backup.RestoreDriver.run(RestoreDriver.java:252)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>   at 
> org.apache.hadoop.hbase.backup.RestoreDriver.main(RestoreDriver.java:224)
> Caused by: java.util.NoSuchElementException: No value present
>   at java.base/java.util.Optional.get(Optional.java:143)
>   at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.generateBoundaryKeys(RestoreTool.java:440)
>   at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.checkAndCreateTable(RestoreTool.java:493)
>   at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.createAndRestoreTable(RestoreTool.java:351)
>   ... 10 more {code}
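
A minimal sketch of the shape of the first fix (names are illustrative, not the 
actual RestoreTool code): skip empty HFiles when collecting boundary keys 
instead of calling Optional.get() unconditionally.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;

final class BoundaryKeys {
  private BoundaryKeys() {
  }

  static List<byte[]> firstRowKeys(List<Optional<Cell>> firstKeyPerHFile) {
    List<byte[]> rows = new ArrayList<>();
    for (Optional<Cell> firstKey : firstKeyPerHFile) {
      // An empty HFile (all data deleted, then major compacted) has no first
      // key; skip it rather than fail the whole restore.
      firstKey.ifPresent(cell -> rows.add(CellUtil.cloneRow(cell)));
    }
    return rows;
  }
}
{code}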



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] The first release candidate for 2.6.0 (RC2) is available

2024-04-02 Thread Bryan Beaudreault
Oh, sorry! I missed that it was just merged. Thanks again

On Tue, Apr 2, 2024 at 7:06 AM Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> I've just got the approval from Duo and merged it. Am cherry-picking to
> lower branches now.
>
> Em ter., 2 de abr. de 2024 às 12:02, Bryan Beaudreault <
> bbeaudrea...@apache.org> escreveu:
>
> > Thanks Wellington. Was there a reason to only merge that to master?
> >
> > On Tue, Apr 2, 2024 at 5:23 AM Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> >
> > > Regarding TestBucketCachePersister flakeyness, I have noticed that last
> > > week and had submitted a fix in
> > > https://issues.apache.org/jira/browse/HBASE-28458.
> > >
> > > Em sex., 29 de mar. de 2024 às 15:41, Bryan Beaudreault <
> > > bbeaudrea...@apache.org> escreveu:
> > >
> > > > Thanks. I pushed the addendum fix to HBASE-27657. I will start
> another
> > RC
> > > > on Monday so that people have time to notice any other issues to
> > include.
> > > >
> > > > On Fri, Mar 29, 2024 at 11:24 AM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> > > > wrote:
> > > >
> > > > > The other parts of the RC are good.
> > > > >
> > > > > I ran all the UTs and also started a mini cluster to test basic
> shell
> > > > > commands, all good.
> > > > >
> > > > > Will vote a +1 after fixing the above compatibility issue.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Bryan Beaudreault  于2024年3月29日周五 23:10写道:
> > > > > >
> > > > > > Thanks for catching that. I looked at the compatibility report
> but
> > > > missed
> > > > > > that one. I will work on fixing.
> > > > > >
> > > > > > If anyone else wants to take a look for other reasons to sink the
> > > RC, I
> > > > > > will also tackle those before the next one.
> > > > > >
> > > > > > On Fri, Mar 29, 2024 at 11:07 AM 张铎(Duo Zhang) <
> > > palomino...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Sorry but -1 binding
> > > > > > >
> > > > > > > After checking the compatibility report, I found that we made a
> > > > > > > mistake when implementing HBASE-27657.
> > > > > > >
> > > > > > > The 'createConnection(Configuration conf, ExecutorService pool,
> > > final
> > > > > > > User user)' method was lost... We should keep it and make it
> call
> > > the
> > > > > > > newly introduced 'createConnection(Configuration conf,
> > > > ExecutorService
> > > > > > > pool, final User user, Map
> connectionAttributes)'
> > > > > > > method.
> > > > > > > This is an unnecessary incompatible change, we should apply an
> > > > > > > addendum to add this method back.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > 张铎(Duo Zhang)  于2024年3月29日周五 22:55写道:
> > > > > > > >
> > > > > > > > OK, I tried several more times, it passed...
> > > > > > > >
> > > > > > > > Bryan Beaudreault  于2024年3月29日周五
> > > 19:21写道:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > TestBucketCachePersister is passing for me locally. I've
> > tried
> > > it
> > > > > a few
> > > > > > > > > times and no issues
> > > > > > > > >
> > > > > > > > > On Thu, Mar 28, 2024 at 9:19 PM 张铎(Duo Zhang) <
> > > > > palomino...@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > TestBucketCachePersister always fail for me locally.
> > > > > > > > > >
> > > > > > > > > > Is this a known issue which is not very critical?
> > > > > > > > > >
> > > > > > > > > > Bryan Beaudreault 
> 于2024年3月28日周四
> > > > > 04:01写道:
> > > > > > > > > > >
> 

Re: [VOTE] The first release candidate for 2.6.0 (RC2) is available

2024-04-02 Thread Bryan Beaudreault
Thanks Wellington. Was there a reason to only merge that to master?

On Tue, Apr 2, 2024 at 5:23 AM Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> Regarding TestBucketCachePersister flakeyness, I have noticed that last
> week and had submitted a fix in
> https://issues.apache.org/jira/browse/HBASE-28458.
>
> Em sex., 29 de mar. de 2024 às 15:41, Bryan Beaudreault <
> bbeaudrea...@apache.org> escreveu:
>
> > Thanks. I pushed the addendum fix to HBASE-27657. I will start another RC
> > on Monday so that people have time to notice any other issues to include.
> >
> > On Fri, Mar 29, 2024 at 11:24 AM 张铎(Duo Zhang) 
> > wrote:
> >
> > > The other parts of the RC are good.
> > >
> > > I ran all the UTs and also started a mini cluster to test basic shell
> > > commands, all good.
> > >
> > > Will vote a +1 after fixing the above compatibility issue.
> > >
> > > Thanks.
> > >
> > > Bryan Beaudreault  于2024年3月29日周五 23:10写道:
> > > >
> > > > Thanks for catching that. I looked at the compatibility report but
> > missed
> > > > that one. I will work on fixing.
> > > >
> > > > If anyone else wants to take a look for other reasons to sink the
> RC, I
> > > > will also tackle those before the next one.
> > > >
> > > > On Fri, Mar 29, 2024 at 11:07 AM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> > > > wrote:
> > > >
> > > > > Sorry but -1 binding
> > > > >
> > > > > After checking the compatibility report, I found that we made a
> > > > > mistake when implementing HBASE-27657.
> > > > >
> > > > > The 'createConnection(Configuration conf, ExecutorService pool,
> final
> > > > > User user)' method was lost... We should keep it and make it call
> the
> > > > > newly introduced 'createConnection(Configuration conf,
> > ExecutorService
> > > > > pool, final User user, Map connectionAttributes)'
> > > > > method.
> > > > > This is an unnecessary incompatible change, we should apply an
> > > > > addendum to add this method back.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > 张铎(Duo Zhang)  于2024年3月29日周五 22:55写道:
> > > > > >
> > > > > > OK, I tried several more times, it passed...
> > > > > >
> > > > > > Bryan Beaudreault  于2024年3月29日周五
> 19:21写道:
> > > > > >
> > > > > > >
> > > > > > > TestBucketCachePersister is passing for me locally. I've tried
> it
> > > a few
> > > > > > > times and no issues
> > > > > > >
> > > > > > > On Thu, Mar 28, 2024 at 9:19 PM 张铎(Duo Zhang) <
> > > palomino...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > TestBucketCachePersister always fail for me locally.
> > > > > > > >
> > > > > > > > Is this a known issue which is not very critical?
> > > > > > > >
> > > > > > > > Bryan Beaudreault  于2024年3月28日周四
> > > 04:01写道:
> > > > > > > > >
> > > > > > > > > Please vote on this Apache hbase release candidate,
> > > > > > > > > hbase-2.6.0RC2
> > > > > > > > >
> > > > > > > > > The VOTE will remain open for at least 72 hours.
> > > > > > > > >
> > > > > > > > > [ ] +1 Release this package as Apache hbase 2.6.0
> > > > > > > > > [ ] -1 Do not release this package because ...
> > > > > > > > >
> > > > > > > > > The tag to be voted on is 2.6.0RC2:
> > > > > > > > >
> > > > > > > > >   https://github.com/apache/hbase/tree/2.6.0RC2
> > > > > > > > >
> > > > > > > > > This tag currently points to git reference
> > > > > > > > >
> > > > > > > > >   413bb6d733f2917ff94ce799306d8ab7c3132373
> > > > > > > > >
> > > > > > > > > The release files, including signatures, digests, as well
> as
> > > > > CHANGES.md
> > > > > > > > > and RELEASENOTES.md included in this RC can be found at:
> > > > > > > > >
> > > > > > > > >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC2/
> > > > > > > > >
> > > > > > > > > Maven artifacts are available in a staging repository at:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1537/
> > > > > > > > >
> > > > > > > > > Maven artifacts for hadoop3 are available in a staging
> > > repository
> > > > > at:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1538/
> > > > > > > > >
> > > > > > > > > Artifacts were signed with the 0x74EFF462 key which can be
> > > found
> > > > > in:
> > > > > > > > >
> > > > > > > > >   https://downloads.apache.org/hbase/KEYS
> > > > > > > > >
> > > > > > > > > To learn more about Apache hbase, please see
> > > > > > > > >
> > > > > > > > >   http://hbase.apache.org/
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Your HBase Release Manager
> > > > > > > >
> > > > >
> > >
> >
>


[DISCUSS] Skip shaded packages in compatibility report?

2024-03-29 Thread Bryan Beaudreault
In reviewing the compatibility report for 2.6.0, I missed a real issue. I
think part of the problem for me was the amount of noise in the report
related to changes in shaded zookeeper packages.

According to the usage report of japi-compliance-checker, we could provide
a "-skip-packages" argument which points at a file with package names to
exclude. I wonder if we should exclude org.apache.hadoop.hbase.shaded
and org.apache.hbase.thirdparty.

Thoughts? I can file a JIRA and get it done if there's consensus.
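
For illustration, usage could look something like this (the file name and jar 
arguments are hypothetical, just to show the shape of the option):

  $ cat skip-packages.txt
  org.apache.hadoop.hbase.shaded
  org.apache.hbase.thirdparty

  $ japi-compliance-checker -skip-packages skip-packages.txt \
      hbase-client-2.5.8.jar hbase-client-2.6.0.jar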


Re: [VOTE] The first release candidate for 2.6.0 (RC2) is available

2024-03-29 Thread Bryan Beaudreault
Thanks. I pushed the addendum fix to HBASE-27657. I will start another RC
on Monday so that people have time to notice any other issues to include.

On Fri, Mar 29, 2024 at 11:24 AM 张铎(Duo Zhang) 
wrote:

> The other parts of the RC are good.
>
> I ran all the UTs and also started a mini cluster to test basic shell
> commands, all good.
>
> Will vote a +1 after fixing the above compatibility issue.
>
> Thanks.
>
> Bryan Beaudreault  于2024年3月29日周五 23:10写道:
> >
> > Thanks for catching that. I looked at the compatibility report but missed
> > that one. I will work on fixing.
> >
> > If anyone else wants to take a look for other reasons to sink the RC, I
> > will also tackle those before the next one.
> >
> > On Fri, Mar 29, 2024 at 11:07 AM 张铎(Duo Zhang) 
> > wrote:
> >
> > > Sorry but -1 binding
> > >
> > > After checking the compatibility report, I found that we made a
> > > mistake when implementing HBASE-27657.
> > >
> > > The 'createConnection(Configuration conf, ExecutorService pool, final
> > > User user)' method was lost... We should keep it and make it call the
> > > newly introduced 'createConnection(Configuration conf, ExecutorService
> > > pool, final User user, Map connectionAttributes)'
> > > method.
> > > This is an unnecessary incompatible change, we should apply an
> > > addendum to add this method back.
> > >
> > > Thanks.
> > >
> > > 张铎(Duo Zhang)  于2024年3月29日周五 22:55写道:
> > > >
> > > > OK, I tried several more times, it passed...
> > > >
> > > > Bryan Beaudreault  于2024年3月29日周五 19:21写道:
> > > >
> > > > >
> > > > > TestBucketCachePersister is passing for me locally. I've tried it
> a few
> > > > > times and no issues
> > > > >
> > > > > On Thu, Mar 28, 2024 at 9:19 PM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> > > wrote:
> > > > >
> > > > > > TestBucketCachePersister always fail for me locally.
> > > > > >
> > > > > > Is this a known issue which is not very critical?
> > > > > >
> > > > > > Bryan Beaudreault  于2024年3月28日周四
> 04:01写道:
> > > > > > >
> > > > > > > Please vote on this Apache hbase release candidate,
> > > > > > > hbase-2.6.0RC2
> > > > > > >
> > > > > > > The VOTE will remain open for at least 72 hours.
> > > > > > >
> > > > > > > [ ] +1 Release this package as Apache hbase 2.6.0
> > > > > > > [ ] -1 Do not release this package because ...
> > > > > > >
> > > > > > > The tag to be voted on is 2.6.0RC2:
> > > > > > >
> > > > > > >   https://github.com/apache/hbase/tree/2.6.0RC2
> > > > > > >
> > > > > > > This tag currently points to git reference
> > > > > > >
> > > > > > >   413bb6d733f2917ff94ce799306d8ab7c3132373
> > > > > > >
> > > > > > > The release files, including signatures, digests, as well as
> > > CHANGES.md
> > > > > > > and RELEASENOTES.md included in this RC can be found at:
> > > > > > >
> > > > > > >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC2/
> > > > > > >
> > > > > > > Maven artifacts are available in a staging repository at:
> > > > > > >
> > > > > > >
> > > > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1537/
> > > > > > >
> > > > > > > Maven artifacts for hadoop3 are available in a staging
> repository
> > > at:
> > > > > > >
> > > > > > >
> > > > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1538/
> > > > > > >
> > > > > > > Artifacts were signed with the 0x74EFF462 key which can be
> found
> > > in:
> > > > > > >
> > > > > > >   https://downloads.apache.org/hbase/KEYS
> > > > > > >
> > > > > > > To learn more about Apache hbase, please see
> > > > > > >
> > > > > > >   http://hbase.apache.org/
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Your HBase Release Manager
> > > > > >
> > >
>


[jira] [Resolved] (HBASE-27657) Connection and Request Attributes

2024-03-29 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-27657.
---
Resolution: Fixed

Addendum committed to branch-2 and branch-2.6. The problem did not exist on 
master/branch-3.

> Connection and Request Attributes
> -
>
> Key: HBASE-27657
> URL: https://issues.apache.org/jira/browse/HBASE-27657
> Project: HBase
>  Issue Type: New Feature
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1
>
>
> Currently we have the ability to set Operation attributes, via 
> Get.setAttribute, etc. It would be useful to be able to set attributes at the 
> request and connection level.
> These levels can result in less duplication. For example, send some 
> attributes once per connection instead of for every one of the millions of 
> requests a connection might send. Or send once for the request, instead of 
> duplicating on every operation in a multi request.
> Additionally, the Connection and RequestHeader are more globally available on 
> the server side. Both can be accessed via RpcServer.getCurrentCall(), which 
> is useful in various integration points – coprocessors, custom queues, 
> quotas, slow log, etc. Operation attributes are harder to access because you 
> need to parse the raw Message into the appropriate type to get access to the 
> getter.
> I was thinking adding two new methods to Connection interface:
> - setAttribute (and getAttribute/getAttributes)
> - setRequestAttributeProvider
> Any Connection attributes would be set onto the ConnectionHeader during 
> initialization. The RequestAttributeProvider would be called when creating 
> each RequestHeader.
> An alternative to setRequestAttributeProvider would be to add this into 
> HBaseRpcController, which can already be customized via site configuration. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-27657) Connection and Request Attributes

2024-03-29 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault reopened HBASE-27657:
---
  Assignee: Bryan Beaudreault  (was: Ray Mattingly)

Reopening for addendum. We accidentally dropped the following method from 
ConnectionFactory:
{code:java}
ConnectionFactory.createConnection ( Configuration conf, ExecutorService pool, 
User user ) [static]  :  Connection {code}

> Connection and Request Attributes
> -
>
> Key: HBASE-27657
> URL: https://issues.apache.org/jira/browse/HBASE-27657
> Project: HBase
>  Issue Type: New Feature
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1
>
>
> Currently we have the ability to set Operation attributes, via 
> Get.setAttribute, etc. It would be useful to be able to set attributes at the 
> request and connection level.
> These levels can result in less duplication. For example, send some 
> attributes once per connection instead of for every one of the millions of 
> requests a connection might send. Or send once for the request, instead of 
> duplicating on every operation in a multi request.
> Additionally, the Connection and RequestHeader are more globally available on 
> the server side. Both can be accessed via RpcServer.getCurrentCall(), which 
> is useful in various integration points – coprocessors, custom queues, 
> quotas, slow log, etc. Operation attributes are harder to access because you 
> need to parse the raw Message into the appropriate type to get access to the 
> getter.
> I was thinking adding two new methods to Connection interface:
> - setAttribute (and getAttribute/getAttributes)
> - setRequestAttributeProvider
> Any Connection attributes would be set onto the ConnectionHeader during 
> initialization. The RequestAttributeProvider would be called when creating 
> each RequestHeader.
> An alternative to setRequestAttributeProvider would be to add this into 
> HBaseRpcController, which can already be customized via site configuration. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] The first release candidate for 2.6.0 (RC2) is available

2024-03-29 Thread Bryan Beaudreault
Thanks for catching that. I looked at the compatibility report but missed
that one. I will work on fixing.

If anyone else wants to take a look for other reasons to sink the RC, I
will also tackle those before the next one.

On Fri, Mar 29, 2024 at 11:07 AM 张铎(Duo Zhang) 
wrote:

> Sorry but -1 binding
>
> After checking the compatibility report, I found that we made a
> mistake when implementing HBASE-27657.
>
> The 'createConnection(Configuration conf, ExecutorService pool, final
> User user)' method was lost... We should keep it and make it call the
> newly introduced 'createConnection(Configuration conf, ExecutorService
> pool, final User user, Map connectionAttributes)'
> method.
> This is an unnecessary incompatible change, we should apply an
> addendum to add this method back.
>
> Thanks.
>
> 张铎(Duo Zhang)  于2024年3月29日周五 22:55写道:
> >
> > OK, I tried several more times, it passed...
> >
> > Bryan Beaudreault  于2024年3月29日周五 19:21写道:
> >
> > >
> > > TestBucketCachePersister is passing for me locally. I've tried it a few
> > > times and no issues
> > >
> > > On Thu, Mar 28, 2024 at 9:19 PM 张铎(Duo Zhang) 
> wrote:
> > >
> > > > TestBucketCachePersister always fail for me locally.
> > > >
> > > > Is this a known issue which is not very critical?
> > > >
> > > > Bryan Beaudreault  于2024年3月28日周四 04:01写道:
> > > > >
> > > > > Please vote on this Apache hbase release candidate,
> > > > > hbase-2.6.0RC2
> > > > >
> > > > > The VOTE will remain open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this package as Apache hbase 2.6.0
> > > > > [ ] -1 Do not release this package because ...
> > > > >
> > > > > The tag to be voted on is 2.6.0RC2:
> > > > >
> > > > >   https://github.com/apache/hbase/tree/2.6.0RC2
> > > > >
> > > > > This tag currently points to git reference
> > > > >
> > > > >   413bb6d733f2917ff94ce799306d8ab7c3132373
> > > > >
> > > > > The release files, including signatures, digests, as well as
> CHANGES.md
> > > > > and RELEASENOTES.md included in this RC can be found at:
> > > > >
> > > > >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC2/
> > > > >
> > > > > Maven artifacts are available in a staging repository at:
> > > > >
> > > > >
> > > >
> https://repository.apache.org/content/repositories/orgapachehbase-1537/
> > > > >
> > > > > Maven artifacts for hadoop3 are available in a staging repository
> at:
> > > > >
> > > > >
> > > >
> https://repository.apache.org/content/repositories/orgapachehbase-1538/
> > > > >
> > > > > Artifacts were signed with the 0x74EFF462 key which can be found
> in:
> > > > >
> > > > >   https://downloads.apache.org/hbase/KEYS
> > > > >
> > > > > To learn more about Apache hbase, please see
> > > > >
> > > > >   http://hbase.apache.org/
> > > > >
> > > > > Thanks,
> > > > > Your HBase Release Manager
> > > >
>


Re: [VOTE] The first release candidate for 2.6.0 (RC2) is available

2024-03-29 Thread Bryan Beaudreault
Given the flakiness, seems like something we should fix, but likely not a
blocker here. Can you file a jira with the error you were seeing? I still
haven't seen it fail

On Fri, Mar 29, 2024 at 10:56 AM 张铎(Duo Zhang) 
wrote:

> OK, I tried several more times, it passed...
>
> Bryan Beaudreault  于2024年3月29日周五 19:21写道:
>
> >
> > TestBucketCachePersister is passing for me locally. I've tried it a few
> > times and no issues
> >
> > On Thu, Mar 28, 2024 at 9:19 PM 张铎(Duo Zhang) 
> wrote:
> >
> > > TestBucketCachePersister always fail for me locally.
> > >
> > > Is this a known issue which is not very critical?
> > >
> > > Bryan Beaudreault  于2024年3月28日周四 04:01写道:
> > > >
> > > > Please vote on this Apache hbase release candidate,
> > > > hbase-2.6.0RC2
> > > >
> > > > The VOTE will remain open for at least 72 hours.
> > > >
> > > > [ ] +1 Release this package as Apache hbase 2.6.0
> > > > [ ] -1 Do not release this package because ...
> > > >
> > > > The tag to be voted on is 2.6.0RC2:
> > > >
> > > >   https://github.com/apache/hbase/tree/2.6.0RC2
> > > >
> > > > This tag currently points to git reference
> > > >
> > > >   413bb6d733f2917ff94ce799306d8ab7c3132373
> > > >
> > > > The release files, including signatures, digests, as well as
> CHANGES.md
> > > > and RELEASENOTES.md included in this RC can be found at:
> > > >
> > > >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC2/
> > > >
> > > > Maven artifacts are available in a staging repository at:
> > > >
> > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1537/
> > > >
> > > > Maven artifacts for hadoop3 are available in a staging repository at:
> > > >
> > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1538/
> > > >
> > > > Artifacts were signed with the 0x74EFF462 key which can be found in:
> > > >
> > > >   https://downloads.apache.org/hbase/KEYS
> > > >
> > > > To learn more about Apache hbase, please see
> > > >
> > > >   http://hbase.apache.org/
> > > >
> > > > Thanks,
> > > > Your HBase Release Manager
> > >
>


Re: [VOTE] The first release candidate for 2.6.0 (RC2) is available

2024-03-29 Thread Bryan Beaudreault
TestBucketCachePersister is passing for me locally. I've tried it a few
times and no issues

On Thu, Mar 28, 2024 at 9:19 PM 张铎(Duo Zhang)  wrote:

> TestBucketCachePersister always fail for me locally.
>
> Is this a known issue which is not very critical?
>
> Bryan Beaudreault  于2024年3月28日周四 04:01写道:
> >
> > Please vote on this Apache hbase release candidate,
> > hbase-2.6.0RC2
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase 2.6.0
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 2.6.0RC2:
> >
> >   https://github.com/apache/hbase/tree/2.6.0RC2
> >
> > This tag currently points to git reference
> >
> >   413bb6d733f2917ff94ce799306d8ab7c3132373
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC2/
> >
> > Maven artifacts are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1537/
> >
> > Maven artifacts for hadoop3 are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1538/
> >
> > Artifacts were signed with the 0x74EFF462 key which can be found in:
> >
> >   https://downloads.apache.org/hbase/KEYS
> >
> > To learn more about Apache hbase, please see
> >
> >   http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
>


[jira] [Created] (HBASE-28462) Incremental backup can fail if log gets archived while WALPlayer is starting up

2024-03-27 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28462:
-

 Summary: Incremental backup can fail if log gets archived while 
WALPlayer is starting up
 Key: HBASE-28462
 URL: https://issues.apache.org/jira/browse/HBASE-28462
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


We had an incremental backup fail with a FileNotFoundException for a file in the 
WALs directory. Upon investigation, the log had been archived a few minutes 
earlier. WALInputFormat's record reader has support for falling back on an 
archived path:
{code:java}
} catch (IOException e) {
  Path archivedLog = AbstractFSWALProvider.findArchivedLog(logFile, conf);
  // archivedLog can be null if unable to locate in archiveDir.
  if (archivedLog != null) {
openReader(archivedLog);
// Try call again in recursion
return nextKeyValue();
  } else {
throw e;
  }
} {code}
But the getSplits method has different handling:
{code:java}
try {
  List files = getFiles(fs, inputPath, startTime, endTime);
  allFiles.addAll(files);
} catch (FileNotFoundException e) {
  if (ignoreMissing) {
LOG.warn("File " + inputPath + " is missing. Skipping it.");
continue;
  }
  throw e;
} {code}
This ignoreMissing variable was added in HBASE-14141 and is enabled via 
wal.input.ignore.missing.files, which defaults to false and is never set. 
Looking at the comment and reviewboard history of HBASE-14141, I think there 
might have been some confusion about where to handle these missing files, and 
this got lost in the shuffle.
 
I would prefer not to ignore missing WALs. I think that could result in some 
weird behavior:
 * A RegionServer has 10 archived and 30 not-yet-archived WALs needing to be 
backed up
 * The process starts, and while it's running 1 of those 30 WALs gets archived. 
That one would get skipped due to the FileNotFoundException
 * But the remaining 29 would be backed up

This scenario could cause data consistency issues if this incremental backup is 
restored: we would have missed edits that fall in the middle of the edits 
applied from the other WALs.

So I do think failing as we do today is necessary for consistency, but 
unrealistic in a live cluster. The solution is to try finding the missing file 
in the archive directory. Backups have a coprocessor which will not allow an 
archived file to be cleaned up until it has been backed up, so I think it's safe 
to say that a WAL is definitely in either WALs or oldWALs.
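
A minimal sketch of that direction (illustrative only, not the committed fix), 
reusing getFiles and AbstractFSWALProvider.findArchivedLog from the snippets 
above:
{code:java}
try {
  List<FileStatus> files = getFiles(fs, logFile, startTime, endTime);
  allFiles.addAll(files);
} catch (FileNotFoundException e) {
  // The WAL may have been archived between listing and reading; since the
  // backup coprocessor pins archived WALs until they are backed up, check
  // oldWALs before failing.
  Path archivedLog = AbstractFSWALProvider.findArchivedLog(logFile, conf);
  if (archivedLog == null) {
    throw e; // genuinely gone: fail for consistency, as argued above
  }
  allFiles.addAll(getFiles(fs, archivedLog, startTime, endTime));
}
{code}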



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] The first release candidate for 2.6.0 (RC2) is available

2024-03-27 Thread Bryan Beaudreault
Please vote on this Apache hbase release candidate,
hbase-2.6.0RC2

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache hbase 2.6.0
[ ] -1 Do not release this package because ...

The tag to be voted on is 2.6.0RC2:

  https://github.com/apache/hbase/tree/2.6.0RC2

This tag currently points to git reference

  413bb6d733f2917ff94ce799306d8ab7c3132373

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:

  https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC2/

Maven artifacts are available in a staging repository at:

  https://repository.apache.org/content/repositories/orgapachehbase-1537/

Maven artifacts for hadoop3 are available in a staging repository at:

  https://repository.apache.org/content/repositories/orgapachehbase-1538/

Artifacts were signed with the 0x74EFF462 key which can be found in:

  https://downloads.apache.org/hbase/KEYS

To learn more about Apache hbase, please see

  http://hbase.apache.org/

Thanks,
Your HBase Release Manager


[jira] [Created] (HBASE-28459) HFileOutputFormat2 ClassCastException with s3 magic committer

2024-03-27 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28459:
-

 Summary: HFileOutputFormat2 ClassCastException with s3 magic 
committer
 Key: HBASE-28459
 URL: https://issues.apache.org/jira/browse/HBASE-28459
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


In hadoop3 there's the s3 magic committer which can speed up s3 writes 
dramatically. In HFileOutputFormat2.createRecordWriter we cast the passed-in 
committer to a FileOutputCommitter. This causes a ClassCastException when the 
s3 magic committer is enabled:
Error: java.lang.ClassCastException: class 
org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter cannot be cast to 
class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
We can cast to PathOutputCommitter instead, but it's only available in hadoop3+. 
So we will need to use reflection to work around this in branch-2.
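
A minimal sketch of the branch-2 workaround (hypothetical helper, not the 
committed fix): both FileOutputCommitter and PathOutputCommitter expose 
getWorkPath(), so a reflective call avoids the hard cast to either type.
{code:java}
import java.lang.reflect.Method;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.OutputCommitter;

final class CommitterWorkPath {
  private CommitterWorkPath() {
  }

  static Path getWorkPath(OutputCommitter committer) {
    try {
      // Works for FileOutputCommitter (hadoop2) and any PathOutputCommitter
      // subclass (hadoop3), including the s3 magic committer.
      Method m = committer.getClass().getMethod("getWorkPath");
      return (Path) m.invoke(committer);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(
        "Committer " + committer.getClass() + " has no getWorkPath()", e);
    }
  }
}
{code}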



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28412) Restoring incremental backups to mapped table requires existence of original table

2024-03-26 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28412.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   Resolution: Fixed

Pushed to branch-2.6+. Thanks [~rubenvw] for the contribution!

I also added you and [~dieterdp_ng] as contributors to the project so that you 
can be assigned jiras.

> Restoring incremental backups to mapped table requires existence of original 
> table
> --
>
> Key: HBASE-28412
> URL: https://issues.apache.org/jira/browse/HBASE-28412
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Reporter: Dieter De Paepe
>Assignee: Ruben Van Wanzeele
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> It appears that restoring a non-existing table from an incremental backup 
> with the "-m" parameter results in an error in the restore client.
> Reproduction steps:
> Build & start hbase:
> {code:java}
> mvn clean install -Phadoop-3.0 -DskipTests
> bin/start-hbase.sh{code}
> In HBase shell: create table and some values:
> {code:java}
> create 'test', 'cf'
> put 'test', 'row1', 'cf:a', 'value1'
> put 'test', 'row2', 'cf:b', 'value2'
> put 'test', 'row3', 'cf:c', 'value3'
> scan 'test' {code}
> Create a full backup:
> {code:java}
> bin/hbase backup create full file:/tmp/hbase-backup{code}
> Adjust some data through HBase shell:
> {code:java}
> put 'test', 'row1', 'cf:a', 'value1-new'
> scan 'test' {code}
> Create an incremental backup:
> {code:java}
> bin/hbase backup create incremental file:/tmp/hbase-backup {code}
> Delete the original table in HBase shell:
> {code:java}
> disable 'test'
> drop 'test' {code}
> Restore the incremental backup under a new table name:
> {code:java}
> bin/hbase backup history
> bin/hbase restore file:/tmp/hbase-backup  -t "test" -m 
> "test-restored" {code}
> This results in the following output / error:
> {code:java}
> ...
> 2024-03-25T13:38:53,062 WARN  [main {}] util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 2024-03-25T13:38:53,174 INFO  [main {}] Configuration.deprecation: 
> hbase.client.pause.cqtbe is deprecated. Instead, use 
> hbase.client.pause.server.overloaded
> 2024-03-25T13:38:53,554 INFO  [main {}] impl.RestoreTablesClient: HBase table 
> test-restored does not exist. It will be created during restore process
> 2024-03-25T13:38:53,593 INFO  [main {}] impl.RestoreTablesClient: Restoring 
> 'test' to 'test-restored' from full backup image 
> file:/tmp/hbase-backup/backup_1711370230143/default/test
> 2024-03-25T13:38:53,707 INFO  [main {}] util.BackupUtils: Creating target 
> table 'test-restored'
> 2024-03-25T13:38:54,546 INFO  [main {}] mapreduce.MapReduceRestoreJob: 
> Restore test into test-restored
> 2024-03-25T13:38:54,646 INFO  [main {}] mapreduce.HFileOutputFormat2: 
> bulkload locality sensitive enabled
> 2024-03-25T13:38:54,647 INFO  [main {}] mapreduce.HFileOutputFormat2: Looking 
> up current regions for table test-restored
> 2024-03-25T13:38:54,669 INFO  [main {}] mapreduce.HFileOutputFormat2: 
> Configuring 1 reduce partitions to match current region count for all tables
> 2024-03-25T13:38:54,669 INFO  [main {}] mapreduce.HFileOutputFormat2: Writing 
> partition information to 
> file:/tmp/hbase-tmp/partitions_0667b6e2-79ef-4cfe-97e1-abb204ee420d
> 2024-03-25T13:38:54,687 INFO  [main {}] compress.CodecPool: Got brand-new 
> compressor [.deflate]
> 2024-03-25T13:38:54,713 INFO  [main {}] mapreduce.HFileOutputFormat2: 
> Incremental output configured for tables: test-restored
> 2024-03-25T13:38:54,715 WARN  [main {}] mapreduce.TableMapReduceUtil: The 
> addDependencyJars(Configuration, Class...) method has been deprecated 
> since it is easy to use incorrectly. Most users should rely on 
> addDependencyJars(Job) instead. See HBASE-8386 for more details.
> 2024-03-25T13:38:54,742 WARN  [main {}] impl.MetricsConfig: Cannot locate 
> configuration: tried 
> hadoop-metrics2-jobtracker.properties,hadoop-metrics2.properties
> 2024-03-25T13:38:54,834 INFO  [main {}] input.FileInputFormat: Total input 
> files to process : 1
> 2024-03-25T13:38:54,853 INFO  [main {}] mapreduce.JobSubmitter: number of 
> splits:1
> 2024-03-25T13:38:54,964 INFO  [main {}] mapreduce.JobSubmitter: Submitting 
> tokens for job: job_local748155768_0001
> 2024-03-25

Re: Aiming for 2.6.0RC0 tomorrow

2024-03-26 Thread Bryan Beaudreault
As another update, that blocker has been fixed. I'm going to start creating
the next RC tomorrow morning Eastern Time.

On Mon, Mar 25, 2024 at 2:20 PM Bryan Beaudreault 
wrote:

> Over the weekend an issue[1] was submitted for Backup & Restore. I think
> it's potentially bad enough that it should be a blocker for this release. I
> already have a solution for the problem, so it shouldn't delay us much. I'm
> working on turning it into a PR now.
>
> [1] https://issues.apache.org/jira/browse/HBASE-28456
>
> On Sat, Mar 23, 2024 at 5:27 PM Bryan Beaudreault 
> wrote:
>
>> The INFRA ticket has been resolved, and I confirmed that I can now commit
>> the tarballs.
>>
>> Unfortunately, I realized that I used the wrong gpg key, so I need to
>> create a new RC. I'll kick that off tomorrow morning.
>>
>> On Fri, Mar 22, 2024 at 3:27 PM Bryan Beaudreault <
>> bbeaudrea...@apache.org> wrote:
>>
>>> Unfortunately, it still failed. This time it actually failed
>>> on hbase-2.6.0-hadoop3-client-bin.tar.gz, which is only 344MB. I tried it a
>>> few times and it swaps between that one and hbase-2.6.0-hadoop3-bin.tar.gz
>>> so maybe it's non-deterministically ordered.
>>>
>>> I wonder if a limit was just recently introduced. I'm still waiting on a
>>> response to my INFRA-25634 ticket.
>>>
>>> On Fri, Mar 22, 2024 at 12:59 PM Andrew Purtell 
>>> wrote:
>>>
>>>> > If something needs to be removed, I propose the full fat (
>>>> > *hbase-shaded-client*) shaded client JAR.
>>>> > That is never returned by the hbase command AFAIK, and is also the
>>>> largest
>>>> > in size.
>>>>
>>>> Sounds good, if removing examples is insufficient, the limit cannot be
>>>> increased, and some other step need be taken.
>>>>
>>>>
>>>> On Thu, Mar 21, 2024 at 10:40 PM Istvan Toth >>> >
>>>> wrote:
>>>>
>>>> > The *hbase classpath* and *hbase mapredcp* command outputs do include
>>>> the
>>>> > respective  *hbase-shaded-client-byo-hadoop* and
>>>> *hbase-shaded-mapreduce*
>>>> >  jars.
>>>> >
>>>> > At least the 'hbase mapredcp' jars are used by both Spark and Hive
>>>> > integration, and expected to be available on the node filesystem.
>>>> > We also plan to switch the Phoenix connectors to that.
>>>> >
>>>> > Having those two jars in a separate assembly would require further
>>>> > configuration when installing HBase to tell it
>>>> > where to find them, so that the classpath commands can include them.
>>>> >
>>>> > If something needs to be removed, I propose the full fat (
>>>> > *hbase-shaded-client*) shaded client JAR.
>>>> > That is never returned by the hbase command AFAIK, and is also the
>>>> largest
>>>> > in size.
>>>> > (I plan to remove that one from the upcoming Hadoop-less assembly as
>>>> well)
>>>> >
>>>> > Istvan
>>>> >
>>>> > On Fri, Mar 22, 2024 at 4:55 AM 张铎(Duo Zhang) 
>>>> > wrote:
>>>> >
>>>> > > Tested locally, after removing hbase-example from tarball, the
>>>> hadoop3
>>>> > > tarball is about 351MB.
>>>> > >
>>>> > > So you could try to include this commit to publish again, to see if
>>>> this
>>>> > > helps.
>>>> > >
>>>> > > Thanks.
>>>> > >
>>>> > > 张铎 (Duo Zhang) wrote on Fri, Mar 22, 2024 at 09:18:
>>>> > > >
>>>> > > > If we exclude hbase-example from the binaries, will it be small
>>>> > > > enough to fit?
>>>> > > >
>>>> > > > We already committed the changes to master, I believe. Let me see if
>>>> > > > we can cherry-pick them and commit to branch-2.6 as well.
>>>> > > >
>>>> > > > Thanks.
>>>> > > >
>>>> > > > Bryan Beaudreault wrote on Fri, Mar 22, 2024 at 07:35:
>>>> > > > >
>>>> > > > > Thanks, I filed
>>>> > > > > https://issues.apache.org/jira/browse/INFRA-25634
>>>> > > > >
>>>> > > > > On Thu, Mar 21, 2024 at 5

[jira] [Resolved] (HBASE-28456) HBase Restore restores old data if data for the same timestamp is in different hfiles

2024-03-26 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28456.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   Resolution: Fixed

> HBase Restore restores old data if data for the same timestamp is in 
> different hfiles
> -
>
> Key: HBASE-28456
> URL: https://issues.apache.org/jira/browse/HBASE-28456
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Affects Versions: 2.6.0, 3.0.0
>Reporter: Ruben Van Wanzeele
>    Assignee: Bryan Beaudreault
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
> Attachments: 
> Add_incremental_test_for_HBASE-28456_Fix_HBASE-28412_for_incremental_test.patch,
>  ChangesOnHFilesOnSameTimestampAreNotCorrectlyRestored.java
>
>
> The restore brings back 'old' data when executing a restore.
> It feels like the HFile sequence id is not respected during the restore.
> See the testing code attached. The workaround is to trigger a major 
> compaction before doing the backup (not really feasible for daily backups).
> We didn't investigate this yet, but this might also impact the merge of 
> multiple incremental backups (since that follows a similar code path merging 
> hfiles).
> This currently blocks our support for HBase backup and restore.
> Willing to participate in a solution if necessary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
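
For anyone who needs the workaround before picking up the fix, the major compaction can be scripted with the standard Admin API. This is a hedged sketch only (the table name and polling interval are illustrative):

{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.CompactionState;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MajorCompactBeforeBackup {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
        Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("test"); // illustrative table name
      admin.majorCompact(table); // asynchronous request
      // Poll until the major compaction has finished before starting the backup.
      while (admin.getCompactionState(table) != CompactionState.NONE) {
        Thread.sleep(1000);
      }
    }
  }
}
{code}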


[jira] [Resolved] (HBASE-28449) Fix BackupSystemTable Scans

2024-03-25 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28449.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   Resolution: Fixed

Pushed to branch-2.6+. Thanks [~baugenreich]!

> Fix BackupSystemTable Scans 
> 
>
> Key: HBASE-28449
> URL: https://issues.apache.org/jira/browse/HBASE-28449
> Project: HBase
>  Issue Type: Bug
>Reporter: Briana Augenreich
>Assignee: Briana Augenreich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> When calculating which WALs should be included in an incremental backup, the 
> backup system does a prefix scan for the last roll log timestamp. This uses 
> the backup root in the prefix ("<backupRoot>."). If you happen to have 
> multiple backup roots where one is a prefix of the other, you'll get inaccurate 
> results. 
>  
> Since the rowkey is "<backupRoot>.<...>", let's modify 
> the prefix scan to be "<backupRoot>." (including the trailing separator).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
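
To illustrate why the trailing separator matters, here is a sketch using the real client Scan API; the root names are made up, and the backup system table's actual rowkey layout is more involved:

{code:java}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixScanExample {
  public static void main(String[] args) {
    // A prefix scan on just the backup root also matches any other root that
    // shares those leading bytes:
    Scan ambiguous = new Scan().setRowPrefixFilter(Bytes.toBytes("rootA"));
    // ...it matches rows for both "rootA" and "rootA-secondary" -> inaccurate results.

    // Including the "." separator ends the root component, so only rows that
    // belong to "rootA" itself can match:
    Scan precise = new Scan().setRowPrefixFilter(Bytes.toBytes("rootA."));
  }
}
{code}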


[jira] [Resolved] (HBASE-28453) Support a middle ground between the Average and Fixed interval rate limiters

2024-03-25 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28453.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
 Release Note: FixedIntervalRateLimiter now supports a custom refill 
interval via hbase.quota.rate.limiter.refill.interval.ms. Users of quotas may 
wish to change hbase.quota.rate.limiter to FixedIntervalRateLimiter and 
customize this new setting. It will likely lead to healthier backoffs for 
clients and more full quota utilization.
   Resolution: Fixed

Pushed to branch-2.6+. Thanks [~rmdmattingly]!

> Support a middle ground between the Average and Fixed interval rate limiters
> 
>
> Key: HBASE-28453
> URL: https://issues.apache.org/jira/browse/HBASE-28453
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Ray Mattingly
>Assignee: Ray Mattingly
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
> Attachments: Screenshot 2024-03-21 at 2.08.51 PM.png, Screenshot 
> 2024-03-21 at 2.30.01 PM.png
>
>
> h3. Background
> HBase quotas support two rate limiters: a "fixed" and an "average" interval 
> rate limiter.
> h4. FixedIntervalRateLimiter
> The fixed interval rate limiter is simpler: it has a TimeUnit, say 1 second, 
> and it refills a resource allotment on the recurring interval. So you may get 
> 10 resources every second, and if you exhaust all 10 resources in the first 
> millisecond of an interval then you will need to wait 999ms to acquire even 1 
> more resource.
> h4. AverageIntervalRateLimiter
> The average interval rate limiter, HBase's default, allows for more flexibly 
> timed refilling of the resource allotment. Extending our previous example, 
> say you have a 10 reads/sec quota and you have exhausted all 10 resources 
> within 1ms of the last full refill. If you request 1 more read then, rather 
> than returning a 999ms wait interval indicating the next full refill time, 
> the rate limiter will recognize that you only need to wait 99ms before 1 read 
> can be available. After 100ms has passed in aggregate since the last full 
> refill, it will support the refilling of 1/10th the limit to facilitate the 
> request for 1/10th the resources.
> h3. The Problems with Current RateLimiters
> The problem with the fixed interval rate limiter is that it is too strict 
> from a latency perspective. It results in quota limits to which we cannot 
> fully subscribe with any consistency.
> The problem with the average interval rate limiter is that, in practice, it 
> is far too optimistic. For example, a real rate limiter might limit to 
> 100MB/sec of read IO per machine. Any multigets that come in will require 
> only a tiny fraction of this limit; for example, a 64kb block is only 0.06% 
> of the total. As a result, the vast majority of wait intervals end up being 
> tiny — like <5ms. This can actually cause an inverse of your intention, where 
> setting up a throttle causes a DDOS of your RPC layer via continuous 
> throttling and ~immediate retrying. I've discussed this problem in 
> https://issues.apache.org/jira/browse/HBASE-28429 and proposed a minimum wait 
> interval as the solution there; after some more thinking, I believe this new 
> rate limiter would be a less hacky solution to this deficit so I'd like to 
> close that Jira in favor of this one.
> See the attached chart where I put in place a 10k req/sec/machine throttle 
> for this user at 10:43 to try to curb this high traffic, and it resulted in a 
> huge spike of req/sec due to the throttle/retry loop created by the 
> AverageIntervalRateLimiter.
> h3. Original Proposal: PartialIntervalRateLimiter as a Solution
> I've implemented a RateLimiter which allows for partial chunks of the overall 
> interval to be refilled; by default these chunks are 10% (or 100ms of a 1s 
> interval). I've deployed this to a test cluster at my day job and have seen 
> this really help our ability to fully subscribe to a quota limit without 
> executing superfluous retries. See the other attached chart which shows a 
> cluster undergoing a rolling restart from using FixedIntervalRateLimiter to 
> my new PartialIntervalRateLimiter and how it is then able to fully subscribe 
> to its allotted 25MB/sec/machine read IO quota.
> h3. Updated Proposal: Improving FixedIntervalRateLimiter
> Rather than implement a new rate limiter, we can make a lower-touch change 
> which just adds support for a configurable refill interval.
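
To make the updated proposal concrete, here is a minimal sketch of a fixed-interval limiter that refills in sub-interval chunks. It is illustrative only, with simplified names and clock handling; it is not the actual FixedIntervalRateLimiter, which exposes this behavior via hbase.quota.rate.limiter.refill.interval.ms as noted in the release note above.

{code:java}
/** Illustrative sketch only: not the actual HBase FixedIntervalRateLimiter. */
public class ChunkedFixedIntervalLimiter {
  private final long limitPerSecond;   // full allotment per 1s interval
  private final long refillIntervalMs; // e.g. 100ms => refill 1/10 of the limit
  private long available;
  private long lastRefillMs;

  public ChunkedFixedIntervalLimiter(long limitPerSecond, long refillIntervalMs) {
    this.limitPerSecond = limitPerSecond;
    this.refillIntervalMs = refillIntervalMs;
    this.available = limitPerSecond;
    this.lastRefillMs = System.currentTimeMillis();
  }

  /** Returns 0 if the amount was acquired, otherwise the ms to wait. */
  public synchronized long tryAcquire(long amount) {
    refill();
    if (available >= amount) {
      available -= amount;
      return 0;
    }
    // Wait until enough refill chunks have elapsed to cover the deficit,
    // instead of always waiting for the next full-interval refill.
    long perChunk = Math.max(1, limitPerSecond * refillIntervalMs / 1000);
    long chunksNeeded = (amount - available + perChunk - 1) / perChunk;
    long untilNextChunk =
        Math.max(0, lastRefillMs + refillIntervalMs - System.currentTimeMillis());
    return untilNextChunk + (chunksNeeded - 1) * refillIntervalMs;
  }

  private void refill() {
    long now = System.currentTimeMillis();
    long chunks = (now - lastRefillMs) / refillIntervalMs;
    if (chunks > 0) {
      long perChunk = limitPerSecond * refillIntervalMs / 1000;
      available = Math.min(limitPerSecond, available + chunks * perChunk);
      lastRefillMs += chunks * refillIntervalMs;
    }
  }
}
{code}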

Re: Aiming for 2.6.0RC0 tomorrow

2024-03-25 Thread Bryan Beaudreault
Over the weekend an issue[1] was submitted for Backup & Restore. I think
it's potentially bad enough that it should be a blocker for this release. I
already have a solution for the problem, so it shouldn't delay us much. I'm
working on turning it into a PR now.

[1] https://issues.apache.org/jira/browse/HBASE-28456

On Sat, Mar 23, 2024 at 5:27 PM Bryan Beaudreault 
wrote:

> The INFRA ticket has been resolved, and I confirmed that I can now commit
> the tarballs.
>
> Unfortunately, I realized that I used the wrong gpg key, so I need to
> create a new RC. I'll kick that off tomorrow morning.
>
> On Fri, Mar 22, 2024 at 3:27 PM Bryan Beaudreault 
> wrote:
>
>> Unfortunately, it still failed. This time it actually failed
>> on hbase-2.6.0-hadoop3-client-bin.tar.gz, which is only 344MB. I tried it a
>> few times and it swaps between that one and hbase-2.6.0-hadoop3-bin.tar.gz
>> so maybe it's non-deterministically ordered.
>>
>> I wonder if a limit was just recently introduced. I'm still waiting on a
>> response to my INFRA-25634 ticket.
>>
>> On Fri, Mar 22, 2024 at 12:59 PM Andrew Purtell 
>> wrote:
>>
>>> > If something needs to be removed, I propose the full fat (
>>> > *hbase-shaded-client*) shaded client JAR.
>>> > That is never returned by the hbase command AFAIK, and is also the
>>> largest
>>> > in size.
>>>
>>> Sounds good, if removing examples is insufficient, the limit cannot be
>>> increased, and some other step need be taken.
>>>
>>>
>>> On Thu, Mar 21, 2024 at 10:40 PM Istvan Toth >> >
>>> wrote:
>>>
>>> > The *hbase classpath* and *hbase mapredcp* command outputs do include
>>> the
>>> > respective  *hbase-shaded-client-byo-hadoop* and
>>> *hbase-shaded-mapreduce*
>>> >  jars.
>>> >
>>> > At least the 'hbase mapredcp' jars are used by both Spark and Hive
>>> > integration, and expected to be available on the node filesystem.
>>> > We also plan to switch the Phoenix connectors to that.
>>> >
>>> > Having those two jars in a separate assembly would require further
>>> > configuration when installing HBase to tell it
>>> > where to find them, so that the classpath commands can include them.
>>> >
>>> > If something needs to be removed, I propose the full fat (
>>> > *hbase-shaded-client*) shaded client JAR.
>>> > That is never returned by the hbase command AFAIK, and is also the
>>> largest
>>> > in size.
>>> > (I plan to remove that one from the upcoming Hadoop-less assembly as
>>> well)
>>> >
>>> > Istvan
>>> >
>>> > On Fri, Mar 22, 2024 at 4:55 AM 张铎(Duo Zhang) 
>>> > wrote:
>>> >
>>> > > Tested locally, after removing hbase-example from tarball, the
>>> hadoop3
>>> > > tarball is about 351MB.
>>> > >
>>> > > So you could try to include this commit to publish again, to see if
>>> this
>>> > > helps.
>>> > >
>>> > > Thanks.
>>> > >
>>> > > 张铎 (Duo Zhang) wrote on Fri, Mar 22, 2024 at 09:18:
>>> > > >
>>> > > > If we exclude hbase-example from the binaries, will it be small
>>> > > > enough to fit?
>>> > > >
>>> > > > We already committed the changes to master, I believe. Let me see if we
>>> > > > can cherry-pick them and commit to branch-2.6 as well.
>>> > > >
>>> > > > Thanks.
>>> > > >
>>> > > > Bryan Beaudreault wrote on Fri, Mar 22, 2024 at 07:35:
>>> > > > >
>>> > > > > Thanks, I filed
>>> > > > > https://issues.apache.org/jira/browse/INFRA-25634
>>> > > > >
>>> > > > > On Thu, Mar 21, 2024 at 5:46 PM Andrew Purtell <
>>> apurt...@apache.org>
>>> > > wrote:
>>> > > > >
>>> > > > > > The hadoop3 bin tarball for 2.5.8 is 352.8MB. Perhaps we have
>>> just
>>> > > barely
>>> > > > > > and recently crossed a threshold. File an INFRA JIRA and ask
>>> about
>>> > > it.
>>> > > > > > Perhaps some limit can be increased, or maybe they will ask us
>>> to
>>> > > live
>>> > > > > > within it.
>>> > > > > >

Re: Aiming for 2.6.0RC0 tomorrow

2024-03-23 Thread Bryan Beaudreault
The INFRA ticket has been resolved, and I confirmed that I can now commit
the tarballs.

Unfortunately, I realized that I used the wrong gpg key, so I need to
create a new RC. I'll kick that off tomorrow morning.

On Fri, Mar 22, 2024 at 3:27 PM Bryan Beaudreault 
wrote:

> Unfortunately, it still failed. This time it actually failed
> on hbase-2.6.0-hadoop3-client-bin.tar.gz, which is only 344MB. I tried it a
> few times and it swaps between that one and hbase-2.6.0-hadoop3-bin.tar.gz
> so maybe it's non-deterministically ordered.
>
> I wonder if a limit was just recently introduced. I'm still waiting on a
> response to my INFRA-25634 ticket.
>
> On Fri, Mar 22, 2024 at 12:59 PM Andrew Purtell 
> wrote:
>
>> > If something needs to be removed, I propose the full fat (
>> > *hbase-shaded-client*) shaded client JAR.
>> > That is never returned by the hbase command AFAIK, and is also the
>> largest
>> > in size.
>>
>> Sounds good, if removing examples is insufficient, the limit cannot be
>> increased, and some other step need be taken.
>>
>>
>> On Thu, Mar 21, 2024 at 10:40 PM Istvan Toth 
>> wrote:
>>
>> > The *hbase classpath* and *hbase mapredcp* command outputs do include
>> the
>> > respective  *hbase-shaded-client-byo-hadoop* and
>> *hbase-shaded-mapreduce*
>> >  jars.
>> >
>> > At least the 'hbase mapredcp' jars are used by both Spark and Hive
>> > integration, and expected to be available on the node filesystem.
>> > We also plan to switch the Phoenix connectors to that.
>> >
>> > Having those two jars in a separate assembly would require further
>> > configuration when installing HBase to tell it
>> > where to find them, so that the classpath commands can include them.
>> >
>> > If something needs to be removed, I propose the full fat (
>> > *hbase-shaded-client*) shaded client JAR.
>> > That is never returned by the hbase command AFAIK, and is also the
>> largest
>> > in size.
>> > (I plan to remove that one from the upcoming Hadoop-less assembly as
>> well)
>> >
>> > Istvan
>> >
>> > On Fri, Mar 22, 2024 at 4:55 AM 张铎(Duo Zhang) 
>> > wrote:
>> >
>> > > Tested locally, after removing hbase-example from tarball, the hadoop3
>> > > tarball is about 351MB.
>> > >
>> > > So you could try to include this commit to publish again, to see if
>> this
>> > > helps.
>> > >
>> > > Thanks.
>> > >
>> > > 张铎 (Duo Zhang) wrote on Fri, Mar 22, 2024 at 09:18:
>> > > >
>> > > > If we exclude hbase-example from the binaries, will it be small
>> > > > enough to fit?
>> > > >
>> > > > We already committed the changes to master, I believe. Let me see if we
>> > > > can cherry-pick them and commit to branch-2.6 as well.
>> > > >
>> > > > Thanks.
>> > > >
>> > > > Bryan Beaudreault wrote on Fri, Mar 22, 2024 at 07:35:
>> > > > >
>> > > > > Thanks, I filed
>> > > > > https://issues.apache.org/jira/browse/INFRA-25634
>> > > > >
>> > > > > On Thu, Mar 21, 2024 at 5:46 PM Andrew Purtell <
>> apurt...@apache.org>
>> > > wrote:
>> > > > >
>> > > > > > The hadoop3 bin tarball for 2.5.8 is 352.8MB. Perhaps we have
>> just
>> > > barely
>> > > > > > and recently crossed a threshold. File an INFRA JIRA and ask
>> about
>> > > it.
>> > > > > > Perhaps some limit can be increased, or maybe they will ask us
>> to
>> > > live
>> > > > > > within it.
>> > > > > >
>> > > > > > Related, looking at the 2.5.8 hadoop3 bin tarball, the majority
>> of
>> > > the bulk
>> > > > > > is ./lib/shaded-clients/ . The shaded clients are certainly
>> useful
>> > > but
>> > > > > > probably are not the most popular options when taking a
>> dependency
>> > on
>> > > > > > HBase. Perhaps we can package these separately. We could exclude
>> > > them from
>> > > > > > the convenience tarballs as they will still be available from
>> the
>> > > Apache
>> > > > > > Maven repository.
>> > > > > >
>> > > > > > On Thu, 

Re: Aiming for 2.6.0RC0 tomorrow

2024-03-22 Thread Bryan Beaudreault
Unfortunately, it still failed. This time it actually failed
on hbase-2.6.0-hadoop3-client-bin.tar.gz, which is only 344MB. I tried it a
few times and it swaps between that one and hbase-2.6.0-hadoop3-bin.tar.gz
so maybe it's non-deterministically ordered.

I wonder if a limit was just recently introduced. I'm still waiting on a
response to my INFRA-25634 ticket.

On Fri, Mar 22, 2024 at 12:59 PM Andrew Purtell  wrote:

> > If something needs to be removed, I propose the full fat (
> > *hbase-shaded-client*) shaded client JAR.
> > That is never returned by the hbase command AFAIK, and is also the
> largest
> > in size.
>
> Sounds good, if removing examples is insufficient, the limit cannot be
> increased, and some other step need be taken.
>
>
> On Thu, Mar 21, 2024 at 10:40 PM Istvan Toth 
> wrote:
>
> > The *hbase classpath* and *hbase mapredcp* command outputs do include the
> > respective  *hbase-shaded-client-byo-hadoop* and *hbase-shaded-mapreduce*
> >  jars.
> >
> > At least the 'hbase mapredcp' jars are used by both Spark and Hive
> > integration, and expected to be available on the node filesystem.
> > We also plan to switch the Phoenix connectors to that.
> >
> > Having those two jars in a separate assembly would require further
> > configuration when installing HBase to tell it
> > where to find them, so that the classpath commands can include them.
> >
> > If something needs to be removed, I propose the full fat (
> > *hbase-shaded-client*) shaded client JAR.
> > That is never returned by the hbase command AFAIK, and is also the
> largest
> > in size.
> > (I plan to remove that one from the upcoming Hadoop-less assembly as
> well)
> >
> > Istvan
> >
> > On Fri, Mar 22, 2024 at 4:55 AM 张铎(Duo Zhang) 
> > wrote:
> >
> > > Tested locally, after removing hbase-example from tarball, the hadoop3
> > > tarball is about 351MB.
> > >
> > > So you could try to include this commit to publish again, to see if
> this
> > > helps.
> > >
> > > Thanks.
> > >
> > > 张铎 (Duo Zhang) wrote on Fri, Mar 22, 2024 at 09:18:
> > > >
> > > > If we exclude hbase-example from the binaries, will it be small
> > > > enough to fit?
> > > >
> > > > We already committed the changes to master, I believe. Let me see if we
> > > > can cherry-pick them and commit to branch-2.6 as well.
> > > >
> > > > Thanks.
> > > >
> > > > Bryan Beaudreault wrote on Fri, Mar 22, 2024 at 07:35:
> > > > >
> > > > > Thanks, I filed
> > > > > https://issues.apache.org/jira/browse/INFRA-25634
> > > > >
> > > > > On Thu, Mar 21, 2024 at 5:46 PM Andrew Purtell <
> apurt...@apache.org>
> > > wrote:
> > > > >
> > > > > > The hadoop3 bin tarball for 2.5.8 is 352.8MB. Perhaps we have
> just
> > > barely
> > > > > > and recently crossed a threshold. File an INFRA JIRA and ask
> about
> > > it.
> > > > > > Perhaps some limit can be increased, or maybe they will ask us to
> > > live
> > > > > > within it.
> > > > > >
> > > > > > Related, looking at the 2.5.8 hadoop3 bin tarball, the majority
> of
> > > the bulk
> > > > > > is ./lib/shaded-clients/ . The shaded clients are certainly
> useful
> > > but
> > > > > > probably are not the most popular options when taking a
> dependency
> > on
> > > > > > HBase. Perhaps we can package these separately. We could exclude
> > > them from
> > > > > > the convenience tarballs as they will still be available from the
> > > Apache
> > > > > > Maven repository.
> > > > > >
> > > > > > On Thu, Mar 21, 2024 at 2:33 PM Bryan Beaudreault <
> > > bbeaudrea...@apache.org
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I got most of the way through, but failed during publish-dist:
> > > > > > >
> > > > > > > Transmitting file data ..svn: E175002: Commit failed (details
> > > follow):
> > > > > > > svn: E175002: PUT request on
> > > > > > >
> > > > > > >
> > > > > >
> > >
> >
> '/repos/dist/!svn/txr/68050-1le9/dev/hbase/2.6.0RC0/hbase-2.6.0-hadoop3-bin.tar.gz'
> > > > > > &

[jira] [Created] (HBASE-28455) do-release-docker fails to setup gpg agent proxy if proxy container is slow to start

2024-03-22 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28455:
-

 Summary: do-release-docker fails to setup gpg agent proxy if proxy 
container is slow to start
 Key: HBASE-28455
 URL: https://issues.apache.org/jira/browse/HBASE-28455
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


In do-release-docker.sh we spin up the gpg-agent-proxy container, then 
immediately run ssh-keyscan, and then immediately run ssh. Despite having 
{{set -e}}, both of these can fail without failing the script. This 
manifests as a really hard-to-debug failure in the hbase-rm container with 
"gpg: no gpg-agent running in this session".

With some debugging I realized that the ssh tunnel had not been created. 
Looking at the logs, the gpg-agent-proxy.ssh-keyscan file is empty and the 
gpg-proxy.ssh.log shows a Connection refused error.

You'd think these would fail the script, but they don't, for different reasons:
 # ssh-keyscan output is piped through sort. Running ssh-keyscan directly 
returns an error code, but piping it through sort turns it into a success code.
 # ssh is executed in the background with {{&}}, which similarly loses the 
error code.

I think we should add a step prior to ssh-keyscan which waits until the 
proxy's SSH port is available. I'm not sure how to retain the error codes in 
the above 2 commands, but can try to look into that as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Aiming for 2.6.0RC0 tomorrow

2024-03-22 Thread Bryan Beaudreault
Thanks all. I'm running RC1 now with the included commits. I'll report back
in a few hours, since it takes a while to run

On Fri, Mar 22, 2024 at 1:40 AM Istvan Toth 
wrote:

> The *hbase classpath* and *hbase mapredcp* command outputs do include the
> respective  *hbase-shaded-client-byo-hadoop* and *hbase-shaded-mapreduce*
>  jars.
>
> At least the 'hbase mapredcp' jars are used by both Spark and Hive
> integration, and expected to be available on the node filesystem.
> We also plan to switch the Phoenix connectors to that.
>
> Having those two jars in a separate assembly would require further
> configuration when installing HBase to tell it
> where to find them, so that the classpath commands can include them.
>
> If something needs to be removed, I propose the full fat (
> *hbase-shaded-client*) shaded client JAR.
> That is never returned by the hbase command AFAIK, and is also the largest
> in size.
> (I plan to remove that one from the upcoming Hadoop-less assembly as well)
>
> Istvan
>
> On Fri, Mar 22, 2024 at 4:55 AM 张铎(Duo Zhang) 
> wrote:
>
> > Tested locally, after removing hbase-example from tarball, the hadoop3
> > tarball is about 351MB.
> >
> > So you could try to include this commit to publish again, to see if this
> > helps.
> >
> > Thanks.
> >
> > > 张铎 (Duo Zhang) wrote on Fri, Mar 22, 2024 at 09:18:
> > >
> > > > If we exclude hbase-example from the binaries, will it be small
> > > > enough to fit?
> > > >
> > > > We already committed the changes to master, I believe. Let me see if we
> > > > can cherry-pick them and commit to branch-2.6 as well.
> > >
> > > Thanks.
> > >
> > > > Bryan Beaudreault wrote on Fri, Mar 22, 2024 at 07:35:
> > > >
> > > > Thanks, I filed
> > > > https://issues.apache.org/jira/browse/INFRA-25634
> > > >
> > > > On Thu, Mar 21, 2024 at 5:46 PM Andrew Purtell 
> > wrote:
> > > >
> > > > > The hadoop3 bin tarball for 2.5.8 is 352.8MB. Perhaps we have just
> > barely
> > > > > and recently crossed a threshold. File an INFRA JIRA and ask about
> > it.
> > > > > Perhaps some limit can be increased, or maybe they will ask us to
> > live
> > > > > within it.
> > > > >
> > > > > Related, looking at the 2.5.8 hadoop3 bin tarball, the majority of
> > the bulk
> > > > > is ./lib/shaded-clients/ . The shaded clients are certainly useful
> > but
> > > > > probably are not the most popular options when taking a dependency
> on
> > > > > HBase. Perhaps we can package these separately. We could exclude
> > them from
> > > > > the convenience tarballs as they will still be available from the
> > Apache
> > > > > Maven repository.
> > > > >
> > > > > On Thu, Mar 21, 2024 at 2:33 PM Bryan Beaudreault <
> > bbeaudrea...@apache.org
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I got most of the way through, but failed during publish-dist:
> > > > > >
> > > > > > Transmitting file data ..svn: E175002: Commit failed (details
> > follow):
> > > > > > svn: E175002: PUT request on
> > > > > >
> > > > > >
> > > > >
> >
> '/repos/dist/!svn/txr/68050-1le9/dev/hbase/2.6.0RC0/hbase-2.6.0-hadoop3-bin.tar.gz'
> > > > > > failed
> > > > > >
> > > > > > Running manually, it looks to be a Request Entity Too Large. The
> > file in
> > > > > > question is 356MB. Anyone have any experience with this?
> > > > > >
> > > > > > On Thu, Mar 21, 2024 at 2:19 AM 张铎(Duo Zhang) <
> > palomino...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > HBASE-28444 has been resolved.
> > > > > > >
> > > > > > > Please go ahead to cut 2.6.0RC0, really a long journey :)
> > > > > > >
> > > > > > > > 张铎 (Duo Zhang) wrote on Wed, Mar 20, 2024 at 14:29:
> > > > > > > >
> > > > > > > > There is a security issue for zookeeper, but simply upgrading
> > > > > > > > zookeeper will break a test.
> > > > > > > >
> > > > > > > > Please see HBASE-28444 for more details.
> > > > > > > >
> > > > > > > > I think we should get this in before cutting the RC.

Re: Aiming for 2.6.0RC0 tomorrow

2024-03-21 Thread Bryan Beaudreault
Thanks, I filed
https://issues.apache.org/jira/browse/INFRA-25634

On Thu, Mar 21, 2024 at 5:46 PM Andrew Purtell  wrote:

> The hadoop3 bin tarball for 2.5.8 is 352.8MB. Perhaps we have just barely
> and recently crossed a threshold. File an INFRA JIRA and ask about it.
> Perhaps some limit can be increased, or maybe they will ask us to live
> within it.
>
> Related, looking at the 2.5.8 hadoop3 bin tarball, the majority of the bulk
> is ./lib/shaded-clients/ . The shaded clients are certainly useful but
> probably are not the most popular options when taking a dependency on
> HBase. Perhaps we can package these separately. We could exclude them from
> the convenience tarballs as they will still be available from the Apache
> Maven repository.
>
> On Thu, Mar 21, 2024 at 2:33 PM Bryan Beaudreault  >
> wrote:
>
> > I got most of the way through, but failed during publish-dist:
> >
> > Transmitting file data ..svn: E175002: Commit failed (details follow):
> > svn: E175002: PUT request on
> >
> >
> '/repos/dist/!svn/txr/68050-1le9/dev/hbase/2.6.0RC0/hbase-2.6.0-hadoop3-bin.tar.gz'
> > failed
> >
> > Running manually, it looks to be a Request Entity Too Large. The file in
> > question is 356MB. Anyone have any experience with this?
> >
> > On Thu, Mar 21, 2024 at 2:19 AM 张铎(Duo Zhang) 
> > wrote:
> >
> > > HBASE-28444 has been resolved.
> > >
> > > Please go ahead to cut 2.6.0RC0, really a long journey :)
> > >
> > > 张铎 (Duo Zhang) wrote on Wed, Mar 20, 2024 at 14:29:
> > > >
> > > > There is a security issue for zookeeper, but simply upgrading
> > > > zookeeper will break a test.
> > > >
> > > > Please see HBASE-28444 for more details.
> > > >
> > > > I think we should get this in before cutting the RC.
> > > >
> > > > Thanks.
> > > >
> > > > > Bryan Beaudreault wrote on Tue, Mar 19, 2024 at 23:51:
> > > > >
> > > > > I've finished auditing fixVersions and run ITBLL for an extended
> > > period of
> > > > > time in a real cluster. I'm not aware of any open blockers. So
> > > tomorrow I'm
> > > > > going to start generating the RC0.
> > > > >
> > > > > Please let me know if you have any concerns or reason for delay.
> > >
> >
>
>
> --
> Best regards,
> Andrew
>
> Unrest, ignorance distilled, nihilistic imbeciles -
> It's what we’ve earned
> Welcome, apocalypse, what’s taken you so long?
> Bring us the fitting end that we’ve been counting on
>- A23, Welcome, Apocalypse
>


Re: Aiming for 2.6.0RC0 tomorrow

2024-03-21 Thread Bryan Beaudreault
I got most of the way through, but failed during publish-dist:

Transmitting file data ..svn: E175002: Commit failed (details follow):
svn: E175002: PUT request on
'/repos/dist/!svn/txr/68050-1le9/dev/hbase/2.6.0RC0/hbase-2.6.0-hadoop3-bin.tar.gz'
failed

Running manually, it looks to be a Request Entity Too Large. The file in
question is 356MB. Anyone have any experience with this?

On Thu, Mar 21, 2024 at 2:19 AM 张铎(Duo Zhang)  wrote:

> HBASE-28444 has been resolved.
>
> Please go ahead to cut 2.6.0RC0, really a long journey :)
>
> 张铎 (Duo Zhang) wrote on Wed, Mar 20, 2024 at 14:29:
> >
> > There is a security issue for zookeeper, but simply upgrading
> > zookeeper will break a test.
> >
> > Please see HBASE-28444 for more details.
> >
> > I think we should get this in before cutting the RC.
> >
> > Thanks.
> >
> > Bryan Beaudreault wrote on Tue, Mar 19, 2024 at 23:51:
> > >
> > > I've finished auditing fixVersions and run ITBLL for an extended
> period of
> > > time in a real cluster. I'm not aware of any open blockers. So
> tomorrow I'm
> > > going to start generating the RC0.
> > >
> > > Please let me know if you have any concerns or reason for delay.
>


Aiming for 2.6.0RC0 tomorrow

2024-03-19 Thread Bryan Beaudreault
I've finished auditing fixVersions and run ITBLL for an extended period of
time in a real cluster. I'm not aware of any open blockers. So tomorrow I'm
going to start generating the RC0.

Please let me know if you have any concerns or reason for delay.


[jira] [Resolved] (HBASE-28338) Bounded leak of FSDataInputStream buffers from checksum switching

2024-03-19 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28338.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   Resolution: Fixed

> Bounded leak of FSDataInputStream buffers from checksum switching
> -
>
> Key: HBASE-28338
> URL: https://issues.apache.org/jira/browse/HBASE-28338
> Project: HBase
>  Issue Type: Bug
>    Reporter: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> In FSDataInputStreamWrapper, the unbuffer() method caches an unbuffer 
> instance the first time it is called. When an FSDataInputStreamWrapper is 
> initialized, it has hbase checksum disabled.
> In HFileInfo.initTrailerAndContext we get the stream, read the trailer, then 
> call unbuffer. At this point, checksums have not been enabled yet via 
> prepareForBlockReader. So the call to unbuffer() caches the current 
> non-checksum stream as the unbuffer instance.
> Later, in initMetaAndIndex we do a similar thing. This time, 
> prepareForBlockReader has been called, so we are now using hbase checksums. 
> When initMetaAndIndex calls unbuffer(), it uses the old unbuffer instance 
> which actually has been closed when we switched to hbase checksums. So that 
> call does nothing, and the new no-checksum input stream is never unbuffered.
> I haven't seen this cause an issue with normal hdfs replication (though 
> haven't gone looking). It's very problematic for Erasure Coding because 
> DFSStripedInputStream holds a large buffer (numDataBlocks * cellSize, so 6mb 
> for RS-6-3-1024k) that is only used for stream reads NOT pread. The 
> FSDataInputStreamWrapper we are talking about here is only used for pread in 
> hbase, so those 6mb buffers just hang around totally unused but 
> unreclaimable. Since there is an input stream per StoreFile, this can add up 
> very quickly on big servers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
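
A condensed sketch of the caching bug described above (heavily simplified; the field and method names are illustrative, not the actual FSDataInputStreamWrapper):

{code:java}
import org.apache.hadoop.fs.FSDataInputStream;

class StreamWrapperSketch {
  // Swapped out when hbase checksums are enabled (prepareForBlockReader).
  private FSDataInputStream current;
  // Cached the first time unbuffer() is called.
  private FSDataInputStream cachedForUnbuffer;

  void switchToChecksumStream(FSDataInputStream checksumStream) {
    // The old stream is effectively retired here, but cachedForUnbuffer
    // may still point at it.
    this.current = checksumStream;
  }

  void unbuffer() {
    if (cachedForUnbuffer == null) {
      // BUG: caches whichever stream was current at the time of the first call.
      cachedForUnbuffer = current;
    }
    // After the switch, this releases buffers on the retired stream only; the
    // new stream's (potentially 6MB erasure-coding) buffer is never released.
    cachedForUnbuffer.unbuffer();
  }
}
{code}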


[jira] [Resolved] (HBASE-28385) Quota estimates are too optimistic for large scans

2024-03-13 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28385.
---
Fix Version/s: 3.0.0-beta-2
 Release Note: When hbase.quota.use.result.size.bytes is false, we will now 
estimate the amount of quota to grab for a scan based on the block bytes 
scanned of previous next() requests. This will increase throughput for large 
scans which might prefer to wait a little longer for a larger portion of the 
quota.
   Resolution: Fixed

> Quota estimates are too optimistic for large scans
> --
>
> Key: HBASE-28385
> URL: https://issues.apache.org/jira/browse/HBASE-28385
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ray Mattingly
>Assignee: Ray Mattingly
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> Let's say you're running a table scan with a throttle of 100MB/sec per 
> RegionServer. Ideally your scans are going to pull down large results, often 
> containing hundreds or thousands of blocks.
> You will estimate each scan as costing a single block of read capacity, and 
> if your quota is already exhausted then the server will evaluate the backoff 
> required for your estimated consumption (1 block) to be available. This will 
> often be ~1ms, causing your retries to basically be immediate.
> Obviously it will routinely take much longer than 1ms for 100MB of IO to 
> become available in the given configuration, so your retries will be destined 
> to fail. At worst this can cause a saturation of your server's RPC layer, and 
> at best this causes erroneous exhaustion of the client's retries.
> We should find a way to make these estimates a bit smarter for large scans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
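
The gist of the improvement, as a hedged sketch (the method and parameter names are made up; the real logic lives in the quota machinery and keys off hbase.quota.use.result.size.bytes):

{code:java}
public class ScanQuotaEstimateSketch {
  /**
   * Estimate how much read quota the next scan next() call should grab.
   * Before: always a single block. After: informed by what this scanner's
   * previous next() calls actually read.
   */
  static long estimateNextReadQuota(long blockBytesScannedSoFar,
      long nextCallsSoFar, long blockSizeBytes) {
    if (nextCallsSoFar == 0) {
      return blockSizeBytes; // first call: fall back to a one-block estimate
    }
    // Assume the next call reads roughly the average of the previous ones, so
    // large scans wait for a realistic share of the quota instead of retrying
    // ~immediately and failing again.
    return Math.max(blockSizeBytes, blockBytesScannedSoFar / nextCallsSoFar);
  }
}
{code}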


[jira] [Created] (HBASE-28440) Add support for using mapreduce sort in HFileOutputFormat2

2024-03-13 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28440:
-

 Summary: Add support for using mapreduce sort in HFileOutputFormat2
 Key: HBASE-28440
 URL: https://issues.apache.org/jira/browse/HBASE-28440
 Project: HBase
  Issue Type: Improvement
  Components: backuprestore
Reporter: Bryan Beaudreault


Currently HFileOutputFormat2 uses CellSortReducer, which attempts to sort all 
of the cells of a row in memory using a TreeSet. There is a warning in the 
javadoc "If lots of columns per row, it will use lots of memory sorting." This 
can be problematic for WALPlayer, which uses HFileOutputFormat2. You could have 
a reasonably sized row that just gets lots of edits in the time period of WALs 
being replayed, and that would cause an OOM. We are seeing this in some cases 
with incremental backups.

MapReduce has built-in sorting capabilities which are not limited to sorting in 
memory. It can spill to disk as necessary to sort very large datasets. We can 
get this capability in HFileOutputFormat2 with a couple changes:
 # Add support for a KeyOnlyCellComparable type as the map output key
 # When configured, use 
job.setSortComparatorClass(CellWritableComparator.class) and 
job.setReducerClass(PreSortedCellsReducer.class)
 # Update WALPlayer to have a mode which can output this new comparable instead 
of ImmutableBytesWritable

CellWritableComparator exists already for the Import job, so there is some 
prior art. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
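
Sketched as job wiring, the proposal looks roughly like the following, assuming conf, tableDescriptor, and regionLocator are already in hand. KeyOnlyCellComparable and PreSortedCellsReducer are the names proposed by this issue and may not exist in a released version; setSortComparatorClass and setReducerClass are standard MapReduce Job APIs:

{code:java}
Job job = Job.getInstance(conf, "walplayer-bulkload");
// Cells carry their own sort key, so the framework can sort them...
job.setMapOutputKeyClass(KeyOnlyCellComparable.class);
// ...using MapReduce's disk-spilling sort rather than an in-memory TreeSet.
job.setSortComparatorClass(CellWritableComparator.class);
// The reducer then only needs to stream the already-sorted cells into HFiles.
job.setReducerClass(PreSortedCellsReducer.class);
HFileOutputFormat2.configureIncrementalLoad(job, tableDescriptor, regionLocator);
{code}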


[jira] [Resolved] (HBASE-28260) Possible data loss in WAL after RegionServer crash

2024-03-12 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28260.
---
Fix Version/s: 2.5.9
   Resolution: Fixed

Pushed to branch-2.5

> Possible data loss in WAL after RegionServer crash
> --
>
> Key: HBASE-28260
> URL: https://issues.apache.org/jira/browse/HBASE-28260
> Project: HBase
>  Issue Type: Bug
>    Reporter: Bryan Beaudreault
>Assignee: Charles Connell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2, 2.5.9
>
>
> We recently had a production incident:
>  # RegionServer crashes, but local DataNode lives on
>  # WAL lease recovery kicks in
>  # Namenode reconstructs the block during lease recovery (which results in a 
> new genstamp). It chooses the replica on the local DataNode as the primary.
>  # Local DataNode reconstructs the block, so NameNode registers the new 
> genstamp.
>  # Local DataNode and the underlying host dies, before the new block could be 
> replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no 
> replicas. The old replicas still remain, but are considered corrupt due to 
> GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks was 
> identical to that of the newly constructed and lost block. Further, the file in 
> question was only 1 block. So we downloaded one of those corrupt block files 
> and used {{hdfs dfs -put -f}} to force that block to replace the file in 
> hdfs. So in this case we had no actual data loss, but it could have happened 
> easily if the file was more than 1 block or the replicas weren't fully in 
> sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the 
> local datanode. We can use CreateFlag.NO_LOCAL_WRITE for this. Hat tip to 
> [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from 
> the local datanode, but avoiding writing there altogether would be better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-28260) Possible data loss in WAL after RegionServer crash

2024-03-12 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault reopened HBASE-28260:
---
  Assignee: Charles Connell

Actually, since this is a bug and it applies cleanly to branch-2.5, I'm 
reopening for cherry-pick there.

> Possible data loss in WAL after RegionServer crash
> --
>
> Key: HBASE-28260
> URL: https://issues.apache.org/jira/browse/HBASE-28260
> Project: HBase
>  Issue Type: Bug
>    Reporter: Bryan Beaudreault
>Assignee: Charles Connell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> We recently had a production incident:
>  # RegionServer crashes, but local DataNode lives on
>  # WAL lease recovery kicks in
>  # Namenode reconstructs the block during lease recovery (which results in a 
> new genstamp). It chooses the replica on the local DataNode as the primary.
>  # Local DataNode reconstructs the block, so NameNode registers the new 
> genstamp.
>  # Local DataNode and the underlying host dies, before the new block could be 
> replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no 
> replicas. The old replicas still remain, but are considered corrupt due to 
> GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks was 
> identical to that of the newly constructed and lost block. Further, the file in 
> question was only 1 block. So we downloaded one of those corrupt block files 
> and used {{hdfs dfs -put -f}} to force that block to replace the file in 
> hdfs. So in this case we had no actual data loss, but it could have happened 
> easily if the file was more than 1 block or the replicas weren't fully in 
> sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the 
> local datanode. We can use CreateFlag.NO_LOCAL_WRITE for this. Hat tip to 
> [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from 
> the local datanode, but avoiding writing there altogether would be better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28260) Possible data loss in WAL after RegionServer crash

2024-03-12 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28260.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   Resolution: Fixed

Pushed to branch-2.6+. Note that NO_LOCAL_WRITE was added back in 2016 for 
hbase's specific use, but apparently never used. So this Jira finally closes 
the loop on HDFS-3702. Thanks [~charlesconnell] for the contribution!

> Possible data loss in WAL after RegionServer crash
> --
>
> Key: HBASE-28260
> URL: https://issues.apache.org/jira/browse/HBASE-28260
> Project: HBase
>  Issue Type: Bug
>    Reporter: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> We recently had a production incident:
>  # RegionServer crashes, but local DataNode lives on
>  # WAL lease recovery kicks in
>  # Namenode reconstructs the block during lease recovery (which results in a 
> new genstamp). It chooses the replica on the local DataNode as the primary.
>  # Local DataNode reconstructs the block, so NameNode registers the new 
> genstamp.
>  # Local DataNode and the underlying host dies, before the new block could be 
> replicated to other replicas.
> This leaves us with a missing block, because the new genstamp block has no 
> replicas. The old replicas still remain, but are considered corrupt due to 
> GENSTAMP_MISMATCH.
> Thankfully we were able to confirm that the length of the corrupt blocks was 
> identical to that of the newly constructed and lost block. Further, the file in 
> question was only 1 block. So we downloaded one of those corrupt block files 
> and used {{hdfs dfs -put -f}} to force that block to replace the file in 
> hdfs. So in this case we had no actual data loss, but it could have happened 
> easily if the file was more than 1 block or the replicas weren't fully in 
> sync prior to reconstruction.
> In order to avoid this issue, we should avoid writing WAL blocks to the 
> local datanode. We can use CreateFlag.NO_LOCAL_WRITE for this. Hat tip to 
> [~weichiu] for pointing this out.
> During reading of WALs we already reorder blocks so as to avoid reading from 
> the local datanode, but avoiding writing there altogether would be better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28359) Improve quota RateLimiter synchronization

2024-03-06 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28359.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   Resolution: Fixed

Pushed to branch-2.6+. Thanks for the contribution [~rmdmattingly]!

> Improve quota RateLimiter synchronization
> -
>
> Key: HBASE-28359
> URL: https://issues.apache.org/jira/browse/HBASE-28359
> Project: HBase
>  Issue Type: Improvement
>    Reporter: Bryan Beaudreault
>Assignee: Ray Mattingly
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> We've been experiencing RpcThrottlingException with 0ms waitInterval. This 
> seems odd and wasteful, since the client side will immediately retry without 
> backoff. I think the problem is related to the synchronization of RateLimiter.
> The TimeBasedLimiter checkQuota method does the following:
> {code:java}
> if (!reqSizeLimiter.canExecute(estimateWriteSize + estimateReadSize)) {
>   RpcThrottlingException.throwRequestSizeExceeded(
> reqSizeLimiter.waitInterval(estimateWriteSize + estimateReadSize));
> } {code}
> Both canExecute and waitInterval are synchronized, but we're calling them 
> independently. So it's possible under high concurrency for canExecute to 
> return false, but for waitInterval to then return 0 (because by the time it 
> runs, canExecute would have returned true).
> I think we should simplify the API to have a single synchronized call:
> {code:java}
> long waitInterval = reqSizeLimiter.tryAcquire(estimateWriteSize + 
> estimateReadSize);
> if (waitInterval > 0) {
>   RpcThrottlingException.throwRequestSizeExceeded(waitInterval);
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
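
A self-contained sketch of the proposed single synchronized entry point (names and refill logic are simplified; this is not the actual HBase RateLimiter):

{code:java}
class SketchRateLimiter {
  private final long limit;      // allotment per interval
  private final long intervalMs;
  private long available;
  private long nextRefillMs;

  SketchRateLimiter(long limit, long intervalMs) {
    this.limit = limit;
    this.intervalMs = intervalMs;
    this.available = limit;
    this.nextRefillMs = System.currentTimeMillis() + intervalMs;
  }

  /** Check-and-acquire in one synchronized call: 0 = acquired, else ms to wait. */
  synchronized long tryAcquire(long amount) {
    long now = System.currentTimeMillis();
    if (now >= nextRefillMs) {   // full refill on the interval boundary
      available = limit;
      nextRefillMs = now + intervalMs;
    }
    if (available >= amount) {
      available -= amount;       // acquired atomically with the check
      return 0;
    }
    return nextRefillMs - now;   // backoff computed under the same lock
  }
}
{code}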


[jira] [Created] (HBASE-28423) Improvements to backup of bulkloaded files

2024-03-06 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28423:
-

 Summary: Improvements to backup of bulkloaded files
 Key: HBASE-28423
 URL: https://issues.apache.org/jira/browse/HBASE-28423
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


Backup/Restore has support for including bulkloaded files in incremental 
backups. There is a coprocessor hook which registers all bulkloads into a 
backup:system_bulk table. A cleaner plugin ensures that these files are not 
cleaned up from the archive until they are backed up. When the incremental 
backup occurs, the files are deleted from the system_bulk table and then 
cleaned up.

We have encountered two problems to be solved with this:
 # The deletion process only happens during incremental backups, not full 
backups. A full backup already includes all data in the table via a snapshot 
export. So we should clear any pending bulkloads upon full backup.
 # There is currently no linking of bulkload state to backupRoot. It's possible 
> to have multiple backupRoots for tables. For example, you might back up to 2 
> destinations with different schedules. Currently, whichever backupRoot does an 
> incremental backup first will be the one to include the bulkloads and then 
> delete them from the system_bulk table. We need some sort of mapping of 
> bulkload to backupRoot, and 
we should only delete the rows from system_bulk once the files have been 
included in all active backupRoots.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-05 Thread Bryan Beaudreault
I'm +0 on hbase-examples, but +100 on any improvements we can make to
ltt/pe/chaos/minicluster/etc. It's extremely frustrating how much reliance
we have on test jars, both generally and specifically around these core
test executables. Unfortunately I haven't had time to dedicate to these
frustrations myself, but happy to help with review, etc.

On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain  wrote:

> Thank you for bringing this up.
>
> +1 for this change.
>
> In fact, some time back, we had faced similar problem. Security scans found
> that we were bundling some vulnerable hadoop test jar. To deal with that we
> had to make a change in our internal HBase fork to exclude all HBase and
> Hadoop test jars from assembly. This helped us get rid of vulnerable jar.
> (Although I hadn't dealt with test scope dependencies there.)
>
> But, I have been thinking of pushing this change in Apache HBase, just
> wasn't sure if this was even acceptable. It's great to see same has been
> brought up here today.
>
> We hadn't dealt with the ltt, pe etc. tools and wrote a script to download
> them on demand to avoid massive code change in internal fork. But I have a
> +1 on the idea of identifying and moving all such tools to a new module.
> This would be great and make things easier for us as well.
>
> Also, a way we could help new users easily get started, in case we
> completely stop bundling hadoop jars, is by providing a script which starts
> an HBase cluster in a single-node setup. In fact I had written a simple
> script sometime back that automates this process given a release link for
> both. It first downloads Hadoop and HBase binaries and then starts both
> with the hbase root directory set to be on hdfs. We could provide something
> similar to help new users to get started easily.
>
> Although I am also +1 on the idea to provide both variants as mentioned by
> Nick, which might not even need any such script.
>
> Also, I am willing to volunteer for help towards this effort. Please let me
> know if anything is needed.
>
> Thanks,
> Nihal
>
>
> On Tue, 5 Mar 2024, 15:35 Nick Dimiduk,  wrote:
>
> > This would be great cleanup, big +1 from me for all three of these
> > adjustments, including the promotion of pe, ltt, and friends out of the
> > test scope.
> >
> > I believe that we included hbase test jars because we used to freely mix
> > classes needed for minicluster between runtime and test jars, which in
> turn
> > relied on Hadoop minicluster capabilities. The big cleanup around
> > HBaseTestingUtil/it addressed much (or all) of these issues on branch-3.
> >
> > I believe that we include a Hadoop distribution in our assembly because
> > that makes it easy for a new user to download our release bin.tgz and get
> > started immediately with learning. I guess it’s high time that we work
> out
> > the with- and without-Hadoop variants.
> >
> > Thanks,
> > Nick
> >
> > On Tue, 5 Mar 2024 at 09:14, Istvan Toth  wrote:
> >
> > > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
> > > to achieve this; this is about discussing whether we even want to make
> > > these changes.
> > > These are also substantial changes, but they could be targeted for
> HBase
> > > 3.0.
> > >
> > > One issue I have noticed is that we ship test jars and test
> dependencies
> > in
> > > the assembly.
> > > I can't see anyone using those, but it bloats the assembly and
> classpath,
> > > and adds unnecessary JARs with possible CVE issues. (for example Kerby
> > > which is a Hadoop minicluster dependency)
> > >
> > > My proposal is to exclude the test jars and the test scope dependencies
> > > from the assembly.
> > >
> > > The advantages would be:
> > > * Smaller distro size
> > > * Faster startup (this is marginal)
> > > * Less CVE-prone JARs in the binary assemblies
> > >
> > > The other issue is that the assembly includes much of the Hadoop
> > > distribution.
> > > The basic assumption in all scripts and instructions is that the node
> > has a
> > > fully configured Hadoop installation, and we include it in the
> classpath
> > of
> > > HBase.
> > >
> > > If that is true, then there is no reason to include Hadoop in the
> > assembly,
> > > HBase and its direct dependencies should be enough.
> > >
> > > One could argue that it would simplify the client side, which is true
> to
> > > some extent (though 95% of the client distro use cases are served
> better
> > by
> > > simply using hbase-shaded-client).
> > >
> > > We could either remove the Hadoop libraries from either or both of the
> > > assemblies unconditionally, or provide two variants for either or both
> > > assemblies, one with Hadoop included, and one without it.
> > > Spark already does this, it has binary distributions both with and
> > without
> > > Hadoop.
> > >
> > > The advantages would be:
> > > * Smaller distro size
> > > * Faster startup (this is marginal)
> > > * Less chance of conflicts with the Hadoop jars
> > > * Less CVE-prone JARs in the 

Re: [DISCUSS] Deprecating zookeeper-based client connectivity

2024-03-04 Thread Bryan Beaudreault
I’d say let’s do it. But if we want to do it for 2.6.0 then it’d be great
to put up a PR asap so it doesn’t block the release. I’m hoping to get the
RC0 out this week.

On Mon, Mar 4, 2024 at 4:41 AM Nick Dimiduk  wrote:

> On Fri, Mar 1, 2024 at 6:12 PM Andrew Purtell  wrote:
>
> > I disagree. We can keep the current defaults AND deprecate
> > ZKConnectionRegistry as a warning to users that things will change in
> > future releases. That is the entire reason for deprecation, yes?
> >
>
> Indeed. For me, there is value in introducing the Deprecation warnings as
> early as possible, to give folks forewarning. So I suggest that we mark it
> deprecated in 2.6, it is no longer the default connection mechanism in 3.0,
> and it is removed in 4.0.
>
> On Fri, Mar 1, 2024 at 4:53 AM Nick Dimiduk  wrote:
> >
> > > On Fri, Mar 1, 2024 at 1:39 PM Istvan Toth  >
> > > wrote:
> > >
> > > > I checked our compatibility promise just now:
> > > > https://hbase.apache.org/book.html#hbase.versioning
> > > >
> > > > If we consider the way we use properties to define the cluster
> > > connection a
> > > > part of the client API
> > > > (and I personally do) then we cannot remove the ZK registry
> > > > functionality before 4.0, even
> > > > if it is deprecated in 2.6.
> > > >
> > >
> > > This makes sense -- thanks for keeping me honest, Istvan.
> > >
> > > So then, with no current plan to make HBase run without ZooKeeper,
> > there's
> > > really no need to deprecate the ZKConnectionRegistry. A ZooKeeper
> quorum
> > > connection string will continue to be a supported part of our supported
> > > client-facing interface until we have a reason to discard it? I'm fine
> > with
> > > this decision. If that's the consensus, we can close HBASE-23324 as
> Won't
> > > Fix.
> > >
> > > Let's see if any other voices join the thread.
> > >
> > > On Fri, Mar 1, 2024 at 10:12 AM 张铎(Duo Zhang) 
> > > wrote:
> > > >
> > > > > For 3.0.0, after moving the replication things out, there is no
> > > > > persistent data on zookeeper now. So it is possible to move off
> > > > > zookeeper now, of course, we still need at least something like
> etcd,
> > > > > as we need an external system to track the living region servers...
> > > > >
> > > > > And I think the registry interface is for connecting to a HBase
> > > > > cluster from outside, it does not need to know the internal
> > > > > implementation of HBase, i.e, whether to make use of zookeeper.
> > > > > For me, I think a possible problem is that we expose the meta
> > location
> > > > > in registry interface, since the splittable meta feature has been
> > > > > restarted, if later we support multiple meta regions in HBase, we
> > will
> > > > > need extra works if we still want to keep the zk based registry...
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Nick Dimiduk  于2024年3月1日周五 16:25写道:
> > > > > >
> > > > > > On Fri, 1 Mar 2024 at 07:47, Istvan Toth
> >  > > >
> > > > > wrote:
> > > > > >
> > > > > > > That's a pretty fundamental change, and would break a lot of
> use
> > > > cases
> > > > > and
> > > > > > > applications that hard-code the assumption of the ZK registry.
> > > > > >
> > > > > >
> > > > > > To the best of my knowledge, the znode structure in ZooKeeper has
> > > never
> > > > > > been a part of our public API. I have no sympathy for systems
> that
> > > > assume
> > > > > > its presence.
> > > > > >
> > > > > > Making a breaking change like removing the previous default
> > > connection
> > > > > > > method in a minor version also feels wrong.
> > > > > > > (It may go against the compatibility policy, though I haven't
> > > > checked)
> > > > > >
> > > > > >
> > > > > > This is a fair argument.
> > > > > >
> > > > > > I think it would be better to deprecate it in 3.0 and remove it
> in
> > > 4.0,
> > > > > or
> > > > > > > at least deprecate it in 2.6 and remove it in 4.0.
> > > > > > > This is how the HBase 2.x API changes were handled, where the
> > > removal
> > > > > of
> > > > > > > the old HBase 1.x APIs were targeted to 3.0.
> > > > > > > The ZK registry code is small, and doesn't cost much to keep in
> > the
> > > > > > > codebase.
> > > > > >
> > > > > >
> > > > > > And in fact, I now realize that something like it will continue
> to
> > > > exist
> > > > > > even after the class is removed from our public API because I
> > suspect
> > > > > that
> > > > > > the HMaster will need to use it in order to bootstrap itself.
> > Still,
> > > it
> > > > > > could be moved into hbase-server and kept as an internal concern.
> > > > > >
> > > > > > So then, should we not deprecate it at all? We let the RPC
> > > > implementation
> > > > > > flip over as default in 3.0, but the ZK implementation sticks
> > around
> > > > into
> > > > > > perpetuity? As far as I know, we have no plan to move off of
> > > ZooKeeper
> > > > > > entirely ; etcd and RAFT are still just talk, right? If there’s
> > > nothing
> > > > > to
> > > > > > motivate its complete removal, I guess there no 

[jira] [Created] (HBASE-28400) WAL readers treat any exception as EOFException, which can lead to data loss

2024-02-23 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28400:
-

 Summary: WAL readers treat any exception as EOFException, which 
can lead to data loss
 Key: HBASE-28400
 URL: https://issues.apache.org/jira/browse/HBASE-28400
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


In HBASE-28390, I found a bug in our WAL compression which manifests as an 
IllegalArgumentException or ArrayIndexOutOfBoundsException. Even worse is that 
ProtobufLogReader.readNext catches any Exception and rethrows it as an 
EOFException. EOFException gets handled in a variety of ways by the readers of 
WALs, and not all of them make sense for an exception that isn't really EOF.

For example, WALInputFormat catches EOFException and returns false for 
nextKeyValue(), effectively skipping the rest of the WAL file but not failing 
the job.

ReplicationSourceWALReader has some much more complicated handling of 
EOFException.
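
Sketching the distinction being argued for here (a hypothetical helper, not the 
actual ProtobufLogReader code): only a genuine end-of-stream should surface as 
EOFException, so corruption fails loudly instead of being silently skipped.
{code:java}
import java.io.EOFException;
import java.io.IOException;

final class WalReadSketch {
  interface WalSource<T> {
    T next() throws IOException;
  }

  // Pass a true end-of-stream through as EOF; wrap anything else (e.g. an
  // IndexOutOfBoundsException from a corrupt compression dictionary) in a
  // distinct IOException so callers cannot mistake corruption for "done".
  static <T> T readNext(WalSource<T> source) throws IOException {
    try {
      return source.next();
    } catch (EOFException eof) {
      throw eof;
    } catch (RuntimeException e) {
      throw new IOException("Corrupt WAL entry, not EOF", e);
    }
  }
}
{code}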



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28390) WAL value compression fails for cells with large values

2024-02-23 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28390.
---
Fix Version/s: 2.6.0
   2.5.8
   3.0.0-beta-2
 Assignee: Bryan Beaudreault
   Resolution: Fixed

Pushed to branch-2.5+. Thanks [~apurtell] for the review

> WAL value compression fails for cells with large values
> ---
>
> Key: HBASE-28390
> URL: https://issues.apache.org/jira/browse/HBASE-28390
> Project: HBase
>  Issue Type: Bug
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.5.8, 3.0.0-beta-2
>
>
> We are testing out WAL compression and noticed that it fails for large values 
> when both features (wal compression and wal value compression) are enabled. 
> It works fine with either feature independently, but not when combined. It 
> seems to fail for all of the value compressor types, and the failure is in 
> the LRUDictionary of wal key compression:
>  
> {code:java}
> java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 
> and read up to 396
>     at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
>  ~[test-classes/:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  ~[junit-4.13.2.jar:4.13.2]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.lang.IndexOutOfBoundsException: index (21) must be less than 
> size (1)
>     at 
> org.apache.hbase.thirdparty.com.g

[jira] [Created] (HBASE-28396) Quota throttling can cause a leak of scanners

2024-02-22 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28396:
-

 Summary: Quota throttling can cause a leak of scanners
 Key: HBASE-28396
 URL: https://issues.apache.org/jira/browse/HBASE-28396
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


In RSRpcServices.scan, we check the quota after having created a new 
RegionScannerHolder. If the quota is exceeded, an exception will be thrown. In 
this case, we can't send the scannerName back to the client because it's just 
an exception. So the client will be forced to retry the openScanner call, but 
the RegionScannerHolder is not closed. Eventually the scanners will be cleaned 
up by the lease expiration, but this could cause many scanners to leak during 
periods of high throttling.

We could close the newly opened scanner before throwing the throttle exception, 
but I think it's better to not open the scanner at all until we've grabbed some 
quota.
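
Sketching the proposed ordering (simplified; checkQuota and newRegionScanner 
stand in for the real RSRpcServices internals):
{code:java}
// Check quota before allocating any server-side state. If the request is
// throttled, RpcThrottlingException propagates and nothing leaks; the
// scanner is only created once quota has actually been granted.
OperationQuota quota = checkQuota(region, scanRequest); // may throw
RegionScannerHolder holder = newRegionScanner(scanRequest, region); // safe now
{code}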



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28390) WAL compression fails for cells with large values when combined with WAL value compression

2024-02-21 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28390:
-

 Summary: WAL compression fails for cells with large values when 
combined with WAL value compression
 Key: HBASE-28390
 URL: https://issues.apache.org/jira/browse/HBASE-28390
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


We are testing out WAL compression and noticed that it fails for large values 
when both features (wal compression and wal value compression) are enabled. It 
works fine with either feature independently, but not when combined. It seems 
to fail for all of the value compressor types, and the failure is in the 
LRUDictionary of wal key compression:

 
{code:java}
java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 and 
read up to 396
    at 
org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
 ~[classes/:?]
    at 
org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
 ~[test-classes/:?]
    at 
org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
 ~[test-classes/:?]
    at 
org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
 ~[test-classes/:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:?]
    at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 ~[?:?]
    at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 ~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 ~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 ~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 ~[junit-4.13.2.jar:4.13.2]
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
 ~[junit-4.13.2.jar:4.13.2]
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
 ~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
 ~[junit-4.13.2.jar:4.13.2]
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
~[junit-4.13.2.jar:4.13.2]
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
~[junit-4.13.2.jar:4.13.2]
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
~[junit-4.13.2.jar:4.13.2]
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
~[junit-4.13.2.jar:4.13.2]
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
 ~[junit-4.13.2.jar:4.13.2]
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
 ~[junit-4.13.2.jar:4.13.2]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.IndexOutOfBoundsException: index (21) must be less than 
size (1)
    at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1371)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
    at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1353)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
    at 
org.apache.hadoop.hbase.io.util.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:153)
 ~[classes/:?]
    at 
org.apache.hadoop.hbase.io.util.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:79)
 ~[classes/:?]
    at 
org.apache.hadoop.hbase.io.util.LRUDictionary.getEntry(LRUDictionary.java:43) 
~[classes/:?]
    at 
org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.readIntoArray(WALCellCodec.java:366)
 ~[classes/:?]
    at 
org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.parseCell(WALCellCodec.java:307)
 ~[classes/:?]
    at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:66) 
~[classes

[jira] [Resolved] (HBASE-28370) Default user quotas are refreshing too frequently

2024-02-19 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28370.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
   Resolution: Fixed

> Default user quotas are refreshing too frequently
> -
>
> Key: HBASE-28370
> URL: https://issues.apache.org/jira/browse/HBASE-28370
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ray Mattingly
>Assignee: Ray Mattingly
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> In [https://github.com/apache/hbase/pull/5666] we introduced default user 
> quotas, but I accidentally called UserQuotaState's default constructor rather 
> than passing in the current timestamp. The consequence is that we're 
> constantly refreshing these default user quotas, and this can be a bottleneck 
> for horizontal cluster scalability.
> This should be a 1 line fix in QuotaUtil's buildDefaultUserQuotaState method.
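
For illustration, the fix amounts to something like this (a sketch; the 
timestamp-taking constructor is assumed from the description):
{code:java}
// In QuotaUtil.buildDefaultUserQuotaState: stamp the state with the current
// time so the refresh chore does not treat it as permanently stale.
return new UserQuotaState(EnvironmentEdgeManager.currentTime());
{code}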



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28376) Column family ns does not exist in region during upgrade to 3.0.0-beta-2

2024-02-17 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28376:
-

 Summary: Column family ns does not exist in region during upgrade 
to 3.0.0-beta-2
 Key: HBASE-28376
 URL: https://issues.apache.org/jira/browse/HBASE-28376
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


Upgrading from 2.5.x to 3.0.0-beta-2, migrateNamespaceTable kicks in to copy 
data from the namespace table to an "ns" family of the meta table. If you don't 
have an "ns" family, the migration fails and the hmaster will crash loop. You 
then can't roll back, because the briefly alive upgraded hmaster created a 
procedure that can't be deserialized by 2.x (I don't have this log handy 
unfortunately). I tried pushing code to create the ns family on startup, but it 
doesn't work because the migration happens while the hmaster is still 
initializing.

So it seems imperative that you create the ns family before upgrading. We 
should handle this more gracefully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28365) ChaosMonkey batch suspend/resume action assume shell implementation

2024-02-13 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28365:
-

 Summary: ChaosMonkey batch suspend/resume action assume shell 
implementation
 Key: HBASE-28365
 URL: https://issues.apache.org/jira/browse/HBASE-28365
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


These two actions have code like this:
{code:java}
case SUSPEND:
  server = serversToBeSuspended.remove();
  try {
suspendRs(server);
  } catch (Shell.ExitCodeException e) {
LOG.warn("Problem suspending but presume successful; code={}", 
e.getExitCode(), e);
  }
  suspendedServers.add(server);
  break; {code}
This only catches that one Shell.ExitCodeException, but operators may have an 
implementation of ClusterManager which does not use shell. We should expand 
this to catch all exceptions.

The implication here is that the uncaught exception propagates, and we don't 
add the server to suspendedServers. If the suspension actually succeeded, this 
leaves some processes in a permanently suspended state until manual 
intervention occurs.
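
A sketch of the widened handling could look like this (log message illustrative):
{code:java}
case SUSPEND:
  server = serversToBeSuspended.remove();
  try {
    suspendRs(server);
  } catch (Exception e) { // widened from Shell.ExitCodeException
    LOG.warn("Problem suspending but presume successful", e);
  }
  // Always record the server so the matching RESUME action can find it.
  suspendedServers.add(server);
  break;
{code}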



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28364) Warn: Cache key had block type null, but was found in L1 cache

2024-02-13 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28364:
-

 Summary: Warn: Cache key had block type null, but was found in L1 
cache
 Key: HBASE-28364
 URL: https://issues.apache.org/jira/browse/HBASE-28364
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


I'm ITBLL testing branch-2.6 and am seeing lots of these warns. This is new to 
me. I would expect a warn to be on the rare side or be indicative of a problem, 
but it's unclear from the code whether this one is.

cc [~wchevreuil] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28363) Noisy exception from FlushRegionProcedure when result is CANNOT_FLUSH

2024-02-13 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28363:
-

 Summary: Noisy exception from FlushRegionProcedure when result is 
CANNOT_FLUSH
 Key: HBASE-28363
 URL: https://issues.apache.org/jira/browse/HBASE-28363
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


Running ITBLL with chaos monkey in HBASE-28233. I noticed lots of exceptions:
{code:java}
[RS_FLUSH_OPERATIONS-regionserver/test-host:60020-1 
{event_type=RS_FLUSH_REGIONS, pid=741536}] ERROR 
org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler: pid=741536
java.io.IOException: Unable to complete flush {ENCODED => 
371d2ba6875913542893642c94634226, NAME => 
'IntegrationTestBigLinkedList,-\x82\xD8-\x82\xD8-\x80,1707761077516.371d2ba6875913542893642c94634226.',
 STARTKEY =
> '-\x82\xD8-\x82\xD8-\x80', ENDKEY => '3330'}
        at 
org.apache.hadoop.hbase.regionserver.FlushRegionCallable.doCall(FlushRegionCallable.java:61)
 ~[hbase-server-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.BaseRSProcedureCallable.call(BaseRSProcedureCallable.java:35)
 ~[hbase-server-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.BaseRSProcedureCallable.call(BaseRSProcedureCallable.java:23)
 ~[hbase-server-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:51)
 ~[hbase-server-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT]
        at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) 
~[hbase-server-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) 
~[?:?]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) 
~[?:?]
        at java.lang.Thread.run(Thread.java:840) ~[?:?] {code}
I took a look at the HRegion.flushcache code, and there are 3 reasons for 
CANNOT_FLUSH. All only print at debug log level and none look like actual 
errors.

I think we shouldn't throw an exception here, or at least should downgrade to 
debug. It looks like a problem, but isn't (I don't think).

cc [~frostruan] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28362) NPE calling bootstrapNodeManager during RegionServer initialization

2024-02-13 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28362:
-

 Summary: NPE calling bootstrapNodeManager during RegionServer 
initialization
 Key: HBASE-28362
 URL: https://issues.apache.org/jira/browse/HBASE-28362
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


Shortly after starting up, if a RegionServer is getting requests from clients 
before it's ready (i.e. it restarts and they haven't cleared meta cache yet), 
it will throw an NPE. This is because netty may bind and start accepting 
requests before HRegionServer.preRegistrationInitialization finishes.

I think this is similar to https://issues.apache.org/jira/browse/HBASE-28088. 
It's not critical because the RS self-resolves within a few seconds, but it 
causes noise in the logs and probably errors for clients.
{code:java}
2024-02-13T18:24:02,537 [RpcServer.default.FPBQ.handler=6,queue=6,port=60020 
{}] ERROR org.apache.hadoop.hbase.ipc.RpcServer: Unexpected throwable object
java.lang.NullPointerException: Cannot invoke 
"org.apache.hadoop.hbase.regionserver.BootstrapNodeManager.getBootstrapNodes()" 
because "this.bootstrapNodeManager" is null
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getBootstrapNodes(HRegionServer.java:4179)
 ~[hbase-server-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.getAllBootstrapNodes(RSRpcServices.java:4140)
 ~[hbase-server-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.BootstrapNodeProtos$BootstrapNodeService$2.callBlockingMethod(BootstrapNodeProtos.java:1259)
 ~[hbase-protocol-shaded-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:438) 
~[hbase-server-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-2.6-hubspot-SNAPSHOT.jar:2.6-hubspot-SNAPSHOT] {code}
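
A minimal sketch of a guard that would turn the NPE into a retryable signal 
(the exception choice here is an assumption, not the committed fix):
{code:java}
// In HRegionServer.getBootstrapNodes: reject early requests with a retryable
// exception instead of an NPE while initialization has not completed.
if (bootstrapNodeManager == null) {
  throw new ServerNotRunningYetException("Server is not running yet");
}
return bootstrapNodeManager.getBootstrapNodes();
{code}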



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28360) [hbase-thirdparty] Upgrade Netty to 4.1.107.Final

2024-02-13 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28360.
---
Fix Version/s: thirdparty-4.1.6
 Assignee: Bryan Beaudreault
   Resolution: Fixed

Thanks [~nihaljain.cs] and [~rajeshbabu] for the review

> [hbase-thirdparty] Upgrade Netty to 4.1.107.Final
> -
>
> Key: HBASE-28360
> URL: https://issues.apache.org/jira/browse/HBASE-28360
> Project: HBase
>  Issue Type: Task
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
> Fix For: thirdparty-4.1.6
>
>
> https://netty.io/news/2024/02/13/4-1-107-Final.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Quick update on 2.6.0

2024-02-13 Thread Bryan Beaudreault
Netty 4.1.107.Final has been released. I pushed
https://github.com/apache/hbase-thirdparty/pull/108 to upgrade. Once that
is merged, I will look into how to do an hbase-thirdparty release (unless
anyone else has bandwidth?)

I also finally got ITBLL/ChaosMonkey working in my company's environment.
Am running ITBLL on 2.6.0-SNAPSHOT now.

On Thu, Jan 25, 2024 at 9:34 PM Bryan Beaudreault 
wrote:

> Also we need a release of hbase-thirdparty first anyway for some CVEs, so
> just trying to avoid having to make another release right afterward for
> this other fix.
>
> On Thu, Jan 25, 2024 at 9:33 PM Bryan Beaudreault 
> wrote:
>
>> No it’s not critical. It’s a small thing related to certificate parsing (
>> https://github.com/netty/netty/issues/13796). We need it for our usage
>> of https://github.com/apache/hbase/pull/5644. This wouldn't need to
>> block things, but I think the timing is good, and it also would be nice to
>> ship the TLS features without any known issues. If there were a more urgent
>> need, we could definitely go without.
>>
>> On Thu, Jan 25, 2024 at 8:11 PM 张铎(Duo Zhang) 
>> wrote:
>>
>>> Sounds good.
>>>
>>> And I think in this way we could also expect the rpc connection
>>> registry related issues to be fixed before tagging 2.6.0RC0.
>>>
>>> BTW, what is the TLS fix in netty? Is it critical?
>>>
>>> Thanks.
>>>
>>> Bryan Beaudreault  于2024年1月25日周四 22:08写道:
>>> >
>>> > We are almost ready. In my spare time I am still doing some testing on
>>> a
>>> > larger cluster. My initial tests uncovered some minor issues, which
>>> have
>>> > been fixed. If there are no other issues, we can move forward.
>>> >
>>> > For now I would like to wait for netty release 4.1.107.Final, which
>>> Norman
>>> > tells me should be in the next 2-3 weeks. The reason is that there is
>>> a fix
>>> > for TLS, and 2.6.0 is our first release with TLS.
>>> >
>>> > So my checklist at this point is:
>>> > 1. Continue testing on a larger cluster
>>> > 2. Resolve fixVersions discrepancies (there are only a few)
>>> > 3. Update hbase-thirdparty to 4.1.107.Final and create a release of
>>> that.
>>> > 4. Update hbase to use that new release of hbase-thirdparty
>>> > 5. Create 2.6.0RC0
>>> >
>>> > Let me know if anyone has issues with this timeline. We don't
>>> absolutely
>>> > _have_ to wait for netty, but it would be nice.
>>>
>>


[jira] [Created] (HBASE-28360) [hbase-thirdparty] Upgrade Netty to 4.1.107.Final

2024-02-13 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28360:
-

 Summary: [hbase-thirdparty] Upgrade Netty to 4.1.107.Final
 Key: HBASE-28360
 URL: https://issues.apache.org/jira/browse/HBASE-28360
 Project: HBase
  Issue Type: Task
Reporter: Bryan Beaudreault


https://netty.io/news/2024/02/13/4-1-107-Final.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28359) Improve quota RateLimiter synchronization

2024-02-13 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28359:
-

 Summary: Improve quota RateLimiter synchronization
 Key: HBASE-28359
 URL: https://issues.apache.org/jira/browse/HBASE-28359
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


We've been experiencing RpcThrottlingException with 0ms waitInterval. This 
seems odd and wasteful, since the client side will immediately retry without 
backoff. I think the problem is related to the synchronization of RateLimiter.

The TimeBasedLimiter checkQuota method does the following:
{code:java}
if (!reqSizeLimiter.canExecute(estimateWriteSize + estimateReadSize)) {
  RpcThrottlingException.throwRequestSizeExceeded(
reqSizeLimiter.waitInterval(estimateWriteSize + estimateReadSize));
} {code}
Both canExecute and waitInterval are synchronized, but we're calling them 
independently. So under high concurrency it's possible for canExecute to return 
false, but for waitInterval to then return 0 (by which point canExecute would 
have returned true).

I think we should simplify the API to have a single synchronized call:
{code:java}
long waitInterval = reqSizeLimiter.tryAcquire(estimateWriteSize + 
estimateReadSize);
if (waitInterval > 0) {
  RpcThrottlingException.throwRequestSizeExceeded(waitInterval);
}{code}
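
A sketch of the combined method (internals simplified; the *Internal helpers 
are placeholders for the existing unsynchronized logic):
{code:java}
// Check-and-consume under one lock: returns 0 when the amount was acquired,
// otherwise the wait interval computed in the same critical section, so the
// two answers can never disagree.
public synchronized long tryAcquire(long amount) {
  if (canExecuteInternal(amount)) {
    consumeInternal(amount);
    return 0;
  }
  return waitIntervalInternal(amount);
}
{code}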



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28352) HTable batch does not honor RpcThrottlingException waitInterval

2024-02-11 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28352.
---
Fix Version/s: 2.6.0
 Assignee: Bryan Beaudreault
   Resolution: Fixed

Pushed to branch-2 and branch-2.6. I did not include in branch-2.5, because it 
seems we did not backport the original waitInterval support there. If we want 
it there, we should also backport HBASE-27798.

Thanks [~zhangduo] for the review!

> HTable batch does not honor RpcThrottlingException waitInterval
> ---
>
> Key: HBASE-28352
> URL: https://issues.apache.org/jira/browse/HBASE-28352
> Project: HBase
>  Issue Type: Bug
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0
>
>
> I noticed that we only honor the waitInterval in 
> RpcRetryingCaller.callWithRetries. But HTable.batch (AsyncProcess) uses 
> custom retry logic. We need to update it to honor the waitInterval



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28358) AsyncProcess inconsistent exception thrown for operation timeout

2024-02-11 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28358:
-

 Summary: AsyncProcess inconsistent exception thrown for operation 
timeout
 Key: HBASE-28358
 URL: https://issues.apache.org/jira/browse/HBASE-28358
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


I'm not sure if I'll get to this, but wanted to log it as a known issue.

AsyncProcess has a design where it breaks the batch into sub-batches based on 
regionserver, then submits a callable per regionserver in a threadpool. In the 
main thread, it calls waitUntilDone() with an operation timeout. If the 
callables don't finish within the operation timeout, a SocketTimeoutException 
is thrown. This exception is not very useful because it doesn't give you any 
sense of how many calls were in progress, on which servers, or why it's delayed.

Recently we've been improving the adherence to operation timeout within the 
callables themselves. The main driver here has been to ensure we don't 
erroneously clear the meta cache for operation timeout related errors. So we've 
added a new OperationTimeoutExceededException, which is thrown from within the 
callables and does not cause a meta cache clear. The added benefit is that if 
these bubble up to the caller, they are wrapped in 
RetriesExhaustedWithDetailsException which includes a lot more info about which 
server and which action is affected. 

Now we've covered most but not all cases where operation timeout is exceeded. 
So when exceeding operation timeout it's possible sometimes to see a 
SocketTimeoutException from waitUntilDone, and sometimes see 
OperationTimeoutExceededException from the callables. It will depend on which 
one fails first. It may be nice to finish the swing here, ensuring that we 
always throw OperationTimeoutExceededException from the callables.

The main remaining case is in the call to locateRegion, which hits meta and 
does not honor the call's operation timeout (instead meta operation timeout). 
Resolving this would require some refactoring of 
ConnectionImplementation.locateRegion to allow passing an operation timeout and 
having that affect the userRegionLock and meta scan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28349) Atomic requests should increment read usage in quotas

2024-02-09 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28349.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
 Release Note: Conditional atomic mutations which involve a 
read-modify-write (increment/append) or check-and-mutate, will now count as 
both a read and write when evaluating quotas. Previously they would just count 
as a write, despite involving a read as well.
   Resolution: Fixed

> Atomic requests should increment read usage in quotas
> -
>
> Key: HBASE-28349
> URL: https://issues.apache.org/jira/browse/HBASE-28349
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ray Mattingly
>Assignee: Ray Mattingly
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> Right now atomic operations are just treated as a single write from the quota 
> perspective. Since an atomic operation also encompasses a read, it would make 
> sense to increment readNum and readSize counts appropriately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28354) RegionSizeCalculator throws NPE when regions are in transition

2024-02-09 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28354:
-

 Summary: RegionSizeCalculator throws NPE when regions are in 
transition
 Key: HBASE-28354
 URL: https://issues.apache.org/jira/browse/HBASE-28354
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


When a region is in transition, it may briefly have a null ServerName in meta. 
The RegionSizeCalculator calls RegionLocator.getAllRegionLocations() and does 
not handle the possibility that a RegionLocation.getServerName() could be null. 
The ServerName is eventually passed into an Admin call, which results in an NPE.

This has come up in other contexts. For example, looking at the 
getAllRegionLocations() impl, we have checks to ensure that we don't make calls 
with null server names. We need to similarly handle the possibility of nulls in 
RegionSizeCalculator.
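
A minimal sketch of the defensive handling (collectRegionSizes is a placeholder 
for the existing size lookup):
{code:java}
for (HRegionLocation location : regionLocator.getAllRegionLocations()) {
  ServerName server = location.getServerName();
  if (server == null) {
    // Region in transition: no assigned server yet, so no size to fetch.
    continue;
  }
  collectRegionSizes(admin, server, location.getRegion());
}
{code}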



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Pre splitting an existing table to avoid hotspotting issues.

2024-02-08 Thread Bryan Beaudreault
Yep, I forgot about that nuance. I agree we can add a splitRegion overload
which takes a byte[][] for multiple split points.
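A minimal sketch of what that could look like (the byte[][] variant is the
proposal under discussion, not a committed API):

{code:java}
// The single-point variant exists today; the byte[][] variant is the proposed
// addition, producing N+1 daughters from N sorted split points in one request.
void splitRegion(byte[] regionName, byte[] splitPoint) throws IOException; // existing
void splitRegion(byte[] regionName, byte[][] splitPoints) throws IOException; // proposed
{code}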

On Thu, Feb 8, 2024 at 8:23 PM Andrew Purtell  wrote:

> Rushabh already covered this but splitting is not complete until the region
> can be split again. This is a very important nuance. The daughter regions
> are online very quickly, as designed, but then background housekeeping
> (compaction) must copy the data before the daughters become splittable.
> Depending on compaction pressure, compaction queue depth, and the settings
> of various tunables, waiting for some split daughters to become ready to
> split again can take many minutes to hours.
>
> So let's say we have replicas of a table at two sites, site A and site B.
> The region boundaries of this table in A and B will be different. Now let's
> also say that table data is stored with a key prefix mapping to every
> unique tenant. When migrating a tenant, data copy will hotspot on the
> region(s) hosting keys with the tenant's prefix. This is fine if there are
> enough regions to absorb the load. We run into trouble when the region
> boundaries in the sub-keyspace of interest are quite different in B versus
> A. We get hotspotting and impact to operations until organic splitting
> eventually mitigates the hotspotting, but this might also require many
> minutes to hours, with noticeable performance degradation in the meantime.
> To avoid that degradation we pace the sender but then the copy may take so
> long as to miss SLA for the migration. To make the data movement performant
> and stay within SLA we want to apply one or more splits or merges so the
> region boundaries B roughly align to A, avoiding hotspotting. This will
> also make shipping this data by bulk load instead efficient too by
> minimizing the amount of HFile splitting necessary to load them at the
> receiver.
>
> So let's say we have some regions that need to be split N ways, where N is
> order of ~10, by that I mean more than 1 and less than 100, in order to
> (roughly) align region boundaries. We think this calls for an enhancement
> to the split request API where the split should produce a requested number
> of daughter-pairs. Today that is always 1 pair. Instead we might want 2, 5,
> 10, conceivably more. And it would be nice if guideposts for multi-way
> splitting can be sent over in byte[][].
>
> On Wed, Feb 7, 2024 at 10:03 AM Bryan Beaudreault  >
> wrote:
>
> > This is the first time I've heard of a region split taking 4 minutes. For
> > us, it's always on the order of seconds. That's true even for a large
> 50+gb
> > region. It might be worth looking into why that's so slow for you.
> >
> > On Wed, Feb 7, 2024 at 12:50 PM Rushabh Shah
> >  wrote:
> >
> > > Thank you Andrew, Bryan and Duo for your responses.
> > >
> > > > My main thought is that a migration like this should use bulk
> loading,
> > > > But also, I think, that data transfer should be in bulk
> > >
> > > We are working on moving to bulk loading.
> > >
> > > > With Admin.splitRegion, you can specify a split point. You can use
> that
> > > to
> > > iteratively add a bunch of regions wherever you need them in the
> > keyspace.
> > > Yes, it's 2 at a time, but it should still be quick enough in the grand
> > > scheme of a large migration.
> > >
> > >
> > > Trying to do some back of the envelope calculations.
> > > In a production environment, it took around 4 minutes to split a
> recently
> > > split region which had 4 store files with a total of 5 GB of data.
> > > Assuming we are migrating 5000 tenants at a time and normally we have
> > > around 10% of the tenants (500 tenants) which have data
> > >  spread across more than 1000 regions. We have around 10 huge tables
> > where
> > > we store the tenant's data for different use cases.
> > > All the above numbers are on the *conservative* side.
> > >
> > > To create a split structure for 1000 regions, we need 10 iterations of
> > the
> > > splits (2^10 = 1024). This assumes we are parallely splitting the
> > regions.
> > > Each split takes around 4 minutes. So to create 1000 regions just for 1
> > > tenant and for 1 table, it takes around 40 minutes.
> > > For 10 tables for 1 tenant, it takes around 400 minutes.
> > >
> > > For 500 tenants, this will take around *140 days*. To reduce this time
> > > further, we can also create a split structure for each tenant and each
> > > table in parallel.
> > > But this would put a lot of pressure on the cluster

[jira] [Created] (HBASE-28352) HTable batch does not honor RpcThrottlingException waitInterval

2024-02-08 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28352:
-

 Summary: HTable batch does not honor RpcThrottlingException 
waitInterval
 Key: HBASE-28352
 URL: https://issues.apache.org/jira/browse/HBASE-28352
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


I noticed that we only honor the waitInterval in 
RpcRetryingCaller.callWithRetries. But HTable.batch (AsyncProcess) uses custom 
retry logic. We need to update it to honor the waitInterval
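
Conceptually the fix looks like this (a sketch against AsyncProcess's backoff 
path, not the committed patch):
{code:java}
// When the server supplies a waitInterval, sleep at least that long instead
// of only the default exponential pause for this retry attempt.
long pause = ConnectionUtils.getPauseTime(basePause, tries);
if (error instanceof RpcThrottlingException) {
  pause = Math.max(pause, ((RpcThrottlingException) error).getWaitInterval());
}
{code}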



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Pre splitting an existing table to avoid hotspotting issues.

2024-02-07 Thread Bryan Beaudreault
This is the first time I've heard of a region split taking 4 minutes. For
us, it's always on the order of seconds. That's true even for a large 50+gb
region. It might be worth looking into why that's so slow for you.

On Wed, Feb 7, 2024 at 12:50 PM Rushabh Shah
 wrote:

> Thank you Andrew, Bryan and Duo for your responses.
>
> > My main thought is that a migration like this should use bulk loading,
> > But also, I think, that data transfer should be in bulk
>
> We are working on moving to bulk loading.
>
> > With Admin.splitRegion, you can specify a split point. You can use that
> to
> iteratively add a bunch of regions wherever you need them in the keyspace.
> Yes, it's 2 at a time, but it should still be quick enough in the grand
> scheme of a large migration.
>
>
> Trying to do some back of the envelope calculations.
> In a production environment, it took around 4 minutes to split a recently
> split region which had 4 store files with a total of 5 GB of data.
> Assuming we are migrating 5000 tenants at a time and normally we have
> around 10% of the tenants (500 tenants) which have data
>  spread across more than 1000 regions. We have around 10 huge tables where
> we store the tenant's data for different use cases.
> All the above numbers are on the *conservative* side.
>
> To create a split structure for 1000 regions, we need 10 iterations of the
> splits (2^10 = 1024). This assumes we are parallely splitting the regions.
> Each split takes around 4 minutes. So to create 1000 regions just for 1
> tenant and for 1 table, it takes around 40 minutes.
> For 10 tables for 1 tenant, it takes around 400 minutes.
>
> For 500 tenants, this will take around *140 days*. To reduce this time
> further, we can also create a split structure for each tenant and each
> table in parallel.
> But this would put a lot of pressure on the cluster and also it will
> require a lot of operational overhead and still we will end up with
>  the whole process taking days, if not months.
>
> Since we are moving our infrastructure to Public Cloud, we anticipate this
> huge migration happening once every month.
>
>
> > Adding a splitRegion method that takes byte[][] for multiple split points
> would be a nice UX improvement, but not
> strictly necessary.
>
> IMHO for all the reasons stated above, I believe this is necessary.
>
>
>
>
>
> On Mon, Jan 29, 2024 at 6:25 AM 张铎(Duo Zhang) 
> wrote:
>
> > As it is called 'pre' split, it means that it can only happen when
> > there is no data in table.
> >
> > If there are already data in the table, you can not always create
> > 'empty' regions, as you do not know whether there are already data in
> > the given range...
> >
> > And technically, if you want to split a HFile into more than 2 parts,
> > you need to design new algorithm as now in HBase we only support top
> > reference and bottom reference...
> >
> > Thanks.
> >
> > Bryan Beaudreault  于2024年1月27日周六 02:16写道:
> > >
> > > My main thought is that a migration like this should use bulk loading,
> > > which should be relatively easy given you already use MR
> > > (HFileOutputFormat2). It doesn't solve the region-splitting problem.
> With
> > > Admin.splitRegion, you can specify a split point. You can use that to
> > > iteratively add a bunch of regions wherever you need them in the
> > keyspace.
> > > Yes, it's 2 at a time, but it should still be quick enough in the grand
> > > scheme of a large migration. Adding a splitRegion method that takes
> > > byte[][] for multiple split points would be a nice UX improvement, but
> > not
> > > strictly necessary.
> > >
> > > On Fri, Jan 26, 2024 at 12:10 PM Rushabh Shah
> > >  wrote:
> > >
> > > > Hi Everyone,
> > > > At my workplace, we use HBase + Phoenix to run our customer
> workloads.
> > Most
> > > > of our phoenix tables are multi-tenant and we store the tenantID as
> the
> > > > leading part of the rowkey. Each tenant belongs to only 1 hbase
> > cluster.
> > > > Due to capacity planning, hardware refresh cycles and most recently
> > move to
> > > > public cloud initiatives, we have to migrate a tenant from one hbase
> > > > cluster (source cluster) to another hbase cluster (target cluster).
> > > > Normally we migrate a lot of tenants (in 10s of thousands) at a time
> > and
> > > > hence we have to copy a huge amount of data (in TBs) from multiple
> > source
> > > > clusters to a single target cluster. We have our internal tool which
> &

[jira] [Resolved] (HBASE-27800) Add support for default user quotas using USER => 'all'

2024-02-07 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-27800.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
 Release Note: Adds a bunch of new configs for default user machine quotas: 
hbase.quota.default.user.machine.read.num, 
hbase.quota.default.user.machine.read.size, 
hbase.quota.default.user.machine.write.num, 
hbase.quota.default.user.machine.write.size, 
hbase.quota.default.user.machine.request.num, 
hbase.quota.default.user.machine.request.size. Setting any these will apply the 
given limit as a default for users which are not explicitly covered by existing 
quotas defined through set_quota, etc.
   Resolution: Fixed
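
For example, default per-user, per-machine read and write limits could be 
configured like this (values illustrative):
{code:java}
// Any user without an explicit set_quota entry inherits these defaults.
conf.setLong("hbase.quota.default.user.machine.read.num", 1000);
conf.setLong("hbase.quota.default.user.machine.write.num", 500);
{code}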

> Add support for default user quotas using USER => 'all' 
> 
>
> Key: HBASE-27800
> URL: https://issues.apache.org/jira/browse/HBASE-27800
> Project: HBase
>  Issue Type: Improvement
>    Reporter: Bryan Beaudreault
>Assignee: Ray Mattingly
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> If someone sets a quota with USER => 'all' (or maybe '*'), treat that as a 
> default quota for each individual user. When a request comes from a user, it 
> will lookup current QuotaState based on username. If one doesn't exist, it 
> will be pre-filled with whatever the 'all' quota was set to. Otherwise, if 
> you then define a quota for a specific user that will override whatever 
> default you have set for that user only.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-28345) Close HBase connection on exit from HBase Shell

2024-02-07 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault reopened HBASE-28345:
---

I don't see this backported to branch-3. Are you sure you cherry-picked 
everywhere?

> Close HBase connection on exit from HBase Shell
> ---
>
> Key: HBASE-28345
> URL: https://issues.apache.org/jira/browse/HBASE-28345
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.4.17
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 2.5.8, 3.0.0-beta-2
>
>
> When using Netty for the ZK client, hbase shell hangs on exit.
> This is caused by the non-daemon Netty threads that ZK creates.
> Whether ZK should create daemon threads for Netty or not is debatable, but 
> explicitly closing the connection in hbase shell on exit fixes the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28348) Multi should return what results it can before rpc timeout

2024-02-07 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28348:
-

 Summary: Multi should return what results it can before rpc timeout
 Key: HBASE-28348
 URL: https://issues.apache.org/jira/browse/HBASE-28348
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


Scans have a nice feature where they try to return a heartbeat with whatever 
results they have accumulated before the rpc timeout expires. They target 
returning within 1/2 the rpc timeout or max scanner time. For scans, the goal is 
to avoid painful scanner timeouts, which force the scan to be restarted due to 
an out-of-sync sequence id.

Multis have a similar problem. A big batch can come in which can't be served in 
the configured timeout. In this case the client side will abandon the request 
when the timeout is exceeded, and resubmit if there are retries/operation 
timeout left. This wastes work since it's likely that some of the results had 
been fetched by the time a timeout occurred.

Multis already can retry immediately when the batch exceeds the max result size 
limit. We can use the same functionality to also return when we've taken more 
than half the rpc timeout.
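
Sketching the proposed behavior (simplified; the real change would hook into 
the existing size-limit early-return path):
{code:java}
// Stop processing a multi's actions once half the rpc timeout has elapsed and
// return the partial results, so the client resubmits only the remainder.
long deadline = startTimeMs + rpcTimeoutMs / 2;
for (Action action : actions) {
  if (EnvironmentEdgeManager.currentTime() > deadline) {
    break; // accumulated results are returned; work done so far is not wasted
  }
  results.add(execute(action));
}
{code}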



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28347) Update ref guide about isolation guarantees for scans

2024-02-06 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28347:
-

 Summary: Update ref guide about isolation guarantees for scans
 Key: HBASE-28347
 URL: https://issues.apache.org/jira/browse/HBASE-28347
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


In the "Consistency of Scans" section of 
[https://hbase.apache.org/acid-semantics.html], there is some confusing and 
outdated information. First, it's hard to realize that it's specifically talking 
about consistency across rows. Second, it's outdated because in modern hbase 
we acquire and maintain a memstore readPt for the lifetime of a scan in a 
region. So we should retain read committed behavior across rows, at least 
within the scope of a region.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27687) Enhance quotas to consume blockBytesScanned rather than response size

2024-02-06 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-27687.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
 Release Note: Read size quotas are now evaluated against block bytes 
scanned for a request, rather than result size. Block bytes scanned is a 
measure of the total size in bytes of all hfile blocks opened to serve a 
request. This results in a much more accurate picture of actual work done by a 
query and is the recommended mode. One can revert to the old behavior by 
setting hbase.quota.use.result.size.bytes to true.
   Resolution: Fixed

> Enhance quotas to consume blockBytesScanned rather than response size
> -
>
> Key: HBASE-27687
> URL: https://issues.apache.org/jira/browse/HBASE-27687
> Project: HBase
>  Issue Type: Improvement
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> As of HBASE-27558 we now apply quota.getReadAvailable() to max block bytes 
> scanned by scans/multis. This issue enhances further so that we can track 
> read size consumed in Quotas based on block bytes scanned rather than 
> response size. In this mode, quotas would end-to-end be based on 
> blockBytesScanned.
> Right now we call quota.addGetResult or addScanResult. This would just be a 
> matter of no-oping those calls, and calling RpcCall.getBlockBytesScanned() in 
> Quota.close() instead.
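
Operators who need the old behavior can flip the escape hatch named in the 
release note, e.g.:
{code:java}
// Revert to evaluating read quotas against response size rather than block
// bytes scanned (hbase-site.xml or programmatic Configuration).
conf.setBoolean("hbase.quota.use.result.size.bytes", true);
{code}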



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28346) Expose checkQuota to Coprocessor Endpoints

2024-02-06 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28346:
-

 Summary: Expose checkQuota to Coprocessor Endpoints
 Key: HBASE-28346
 URL: https://issues.apache.org/jira/browse/HBASE-28346
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


Coprocessor endpoints may do non-trivial amounts of work, yet quotas do not 
throttle them. We can't generically apply quotas to coprocessors because we 
have no information on what a particular endpoint might do. One thing we could 
do is expose checkQuota to the RegionCoprocessorEnvironment. This way, 
coprocessor authors have the tools to ensure that quotas cover their 
implementations.

While adding this, we can update AggregationImplementation to call checkQuota 
since those endpoints can be quite expensive.
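
A hedged sketch of how an endpoint might use such a hook (the environment 
method name is an assumption for illustration, not a committed API):
{code:java}
// Inside a coprocessor endpoint: acquire quota before doing expensive work,
// and close it afterwards so actual usage is reported back to the limiter.
OperationQuota quota = env.checkScanQuota(region, scanRequest);
try {
  // ... run the aggregation over the region ...
} finally {
  quota.close();
}
{code}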



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28343) Write codec class into hfile header/trailer

2024-02-05 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28343:
-

 Summary: Write codec class into hfile header/trailer
 Key: HBASE-28343
 URL: https://issues.apache.org/jira/browse/HBASE-28343
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


We recently started playing around with the new bundled compression libraries 
as of 2.5.0. Specifically, we are experimenting with the different zstd codecs. 
The book says that aircompressor's zstd is not data compatible with Hadoop's, 
but doesn't say the same about zstd-jni.

In our experiments we ended up in a state where some hfiles were encoded with 
zstd-jni (zstd.ZstdCodec) while others were encoded with hadoop 
(ZStandardCodec). At this point the cluster became extremely unstable, with 
some files unable to be read because they were encoded with a codec that didn't 
match the current runtime configuration. Changing the runtime configuration 
caused the other files to not be readable.

I think this problem could be solved by writing the classname of the codec used 
into the hfile. This could be used as a hint so that a regionserver can read 
hfiles compressed with any compression codec that it supports.

[~apurtell] do you have any thoughts here since you brought us all of these 
great compression options?
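
For illustration, the hint could be recorded at write time roughly like this 
(the file-info key name is made up):
{code:java}
// Store the concrete codec class in the HFile's file-info map so a reader can
// pick a compatible decompressor regardless of the current runtime config.
writer.appendFileInfo(Bytes.toBytes("COMPRESSION_CODEC_CLASS"),
  Bytes.toBytes(codec.getClass().getName()));
{code}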



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28216) HDFS erasure coding support for table data dirs

2024-02-05 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-28216.
---
Fix Version/s: 2.6.0
   3.0.0-beta-2
 Release Note: If you use hadoop3, managing the erasure coding policy of a 
table's data directory is now possible with a new table descriptor setting 
ERASURE_CODING_POLICY. The policy you set must be available and enabled in 
hdfs, and hbase will validate that your cluster topology is sufficient to 
support that policy. After setting the policy, you must major compact the table 
for the change to take effect. Attempting to use this feature with hadoop2 will 
fail a validation check prior to making any changes.
   Resolution: Fixed

Thanks [~weichiu], [~nihaljain.cs], and [~zhangduo] for the advice and reviews! 
Merged to 2.6+.

We've been running this in production and it's helping to cut costs on some of 
our clusters.
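
For reference, enabling it looks roughly like this (a sketch; the policy must 
already be enabled in HDFS, and the descriptor key mirrors the release note):
{code:java}
TableDescriptor desc = TableDescriptorBuilder
  .newBuilder(TableName.valueOf("usertable"))
  .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
  .setValue("ERASURE_CODING_POLICY", "RS-6-3-1024k")
  .build();
admin.modifyTable(desc);
// Major compact so existing hfiles are rewritten under the new policy.
admin.majorCompact(desc.getTableName());
{code}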

> HDFS erasure coding support for table data dirs
> ---
>
> Key: HBASE-28216
> URL: https://issues.apache.org/jira/browse/HBASE-28216
> Project: HBase
>  Issue Type: New Feature
>    Reporter: Bryan Beaudreault
>    Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: patch-available, pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> [Erasure 
> coding|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html]
>  (EC) is a hadoop-3 feature which can drastically reduce storage 
> requirements, at the expense of locality. At my company we have a few hbase 
> clusters which are extremely data dense and take mostly write traffic, fewer 
> reads (cold data). We'd like to reduce the cost of these clusters, and EC is 
> a great way to do that since it can reduce replication related storage costs 
> by 50%.
> It's possible to enable EC policies on sub directories of HDFS. One can 
> manually set this with {{hdfs ec -setPolicy -path 
> /hbase/data/default/usertable -policy ...}}. This can work without any 
> hbase support.
> One problem with that is a lack of visibility by operators into which tables 
> might have EC enabled. I think this is where HBase can help. Here's my 
> proposal:
>  * Add a new TableDescriptor and ColumnDescriptor field ERASURE_CODING_POLICY
>  * In ModifyTableProcedure preflightChecks, if ERASURE_CODING_POLICY is set, 
> verify that the requested policy is available and enabled via 
> DistributedFileSystem.
> getErasureCodingPolicies().
>  * During ModifyTableProcedure, add a new state for 
> MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY.
>  ** When adding or changing a policy, use DistributedFileSystem.
> setErasureCodingPolicy to sync it for the data and archive dir of that table 
> (or column in table)
>  ** When removing the property or setting it to empty, use 
> DistributedFileSystem.
> unsetErasureCodingPolicy to remove it from the data and archive dir.
> Since this new API is in hadoop-3 only, we'll need to add a reflection 
> wrapper class for managing the calls and verifying that the API is available. 
> We'll similarly do that API check in preflightChecks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

