Re: Please add me to Jira

2022-07-01 Thread Josh Elser

You're now a contributor :)

On 6/27/22 10:19 AM, Luca Kovács wrote:

Hello,

My name is Luca, and I would like to contribute to the Apache HBase project.
Please add me to the HBase Jira project.

My username is: lkovacs

Many thanks,
Luca



[jira] [Resolved] (HBASE-20951) Ratis LogService backed WALs

2022-06-13 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-20951.

Resolution: Later

> Ratis LogService backed WALs
> ----------------------------
>
> Key: HBASE-20951
> URL: https://issues.apache.org/jira/browse/HBASE-20951
> Project: HBase
>  Issue Type: New Feature
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
>
> Umbrella issue for the Ratis+WAL work:
> Design doc: 
> [https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#|https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit]
> The (over-simplified) goal is to re-think the current WAL APIs, 
> ensure that they are de-coupled from the notion of being backed by HDFS, swap 
> the current implementations over to the new API, and then wire up the Ratis 
> LogService to the new WAL API.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HBASE-27042) hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut

2022-05-23 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-27042.

Hadoop Flags: Reviewed
Release Note: Adds support for Apache Hadoop 3.3.3 and removes S3Guard 
vestiges.
  Resolution: Fixed

Thanks Steve!

> hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut
> ------------------------------------------------------------------------
>
> Key: HBASE-27042
> URL: https://issues.apache.org/jira/browse/HBASE-27042
> Project: HBase
>  Issue Type: Bug
>  Components: hboss
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: hbase-filesystem-1.0.0-alpha2
>
>
> HBoss doesn't compile against Hadoop builds containing HADOOP-17409 ("remove 
> s3guard"), because the test setup tries to turn S3Guard off.
> There's no need for S3Guard any more, so HBoss can simply avoid all those 
> settings and expect it to be disabled (Hadoop 3.3.3 or earlier) or removed 
> (3.4+).
> (The HBoss version is 1.0.0-alpha2-SNAPSHOT.)
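A minimal, hypothetical sketch of the fix direction (the class name is made up; the key and value shown in comments are the real pre-HADOOP-17409 S3A names, kept in comments because the point is to stop setting them):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class HbossTestSetupSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Previously, test setup forced S3Guard off with something like:
    //   conf.set("fs.s3a.metadatastore.impl",
    //       "org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore");
    // After HADOOP-17409 that is unnecessary: leave the settings alone and
    // rely on S3Guard being disabled (<= 3.3.3) or removed entirely (3.4+).
    System.out.println(conf.get("fs.s3a.metadatastore.impl")); // null
  }
}
{code}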



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-27044) Serialized procedures which point to users from other Kerberos domains can prevent master startup

2022-05-16 Thread Josh Elser (Jira)
Josh Elser created HBASE-27044:
--

 Summary: Serialized procedures which point to users from other 
Kerberos domains can prevent master startup
 Key: HBASE-27044
 URL: https://issues.apache.org/jira/browse/HBASE-27044
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Reporter: Josh Elser


We ran into an interesting bug when test teams were running HBase against cloud 
storage without ensuring that the previous location was cleaned. This resulted 
in an hbase.rootdir that had:
 * A valid HBase MasterData Region
 * A valid hbase:meta
 * A valid collection of HBase tables
 * An empty ZooKeeper

Through the changes that we worked on previously, those described in 
HBASE-24286, we were able to get everything _except_ the Procedures back online 
without issue. Parsing the existing procedures produced an interesting error:
{noformat}
java.lang.IllegalArgumentException: Illegal principal name 
hbase/wrong-hostname.domain@WRONG_REALM: 
org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No 
rules applied to hbase/wrong-hostname.domain@WRONG_REALM
at org.apache.hadoop.security.User.<init>(User.java:51)
at org.apache.hadoop.security.User.<init>(User.java:43)
at 
org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1418)
at 
org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1402)
at 
org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.toUserInfo(MasterProcedureUtil.java:60)
at 
org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.deserializeStateData(ModifyTableProcedure.java:262)
at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
at 
org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
at 
org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:411)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:78)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.load(ProcedureExecutor.java:339)
at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:285)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:330)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:600)
at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1581)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:835)
at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2205)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:514)
at java.lang.Thread.run(Thread.java:750) {noformat}
What's actually happening is that we are storing the {{User}} into the 
procedure and then relying on UserGroupInformation to parse the {{User}} 
protobuf into a UGI to get the "short" username.

When the serialized procedure (whether in the MasterData region or via PV2 
WAL files, I think) gets loaded, we end up needing Hadoop auth_to_local 
configuration to be able to parse that Kerberos principal back to a name. 
However, Hadoop's KerberosName will only unwrap Kerberos principals which match 
the local Kerberos realm (defined by the krb5.conf's default_realm, 
[ref|https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/classes/sun/security/krb5/Config.java#L978-L983]).

The interesting part is that we don't seem to ever use the user for anything 
_other_ than displaying the {{owner}} attribute for procedures on the HBase UI. 
There is a method in hbase-procedure which can filter procedures based on 
Owner, but I didn't see any usages of that method.

Given the pushback against HBASE-24286, I assume that, for the same reasons, we 
would see pushback against fixing this issue. However, I wanted to call it out 
for posterity. The expectation of users is that HBase _should_ implicitly 
handle this case.
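For illustration, a minimal standalone sketch of the failing step, using Hadoop's real KerberosName class (the class name and the single DEFAULT rule are assumptions for the example):

{code:java}
import org.apache.hadoop.security.authentication.util.KerberosName;

public class ShortNameCheck {
  public static void main(String[] args) throws Exception {
    // With only the DEFAULT rule, a principal is shortened only when its
    // realm matches the local default_realm from krb5.conf.
    KerberosName.setRules("DEFAULT");
    KerberosName kn =
        new KerberosName("hbase/wrong-hostname.domain@WRONG_REALM");
    // For a foreign realm this throws KerberosName.NoMatchingRule, the same
    // failure surfaced during procedure deserialization in the stack trace.
    System.out.println(kn.getShortName());
  }
}
{code}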



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [ANNOUNCE] New HBase committer Bryan Beaudreault

2022-04-26 Thread Josh Elser

Congrats Bryan!

On 4/9/22 7:44 AM, 张铎(Duo Zhang) wrote:

On behalf of the Apache HBase PMC, I am pleased to announce that Bryan
Beaudreault(bbeaudreault) has accepted the PMC's invitation to become a
committer on the project. We appreciate all of Bryan's generous
contributions thus far and look forward to his continued involvement.

Congratulations and welcome, Bryan Beaudreault!

On behalf of the Apache HBase PMC, I am pleased to announce that Bryan
Beaudreault has accepted our invitation to become a committer on the Apache
HBase project. Thank you, Bryan Beaudreault, for your contributions to the
HBase project so far; we look forward to him taking on more responsibilities
in the future.

Welcome, Bryan Beaudreault!


[jira] [Resolved] (HBASE-26588) Implement a migration tool to help users migrate SFT implementation for a large set of tables

2022-04-04 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26588.

Resolution: Later

Closing since we have HBASE-26673. Can re-open this if we have a reason that 
HBASE-26673 is insufficient.

> Implement a migration tool to help users migrate SFT implementation for a 
> large set of tables
> --------------------------------------------------------------------------
>
> Key: HBASE-26588
> URL: https://issues.apache.org/jira/browse/HBASE-26588
> Project: HBase
>  Issue Type: Sub-task
>  Components: tooling
>Reporter: Duo Zhang
>Priority: Major
>
> It will be very useful for our users who deploy HBase on S3-like systems.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: A tweak to our checkstyle configuration

2022-03-20 Thread Josh Elser

Going through my inbox...

1. Great to have tooling which can validate (and fix) code which is not 
currently to style.
2. I would prefer a style guide (such as Google's Java Style) which is 
"generally accepted" by the Java industry at large and we can use as-is. 
However, I don't feel strongly on this.
3. I have no objections to Nick's original ask to allow one-line if 
blocks on the same line of code with brackets.
4. I prefer brackets for one line if/else blocks over no-brackets for 
the same (as Andrew indicates about avoiding dangling if-else blocks), 
but would not -1 a change if the majority felt otherwise.


On 1/16/22 3:01 AM, 张铎(Duo Zhang) wrote:

On enforcing the coding standards, I've filed HBASE-26617, to introduce the
spotless plugin to HBase.

We can add 'mvn spotless:check' to our pre-commit checks, so we can
enforce the coding standards.

And 'mvn spotless:apply' will format everything for you.

Andrew Purtell wrote on Sun, Jan 16, 2022 at 07:39:


There are a handful of anti patterns to avoid, like dangling if-elses.
(Always use braces around code blocks!) Otherwise we have been following
the Java basic guidelines with modifications for indent width and maximum
line length and I see no pressing reason why this needs to change. Happy
with the status quo. That said, I see no reason to reject Nick's small
proposed changes. We definitely don’t need to adopt a totally different
style guide in response to a modest proposal. This seems out of proportion
to the ask.

If we are going to change checkstyle rules it would be necessary for the
proposer to provide a linter for the rest of us to use as well as a Yetus
precommit phase that implements the checks. Otherwise it would be a half
completed proposal and worse than making no changes. Please also provide
HOWTOs for configuring the IDEA and Eclipse IDEs.


On Jan 15, 2022, at 1:07 AM, 张铎  wrote:

What about just switching to use google java style?

Nick Dimiduk wrote on Thu, Jan 13, 2022 at 03:22:


Hey all.

Discussion on the PR has resulted in an impasse of opinion, but also
renewed interest in improvements to static analysis in general
(HBASE-26617).

I think that this kind of code hygiene is very important for the long-term
maintenance of a large project like ours, and especially one that accepts
contributions from a broad audience. I would really appreciate it if some
more folks would chime into these discussions on PRs, or bring your
concerns back up to this thread. I'm game to help see the work done, but we
need more voices to participate in defining what is required by the
community.

Thanks in advance,
Nick


On Thu, Dec 9, 2021 at 3:58 PM Nick Dimiduk wrote:


Heya,

I have posted a small change to our checkstyle configuration on
HBASE-26536. This change will relax the whitespace rules regarding the
left-curly-bracket ('{') character. Specifically, I intend this change to
allow short expressions that include a nested scope that fits entirely on
one line. The example I provide is:

if (foo == null) { return null; }

This whitespace style is already present (though I think not in popular
usage) within the codebase. Please take a look and let me know if you have
any concerns about making this change.

Thanks,
Nick

https://issues.apache.org/jira/browse/HBASE-26536
https://github.com/apache/hbase/pull/3913


Re: [DISCUSS] operator tools, HBase 3 and StoreFileTracking

2022-03-01 Thread Josh Elser
I tend to lean towards what Andrew is saying here, but I will also admit 
that this is in part from not having a good user-experience about 
getting up an HMaster in maintenance mode to do surgical stuff (feels 
like two steps instead of just one).


Naively, rebuilding the SFT meta files from the filesystem doesn't 
require the HMaster to be up because there isn't any other "state" to 
consider (which was a big reason behind pushing the work that hbck2 was 
doing into the active master to avoid split-brain).


Is doing logic in HBCK2 that doesn't talk to the HMaster a -1 from you, 
Duo? Similarly, is a utility in hbase-operator-tools (not a part of the 
hbck2 wrapper command) also a -1?


Either is feasible, but I do think trying to build this SFT 
rebuilding/recovery into a maintenance-mode HMaster will be more work.


On 2/21/22 12:27 PM, Andrew Purtell wrote:

There are some recovery cases where the cluster cannot be expected to be up
and running. What happens if we have no tooling for those? The user has a
dead cluster. So I don't think a requirement that the cluster always be up
and running is sufficient. For this type of recovery, operator-tools must
be able to parse and write on-disk formats. On the other hand, hopefully the
cases for which that is not true are rare. In HBase 1, we had
OfflineMetaRebuild. In my operations it has occasionally been necessary,
especially in test environments where users are not always clueful, and it has
shortened incident time from many hours to less than one hour. The
alternative would have been a rebuild from scratch with total data loss,
which is a totally unsatisfying user experience.


On Sun, Feb 20, 2022 at 4:29 AM 张铎(Duo Zhang)  wrote:


Sorry a bit late...

IIRC, the design of HBCK2 is that most of the actual fix logic should be
done inside hbase (usually as a procedure), and the hbase-operator-tools is
just a facade for calling these methods. It will query the cluster to find
out which features are supported. So in general, the design here is to
always have the cluster up when fixing. We have a maintenance mode where we
will just bring up the HMaster and make the meta table online, without
loading any other regions.

So I prefer we just use snapshot dependencies of hbase in HBCK2. It is not
a big deal for end users since, if we have not made a release yet, the new
fix options can never actually be used against a production cluster.

Anyway, this means we need to publish nightly builds then.

Thanks.

Peter Somogyi wrote on Fri, Feb 18, 2022 at 06:40:


Makes sense. Thanks Andrew for clarifying!

On Thu, Feb 17, 2022, 21:28 Andrew Purtell  wrote:


> On Thu, Feb 17, 2022 at 12:19 PM Peter Somogyi wrote:


I like the idea of including the store file tracking in 2.5.0 to unblock
the HBCK development efforts.

Unfortunately, I was not following its development that much. Can it cause
any issues if 2.5.0 has the feature but later an incompatible change is
needed for SFT? Can it be marked as a beta feature where we are free to
modify interfaces?



Yes, this is what I meant when I suggested we could mark it as
'experimental'. We have done this in the past. The word 'experimental' is
prominently included adjacent to any discussion of the feature in
documentation and release notes. When we feel for sure it is stable, that
word is removed. We can do something different this time of course, but that
has been our past practice when introducing new functionality into
releasing code lines. And I presume we would use the Evolving interface
annotation everywhere.

Peter


On Tue, Feb 15, 2022 at 11:07 PM Andrew Purtell <andrew.purt...@gmail.com>
wrote:


Another option which I do not see mentioned yet is to extract the relevant
common proto and source files from the ‘hbase’ repository into a new
repository (‘hbase-storage’?), from which we would release artifacts to be
consumed by both hbase and hbase-operator-tools. This maintains D.R.Y.
through refactoring, although it may down the road cause some complexity in
coordinating evolution among the three (if not more) repositories and
releases produced from them. This is like Josh’s Option 1 but without
duplication.

Regarding the option 2 issue… If it would help we can drop SFT into
branch-2.5 along with the log4j2 changes and release 2.5.0 afterward. We
are taking the opportunity of this minor increment to accelerate log4j1
retirement, which is why it’s still waiting (but not for long). We can use
the same opportunity to release SFT even if we designate it as an
experimental feature, if that would simplify some other logistics. For what
it’s worth.


On Feb 15, 2022, at 7:44 AM, Josh Elser wrote:

I was talking with Szabolcs prior to him sending this one, and it's a
tricky issue for sure.

To date, we've solved any HBase API issues by copying code into HBCK2,
e.g. HBCKMetaTableAccessor which copies parts of MetaTableAccessor, or we
push the logic down server-side to 

[jira] [Resolved] (HBASE-26767) Rest server should not use a large Header Cache.

2022-02-23 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26767.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed! Thanks for the great work, Sergey.

> Rest server should not use a large Header Cache.
> ------------------------------------------------
>
> Key: HBASE-26767
> URL: https://issues.apache.org/jira/browse/HBASE-26767
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 2.4.9
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3, 2.4.10
>
>
> In the RESTServer we set the HeaderCache size to DEFAULT_HTTP_MAX_HEADER_SIZE 
> (65536). That's not compatible with jetty-9.4.x because the cache size is 
> limited to Character.MAX_VALUE - 1 (65534) there. According to the Jetty 
> source code comments, it's possible to have a buffer overflow in the cache 
> for higher values, and that might lead to wrong/incomplete values returned by 
> the cache and subsequent incorrect header handling.
> There are a couple of ways to fix it:
> 1. change the value of DEFAULT_HTTP_MAX_HEADER_SIZE to 65534
> 2. make the header cache size configurable and set its size separately from 
> the header size.
> I believe that the second would give us more flexibility.
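A minimal sketch of the distinction in option 2, using real jetty-9.4 HttpConfiguration setters (the class name and usage are illustrative, not HBase's actual REST server code):

{code:java}
import org.eclipse.jetty.server.HttpConfiguration;

public class RestHeaderCacheSketch {
  public static void main(String[] args) {
    HttpConfiguration httpConfig = new HttpConfiguration();
    // Maximum size of a request header block the server will accept.
    httpConfig.setRequestHeaderSize(65536);
    // Cache of previously seen header fields; jetty-9.4 caps usable values
    // at Character.MAX_VALUE - 1 (65534), so it is sized independently.
    httpConfig.setHeaderCacheSize(65534);
    System.out.println(httpConfig.getHeaderCacheSize());
  }
}
{code}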



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] deprecating jdk8, progress on LTS jdk support

2022-02-17 Thread Josh Elser

On 2/16/22 12:24 AM, Sean Busbey wrote:

Regarding the original question, I would be in favor of the proposal. Time
marches on. I assume just to state the obvious that our destination of
minimum LTS would shift from 8 to 11.


Yes, sorry I should have expressly stated JDK11 would become the minimum
with some release after HBase 3.

I got here because I wanted to start working on qualifying JDK17 as a
runtime environment but then realized we were putting more caveats on JDK11
than I expected.

Hadoop 2 isn’t exactly dead, at least the source branch is still receiving

occasional update, but is not releasing. We should probably consider it
effectively EOL.


IIRC we've already dropped Hadoop 2 support for HBase 3.


Correct.


The Hadoop minimum could become 3.3. The primary consideration to my mind
is the state of S3A: in what version it can be said to be stable and
feature complete. I think 3.3 is the appropriate code line for that
criterion, but perhaps 3.2 could serve as well.


I really like this as a criterion. Anyone else have an idea on this?


I believe we've been benefiting from S3A changes from Hadoop 3.3 at 
Cloudera already. However, I believe that we'll actually see more 
"pains" once we get the storefile tracking feature solid (whereas today, 
transient/perf problems we might face in S3A are hidden by the fact 
that we're doubling our I/O costs on compaction, memstore flushes, etc.).


I have not been following super-closely, but let me see if I can bring 
this in front of Steve or someone else from Cloudera to chime in.


Re: [DISCUSS] deprecating jdk8, progress on LTS jdk support

2022-02-15 Thread Josh Elser

Deprecating jdk8 for HBase 3 and requiring minJdk=11 seems reasonable to me.

Gotta start pushing the issue somehow.

On 2/15/22 1:47 PM, Sean Busbey wrote:

Hi folks!

It's been some time since we decided to stick to LTS JDK releases as a way
of getting a handle on the JDK treadmill.

What do folks think about deprecating JDK8? The openjdk8u project is still
going and there are commercial support options at least through 2030.

Deprecating it in HBase 3 would mean we could remove it in HBase 4, not
that we would _have_ to remove it. The way I think about likely timing of
these events goes like this:

* HBase 2 started alphas in June 2017, betas in January 2018, and came out
in April 2018
* HBase 3 started alphas in July 2021, and as of Feb 2022 we haven't
discussed how close we are to our stated beta goals (upgrades from active
2.x releases and removal of not-ready features).

Given the above, in the absence of us specifically pushing to roll through
major version numbers for some reason, I think a reasonably conservative
estimate is for HBase 3 to arrive in late 2022 or early 2023 and then HBase
4 to start alphas in ~2025. An HBase 5 prior to 2030 seems unlikely.

That all said, our current reference guide section on java versions does
not sound very confident about JDK11 support.


A Note on JDK11 *
Preliminary support for JDK11 is introduced with HBase 2.3.0. This
support is limited to compilation and running the full test suite. There
are open questions regarding the runtime compatibility of JDK11 with
Apache ZooKeeper and Apache Hadoop (HADOOP-15338). Significantly, neither
project has yet released a version with explicit runtime support for
JDK11. The remaining known issues in HBase are catalogued in HBASE-22972.



Since that blurb was written, Hadoop has added JDK11 support [1] as has
ZooKeeper[2]. As a part of buttoning up our JDK11 support we could update
our minimum supported versions of these projects to match that support.

What do folks think?

[1]: https://hadoop.apache.org/docs/r3.3.0/index.html
[2]:
https://zookeeper.apache.org/doc/r3.6.0/zookeeperAdmin.html#sc_systemReq



Re: [DISCUSS] operator tools, HBase 3 and StoreFileTracking

2022-02-15 Thread Josh Elser
I was talking with Szabolcs prior to him sending this one, and it's a 
tricky issue for sure.


To date, we've solved any HBase API issues by copying code into HBCK2 
e.g. HBCKMetaTableAccessor which copies parts of MetaTableAccessor, or 
we push the logic down server-side to the HBase Master and invoke it 
over the Hbck RPC interface.


I definitely want to avoid HBase version specific builds of the 
operator-tools, so that is not an option in my mind for 2.x. The 
discussions we had (that I remember) around HBCK2 were limited in scope 
to HBase 2.x.


Option 1: we copy the necessary proto files from HBase into the 
operator-tools and try to remember that, if we make any change to the 
serialization of the storefile list files, we have to copy that change 
to HBCK2. Brittle on the surface but effective.


Option 2: We bump HBCK2 to hbase-2.6.0-SNAPSHOT. Problematic until we 
make an HBase 2.6.0[-alpha] release. We should already have wire compat 
between all of HBase 2.x which makes that a non-issue.


Option 3: We create an HBCK3 targeted for HBase 3.x. I'm not convinced 
we need to do that (hbck for hbase 3.x would be just like hbck for hbase 
2.x). This would also not solve the problem for the SFT feature in hbase 
2.6.


I think option 3 is a no-go. I am leaning towards option 1 at this 
point. Hopefully my thought process is helpful for others to weigh in.



On 2/14/22 11:31 AM, Szabolcs Bukros wrote:

Hi Folks!

While working on adding tools to handle potential FileBased
StoreFileTracker issues to HBCK2 (HBASE-26624
) I ran into multiple
problems I'm unsure how to solve.

First of all the tools would rely on files not yet available in any of the
released hbase artifacts. I tried to solve this without changing the hbase
dependency version to keep HBCK2 as hbase version independent as possible,
but none of the solutions I have found looked acceptable:
  - Pushing the logic to the hbase side is (as far as I can tell) not
feasible, because it has to be able to repair meta, which is easier when
hbase is down, and the tool should be able to run without a working hbase.
  - The files tracking the store content are serialized proto objects, so
while replicating those files in the operator tools is possible, it would
not be pretty.

Bumping operator tools to use hbase 2.6.0-SNAPSHOT (branch-2 has the SFT
changes) would mean that now we need that or a newer version to build the
project and a version check to avoid runtime problems with the new tools,
but otherwise this looks rather painless and backwards compatible. I know
operator tools tries to avoid having a hbase-specific release, but having
2.6 as a min version to build against might be acceptable.

While looking into this I also checked what needs to be done to make
operator tools work with hbase 3.0.0-alpha-3-SNAPSHOT. Most of the changes
are backwards compatible but not all of them and the ones that aren't would
make a big chunk of Fsck unusable with older hbases. For me that looks
acceptable since this is a major version change, but that would mean I can
not rely on a potential HBCK3 to fix SFT issues, I would also need a
solution for HBCK2.

I tried to look for plans/direction regarding the new 1.3 operator tools
but could not find any.

Do you think it would be possible to bump the hbase version it uses to
2.6.0-SNAPSHOT?
Do you think it would make sense to start working on a hbase3 compatible
branch or is it too early?

NOTE:
I'm aware hbase has not published SNAPSHOT builds for years, but I do not
know how the internal build system works and whether these artifacts would be
available for internal builds or not. I also do not know whether, if
necessary, they could be made available.



[jira] [Resolved] (HBASE-26644) Spurious compaction failures with file tracker

2022-02-10 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26644.

Resolution: Not A Problem

Yep, all good. I believe you fixed this in HBASE-26675.

> Spurious compaction failures with file tracker
> ----------------------------------------------
>
> Key: HBASE-26644
> URL: https://issues.apache.org/jira/browse/HBASE-26644
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> Noticed, when running a basic {{hbase pe randomWrite}}, that compactions 
> fail at various points.
> One example:
> {noformat}
> 2022-01-03 17:41:18,319 ERROR 
> [regionserver/localhost:16020-shortCompactions-0] 
> regionserver.CompactSplit(670): Compaction failed 
> region=TestTable,0004054490,1641249249856.2dc7251c6eceb660b9c7bb0b587db913.,
>  storeName=2dc7251c6eceb660b9c7bb0b587db913/info0,       priority=6, 
> startTime=1641249666161
> java.io.IOException: Root-level entries already added in single-level mode
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeSingleLevelIndex(HFileBlockIndex.java:1136)
>   at 
> org.apache.hadoop.hbase.io.hfile.CompoundBloomFilterWriter$MetaWriter.write(CompoundBloomFilterWriter.java:279)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl$1.writeToBlock(HFileWriterImpl.java:713)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeBlock(HFileBlock.java:1205)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:660)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377)
>   at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.commitWriter(DefaultCompactor.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
>   at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
>   at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1141)
>   at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2388)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:654)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:697)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)  {noformat}
> This isn't a super-critical issue because compactions will be retried 
> automatically and they appear to eventually succeed. However, when the max 
> storefiles limit is reached, this does cause ingest to hang (as happened 
> with my modest configuration).
> We had seen a similar kind of problem in our testing when backporting to 
> HBase 2.4 (not upstream, as the decision was not to do this), which we 
> eventually tracked down to a bad merge-conflict resolution in the new HFile 
> Cleaner. However, the initial investigation doesn't show the same exact 
> problem.
> It seems that we have some kind of generic race condition. Would be good to 
> add more logging to catch this in the future (since we have two separate 
> instances of this category of bug already).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26655) Initial commit with basic functionality and example code

2022-01-20 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26655.

Hadoop Flags: Reviewed
  Resolution: Fixed

> Initial commit with basic functionality and example code
> --------------------------------------------------------
>
> Key: HBASE-26655
> URL: https://issues.apache.org/jira/browse/HBASE-26655
> Project: HBase
>  Issue Type: Sub-task
>  Components: security
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: HBASE-26553
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26687) Account for HBASE-24500 in regionInfoMismatch tool

2022-01-19 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26687.

Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the speedy review, Peter!

> Account for HBASE-24500 in regionInfoMismatch tool
> --------------------------------------------------
>
> Key: HBASE-26687
> URL: https://issues.apache.org/jira/browse/HBASE-26687
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: hbase-operator-tools-1.3.0
>
>
> Had a coworker try to use the RegionInfoMismatch tool I added in HBASE-26656. 
> Curiously, the tool failed on the sanity check I added.
> {noformat}
> Aborting: sanity-check failed on updated RegionInfo. Expected encoded region 
> name 736ee6186975de6967cd9e9e242423f0 but got 
> 323748c77dde5b05982df0285b013232.
> Incorrectly created RegionInfo was: {ENCODED => 
> 323748c77dde5b05982df0285b013232, NAME => 
> 'test4,,1642405560420_0002.323748c77dde5b05982df0285b013232.', STARTKEY => 
> '', ENDKEY => ''}
> {noformat}
> I couldn't understand why the tool wasn't working until I hooked up a 
> debugger and realized that the problem wasn't in my code :). The version of 
> HBase on the system did not have the fix from HBASE-24500 included, which 
> meant that I was hitting the same "strange behavior", as Duo put it, in the 
> RegionInfoBuilder "copy constructor".
> While the versions of HBase which do not have this fix are EOL in terms of 
> Apache releases, we can easily work around this in operator-tools (which may 
> be used by any hbase 2.x release still in the wild).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26687) Account for HBASE-24500 in regionInfoMismatch tool

2022-01-19 Thread Josh Elser (Jira)
Josh Elser created HBASE-26687:
--

 Summary: Account for HBASE-24500 in regionInfoMismatch tool
 Key: HBASE-26687
 URL: https://issues.apache.org/jira/browse/HBASE-26687
 Project: HBase
  Issue Type: Bug
  Components: hbck2
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: hbase-operator-tools-1.3.0


Had a coworker try to use the RegionInfoMismatch tool I added in HBASE-26656. 
Curiously, the tool failed on the sanity check I added.
{noformat}
Aborting: sanity-check failed on updated RegionInfo. Expected encoded region 
name 736ee6186975de6967cd9e9e242423f0 but got 323748c77dde5b05982df0285b013232.
Incorrectly created RegionInfo was: {ENCODED => 
323748c77dde5b05982df0285b013232, NAME => 
'test4,,1642405560420_0002.323748c77dde5b05982df0285b013232.', STARTKEY => '', 
ENDKEY => ''}

{noformat}
I couldn't understand why the tool wasn't working until I hooked up a debugger 
and realized that the problem wasn't in my code :). The version of HBase on the 
system did not have the fix from HBASE-24500 included, which meant that I was 
hitting the same "strange behavior", as Duo put it, in the RegionInfoBuilder 
"copy constructor".

While the versions of HBase which do not have this fix are EOL in terms of 
Apache releases, we can easily work around this in operator-tools (which may be 
used by any hbase 2.x release still in the wild).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26669) Add JWT section to HBase book

2022-01-13 Thread Josh Elser (Jira)
Josh Elser created HBASE-26669:
--

 Summary: Add JWT section to HBase book
 Key: HBASE-26669
 URL: https://issues.apache.org/jira/browse/HBASE-26669
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Reporter: Josh Elser
 Fix For: HBASE-26553


Add a chapter to the hbase book about JWT authentication and everything that 
users and admins need to know.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26668) Define user experience for JWT renewal

2022-01-13 Thread Josh Elser (Jira)
Josh Elser created HBASE-26668:
--

 Summary: Define user experience for JWT renewal
 Key: HBASE-26668
 URL: https://issues.apache.org/jira/browse/HBASE-26668
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser
 Fix For: HBASE-26553


We need to define what our level of support will be for an HBase application 
which must run longer than the lifetime of a JWT token.

The OAuth 2.0 RFCs mention different kinds of tokens; notably, a Refresh token 
may be helpful: [https://datatracker.ietf.org/doc/html/rfc8693]

This is intertwined with HBASE-26667. For example, if we maintained a Refresh 
token in the client, we would have to build in logic (like we have for Kerberos 
credentials) to automatically launch a thread and know where to obtain a new 
JWT from.
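A minimal, entirely hypothetical sketch of that renewal idea (no such HBase API exists today; TokenSource and all other names here are made up): a background thread swaps in a fresh JWT before the current one expires, mirroring the client's Kerberos relogin behavior.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class JwtRenewalSketch {
  /** Hypothetical supplier of fresh tokens (e.g. an OAuth token endpoint). */
  interface TokenSource { String fetchNewJwt(); }

  private volatile String currentJwt;

  void startRenewal(TokenSource source, long lifetimeSeconds) {
    ScheduledExecutorService exec =
        Executors.newSingleThreadScheduledExecutor();
    // Renew at ~80% of the token lifetime, as UserGroupInformation's
    // Kerberos relogin logic does for tickets.
    long period = (long) (lifetimeSeconds * 0.8);
    exec.scheduleAtFixedRate(() -> currentJwt = source.fetchNewJwt(),
        period, period, TimeUnit.SECONDS);
  }
}
{code}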



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26667) Integrate user-experience for hbase-client

2022-01-13 Thread Josh Elser (Jira)
Josh Elser created HBASE-26667:
--

 Summary: Integrate user-experience for hbase-client
 Key: HBASE-26667
 URL: https://issues.apache.org/jira/browse/HBASE-26667
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser
 Fix For: HBASE-26553


Today, we have two mechanisms for getting the tokens needed to authenticate:
 # Kerberos: we rely on a Kerberos ticket being present in a well-known 
location (defined by JVM properties) or via programmatic invocation of 
UserGroupInformation.
 # Delegation tokens: we rely on a special API being called (our mapreduce 
API) which loads the token into the current UserGroupInformation "context" 
(the JAAS PrivilegedAction).

The JWT bearer token approach is very similar to the delegation token 
mechanism, but HBase does not generate this JWT (as we do with delegation 
tokens). How does a client provide this token to the hbase-client (i.e. 
{{ConnectionFactory.getConnection()}} or a {{UserGroupInformation}} call)? We 
should be mindful of all of the different "entrypoints" to HBase ({{hbase 
...}} commands, {{java -cp}} commands, Phoenix commands, Spark commands, etc.). 
Our solution should be effective for all of these approaches and not require 
downstream changes.
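For reference, a minimal sketch of entry point #1 above (the principal and keytab path are hypothetical; the UserGroupInformation calls are real): credentials reach the hbase-client through a UGI context, which is the same surface a JWT bearer token would need to reach.

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiEntryPointSketch {
  public static void main(String[] args) throws Exception {
    UserGroupInformation ugi = UserGroupInformation
        .loginUserFromKeytabAndReturnUGI(
            "user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab");
    ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
      // A ConnectionFactory.createConnection(conf) call here would pick up
      // the credentials attached to this UGI (the JAAS PrivilegedAction).
      return null;
    });
  }
}
{code}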



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26666) Address bearer token being sent over wire before RPC encryption is enabled

2022-01-13 Thread Josh Elser (Jira)
Josh Elser created HBASE-26666:
--

 Summary: Address bearer token being sent over wire before RPC 
encryption is enabled
 Key: HBASE-26666
 URL: https://issues.apache.org/jira/browse/HBASE-26666
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser
 Fix For: HBASE-26553


Today, HBase must complete the SASL handshake (saslClient.complete()) prior to 
turning on any RPC encryption (hbase.rpc.protection=privacy, 
sasl.QOP=auth-conf).

This is a problem because we have to transmit the bearer token to the server 
before we can complete the SASL handshake. This means we would insecurely 
transmit the bearer token (which is equivalent to any other password), which 
is a bad smell.

Ideally, if we can solve this problem for the oauth bearer mechanism, we could 
also apply it to our delegation token interface for digest-md5 (which, I 
believe, suffers the same problem).
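A minimal sketch of the ordering constraint (the method shape is illustrative, not HBase's actual RPC code; the javax.security.sasl calls are real): wrap()/unwrap() protection only applies once the handshake completes, so anything sent during the handshake itself, like a bearer token, is not covered by auth-conf.

{code:java}
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;

public class SaslOrderingSketch {
  static void handshakeStep(SaslClient saslClient, byte[] challenge)
      throws Exception {
    // Handshake phase: this response (carrying the token) crosses the wire
    // before any negotiated QOP can protect it.
    byte[] response = saslClient.evaluateChallenge(challenge);
    if (saslClient.isComplete()) {
      // Only now does hbase.rpc.protection=privacy (QOP "auth-conf") apply.
      Object qop = saslClient.getNegotiatedProperty(Sasl.QOP);
      byte[] protectedMsg = saslClient.wrap(response, 0, response.length);
    }
  }
}
{code}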



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26665) Standalone unit test in hbase-examples

2022-01-13 Thread Josh Elser (Jira)
Josh Elser created HBASE-26665:
--

 Summary: Standalone unit test in hbase-examples
 Key: HBASE-26665
 URL: https://issues.apache.org/jira/browse/HBASE-26665
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser
Assignee: Andor Molnar


Andor is already working on this with Nimbus, but filing this for him.

We should have a unit test which exercises the OAuth bearer authentication 
mechanism so that we know the feature is functional at a basic level 
(without having to set up an OAuth server).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26656) [operator-tools] Provide a utility to detect and correct incorrect RegionInfo's in hbase:meta

2022-01-10 Thread Josh Elser (Jira)
Josh Elser created HBASE-26656:
--

 Summary: [operator-tools] Provide a utility to detect and correct 
incorrect RegionInfo's in hbase:meta
 Key: HBASE-26656
 URL: https://issues.apache.org/jira/browse/HBASE-26656
 Project: HBase
  Issue Type: Improvement
  Components: hbase-operator-tools, hbck2
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: hbase-operator-tools-2.0.0


HBASE-23328 describes a problem in which the serialized RegionInfo in the value 
of an hbase:meta cell has an encoded region name which doesn't match the encoded 
region name in the rowkey for that cell.

This problem is normally harmless, as assignment only consults the rowkey to get 
the encoded region name. However, it does break other HBCK2 tooling, 
like {{extraRegionsInMeta}}.

Rather than try to update each tool to account for when this problem may be 
present, create a new tool which an operator can run to correct meta and then 
use any subsequent tools as originally intended.
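A minimal sketch of the detection step such a tool needs (the class and method names are hypothetical; the HBase client calls are real), assuming {{cell}} is the info:regioninfo cell of an hbase:meta row:

{code:java}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.exceptions.DeserializationException;

public class MetaRegionInfoCheck {
  static boolean encodedNamesMatch(Cell cell) throws DeserializationException {
    // Encoded name derived from the meta row key (what assignment trusts).
    String fromRowKey = RegionInfo.encodeRegionName(CellUtil.cloneRow(cell));
    // Encoded name carried in the serialized RegionInfo in the cell value.
    RegionInfo fromValue = RegionInfo.parseFrom(CellUtil.cloneValue(cell));
    return fromRowKey.equals(fromValue.getEncodedName());
  }
}
{code}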



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26644) Spurious compaction failures with file tracker

2022-01-04 Thread Josh Elser (Jira)
Josh Elser created HBASE-26644:
--

 Summary: Spurious compaction failures with file tracker
 Key: HBASE-26644
 URL: https://issues.apache.org/jira/browse/HBASE-26644
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser


Noticed, when running a basic {{hbase pe randomWrite}}, that compactions 
fail at various points.

One example:
{noformat}
2022-01-03 17:41:18,319 ERROR [regionserver/localhost:16020-shortCompactions-0] 
regionserver.CompactSplit(670): Compaction failed 
region=TestTable,0004054490,1641249249856.2dc7251c6eceb660b9c7bb0b587db913.,
 storeName=2dc7251c6eceb660b9c7bb0b587db913/info0,       priority=6, 
startTime=1641249666161
java.io.IOException: Root-level entries already added in single-level mode
  at 
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeSingleLevelIndex(HFileBlockIndex.java:1136)
  at 
org.apache.hadoop.hbase.io.hfile.CompoundBloomFilterWriter$MetaWriter.write(CompoundBloomFilterWriter.java:279)
  at 
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl$1.writeToBlock(HFileWriterImpl.java:713)
  at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeBlock(HFileBlock.java:1205)
  at 
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:660)
  at 
org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377)
  at 
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.commitWriter(DefaultCompactor.java:70)
  at 
org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:386)
  at 
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
  at 
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
  at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1141)
  at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2388)
  at 
org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:654)
  at 
org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:697)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)  {noformat}
This isn't a super-critical issue because compactions will be retried 
automatically and they appear to eventually succeed. However, when the max 
storefiles limit is reached, this does cause ingest to hang (as happened with 
my modest configuration).

We had seen a similar kind of problem in our testing when backporting to HBase 
2.4 (not upstream, as the decision was not to do this), which we eventually 
tracked down to a bad merge-conflict resolution in the new HFile Cleaner. 
However, the initial investigation doesn't show the same exact problem.

It seems that we have some kind of generic race condition. Would be good to add 
more logging to catch this in the future (since we have two separate instances 
of this category of bug already).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] Second release candidate for hbase-operator-tools 1.2.0 is available for download

2021-12-22 Thread Josh Elser

+1 (binding)

* xsums/sigs OK
* apache-rat:check OK
* Can build from src
* Ran all UT
* Log4j 2.17 in use

Looks great, Guangxu!

On 12/20/21 1:15 AM, Guangxu Cheng wrote:

Please vote on this Apache hbase operator tools release candidate,
hbase-operator-tools-1.2.0RC1

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache hbase operator tools 1.2.0
[ ] -1 Do not release this package because ...

The tag to be voted on is 1.2.0RC1:

   https://github.com/apache/hbase-operator-tools/tree/1.2.0RC1

This tag currently points to git reference

   478af00af79f82624264fd2bb447b97fecc8e790

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:

   https://dist.apache.org/repos/dist/dev/hbase/hbase-operator-tools-1.2.0RC1

Maven artifacts are available in a staging repository at:

   https://repository.apache.org/content/repositories/orgapachehbase-1479

Artifacts were signed with the 5EF3A66D57EC647A key which can be found in:

   https://downloads.apache.org/hbase/KEYS

hbase-operator-tools 1.2.0 contains critical security fixes addressing the
log4j2 CVEs CVE-2021-44228 and CVE-2021-45105. All users of
hbase-operator-tools should upgrade to hbase-operator-tools 1.2.0 ASAP.

To learn more about Apache hbase operator tools, please see

   http://hbase.apache.org/

Thanks,
Your HBase Release Manager
--
Best Regards,
Guangxu



Re: [VOTE] Third release candidate for hbase 3.0.0-alpha-2 is available for download

2021-12-22 Thread Josh Elser

+1 (binding)

* xsums/sigs look good
* apache-rat:check OK
* Built from source
* Can run PE on a bin tarball built from src
* bin/client-bin both look good (log4j2.17)

Good stuff, Duo. Thanks!

On 12/19/21 10:36 AM, Duo Zhang wrote:

Please vote on this Apache hbase release candidate,
hbase-3.0.0-alpha-2RC2

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache hbase 3.0.0-alpha-2
[ ] -1 Do not release this package because ...

The tag to be voted on is 3.0.0-alpha-2RC2:

   https://github.com/apache/hbase/tree/3.0.0-alpha-2RC2

This tag currently points to git reference

   314e924e960d0d5c0c5e8ec436c75aaa6190b4c1

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:

   https://dist.apache.org/repos/dist/dev/hbase/3.0.0-alpha-2RC2/

Maven artifacts are available in a staging repository at:

   https://repository.apache.org/content/repositories/orgapachehbase-1478/

Artifacts were signed with the 9AD2AE49 key which can be found in:

   https://downloads.apache.org/hbase/KEYS

3.0.0-alpha-2 is the second alpha release for our 3.0.0 major release line.
HBase 3.0.0 includes the following big feature/changes:
   Synchronous Replication
   OpenTelemetry Tracing
   Distributed MOB Compaction
   Backup and Restore
   Move RSGroup balancer to core
   Reimplement sync client on async client
   CPEPs on shaded proto
   Move the logging framework from log4j to log4j2

3.0.0-alpha-2 contains several critical security fixes for addressing the
log4j2
CVE-2021-44228, CVE-2021-45046 and CVE-2021-45105. All users who
already use 3.0.0-alpha-1 should upgrade to 3.0.0-alpha-2 ASAP.

Notice that this is not a production ready release. It is used to let our
users try and test the new major release, to get feedback before the final
GA release is out.
So please do NOT use it in production. Just try it and report back
everything you find unusual.

And this time we will not include CHANGES.md and RELEASENOTES.md
in our source code; you can find them on the download site. To get these
two files for old releases, please go to
To learn more about Apache hbase, please see

   http://hbase.apache.org/

Thanks,
Your HBase Release Manager



Re: [VOTE] First release candidate for HBase 2.4.9 (RC0) is available

2021-12-22 Thread Josh Elser

+1 (binding)

* xsums/sigs are good
* bin and client-bin tarballs look fine
* apache-rat:check passes
* Can build from source with hadoop.profile=3.0
* Can run a simple pe randomWrite against tarball built from src
* API compat report looks great

- Josh

On 12/18/21 4:18 PM, Andrew Purtell wrote:

Please vote on this Apache HBase release candidate, hbase-2.4.9RC0

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache HBase 2.4.9
[ ] -1 Do not release this package because ...

The tag to be voted on is 2.4.9RC0:

   https://github.com/apache/hbase/tree/2.4.9RC0

This tag currently points to git reference c49f7f63fc.

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:

   https://dist.apache.org/repos/dist/dev/hbase/2.4.9RC0/

The API compatibility report can be found at:


https://dist.apache.org/repos/dist/dev/hbase/2.4.9RC0/api_compare_2.4.8_to_2.4.9RC0.html

There are no reported compatibility issues.

There are known flaky unit tests, see HBASE-26254.

Maven artifacts are available in a staging repository at:

   https://repository.apache.org/content/repositories/orgapachehbase-1477/

Artifacts were signed with the 0xD5365CCD key which can be found in:

   https://dist.apache.org/repos/dist/release/hbase/KEYS

To learn more about Apache HBase, please see http://hbase.apache.org/

Thanks,
Your HBase Release Manager



[RESULT] [VOTE] Merge HBASE-26067 (storefile tracking) into master and branch-2

2021-12-22 Thread Josh Elser

This merge vote passes with 4 binding +1's and 3 non-binding +1's.

Thanks everyone. I'll go ahead with the merge into master and see how 
easily it comes back to branch-2.


On 2021/12/16 21:30:09 Josh Elser wrote:

Hi!

I'm extremely pleased to send this official vote to merge the feature 
branch for HBASE-26067[1] into master and backport into branch-2 (2.x, 
not 2.5.x). This feature branch introduces the pluggable 
StoreFileTracker interface.


The StoreFileTracker allows the StoreFileEngine to be decoupled from 
where the HFiles can be found. The DEFAULT implementation of this 
StoreFileTracker (e.g. files in a family's directory) is still the 
default implementation. This merge would introduce a FILE implementation 
which uses a flat file in each column family to track the files which 
make up the Store. This feature is notable for HBase as it eliminates 
the need for HBOSS (a distributed locking layer in hbase-filesystem) 
when Amazon S3 is used for HBase data.

We had a DISCUSS thread [2] in which the overall sentiment was positive 
to merge.


Covering some high-level details/FAQ on this work:
* Wellington and Szabolcs have successfully run ITBLL with Chaos Monkeys 
using this feature.
* YCSB (load) indicates a slight performance improvement when using S3 
as the storage system for HBase as compared to using HBOSS [3]
* A new section was added to the HBase book which covers the feature and 
how to use it.
* There is some follow-on work expected, tracked in HBASE-26584 [4], 
which includes things like making user consumption easier and additional 
metrics to measure effectiveness of the feature.


As is customary, this vote will be open for at least 3 days (2021/12/19 
2130 GMT). We'll follow the standard ASF lazy-consensus rules for code 
modification (though I do not expect to need the lazy-consensus caveat). 
Please vote:


+1: Merge the changes from HBASE-26067 to master and branch-2
-1: Do not merge these changes because ...

Big thank you to all of the super hard work that Duo, Wellington, and 
Szabolcs have put into this feature.


- Josh

[1] https://issues.apache.org/jira/browse/HBASE-26067
[2] https://lists.apache.org/thread/6dblom3tc2oz05d263pvmrywlthqq1c1
[3] 
https://issues.apache.org/jira/browse/HBASE-26067?focusedCommentId=17448499&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17448499

[4] https://issues.apache.org/jira/browse/HBASE-26584



Re: [VOTE] Merge HBASE-26067 (storefile tracking) into master and branch-2

2021-12-21 Thread Josh Elser
Duo -- yes, you are correct, but I wanted to give the opportunity for
others who didn't work on it (and thus, have an interest in seeing it
merged) to also vote :). I plan to merge this into master given the
current response from the other devs.

Andrew -- I feel that very well. Thanks for taking the time to drop an email!

On Tue, Dec 21, 2021 at 5:09 PM Andrew Purtell  wrote:
>
> +1 (binding)
>
> Really happy to see this. Apologies that I have so limited bandwidth these
> days.
>
>
> On Mon, Dec 20, 2021 at 10:54 AM Josh Elser  wrote:
>
> > Thanks to all who took the time to vote already!
> >
> > So far, we have 5 +1's (2 binding, 3 non-binding) which is sufficient
> > for a lazy-consensus RESULT. However, I'd love to see a vote from
> > someone who didn't contribute code to this feature. I know running an
> > in-depth technical analysis is a major undertaking. I'd ask for a third
> > binding vote even if someone just does a high-level review of the work
> > on the feature branch.
> >
> > At the same time, I also know many folks will be relaxing with their
> > families. I'll leave this open for 2 more days in the hopes that some
> > other folks will still have time to weigh in.
> >
> > On 2021/12/16 21:30:09 Josh Elser wrote:
> > > Hi!
> > >
> > > I'm extremely pleased to send this official vote to merge the feature
> > > branch for HBASE-26067[1] into master and backport into branch-2 (2.x,
> > > not 2.5.x). This feature branch introduces the pluggable
> > > StoreFileTracker interface.
> > >
> > > The StoreFileTracker allows the StoreFileEngine to be decoupled from
> > > where the HFiles can be found. The DEFAULT implementation of this
> > > StoreFileTracker (e.g. files in a family's directory) is still the
> > > default implementation. This merge would introduce a FILE implementation
> > > which uses a flat-file in each column family to track the files which
> > > make up the Store. This feature is notable for HBase as it eliminates
> > > the need for HBOSS (a distributed locking layer in hbase-filesystem)
> > > when Amazon S3 is used for HBase data.
> > >
> > > We had a DISCUSS thread [2] in which the overall sentiment was positive
> > > to merge.
> > >
> > > Covering some high-level details/FAQ on this work:
> > > * Wellington and Szabolcs have successfully run ITBLL with Chaos Monkeys
> > > using this feature.
> > > * YCSB (load) indicates a slight performance improvement when using S3
> > > as the storage system for HBase as compared to using HBOSS [3]
> > > * A new section was added to the HBase book which covers the feature and
> > > how to use it.
> > > * There is some follow-on work expected, tracked in HBASE-26584 [4],
> > > which includes things like making user consumption easier and additional
> > > metrics to measure effectiveness of the feature.
> > >
> > > As is customary, this vote will be open for at least 3 days (2021/12/19
> > > 2130 GMT). We'll follow the standard ASF lazy-consensus rules for code
> > > modification (though I do not expect to need the lazy-consensus caveat).
> > > Please vote:
> > >
> > > +1: Merge the changes from HBASE-26067 to master and branch-2
> > > -1: Do not merge these changes because ...
> > >
> > > Big thank you to all of the super hard work that Duo, Wellington, and
> > > Szabolcs have put into this feature.
> > >
> > > - Josh
> > >
> > > [1] https://issues.apache.org/jira/browse/HBASE-26067
> > > [2] https://lists.apache.org/thread/6dblom3tc2oz05d263pvmrywlthqq1c1
> > > [3]
> > >
> > https://issues.apache.org/jira/browse/HBASE-26067?focusedCommentId=17448499&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17448499
> > > [4] https://issues.apache.org/jira/browse/HBASE-26584
> > >
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, Crosstalk


[jira] [Created] (HBASE-26612) Adjust log level when looking for .filelist/{f1,f2}

2021-12-20 Thread Josh Elser (Jira)
Josh Elser created HBASE-26612:
--

 Summary: Adjust log level when looking for .filelist/{f1,f2}
 Key: HBASE-26612
 URL: https://issues.apache.org/jira/browse/HBASE-26612
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser
Assignee: Josh Elser


Currently, we get a really big exception in the RegionServer log under normal 
assignment conditions when .filelist/f2 is in use as the tracker 
file.

We should move this to debug/trace to avoid making operators think there is a 
problem when there isn't one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


re: [VOTE] Merge HBASE-26067 (storefile tracking) into master and branch-2

2021-12-20 Thread Josh Elser

Thanks to all who took the time to vote already!

So far, we have 5 +1's (2 binding, 3 non-binding) which is sufficient 
for a lazy-consensus RESULT. However, I'd love to see a vote from 
someone who didn't contribute code to this feature. I know running an 
in-depth technical analysis is a major undertaking. I'd ask for a third 
binding vote even if someone just does a high-level review of the work 
on the feature branch.


At the same time, I also know many folks will be relaxing with their 
families. I'll leave this open for 2 more days in the hopes that some 
other folks will still have time to weigh in.


On 2021/12/16 21:30:09 Josh Elser wrote:

Hi!

I'm extremely pleased to send this official vote to merge the feature 
branch for HBASE-26067[1] into master and backport into branch-2 (2.x, 
not 2.5.x). This feature branch introduces the pluggable 
StoreFileTracker interface.


The StoreFileTracker allows the StoreFileEngine to be decoupled from 
where the HFiles can be found. The DEFAULT implementation of this 
StoreFileTracker (e.g. files in a family's directory) is still the 
default implementation. This merge would introduce a FILE implementation 
which uses a flat-file in each column family to track the files which 
make up this Store. This feature is notable for HBase as it invalidates 
the need for HBOSS (a distributed locking layer in hbase-filesystem) 
when Amazon S3 is used for HBase data.


We had a DISCUSS thread [2] in which the overall sentiment was positive 
to merge.


Covering some high-level details/FAQ on this work:
* Wellington and Szabolcs have successfully run ITBLL with Chaos Monkeys 
using this feature.
* YCSB (load) indicates a slight performance improvement when using S3 
as the storage system for HBase as compared to using HBOSS [3]
* A new section was added to the HBase book which covers the feature and 
how to use it.
* There is some follow-on work expected, tracked in HBASE-26584 [4], 
which includes things like making user consumption easier and additional 
metrics to measure effectiveness of the feature.


As is customary, this vote will be open for at least 3 days (2021/12/19 
2130 GMT). We'll follow the standard ASF lazy-consensus rules for code 
modification (though I do not expect to need the lazy-consensus caveat). 
Please vote:


+1: Merge the changes from HBASE-26067 to master and branch-2
-1: Do not merge these changes because ...

Big thank you to all of the super hard work that Duo, Wellington, and 
Szabolcs have put into this feature.


- Josh

[1] https://issues.apache.org/jira/browse/HBASE-26067
[2] https://lists.apache.org/thread/6dblom3tc2oz05d263pvmrywlthqq1c1
[3] 
https://issues.apache.org/jira/browse/HBASE-26067?focusedCommentId=17448499&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17448499

[4] https://issues.apache.org/jira/browse/HBASE-26584



[jira] [Created] (HBASE-26605) TestHStore#testRefreshStoreFiles broken due to unqualified and qualified paths

2021-12-18 Thread Josh Elser (Jira)
Josh Elser created HBASE-26605:
--

 Summary: TestHStore#testRefreshStoreFiles broken due to 
unqualified and qualified paths
 Key: HBASE-26605
 URL: https://issues.apache.org/jira/browse/HBASE-26605
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.9
Reporter: Josh Elser
Assignee: Josh Elser


Was looking at failures of this method where 
{noformat}
[ERROR] org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles  
Time elapsed: 4.2 s  <<< ERROR!
java.util.NoSuchElementException
    at 
org.apache.hbase.thirdparty.com.google.common.collect.AbstractIndexedListIterator.next(AbstractIndexedListIterator.java:75)
    at 
org.apache.hadoop.hbase.regionserver.TestHStore.closeCompactedFile(TestHStore.java:962)
    at 
org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:1000)
 {noformat}
This was on a branch where I had some HBASE-26067 changes backported, so I 
thought the problem _was_ those changes. After a bit of digging, I believe the 
test case itself is "broken" (the test passes, but for the wrong reasons).

This test method adds some files to a Store (via memstore flush or direct 
addition of a file) and eventually tries to get the first file which is a 
candidate to be removed. The test *never compacted any files*. This was the 
first sign that the test itself was wrong.

After lots of comparison with the HBASE-26067 logging to compare against, I 
found that the Store was listing a file which was created by the memstore flush 
as a file to retain AND a file to remove. Second warning. Upon closer 
inspection, I finally noticed that one of the files was qualified with the 
filesystem URI and the other was not.
{noformat}
2021-12-18 16:57:10,903 INFO  [Time-limited test] regionserver.HStore(675): 
toBeAddedFiles=[file:/Users/jelser/projects/cldr/hbase-copy.git/hbase-server/target/test-data/f4ed7913-e62a-8d5f-a968-bb4c94d5494a/TestStoretestRefreshStoreFiles/data/default/table/297ad8361c3326bfb1520dbc54b1c3bd/family/dd8a430b391546d8b9bdc39bb77d447b,
 
file:/Users/jelser/projects/cldr/hbase-copy.git/hbase-server/target/test-data/f4ed7913-e62a-8d5f-a968-bb4c94d5494a/TestStoretestRefreshStoreFiles/data/default/table/297ad8361c3326bfb1520dbc54b1c3bd/family/d4c5442b772c43fd9ebdfed1a11c0e73],
 
toBeRemovedFiles=[/Users/jelser/projects/cldr/hbase-copy.git/hbase-server/target/test-data/f4ed7913-e62a-8d5f-a968-bb4c94d5494a/TestStoretestRefreshStoreFiles/data/default/table/297ad8361c3326bfb1520dbc54b1c3bd/family/d4c5442b772c43fd9ebdfed1a11c0e73]
 {noformat}
{{d4c5442b772c43fd9ebdfed1a11c0e73}}: how are we both adding and removing this 
file?! Turns out, this is because one of them is "/..." and the other is 
"file:/...". Either the problem is in TestHStore in how it is creating/adding 
these files behind the scenes or we should be qualifying the Path inside of 
StoreFileInfo with the filesystem that we're using.

I remember too vividly the problems when trying to separate the rootdir and 
waldir from each other and am cautious about adding a {{fs.qualifyPath(p)}} 
call to StoreFileInfo. Need to look some more, but will get a patch up to fix.
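
To illustrate the mismatch, a self-contained sketch (the local path below is
made up for the example; it is not the test's actual directory):

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifiedPathDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path unqualified = new Path("/tmp/table/family/d4c5442b772c43fd9ebdfed1a11c0e73");
    // makeQualified prefixes the filesystem scheme, yielding file:/tmp/...
    Path qualified = fs.makeQualified(unqualified);
    // Path equality is URI equality, so the same file compares as two
    // different entries and can show up as both an add AND a remove.
    System.out.println(unqualified.equals(qualified)); // prints: false
  }
}
{noformat}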



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] Second release candidate for hbase 3.0.0-alpha-2 is available for download

2021-12-18 Thread Josh Elser

+1 (binding)

* Xsums/sigs good
* Can build from source
* Log4j 2.15 is included (more on this in the below)
* log4j2.formatMsgNoLookups=true is set (multiple times per process, but 
properly set)

* hbase-config.sh issue is fixed over rc1

As best as I've been able to keep up, it seems like we should already 
upgrade to log4j 2.16 due to issues in 2.15. There are also rumblings 
that 2.16 may have issues still. It's my opinion that the changes we 
have here in rc2 are a massive improvement over before. I think this is 
fine; I just wanted to acknowledge that we may still need to update 
again real soon.


Thanks for your release manager work, Duo!

On 12/14/21 9:06 AM, Duo Zhang wrote:

Please vote on this Apache hbase release candidate,
hbase-3.0.0-alpha-2RC1

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache hbase 3.0.0-alpha-2
[ ] -1 Do not release this package because ...

The tag to be voted on is 3.0.0-alpha-2RC1:

   https://github.com/apache/hbase/tree/3.0.0-alpha-2RC1

This tag currently points to git reference

   a3ff8e4c812eefab6ad32af45ca449a1395a6510

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:

   https://dist.apache.org/repos/dist/dev/hbase/3.0.0-alpha-2RC1/

Maven artifacts are available in a staging repository at:

   https://repository.apache.org/content/repositories/orgapachehbase-1473/

Artifacts were signed with the 9AD2AE49 key which can be found in:

   https://downloads.apache.org/hbase/KEYS

3.0.0-alpha-2 is the second alpha release for our 3.0.0 major release line.
HBase 3.0.0 includes the following big feature/changes:
   Synchronous Replication
   OpenTelemetry Tracing
   Distributed MOB Compaction
   Backup and Restore
   Move RSGroup balancer to core
   Reimplement sync client on async client
   CPEPs on shaded proto
   Move the logging framework from log4j to log4j2

3.0.0-alpha-2 contains a critical security fix for addressing the log4j2
CVE-2021-44228. All users who already use 3.0.0-alpha-1 should upgrade
to 3.0.0-alpha-2 ASAP.

Notice that this is not a production ready release. It is used to let our
users try and test the new major release, to get feedback before the final
GA release is out.
So please do NOT use it in production. Just try it and report back
everything you find unusual.

And this time we will not include CHANGES.md and RELEASENOTE.md
in our source code, you can find it on the download site. For getting these
two files for old releases, please go to

   https://archive.apache.org/dist/hbase/

To learn more about Apache hbase, please see

   http://hbase.apache.org/

Thanks,
Your HBase Release Manager



[jira] [Created] (HBASE-26599) Netty exclusion through ZooKeeper not effective as intended

2021-12-17 Thread Josh Elser (Jira)
Josh Elser created HBASE-26599:
--

 Summary: Netty exclusion through ZooKeeper not effective as 
intended
 Key: HBASE-26599
 URL: https://issues.apache.org/jira/browse/HBASE-26599
 Project: HBase
  Issue Type: Bug
  Components: dependencies
Affects Versions: 2.4.8
Reporter: Josh Elser
Assignee: Josh Elser


Picking up where [~psomogyi] has been digging this week. We've been seeing an 
issue where MiniDFS-based tests fail to start due to missing netty classes.

HBASE-25969 seems to have intended to remove transitive Netty but was 
ineffective (at least for hadoop.profile=3.0). The dependency exclusion was for 
{{io.netty:netty}} and {{io.netty:netty-all}} but ZooKeeper 3.5.7 transitively 
depends on {{netty-handler}} and {{netty-transport-native-epoll}}  (per 
[https://search.maven.org/artifact/org.apache.zookeeper/zookeeper/3.5.7/jar])

The funny part is that we _should_ have seen failures in any hbase unit test 
using MiniDFS because we excluded netty and netty-all in HBASE-25969, but 
because we missed the exclusions for the artifacts ZooKeeper actually pulls in, 
everything still keeps running.

The intent of HBASE-25969 was good, but I think we need to revisit the 
execution. We need netty dependencies on the scope=test classpath. We just want 
to keep them off the scope=compile classpath (out of our client and server 
jars).

disclaimer: I have not yet looked at 2.5.x or master to see if this also 
affects them.
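
For illustration, the kind of pom.xml exclusion stanza implied here, sketched
against the ZooKeeper dependency (untested; the real hbase poms differ in
detail):

{noformat}
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <exclusions>
    <!-- io.netty:netty and io.netty:netty-all were already excluded by
         HBASE-25969; these two are the transitive artifacts that were missed. -->
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty-handler</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty-transport-native-epoll</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{noformat}

The netty artifacts can then be re-declared explicitly with scope=test so the
MiniDFS-based tests still run.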



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] Merge HBASE-26067 (storefile tracking) into master and branch-2

2021-12-17 Thread Josh Elser

I guess I forgot to vote :)

+1 (binding)

On 12/17/21 8:57 AM, 张铎(Duo Zhang) wrote:

Big +1 from me.

This is a big step for making HBase more cloud native.

I've already rebased HBASE-26067 to the latest master.

Thanks~

On Fri, Dec 17, 2021 at 07:17, Ankit Singhal wrote:


+1

On Thu, Dec 16, 2021 at 1:53 PM Andor Molnar  wrote:


+1 (non-binding)

Andor




On 2021. Dec 16., at 22:30, Josh Elser  wrote:

Hi!

I'm extremely pleased to send this official vote to merge the feature
branch for HBASE-26067[1] into master and backport into branch-2 (2.x,
not 2.5.x). This feature branch introduces the pluggable StoreFileTracker
interface.

The StoreFileTracker allows the StoreFileEngine to be decoupled from
where the HFiles can be found. The DEFAULT implementation of this
StoreFileTracker (e.g. files in a family's directory) is still the
default implementation. This merge would introduce a FILE implementation
which uses a flat-file in each column family to track the files which
make up this Store. This feature is notable for HBase as it invalidates
the need for HBOSS (a distributed locking layer in hbase-filesystem)
when Amazon S3 is used for HBase data.

We had a DISCUSS thread [2] in which the overall sentiment was positive
to merge.

Covering some high-level details/FAQ on this work:
* Wellington and Szabolcs have successfully run ITBLL with Chaos Monkeys
using this feature.
* YCSB (load) indicates a slight performance improvement when using S3
as the storage system for HBase as compared to using HBOSS [3]
* A new section was added to the HBase book which covers the feature and
how to use it.
* There is some follow-on work expected, tracked in HBASE-26584 [4],
which includes things like making user consumption easier and additional
metrics to measure effectiveness of the feature.

As is customary, this vote will be open for at least 3 days (2021/12/19
2130 GMT). We'll follow the standard ASF lazy-consensus rules for code
modification (though I do not expect to need the lazy-consensus caveat).
Please vote:

+1: Merge the changes from HBASE-26067 to master and branch-2
-1: Do not merge these changes because ...

Big thank you to all of the super hard work that Duo, Wellington, and
Szabolcs have put into this feature.

- Josh

[1] https://issues.apache.org/jira/browse/HBASE-26067
[2] https://lists.apache.org/thread/6dblom3tc2oz05d263pvmrywlthqq1c1
[3] https://issues.apache.org/jira/browse/HBASE-26067?focusedCommentId=17448499&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17448499
[4] https://issues.apache.org/jira/browse/HBASE-26584









Re: [VOTE] First release candidate for hbase-operator-tools 1.2.0 is available for download

2021-12-17 Thread Josh Elser

+1 (binding)

Thanks for putting this together, Guangxu!

* xsums/sigs are great
* RAT check passes on src release
* Can run unit tests (against 2.3.7, 2.4.4, 2.4.8)
* Can build from source
* CHANGES and RELEASENOTES look fine at a glance
* Public key published in KEYS
* Verified log4j 2.16 is in the binary release (both as a jar and shaded 
inside hbase-hbck)


- Josh

On 12/14/21 10:32 PM, Guangxu Cheng wrote:

Please vote on this Apache hbase operator tools release candidate,
hbase-operator-tools-1.2.0RC0

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache hbase operator tools 1.2.0
[ ] -1 Do not release this package because ...

The tag to be voted on is 1.2.0RC0:

   https://github.com/apache/hbase-operator-tools/tree/1.2.0RC0

This tag currently points to git reference

   76d68624cebb66ec0e509b0a4c0d96445a601685

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:


https://dist.apache.org/repos/dist/dev/hbase/hbase-operator-tools-1.2.0RC0/

Maven artifacts are available in a staging repository at:

   https://repository.apache.org/content/repositories/orgapachehbase-1474/
Artifacts were signed with the 5EF3A66D57EC647A key which can be found in:

   https://dist.apache.org/repos/dist/release/hbase/KEYS

hbase-operator-tools 1.2.0 contains a critical security fix for addressing
the log4j2
CVE-2021-44228. All users who use hbase-operator-tools should upgrade
to hbase-operator-tools 1.2.0 ASAP.

To learn more about Apache hbase operator tools, please see

   http://hbase.apache.org/

Thanks,
Your HBase Release Manager

--
Best Regards,
Guangxu



[VOTE] Merge HBASE-26067 (storefile tracking) into master and branch-2

2021-12-16 Thread Josh Elser

Hi!

I'm extremely pleased to send this official vote to merge the feature 
branch for HBASE-26067[1] into master and backport into branch-2 (2.x, 
not 2.5.x). This feature branch introduces the pluggable 
StoreFileTracker interface.


The StoreFileTracker allows the StoreFileEngine to be decoupled from 
where the HFiles can be found. The DEFAULT implementation of this 
StoreFileTracker (e.g. files in a family's directory) is still the 
default implementation. This merge would introduce a FILE implementation 
which uses a flat-file in each column family to track the files which 
make up this Store. This feature is notable for HBase as it invalidates 
the need for HBOSS (a distributed locking layer in hbase-filesystem) 
when Amazon S3 is used for HBase data.


We had a DISCUSS thread [2] in which the overall sentiment was positive 
to merge.


Covering some high-level details/FAQ on this work:
* Wellington and Szabolcs have successfully run ITBLL with Chaos Monkeys 
using this feature.
* YCSB (load) indicates a slight performance improvement when using S3 
as the storage system for HBase as compared to using HBOSS [3]
* A new section was added to the HBase book which covers the feature and 
how to use it.
* There is some follow-on work expected, tracked in HBASE-26584 [4], 
which includes things like making user consumption easier and additional 
metrics to measure effectiveness of the feature.


As is customary, this vote will be open for at least 3 days (2021/12/19 
2130 GMT). We'll follow the standard ASF lazy-consensus rules for code 
modification (though I do not expect to need the lazy-consensus caveat). 
Please vote:


+1: Merge the changes from HBASE-26067 to master and branch-2
-1: Do not merge these changes because ...

Big thank you to all of the super hard work that Duo, Wellington, and 
Szabolcs have put into this feature.


- Josh

[1] https://issues.apache.org/jira/browse/HBASE-26067
[2] https://lists.apache.org/thread/6dblom3tc2oz05d263pvmrywlthqq1c1
[3] 
https://issues.apache.org/jira/browse/HBASE-26067?focusedCommentId=17448499&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17448499

[4] https://issues.apache.org/jira/browse/HBASE-26584


[jira] [Resolved] (HBASE-26265) Update ref guide to mention the new store file tracker implementations

2021-12-16 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26265.

Resolution: Fixed

> Update ref guide to mention the new store file tracker implementations
> --
>
> Key: HBASE-26265
> URL: https://issues.apache.org/jira/browse/HBASE-26265
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Duo Zhang
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: HBASE-26067
>
>
> For example, when to use these store file trackers.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot

2021-12-15 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26286.

Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for everyone's careful reviews! Great work, Szabolcs!

> Add support for specifying store file tracker when restoring or cloning 
> snapshot
> 
>
> Key: HBASE-26286
> URL: https://issues.apache.org/jira/browse/HBASE-26286
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile, snapshots
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: HBASE-26067
>
>
> As discussed in HBASE-26280.
> https://issues.apache.org/jira/browse/HBASE-26280?focusedCommentId=17414894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17414894



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Merge HBASE-26067 branch into master, and backport it to base 2 branches

2021-12-14 Thread Josh Elser

Thanks for your input, Andrew and Nick!

Big thank you to Duo for your hands-on-keyboard commitment as well for 
this whole feature.


I am also happy to target 2.x (and not 2.5.x) for the backport.

In the interest of getting rid of this feature branch (and the 
inevitable rebase pains the longer it runs parallel to master), I'd like 
to move ahead with a concrete plan to merge.


1. Given there was no objection, do folks feel the need for a VOTE? Even 
if one person would like a VOTE, I'm happy to start that. Please just 
say so.


2. We have three outstanding PRs for the sake of SFT which are all (IMO) 
very close to merging (#3851, #3861, and #3942). I think 3851 and 3942 
are easy to include and just need one more review cycle. If we feel like 
we are still far away on 3861, I think we set that aside and revisit it 
after the feature merge is done.


If there are any other concerns, please shout!

- Josh

On 12/8/21 9:07 PM, Andrew Purtell wrote:

+1 for merging to branch-2 (2.6)


On Dec 8, 2021, at 6:04 PM, 张铎  wrote:

I think here we just want this to be backported to 2.x, not 2.5.x.

So thanks Andrew for the quick action.

+1 on merging HBASE-26067 to master and backporting to branch-2(2.6.0).

Thanks.

On Thu, Dec 9, 2021 at 08:45, Andrew Purtell wrote:


I concur with Nick, but let me help here by branching 2.5 today. It was
always going to be somewhat arbitrary a point.

On Wed, Dec 8, 2021 at 3:09 PM Nick Dimiduk  wrote:

Based solely on the comments made to this thread, I would recommend against
a merge to branch-2, given that we are very close to 2.5. The points about
existing gaps seem like things we're not ready to publish in the impending
minor release. Once we have a branch-2.5, this particular concern of mine
will be alleviated.

Thanks,
Nick


On Wed, Dec 8, 2021 at 1:37 PM Josh Elser  wrote:



I was going to wait for some other folks to chime in, but I guess I can
be the next one :)

Duo, Wellington, and Szabolcs have been doing some excellent work on the
storefile tracking (SFT) to a degree that I never expected to see. I
remember some of the original "Filesystem re-do" issues on Jira. The
idea was exceptional, but the result seemed unreachable.

These devs, building on the success of what Zach/Stephen first talked
about in HBASE-24749, came up with what I think is an excellent step
forward. I've yet to break it via my own testing, but do acknowledge
that there's always more work to be done.

I think this is at a reasonable place to merge this back into the
"mainline" branches from the feature branch (HBASE-26067). I believe
this is ready because:

1. The feature is completely opt-in (HBase works the same way by default)
2. There is API to migrate tables into the new SFT implementation
3. There is also API to migrate tables back to the default implementation

Some gaps still exist around bulk loading, documentation, snapshots, and
recovery tooling, but these are being worked on. In the context of S3,
this makes a significantly more compelling offering of HBase by removing
the complexity of HBOSS. For HBase in all installations, I think SFT
makes for a significantly more "deterministic" way of managing
regions/files.

+1 from me to merge HBASE-26067 into master and branch-2

- Josh

On 12/7/21 10:31 AM, Wellington Chevreuil wrote:

Hello everyone,

We have been making progress on the alternative way of tracking store files
originally proposed by Duo in HBASE-26067.

To briefly summarize it for those not following it, this feature introduces
an abstraction layer to track store files still used/needed by store
engines, allowing for plugging different approaches of identifying store
files required by the given store. The design doc describing it in more
detail is available here
<https://docs.google.com/document/d/16Nr1Fn3VaXuz1g1FTiME-bnGR3qVK5B-raXshOkDLcY/edit#heading=h.calrs3kn4d8s>.

Our main goal within this feature is to avoid the need for using temp files
and renames when creating new hfiles (whenever flushing, compacting,
splitting/merging or snapshotting). This is made possible by the pluggable
tracker implementation labeled "FILE". The current behavior using temp dirs
and renames would still be the default approach (labeled "DEFAULT").

This "renameless" approach is appealing for deployments using the Amazon S3
object store file system, where the lack of atomic rename operations
imposed the necessity of an additional layer of locking (HBOSS), which
combined with the s3a rename operation can have a performance overhead.

Some test runs on my employer infrastructure have shown promising results.
A pure insertion ycsb run has shown ~6% performance gain on the client
writes. Snapshot clone of a hundreds-of-regions table completes in half of
the time. There are also improvements in compaction, splits and merges
times.

Talking with Duo Zhang and Josh Elser in the HBASE-2

[jira] [Resolved] (HBASE-26568) hbase master got stuck after running couple of days in Azure setup

2021-12-13 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26568.

Resolution: Workaround

Resolving with "Workaround", the workaround being to upgrade.

> hbase master got stuck after running couple of days in Azure setup
> --
>
> Key: HBASE-26568
> URL: https://issues.apache.org/jira/browse/HBASE-26568
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
> Environment: Azure cloud
>Reporter: kaushik mandal
>Priority: Major
> Attachments: hbase-master-log-0.txt, hbase-master-log-1.txt
>
>
> hadoop hbase version 2.0.1
> hadoop hdfs version 2.7.7
>  
> In an Azure cluster setup, the hbase master hangs or becomes unresponsive after 
> running for a couple of days,
> and the only way to recover the hbase master is to delete /hbase and restart. Below 
> is the error seen in the hbase-master log
>  
> Error message
> ==
> 2021-11-18 13:06:55,396 INFO 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16000] 
> assignment.AssignProcedure: Retry=10 of max=10; pid=320, ppid=319, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, 
> region=1588230740; rit=OPENING, 
> location=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1637238611975
>  2021-11-18 13:06:55,396 INFO [PEWorker-16] assignment.AssignProcedure: 
> Retry=11 of max=10; pid=320, ppid=319, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
> region=1588230740; rit=OFFLINE, location=null 2021-11-18 13:06:55,944 ERROR 
> [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime 
> exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true 
> java.lang.UnsupportedOperationException: unhandled 
> state=RECOVER_META_ASSIGN_REGIONS at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
>  at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
>  at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>  at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) 
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
>  2021-11-18 13:06:55,958 ERROR [PEWorker-16] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=319, 
> state=FAILED:RECOVER_META_ASSIGN_REGIONS, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true 
> java.lang.UnsupportedOperationException: unhandled 
> state=RECOVER_META_ASSIGN_REGIONS at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
>  at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
>  at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>  at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) 
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
>  2021-11-18 13:06:55,969 ERROR [PEWorker-16] procedure2.Procedure

[jira] [Reopened] (HBASE-26557) log4j2 has a critical RCE vulnerability

2021-12-13 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser reopened HBASE-26557:


> log4j2 has a critical RCE vulnerability
> ---
>
> Key: HBASE-26557
> URL: https://issues.apache.org/jira/browse/HBASE-26557
> Project: HBase
>  Issue Type: Bug
>  Components: logging, security
>Reporter: Yutong Xiao
>Assignee: Yutong Xiao
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> Impacted log4j version: Apache Log4j 2.x <= 2.14.1
> I found that our current log4j version at master is 2.14.1.
> Should upgrade the version to 2.15.0



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] First release candidate for hbase 3.0.0-alpha-2 is available for download

2021-12-13 Thread Josh Elser

-1 (binding)

Log4j2 CVE mitigation is ineffective due to an incorrect `export` in 
bin/hbase-config.sh. Appears that HBASE-26557 tried to add the 
mitigation to HBASE_OPTS but added spaces around either side of the 
equals sign, e.g. `export HBASE_OPTS = ".."`, which is invalid syntax.




$ ./bin/start-hbase.sh
/Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: 
line 167: export: `=': not a valid identifier
/Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: 
line 167: export: ` -Dlog4j2.formatMsgNoLookups=true': not a valid 
identifier
/Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: 
line 167: export: `=': not a valid identifier
/Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: 
line 167: export: ` -Dlog4j2.formatMsgNoLookups=true': not a valid 
identifier



More naively, and just in plain bash:

bash-5.1$ export FOO = "$FOO bar"
bash: export: `=': not a valid identifier
bash: export: ` bar': not a valid identifier
bash-5.1$ echo $FOO


I'll post a PR to fix after sending this.
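
For reference, the corrected line keeps the mitigation but drops the spaces 
around the equals sign (a sketch of the shape of the fix; the actual PR may 
differ slightly):

export HBASE_OPTS="$HBASE_OPTS -Dlog4j2.formatMsgNoLookups=true"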

The good:
* xsums and sigs were OK
* Was able to run most unit tests locally
* Was able to launch using bin tarball
* Everything else looks great so far

- Josh

On 12/11/21 11:34 AM, Duo Zhang wrote:

Please vote on this Apache hbase release candidate,
hbase-3.0.0-alpha-2RC0

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache hbase 3.0.0-alpha-2
[ ] -1 Do not release this package because ...

The tag to be voted on is 3.0.0-alpha-2RC0:

   https://github.com/apache/hbase/tree/3.0.0-alpha-2RC0

This tag currently points to git reference

   8bca21b47d7c809a0940aea8ed12dd4d2af12432

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:

   https://dist.apache.org/repos/dist/dev/hbase/3.0.0-alpha-2RC0/

Maven artifacts are available in a staging repository at:

   https://repository.apache.org/content/repositories/orgapachehbase-1472/

Artifacts were signed with the 9AD2AE49 key which can be found in:

   https://downloads.apache.org/hbase/KEYS

3.0.0-alpha-2 is the second alpha release for our 3.0.0 major release line.
HBase 3.0.0 includes the following big feature/changes:
   Synchronous Replication
   OpenTelemetry Tracing
   Distributed MOB Compaction
   Backup and Restore
   Move RSGroup balancer to core
   Reimplement sync client on async client
   CPEPs on shaded proto
   Move the logging framework from log4j to log4j2

3.0.0-alpha-2 contains a critical security fix for addressing the log4j2
CVE-2021-44228. All users who already use 3.0.0-alpha-1 should upgrade
to 3.0.0-alpha-2 ASAP.

Notice that this is not a production ready release. It is used to let our
users try and test the new major release, to get feedback before the final
GA release is out.
So please do NOT use it in production. Just try it and report back
everything you find unusual.

And this time we will not include CHANGES.md and RELEASENOTE.md
in our source code, you can find it on the download site. For getting these
two files for old releases, please go to

   https://archive.apache.org/dist/hbase/

To learn more about Apache hbase, please see

   http://hbase.apache.org/

Thanks,
Your HBase Release Manager



Re: [NOTICE] Apache log4j2 security vulnerability

2021-12-13 Thread Josh Elser

Thanks Guangxu!

On 12/13/21 6:01 AM, Guangxu Cheng wrote:

If there is no objection, I’ll volunteer to RM hbase-operator-tools 1.2.0
--
Best Regards,
Guangxu


On Sun, Dec 12, 2021 at 22:37, 张铎(Duo Zhang) wrote:


Besides 3.0.0-alpha-2, we also need to make a new release for
hbase-operation-tools, any volunteers?

Thanks.

On Fri, Dec 10, 2021 at 18:02, 张铎(Duo Zhang) wrote:


Seems the 2.15.0 is already out. The log4j community decided to close the
vote earlier to solve the critical security issue.

A developer in our community has already filed an issue and opened a PR.

https://issues.apache.org/jira/browse/HBASE-26557
https://github.com/apache/hbase/pull/3933

Let's get the PR merged and publish 3.0.0-alpha-2 ASAP.

On Fri, Dec 10, 2021 at 13:44, Tak Lon (Stephen) Wu wrote:


Thanks for sharing! I found another post [2] that said how to perform
such an attack.

Should we have a JIRA and keep tracking the solution for it?

[2] https://www.lunasec.io/docs/blog/log4j-zero-day/

-Stephen

On Thu, Dec 9, 2021 at 8:09 PM 张铎(Duo Zhang) 
wrote:


See this PR

https://github.com/apache/logging-log4j2/pull/608

Although the final 2.15.0 release for log4j2 has not been published yet, at
least on the Chinese internet the details of how to make use of this
vulnerability have already been made public[1].

HBase 3.0.0-alpha-1 is affected, so once 2.15.0 is out, we will push a
3.0.0-alpha-2 release out soon. And for those who already use HBase
3.0.0-alpha-1, please consider using the following ways to disable JNDI:

Add '-Dlog4j2.formatMsgNoLookups=true' when starting the JVM
Add 'log4j2.formatMsgNoLookups=True' to the config file
'export FORMAT_MESSAGES_PATTERN_DISABLE_LOOKUPS=true' before starting the JVM


Thanks.

1. https://nosec.org/home/detail/4917.html











[jira] [Created] (HBASE-26550) NPE if balance request comes in before master is initialized

2021-12-08 Thread Josh Elser (Jira)
Josh Elser created HBASE-26550:
--

 Summary: NPE if balance request comes in before master is 
initialized
 Key: HBASE-26550
 URL: https://issues.apache.org/jira/browse/HBASE-26550
 Project: HBase
  Issue Type: Bug
  Components: Balancer, master
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 3.0.0-alpha-2


Noticed this in a unit test from [https://github.com/apache/hbase/pull/3851]

I believe this is a result of the new balance() implementation in the Master, 
and a client submitting a request to the master before it's completed its 
instantiation. Simple fix to avoid the NPE.
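
A sketch of the kind of guard involved (the method shape and the doBalance
helper are assumptions for illustration, not the committed patch):

{noformat}
// Hypothetical shape of the fix: reject balance requests that arrive before
// initialization completes rather than dereferencing not-yet-created state.
public BalanceResponse balance(BalanceRequest request) throws IOException {
  if (!isInitialized()) {
    throw new PleaseHoldException("Master is initializing");
  }
  return doBalance(request); // the normal balancing path
}
{noformat}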



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Merge HBASE-26067 branch into master, and backport it to base 2 branches

2021-12-08 Thread Josh Elser
I was going to wait for some other folks to chime in, but I guess I can 
be the next one :)


Duo, Wellington, and Szabolcs have been doing some excellent work on the 
storefile tracking (SFT) to a degree that I never expected to see. I 
remember some of the original "Filesystem re-do" issues on Jira. The 
idea was exceptional, but the result seemed unreachable.


These devs, building on the success of what Zach/Stephen first talked 
about in HBASE-24749, came up with what I think is an excellent step 
forward. I've yet to break it via my own testing, but do acknowledge 
that there's always more work to be done.


I think this is at a reasonable place to merge this back into the 
"mainline" branches from the feature branch (HBASE-26067). I believe 
this is ready because:


1. The feature is completely opt-in (HBase works the same way by default)
2. There is API to migrate tables into the new SFT implementation
3. There is also API to migrate tables back to the default implementation

Some gaps still exist around bulk loading, documentation, snapshots, and 
recovery tooling, but these are being worked on. In the context of S3, 
this makes a significantly more compelling offering of HBase by removing 
the complexity of HBOSS. For HBase in all installations, I think SFT 
makes for a significantly more "deterministic" way of managing 
regions/files.


+1 from me to merge HBASE-26067 into master and branch-2

- Josh

On 12/7/21 10:31 AM, Wellington Chevreuil wrote:

Hello everyone,

We have been making progress on the alternative way of tracking store files
originally proposed by Duo in HBASE-26067.

To briefly summarize it for those not following it, this feature introduces
an abstraction layer to track store files still used/needed by store
engines, allowing for plugging different approaches of identifying store
files required by the given store. The design doc describing it in more
detail is available here
<https://docs.google.com/document/d/16Nr1Fn3VaXuz1g1FTiME-bnGR3qVK5B-raXshOkDLcY/edit#heading=h.calrs3kn4d8s>
.

Our main goal within this feature is to avoid the need for using temp files
and renames when creating new hfiles (whenever flushing, compacting,
splitting/merging or snapshotting). This is made possible by the pluggable
tracker implementation labeled "FILE". The current behavior using temp dirs
and renames would still be the default approach (labeled "DEFAULT").
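
To make the shape of that abstraction concrete, here is a minimal sketch
(the Java method names are illustrative assumptions, not the actual
HBASE-26067 API):

import java.io.IOException;
import java.util.List;

// Hypothetical sketch of the tracker abstraction: a store asks its tracker
// for the authoritative list of hfiles instead of listing the family
// directory, which is what lets implementations avoid temp dirs and renames.
interface StoreFileTrackerSketch {
  // The hfiles currently making up the store.
  List<String> load() throws IOException;
  // Record newly written hfiles, e.g. after a flush.
  void add(List<String> newFiles) throws IOException;
  // Atomically swap compaction inputs for compaction outputs.
  void replace(List<String> compacted, List<String> results) throws IOException;
}

The "DEFAULT" implementation would derive load() from a directory listing,
while "FILE" would read and write the per-family flat-file described above.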

This "renameless" approach is appealing for deployments using Amazon S3
Object store file system, where the lack of atomic rename operations
imposed the necessity of an additional layer of locking (HBOSS), which
combined with the s3a rename operation can have a performance overhead.

Some test runs on my employer infrastructure have shown promising results.
A pure insertion ycsb run has shown ~6% performance gain on the client
writes. Snapshot clone of hundreds of regions table completes in half of
the time. There are also improvements in compaction, splits and merges
times.

Talking with Duo Zhang and Josh Elser in the HBASE-26067 jira, we feel
optimistic that the current implementation is in a good state to get merged
into master branch, but it would be nice to hear other opinions about it,
before we effectively commit it. Looking forward to hearing some
thoughts/concerns you might have.

Kind regards,
Wellington.



[jira] [Resolved] (HBASE-26512) Make timestamp format configurable in HBase shell scan output

2021-12-01 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26512.

Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the patch, Istvan! I took the liberty of writing some release notes. 
Please let me know if you have any changes.

> Make timestamp format configurable in HBase shell scan output
> -
>
> Key: HBASE-26512
> URL: https://issues.apache.org/jira/browse/HBASE-26512
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Affects Versions: 3.0.0-alpha-1
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.9
>
>
> HBASE-23930 and HBASE-24937 have changed the timestamp format shown in scan 
> results in the HBase shell.
> This may break existing use cases that use hbase shell as a client. (as 
> opposed to the java, rest, or thrift APIs)
> I propose adding a configuration option to make it configurable.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26461) [hboss] Delete self lock without orphaning znode

2021-11-17 Thread Josh Elser (Jira)
Josh Elser created HBASE-26461:
--

 Summary: [hboss] Delete self lock without orphaning znode
 Key: HBASE-26461
 URL: https://issues.apache.org/jira/browse/HBASE-26461
 Project: HBase
  Issue Type: Bug
Reporter: Josh Elser
Assignee: Josh Elser


Fallout from HBASE-26437
{quote}Could do the {{removeInMemoryLocks}} separately in HBASE-26453, but I 
think then znodes would get created again when unlocking, failing this PR's 
tests. So, once we fix {{removeInMemoryLocks}}, we need to make sure 
{{rename}} and {{delete}} would not recreate the path again when calling 
{{unlock}}.
{quote}
The changes from HBASE-26453 inadvertently passed their unit tests because we 
didn't remove the Mutex object like we intended to do (after deleting a 
file/dir or renaming a file/dir, we intend to remove the mutex and znode for 
that file/dir and all beneath it).

Right now, we only actually delete the children (znode and mutex objects) for 
that deleted/renamed path. Meaning, we are still orphaning resources. I 
implemented the fix in lockRename based on what we did in lockDelete, so we're 
making incremental progress.

The lock cleanup process and Mutex logic need to be reworked because we cannot 
do it in two phases as we currently do. In order to release the mutex (when we 
are holding it already), we currently re-create znodes back in ZooKeeper.

The other solution, based on googling, appears to be to use a 
[Reaper|https://www.javadoc.io/doc/org.apache.curator/curator-recipes/2.4.1/org/apache/curator/framework/recipes/locks/Reaper.html].
This might also be an easier way to do the rest of the cleanup.
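
For context on why the two-phase approach fights the recipe, a hedged sketch
of the usual Curator lock pattern (illustrative only, not the HBOSS code):

{noformat}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;

// InterProcessMutex.release() deletes its own ephemeral lock node but leaves
// the parent znode for `path` behind. So if cleanup deletes the parent while
// the mutex is still held, releasing (or re-locking the same path) brings
// the znode back, which is the recreate-on-unlock problem described above.
void guardedOperation(CuratorFramework client, String path) throws Exception {
  InterProcessMutex mutex = new InterProcessMutex(client, path);
  mutex.acquire();
  try {
    // ... filesystem operation protected by the lock ...
  } finally {
    mutex.release(); // parent znode for `path` remains in ZooKeeper
  }
}
{noformat}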



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26453) [hboss] removeInMemoryLocks can remove still in-use locks

2021-11-17 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26453.

Fix Version/s: hbase-filesystem-1.0.0-alpha2
 Hadoop Flags: Reviewed
   Resolution: Fixed

> [hboss] removeInMemoryLocks can remove still in-use locks
> -
>
> Key: HBASE-26453
> URL: https://issues.apache.org/jira/browse/HBASE-26453
> Project: HBase
>  Issue Type: Bug
>Affects Versions: hbase-filesystem-1.0.0-alpha1
>    Reporter: Josh Elser
>    Assignee: Josh Elser
>Priority: Critical
> Fix For: hbase-filesystem-1.0.0-alpha2
>
>
> While implementing HBASE-26437, I was fighting with unit tests which just 
> wouldn't complete. After adding the code change to delete the locks held by 
> the {{src}} in a {{mv src dst}} operation, releasing the {{dst}} lock would 
> claim that the current thread doesn't hold the lock.
> After investigating, the specific contract test in question is doing a rename 
> of the form: {{mv /foo /foodest}}. This actually breaks the logic which 
> tries to determine if a lock's path is contained beneath the path we're 
> trying to clean up. Specifically: cleaning up locks beneath {{/foo}} 
> incorrectly removes locks for {{/foodest}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26267) Master initialization fails if Master Region WAL dir is missing

2021-11-16 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26267.

Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the reminders to merge this, Duo. Merged this to branch-2.4, 
branch-2, and master.

> Master initialization fails if Master Region WAL dir is missing
> ---
>
> Key: HBASE-26267
> URL: https://issues.apache.org/jira/browse/HBASE-26267
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 2.4.6
>    Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.9
>
>
> From a recent branch-2.4 build:
> {noformat}
> 2021-09-07 19:31:19,666 ERROR [master/localhost:16000:becomeActiveMaster] 
> master.HMaster(159): * ABORTING master localhost,16000,1631057476442: 
> Unhandled exception. Starting shutdown. *
> java.io.FileNotFoundException: File 
> hdfs://localhost:8020/hbase-2.4-wals/MasterData/WALs does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1119)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1116)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1126)
> at 
> org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:226)
> at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:303)
> at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:839)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2189)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:512)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> If the WAL directory is missing but the Master Region already exists, we will 
> try to list the contents of the Master Region's WAL directory which may or 
> may not exist. If we simply check to make sure the directory exists first, 
> then the rest of the initialization code works as expected.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26453) [hboss] removeInMemoryLocks can remove still in-use locks

2021-11-12 Thread Josh Elser (Jira)
Josh Elser created HBASE-26453:
--

 Summary: [hboss] removeInMemoryLocks can remove still in-use locks
 Key: HBASE-26453
 URL: https://issues.apache.org/jira/browse/HBASE-26453
 Project: HBase
  Issue Type: Bug
Affects Versions: hbase-filesystem-1.0.0-alpha1
Reporter: Josh Elser
Assignee: Josh Elser


While implementing HBASE-26437, I was fighting with unit tests which just 
wouldn't complete. After adding the code change to delete the locks held by the 
{{src}} in a {{mv src dst}} operation, releasing the {{dst}} lock would claim 
that the current thread doesn't hold the lock.

After investigating, the specific contract test in question is doing a rename 
of the form: {{mv /foo /foodest}}. This actually breaks the logic which 
tries to determine if a lock's path is contained beneath the path we're trying 
to clean up. Specifically: cleaning up locks beneath {{/foo}} incorrectly 
removes locks for {{/foodest}}.
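
For illustration, the classic shape of this bug as a hedged sketch (not the
actual HBOSS code):

{noformat}
// Buggy: "/foodest".startsWith("/foo") is true, so cleaning up locks
// beneath /foo also sweeps up locks for the unrelated /foodest.
static boolean buggyIsBeneath(String lockPath, String cleanupRoot) {
  return lockPath.startsWith(cleanupRoot);
}

// Fixed: match the root itself or true descendants only, i.e. require a
// path-separator boundary immediately after the root.
static boolean isBeneath(String lockPath, String cleanupRoot) {
  return lockPath.equals(cleanupRoot) || lockPath.startsWith(cleanupRoot + "/");
}
{noformat}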



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26437) [hboss] Rename does not clean up znodes for src location

2021-11-09 Thread Josh Elser (Jira)
Josh Elser created HBASE-26437:
--

 Summary: [hboss] Rename does not clean up znodes for src location
 Key: HBASE-26437
 URL: https://issues.apache.org/jira/browse/HBASE-26437
 Project: HBase
  Issue Type: Bug
  Components: hboss
Affects Versions: hbase-filesystem-1.0.0-alpha1
Reporter: Josh Elser
Assignee: Josh Elser


We ran into a fun situation where the partition hosting ZK data was repeatedly 
filling up while heavy ExportSnapshot+clone_snapshot operations were running 
(10's of TB). The cluster was previously working just fine.

Upon investigation of the ZK tree, we found a large number of znodes beneath 
/hboss, specifically many in the corresponding ZK HBOSS path for 
$hbase.rootdir/.tmp.

Tracing back from the code, we saw that the CloneSnapshotProcedure (like 
CreateTableProcedure) will create the table filesystem layout in 
$hbase.rootdir/.tmp and then rename it into $hbase.rootdir/data/. 
However, it appears that, upon rename, HBOSS was not cleaning up the src path's 
znode. This is a bug as it allows ZK to grow unbounded (which explains why this 
problem slowly arose and not suddenly).

As a workaround, HBase can be stopped and the corresponding ZK path for 
$hbase.rootdir/.tmp can be cleaned up to reclaim 1/2 the space taken up by 
znodes for imported hbase tables (we would still have znodes for 
$hbase.rootdir/data/...)
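
As a concrete shape for that workaround, a hedged sketch using Curator (the
class name, connection string, and the /hboss/... path are assumptions;
verify the layout for your deployment before deleting anything, and only run
this with HBase stopped):

{noformat}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryOneTime;

public class TmpZnodeCleanup {
  public static void main(String[] args) throws Exception {
    CuratorFramework zk =
        CuratorFrameworkFactory.newClient("zk1:2181", new RetryOneTime(1000));
    zk.start();
    try {
      // Recursively delete the orphaned znodes mirroring $hbase.rootdir/.tmp.
      zk.delete().deletingChildrenIfNeeded().forPath("/hboss/hbase/.tmp");
    } finally {
      zk.close();
    }
  }
}
{noformat}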



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-22394) HdfsFileStatus incompatibility when used with Hadoop 3.1.x

2021-10-19 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-22394.

Resolution: Not A Bug

Like HBASE-24154, this is just "how it is" in HBase presently. The HBase PMC 
does not release multiple artifacts for both Hadoop2 and Hadoop3 support at the 
current time. Current HBase2 releases still compile against Hadoop2 by default, 
and using Hadoop 3 against HBase2 requires a recompilation of HBase because of 
incompatible changes between Hadoop2 and Hadoop3.

We may choose to publish multiple HBase artifacts (built against different 
Hadoop version) in the future, but that should start as a dev-list discussion 
as it will have lots of implications.

> HdfsFileStatus incompatibility when used with Hadoop 3.1.x
> --
>
> Key: HBASE-22394
> URL: https://issues.apache.org/jira/browse/HBASE-22394
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.1.4
>Reporter: Raymond Lau
>Priority: Major
>
> Hbase 2.1.4 works fine with Hadoop 3.0.3 but when I attempted to upgrade to 
> Hadoop 3.1.2, I get the following error in the region server:
> {noformat}
> 2019-05-10 12:49:10,303 ERROR HRegionServer - * ABORTING region server 
> [REDACTED],16020,1557506923574: Unhandled: Found interface 
> org.apache.hadoop.hdfs.protocol.HdfsFileStatus, but class was expected *
> java.lang.IncompatibleClassChangeError: Found interface 
> org.apache.hadoop.hdfs.protocol.HdfsFileStatus, but class was expected
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:768)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$400(FanOutOneBlockAsyncDFSOutputHelper.java:118)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$16.doCall(FanOutOneBlockAsyncDFSOutputHelper.java:848)
> {noformat}
> Hadoop 3.1.1+ is listed as compatible with Hbase 2.1.x at 
> [https://hbase.apache.org/book.html#basic.prerequisites].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26350) Missing server side debugging on failed SASL handshake

2021-10-11 Thread Josh Elser (Jira)
Josh Elser created HBASE-26350:
--

 Summary: Missing server side debugging on failed SASL handshake
 Key: HBASE-26350
 URL: https://issues.apache.org/jira/browse/HBASE-26350
 Project: HBase
  Issue Type: Bug
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.5.0, 3.0.0-alpha-2, 2.3.7, 2.4.8


In trying to debug some problems with the pluggable authentication, I noticed 
that we are eating the IOException without logging it (at any level) in 
ServerRpcConnection.

This makes it super hard to debug when that pluggable interface has a problem 
because the context gets lost (clients just get a pretty useless DNRIOE).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25900) HBoss tests compile/failure against Hadoop 3.3.1

2021-10-05 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-25900.

Fix Version/s: hbase-filesystem-1.0.0-alpha2
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks Steve and Peter for the help along the way.

> HBoss tests compile/failure against Hadoop 3.3.1
> 
>
> Key: HBASE-25900
> URL: https://issues.apache.org/jira/browse/HBASE-25900
> Project: HBase
>  Issue Type: Bug
>  Components: Filesystem Integration
>Affects Versions: 1.0.2
>Reporter: Steve Loughran
>Assignee: Josh Elser
>Priority: Major
> Fix For: hbase-filesystem-1.0.0-alpha2
>
>
> Changes in Hadoop 3.3.x stop the tests compiling/working. 
> * changes in signature of nominally private classes (HADOOP-17497): fix, 
> update
> * HADOOP-16721 - s3a rename throwing more exceptions, but no longer failing 
> if the dest parent doesn't exist. Fix: change s3a.xml
> * HADOOP-17531/HADOOP-17620 distcp moving to listIterator; test failures. 
> * HADOOP-13327: tests on syncable which expect files being written to to be 
> visible. Fix: skip that test
> The fix for HADOOP-17497 stops this compiling against Hadoop < 3.3.1. This is 
> unfortunate but I can't see an easy fix. The new signature takes a parameters 
> class, so we can (and already are) adding new config options without breaking 
> this signature again. And I've tagged it as LimitedPrivate so that future 
> developers will know it's used here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-26277) Revert 26240, Apply InterfaceAudience.Private to BalanceResponse$Builder

2021-09-10 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26277.

Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the quick fix-up, Bryan!

> Revert 26240, Apply InterfaceAudience.Private to BalanceResponse$Builder
> 
>
> Key: HBASE-26277
> URL: https://issues.apache.org/jira/browse/HBASE-26277
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26267) Master initialization fails if Master Region WAL dir is missing

2021-09-08 Thread Josh Elser (Jira)
Josh Elser created HBASE-26267:
--

 Summary: Master initialization fails if Master Region WAL dir is 
missing
 Key: HBASE-26267
 URL: https://issues.apache.org/jira/browse/HBASE-26267
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 2.4.6
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.7


From a recent branch-2.4 build:

{noformat}
2021-09-07 19:31:19,666 ERROR [master/localhost:16000:becomeActiveMaster] 
master.HMaster(159): * ABORTING master localhost,16000,1631057476442: 
Unhandled exception. Starting shutdown. *
java.io.FileNotFoundException: File 
hdfs://localhost:8020/hbase-2.4-wals/MasterData/WALs does not exist.
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1059)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1119)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1116)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1126)
at 
org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:226)
at 
org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:303)
at 
org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:839)
at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2189)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:512)
at java.lang.Thread.run(Thread.java:748)
{noformat}

If the WAL directory is missing but the Master Region already exists, we will 
try to list the contents of the Master Region's WAL directory which may or may 
not exist. If we simply check to make sure the directory exists first, then the 
rest of the initialization code works as expected.
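
A sketch of the guard described (the fix's shape only, not the committed
patch; the walFs/walDir naming is illustrative):

{noformat}
// Only list the Master Region WAL directory when it actually exists;
// otherwise fall through and let the directory be created as usual.
Path walDir = new Path(walRootDir, "MasterData/WALs");
if (walFs.exists(walDir)) {
  for (FileStatus status : walFs.listStatus(walDir)) {
    // ... replay/clean up existing WAL files ...
  }
}
{noformat}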



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-26147) Add dry run mode to hbase balancer

2021-09-01 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26147.

Resolution: Fixed

Thanks for the excellent work, Bryan. This really came together to be a great 
change. I took the liberty of writing some release notes – please feel free to 
update them as you see fit.

Thanks Duo, Nick, and everyone else who helped out in reviews.

My apologies that I botched application of the branch-2 PR (putting my own 
email address instead of Bryan's). I reverted my original commit and re-applied 
it with correct metadata. Sorry for sullying the commit log. 

> Add dry run mode to hbase balancer
> --
>
> Key: HBASE-26147
> URL: https://issues.apache.org/jira/browse/HBASE-26147
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer, master
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> It's often rather hard to know how the cost function changes you're making 
> will affect the balance of the cluster, and currently the only way to know is 
> to run it. If the cost decisions are not good, you may have just moved many 
> regions towards a non-ideal balance. Region moves themselves are not free for 
> clients, and the resulting balance may cause a regression.
> We should add a mode to the balancer so that it can be invoked without 
> actually executing any plans. This will allow an administrator to iterate on 
> their cost functions and use the balancer's logging to see how their changes 
> would affect the cluster. 
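
To make the dry-run idea concrete, the gate could look roughly like this (a 
sketch with invented names -- costFunction, applyHypothetically, dryRun -- and 
not the API that actually landed):

{noformat}
// Illustrative only: compute plans, report the cost delta, and skip
// execution when dryRun is requested.
List<RegionPlan> plans = balancer.balanceCluster(clusterState);
double costBefore = costFunction.cost(clusterState);
double costAfter = costFunction.cost(applyHypothetically(clusterState, plans));
LOG.info("Balancer would move {} regions, cost {} -> {}",
    plans.size(), costBefore, costAfter);
if (!dryRun) {
  for (RegionPlan plan : plans) {
    assignmentManager.moveAsync(plan);
  }
}
{noformat}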



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26236) Simple travis build for hbase-filesystem

2021-08-29 Thread Josh Elser (Jira)
Josh Elser created HBASE-26236:
--

 Summary: Simple travis build for hbase-filesystem
 Key: HBASE-26236
 URL: https://issues.apache.org/jira/browse/HBASE-26236
 Project: HBase
  Issue Type: Improvement
  Components: hboss
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: hbase-filesystem-1.0.0-alpha2


Noticed that we don't have any kind of precommit checks. Time to make a quick 
one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26212) Allow AuthUtil automatic renewal to be disabled

2021-08-20 Thread Josh Elser (Jira)
Josh Elser created HBASE-26212:
--

 Summary: Allow AuthUtil automatic renewal to be disabled
 Key: HBASE-26212
 URL: https://issues.apache.org/jira/browse/HBASE-26212
 Project: HBase
  Issue Type: Improvement
  Components: Client, security
Reporter: Josh Elser
Assignee: Josh Elser


Talking with [~bbende] who was looking at some "spam" in the NiFi log where 
AuthUtil was complaining that it couldn't renew the UGI. This did not cause 
him problems (NiFi could always read/write to HBase), but it generated a lot of 
noise in the NiFi log.

NiFi is special in that it's managing renewals on its own (for all services it 
can communicate with), rather than letting each client do it on its own. 
Specifically, one way they do this is by doing a keytab-based login via JAAS, 
constructing a UGI object from that JAAS login, and then invoking HBase in a 
normal UGI.doAs().

The problem comes in that AuthUtil _thinks_ that it is capable of renewing this 
UGI instance on its own. AuthUtil can determine that the current UGI came from 
a keytab, and thus thinks that it can renew it. However, this actually fails 
because the LoginContext inside UGI *isn't* actually something that UGI can 
renew (remember: because NiFi did it directly via JAAS and not via UGI)
{noformat}
2021-08-19 17:32:19,438 ERROR [Relogin service.Chore.1] 
org.apache.hadoop.hbase.AuthUtil Got exception while trying to refresh 
credentials: loginUserFromKeyTab must be done first
java.io.IOException: loginUserFromKeyTab must be done first
at 
org.apache.hadoop.security.UserGroupInformation.reloginFromKeytab(UserGroupInformation.java:1194)
at 
org.apache.hadoop.security.UserGroupInformation.checkTGTAndReloginFromKeytab(UserGroupInformation.java:1125)
at org.apache.hadoop.hbase.AuthUtil$1.chore(AuthUtil.java:206) 
{noformat}
After talking with Bryan about this: we don't see a good way for HBase to 
detect this specific "A UGI instance, but not created by UGI" case because the 
LoginContext inside UGI is private. It is great that AuthUtil will 
automatically try to renew keytab logins, even if not using 
{{hbase.client.keytab.file}} and {{hbase.client.keytab.principal}}, so I don't 
want to break that functionality.

NiFi is unique in this case that it is fully managing the renewals, so I think 
the best path forward is to add an option which lets NiFi disable AuthUtil 
since it knows it can safely do this. This should not affect any other users (but 
also give us an option if AuthUtil ever does cause problems).
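
For illustration, the NiFi-style login pattern described above is roughly the 
following; "HBaseClient" is a made-up JAAS entry name, conf is assumed to be an 
HBase Configuration, and the wiring is a sketch rather than NiFi's actual code:

{noformat}
import java.security.PrivilegedExceptionAction;
import javax.security.auth.login.LoginContext;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.security.UserGroupInformation;

// JAAS-first login, producing a UGI that UGI itself cannot later renew.
LoginContext login = new LoginContext("HBaseClient");
login.login();
UserGroupInformation ugi =
    UserGroupInformation.getUGIFromSubject(login.getSubject());
ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
  try (Connection conn = ConnectionFactory.createConnection(conf)) {
    // Normal HBase calls here. AuthUtil sees a keytab-backed UGI, but
    // UGI.reloginFromKeytab() fails because UGI did not do the login itself.
  }
  return null;
});
{noformat}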



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26165) 2.3.5 listed on website downloads page but row intends to be for 2.3.6

2021-08-02 Thread Josh Elser (Jira)
Josh Elser created HBASE-26165:
--

 Summary: 2.3.5 listed on website downloads page but row intends to 
be for 2.3.6
 Key: HBASE-26165
 URL: https://issues.apache.org/jira/browse/HBASE-26165
 Project: HBase
  Issue Type: Task
  Components: website
Reporter: Josh Elser
Assignee: Josh Elser


Typo on downloads.html. Row is for 2.3.6 but still says 2.3.5.

Missed in HBASE-26162. PR coming.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26164) Update dependencies in hbase-filesystem

2021-08-02 Thread Josh Elser (Jira)
Josh Elser created HBASE-26164:
--

 Summary: Update dependencies in hbase-filesystem
 Key: HBASE-26164
 URL: https://issues.apache.org/jira/browse/HBASE-26164
 Project: HBase
  Issue Type: Task
  Components: hboss
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: hbase-filesystem-1.0.0-alpha2


hbase-filesystem still has some old dependencies. Notably, aws-java-sdk is at 
1.11.525 whereas Hadoop is all the way up at 1.11.1026.

We're also still building HBase 2 against 2.1.4 instead of anything newer. Bump 
up the relevant dependencies to something more current and make sure the code 
still works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions

2021-05-26 Thread Josh Elser

Thanks Stack! (access given, as google probably told you already).

Please keep me honest.

On 5/26/21 12:29 PM, Stack wrote:

And, what is there currently is a nice write-up
S

On Wed, May 26, 2021 at 9:26 AM Stack  wrote:


Can I have comment access please Josh?
S

On Tue, May 25, 2021 at 8:24 PM Josh Elser  wrote:


Hi folks,

This is a follow-on for the HBASE-24749 discussion on storefile
tracking, specifically focusing on where/how do we store the list of
files for each Store.

I tried to capture my thoughts and the suggestions by Duo and Wellington
in this google doc [1].

Please feel free to ask for edit permission (and send me a note if your
email address isn't one that I would otherwise recognize :) ) to
correct, improve, or expand on any other sections.

FWIW, I was initially not super excited about a per-Store file, but, the
more I think about it, the more I'm coming around to that idea. I think
it will be more "exception-handling", but avoid the long-term
operational burden of yet-another-important-system-table.

- Josh

[1]

https://docs.google.com/document/d/1yzjvQvQfnT-M8ZgKdcQNedF8HssTnQR2loPkZtlJGVg/edit?usp=sharing







[DISCUSS] Breakout discussion on storefile tracking storage solutions

2021-05-25 Thread Josh Elser

Hi folks,

This is a follow-on for the HBASE-24749 discussion on storefile 
tracking, specifically focusing on where/how do we store the list of 
files for each Store.


I tried to capture my thoughts and the suggestions by Duo and Wellington 
in this google doc [1].


Please feel free to ask for edit permission (and send me a note if your 
email address isn't one that I would otherwise recognize :) ) to 
correct, improve, or expand on any other sections.


FWIW, I was initially not super excited about a per-Store file, but, the 
more I think about it, the more I'm coming around to that idea. I think 
it will be more "exception-handling", but avoid the long-term 
operational burden of yet-another-important-system-table.


- Josh

[1] 
https://docs.google.com/document/d/1yzjvQvQfnT-M8ZgKdcQNedF8HssTnQR2loPkZtlJGVg/edit?usp=sharing


Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-25 Thread Josh Elser

Coming full circle on the "makes me worry" comment I left:

I asked the question in work channels about my concern and SteveL did 
confirm that the "S3 strong consistency" feature does apply generally to 
CRUD operations.


I believe this means, if we assume there is exactly one RegionServer 
which is hosting a Region at one time, that one RegionServer is capable 
of ensuring that the gaps which do exist in S3 are a non-issue (without 
the need for an HBOSS-like solution).


Taking the suggestion of a file-per-store which enumerates the committed 
files: the RegionServer can make sure that operations which concurrently 
want to update that file are exclusive, e.g. a bulk load, a memstore 
flush, a compaction commit.
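
As a sketch of that exclusivity (all names hypothetical, and assuming S3's 
strong read-after-write consistency for the newly written object):

```
// Assumes the usual java.io/java.util and org.apache.hadoop.fs.Path imports.
// Every operation that changes the committed-file set (bulk load, flush,
// compaction commit) rewrites the manifest under one lock.
private final Object manifestLock = new Object();

void commitFiles(List<Path> added, List<Path> removed) throws IOException {
  synchronized (manifestLock) {
    Set<Path> current = readManifest();  // hypothetical: read latest manifest
    current.addAll(added);
    current.removeAll(removed);
    writeManifest(current);              // hypothetical: PUT a complete manifest
  }
}
```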


On my plate today is to incorporate this into a design doc specifically 
for storefile metadata (from the other message in this broader thread)


On 5/24/21 1:39 PM, Josh Elser wrote:
I got pulled into a call with some folks from S3 at the last minute late 
week.


There was a comment made in passing about reading the latest, written 
version of a file. At the moment, I didn't want to digress into that 
because of immutable HFiles. However, if we're tracking files-per-store 
in a file, that makes me worry.


To the nice digging both Duo and Andrew have shared here already and 
Nick's point about design, I definitely think stating what we expect and 
mapping that to the "platforms" which provide that "today" (as we know 
each will change) is the only way to insulate ourselves. The Hadoop FS 
contract tests are also a great thing we can adopt.


On 5/21/21 9:53 PM, 张铎(Duo Zhang) wrote:

So maybe we could introduce a .hfilelist directory, and put the hfilelist
files under this directory, so we do not need to list all the files under
the region directory.

And considering the possible implementation for typical object storages,
listing the last directory on the whole path will be less expensive.

Andrew Purtell  于2021年5月22日周六 上午9:35 写道:

On May 21, 2021, at 6:07 PM, 张铎  wrote:

Since we just make use of the general FileSystem API to do listing, is it
possible to make use of 'bucket index listing'?

Yes, those words mean the same thing.

Andrew Purtell  于2021年5月22日周六 上午6:34写道:

On May 20, 2021, at 4:00 AM, Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

IMO it should be a file per store.
Per region is not suitable here as compaction is per store.
Per file means we still need to list all the files. And usually, after
compaction, we need to do an atomic operation to remove several old files
and add a new file, or even several files for stripe compaction. It will
be easy if we just write one file to commit these changes.

Fine for me if it's simpler. Mentioned the per file approach because I
thought it could be easier/faster to do that, rather than having to update
the store file list on every flush. AFAIK, append is off the table, so
updating this file would mean reading it, writing the original content
plus the new hfile to a temp file, deleting the original file, then
renaming it.

That sounds right to me.

A minor potential optimization is the filename could have a timestamp
component, so a bucket index listing at that path would pick up a list
including the latest, and the latest would be used as the manifest of
valid store files. The cloud object store is expected to provide an atomic
listing semantic where the file is written and closed and only then is it
visible, and it is visible at once to everyone. (I think this is available
on most.) Old manifest file versions could be lazily deleted.

Em qui., 20 de mai. de 2021 às 02:57, 张铎(Duo Zhang) <palomino...@gmail.com>
escreveu:

IIRC S3 is the only object storage which does not guarantee
read-after-write consistency in the past...

This is the quick result after googling:

AWS [1]

Amazon S3 delivers strong read-after-write consistency automatically for
all applications.

Azure [2]

Azure Storage was designed to embrace a strong consistency model that
guarantees that after the service performs an insert or update operation,
subsequent read operations return the latest update.

Aliyun [3]

A feature requires that object operations in OSS be atomic, which
indicates that operations can only either succeed or fail without
intermediate states. To ensure that users can access only complete data,
OSS does not return corrupted or partial data.

Object operations in OSS are highly consistent. For example, when a user
receives an upload (PUT) success response, the uploaded object can be read
immediately, and copies of the object are written to multiple devices for
redundancy. Therefore, the situations where data is not obtained when you
perform the read-after-write operation do not exist. The same is true for
delete operations. After you delete an object, the object and its copies
no longer exist.

GCP [4]

Cloud Storage provides st

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-24 Thread Josh Elser
Without completely opening Pandora's box, I will say we definitely have 
multiple ways we can solve the metadata management for tracking (e.g. in 
meta, in some other system table, in some other system, in a per-store 
file). Each of them have pro's and con's, and each of them has "favor" 
as to what pain we've most recently felt as a project.


I don't want to defer having the discussion on what the "correct" one 
should be, but I do want to point out that it's only half of the problem 
of storefile tracking.


My hope is that we can make this tracking system be pluggable, such that 
we can prototype a solution that works "good enough" for now and enables 
the rest of the development work to keep moving forward.
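
As a strawman of what "pluggable" could mean here (interface and method names 
invented purely for illustration):

```
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import org.apache.hadoop.hbase.regionserver.StoreFileInfo;

// Hypothetical plug point: implementations could keep the list in a
// per-store file, in hbase:meta, or in another system table.
public interface StoreFileListTracker {
  /** Returns the committed store files for this store. */
  List<StoreFileInfo> load() throws IOException;

  /** Atomically swaps removed files for added files. */
  void replace(Collection<StoreFileInfo> added,
      Collection<StoreFileInfo> removed) throws IOException;
}
```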


I'm happy to see so many other folks also interested in the design of 
how we store this.


Could I suggest we move this discussion around the metadata storage into 
its own thread? If Duo doesn't already have a design doc started, I can 
also try to put one together this week.


Does that work for you all?

On 5/22/21 11:02 AM, 张铎(Duo Zhang) wrote:

I could put up a simple design doc for this.

But there is still a problem, about how to do rolling upgrading.

After we changed the behavior, the region server will write partial store
files directly into the data directory. For new region servers, this is not
a problem, as we will read the hfilelist file to find out the valid store
files.
But when rolling upgrading, we can not upgrade all the regionservers at
once, for old regionservers, they will initialize a store by listing the
store files, so if a new regionserver crashes when compacting and its
regions are assigned to old regionservers, the old regionservers will be in
trouble...

Stack  于2021年5月22日周六 下午12:14写道:


HBASE-24749 design and implementation had acknowledged compromises on
review: e.g. adding a new 'system table' to hold store files.  I'd suggest
the design and implementation need a revisit before we go forward; for
instance, factoring for systems other than s3 as suggested above (I like
the Duo list).

S

On Wed, May 19, 2021 at 8:19 AM 张铎(Duo Zhang) 
wrote:


What about just storing the hfile list in a file? Since now S3 has strong
consistency, we could safely overwrite a file then I think?

And since the hfile list file will be very small, renaming will not be a
big problem.

We could write the hfile list to a file called 'hfile.list.tmp', and then
rename it to 'hfile.list'.

This is safe for HDFS, and for S3, since it is not atomic, maybe we could
face that, the 'hfile.list' file is not there, but there is a
'hfile.list.tmp'.

So when opening a HStore, we first check if 'hfile.list' is there, if not,
try 'hfile.list.tmp', rename it and load it. For safety, we could write an
initial hfile list file with no hfiles. So if we can not load either
'hfile.list' or 'hfile.list.tmp', then we know something is wrong so users
should try to fix it with HBCK.
And in HBCK, we will do a listing and generate the 'hfile.list' file.

WDYT?

Thanks.
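
[A sketch of that open-time recovery, with file names as in the proposal and 
helper methods invented:

```
// Assumes org.apache.hadoop.fs.{FileSystem, Path} and java.io.IOException.
// Illustrative recovery when opening an HStore.
Path list = new Path(storeDir, "hfile.list");
Path tmp = new Path(storeDir, "hfile.list.tmp");
if (fs.exists(list)) {
  return loadHFileList(fs, list);
} else if (fs.exists(tmp)) {
  // We crashed between writing the tmp file and the rename.
  fs.rename(tmp, list);
  return loadHFileList(fs, list);
} else {
  // An initial, empty hfile.list is written at store creation, so reaching
  // here means something is wrong; punt to HBCK.
  throw new IOException("No hfile.list or hfile.list.tmp under " + storeDir);
}
```
]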

Wellington Chevreuil  于2021年5月19日周三 下午10:43写道:

Thank you, Andrew and Duo,

Talking internally with Josh Elser, initial idea was to rebase the feature
branch with master (in order to catch up with latest commits), then focus
on work to have a minimal functioning hbase, in other words, together with
the already committed work from HBASE-25391, make sure flush, compactions,
splits and merges all can take advantage of the persistent store file
manager and complete with no need to rely on renames. These all map to the
subtasks HBASE-25391, HBASE-25392 and HBASE-25393. Once we could test and
validate this works well for our goals, we can then focus on snapshots,
bulkloading and tooling.

S3 now supports strong consistency, and I heard that they are also
implementing atomic renaming currently, so maybe that's one of the reasons
why the development is silent now..

Interesting, I had no idea this was being implemented. I know, however, a
version of this feature is already available on latest EMR releases (at
least from 6.2.0), and AWS team has published their own blog post with
their results:

https://aws.amazon.com/blogs/big-data/amazon-emr-6-2-0-adds-persistent-hfile-tracking-to-improve-performance-with-hbase-on-amazon-s3/

But I do not think store hfile list in meta is the only solution. It will
cause cyclic dependencies for hbase:meta, and then force us to have a
fallback solution which makes the code a bit ugly. We should try to see if
this could be done with only the FileSystem.

This is indeed a relevant concern. One idea I had mentioned in the original
design doc was to track committed/non-committed files through xattr (or
tags), which may have its own performance issues as explained by Stephen
Wu, but is something that could be attempted.

Em qua., 19 de mai. de 2021 às 04:56, 张铎(Duo Zhang) <palomino...@gmail.com>
escreveu:

S3 now supports strong consistency, and I hear

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-24 Thread Josh Elser
rk on 3 or 5 major public cloud blobstore products as well as a smattering
of on-prem technologies, we should be selective about what features we
choose to rely on as foundational to our implementation.

Or we are explicitly saying this will only work on S3 and we'll only
support other services when they can achieve this level of compatibility.

Either way, we should be clear and up-front about what semantics we demand.

Implementing some kind of a test harness that can check compatibility would
help here, a similar effort to that of defining standard behaviors of HDFS
implementations.

I love this discussion :)

And since the hfile list file will be very small, renaming will not be a
big problem.

Would this be a file per store? A file per region? Ah. Below you imply it's
per store.

Wellington Chevreuil  于2021年5月19日周三 下午10:43写道:

Thank you, Andrew and Duo,

Talking internally with Josh Elser, initial idea was to rebase the feature
branch with master (in order to catch up with latest commits), then focus
on work to have a minimal functioning hbase, in other words, together with
the already committed work from HBASE-25391, make sure flush, compactions,
splits and merges all can take advantage of the persistent store file
manager and complete with no need to rely on renames. These all map to the
subtasks HBASE-25391, HBASE-25392 and HBASE-25393. Once we could test and
validate this works well for our goals, we can then focus on snapshots,
bulkloading and tooling.

S3 now supports strong consistency, and I heard that they are also
implementing atomic renaming currently, so maybe that's one of the reasons
why the development is silent now..

Interesting, I had no idea this was being implemented. I know, however, a
version of this feature is already available on latest EMR releases (at
least from 6.2.0), and AWS team has published their own blog post with
their results:

https://aws.amazon.com/blogs/big-data/amazon-emr-6-2-0-adds-persistent-hfile-tracking-to-improve-performance-with-hbase-on-amazon-s3/

But I do not think store hfile list in meta is the only solution. It will
cause cyclic dependencies for hbase:meta, and then force us to have a
fallback solution which makes the code a bit ugly. We should try to see if
this could be done with only the FileSystem.

This is indeed a relevant concern. One idea I had mentioned in the original
design doc was to track committed/non-committed files through xattr (or
tags), which may have its own performance issues as explained by Stephen
Wu, but is something that could be attempted.

Em qua., 19 de mai. de 2021 às 04:56, 张铎(Duo Zhang) <palomino...@gmail.com>
escreveu:

S3 now supports strong consistency, and I heard that they are also
implementing atomic renaming currently, so maybe that's one of the reasons
why the development is silent now...

For me, I also think deploying hbase on cloud storage is the future, so I
would also like to participate here.

But I do not think store hfile list in meta is the only solution. It will
cause cyclic dependencies for hbase:meta, and then force us to have a
fallback solution which makes the code a bit ugly. We should try to see if
this could be done with only the FileSystem.

Thanks.

Andrew Purtell  于2021年5月19日周三 上午8:04写道:

Wellington (and et. al),

S3 is also an important piece of our future production plans.
Unfortunately, we were unable to assist much with last year's work, on
account of being sidetracked by more immediate concerns. Fortunately, this
renewed interest is timely in that we have an HBase 2 project where, if
this can land in a 2.5 or a 2.6, it could be an important cost to serve
optimization, and one we could and would make use of. Therefore I would
like to restate my employer's interest in this work too. It may just be
Viraj and myself in the early days.

I'm not sure how best to collaborate. We could review changes from the
original authors, new changes, and/or divide up the development tasks. We
can certainly offer our time for testing, and can afford the costs of
testing against the S3 service.

On Tue, May 18, 2021 at 12:16 PM Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

Greetings everyone,

HBASE-24749 has been proposed almost a year ago, introducing a new
StoreFile tracker as a way to allow for any hbase hfile modifications to
be safely completed without needing a file system rename. This seems
pretty relevant for deployments over S3 file systems, where rename
operations are not atomic and can have a performance degradation when
multiple requests get concurrently submitted to the same bucket. We had
done superficial tests and ycsb runs, where individual renames of files
larger than 5GB can take a few hundreds of seconds to complete. We also
observed impacts in write loads throughput, the bottleneck potentially
being the renames.

With S3 bein

Re: [ANNOUNCE] New HBase committer Geoffrey Jacoby

2021-04-09 Thread Josh Elser

Congrats and well-deserved, Geoffrey!

On 4/9/21 12:47 PM, Bharath Vissapragada wrote:

Congrats Geoffrey.

On Fri, Apr 9, 2021 at 9:25 AM Tak-Lon (Stephen) Wu 
wrote:


Congrats Geoffrey !

-Stephen

On Fri, Apr 9, 2021 at 8:53 AM Rushabh Shah
 wrote:


Congratulations Geoffrey !


Rushabh Shah

Software Engineering LMTS | Salesforce
Mobile: 213 422 9052



On Fri, Apr 9, 2021 at 4:24 AM Viraj Jasani  wrote:


On behalf of the Apache HBase PMC I am pleased to announce that

Geoffrey

Jacoby has accepted the PMC's invitation to become a committer on the
project.

Thanks so much for the work you've been contributing. We look forward

to

your continued involvement.

Congratulations and welcome, Geoffrey!







[jira] [Resolved] (HBASE-22078) corrupted procs in proc WAL

2021-03-31 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-22078.

Resolution: Incomplete

Goin' looking at missing stacks for an instance I've just run into. Came across 
this -- expect it to not go anywhere after 2 years and no logs.

> corrupted procs in proc WAL
> ---
>
> Key: HBASE-22078
> URL: https://issues.apache.org/jira/browse/HBASE-22078
> Project: HBase
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> Not sure what the root cause is... there are ~500 proc wal files (I actually 
> wonder if cleanup is also blocked by this, since I see these lines on master 
> restart, do WALs with abandoned procedures like that get deleted?).
> {noformat}
> 2019-03-20 07:37:53,212 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7571, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7600, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7610, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7631, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7650, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7651, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7657, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7683, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> {noformat}
> Followed by 
> {noformat}
> 2019-03-20 07:37:53,751 ERROR [master/...:17000:becomeActiveMaster] 
> procedure2.ProcedureExecutor: Corrupt pid=66829, 
> state=WAITING:DISABLE_TABLE_ADD_REPLICATION_BARRIER, hasLock=false; 
> DisableTableProcedure table=...
> {noformat}
> And 1000s of child procedures and grandchild procedures of this procedure.
> I think this area needs general review... we should have a record for the 
> procedure durably persisted before we create any child procedures, so I'm not 
> sure how this could happen. Actually, I also wonder why we even have separate 
> proc WAL when HBase already has a working WAL that's more or less time 
> tested... 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25712) Port failure to close InputStream to 1.x

2021-03-29 Thread Josh Elser (Jira)
Josh Elser created HBASE-25712:
--

 Summary: Port failure to close InputStream to 1.x
 Key: HBASE-25712
 URL: https://issues.apache.org/jira/browse/HBASE-25712
 Project: HBase
  Issue Type: Sub-task
Reporter: Josh Elser
Assignee: Josh Elser


Port the parent issue (replication not closing a socket) to branch-1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket

2021-03-24 Thread Josh Elser (Jira)
Josh Elser created HBASE-25692:
--

 Summary: Failure to instantiate WALCellCodec leaks socket
 Key: HBASE-25692
 URL: https://issues.apache.org/jira/browse/HBASE-25692
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.2, 2.4.1, 2.3.4, 2.3.2, 2.2.6, 2.2.5, 2.4.0, 2.2.4, 
2.1.9, 2.3.3, 2.2.3, 2.1.8, 2.2.2, 2.1.7, 2.1.6, 2.2.1, 2.1.5, 2.0.6, 2.1.4, 
2.3.1, 2.3.0, 2.1.3, 2.1.2, 2.1.1, 2.2.0, 2.1.0
Reporter: Josh Elser
Assignee: Josh Elser


I was looking at an HBase user's cluster with [~danilocop] where they saw two 
otherwise identical clusters, one of which regularly had sockets in CLOSE_WAIT 
going from RegionServers to a distributed storage appliance.

After a lot of analysis, we eventually figured out that these sockets in 
CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to 
close inside of the RegionServer. The subtlety was that only one of these HBase 
clusters was set up to do replication (to the other cluster). The HBase cluster 
experiencing this problem was shipping edits to a peer, and had previously been 
using Phoenix. At some point, the cluster had Phoenix removed from it.

What we found was that replication still had WALs to ship which were for 
Phoenix tables. Phoenix, in this version, still used the custom WALCellCodec; 
however, this codec class was missing from the RS classpath after the owner of 
the cluster removed Phoenix.

When we try to instantiate the Codec implementation via ReflectionUtils, we end 
up throwing an UnsupportedOperationException which wraps the underlying 
ClassNotFoundException. However, in WALFactory, we _only_ close the 
FSDataInputStream when we catch an IOException. 

Thus, replication sits in a "fast" loop, trying to ship these edits, each time 
leaking a new socket because of the InputStream not being closed. There is an 
obvious workaround for this specific issue, but we should not leak this inside 
HBase.
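
The shape of the fix, sketched (not the exact patch): tie the stream's 
lifetime to the success of the codec instantiation rather than to one 
exception type.

{noformat}
// Illustrative: close the stream on any failure path, not just IOException.
FSDataInputStream stream = fs.open(path);
boolean ok = false;
try {
  WALCellCodec codec = WALCellCodec.create(conf, cellCodecClsName, compression);
  ok = true;
} finally {
  if (!ok) {
    stream.close();
  }
}
{noformat}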



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Updating the 'stable' pointer to 2.4.2

2021-03-18 Thread Josh Elser
Fun timing, as we've been trying to lift ourselves off of 2.2 and into 
2.3. Our Norbert has been hard at work with these efforts.


It's important to me that we do the best we can to stay in line with the 
rest of you doing great work closer to the tip.


While I go looking at the changelog on my own, any "notable" things from 
the previous 2.x version to go to 2.4 that would jump out? I know we 
have the procedure store moving into a region (rolling upgrade, forward 
only). The book doesn't have any 2.4 upgrade considerations.


Anything else I (or our operators) should read/care about?

On 3/17/21 4:48 PM, Andrew Purtell wrote:

I would like to propose we update the 'stable' release pointer, currently
pointing at 2.3.4, to 2.4.2.

In my testing with aggressive chaos and ITBLL (though unfortunately, due to
resource constraints, only in small cluster settings of approximately 10
nodes), 2.4.2 is very stable.

Our sister project Phoenix has updated their build system to support
building against 2.4.1 and later, and the stability of their unit and
integration test suite is not impacted by any known HBase issue.

If there is other criteria that should be considered, I'd like for us to
discuss it. Does there need to be public acknowledgement of a production
user? At scale? (How would we know?) Would you like me to attempt an
at-scale test? On the order of 100 nodes might be possible? If so, what
should be the test scenario and criteria for success? What distinguishes
2.3.x (2.3.4) from 2.4.x (2.4.2) at this point? What would be the area(s)
of concern with respect to moving the stable pointer forward?



[jira] [Created] (HBASE-25601) Remove search hadoop references in book

2021-02-23 Thread Josh Elser (Jira)
Josh Elser created HBASE-25601:
--

 Summary: Remove search hadoop references in book
 Key: HBASE-25601
 URL: https://issues.apache.org/jira/browse/HBASE-25601
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Josh Elser
Assignee: Josh Elser


Remove references to this newly-owned domain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25449) 'dfs.client.read.shortcircuit' should not be set in hbase-default.xml

2021-01-08 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-25449.

Hadoop Flags: Reviewed
Release Note: The presence of HDFS short-circuit read configuration 
properties in hbase-default.xml inadvertently causes short-circuit reads to not 
happen inside of RegionServers, despite short-circuit reads being enabled in 
hdfs-site.xml.
  Resolution: Fixed

Thanks for a great fix (and test), [~shenshengli]!

> 'dfs.client.read.shortcircuit' should not be set in hbase-default.xml
> -
>
> Key: HBASE-25449
> URL: https://issues.apache.org/jira/browse/HBASE-25449
> Project: HBase
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 2.0.1
>Reporter: shenshengli
>Assignee: shenshengli
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.1, 2.3.5
>
>
> I think this parameter is not suitable for hbase-default.xml. In this case, 
> even when HDFS explicitly sets "dfs.client.read.shortcircuit=true" and HBase 
> relies on the HDFS configuration, the parameter in the HBase service is still 
> false. It must be explicitly set to "dfs.client.read.shortcircuit=true" in 
> hbase-site.xml to take effect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Clarifying guidance around Signed-off-by in commit messages

2020-12-01 Thread Josh Elser
The ask of this thread was originally to change the semantics of 
"signed-off-by" to only be "A committer who gave an explicit +1". That 
was the ask from a community member and why I started this.


I want to tease this apart from the "reviewed-by" suggestion, as this 
obviously needs a little more polishing. Specifically, this would change 
what you had stated as what we "don't care about" today -- a committer 
or community member can (today) be listed as the target of a Signed-off-by.


Are you OK with just the change of who can be listed in Signed-off-by, 
Nick, while we continue to circle around on how to ensure contributor 
reviews recognize merit? I would agree that we need to be sure that we 
have the tooling in place to ensure that merit is assigned equally 
(especially in a pull-request-centric world we now live in).


On 11/30/20 2:16 PM, Nick Dimiduk wrote:

Nice discussion here.

For my part, I am +1 for our community to define our meaning around this
aspect of metadata.

However, I don't like using both "signed-off-by" and "reviewed-by" as a
manual annotation on the part of the committer, because we as a community
don't care about the distinction between a committer and a community member
in this context. I think that enforcing correct usage of two metadata
annotations, without automation, will be error-prone. If we had some plan
to make use of this metadata, maybe it's worth it, but so far I see no
concrete plan to use this information. So why increase the burden on
committers?

On Sun, Nov 22, 2020 at 11:41 PM Yu Li  wrote:


TL;DR: +1 for document rules / guidance of review trailers in commit
message, and +1 for continuing using the signed-off-by message for
"reviewed by" and/or "co-authored-by" semantic (committers only), adding
explicit preamble in the "Git best practice" chapter in our hbase book [1].

I did some research around signed-off-by [2] [3], reviewed-by [3] and
co-authored-by [4], and would like to share my thoughts here:

1. We have been using signed-off-by as the "reviewed by" and/or
"co-authored by" semantic for a long time, starting from the review-board
era (long before github PR).
2. I second that our usage of signed-off-by is a bit of a perversion of the
original [2], thus adding preamble as clarification is necessary.
3. Git offers a signed-off-by switch (-s/--signoff) while no reviewed-by or
co-authored-by support yet, so we need to manually type the message if
choose to use Reviewed-by or Co-authored-by trailers, which means
additional efforts.
4. Based on #3, I suggest that contributors / committers are free but not
required to add "Reviewed-by" and / or "Co-authored-by" trailers manually.
5. Regarding recognizing the review efforts of (new) non-committer
contributors, I suggest we use the Github search [5] (and the commit
efforts as well [6]).

Best Regards,
Yu

[1] http://hbase.apache.org/book.html#git.best.practices
[2]

https://stackoverflow.com/questions/1962094/what-is-the-sign-off-feature-in-git-for
[3] https://wiki.samba.org/index.php/CodeReview#commit_message_tags
[4]

https://docs.github.com/en/free-pro-team@latest/github/committing-changes-to-your-project/creating-a-commit-with-multiple-authors
[5] https://github.com/apache/hbase/pulls?q=is%3Apr+involves%3Acarp84
[6] https://github.com/apache/hbase/commits?author=carp84

On Mon, 23 Nov 2020 at 04:06, Sean Busbey  wrote:


I expressly would like to see non-commiters given credit for reviews and
have made a point of including them in prior commits for signed-off-by to
do that.

I'm fine with the idea of us using some other means to indicate this, but
I'd like us to make sure there's not some already widely used bit of git
metadata we could use before picking our own.

It's kind of like when we moved away from amending author (I think that

was

the phrase?) To co authored by when github started pushing that as a way

to

show multiple authors on a commit.

One thing to keep in mind also is that a big stumbling block to our
consistent crediting of reviewers is a lack of tooling. Having to
distinguish between binding and non binding reviews for putting together
commit metadata will make that more complicated.

On Fri, Nov 20, 2020, 18:15 Stack  wrote:


Thanks for taking the time to do a write up Josh.

Looks good to me.

When Sean started in on the 'Signed-off-by:' I didn't get it (especially
after reading the git definition). Sean then set me straight explaining our
use is a bit of a perversion of the original. I notice his definition is
not in the refguide. Suggest a sentence preamble definition of
'Signed-off-by:' and that we intentionally are different from the
definition cited by Bharath.

I like the Bharath idea on 'Reviewed-by' too. We can talk up 'Reviewed-by'
credits as a way to earn standing in the community, of how they are given
weight evaluating whe

Re: [DISCUSS] Clarifying guidance around Signed-off-by in commit messages

2020-12-01 Thread Josh Elser
Yeah, that's the intent of what Bharath had suggested and I liked. In 
parallel, see other part of thread from Yu and Nick.


On 11/21/20 11:31 AM, Reid Chan wrote:

Does that mean:
Signed-off-by for binding +1 (from committer),
Reviewed-by for non-binding +1 (from volunteer)?

Sounds good to me.





--

Best regards,
R.C




From: Jan Hentschel 
Sent: 21 November 2020 19:37
To: dev@hbase.apache.org
Subject: Re: [DISCUSS] Clarifying guidance around Signed-off-by in commit 
messages

Also +1 for both suggestions as long as it is clear when to use which. Starting 
point (after the discussion) probably would be to include it in our ref guide.

From: Wellington Chevreuil 
Reply-To: "dev@hbase.apache.org" 
Date: Saturday, November 21, 2020 at 11:37 AM
To: dev 
Subject: Re: [DISCUSS] Clarifying guidance around Signed-off-by in commit 
messages

+1 for both suggestions ('Signed-off-by' and 'Reviewed-by');

Em sáb., 21 de nov. de 2020 às 00:15, Stack 
mailto:st...@duboce.net>> escreveu:

Thanks for taking the time to do a write up Josh.

Looks good to me.

When Sean started in on the 'Signed-off-by:' I didn't get it (especially
after reading the git definition). Sean then set me straight explaining our
use is a bit of a perversion of the original. I notice his definition is
not in the refguide. Suggest a sentence preamble definition of
'Signed-off-by:' and that we intentionally are different from the
definition cited by Bharath.

I like the Bharath idea on 'Reviewed-by' too. We can talk up 'Reviewed-by'
credits as a way to earn standing in the community, of how they are given
weight evaluating whether to make a candidate a committer/PMC'er or not.

S

On Fri, Nov 20, 2020 at 3:13 PM Josh Elser <els...@apache.org> wrote:

On 11/20/20 1:07 PM, Bharath Vissapragada wrote:

* All individuals mentioned in a sign-off *must* be capable of giving a
binding vote (i.e. they are an HBase committer)

It appears that the original intent
<http://web.archive.org/web/20160507011446/http://gerrit.googlecode.com/svn/documentation/2.0/user-signedoffby.html>
of this sign-off feature in git mandates that the signing-off party be a
maintainer. So agree with you in theory. However, most times non-committers
also give great feedback and help with the code review process (code
reviews, testing, perf etc). I think acknowledging their contribution in
some form would be nice and that encourages potential-future-committers to
actively review PRs IMO. So how about we annotate their names with
Reviewed-by tags? A related discussion
<https://lists.x.org/archives/xorg-devel/2009-October/003036.html> on a
different open source project has more tag definitions if we are interested
in taking that route.

(I know you are only talking about the "signed-off by" tag but I thought
this discussion would be relevant when documenting this in the dev
guidelines, hence bringing it up). What do you think?

I would be happy with distinguishing Signed-off-by and Reviewed-by as a
way to better track metrics on contributors who review others' code.

Great idea!






[jira] [Resolved] (HBASE-24268) REST and Thrift server do not handle the "doAs" parameter case insensitively

2020-11-24 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-24268.

Hadoop Flags: Reviewed
Release Note: This change allows the REST and Thrift servers to handle the 
"doAs" parameter case-insensitively, which is deemed as correct per the 
"specification" provided by the Hadoop community.
  Resolution: Fixed

Thanks for the contribution, [~RichardAntal]!

> REST and Thrift server do not handle the "doAs" parameter case insensitively
> 
>
> Key: HBASE-24268
> URL: https://issues.apache.org/jira/browse/HBASE-24268
> Project: HBase
>  Issue Type: Bug
>  Components: REST, Thrift
>Reporter: Istvan Toth
>Assignee: Richard Antal
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Hadoop does a case-insensitive comparison on the doAs parameter name when 
> handling principal impersonation.
> The HBase REST and Thrift servers do not do that; they only accept the "doAs" 
> form.
> According to HADOOP-11083, the correct Hadoop behaviour is accepting doAs 
> in any letter case.
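
For example, with impersonation configured, both spellings should now behave 
the same against the REST server (host and user are made up):

{noformat}
curl "http://rest-host:8080/status/cluster?doAs=alice"
curl "http://rest-host:8080/status/cluster?doas=alice"
{noformat}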



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Clarifying guidance around Signed-off-by in commit messages

2020-11-20 Thread Josh Elser

On 11/20/20 1:07 PM, Bharath Vissapragada wrote:

* All individuals mentioned in a sign-off *must* be capable of giving a
binding vote (i.e. they are an HBase committer)


It appears that the original intent
<http://web.archive.org/web/20160507011446/http://gerrit.googlecode.com/svn/documentation/2.0/user-signedoffby.html>
of this sign-off feature in git mandates that the signing-off party be a
maintainer. So agree with you in theory. However, most times non-committers
also give great feedback and help with the code review process (code
reviews, testing, perf etc). I think acknowledging their contribution in
some form would be nice and that encourages potential-future-committers to
actively review PRs IMO. So how about we annotate their names with
Reviewed-by tags? A related discussion
<https://lists.x.org/archives/xorg-devel/2009-October/003036.html> on a
different open source project has more tag definitions if we are interested
in taking that route.

(I know you are only talking about the "signed-off by" tag but I thought
this discussion would be relevant when documenting this in the dev
guidelines, hence bringing it up). What do you think?


I would be happy with distinguishing Signed-off-by and Reviewed-by as a 
way to better track metrics on contributors who review others' code.


Great idea!


[DISCUSS] Clarifying guidance around Signed-off-by in commit messages

2020-11-20 Thread Josh Elser

Hi!

As most of you know, we've been using the "Signed-off-by:  
" line in out commit messages more and more lately to indicate 
who reviewed some change.


We've recently had an event in which one of these Signed-off-by lines 
showed up with someone's name who didn't consider themselves to have 
signed-off on the change. This is akin to saying someone gave a +1 for 
some change when they did not. As an RTC community, that's worrisome.


I went reading the HBase book and was surprised to not find guidance on 
how we expect this to work, so I'd like to have some discussion about 
how we should treat these lines. I'll start this off by making 
suggestions about what seems reasonable to me.


When a committer is applying some change in a commit:

* All individuals mentioned in a sign-off *must* be capable of giving a 
binding vote (i.e. they are an HBase committer)
* Any individual in a sign-off *must* have given approval via an 
explicit "+1" or via the "Approve" function in a GitHub Pull Request review.
* Approval *must* be publicly visible and memorialized on the code 
review (e.g. no private emails or chat message to give approval)
* The committer _should_ (not *must*) create a sign-off line for each 
binding reviewer who gave approval
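
For concreteness, a commit message following these rules might end like this 
(issue number and names invented):

  HBASE-12345 Fix the frobnicator

  Signed-off-by: Jane Committer <jane@apache.org>
  Signed-off-by: John Committer <john@apache.org>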


I think these are generally how we have been operating, but it would be 
good to make sure they are documented as such.


Thoughts/concerns?

- Josh


[jira] [Created] (HBASE-25279) Non-daemon thread in ZKWatcher

2020-11-12 Thread Josh Elser (Jira)
Josh Elser created HBASE-25279:
--

 Summary: Non-daemon thread in ZKWatcher
 Key: HBASE-25279
 URL: https://issues.apache.org/jira/browse/HBASE-25279
 Project: HBase
  Issue Type: Bug
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 3.0.0-alpha-1


ZKWatcher spawns an ExecutorService which doesn't mark its threads as daemons, 
which can prevent clean shutdowns.
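
The usual fix pattern, sketched (illustrative, not the exact patch):

{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Mark the pool's threads as daemons so they cannot block JVM shutdown.
ExecutorService zkEventProcessor = Executors.newSingleThreadExecutor(r -> {
  Thread t = new Thread(r, "zk-event-processor");
  t.setDaemon(true);
  return t;
});
{noformat}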



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25278) Add option to toggle CACHE_BLOCKS in count.rb

2020-11-12 Thread Josh Elser (Jira)
Josh Elser created HBASE-25278:
--

 Summary: Add option to toggle CACHE_BLOCKS in count.rb
 Key: HBASE-25278
 URL: https://issues.apache.org/jira/browse/HBASE-25278
 Project: HBase
  Issue Type: New Feature
  Components: shell
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 3.0.0-alpha-1, 2.4.0


A trick I've found myself doing a couple of times (hat-tip to [~psomogyi]) is 
to edit table.rb so that the `count` shell command no longer instructs 
RegionServers to skip caching data blocks. This is a quick+dirty way to force 
a table to be loaded into block cache (i.e. for performance testing).

We can easily add another option to avoid having to edit the ruby files.
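
Shell usage would then be along the lines of (option name per this issue's 
summary; exact syntax illustrative):

{noformat}
hbase> count 'mytable', CACHE_BLOCKS => true
{noformat}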



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: EOL 1.3.x?

2020-08-12 Thread Josh Elser

It's gone both ways, if memory serves.

On 8/11/20 2:43 PM, Zach York wrote:

+1

Do we typically do a final release or can we just EOL as is?

On Tue, Aug 11, 2020 at 9:43 AM Geoffrey Jacoby  wrote:


+1 (non-binding)

Geoffrey

On Tue, Aug 11, 2020 at 8:19 AM Josh Elser  wrote:


Sounds good to me.

On 8/11/20 4:01 AM, 张铎(Duo Zhang) wrote:

The last release for 1.3.x is 2019.10.20, which means we do not have a
release for this release line for about 10 months.

Let's make it EOL and tell users to at least upgrade to 1.4.x?

Thanks.









[jira] [Resolved] (HBASE-24860) Bump copyright year in NOTICE

2020-08-11 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-24860.

Resolution: Fixed

Pushed without review.

FYI [~zhangduo], in case you have to do a 3.4.0-rc2

> Bump copyright year in NOTICE
> -
>
> Key: HBASE-24860
> URL: https://issues.apache.org/jira/browse/HBASE-24860
> Project: HBase
>  Issue Type: Task
>  Components: thirdparty
>    Reporter: Josh Elser
>    Assignee: Josh Elser
>Priority: Trivial
> Fix For: thirdparty-3.5.0
>
>
> year in NOTICE is 2018, should be 2020



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24860) Bump copyright year in NOTICE

2020-08-11 Thread Josh Elser (Jira)
Josh Elser created HBASE-24860:
--

 Summary: Bump copyright year in NOTICE
 Key: HBASE-24860
 URL: https://issues.apache.org/jira/browse/HBASE-24860
 Project: HBase
  Issue Type: Task
  Components: thirdparty
Reporter: Josh Elser
Assignee: Josh Elser






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24858) Builds without a `clean` phase fail on hbase-shaded-jetty

2020-08-11 Thread Josh Elser (Jira)
Josh Elser created HBASE-24858:
--

 Summary: Builds without a `clean` phase fail on hbase-shaded-jetty
 Key: HBASE-24858
 URL: https://issues.apache.org/jira/browse/HBASE-24858
 Project: HBase
  Issue Type: Bug
Reporter: Josh Elser
Assignee: Josh Elser


In 3.4.0-rc1 that [~zhangduo] created, I noticed that builds of the project 
failed on hbase-shaded-jetty when I did not include {{clean}} in my build 
command.

{noformat}
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  3.604 s
[INFO] Finished at: 2020-08-11T14:38:45-04:00
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-shade-plugin:3.2.4:shade (default) on project 
hbase-shaded-jetty: Error creating shaded jar: duplicate entry: 
META-INF/services/org.apache.hbase.thirdparty.org.eclipse.jetty.http.HttpFieldPreEncoder
 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
{noformat}

Poking around, it appears that this actually has to do with the creation of the 
shaded source jar.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Second release candidate for hbase-thirdparty 3.4.0 is available for download

2020-08-11 Thread Josh Elser

+1 (binding)

Everything with 3.4.0 looks fine to me. I used the staged artifacts at 
the URL which Viraj corrected. A couple of non-blocking issues.


* NOTICE needs updating (2018 year)
* hbase-thirdparty build fails w/o a `clean` (see below)
* TestRESTApiClusterManager failed in the hbase build, looking for 
non-shaded jersey. Not a blocker for this release (also below)



```
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-shade-plugin:3.2.4:shade (default) on 
project hbase-shaded-jetty: Error creating shaded jar: duplicate entry: 
META-INF/services/org.apache.hbase.thirdparty.org.eclipse.jetty.http.HttpFieldPreEncoder 
-> [Help 1]

```

```
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
0.308 s <<< FAILURE! - in org.apache.hadoop.hbase.TestRESTApiClusterManager
[ERROR] 
org.apache.hadoop.hbase.TestRESTApiClusterManager.isRunningPositive 
Time elapsed: 0.165 s  <<< ERROR!
java.lang.RuntimeException: java.lang.ClassNotFoundException: 
org.glassfish.jersey.client.JerseyClientBuilder
	at 
org.apache.hadoop.hbase.TestRESTApiClusterManager.before(TestRESTApiClusterManager.java:63)
Caused by: java.lang.ClassNotFoundException: 
org.glassfish.jersey.client.JerseyClientBuilder
	at 
org.apache.hadoop.hbase.TestRESTApiClusterManager.before(TestRESTApiClusterManager.java:63)

```

I'll try to help out fixing these.

On 8/8/20 11:02 AM, 张铎(Duo Zhang) wrote:

Please vote on this Apache hbase thirdparty release candidate,
hbase-thirdparty-3.4.0RC1

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache hbase thirdparty 3.4.0
[ ] -1 Do not release this package because ...

The tag to be voted on is 3.4.0RC1:

   https://github.com/apache/hbase-thirdparty/tree/3.4.0RC1

The release files, including signatures, digests, as well as CHANGES.md
and RELEASENOTES.md included in this RC can be found at:

   https://dist.apache.org/repos/dist/dev/hbase/3.4.0RC1/

Maven artifacts are available in a staging repository at:

   https://repository.apache.org/content/repositories/orgapachehbase-1403/

Artifacts were signed with the 9AD2AE49 key which can be found in:

   https://dist.apache.org/repos/dist/release/hbase/KEYS

We shade jetty and jersey in this version for addressing the conflicts
between jetty 9.3 and 9.4, as hadoop recently upgraded its jetty from 9.3
to 9.4 in patch releases.

To learn more about Apache hbase thirdparty, please see

   http://hbase.apache.org/

Thanks,
Your HBase Release Manager



Re: EOL 1.3.x?

2020-08-11 Thread Josh Elser

Sounds good to me.

On 8/11/20 4:01 AM, 张铎(Duo Zhang) wrote:

The last release for 1.3.x is 2019.10.20, which means we do not have a
release for this release line for about 10 months.

Let's make it EOL and tell users to at least upgrade to 1.4.x?

Thanks.



[jira] [Resolved] (HBASE-24834) TestReplicationSource.testWALEntryFilter failing in branch-2+

2020-08-11 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-24834.

Resolution: Cannot Reproduce

Maybe fixed by HBASE-24830?

> TestReplicationSource.testWALEntryFilter failing in branch-2+
> -
>
> Key: HBASE-24834
> URL: https://issues.apache.org/jira/browse/HBASE-24834
> Project: HBase
>  Issue Type: Task
>  Components: Replication, test
>    Reporter: Josh Elser
>    Assignee: Josh Elser
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> {noformat}
> [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 46.497 s <<< FAILURE! - in 
> org.apache.hadoop.hbase.replication.regionserver.TestReplicationSource
> [ERROR] 
> org.apache.hadoop.hbase.replication.regionserver.TestReplicationSource.testWALEntryFilter
>   Time elapsed: 1.405 s  <<< FAILURE!
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hbase.replication.regionserver.TestReplicationSource.testWALEntryFilter(TestReplicationSource.java:177)
>  {noformat}
> Noticed this during HBASE-24779, but didn't fix it. Believe it comes from 
> HBASE-24817.
> Looks to be that the edit we expect to not be filtered is actually getting 
> filtered because it has no cells (thus, no change and doesn't need to be 
> replicated). Should be a simple fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24834) TestReplicationSource.testWALEntryFilter failing in branch-2+

2020-08-07 Thread Josh Elser (Jira)
Josh Elser created HBASE-24834:
--

 Summary: TestReplicationSource.testWALEntryFilter failing in 
branch-2+
 Key: HBASE-24834
 URL: https://issues.apache.org/jira/browse/HBASE-24834
 Project: HBase
  Issue Type: Task
  Components: Replication, test
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 3.0.0-alpha-1, 2.4.0


{noformat}
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 46.497 
s <<< FAILURE! - in 
org.apache.hadoop.hbase.replication.regionserver.TestReplicationSource
[ERROR] 
org.apache.hadoop.hbase.replication.regionserver.TestReplicationSource.testWALEntryFilter
  Time elapsed: 1.405 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.hadoop.hbase.replication.regionserver.TestReplicationSource.testWALEntryFilter(TestReplicationSource.java:177)
 {noformat}

Noticed this during HBASE-24779, but didn't fix it. Believe it comes from 
HBASE-24817.

Looks to be that the edit we expect to not be filtered is actually getting 
filtered because it has no cells (thus, no change and doesn't need to be 
replicated). Should be a simple fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24779) Improve insight into replication WAL readers hung on checkQuota

2020-07-27 Thread Josh Elser (Jira)
Josh Elser created HBASE-24779:
--

 Summary: Improve insight into replication WAL readers hung on 
checkQuota
 Key: HBASE-24779
 URL: https://issues.apache.org/jira/browse/HBASE-24779
 Project: HBase
  Issue Type: Task
Reporter: Josh Elser
Assignee: Josh Elser


Helped a customer this past weekend who, with a large number of RegionServers, 
has some RegionServers which replicated data to a peer without issues while 
other RegionServers did not.

The number of queued logs varied over the past 24hrs in the same manner: some 
spikes into 100's of queued logs, but at other times only 1's-10's of logs 
were queued.

We were able to validate that there were "good" and "bad" RegionServers by 
creating a test table, assigning it to a regionserver, enabling replication on 
that table, and validating if the local puts were replicated to a peer. On a 
good RS, data was replicated immediately. On a bad RS, data was never 
replicated (at least, on the order of 10's of minutes which we waited).

On the "bad RS", we were able to observe that the \{{wal-reader}} thread(s) on 
that RS were spending time in a Thread.sleep() in a different location than the 
other. Specifically it was sitting in the 
{{ReplicationSourceWALReader#checkQuota()}}'s sleep call, _not_ the 
{{handleEmptyWALBatch()}} method on the same class.

My only assumption is that, somehow, these RegionServers got into a situation 
where they "allocated" memory from the quota but never freed it. Then, because 
the WAL reader thinks it has no free memory, it blocks indefinitely and there 
are no pending edits to ship and (thus) free that memory. A cursory glance at 
the code gives me a _lot_ of anxiety around places where we don't properly 
clean it up (e.g. batches that fail to ship, dropping a peer). As a first stab, 
let me add some more debugging so we can actually track this state properly for 
the operators and their sanity.
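
To make the suspected leak concrete, a minimal sketch of the accounting 
pattern in question (names are illustrative, not HBase's actual fields):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class ReplicationBufferQuota {
  private final AtomicLong used = new AtomicLong();
  private final long limit;

  public ReplicationBufferQuota(long limit) {
    this.limit = limit;
  }

  /** Blocks, like checkQuota()'s sleep, until the batch fits under the limit. */
  public void acquire(long batchSize) throws InterruptedException {
    while (used.addAndGet(batchSize) > limit) {
      used.addAndGet(-batchSize); // back out the reservation and wait
      Thread.sleep(100);
    }
  }

  /**
   * Must run on every path that disposes of a batch: shipped, failed to
   * ship, or dropped with a removed peer. Miss one path and acquire()
   * blocks forever, which is the hang described above.
   */
  public void release(long batchSize) {
    used.addAndGet(-batchSize);
  }
}
{code}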



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] HBASE-24749

2020-07-22 Thread Josh Elser
Yeah, that's the struggle with the multiple branches -- we want to see 
our changes in a version of HBase we're using, but that may not be the 
right place to land the changes :)


Since this is an "opt-in" and you obviously have _something_ working 
(given the benchmarks), I'd suggest breaking down the work into some 
milestones you can track. Set "exit criteria" for each: what do you 
expect should work when that milestone is "done"? Bulk loads don't have 
to come right away, but should be there before you can call the feature 
"done".


Another benefit is that this will make it a bit more manageable for 
others to get involved and poke at it.


I can also see "fold new system table into hbase:meta" as a later 
milestone. I think if you can show this works with its own table, it 
should be much easier to just fold that into meta than building the 
initial feature :)


On 7/22/20 3:38 AM, Tak-Lon (Stephen) Wu wrote:

Thanks Josh, and yeah object store is a bit different lol.

the major reason we didn't try to fold that into the meta table was that
we don't know how well the meta table can scale, e.g. as Stack
mentioned about a previous design in HBASE-14090, it matches our
initial estimate that this piece of new data could vary from the 100+
MB level to the ~5 GB level. Once the meta table can be split and can
handle more work, we'd definitely move that into meta.
(side note we started with branch-2.2 :p )

good call on bulk load, thanks. Also, we will try to support snapshot
related features well.

-Stephen



On Tue, Jul 21, 2020 at 4:54 PM Josh Elser  wrote:


Oh, and don't forget, you have to update bulk load to work with this
approach.

Never knew that we had a utility to pick up files that folks wrote
directly into the hbase.rootdir (RefreshHFilesClient). I am 110% behind
ripping that out. We have bulk loading as the supported path for a reason :)

On 7/21/20 1:45 PM, Tak-Lon (Stephen) Wu wrote:

Hi guys,

I'm sending this email to get more comments and thoughts from the dev@list
for an open discussion item on HBASE-24749
<https://issues.apache.org/jira/browse/HBASE-24749>.

mainly we're proposing a feature with a new store engine to skip the use of
.tmp directory in the HFile commit stage and write directly to data
directory.

The proposal doc
<https://issues.apache.org/jira/secure/attachment/13008049/Apache%20HBase%20-%20Direct%20insert%20HFiles%20and%20Persist%20in-memory%20HFile%20tracking.pdf>
is on the JIRA and we have provided initial results
<https://issues.apache.org/jira/secure/attachment/13008050/1B100m-25m25m-performance.pdf>
with YCSB 25m and 1B that shows it's positive with the changes.

Improvement Highlights
1. Lower write latency, especially the p99+
2. Higher write throughput on flush and compaction
3. Lower MTTR on region (re)open or assignment
4. Remove consistent check dependencies (e.g. DynamoDB) supported by file
system implementation

Again, any suggestions are welcomed.

Thanks,
Stephen
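
To ground the .tmp-skipping idea the proposal describes, a minimal sketch
(StoreFileTracker and recordCommitted are hypothetical names for
illustration, not the proposal's API):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

interface StoreFileTracker {
  // Persisted record of live HFiles, e.g. kept in a system table.
  void recordCommitted(Path hfile) throws IOException;
}

class DirectInsertSketch {
  // Today a flush writes under <region>/.tmp and commits via rename; the
  // idea here is to write straight into the family directory and let the
  // tracker, not a rename, mark the file live -- renames are slow or
  // non-atomic on object stores.
  static Path flushDirectly(FileSystem fs, Path familyDir, String fileName,
      byte[] data, StoreFileTracker tracker) throws IOException {
    Path hfile = new Path(familyDir, fileName);
    try (FSDataOutputStream out = fs.create(hfile, false)) {
      out.write(data);
    }
    tracker.recordCommitted(hfile);
    return hfile;
  }
}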



[jira] [Resolved] (HBASE-22146) SpaceQuotaViolationPolicy Disable is not working in Namespace level

2020-07-21 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-22146.

Resolution: Fixed

Thanks for your patch and thoughtful tests, Surbhi!

> SpaceQuotaViolationPolicy Disable is not working in Namespace level
> ---
>
> Key: HBASE-22146
> URL: https://issues.apache.org/jira/browse/HBASE-22146
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: Uma Maheswari
>Assignee: Surbhi Kochhar
>Priority: Major
>  Labels: Quota, space
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.7
>
>
> SpaceQuotaViolationPolicy Disable is not working in Namespace level
> PFB the steps:
>  * Create Namespace and set Quota violation policy as Disable
>  * Create tables under namespace and violate Quota
> Expected result: Tables to get disabled
> Actual Result: Tables are not getting disabled
> Note: mutation operation is not allowed on the table
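
For anyone reproducing this through the client API rather than the shell, a 
sketch (method names from the space quota API as best I recall; verify the 
factory method against your release):

{code:java}
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.quotas.QuotaSettings;
import org.apache.hadoop.hbase.quotas.QuotaSettingsFactory;
import org.apache.hadoop.hbase.quotas.SpaceViolationPolicy;

public class NamespaceDisableQuota {
  static void setDisablePolicy(Admin admin, String namespace) throws Exception {
    // 10G namespace space quota; on violation, tables should be disabled.
    QuotaSettings settings = QuotaSettingsFactory.limitNamespaceSpace(
        namespace, 10L * 1024 * 1024 * 1024, SpaceViolationPolicy.DISABLE);
    admin.setQuota(settings);
    // Per this issue: after tables under the namespace exceeded the limit,
    // they rejected mutations but never moved to DISABLED.
  }
}
{code}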



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] HBASE-24749

2020-07-21 Thread Josh Elser
Oh, and don't forget, you have to update bulk load to work with this 
approach.


Never knew that we had a utility to pick up files that folks wrote 
directly into the hbase.rootdir (RefreshHFilesClient). I am 110% behind 
ripping that out. We have bulk loading as the supported path for a reason :)


On 7/21/20 1:45 PM, Tak-Lon (Stephen) Wu wrote:

Hi guys,

I'm sending this email to get more comments and thoughts from the dev@list
for an open discussion item on HBASE-24749.

mainly we're proposing a feature with a new store engine to skip the use of
.tmp directory in the HFile commit stage and write directly to data
directory.

The proposal doc is on the JIRA and we have provided initial results
with YCSB 25m and 1B that shows it's positive with the changes.

Improvement Highlights
1. Lower write latency, especially the p99+
2. Higher write throughput on flush and compaction
3. Lower MTTR on region (re)open or assignment
4. Remove consistent check dependencies (e.g. DynamoDB) supported by file
system implementation

Again, any suggestions are welcomed.

Thanks,
Stephen



Re: [DISCUSS] HBASE-24749

2020-07-21 Thread Josh Elser
Great idea -- big problem on slow storage :) (but I'm sure I'm not 
telling you anything new).


A quick quesiton: with the split-able meta work going on, any reason to 
not put these files in meta itself (rather than yet-another-system-table)?


FWIW, listing the files for each region in the meta table is what 
Accumulo does today. While it's been a while, the approach generally 
worked well.


On 7/21/20 1:45 PM, Tak-Lon (Stephen) Wu wrote:

Hi guys,

I'm sending this email to get more comments and thoughts from the dev@list
for an open discussion item on HBASE-24749.

mainly we're proposing a feature with a new store engine to skip the use of
.tmp directory in the HFile commit stage and write directly to data
directory.

The proposal doc is on the JIRA and we have provided initial results
with YCSB 25m and 1B that shows it's positive with the changes.

Improvement Highlights
1. Lower write latency, especially the p99+
2. Higher write throughput on flush and compaction
3. Lower MTTR on region (re)open or assignment
4. Remove consistent check dependencies (e.g. DynamoDB) supported by file
system implementation

Again, any suggestions are welcomed.

Thanks,
Stephen



Re: HBase 2 slower than HBase 1?

2020-07-14 Thread Josh Elser

Wow. Great stuff, Andrew!

Thank you for compiling and posting it all here. I can only imagine how 
time-consuming this was.


On 6/26/20 1:57 PM, Andrew Purtell wrote:

Hey Anoop, I opened https://issues.apache.org/jira/browse/HBASE-24637 and
attached the patches and script used to make the comparison.

On Fri, Jun 26, 2020 at 2:33 AM Anoop John  wrote:


Great investigation Andy.  Do you know any Jiras which made changes in SQM?
Would be great if you can attach your patch which tracks the scan flow.  If
we have a Jira for this issue, can you pls attach?

Anoop

On Fri, Jun 26, 2020 at 1:56 AM Andrew Purtell  wrote:


Related, I think I found a bug in branch-1 where we don’t heartbeat in the 
filter all case until we switch store files, so scanning a very large store 
file might time out with client defaults. Remarking on this here so I don’t 
forget to follow up.

On Jun 25, 2020, at 12:27 PM, Andrew Purtell  wrote:



I repeated this test with pe --filterAll and the results were revealing, at 
least for this case. I also patched in a thread local hash map for atomic 
counters that I could update from code paths in SQM, StoreScanner, 
HFileReader*, and HFileBlock. Because an RPC is processed by a single handler 
thread I could update counters and accumulate micro-timings via 
System#nanoTime() per RPC and dump them out of CallRunner in some new trace 
logging. I spent a couple of days making sure the instrumentation was placed 
equivalently in both 1.6 and 2.2 code bases and was producing consistent 
results. I can provide these patches upon request.

Again, test tables with one family and 1, 5, 10, 20, 50, and 100 distinct 
column-qualifiers per row. After loading the table I made a snapshot and 
cloned the snapshot for testing, for both 1.6 and 2.2, so both versions were 
tested using the exact same data files on HDFS. I also used the 1.6 version 
of PE for both, so the only change is on the server (1.6 vs 2.2 masters and 
regionservers).

It appears a refactor to ScanQueryMatcher and friends has disabled the 
ability of filters to provide SKIP hints, which prevents us from bypassing 
version checking (so some extra cost in SQM), and appears to disable an 
optimization that avoids reseeking, leading to a serious and proportional 
regression in reseek activity and time spent in that code path. So for 
queries that use filters, there can be a substantial regression.

Other test cases that did not use filters did not show a regression.

A test case where I used ROW_INDEX_V1 encoding showed an expected modest 
proportional regression in seeking time, due to the fact it is optimized for 
point queries and not optimized for the full table scan case.

I will come back here when I understand this better.

Here are the results for the pe --filterAll case:


Counts (1.6.0 vs 2.2.5)

                     1.6.0 c1    2.2.5 c1           1.6.0 c5    2.2.5 c5           1.6.0 c10    2.2.5 c10
rpcs                 1           2          200%    2           6          300%    2            10         500%   (better heartbeating)
block_reads          11507       11508      100%    57255       57257      100%    114471       114474     100%
block_unpacks        11507       11508      100%    57255       57257      100%    114471       114474     100%
seeker_next          10000000    10000000   100%    50000000    50000000   100%    100000000    100000000  100%
store_next           10000000    9988268    100%    50000000    49940082   100%    100000000    99879401   100%
store_reseek         1           11733      > !     2           59924      > !     8            120607     > !
cells_matched        20000000    20000000   100%    60000000    60000000   100%    110000000    110000000  100%
column_hint_include  10000000    10000000   100%    50000000    50000000   100%    100000000    100000000  100%
filter_hint_skip     10000000    10000000   100%    50000000    50000000   100%    100000000    100000000  100%
sqm_hint_done        999         999

                     1.6.0 c20   2.2.5 c20          1.6.0 c50   2.2.5 c50          1.6.0 c100   2.2.5 c100
rpcs                 3           17         567%    4           37         925%    8            72         900%   (better heartbeating)
block_reads          230372      230377     100%    578292      578298     100%    1157955      1157963    100%
block_unpacks        230372      230377     100%    578292      578298     100%    1157955      1157963    100%
seeker_next          200000000   200000000  100%    500000000   500000000  100%    1000000000   1000000000 100%
store_next           200000000   199766539  100%    500000000   499414653  100%    1000000000   998836518  100%
store_reseek         6           233467     > !     10          585357     > !     8            1163490    > !
cells_matched        210000000   210000000  100%    510000000   510000000  100%    1010000000   1010000000 100%
column_hint_include  200000000   200000000  100%    500000000   500000000  100%    1000000000   1000000000 100%
filter_hint_skip     200000000   200000000  100%    500000000   500000000  100%    1000000000   1000000000 100%
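
As a sketch of the instrumentation described above (a reconstruction for
illustration, not the actual patch): one handler thread services an RPC end
to end, so a plain thread-local map can accumulate counts and
System.nanoTime() timings from SQM/StoreScanner/HFileReader call sites, then
be dumped and cleared when the call completes.

import java.util.HashMap;
import java.util.Map;

public final class PerRpcCounters {
  // Single-threaded per call, so no atomics are needed within one RPC.
  private static final ThreadLocal<Map<String, Long>> COUNTERS =
      ThreadLocal.withInitial(HashMap::new);

  public static void add(String name, long delta) {
    COUNTERS.get().merge(name, delta, Long::sum);
  }

  // Timings: bracket a section with System.nanoTime() and feed the elapsed
  // nanos to add(), e.g. add("store_reseek_ns", elapsed).

  /** Dump and reset at the end of the call, e.g. from CallRunner. */
  public static Map<String, Long> drain() {
    Map<String, Long> snapshot = new HashMap<>(COUNTERS.get());
    COUNTERS.get().clear();
    return snapshot;
  }
}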

Re: [DISCUSS] Removing problematic terms from our project

2020-06-22 Thread Josh Elser

+1

On 6/22/20 4:03 PM, Sean Busbey wrote:

We should change our use of these terms. We can be equally or more clear in
what we are trying to convey where they are present.

That they have been used historically is only useful if the advantage we
gain from using them through that shared context outweighs the potential
friction they add. They make me personally less enthusiastic about
contributing. That's enough friction for me to advocate removing them.

AFAICT reworking our replication stuff in terms of "active" and "passive"
clusters did not result in a big spike of folks asking new questions about
where authority for state was.

On Mon, Jun 22, 2020, 13:39 Andrew Purtell  wrote:


In response to renewed attention at the Foundation toward addressing
culturally problematic language and terms often used in technical
documentation and discussion, several projects have begun discussions, or
made proposals, or started work along these lines.

The HBase PMC began its own discussion on private@ on June 9, 2020 with an
observation of this activity and this suggestion:

There is a renewed push back against classic technology industry terms that
have negative modern connotations.

In the case of HBase, the following substitutions might be proposed:

- Coordinator instead of master

- Worker instead of slave

Recommendations for these additional substitutions also come up in this
type of discussion:

- Accept list instead of white list

- Deny list instead of black list

Unfortunately we have Master all over our code base, baked into various
APIs and configuration variable names, so for us the necessary changes
amount to a new major release and deprecation cycle. It could well be worth
it in the long run. We exist only as long as we draw a willing and
sufficient contributor community. It also wouldn’t be great to have an
activist fork appear somewhere, even if unlikely to be successful.

Relevant JIRAs are:

- HBASE-12677 :
Update replication docs to clarify terminology
- HBASE-13852 :
Replace master-slave terminology in book, site, and javadoc with a more
modern vocabulary
- HBASE-24576 :
Changing "whitelist" and "blacklist" in our docs and project

In response to this proposal, a member of the PMC asked if the term
'master' used by itself would be fine, because we only have use of 'slave'
in replication documentation and that is easily addressed. In response to
this question, others on the PMC suggested that even if only 'master' is
used, in this context it is still a problem.

For folks who are surprised or lacking context on the details of this
discussion, one PMC member offered a link to this draft RFC as background:
https://tools.ietf.org/id/draft-knodel-terminology-00.html

There was general support for removing the term "master" / "hmaster" from
our code base and using the terms "coordinator" or "leader" instead. In the
context of replication, "worker" makes less sense and perhaps "destination"
or "follower" would be more appropriate terms.

One PMC member's thoughts on language and non-native English speakers is
worth including in its entirety:

While words like blacklist/whitelist/slave clearly have those negative
references, word master might not have the same impact for non native
English speakers like myself where the literal translation to my mother
tongue does not have this same bad connotation. Replacing all references
for word *master *on our docs/codebase is a huge effort, I guess such a
decision would be more suitable for native English speakers folks, and
maybe we should consider the opinion of contributors from that ethnic
minority as well?

These are good questions for public discussion.

We have a consensus in the PMC, at this time, that is supportive of making
the above discussed terminology changes. However, we also have concerns
about what it would take to accomplish meaningful changes. Several on the
PMC offered support in the form of cycles to review pull requests and
patches, and two PMC members offered  personal bandwidth for creating and
releasing new code lines as needed to complete a deprecation cycle.

Unfortunately, the terms "master" and "hmaster" appear throughout our code
base in class names, user facing API subject to our project compatibility
guidelines, and configuration variable names, which are also implicated by
compatibility guidelines given the impact of changes to operators and
operations. The changes being discussed are not backwards compatible
changes and cannot be executed with swiftness while simultaneously
preserving compatibility. There must be a deprecation cycle. First, we must
tag all implicated public API and configuration variables as deprecated,
and release HBase 3 with these deprecations in place. Then, we must
undertake rename and removal as appropriate, and release the 

[jira] [Resolved] (HBASE-19365) FSTableDescriptors#get() can return null reference, in some cases, it is not checked

2020-06-17 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-19365.

Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the review, [~vjasani]!

> FSTableDescriptors#get() can return null reference, in some cases, it is not 
> checked
> 
>
> Key: HBASE-19365
> URL: https://issues.apache.org/jira/browse/HBASE-19365
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.6
>Reporter: Hua Xiang
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 1.7.0
>
>
> In one of our cases, 1.2.0 based master could not start because the null 
> reference is not checked. Master crashed because of the following exception.
> {code}
> 2017-11-20 08:30:20,178 FATAL org.apache.hadoop.hbase.master.HMaster: Failed 
> to become active master
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2993)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:494)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:821)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:192)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1827)
> at java.lang.Thread.run(Thread.java:745)
> {code}
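
The defensive pattern the fix amounts to, sketched below (the helper method 
is illustrative; in branch-1 the descriptor type is HTableDescriptor rather 
than TableDescriptor):

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.TableDescriptors;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.TableDescriptor;

public class DescriptorChecks {
  static TableDescriptor requireDescriptor(TableDescriptors tds, TableName table)
      throws IOException {
    TableDescriptor htd = tds.get(table);
    if (htd == null) {
      // FSTableDescriptors#get() returns null when no descriptor file is
      // found; failing with a message beats the NPE in the stack above.
      throw new IOException("No table descriptor found for " + table);
    }
    return htd;
  }
}
{code}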



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24235) Java client with IBM JDK does not work if HBase is configured with Kerberos

2020-06-17 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-24235.

Resolution: Incomplete

> Java client with IBM JDK does not work if HBase is configured with Kerberos
> ---
>
> Key: HBASE-24235
> URL: https://issues.apache.org/jira/browse/HBASE-24235
> Project: HBase
>  Issue Type: Bug
>  Components: Client, java
>Affects Versions: 2.1.0
>Reporter: Mubashir Kazia
>Priority: Major
>
> When a Java HBase client is run with the IBM JDK connecting to an HBase 
> cluster configured with Kerberos authentication, the client fails to connect 
> to HBase. The client is using {{UGI.loginUserFromKeytab(principal, keytab)}} 
> to get a Kerberos ticket and then creates a connection, table, and scanner 
> and iterates. The code works fine with the Oracle/OpenJDK but fails when run 
> with the IBM JDK.
> Following exception is found in logs with DEBUG level logging:
> {code:java}
> DEBUG client.RpcRetryingCallerImpl: Call exception, tries=6, retries=11, 
> started=4700 ms ago, cancelled=false, msg=Call to 
> nightly6x-1.nightly6x.root.hwx.site/172.27.21.201:22101 failed on local 
> exception: javax.security.sasl.SaslException: Failure to initialize security 
> context [Caused by org.ietf.jgss.GSSException, major code: 13, minor code: 0
> major string: Invalid credentials
> minor string: SubjectCredFinder: no JAAS Subject], details=row 
> 'users,,99' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, 
> hostname=nightly6x-1.nightly6x.root.hwx.site,22101,1587511170413, seqNum=-1, 
> see https://s.apache.org/timeout, 
> exception=javax.security.sasl.SaslException: Call to 
> nightly6x-1.nightly6x.root.hwx.site/172.27.21.201:22101 failed on local 
> exception: javax.security.sasl.SaslException: Failure to initialize security 
> context [Caused by org.ietf.jgss.GSSException, major code: 13, minor code: 0
> major string: Invalid credentials
> minor string: SubjectCredFinder: no JAAS Subject] [Caused by 
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by org.ietf.jgss.GSSException, major code: 13, minor code: 0
> major string: Invalid credentials
> minor string: SubjectCredFinder: no JAAS Subject]]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:83)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:57)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:437)
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:220)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
> at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
> at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
> at 
> org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:326)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:312)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:304)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1426)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:326)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:312)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:924)
> at 
> org.apache.hadoop.hbase.ipc.N
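
A workaround worth trying (an assumption on my part, not a confirmed fix for 
this report): run the client calls under the logged-in UGI's doAs(), so a 
JAAS Subject is on the access-control context when the SASL/GSS layer goes 
looking for credentials. The principal and keytab path below are placeholders.

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosClientSketch {
  public static void main(String[] args) throws Exception {
    UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "hbase-client@EXAMPLE.COM", "/etc/security/keytabs/client.keytab");
    ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
      // Create the Connection, Table, and ResultScanner in here, inside
      // the Subject, instead of after a bare loginUserFromKeytab().
      return null;
    });
  }
}
{code}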

Re: apply

2020-06-15 Thread Josh Elser

Hi,

Anyone is able to contribute to the project. When you submit your first 
code change, we'll make sure that the appropriate Jira group(s) are 
updated to allow you to be the "Assignee".


Thanks for your interest.

On 6/14/20 6:13 PM, wang120445...@sina.com wrote:


hi hbase:
I want to apply for the Contributors list.
end


wang120445...@sina.com



Re: [DISCUSS] Change the IA for MutableSizeHistogram and MutableTimeHistogram to LImitedPrivate

2020-06-15 Thread Josh Elser

+1 to "let's do both"

FWIW, our stance on semver is to "strive to follow it", not "must follow 
it". I think when we have a concrete change we can look at, we could 
make the call to say whether or not we feel comfortable bringing this 
into maintenance release lines. The circumstances of this request 
certainly feel "sufficient" to me to push a new facade into a bugfix 
release (because it would increase later stability by getting Phoenix 
off private API).


On 6/11/20 3:07 PM, Andrew Purtell wrote:

That's unfortunate, but needs must, IMHO.

A potential benefit of also marking the impls LP(COPROC) is this captures
any implicit dependency on semantics and functionality of the
implementation classes not directly exposed in the hbase-metrics-api facade.

So, let's do both? (Facade improvement, raise to LP the impl classes)


On Thu, Jun 11, 2020 at 12:00 PM Geoffrey Jacoby  wrote:


Couple points:

1. I like Andrew's proposed solution, and we should do it, but I'm not sure
it's sufficient for Rushabh's purposes because of semver rules. Phoenix
supports HBase 1.3 -1.5 (soon to add 1.6) and HBase 2.0 (soon to gain 2.1
and 2.2, with 2.3 coming shortly after its release here.) If we add the new
sizeHistogram and timeHistogram methods to hbase-metrics, they'll be
available in Phoenix only in HBase 1.7 and 2.4. (since 2.3 is
mostly-frozen)

  Since Phoenix will be supporting earlier versions of both HBase branches
for a good while, there will need to be a compatibility shim. And the
older-version instance of the shim will probably need to access the classes
directly. (Please correct me if I'm wrong, Rushabh or Andrew.) So it still
might need a LimitedPrivate IA.

2. I agree with Nick that it's better to use LimitedPrivate.COPROC rather
than LimitedPrivate.PHOENIX.

Geoffrey



On Thu, Jun 11, 2020 at 11:28 AM Josh Elser  wrote:


Sounds reasonable to me!

On 6/11/20 1:06 PM, Andrew Purtell wrote:

hbase-metrics-api is available for coprocessors already and interfaces 
within are already LimitedPrivate(COPROC). However, that package is mostly 
interface and seems geared toward consuming metrics instantiated and 
registered via private stuff. Or, rather, I didn't see how Phoenix could 
choose which of MutableSizeHistogram and MutableTimeHistogram to instantiate 
using those interfaces, there is only Histogram 
MetricRegistry#histogram(String name). So I think it is also worth some time 
to review the utility of hbase-metrics-api and decide if more need be done 
there. Would the addition of

Histogram MetricRegistry#sizeHistogram(String name)
Histogram MetricRegistry#timeHistogram(String name)

achieve the desired objective instead?
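
Sketching what that facade addition might look like (Histogram is the
existing hbase-metrics-api type; the interface name below is illustrative
and the surrounding registry is elided):

import org.apache.hadoop.hbase.metrics.Histogram;

public interface MetricRegistrySketch {
  Histogram histogram(String name);      // exists today
  Histogram sizeHistogram(String name);  // proposed: MutableSizeHistogram-style ranges
  Histogram timeHistogram(String name);  // proposed: MutableTimeHistogram-style ranges
}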


On Thu, Jun 11, 2020 at 9:16 AM Nick Dimiduk  wrote:



I was just about to reply with the same -- Josh is faster :) +1 on
considering the full surface area of the APIs being exposed.

I also wonder if exposing the metrics infrastructure is something of
interest more broadly than Phoenix. Seems like any coprocessor might want
to provide or monitor some metric value.

On Thu, Jun 11, 2020 at 9:08 AM Josh Elser  wrote:


My only concern is that you can't just mark these two classes as 
LimitedPrivate for Phoenix -- you would also have to mark 
MutableRangeHistogram, MutableHistogram (and the rest of the class hierarchy) 
to make sure that we don't make it super confusing as to what comes from 
LimitedPrivate classes and what is coming from Private classes.

Would it be better to just say: make 
./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/lib 
LimitedPrivate?

Do you also need the stuff in 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase to push metrics 
back through the HBase metrics subsystem?

Sorry for the late reply. Just want to make sure that when we open up the 
audience, we open it sufficiently.

On 6/8/20 1:15 PM, Rushabh Shah wrote:

Hi,
Currently the IA for MutableSizeHistogram and MutableTimeHistogram is 
private. We want to use these classes in the PHOENIX project and I thought 
we could leverage the existing HBase histogram implementation. IIUC the 
private IA can't be used in other projects. Proposing to make it 
LimitedPrivate and mark it HBaseInterfaceAudience.PHOENIX. Please suggest.

Related jira: https://issues.apache.org/jira/browse/HBASE-24520

















[jira] [Resolved] (HBASE-23195) FSDataInputStreamWrapper unbuffer can NOT invoke the classes that NOT implements CanUnbuffer but its parents class implements CanUnbuffer

2020-06-12 Thread Josh Elser (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-23195.

Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the work, [~zhaoyim]! Sorry it took so long to get it committed. 
Thanks for sticking with it.

> FSDataInputStreamWrapper unbuffer can NOT invoke the classes that NOT 
> implements CanUnbuffer but its parents class implements CanUnbuffer 
> --
>
> Key: HBASE-23195
> URL: https://issues.apache.org/jira/browse/HBASE-23195
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 2.0.2
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.6
>
>
> FSDataInputStreamWrapper unbuffer can NOT invoke the classes that NOT 
> implements CanUnbuffer but its parents class implements CanUnbuffer
> For example:
> There is an interface I1, a class PC1 that implements I1, and a class C1 
> that extends PC1.
> If we want to invoke C1's unbuffer() method, FSDataInputStreamWrapper's 
> unbuffer can NOT do that.
>  
>  
>  
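
The heart of the fix, sketched (the helper method is illustrative): test the 
wrapped stream with instanceof, which honors interfaces implemented anywhere 
up the class hierarchy, rather than inspecting only the concrete class's 
declared interfaces.

{code:java}
import org.apache.hadoop.fs.CanUnbuffer;
import org.apache.hadoop.fs.FSDataInputStream;

public class UnbufferSketch {
  static void unbufferIfPossible(FSDataInputStream stream) {
    Object wrapped = stream.getWrappedStream();
    // instanceof is true for PC1 and also for C1 extends PC1, because the
    // interface is inherited through the superclass.
    if (wrapped instanceof CanUnbuffer) {
      ((CanUnbuffer) wrapped).unbuffer();
    }
  }
}
{code}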



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

