[jira] [Created] (HBASE-28647) Support streams in org.apache.hadoop.hbase.rest.client.Client

2024-06-10 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28647:
---

 Summary: Support streams in 
org.apache.hadoop.hbase.rest.client.Client
 Key: HBASE-28647
 URL: https://issues.apache.org/jira/browse/HBASE-28647
 Project: HBase
  Issue Type: Improvement
Reporter: Istvan Toth


Support using streams for sending/receiving data in 
org.apache.hadoop.hbase.rest.client.Client.

Also update tests to use the new methods.
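A minimal sketch of what stream-based send/receive could look like. The class and method names here are invented for illustration and are not the actual Client API; the point is that a streaming overload pipes data through a fixed-size buffer instead of materializing the whole payload as a byte[]:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: alongside the existing byte[]-based methods, the REST
// client could accept and produce streams so large payloads are not buffered
// in memory all at once. Names are illustrative, not the real HBase API.
public class StreamingClientSketch {

  // Existing style: the whole response payload is materialized as a byte array.
  public static byte[] receiveAsBytes(InputStream response) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    copy(response, buf);
    return buf.toByteArray();
  }

  // Streaming style: the caller passes a sink and data is piped through
  // a fixed-size buffer instead of being accumulated; returns bytes copied.
  public static long receiveAsStream(InputStream response, OutputStream sink)
      throws IOException {
    return copy(response, sink);
  }

  private static long copy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[8192];
    long total = 0;
    for (int n; (n = in.read(buf)) != -1; ) {
      out.write(buf, 0, n);
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    InputStream fakeResponse = new ByteArrayInputStream("row-data".getBytes());
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    System.out.println(receiveAsStream(fakeResponse, sink)); // 8
  }
}
```

Tests exercising both shapes against the same simulated response would then cover the new methods as the issue asks.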



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28646) Use Streams to unmarshall protobuf REST data

2024-06-10 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28646:
---

 Summary: Use Streams to unmarshall protobuf REST data
 Key: HBASE-28646
 URL: https://issues.apache.org/jira/browse/HBASE-28646
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


We've recently optimized REST marshalling by using streams directly.

We should do the same for unmarshalling.

The easy part is the server side, as that affects only a small set of files.

However, we should also support streams on the client side, which requires 
duplicating each method that returns / expects a byte array so that it also 
works with streams.
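A toy illustration of the two unmarshalling shapes being compared. The "wire format" below (a length-prefixed UTF-8 field) is invented for the example; real HBase REST payloads are protobuf, but the trade-off is the same: the byte[] variant forces the caller to buffer the whole payload first, while the stream variant decodes incrementally:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Invented wire format for illustration: 4-byte big-endian length, then bytes.
public class StreamUnmarshalSketch {

  // byte[] path: the payload was already copied into memory in full.
  public static String fromBytes(byte[] payload) throws IOException {
    return fromStream(new ByteArrayInputStream(payload));
  }

  // stream path: decode fields directly from the connection,
  // no intermediate byte[] copy of the whole payload needed.
  public static String fromStream(InputStream in) throws IOException {
    DataInputStream din = new DataInputStream(in);
    int len = din.readInt();
    byte[] field = new byte[len];
    din.readFully(field);
    return new String(field, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] wire = {0, 0, 0, 3, 'a', 'b', 'c'};
    System.out.println(fromStream(new ByteArrayInputStream(wire))); // abc
    System.out.println(fromBytes(wire));                            // abc
  }
}
```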



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28645) Add build information to the REST server version endpoint

2024-06-09 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28645:
---

 Summary: Add build information to the REST server version endpoint
 Key: HBASE-28645
 URL: https://issues.apache.org/jira/browse/HBASE-28645
 Project: HBase
  Issue Type: New Feature
  Components: REST
Reporter: Istvan Toth


There is currently no way to check the REST server version / build number 
remotely.

The */version/cluster* endpoint takes the version from master (fair enough),
and the */version/rest* does not include the build information.

We should add a version field to the /version/rest endpoint, which reports the 
version of the REST server component.

We should also log this at startup, just like we log the cluster version now.

We may have to add and store the version in the hbase-rest code during the 
build, similarly to how we do it for the other components.
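The "store the version during the build" step can be sketched as a build-generated properties resource read at startup. The resource name and keys below are assumptions for illustration, not the real HBase layout:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Sketch of the proposed /version/rest enhancement: the build writes the
// component version into a resource, and the server reads it at startup and
// reports it from the endpoint. Keys/names here are hypothetical.
public class RestVersionInfo {

  // Parse the version out of a properties-format resource body.
  public static String parseVersion(String props) throws IOException {
    Properties p = new Properties();
    p.load(new StringReader(props));
    return p.getProperty("version", "unknown");
  }

  public static void main(String[] args) throws IOException {
    // Simulated contents of a hypothetical build-generated
    // rest-version.properties file.
    String generated = "version=2.7.0-SNAPSHOT\nrevision=abc123\n";
    System.out.println(parseVersion(generated)); // 2.7.0-SNAPSHOT
  }
}
```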



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28540) Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner

2024-06-07 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28540.
-
Fix Version/s: 2.7.0
   3.0.0-beta-2
   2.6.1
   2.5.9
   Resolution: Fixed

> Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
> -
>
> Key: HBASE-28540
> URL: https://issues.apache.org/jira/browse/HBASE-28540
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The implementation of org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
> is very inefficient, as the standard next() method makes a separate HTTP 
> request for each row.
> Performance can be improved by not specifying the row count in the REST call 
> and caching the returned Results.
> Chunk size can still be influenced by scan.setBatch();
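A simplified model of the idea fixed here: instead of one HTTP round trip per next(), fetch a chunk of Results per request and serve subsequent next() calls from a local cache. The fetchBatch supplier below stands in for the REST call; the whole class is a toy, not the actual RemoteHTable code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Toy model of result caching in a remote scanner. Strings stand in for
// Result objects; fetchBatch stands in for one REST scanner call.
public class CachingScannerSketch {
  private final Supplier<List<String>> fetchBatch;
  private final Deque<String> cache = new ArrayDeque<>();
  private boolean exhausted = false;

  public CachingScannerSketch(Supplier<List<String>> fetchBatch) {
    this.fetchBatch = fetchBatch;
  }

  public String next() {
    if (cache.isEmpty() && !exhausted) {
      List<String> batch = fetchBatch.get(); // one round trip, many rows
      if (batch.isEmpty()) {
        exhausted = true;                    // empty batch ends the scan
      }
      cache.addAll(batch);
    }
    return cache.poll(); // null once the scan is done
  }

  public static void main(String[] args) {
    Iterator<List<String>> batches =
        List.of(List.of("row1", "row2"), List.<String>of()).iterator();
    CachingScannerSketch scanner = new CachingScannerSketch(batches::next);
    System.out.println(scanner.next()); // row1 (triggers one fetch)
    System.out.println(scanner.next()); // row2 (served from the cache)
    System.out.println(scanner.next()); // null (scan exhausted)
  }
}
```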



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-06-04 Thread Istvan Toth
Committed the discussed fix as HBASE-28622.
Thank you Andrew for discussing this, and Duo for the review.

Istvan

On Fri, May 31, 2024 at 2:36 PM Istvan Toth  wrote:

> It turns out that ColumnPaginationFilter is both row stateful and can
> return a seek hint.
> I have removed the HintingFilter marker from it to preserve the correct
> operation.
>
> With this change, ColumnPaginationFilter is no worse off than it was, but
> the rest of the hinting
> filters will work correctly.
>
> On Fri, May 31, 2024 at 9:32 AM Istvan Toth  wrote:
>
>> This is indeed quite a small change.
>> The PR is up at https://github.com/apache/hbase/pull/5955
>>
>> On Wed, May 29, 2024 at 10:07 AM Istvan Toth  wrote:
>>
>>> Thanks for the detailed reply, Andrew.
>>>
>>> I was also considering default methods, but it turns out that Filter is
>>> not an interface, but an abstract class, so it doesn't apply.
>>>
>>> Children not implementing a marker interface or marker method would
>>> inherit the marker method implementation from the closest parent the same
>>> way they would inherit the marker interface, so I think they are equivalent
>>> in this aspect, too.
>>>
>>> I think that marker interface(s) and overridable non-abstract getter(s)
>>> in Filter are mostly equivalent from both logical and source compatibility
>>> aspects.
>>> The only difference is that the marker interfaces cannot be removed in a
>>> subclass, while the getter can be overridden anywhere, but with well-chosen
>>> defaults it shouldn't be much of a limitation.
>>>
>>> Now that I think about it, we could cache the markers' values in an
>>> array when creating the filter lists, so even the cost of looking them up
>>> doesn't matter as it wouldn't happen in the hot code path.
>>>
>>> Using the marker interfaces is more elegant, and discourages problematic
>>> subclassing, so I am leaning towards that.
>>>
>>> Istvan
>>>
>>> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell 
>>> wrote:
>>>
>>>> Actually source compatibility with default methods would be fine too. I
>>>> forget this is the main reason default methods were invented. The code
>>>> of
>>>> derived classes would not need to be changed, unless the returned value
>>>> of
>>>> the new method should be changed, and this is no worse than having a
>>>> marker
>>>> interface, which would also require code changes to implement
>>>> non-default
>>>> behaviors.
>>>>
>>>> A marker interface does remain as an option. It might make a difference
>>>> in
>>>> chained use cases. Consider a chain of filter instances that mixes
>>>> derived
>>>> code that is unaware of isHinting() and base code that is. The filter
>>>> chain
>>>> can be examined for the presence or absence of the marker interface and
>>>> would not need to rely on every filter in the chain passing return
>>>> values
>>>> of isHinting back.
>>>>
>>>> Marker interfaces can also be added to denote stateful or stateless
>>>> filters, if distinguishing between them would be useful, perhaps down
>>>> the
>>>> road.
>>>>
>>>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell 
>>>> wrote:
>>>>
>>>> > I think you've clearly put a lot of time into the analysis and it is
>>>> > plausible.
>>>> >
>>>> > Adding isHinting as a default method will preserve binary
>>>> compatibility.
>>>> > Source compatibility for derived custom filters would be broken
>>>> though and
>>>> > that probably prevents this going back into a releasing code line.
>>>> >
>>>> > Have you considered adding a marker interface instead? That would
>>>> preserve
>>>> > both source and binary compatibility. It wouldn't require any changes
>>>> to
>>>> > derived custom filters. A runtime instanceof test would determine if
>>>> the
>>>> > filter is a hinting filter or not. No need for a new method, default
>>>> or
>>>> > otherwise.
>>>> >
>>>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth 
>>>> wrote:
>>>> >
>>>> >> I have recently opened HBASE-28622
>>>> >> <https://issues.apache.org/jira/
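The marker-interface option discussed in this thread can be sketched as below. The Filter hierarchy here is a toy stand-in, not the real HBase classes; only the shape of the idea (a method-less interface plus an instanceof check done once when the filter list is built, keeping it out of the per-cell hot path) is the point:

```java
import java.util.List;

public class MarkerInterfaceSketch {
  // Toy stand-in for org.apache.hadoop.hbase.filter.Filter (an abstract
  // class, which is why Java interface default methods don't apply to it).
  public abstract static class Filter {}

  // Marker interface: no methods, presence alone conveys the property
  // "this filter may return SEEK_NEXT_USING_HINT".
  public interface HintingFilter {}

  public static class TimestampsFilter extends Filter implements HintingFilter {}
  public static class PageFilter extends Filter {} // row stateful, not hinting

  // Classify once at filter-list construction time; the result can be cached
  // so no instanceof test runs in the per-cell hot path.
  public static boolean containsHintingFilter(List<Filter> filters) {
    for (Filter f : filters) {
      if (f instanceof HintingFilter) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(containsHintingFilter(
        List.of(new PageFilter(), new TimestampsFilter()))); // true
    System.out.println(containsHintingFilter(
        List.of(new PageFilter())));                         // false
  }
}
```

As noted in the thread, a subclass cannot "remove" an inherited marker interface, which is exactly what made ColumnPaginationFilter (both hinting and row stateful) need special handling.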

Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-31 Thread Istvan Toth
It turns out that ColumnPaginationFilter is both row stateful and can
return a seek hint.
I have removed the HintingFilter marker from it to preserve the correct
operation.

With this change, ColumnPaginationFilter is no worse off than it was, but
the rest of the hinting
filters will work correctly.

On Fri, May 31, 2024 at 9:32 AM Istvan Toth  wrote:

> This is indeed quite a small change.
> The PR is up at https://github.com/apache/hbase/pull/5955
>
> On Wed, May 29, 2024 at 10:07 AM Istvan Toth  wrote:
>
>> Thanks for the detailed reply, Andrew.
>>
>> I was also considering default methods, but it turns out that Filter is
>> not an interface, but an abstract class, so it doesn't apply.
>>
>> Children not implementing a marker interface or marker method would
>> inherit the marker method implementation from the closest parent the same
>> way they would inherit the marker interface, so I think they are equivalent
>> in this aspect, too.
>>
>> I think that marker interface(s) and overridable non-abstract getter(s)
>> in Filter are mostly equivalent from both logical and source compatibility
>> aspects.
>> The only difference is that the marker interfaces cannot be removed in a
>> subclass, while the getter can be overridden anywhere, but with well-chosen
>> defaults it shouldn't be much of a limitation.
>>
>> Now that I think about it, we could cache the markers' values in an array
>> when creating the filter lists, so even the cost of looking them up doesn't
>> matter as it wouldn't happen in the hot code path.
>>
>> Using the marker interfaces is more elegant, and discourages problematic
>> subclassing, so I am leaning towards that.
>>
>> Istvan
>>
>> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell 
>> wrote:
>>
>>> Actually source compatibility with default methods would be fine too. I
>>> forget this is the main reason default methods were invented. The code of
>>> derived classes would not need to be changed, unless the returned value
>>> of
>>> the new method should be changed, and this is no worse than having a
>>> marker
>>> interface, which would also require code changes to implement non-default
>>> behaviors.
>>>
>>> A marker interface does remain as an option. It might make a difference
>>> in
>>> chained use cases. Consider a chain of filter instances that mixes
>>> derived
>>> code that is unaware of isHinting() and base code that is. The filter
>>> chain
>>> can be examined for the presence or absence of the marker interface and
>>> would not need to rely on every filter in the chain passing return values
>>> of isHinting back.
>>>
>>> Marker interfaces can also be added to denote stateful or stateless
>>> filters, if distinguishing between them would be useful, perhaps down the
>>> road.
>>>
>>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell 
>>> wrote:
>>>
>>> > I think you've clearly put a lot of time into the analysis and it is
>>> > plausible.
>>> >
>>> > Adding isHinting as a default method will preserve binary
>>> compatibility.
>>> > Source compatibility for derived custom filters would be broken though
>>> and
>>> > that probably prevents this going back into a releasing code line.
>>> >
>>> > Have you considered adding a marker interface instead? That would
>>> preserve
>>> > both source and binary compatibility. It wouldn't require any changes
>>> to
>>> > derived custom filters. A runtime instanceof test would determine if
>>> the
>>> > filter is a hinting filter or not. No need for a new method, default or
>>> > otherwise.
>>> >
>>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth  wrote:
>>> >
>>> >> I have recently opened HBASE-28622
>>> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has
>>> turned
>>> >> out
>>> >> to be another aspect of the problem discussed in HBASE-20565
>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> .
>>> >>
>>> >> The problem is discussed in detail in HBASE-20565
>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils
>>> down
>>> >> to
>>> >> the API design decision that the filters returning
>>> SEEK_NEXT_USING_HINT
>>> >> rely on filterCell() getting cal

Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-31 Thread Istvan Toth
This is indeed quite a small change.
The PR is up at https://github.com/apache/hbase/pull/5955

On Wed, May 29, 2024 at 10:07 AM Istvan Toth  wrote:

> Thanks for the detailed reply, Andrew.
>
> I was also considering default methods, but it turns out that Filter is
> not an interface, but an abstract class, so it doesn't apply.
>
> Children not implementing a marker interface or marker method would
> inherit the marker method implementation from the closest parent the same
> way they would inherit the marker interface, so I think they are equivalent
> in this aspect, too.
>
> I think that marker interface(s) and overridable non-abstract getter(s) in
> Filter are mostly equivalent from both logical and source compatibility
> aspects.
> The only difference is that the marker interfaces cannot be removed in a
> subclass, while the getter can be overridden anywhere, but with well-chosen
> defaults it shouldn't be much of a limitation.
>
> Now that I think about it, we could cache the markers' values in an array
> when creating the filter lists, so even the cost of looking them up doesn't
> matter as it wouldn't happen in the hot code path.
>
> Using the marker interfaces is more elegant, and discourages problematic
> subclassing, so I am leaning towards that.
>
> Istvan
>
> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell 
> wrote:
>
>> Actually source compatibility with default methods would be fine too. I
>> forget this is the main reason default methods were invented. The code of
>> derived classes would not need to be changed, unless the returned value of
>> the new method should be changed, and this is no worse than having a
>> marker
>> interface, which would also require code changes to implement non-default
>> behaviors.
>>
>> A marker interface does remain as an option. It might make a difference in
>> chained use cases. Consider a chain of filter instances that mixes derived
>> code that is unaware of isHinting() and base code that is. The filter
>> chain
>> can be examined for the presence or absence of the marker interface and
>> would not need to rely on every filter in the chain passing return values
>> of isHinting back.
>>
>> Marker interfaces can also be added to denote stateful or stateless
>> filters, if distinguishing between them would be useful, perhaps down the
>> road.
>>
>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell 
>> wrote:
>>
>> > I think you've clearly put a lot of time into the analysis and it is
>> > plausible.
>> >
>> > Adding isHinting as a default method will preserve binary compatibility.
>> > Source compatibility for derived custom filters would be broken though
>> and
>> > that probably prevents this going back into a releasing code line.
>> >
>> > Have you considered adding a marker interface instead? That would
>> preserve
>> > both source and binary compatibility. It wouldn't require any changes to
>> > derived custom filters. A runtime instanceof test would determine if the
>> > filter is a hinting filter or not. No need for a new method, default or
>> > otherwise.
>> >
>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth  wrote:
>> >
>> >> I have recently opened HBASE-28622
>> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned
>> >> out
>> >> to be another aspect of the problem discussed in HBASE-20565
>> >> <https://issues.apache.org/jira/browse/HBASE-20565> .
>> >>
>> >> The problem is discussed in detail in HBASE-20565
>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils
>> down
>> >> to
>> >> the API design decision that the filters returning SEEK_NEXT_USING_HINT
>> >> rely on filterCell() getting called.
>> >>
>> >> On the other hand, some filters maintain an internal row state that
>> sets
>> >> counters for calls of filterCell(), which interacts with the results of
>> >> previous filters in a filterList.
>> >>
>> >> When filters return different results for filterRowkey(), then filters
>> >> returning  SEEK_NEXT_USING_HINT that have returned false must have
>> >> filterCell() called, otherwise the scan will degenerate into a full
>> scan.
>> >>
>> >> On the other hand, filters that maintain an internal row state must
>> only
>> >> be
>> >> called if all previous filters have INCLUDEed the Cell, otherwise their
>> >> internal stat

[jira] [Resolved] (HBASE-28629) Using JDK17 resulted in regionserver reportForDuty failing

2024-05-30 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28629.
-
Resolution: Information Provided

HBase 2.1.1 has been end-of-life for a long time.

Use an active release, like 2.5 or 2.6.

> Using JDK17 resulted in regionserver reportForDuty failing
> --
>
> Key: HBASE-28629
> URL: https://issues.apache.org/jira/browse/HBASE-28629
> Project: HBase
>  Issue Type: Bug
>  Components: netty, regionserver, rpc
>Affects Versions: 2.1.1
> Environment: test environment:
> mem:32G
> hadoop version:2.7.2
> core:40
> hbase version:2.1.1
>Reporter: 高建达
>Priority: Major
> Attachments: image-2024-05-30-16-23-34-561.png, 
> image-2024-05-30-17-00-45-266.png, image-2024-05-30-17-02-18-965.png
>
>
> I am currently using HBase 2.1.1 and adapting it to JDK17, and have 
> encountered some issues: 1) java.lang.NoSuchFieldException: modifiers; 2) 
> Unable to make static boolean java.nio.Bits.unaligned() accessible: module 
> java.base does not "opens java.nio" to unnamed module; 3) RegionServer 
> HRegionServer: error telling master we are up. Problem 1 is solved through 
> HBASE-25516 ([JDK17] reflective access Field.class.getDeclaredField("modifiers") 
> not supported). Problem 2 is solved by adding the 
> --add-opens java.base/java.lang=ALL-UNNAMED 
> --add-opens java.base/java.lang.reflect=ALL-UNNAMED 
> --add-opens java.base/java.nio=ALL-UNNAMED flags. However, I currently have 
> no idea for problem 3. How can I handle this? The master is now running normally.
> regionserver:
> !image-2024-05-30-16-23-34-561.png!
> master:
> !image-2024-05-30-17-02-18-965.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28628) Use Base64.getUrlEncoder().withoutPadding() in REST tests

2024-05-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28628:
---

 Summary: Use Base64.getUrlEncoder().withoutPadding() in REST tests
 Key: HBASE-28628
 URL: https://issues.apache.org/jira/browse/HBASE-28628
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


The encoder returned by java.util.Base64.getUrlEncoder() is unsuitable for the 
purpose.

To get an encoder that is actually usable in URLs, 
java.util.Base64.getUrlEncoder().withoutPadding() must be used.

The relevant Java bug is https://bugs.openjdk.org/browse/JDK-8026330 ; however, 
instead of fixing the encoder, Java decided to keep the broken default and 
add the .withoutPadding() method as a way to get a working one.

By sheer luck (or rather bad luck), this is not triggered in our tests, but 
anyone using them as a template will be in for a ride when hit by this problem.
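The difference is easy to demonstrate with the JDK itself: the default URL-safe encoder still emits '=' padding, which is not safe to embed in a URL without further escaping, while withoutPadding() drops it:

```java
import java.util.Base64;

// Demonstrates the HBASE-28628 point: Base64.getUrlEncoder() swaps '+'/'/'
// for '-'/'_' but keeps '=' padding, so its output is still not directly
// usable in URLs; withoutPadding() returns an encoder that omits the padding.
public class UrlBase64Demo {
  public static String padded(byte[] data) {
    return Base64.getUrlEncoder().encodeToString(data);
  }

  public static String unpadded(byte[] data) {
    return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
  }

  public static void main(String[] args) {
    byte[] row = {0x00};
    System.out.println(padded(row));   // AA==  ('=' would need escaping in a URL)
    System.out.println(unpadded(row)); // AA
  }
}
```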




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28627) REST ScannerModel doesn't support includeStartRow/includeStopRow

2024-05-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28627:
---

 Summary: REST ScannerModel doesn't support 
includeStartRow/includeStopRow
 Key: HBASE-28627
 URL: https://issues.apache.org/jira/browse/HBASE-28627
 Project: HBase
  Issue Type: Bug
  Components: REST
 Environment: includeStartRow/includeStopRow should be transparently 
supported.
The current behaviour is limited and confusing.

The only problem is that adding them may break backwards compatibility.
We need to test whether the XML unmarshaller can handle nonexistent fields.

Reporter: Istvan Toth






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28623) Scan with MultiRowRangeFilter very slow

2024-05-29 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28623.
-
Resolution: Won't Fix

> Scan with MultiRowRangeFilter very slow
> ---
>
> Key: HBASE-28623
> URL: https://issues.apache.org/jira/browse/HBASE-28623
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.4.14
>Reporter: chaijunjie
>Priority: Major
>
> when scanning a big table ({*}more than 500 regions{*}) with 
> {*}MultiRowRangeFilter{*}, it is very slow...
> it seems to {*}scan all regions{*}...
> for example:
> we scan 3 ranges..
> startRow: 097_28220_ stopRow: 097_28220_~
> startRow: 098_28221_ stopRow: 098_28221_~
> startRow: 099_28222_ stopRow: 099_28222_~
> and enable TRACE log in hbase client
> we find there are too many scans
> {code:java}
>  1713987938886.93886cc52eea6200518feb7ebce7e1a4.', STARTKEY => '', ENDKEY => 
> '000_2147757104_4641'}
>     行 139: 1716188377677.a2e0d724dd73196d81ecbfb58c77b611.', STARTKEY => 
> '000_2147757104_4641', ENDKEY => '000_21
>     行 162: 1716188377677.b377942c957c300286afcb763f0dd338.', STARTKEY => 
> '000_2148042968_3081', ENDKEY => '000_21
>     行 185: 1714319482833.4e5bfdfb6f2bcf381681726429bf2adb.', STARTKEY => 
> '000_2148518165_26648', ENDKEY => '000_3
>     行 197: 1715031138715.36bac123de7eec3c4c08a775d592f387.', STARTKEY => 
> '000_389786_4001', ENDKEY => '000_434112
>     行 211: 1715031138715.2dc9f1a78f532454ce8381ff9738e93e.', STARTKEY => 
> '000_434112_88683', ENDKEY => '000~'}
>     行 225: 1713890960521.94e341a71b5b3e98569809d7a0f4354e.', STARTKEY => 
> '000~', ENDKEY => '001_2147735632_4395'}
>     行 250: 1716239834572.3061c9f457b91ed40c938d801f8cac5f.', STARTKEY => 
> '001_2147735632_4395', ENDKEY => '001_21
>     行 264: 1716239834572.e56a4d6aae43b5d42561e4ee6f0e3132.', STARTKEY => 
> '001_2148043057_5975', ENDKEY => '001_23
>     行 278: 1714252181329.5de683912a8120bae9f37833fb286a30.', STARTKEY => 
> '001_238065_2439', ENDKEY => '001_400433
>     行 292: 1714858026179.941a4921968267374876b52fdb33a1d7.', STARTKEY => 
> '001_400433_45599', ENDKEY => '001_43429
>     行 306: 1714858026179.16e7de83bd7944e9d23b3568b14eaf9c.', STARTKEY => 
> '001_434296_34588', ENDKEY => '001~'}
>     行 331: 1714082282269.6853c99dc6d17b2340e04307e5492d58.', STARTKEY => 
> '001~', ENDKEY => '002_2147741550_785'}
>     行 345: 1714463331546.80f60ef11f1d337bcc09d7f24d390b28.', STARTKEY => 
> '002_2147741550_785', ENDKEY => '002_214
>     行 359: 1714463331546.9281d964d08863aab2745f8331c148ad.', STARTKEY => 
> '002_2148386148_27094', ENDKEY => '002_4
>     行 373: 1714685085875.2affd725c347399ad8c77eabd0a5d4f2.', STARTKEY => 
> '002_400185_74884', ENDKEY => '002_45861
>     行 387: 1714685085875.910cbc03d1d8571f1eda21e3441f9359.', STARTKEY => 
> '002_458618_25467', ENDKEY => '002~'}
>     行 401: 1714065682984.2358541c9c8d3f2f8c4496a1fd350c6c.', STARTKEY => 
> '002~', ENDKEY => '003_2147739809_4985'}
>     行 415: 1716251410111.c60662b46cabd2cd0638d39796f11827.', STARTKEY => 
> '003_2147739809_4985', ENDKEY => '003_21
>     行 429: 1716251410111.016507ab001379f86acdf0c40a5b93be.', STARTKEY => 
> '003_2148024128_3054', ENDKEY => '003_21
>     行 443: 1714348539371.e7a41938549f7384192edd059d7e4a3e.', STARTKEY => 
> '003_2148386097_25973', ENDKEY => '003_3
>     行 457: 1714925889818.a6c3c09cddd2c3e359c0f1497a302d6d.', STARTKEY => 
> '003_396959_86147', ENDKEY => '003_45861
>     行 471: 1714925889818.eb98caf696d333714fc917c95839ea8e.', STARTKEY => 
> '003_458619_61964', ENDKEY => '003~'}
>     行 485: 1713919439849.22b315f87ea850b2f1b052ccacf40a5c.', STARTKEY => 
> '003~', ENDKEY => '004_2147804164_6378'}
>     行 499: 1714553829364.ee60c3e63e43e18487afa3ebd9db7890.', STARTKEY => 
> '004_2147804164_6378', ENDKEY => '004_21
>     行 516: 1714553829364.30e09f836793166fb64f1799b63c56fc.', STARTKEY => 
> '004_2148363241_1674', ENDKEY => '004_40
>     行 530: 1714831210652.05d86d46eb1717408f7b6d189c711b6d.', STARTKEY => 
> '004_400633_98138', ENDKEY => '004_45953
>     行 544: 1714831210652.7ebc65054e3819ff8f3848108f07a1da.', STARTKEY => 
> '004_459534_8710', ENDKEY => '004~'}
>     行 558: 1714049632767.4eb7c320ce17d5e6c79d37ad1235cd56.', STARTKEY => 
> '004~', ENDKEY => '005_2147868266_5368'}
>     行 572: 1714364810854.f65ec5a2f28317951dab5e241d2e100f.', STARTKEY => 
> '005_2147868266_5368', ENDKEY => '005_21
>     行 586: 

[jira] [Created] (HBASE-28626) MultiRowRangeFilter deserialization fails in org.apache.hadoop.hbase.rest.model.ScannerModel

2024-05-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28626:
---

 Summary: MultiRowRangeFilter deserialization fails in 
org.apache.hadoop.hbase.rest.model.ScannerModel
 Key: HBASE-28626
 URL: https://issues.apache.org/jira/browse/HBASE-28626
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


org.apache.hadoop.hbase.filter.MultiRowRangeFilter.BasicRowRange has several 
getters that have no corresponding setters.

Jackson serializes the pseudo-getters' values, but when it tries to 
deserialize them, there are no corresponding setters and it errors out.
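The getter/setter asymmetry that trips Jackson up can be shown with the JDK's own bean introspector, using an illustrative stand-in for BasicRowRange (the class below is not the real HBase type). A read-only pseudo-property like "ascendingOrder" is readable, so bean-style serializers write it out, but it is not writable, so round-tripping fails unless the property is marked ignorable:

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class ReadOnlyPropertyDemo {
  // Illustrative stand-in for MultiRowRangeFilter.BasicRowRange.
  public static class RowRangeLike {
    private byte[] startRow = new byte[0];
    public byte[] getStartRow() { return startRow; }
    public void setStartRow(byte[] b) { this.startRow = b; }
    // Pseudo-getter: derived value with no matching setter.
    public boolean isAscendingOrder() { return true; }
  }

  // Returns true if the named bean property is readable but not writable.
  public static boolean isReadOnly(Class<?> cls, String name)
      throws IntrospectionException {
    BeanInfo info = Introspector.getBeanInfo(cls);
    for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
      if (pd.getName().equals(name)) {
        return pd.getReadMethod() != null && pd.getWriteMethod() == null;
      }
    }
    return false;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(isReadOnly(RowRangeLike.class, "ascendingOrder")); // true
    System.out.println(isReadOnly(RowRangeLike.class, "startRow"));       // false
  }
}
```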

{noformat}
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized 
field "ascendingOrder" (class 
org.apache.hadoop.hbase.filter.MultiRowRangeFilter$RowRange), not marked as 
ignorable (4 known properties: "startRow", "startRowInclusive", "stopRow", 
"stopRowInclusive"])
 at [Source: 
(String)"{"type":"FilterList","op":"MUST_PASS_ALL","comparator":null,"value":null,"filters":[{"type":"MultiRowRangeFilter","op":null,"comparator":null,"value":null,"filters":null,"limit":null,"offset":null,"family":null,"qualifier":null,"ifMissing":null,"latestVersion":null,"minColumn":null,"minColumnInclusive":null,"maxColumn":null,"maxColumnInclusive":null,"dropDependentColumn":null,"chance":null,"prefixes":null,"ranges":[{"startRow":"MQ==","startRowInclusive":true,"stopRow":"MQ==","stopRowInclusive":t"[truncated
 553 chars]; line: 1, column: 526] (through reference chain: 
org.apache.hadoop.hbase.rest.model.ScannerModel$FilterModel["filters"]->java.util.ArrayList[0]->org.apache.hadoop.hbase.rest.model.ScannerModel$FilterModel["ranges"]->java.util.ArrayList[0]->org.apache.hadoop.hbase.filter.MultiRowRangeFilter$RowRange["ascendingOrder"])
at 
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
at 
com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
at 
com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2036)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:320)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:355)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
at 
com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:138)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:314)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:355)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
at 
com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:138)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:314)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
at 
com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
at 
com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674)
at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629)
at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597)
at 
org.apache.hadoop.hbase.rest.model.ScannerModel.

Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-29 Thread Istvan Toth
Thanks for the detailed reply, Andrew.

I was also considering default methods, but it turns out that Filter is not
an interface, but an abstract class, so it doesn't apply.

Children not implementing a marker interface or marker method would
inherit the marker method implementation from the closest parent the same
way they would inherit the marker interface, so I think they are equivalent
in this aspect, too.

I think that marker interface(s) and overridable non-abstract getter(s) in
Filter are mostly equivalent from both logical and source compatibility
aspects.
The only difference is that the marker interfaces cannot be removed in a
subclass, while the getter can be overridden anywhere, but with well-chosen
defaults it shouldn't be much of a limitation.

Now that I think about it, we could cache the markers' values in an array
when creating the filter lists, so even the cost of looking them up doesn't
matter as it wouldn't happen in the hot code path.

Using the marker interfaces is more elegant, and discourages problematic
subclassing, so I am leaning towards that.

Istvan

On Wed, May 29, 2024 at 2:30 AM Andrew Purtell  wrote:

> Actually source compatibility with default methods would be fine too. I
> forget this is the main reason default methods were invented. The code of
> derived classes would not need to be changed, unless the returned value of
> the new method should be changed, and this is no worse than having a marker
> interface, which would also require code changes to implement non-default
> behaviors.
>
> A marker interface does remain as an option. It might make a difference in
> chained use cases. Consider a chain of filter instances that mixes derived
> code that is unaware of isHinting() and base code that is. The filter chain
> can be examined for the presence or absence of the marker interface and
> would not need to rely on every filter in the chain passing return values
> of isHinting back.
>
> Marker interfaces can also be added to denote stateful or stateless
> filters, if distinguishing between them would be useful, perhaps down the
> road.
>
> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell 
> wrote:
>
> > I think you've clearly put a lot of time into the analysis and it is
> > plausible.
> >
> > Adding isHinting as a default method will preserve binary compatibility.
> > Source compatibility for derived custom filters would be broken though
> and
> > that probably prevents this going back into a releasing code line.
> >
> > Have you considered adding a marker interface instead? That would
> preserve
> > both source and binary compatibility. It wouldn't require any changes to
> > derived custom filters. A runtime instanceof test would determine if the
> > filter is a hinting filter or not. No need for a new method, default or
> > otherwise.
> >
> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth  wrote:
> >
> >> I have recently opened HBASE-28622
> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned
> >> out
> >> to be another aspect of the problem discussed in HBASE-20565
> >> <https://issues.apache.org/jira/browse/HBASE-20565> .
> >>
> >> The problem is discussed in detail in HBASE-20565
> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils down
> >> to
> >> the API design decision that the filters returning SEEK_NEXT_USING_HINT
> >> rely on filterCell() getting called.
> >>
> >> On the other hand, some filters maintain an internal row state that sets
> >> counters for calls of filterCell(), which interacts with the results of
> >> previous filters in a filterList.
> >>
> >> When filters return different results for filterRowKey(), then filters
> >> returning SEEK_NEXT_USING_HINT that have returned false must have
> >> filterCell() called, otherwise the scan will degenerate into a full
> scan.
> >>
> >> On the other hand, filters that maintain an internal row state must only
> >> be
> >> called if all previous filters have INCLUDEed the Cell, otherwise their
> >> internal state will be off. (This still has caveats, as described in
> >> HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>)
> >>
> >> In my opinion, the current code from HBASE-20565
> >> <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad
> balance
> >> between features, as while it fixes some use cases for row stateful
> >> filters, it also often negates the performance benefits of the filters
> >> providing hints, which in practice makes them unusable in many filter
> list
&

[DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-28 Thread Istvan Toth
I have recently opened HBASE-28622, which has turned out
to be another aspect of the problem discussed in HBASE-20565.

The problem is discussed in detail in HBASE-20565, but it boils down to
the API design decision that the filters returning SEEK_NEXT_USING_HINT
rely on filterCell() getting called.

On the other hand, some filters maintain an internal row state that sets
counters for calls of filterCell(), which interacts with the results of
previous filters in a filterList.

When filters return different results for filterRowKey(), then filters
returning SEEK_NEXT_USING_HINT that have returned false must have
filterCell() called, otherwise the scan will degenerate into a full scan.

On the other hand, filters that maintain an internal row state must only be
called if all previous filters have INCLUDEed the Cell, otherwise their
internal state will be off. (This still has caveats, as described in
HBASE-20565)

In my opinion, the current code from HBASE-20565 strikes a bad balance
between features, as while it fixes some use cases for row stateful
filters, it also often negates the performance benefits of the filters
providing hints, which in practice makes them unusable in many filter list
combinations.

Without completely re-designing the filter system, I think that the best
solution would be adding a method to distinguish the filters that can
return hints from the rest of them. (This was also suggested in HBASE-20565,
but it was not implemented.)

In theory, we have four combinations of hinting and row stateful filters,
but currently we have no filters that are both hinting and row stateful,
and I don't think that there is a valid use case for those. The ones that are
neither hinting nor stateful could be handled as either, but treating them
as non-hinting seems faster.

Once we have that, we can improve the filterList behaviour a lot:
- in filterRowKey(), if any hinting filter returns false, then we could
return false
- in filterCell(), rather than returning on the first non-include result,
we could process the remaining hinting filters, while skipping the
non-hinting ones.

The code changes are minimal, we just need to add a new method like
isHinting() to the Filter class, and change the above two methods.
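The proposal can be sketched with simplified stand-in types. SketchFilter and SketchFilterListWithAND below are illustrative only, not the real HBase classes, and isHinting() is just the method name proposed in this thread:

```java
import java.util.List;

// Simplified stand-ins for the real HBase Filter / FilterListWithAND classes;
// only the row-key logic under discussion is modeled here.
abstract class SketchFilter {
    // Default preserves source and binary compatibility for existing custom filters.
    boolean isHinting() { return false; }

    // true means "skip this row without ever calling filterCell() on it"
    abstract boolean filterRowKey(byte[] rowKey);
}

class SketchFilterListWithAND {
    private final List<SketchFilter> filters;

    SketchFilterListWithAND(List<SketchFilter> filters) { this.filters = filters; }

    boolean filterRowKey(byte[] rowKey) {
        // Proposed rule: if any hinting filter wants the row kept, keep it,
        // so that filter can later emit SEEK_NEXT_USING_HINT from filterCell()
        // instead of the scan degrading into a full scan.
        for (SketchFilter f : filters) {
            if (f.isHinting() && !f.filterRowKey(rowKey)) {
                return false;
            }
        }
        // Old behavior for the non-hinting filters: any true skips the row.
        for (SketchFilter f : filters) {
            if (!f.isHinting() && f.filterRowKey(rowKey)) {
                return true;
            }
        }
        return false;
    }
}
```

With this rule a hinting filter that returned false keeps the row alive even when a non-hinting sibling voted to skip it, while a list of purely non-hinting filters behaves exactly as before.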

We could add this even in 2.5, by defaulting isHinting() to return false in
the Filter class, which would preserve the current API and behaviour for
existing custom filters.

I was looking at it from the AND filter perspective, but if needed, similar
changes could be made to the OR filter.

What do you think ?
Is this a good idea ?

Istvan


[jira] [Created] (HBASE-28622) FilterListWithAND can swallow SEEK_NEXT_USING_HINT

2024-05-27 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28622:
---

 Summary: FilterListWithAND can swallow SEEK_NEXT_USING_HINT
 Key: HBASE-28622
 URL: https://issues.apache.org/jira/browse/HBASE-28622
 Project: HBase
  Issue Type: Bug
  Components: Filters
Reporter: Istvan Toth
Assignee: Istvan Toth


org.apache.hadoop.hbase.filter.FilterListWithAND.filterRowKey(Cell) will return 
true if ANY of the filters returns true for Filter#filterRowKey().

However, the SEEK_NEXT_USING_HINT mechanism relies on filterRowKey() returning 
false, so that filterCell() can return SEEK_NEXT_USING_HINT.

If none of the filters matches, but one of them returns true for 
filterRowKey(), then the filter(s) that returned false (so that they could 
return SEEK_NEXT_USING_HINT in filterCell()) never get a chance to return 
SEEK_NEXT_USING_HINT, and instead of seeking, FilterListWithAND will do a very 
slow full scan.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28621) PrefixFilter should use SEEK_NEXT_USING_HINT

2024-05-27 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28621:
---

 Summary: PrefixFilter should use SEEK_NEXT_USING_HINT 
 Key: HBASE-28621
 URL: https://issues.apache.org/jira/browse/HBASE-28621
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Reporter: Istvan Toth
Assignee: Istvan Toth


Looking at PrefixFilter, I have noticed that it doesn't use the 
SEEK_NEXT_USING_HINT mechanism.

AFAICT, we could safely set the prefix as the next row hint, which could be a 
huge performance win.

Of course, ideally the user would set the scan startRow to the prefix, which 
avoids the problem; if the user doesn't, then we effectively do a full scan 
until the prefix is reached.
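The hint itself is just a byte comparison. A plain-Java sketch with no HBase dependency (the class and method names below are illustrative, not the PrefixFilter API):

```java
// Illustrative sketch of the seek hint a prefix filter could return: while the
// current row sorts before the prefix, the scanner can jump straight to the
// prefix instead of reading every intervening row.
final class PrefixHint {
    private PrefixHint() {}

    // Returns the row to seek to, or null if the current row is already
    // at or past the prefix.
    static byte[] nextRowHint(byte[] currentRow, byte[] prefix) {
        return compareUnsigned(currentRow, prefix) < 0 ? prefix : null;
    }

    // Lexicographic comparison on unsigned byte values, matching how HBase
    // orders row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```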





[jira] [Created] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-23 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28613:
---

 Summary: Use streaming when marshalling protobuf REST output
 Key: HBASE-28613
 URL: https://issues.apache.org/jira/browse/HBASE-28613
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


We are currently marshalling protobuf into a byte array, and then send that to 
the client.
This is both slow and memory intensive.

Using streaming instead results in huge perf improvements. In my benchmark, 
the wall clock time was almost halved, while the REST server CPU usage was 
reduced by 40%.

wall clock: 120s -> 65s
Total REST CPU: 300s -> 180s
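The shape of the change, sketched with a stand-in interface. Message#toByteArray() and Message#writeTo(OutputStream) are the standard protobuf generated-message methods; everything else here is illustrative:

```java
import java.io.IOException;
import java.io.OutputStream;

class MarshalSketch {
    // Stand-in for the two relevant methods of a generated protobuf Message.
    interface Marshallable {
        byte[] toByteArray();
        void writeTo(OutputStream out) throws IOException;
    }

    // Old path: the whole payload is materialized as a byte[] first, so a
    // large CellSet costs an extra full-size heap copy per response.
    static void writeBuffered(Marshallable m, OutputStream out) throws IOException {
        out.write(m.toByteArray());
    }

    // New path: bytes are serialized straight into the response stream,
    // so no intermediate full-size buffer is ever allocated.
    static void writeStreaming(Marshallable m, OutputStream out) throws IOException {
        m.writeTo(out);
    }
}
```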






[jira] [Resolved] (HBASE-28501) Support non-SPNEGO authentication methods and implement session handling in REST java client library

2024-05-21 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28501.
-
Resolution: Fixed

> Support non-SPNEGO authentication methods and implement session handling in 
> REST java client library
> 
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The current java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards, it 
> seems that most of the kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting the user set it up), and 
> letting it handle authentication by itself would be a better and more generic 
> solution.
> -Also add support for specifying a prefix for the URL path.-





[jira] [Resolved] (HBASE-28553) SSLContext not used for Kerberos auth negotiation in rest client

2024-05-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28553.
-
Resolution: Duplicate

Fix included in HBASE-28501

> SSLContext not used for Kerberos auth negotiation in rest client
> 
>
> Key: HBASE-28553
> URL: https://issues.apache.org/jira/browse/HBASE-28553
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>
> The included REST client now supports specifying a Trust store for SSL 
> connections.
> However, the configured SSL library is not used when the Kerberos negotiation 
> is performed by the Hadoop library, which uses its own client.
> We need to set up the Hadoop auth process to use the same SSLContext.





[jira] [Created] (HBASE-28597) Support native Cell format for protobuf in REST server and client

2024-05-16 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28597:
---

 Summary: Support native Cell format for protobuf in REST server 
and client
 Key: HBASE-28597
 URL: https://issues.apache.org/jira/browse/HBASE-28597
 Project: HBase
  Issue Type: Wish
  Components: REST
Reporter: Istvan Toth


REST currently uses its own (outdated) CellSetModel format for transferring 
cells.

This is fine for XML and JSON, which are slow anyway and even slower at 
handling byte arrays, and is expected to be used in cases where simple client 
code that does not depend on the hbase java libraries is more important than 
raw performance.

However, we perform the same marshalling and unmarshalling when we are using 
protobuf, which doesn't really add value, but eats up resources.

We could add a new encoding for Results which uses the native cell format in 
protobuf, by simply dumping the binary cell bytestreams into the REST response 
body.

This should save a lot of resources on the server side, and would be either 
faster, or the same speed on the client.

As an additional advantage, the resulting Cells would be of native HBase Cell 
type instead of the REST Cell type.







Re: [DISCUSS] Dropping Java 8 support in HBase 3

2024-05-07 Thread Istvan Toth
I'd expect the automated backporting process to only work for fairly
trivial patches which do not use protobuf, etc.
More involved patches would need manual work anyway.

If we want to make sure that everything compiles with JDK8, it's easier to
just compile the master branch with JDK8 (along with 11/17),
and fail the CI check if it doesn't.

We need to find a balance between using the new Java features and keeping
the workload manageable.
We could keep compiling master with JDK8 for a year or two, and when
activity on the 2.x branches tapers off, we could remove that restriction.


On Tue, May 7, 2024 at 3:56 PM Andrew Purtell 
wrote:

> I also like the suggestion to have CI help us here too.
>
> > On May 7, 2024, at 9:42 AM, Bryan Beaudreault 
> wrote:
> >
> > I'm nervous about creating more big long-term divergences between the
> > branches. Already I sometimes get caught up on HBaseTestingUtil vs
> > HBaseTestingUtility. And we all know the burden of maintaining the old
> > HTable impl.
> >
> > I'm not sure if this is a useful suggestion since it would require
> someone
> > to do a good deal of work, but I wonder if we could automate backport
> > testing a bit. Our yetus checks already check the patch, maybe it could
> > apply the patch to branch-2. This would increase the cost of master
> branch
> > PRs but maybe speed us up overall.
> >
> >> On Tue, May 7, 2024 at 9:21 AM 张铎(Duo Zhang) 
> wrote:
> >>
> >> The problem is that, if we only compile and run tests on JDK11+, the
> >> contributors may implicitly use some JDK11+ only features and
> >> introduce difference when backporting to branch-2.x.
> >>
> >> Maybe a possible policy is that, once a patch should go into
> >> branch-2.x too, before mering the master PR, we should make sure the
> >> contributor open a PR for branch-2.x too, so we can catch the
> >> differences between the 2 PRs, and whether to align them.
> >>
> >> WDYT?
> >>
> >> Thanks.
> >>
> >>> Andrew Purtell wrote on Tue, May 7, 2024 at 20:20:
> >>>
> >>> I don’t expect 2.x to wind down for up to several more years. We will
> be
> >>> still using it in production at my employer for a long time and I would
> >>> continue my role as RM for 2.x as needed. HBase 3 is great but not GA
> yet
> >>> and then some users will want to wait one to a couple years before
> >> adopting
> >>> the new major version, especially if migration is not seamless. (We
> even
> >>> faced breaking changes in a minor upgrade from 2.4 to 2.5 that brought
> >> down
> >>> a cluster during a rolling upgrade, so there should be no expectation
> of
> >> a
> >>> seamless upgrade.) My plan is to continue releasing 2.x until, like
> with
> >>> 1.x, the commits to branch-2 essentially stop, or until the PMC stops
> >>> allowing release of the candidates.
> >>>
> >>> Perhaps we do not need to do a total ban on use of 11 features. We
> should
> >>> allow a case by case discussion. We can minimize their scope and even
> >>> potentially offer multiversion support like we do with Unsafe access
> >>> utility classes in hbase-thirdparty. There are no planned uses of new
> 11+
> >>> APIs and features now anyhow.
> >>>
> >>>
> >>> On Tue, May 7, 2024 at 7:40 AM 张铎(Duo Zhang) 
> >> wrote:
> >>>
> >>>> For me I think Istvan's plan is also acceptable.
> >>>>
> >>>> So in conclusion, we should
> >>>>
> >>>> 1. Jump to JDK11/JDK17(we could start a new thread to discuss this,
> >>>> maybe also on the user mailing list)
> >>>> 2. Claim and also make sure 3.x does not work with JDK8
> >>>> 3. Introduce a policy to only allow JDK8 features on master and
> >>>> branch-3.x for a while(maybe still keep the release version as 8?)
> >>>>
> >>>> Any other suggestions?
> >>>>
> >>>> Thanks.
> >>>>
> >>>>> Istvan Toth wrote on Tue, Apr 30, 2024 at 12:45:
> >>>>>
> >>>>> Spring is a good argument for JDK17.
> >>>>>
> >>>>> Duo's suggestion is a great step forward, firmly stating that JDK8
> >> is not
> >>>>> officially supported solves most of our expected future CVE problems.
> >>>>>
> >>>>> However, I think that ripping off the bandaid, and making sure that

[jira] [Resolved] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-07 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28556.
-
Fix Version/s: 2.4.18
   3.0.0
   2.7.0
   2.6.1
   2.5.9
   Resolution: Fixed

Committed to all active branches.
Thanks for the review [~zhangduo].

> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.4.18, 3.0.0, 2.7.0, 2.6.1, 2.5.9
>
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.





[jira] [Created] (HBASE-28561) Add separate fields for column family and qualifier in REST message format

2024-05-01 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28561:
---

 Summary: Add separate fields for column family and qualifier in 
REST message format
 Key: HBASE-28561
 URL: https://issues.apache.org/jira/browse/HBASE-28561
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The current format uses the archaic column field, which requires extra 
processing and copying at both the server and client side.

We need to:
- Add a version field to the requests, to be enabled by clients that support 
the new format
- Add the new fields to the JSON, XML and protobuf formats, and logic to use 
them.

This should be doable in a backwards-compatible manner, with the server falling 
back to the old format if it receives an unversioned request.
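For context, the split that the legacy format forces on both sides of the wire, as a small sketch (the class below is illustrative, not HBase code):

```java
// The legacy "column" field packs family and qualifier as "family:qualifier",
// so every cell costs a scan-and-split on one side and a join on the other.
// Separate family/qualifier fields in the message format would remove this step.
final class ColumnSplit {
    private ColumnSplit() {}

    // Splits "family:qualifier" into its two parts; a bare family maps to an
    // empty qualifier, matching HBase's convention.
    static String[] split(String column) {
        int idx = column.indexOf(':');
        return idx < 0
            ? new String[] { column, "" }
            : new String[] { column.substring(0, idx), column.substring(idx + 1) };
    }
}
```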





[jira] [Resolved] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-30 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28523.
-
Resolution: Fixed

Committed to all active branches.

> Use a single get call in REST multiget endpoint
> ---
>
> Key: HBASE-28523
> URL: https://issues.apache.org/jira/browse/HBASE-28523
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The REST multiget endpoint currently issues a separate HBase GET operation 
> for each key.
> Use the method that accepts a list of keys instead.
> That should be faster.
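The shape of the fix, sketched with a stand-in table interface. In the real client the batched call is Table#get(List&lt;Get&gt;), which returns one Result per Get in order; MiniTable below is illustrative:

```java
import java.util.ArrayList;
import java.util.List;

class MultiGetSketch {
    // Stand-in for the two relevant Table methods: a per-key get and a
    // batched get that costs a single round trip.
    interface MiniTable {
        String get(String key);              // one RPC per call
        List<String> get(List<String> keys); // one batched RPC
    }

    // Old endpoint behavior: one HBase GET operation per requested key.
    static List<String> perKey(MiniTable t, List<String> keys) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            out.add(t.get(k));
        }
        return out;
    }

    // New behavior: hand the whole key list to a single batched get.
    static List<String> batched(MiniTable t, List<String> keys) {
        return t.get(keys);
    }
}
```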





Re: [DISCUSS] Dropping Java 8 support in HBase 3

2024-04-29 Thread Istvan Toth
Spring is a good argument for JDK17.

Duo's suggestion is a great step forward, firmly stating that JDK8 is not
officially supported solves most of our expected future CVE problems.

However, I think that ripping off the bandaid, and making sure that HBase 3
does not work with Java 8 would be better.
It's easier to accept such a change in a major version than in a minor
version.

IMO users that are so conservative that they are still using Java 8 are
unlikely to be first movers to a new major release anyway.

I think that the following upgrade path would optimal:

- User stays on (supported) Hbase 2.x until ready to upgrade Java
- User upgrades to Java 11/17 with the same HBase
- User upgrades to Hbase 3.x

As noted, we will need to support 2.x for some time anyway (just like 1.x
was supported for a long time).

As for the backporting issues:
We could make it a policy to avoid using Java 11+ features in Hbase code
until 2.x support winds down.
This has worked quite well for Phoenix with Java 7 / Java 8.









On Tue, Apr 30, 2024 at 3:59 AM 张铎(Duo Zhang)  wrote:

> AFAIK spring 6 and spring-boot 3 have jumped to java17 directly, so if we
> want to upgrade, I also suggest that we jump to java 17 directly.
>
> While upgrading to java 17 can reduce our compatibility work on branch-3+,
> but consider the widely usage for java 8, I think we still need to support
> branch-2 for several years, then this will increase the compatibility work
> as the code between branch-3+ and branch-2.x will be more and more
> different.
>
> So for me, a workable solution is
>
> 1. We first claim that branch-3+ will move minimum java support to 11 or
> 17.
> 2. Start to move the compilation to java 11 or 17, but still keep release
> version 8, and still keep the pre commit pipeline to run java 8, 11, 17, to
> minimum our compatibility work before we have the first 3.0.0 release.
> 3. Cut branch-3.0 and release 3.0.0, so we have a 3.0.0 release, actually
> which can still run on java 8, so it will be easier for our users to
> upgrade to 3.x and reduce our pressure on maintaining branch-2, especially
> do not need to back port new features there.
> 4. Start to move the release version to 11 or 17 on branch-3+, and prepare
> for 3.1.0 release, which will be the real 11 or 17 only release.
>
> Thanks.
>
> Bryan Beaudreault wrote on Tue, Apr 30, 2024 at 02:54:
>
> > I am a huge +1 for dropping java8.
> >
> > One reason I would suggest going to 17 is that it seems so hard to change
> > these things given our long development cycle on major releases. There
> are
> > some nice language features in 17, but more importantly is that the
> initial
> > release of java11 was released 6 years ago and java17 released 3 years.
> > Java21 is already released as well. So I could see java17 being widely
> > available enough that we could jump "in the middle" rather than to the
> > oldest LTS.
> >
> > I will say that we're already running java 21 on all of our hbase/hadoop
> in
> > prod (70 clusters, 7k regionservers). I know not every organization can
> be
> > that aggressive, and I wouldn't suggest jumping to 21 in the codebase.
> Just
> > pointing it out in terms of basic support already existing and being
> > stable.
> >
> > On Mon, Apr 29, 2024 at 2:33 PM Andrew Purtell  >
> > wrote:
> >
> > > I also agree that mitigation of security problems in dependencies will
> be
> > > increasingly difficult, as we cannot expect our dependencies to
> continue
> > to
> > > support Java 8. They might, but as time goes on it is less likely.
> > >
> > > A minimum of Java 11 makes a lot of sense. This is where the center of
> > > gravity of the Java ecosystem is, probably.
> > >
> > > A minimum of 17 is aggressive and I don’t see the point unless there
> is a
> > > feature in 17 that we would like to base an improvement on.
> > >
> > > > On Apr 29, 2024, at 1:23 PM, chrajeshbab...@gmail.com wrote:
> > > >
> > > > Hi!
> > > >
> > > > With 3.0 on the horizon, we could look into bumping the minimum
> > required
> > > > Java version for HBase.
> > > >
> > > > The last discussion I could find was four years ago, when dropping
> 8.0
> > > > support was rejected.
> > > >
> > > > https://lists.apache.org/thread/ph8xry0x37cvjj89fp2jk1k48yb7gs46
> > > >
> > > > Now it's four years later, and the end of OpenJDK support for Java 8
> > and
> > > 11
> > > > are much closer.
> > > > (Oracle public support is so short that I consider that irrelevant)
> > > >
> > > > Some critical dependencies (like Jetty) have ended even regular
> > security
> > > > support for Java 8.
> > > >
> > > > By supporting Java 8 we are also limiting ourselves to using an already
> already
> > > 10
> > > > year old Java release, ignoring any developments in the language.
> > > >
> > > > My take is that with the current dogmatic emphasis on CVE mitigation
> > the
> > > > benefits of bumping the required JDK version outweigh the drawbacks
> even
> > > for
> > > > the legacy install base, especially as it's getting harder and 

[jira] [Created] (HBASE-28556) Reduce memory copying in Rest server when converting CellModel to Protobuf

2024-04-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28556:
---

 Summary: Reduce memory copying in Rest server when converting 
CellModel to Protobuf
 Key: HBASE-28556
 URL: https://issues.apache.org/jira/browse/HBASE-28556
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The REST server does a lot of unnecessary copying, which could be avoided at 
least for protobuf encoding.

- It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses the 
client API, so it should never encounter ByteBuffer backed cells.
- It clones everything from the cells (sometimes multiple times) before 
serializing to protobuf.

We could mimic the structure in Cell, with array, offset and length for each 
field, and use the appropriate protobuf setters to avoid the extra copies.

There may or may not be a way to do the same for JSON and XML via jax-rs, I 
don't know the frameworks well enough to tell, but if not, we could just do the 
copying in the getters for them.
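The (array, offset, length) idea can be sketched without any HBase dependency. CellModelSketch below is illustrative; the real class is org.apache.hadoop.hbase.rest.model.CellModel, and the protobuf serializer would pass the triplet to a ranged copy such as ByteString.copyFrom(byte[], int, int):

```java
// Sketch: instead of cloning the value out of the Cell, the model keeps a
// reference to the backing array plus offset/length, mirroring the Cell
// accessors, so each field is copied at most once, at serialization time.
class CellModelSketch {
    private final byte[] valueArray;
    private final int valueOffset;
    private final int valueLength;

    // Zero-copy constructor: just remembers where the bytes live.
    CellModelSketch(byte[] valueArray, int valueOffset, int valueLength) {
        this.valueArray = valueArray;
        this.valueOffset = valueOffset;
        this.valueLength = valueLength;
    }

    // Triplet accessors for serializers that can write a byte range directly.
    byte[] getValueArray() { return valueArray; }
    int getValueOffset() { return valueOffset; }
    int getValueLength() { return valueLength; }

    // Copying getter kept for serializers (e.g. JSON/XML) that need a plain
    // byte[]; this is the one place a copy happens.
    byte[] getValue() {
        return java.util.Arrays.copyOfRange(valueArray, valueOffset, valueOffset + valueLength);
    }
}
```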






[jira] [Created] (HBASE-28553) SSLContext not used for Kerberos auth negotiation in rest client

2024-04-25 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28553:
---

 Summary: SSLContext not used for Kerberos auth negotiation in rest 
client
 Key: HBASE-28553
 URL: https://issues.apache.org/jira/browse/HBASE-28553
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The included REST client now supports specifying a Trust store for SSL 
connections.
However, the configured SSL library is not used when the Kerberos negotiation is 
performed by the Hadoop library, which uses its own client.

We need to set up the Hadoop auth process to use the same SSLContext.





Re: [VOTE] The second release candidate for 2.6.0 (RC3) is available

2024-04-24 Thread Istvan Toth
I can merge https://github.com/apache/hbase/pull/5852 as soon as I get a
review on it for the above issue.

best regards
Istvan


On Thu, Apr 25, 2024 at 4:14 AM 张铎(Duo Zhang)  wrote:

> HBASE-25818 introduced a breaking change, it removed the SCAN_FILTER
> field, and introduced two new fields in
> org.apache.hadoop.hbase.rest.Constants.
>
> But unfortunately, org.apache.hadoop.hbase.rest.Constants is IA.Public
> so we can not remove its field without a deprecation cycle...
>
> Bryan Beaudreault wrote on Thu, Apr 25, 2024 at 09:21:
> >
> > Please vote on this Apache hbase release candidate,
> > hbase-2.6.0RC3
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase 2.6.0
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 2.6.0RC3:
> >
> >   https://github.com/apache/hbase/tree/2.6.0RC3
> >
> > This tag currently points to git reference
> >
> >   df3343989d02966752ce7562546619f86a36169a
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC3/
> >
> > Maven artifacts are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1540/
> >
> > Maven artifacts for hadoop3 are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1541/
> >
> > Artifacts were signed with the 0x74EFF462 key which can be found in:
> >
> >   https://downloads.apache.org/hbase/KEYS
> >
> > To learn more about Apache hbase, please see
> >
> >   http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com 


[jira] [Created] (HBASE-28550) Provide working benchmark tool for REST server

2024-04-24 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28550:
---

 Summary: Provide working benchmark tool for REST server
 Key: HBASE-28550
 URL: https://issues.apache.org/jira/browse/HBASE-28550
 Project: HBase
  Issue Type: Umbrella
  Components: REST
Reporter: Istvan Toth


This is an umbrella ticket for the individual changes.

The goal is to be able to performance test the REST server either 
directly or via Knox or other proxies / load balancers, and compare this with 
the results when going via the native client.





[jira] [Created] (HBASE-28544) org.apache.hadoop.hbase.rest.PerformanceEvaluation does not evaluate REST performance

2024-04-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28544:
---

 Summary: org.apache.hadoop.hbase.rest.PerformanceEvaluation does 
not evaluate REST performance
 Key: HBASE-28544
 URL: https://issues.apache.org/jira/browse/HBASE-28544
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


org.apache.hadoop.hbase.rest.PerformanceEvaluation only uses the REST interface 
for Admin tasks like creating tables.

All data access is done via the native RPC client, which makes the whole tool a 
big red herring.





[jira] [Created] (HBASE-28543) org.apache.hadoop.hbase.rest.PerformanceEvaluation does not read hbase-site.xml

2024-04-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28543:
---

 Summary: org.apache.hadoop.hbase.rest.PerformanceEvaluation does 
not read hbase-site.xml
 Key: HBASE-28543
 URL: https://issues.apache.org/jira/browse/HBASE-28543
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


I am trying to run org.apache.hadoop.hbase.rest.PerformanceEvaluation.
It cannot connect to the ZK quorum specified in hbase-site.xml.

It implements the Configurable interface incorrectly.
Fixing the Configurable implementation results in connecting to ZK properly.
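The contract at stake, reduced to stand-in types (the real interfaces are org.apache.hadoop.conf.Configurable and Configuration; this sketch is illustrative, not HBase code):

```java
import java.util.HashMap;

class ConfigurableSketch {
    // Stand-in for Hadoop's Configuration: just a string map here.
    static class Conf extends HashMap<String, String> {}

    // Stand-in for org.apache.hadoop.conf.Configurable.
    interface Configurable {
        void setConf(Conf conf);
        Conf getConf();
    }

    // Correct implementation: keep and use the injected instance, which is
    // the one the runner populated from hbase-site.xml. A tool that ignores
    // the injected Conf and builds its own silently drops settings such as
    // the ZK quorum, which is exactly the reported failure mode.
    static class Tool implements Configurable {
        private Conf conf;

        @Override public void setConf(Conf conf) { this.conf = conf; }
        @Override public Conf getConf() { return conf; }

        String zkQuorum() {
            return conf.getOrDefault("hbase.zookeeper.quorum", "localhost");
        }
    }
}
```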





[jira] [Created] (HBASE-28540) Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner

2024-04-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28540:
---

 Summary: Cache Results in 
org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
 Key: HBASE-28540
 URL: https://issues.apache.org/jira/browse/HBASE-28540
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The implementation of org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
is very inefficient, as the standard next() method makes a separate HTTP 
request for each row.

Performance can be improved by not specifying the row count in the REST call 
and caching the returned Results.

Chunk size can still be influenced by scan.setBatch();
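A hedged sketch of the caching idea. RowFetcher stands in for the per-chunk REST call and rows are plain strings for brevity; the real class is RemoteHTable.Scanner returning Results:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class CachingScannerSketch {
    // One fetchNextBatch() call corresponds to one HTTP request returning
    // however many rows the server chose to send; empty means end of scan.
    interface RowFetcher {
        List<String> fetchNextBatch();
    }

    private final RowFetcher fetcher;
    private final Deque<String> cache = new ArrayDeque<>();
    private boolean exhausted;

    CachingScannerSketch(RowFetcher fetcher) { this.fetcher = fetcher; }

    // next() only goes over the wire when the local cache runs dry, instead
    // of one HTTP request per row.
    String next() {
        if (cache.isEmpty() && !exhausted) {
            List<String> batch = fetcher.fetchNextBatch();
            if (batch.isEmpty()) {
                exhausted = true;
            } else {
                cache.addAll(batch);
            }
        }
        return cache.poll(); // null once the scan is done
    }
}
```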






[jira] [Resolved] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-17 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28500.
-
Resolution: Fixed

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.
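A sketch of why per-request random balancing breaks scanners, and the sticky-server alternative (illustrative types only, not the actual client code):

```java
import java.util.List;
import java.util.Random;

class StickyScannerSketch {
    private final List<String> servers;
    private final Random random = new Random();
    private String pinnedServer; // set once a stateful scanner is opened

    StickyScannerSketch(List<String> servers) { this.servers = servers; }

    // Stateless requests (get/put) may be balanced freely across servers.
    String pickForStateless() {
        return servers.get(random.nextInt(servers.size()));
    }

    // Scanner requests must keep hitting the server that holds the scanner
    // state, so the first pick is remembered for the rest of the scan.
    String pickForScanner() {
        if (pinnedServer == null) {
            pinnedServer = pickForStateless();
        }
        return pinnedServer;
    }
}
```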



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reopened HBASE-28500:
-

The spotbugs warning makes the daily builds go red.
Gonna push an addendum for it.

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.





[jira] [Created] (HBASE-28526) hbase-rest jar does not work with hbase-shaded-client with protobuf encoding

2024-04-16 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28526:
---

 Summary: hbase-rest jar does not work with hbase-shaded-client 
with protobuf encoding
 Key: HBASE-28526
 URL: https://issues.apache.org/jira/browse/HBASE-28526
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


When trying to decode a protobuf-encoded CellSet, I get 
{noformat}
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(Lcom/google/protobuf/Message$Builder;[B)V
at 
org.apache.hadoop.hbase.rest.model.CellSetModel.getObjectFromMessage(CellSetModel.java:129)
at RestClientExample.getMulti(RestClientExample.java:191)
at RestClientExample.start(RestClientExample.java:138)
at RestClientExample.main(RestClientExample.java:124)

{noformat}

Seems to be caused by relocating protobuf 2.5 in hbase-shaded-client.

It works fine with the unrelocated client, i.e. when using the 
{noformat}
export CLASSPATH=`hbase --internal-classpath classpath`:
{noformat}
command to set up the classpath for the client.






[jira] [Created] (HBASE-28525) Document all REST endpoints

2024-04-16 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28525:
---

 Summary: Document all REST endpoints
 Key: HBASE-28525
 URL: https://issues.apache.org/jira/browse/HBASE-28525
 Project: HBase
  Issue Type: Improvement
  Components: documentation, REST
Reporter: Istvan Toth


The new features added in HBASE-28518 do not have documentation.
While reviewing, I also found other undocumented interfaces, like TableScan, 
and options like globbed gets.





[jira] [Resolved] (HBASE-28518) Allow specifying a filter for the REST multiget endpoint

2024-04-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28518.
-
Fix Version/s: 2.6.0
   2.4.18
   4.0.0-alpha-1
   2.7.0
   3.0.0-beta-2
   2.5.9
   Resolution: Fixed

Committed to all active branches.
Thanks for the review [~ankit].

> Allow specifying a filter for the REST multiget endpoint
> 
>
> Key: HBASE-28518
> URL: https://issues.apache.org/jira/browse/HBASE-28518
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The native HBase API allows specifying Filters for get operations.
> The REST interface does not currently expose this functionality.
> Add a parameter to the multiget endpoint to allow specifying filters.





[jira] [Resolved] (HBASE-28524) Backport HBASE-28174 to branch-2.4 and branch-2.5

2024-04-15 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28524.
-
Fix Version/s: 2.4.18
   2.5.9
 Release Note: Done.
   Resolution: Fixed

> Backport HBASE-28174 to branch-2.4 and branch-2.5
> -
>
> Key: HBASE-28524
> URL: https://issues.apache.org/jira/browse/HBASE-28524
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.17, 2.5.8
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
> Fix For: 2.4.18, 2.5.9
>
>
> The changes are backwards compatible and the REST interface is super limited 
> without them.





[jira] [Created] (HBASE-28524) Backport HBASE-28174 to branch-2.4 and branch-2.5

2024-04-15 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28524:
---

 Summary: Backport HBASE-28174 to branch-2.4 and branch-2.5
 Key: HBASE-28524
 URL: https://issues.apache.org/jira/browse/HBASE-28524
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.5.8, 2.4.17
Reporter: Istvan Toth
Assignee: Istvan Toth


The changes are backwards compatible and the REST interface is super limited 
without them.





[jira] [Created] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-14 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28523:
---

 Summary: Use a single get call in REST multiget endpoint
 Key: HBASE-28523
 URL: https://issues.apache.org/jira/browse/HBASE-28523
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The REST multiget endpoint currently issues a separate HBase GET operation for 
each key.

Use the method that accepts a list of keys instead: a single batched call 
avoids one round trip per key, so it should be noticeably faster.
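The round-trip difference can be sketched with a counting stand-in (in the real client this corresponds to calling Table.get(Get) in a loop versus one Table.get(List<Get>) call; the class below is illustrative, not HBase API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a "table" that counts round trips, contrasting N
// single-key lookups with one batched lookup for the same keys.
class BatchGetSketch {
  int rpcCount = 0;

  // One lookup = one round trip.
  List<String> getOne(String key) {
    rpcCount++;
    return List.of("value-of-" + key);
  }

  // The whole batch travels in a single round trip.
  List<String> getBatch(List<String> keys) {
    rpcCount++;
    List<String> out = new ArrayList<>();
    for (String k : keys) out.add("value-of-" + k);
    return out;
  }
}
```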





[jira] [Created] (HBASE-28518) Allow specifying a filter for the REST multiget endpoint

2024-04-12 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28518:
---

 Summary: Allow specifying a filter for the REST multiget endpoint
 Key: HBASE-28518
 URL: https://issues.apache.org/jira/browse/HBASE-28518
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The native HBase API allows specifying Filters for get operations.
The REST interface does not currently expose this functionality.

Add a parameter to the multiget endpoint to allow specifying filters.
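For illustration, the request could look like the sketch below. The repeated row parameters match the existing multiget endpoint; the filter parameter name is an assumption for this sketch, not necessarily what the patch introduced:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical URL shape: "row" is the multiget endpoint's existing repeated
// parameter; "filter" is an assumed name for the new filter parameter.
class MultigetUrlSketch {
  static String buildUrl(String base, String table, String[] rows, String filterSpec) {
    StringBuilder sb = new StringBuilder(base).append('/').append(table).append("/multiget?");
    for (String r : rows) {
      sb.append("row=").append(URLEncoder.encode(r, StandardCharsets.UTF_8)).append('&');
    }
    sb.append("filter=").append(URLEncoder.encode(filterSpec, StandardCharsets.UTF_8));
    return sb.toString();
  }
}
```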






[jira] [Created] (HBASE-28504) Implement eviction logic for scanners in Rest APIs to prevent scanner leakage

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28504:
---

 Summary: Implement eviction logic for scanners in Rest APIs to 
prevent scanner leakage
 Key: HBASE-28504
 URL: https://issues.apache.org/jira/browse/HBASE-28504
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The REST API maintains a map of _ScannerInstanceResource_s (which are 
ultimately tracking Scanner objects).

The user is supposed to delete these after using them, but if for any reason 
they are not deleted, these objects are retained indefinitely.

Implement logic to evict old scanners automatically.
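One possible policy, sketched with the JDK only (not the committed implementation): record the last access time per scanner id and drop entries that have been idle longer than a timeout.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative registry: scanner ids map to their last-access timestamps,
// and a periodic sweep evicts anything idle past the timeout.
class ScannerRegistrySketch {
  private final long idleTimeoutMillis;
  private final Map<Integer, Long> lastAccess = new LinkedHashMap<>();

  ScannerRegistrySketch(long idleTimeoutMillis) {
    this.idleTimeoutMillis = idleTimeoutMillis;
  }

  // Called whenever a scanner is created or used.
  synchronized void touch(int scannerId, long now) {
    lastAccess.put(scannerId, now);
  }

  // Drop every scanner idle longer than the timeout; returns the count.
  synchronized int evictIdle(long now) {
    int evicted = 0;
    for (Iterator<Map.Entry<Integer, Long>> it = lastAccess.entrySet().iterator(); it.hasNext();) {
      if (now - it.next().getValue() > idleTimeoutMillis) {
        it.remove();
        evicted++;
      }
    }
    return evicted;
  }

  synchronized int size() {
    return lastAccess.size();
  }
}
```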





[jira] [Created] (HBASE-28501) Support non-SPNEGO authentication methods in REST java client library

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28501:
---

 Summary: Support non-SPNEGO authentication methods in REST java 
client library
 Key: HBASE-28501
 URL: https://issues.apache.org/jira/browse/HBASE-28501
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The current Java client only supports the SPNEGO authentication method.

This does not support the case when an application proxy like Apache Knox 
performs AAA conversion from BASIC/DIGEST to Kerberos authentication.

Add support for BASIC username/password auth to the client.

Generally, the authentication code in the client looks quite backwards: it 
seems that most of the Kerberos / auth cookie code duplicates HttpClient 
functionality. AFAICT, setting HttpClient up (or letting the user set it up) and 
letting it handle authentication by itself would be a better and more generic 
solution.
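On the wire, BASIC auth amounts to an Authorization header carrying base64("user:password") per RFC 7617. A minimal sketch, independent of how the REST client would expose it:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds the standard HTTP Basic Authorization header value (RFC 7617).
class BasicAuthSketch {
  static String authorizationHeader(String user, String password) {
    String token = Base64.getEncoder()
        .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
    return "Basic " + token;
  }
}
```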






[jira] [Created] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28500:
---

 Summary: Rest Java client library assumes stateless servers
 Key: HBASE-28500
 URL: https://issues.apache.org/jira/browse/HBASE-28500
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


The Rest Java client library accepts a list of rest servers, and does random 
load balancing between them for each request.
This does not work for scans, which do have state on the rest server instance.
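The fix direction can be sketched as sticky selection: pin every request of a given scan (or session) to one server, so the server-side scanner state stays reachable. Names and the hashing scheme below are illustrative:

```java
import java.util.List;

// Illustrative sticky policy: the same session id always maps to the same
// server, unlike per-request random selection, which can route a scanner's
// "next" call to a server that has never seen that scanner.
class StickyClusterSketch {
  static String pickSticky(List<String> servers, int sessionId) {
    return servers.get(Math.floorMod(sessionId, servers.size()));
  }
}
```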





[jira] [Reopened] (HBASE-28489) Implement HTTP session support in REST server and client

2024-04-08 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reopened HBASE-28489:
-

My assumption that the REST interface is stateless was incorrect.
Scan objects are maintained on the REST server, so sticky sessions are a must 
for any kind of HA/LB solution.

> Implement HTTP session support in REST server and client
> 
>
> Key: HBASE-28489
> URL: https://issues.apache.org/jira/browse/HBASE-28489
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>
> The REST server (and java client) currently does not implement sessions.
> While it is not necessary for the REST API to work, implementing sessions would 
> be a big improvement in throughput and resource usage.
> * It would make load balancing with sticky sessions possible (though it's not 
> really needed for REST)
> * It would save the overhead of performing authentication for each request
>  The gains are particularly big when using SPNEGO:
> * The full SPNEGO handshake can be skipped for subsequent requests
> * When Knox performs SPNEGO authentication for the proxied client, it accesses 
> the identity store each time. When the session is set, this step is only 
> performed on the initial request.
> The same change has resulted in spectacular performance improvements for 
> Phoenix Query Server when implemented in Avatica.





[jira] [Created] (HBASE-28499) Use the latest Httpclient/Httpcore 5.x in HBase

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28499:
---

 Summary: Use the latest Httpclient/Httpcore 5.x  in HBase
 Key: HBASE-28499
 URL: https://issues.apache.org/jira/browse/HBASE-28499
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


HttpClient 4.x is not actively maintained.

We use HttpClient directly in the REST client code, and in the tests for 
several modules.

HttpClient 4.5 is a transitive dependency at least of Hadoop and Thrift, but 
HttpClient 5.x uses a separate Java package, so 4.5 and 5.x should be able to 
co-exist fine.

As of now, HttpClient 4.5 is in maintenance mode:
https://hc.apache.org/status.html






[jira] [Resolved] (HBASE-28489) Implement HTTP session support in REST server and client

2024-04-08 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28489.
-
Resolution: Invalid

Nothing to do, all relevant cases work already.

> Implement HTTP session support in REST server and client
> 
>
> Key: HBASE-28489
> URL: https://issues.apache.org/jira/browse/HBASE-28489
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>
> The REST server (and java client) currently does not implement sessions.
> While it is not necessary for the REST API to work, implementing sessions would 
> be a big improvement in throughput and resource usage.
> * It would make load balancing with sticky sessions possible (though it's not 
> really needed for REST)
> * It would save the overhead of performing authentication for each request
>  The gains are particularly big when using SPNEGO:
> * The full SPNEGO handshake can be skipped for subsequent requests
> * When Knox performs SPNEGO authentication for the proxied client, it accesses 
> the identity store each time. When the session is set, this step is only 
> performed on the initial request.
> The same change has resulted in spectacular performance improvements for 
> Phoenix Query Server when implemented in Avatica.





[jira] [Created] (HBASE-28489) Implement HTTP session support in REST server and client

2024-04-05 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28489:
---

 Summary: Implement HTTP session support in REST server and client
 Key: HBASE-28489
 URL: https://issues.apache.org/jira/browse/HBASE-28489
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The REST server (and java client) currently does not implement sessions.

While sessions do not seem to be necessary for the REST API to work, 
implementing them would be a big improvement in throughput and resource usage.

* It would make load balancing with sticky sessions possible
* It would save the overhead of performing authentication for each call

 The gains are particularly big when using SPNEGO:

* The full SPNEGO handshake can be skipped for subsequent requests
* When Knox performs SPNEGO authentication for the proxied client, it accesses 
the identity store each time. When the session is set, this step is only 
performed on the initial request.

The same change has resulted in spectacular performance improvements for 
Phoenix Query Server when implemented in Avatica.
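The saving can be sketched as a session-token cache: authenticate once, hand back a token (e.g. the hadoop.auth cookie), and skip the expensive handshake while the token is valid. The counting below is illustrative, not the server's actual logic:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative session cache: a valid token bypasses the authentication
// handshake; only token-less (or stale) requests pay the full cost.
class SessionSketch {
  int handshakes = 0; // counts full (e.g. SPNEGO) handshakes
  private final Map<String, String> sessions = new HashMap<>();

  // Returns the session token the client should present next time.
  String request(String user, String cookie) {
    if (cookie != null && user.equals(sessions.get(cookie))) {
      return cookie; // session hit: no authentication round trip
    }
    handshakes++; // full handshake happens here
    String newCookie = "session-" + user + "-" + handshakes;
    sessions.put(newCookie, user);
    return newCookie;
  }
}
```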





Re: [ANNOUNCE] New HBase committer Istvan Toth

2024-04-03 Thread Istvan Toth
Thank you!

I'm looking forward to working with you on HBase.

Istvan

On Wed, Apr 3, 2024 at 7:00 AM Nihal Jain  wrote:

> Congratulations Istvan. Welcome !
>
> On Wed, 3 Apr 2024, 01:53 Rushabh Shah,  .invalid>
> wrote:
>
> > Congratulations Istvan, welcome !!
> >
> >
> > On Tue, Apr 2, 2024 at 4:23 AM Duo Zhang  wrote:
> >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that
> > > Istvan Toth(stoty)
> > > has accepted the PMC's invitation to become a committer on the
> > > project. We appreciate all
> > > of Istvan Toth's generous contributions thus far and look forward to
> > > his continued involvement.
> > >
> > > Congratulations and welcome, Istvan Toth!
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Istvan
> > > Toth has accepted our invitation to become a Committer on the Apache
> > > HBase project. Thank you, Istvan Toth, for your contributions to HBase
> > > so far; we look forward to him taking on more responsibility in the future.
> > >
> > > Welcome, Istvan Toth!
> > >
> >
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>


Re: Aiming for 2.6.0RC0 tomorrow

2024-03-21 Thread Istvan Toth
The *hbase classpath* and *hbase mapredcp* command outputs do include the
respective  *hbase-shaded-client-byo-hadoop* and *hbase-shaded-mapreduce*
 jars.

At least the 'hbase mapredcp' jars are used by both Spark and Hive
integration, and expected to be available on the node filesystem.
We also plan to switch the Phoenix connectors to that.

Having those two jars in a separate assembly would require further
configuration when installing HBase to tell it
where to find them, so that the classpath commands can include them.

If something needs to be removed, I propose the full fat (
*hbase-shaded-client*) shaded client JAR.
That is never returned by the hbase command AFAIK, and is also the largest
in size.
(I plan to remove that one from the upcoming Hadoop-less assembly as well)

Istvan

On Fri, Mar 22, 2024 at 4:55 AM 张铎(Duo Zhang)  wrote:

> Tested locally, after removing hbase-example from tarball, the hadoop3
> tarball is about 351MB.
>
> So you could try to include this commit to publish again, to see if this
> helps.
>
> Thanks.
>
> 张铎 (Duo Zhang) wrote on Fri, Mar 22, 2024 at 09:18:
> >
> > If we exclude hbase-example from the binaries, will it be small enough
> to fit?
> >
> > We already commit the changes to master I believe. Let me see if we
> > can cherry-pick them and commit to branch-2.6 as well.
> >
> > Thanks.
> >
> > > Bryan Beaudreault  wrote on Fri, Mar 22, 2024 at 07:35:
> > >
> > > Thanks, I filed
> > > https://issues.apache.org/jira/browse/INFRA-25634
> > >
> > > On Thu, Mar 21, 2024 at 5:46 PM Andrew Purtell 
> wrote:
> > >
> > > > The hadoop3 bin tarball for 2.5.8 is 352.8MB. Perhaps we have just
> barely
> > > > and recently crossed a threshold. File an INFRA JIRA and ask about
> it.
> > > > Perhaps some limit can be increased, or maybe they will ask us to
> live
> > > > within it.
> > > >
> > > > Related, looking at the 2.5.8 hadoop3 bin tarball, the majority of
> the bulk
> > > > is ./lib/shaded-clients/ . The shaded clients are certainly useful
> but
> > > > probably are not the most popular options when taking a dependency on
> > > > HBase. Perhaps we can package these separately. We could exclude
> them from
> > > > the convenience tarballs as they will still be available from the
> Apache
> > > > Maven repository.
> > > >
> > > > On Thu, Mar 21, 2024 at 2:33 PM Bryan Beaudreault <
> bbeaudrea...@apache.org
> > > > >
> > > > wrote:
> > > >
> > > > > I got most of the way through, but failed during publish-dist:
> > > > >
> > > > > Transmitting file data ..svn: E175002: Commit failed (details
> follow):
> > > > > svn: E175002: PUT request on
> > > > >
> > > > >
> > > >
> '/repos/dist/!svn/txr/68050-1le9/dev/hbase/2.6.0RC0/hbase-2.6.0-hadoop3-bin.tar.gz'
> > > > > failed
> > > > >
> > > > > Running manually, it looks to be a Request Entity Too Large. The
> file in
> > > > > question is 356MB. Anyone have any experience with this?
> > > > >
> > > > > On Thu, Mar 21, 2024 at 2:19 AM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > HBASE-28444 has been resolved.
> > > > > >
> > > > > > Please go ahead to cut 2.6.0RC0, really a long journey :)
> > > > > >
> > > > > > 张铎 (Duo Zhang) wrote on Wed, Mar 20, 2024 at 14:29:
> > > > > > >
> > > > > > > There is a security issue for zookeeper, but simply upgrading
> > > > > > > zookeeper will break a test.
> > > > > > >
> > > > > > > Pelase see HBASE-28444 for more details.
> > > > > > >
> > > > > > > I think we should get this in before cutting the RC.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > > Bryan Beaudreault  wrote on Tue, Mar 19, 2024 at 23:51:
> > > > > > > >
> > > > > > > > I've finished auditing fixVersions and run ITBLL for an
> extended
> > > > > > period of
> > > > > > > > time in a real cluster. I'm not aware of any open blockers.
> So
> > > > > > tomorrow I'm
> > > > > > > > going to start generating the RC0.
> > > > > > > >
> > > > > > > > Please let me know if you have any concerns or reason for
> delay.
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Unrest, ignorance distilled, nihilistic imbeciles -
> > > > It's what we’ve earned
> > > > Welcome, apocalypse, what’s taken you so long?
> > > > Bring us the fitting end that we’ve been counting on
> > > >- A23, Welcome, Apocalypse
> > > >
>




[jira] [Created] (HBASE-28431) Cleaning up binary assemblies and diagnostic tools

2024-03-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28431:
---

 Summary: Cleaning up binary assemblies and diagnostic tools
 Key: HBASE-28431
 URL: https://issues.apache.org/jira/browse/HBASE-28431
 Project: HBase
  Issue Type: Umbrella
Affects Versions: 3.0.0-beta-1
Reporter: Istvan Toth


As discussed on the mailing list, the current binary assembly has several 
problems.

The discussed improvements:
* Provide assembly versions without transitive Hadoop dependencies
* Remove test JARs and their dependencies from the assemblies
* Move useful diagnostic tools into the runtime jars





Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-08 Thread Istvan Toth
Thank you Nihal.
I'm not very familiar with the tools in the test code, so you can probably
plan that work better.
I just have some generic steps in mind:
* Identify all the tools / scripts in the test jars
* Identify and analyze their dependencies (compared to the current runtime
deps)
* Decide which ones to move to the runtime JARs.
* Move them to the runtime code (or perhaps a separate module)

I have created https://issues.apache.org/jira/browse/HBASE-28431 as an
umbrella ticket to organize the sub-tasks.

Istvan

On Fri, Mar 8, 2024 at 7:06 PM Nihal Jain  wrote:

> Sure I will be able to take up. Please create tasks with necessary details
> or let me know if you want me to create.
>
> On Fri, 8 Mar 2024, 12:45 Istvan Toth,  wrote:
>
> > Thanks for volunteering, Nihal.
> >
> > I could work on the Hadoop-less, and assemblies, and you could work on
> > cleaning up the test jars.
> > Would that work for you ?
> > I know that I'm picking the smaller part, but it turns out that I won't
> > have as much time to work on this as I hoped.
> >
> > (Unless there are other volunteers, of course)
> >
> > Istvan
> >
> > On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth  wrote:
> >
> > > We seem to be in agreement in principle, however the devil is in the
> > > details.
> > >
> > > The first step should be moving the diagnostic tools out of the test
> > jars.
> > > Are there any tools we don't want to move out ?
> > > Do the diagnostic tools pull in extra dependencies compared to the
> > current
> > > runtime JARs, and if they do, what are those ?
> > > I haven't thought of the chaosmonkey tests yet, do those have specific
> > > additional dependencies / scripts ?
> > >
> > > Should we move the tools simply to the normal jars, or should we move
> > them
> > > to a new module (could be called hbase-diagnostics) ?
> > >
> > > Istvan
> > >
> > > On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault <
> > bbeaudrea...@apache.org>
> > > wrote:
> > >
> > >> I'm +0 on hbase-examples, but +100 on any improvements we can make
> > to
> > >> ltt/pe/chaos/minicluster/etc. It's extremely frustrating how much
> > reliance
> > >> we have on test jars both generally but also specifically around these
> > >> core
> > >> test executables. Unfortunately I haven't had time to dedicate to
> these
> > >> frustrations myself, but happy to help with review, etc.
> > >>
> > >> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain 
> > wrote:
> > >>
> > >> > Thank you for bringing this up.
> > >> >
> > >> > +1 for this change.
> > >> >
> > >> > In fact, some time back, we had faced similar problem. Security
> scans
> > >> found
> > >> > that we were bundling some vulnerable hadoop test jar. To deal with
> > >> that we
> > >> > had to make a change in our internal HBase fork to exclude all HBase
> > and
> > >> > Hadoop test jars from assembly. This helped us get rid of vulnerable
> > >> jar.
> > >> > (Although I hadn't dealt with test scope dependencies there.)
> > >> >
> > >> > But, I have been thinking of pushing this change in Apache HBase,
> just
> > >> > wasn't sure if this was even acceptable. It's great to see same has
> > been
> > >> > brought up here today.
> > >> >
> > >> > We hadn't dealt with the ltt, pe etc. tools and wrote a script to
> > >> download
> > >> > them on demand to avoid massive code change in internal fork. But I
> > >> have a
> > >> > +1 on the idea of identifying and moving all such tools to a new
> > module.
> > >> > This would be great and make things easier for us as well.
> > >> >
> > >> > Also, a way we could help new users easily get started, in case we
> > >> > completely stop bundling hadoop jars, is by providing a script which
> > >> starts
> > >> > a hbase cluster in a single node setup. In fact I had written a
> simple
> > >> > script sometime back that automates this process given a release
> link
> > >> for
> > >> > both. It first downloads Hadoop and HBase binaries and then starts
> > both
> > >> > with the hbase root directory set to be on hdfs. We could provide
> > >> something
> > >> > si

Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-07 Thread Istvan Toth
Thanks for volunteering, Nihal.

I could work on the Hadoop-less, and assemblies, and you could work on
cleaning up the test jars.
Would that work for you ?
I know that I'm picking the smaller part, but it turns out that I won't
have as much time to work on this as I hoped.

(Unless there are other volunteers, of course)

Istvan

On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth  wrote:

> We seem to be in agreement in principle, however the devil is in the
> details.
>
> The first step should be moving the diagnostic tools out of the test jars.
> Are there any tools we don't want to move out ?
> Do the diagnostic tools pull in extra dependencies compared to the current
> runtime JARs, and if they do, what are those ?
> I haven't thought of the chaosmonkey tests yet, do those have specific
> additional dependencies / scripts ?
>
> Should we move the tools simply to the normal jars, or should we move them
> to a new module (could be called hbase-diagnostics) ?
>
> Istvan
>
> On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault 
> wrote:
>
>> I'm +0 on hbase-examples, but +100 on any improvements we can make to
>> ltt/pe/chaos/minicluster/etc. It's extremely frustrating how much reliance
>> we have on test jars both generally but also specifically around these
>> core
>> test executables. Unfortunately I haven't had time to dedicate to these
>> frustrations myself, but happy to help with review, etc.
>>
>> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain  wrote:
>>
>> > Thank you for bringing this up.
>> >
>> > +1 for this change.
>> >
>> > In fact, some time back, we had faced similar problem. Security scans
>> found
>> > that we were bundling some vulnerable hadoop test jar. To deal with
>> that we
>> > had to make a change in our internal HBase fork to exclude all HBase and
>> > Hadoop test jars from assembly. This helped us get rid of vulnerable
>> jar.
>> > (Although I hadn't dealt with test scope dependencies there.)
>> >
>> > But, I have been thinking of pushing this change in Apache HBase, just
>> > wasn't sure if this was even acceptable. It's great to see same has been
>> > brought up here today.
>> >
>> > We hadn't dealt with the ltt, pe etc. tools and wrote a script to
>> download
>> > them on demand to avoid massive code change in internal fork. But I
>> have a
>> > +1 on the idea of identifying and moving all such tools to a new module.
>> > This would be great and make things easier for us as well.
>> >
>> > Also, a way we could help new users easily get started, in case we
>> > completely stop bundling hadoop jars, is by providing a script which
>> starts
>> > a hbase cluster in a single node setup. In fact I had written a simple
>> > script sometime back that automates this process given a release link
>> for
>> > both. It first downloads Hadoop and HBase binaries and then starts both
>> > with the hbase root directory set to be on hdfs. We could provide
>> something
>> > similar to help new users to get started easily.
>> >
>> > Although I am also +1 on the idea to provide both variants as mentioned
>> by
>> > Nick, which might not even need any such script.
>> >
>> > Also, I am willing to volunteer for help towards this effort. Please
>> let me
>> > know if anything is needed.
>> >
>> > Thanks,
>> > Nihal
>> >
>> >
>> > On Tue, 5 Mar 2024, 15:35 Nick Dimiduk,  wrote:
>> >
>> > > This would be great cleanup, big +1 from me for all three of these
>> > > adjustments, including the promotion of pe, ltt, and friends out of
>> the
>> > > test scope.
>> > >
>> > > I believe that we included hbase test jars because we used to freely
>> mix
>> > > classes needed for minicluster between runtime and test jars, which in
>> > turn
>> > > relied on Hadoop minicluster capabilities. The big cleanup around
>> > > HBaseTestingUtil/it addressed much (or all) of these issues on
>> branch-3.
>> > >
>> > > I believe that we include a Hadoop distribution in our assembly
>> because
>> > > that makes it easy for a new user to download our release bin.tgz and
>> get
>> > > started immediately with learning. I guess it’s high time that we work
>> > out
>> > > the with- and without-Hadoop variants.
>> > >
>> > > Thanks,
>> > > Nick
>> > >
>> > > On Tue, 5 Mar 2

Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-06 Thread Istvan Toth
We seem to be in agreement in principle, however the devil is in the
details.

The first step should be moving the diagnostic tools out of the test jars.
Are there any tools we don't want to move out ?
Do the diagnostic tools pull in extra dependencies compared to the current
runtime JARs, and if they do, what are those ?
I haven't thought of the chaosmonkey tests yet, do those have specific
additional dependencies / scripts ?

Should we move the tools simply to the normal jars, or should we move them
to a new module (could be called hbase-diagnostics) ?

Istvan

On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault 
wrote:

> I'm +0 on hbase-examples, but +100 on any improvements we can make to
> ltt/pe/chaos/minicluster/etc. It's extremely frustrating how much reliance
> we have on test jars both generally but also specifically around these core
> test executables. Unfortunately I haven't had time to dedicate to these
> frustrations myself, but happy to help with review, etc.
>
> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain  wrote:
>
> > Thank you for bringing this up.
> >
> > +1 for this change.
> >
> > In fact, some time back, we had faced similar problem. Security scans
> found
> > that we were bundling some vulnerable hadoop test jar. To deal with that
> we
> > had to make a change in our internal HBase fork to exclude all HBase and
> > Hadoop test jars from assembly. This helped us get rid of vulnerable jar.
> > (Although I hadn't dealt with test scope dependencies there.)
> >
> > But, I have been thinking of pushing this change in Apache HBase, just
> > wasn't sure if this was even acceptable. It's great to see same has been
> > brought up here today.
> >
> > We hadn't dealt with the ltt, pe etc. tools and wrote a script to
> download
> > them on demand to avoid massive code change in internal fork. But I have
> a
> > +1 on the idea of identifying and moving all such tools to a new module.
> > This would be great and make things easier for us as well.
> >
> > Also, a way we could help new users easily get started, in case we
> > completely stop bundling hadoop jars, is by providing a script which
> starts
> > a hbase cluster in a single node setup. In fact I had written a simple
> > script sometime back that automates this process given a release link for
> > both. It first downloads Hadoop and HBase binaries and then starts both
> > with the hbase root directory set to be on hdfs. We could provide
> something
> > similar to help new users to get started easily.
> >
> > Although I am also +1 on the idea to provide both variants as mentioned
> by
> > Nick, which might not even need any such script.
> >
> > Also, I am willing to volunteer for help towards this effort. Please let
> me
> > know if anything is needed.
> >
> > Thanks,
> > Nihal
> >
> >
> > On Tue, 5 Mar 2024, 15:35 Nick Dimiduk,  wrote:
> >
> > > This would be great cleanup, big +1 from me for all three of these
> > > adjustments, including the promotion of pe, ltt, and friends out of the
> > > test scope.
> > >
> > > I believe that we included hbase test jars because we used to freely
> mix
> > > classes needed for minicluster between runtime and test jars, which in
> > turn
> > > relied on Hadoop minicluster capabilities. The big cleanup around
> > > HBaseTestingUtil/it addressed much (or all) of these issues on
> branch-3.
> > >
> > > I believe that we include a Hadoop distribution in our assembly because
> > > that makes it easy for a new user to download our release bin.tgz and
> get
> > > started immediately with learning. I guess it’s high time that we work
> > out
> > > the with- and without-Hadoop variants.
> > >
> > > Thanks,
> > > Nick
> > >
> > > On Tue, 5 Mar 2024 at 09:14, Istvan Toth  wrote:
> > >
> > > > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped
> > out
> > > > to achieve this, this is about discussing whether we even want to
> make
> > > > these changes.
> > > > These are also substantial changes, but they could be targeted for
> > HBase
> > > > 3.0.
> > > >
> > > > One issue I have noticed is that we ship test jars and test
> > dependencies
> > > in
> > > > the assembly.
> > > > I can't see anyone using those, but it bloats the assembly and
> > classpath,
> > > > and adds unnecessary JARs with possible CVE issues. (for example
> Kerby
> > > > which is a Hadoop

Re: [DISCUSS] removing hbase-examples from the assembly

2024-03-05 Thread Istvan Toth
This sounds great to me.
The current PR does this, so I think we are all in agreement.

On Tue, Mar 5, 2024 at 10:49 AM 张铎(Duo Zhang)  wrote:

> I prefer we still have the hbase-examples in the main repo and publish
> it to maven central, but we do not need to ship it in the binary
> releases. The most important thing for hbase-examples is its source
> code, so including it in binary releases does not help.
>
> Istvan Toth  wrote on Tue, Mar 5, 2024 at 03:28:
> >
> > I don't have a problem with having an examples module in the main repo,
> it
> > can be useful, and this way it is guaranteed to always work with the
> latest
> > version, and we don't have to maintain another repo.
> >
> > Publishing the binary artifact to maven (as we do now) doesn't sound very
> > useful, but if nothing depends on it then it doesn't hurt either. It's
> > easier to keep publishing it than it is to disable publishing.
> >
> > I don't really see the need for a separate download (as long as the
> > examples can be found easily via the docs).
> >
> > Thanks,
> > Istvan
> >
> >
> > On Mon, Mar 4, 2024 at 7:24 PM Nick Dimiduk  wrote:
> >
> > > Should we remove hbase-examples from the main repository entirely?
> Should
> > > it be its own download? Should we even ship it in binary form at all?
> > >
> > > Anyway I’m fine with removing it from the assembly.
> > >
> > > Thanks,
> > > Nick
> > >
> > > On Mon, 4 Mar 2024 at 13:27, Istvan Toth  wrote:
> > >
> > > > hbase assembly (and consequently the binary distributions) now
> depends on
> > > > hbase-examples.
> > > >
> > > > I think this is problematic, as
> > > > * many of those examples are explicitly not production quality.
> > > > * It adds extra curator dependencies to the assembly and to the
> various
> > > > HBase classpaths. (which the rest of HBase does not use)
> > > >
> > > > I propose removing hbase-examples and its dependencies from the HBase
> > > > assembly, starting with HBase 3.0.
> > > >
> > > > This would have two effects:
> > > > - The example code will not be present on the classpath
> > > > - Curator libraries will not be added to the HBase classpath.
> Depending
> > > on
> > > > the shaded/non shaded classpath, the Curator from Hadoop in
> relocated or
> > > > unrelocated form will still be present.
> > > >
> > > > Related tickets:
> > > > HBASE-28416 <https://issues.apache.org/jira/browse/HBASE-28416> :
> This
> > > > proposal
> > > > HBASE-28415 <http://issues.apache.org/jira/browse/HBASE-28415> :
> > > Removing
> > > > erroneous curator dependency from hbase-endpoint (no brainer)
> > > > HBASE-28411 <https://issues.apache.org/jira/browse/HBASE-28411> :
> The
> > > > original proposal to remove curator completely
> > > >
> > > > best regards
> > > > Istvan
> > > >
> > >
> >
> >
> > --
> > *István Tóth* | Sr. Staff Software Engineer
> > *Email*: st...@cloudera.com
> > cloudera.com <https://www.cloudera.com>
> > [image: Cloudera] <https://www.cloudera.com/>
> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > --
> > --
>




Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-05 Thread Istvan Toth
I agree, we don't want to omit those from the binary distro.
We should identify what those tools are. (Should be easy based on the
presence of main() or the Tool interface)
Such tools could either be moved into a new module, like hbase-tools, or
simply moved to the runtime JARs.

Istvan

On Tue, Mar 5, 2024 at 10:34 AM 张铎(Duo Zhang)  wrote:

> There are some tools in the tests jar, such as PerformanceEvaluation.
>
> But anyway, maybe they should be moved to main...
>
> Istvan Toth  wrote on Tue, Mar 5, 2024 at 16:14:
> >
> > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
> > to achieve this, this is about discussing whether we even want to make
> > these changes.
> > These are also substantial changes, but they could be targeted for HBase
> > 3.0.
> >
> > One issue I have noticed is that we ship test jars and test dependencies
> in
> > the assembly.
> > I can't see anyone using those, but it bloats the assembly and classpath,
> > and adds unnecessary JARs with possible CVE issues. (for example Kerby
> > which is a Hadoop minicluster dependency)
> >
> > My proposal is to exclude the test jars and the test scope dependencies
> > from the assembly.
> >
> > The advantages would be:
> > * Smaller distro size
> > * Faster startup (this is marginal)
> > * Less CVE-prone JARs in the binary assemblies
> >
> > The other issue is that the assembly includes much of the Hadoop
> > distribution.
> > The basic assumption in all scripts and instructions is that the node
> has a
> > fully configured Hadoop installation, and we include it in the classpath
> of
> > HBase.
> >
> > If that is true, then there is no reason to include Hadoop in the
> assembly,
> > HBase and its direct dependencies should be enough.
> >
> > One could argue that it would simplify the client side, which is true to
> > some extent (though 95% of the client distro use cases are served better
> by
> > simply using hbase-shaded-client).
> >
> > We could either remove the Hadoop libraries from either or both of the
> > assemblies unconditionally, or provide two variants for either or both
> > assemblies, one with Hadoop included, and one without it.
> > Spark already does this, it has binary distributions both with and
> without
> > Hadoop.
> >
> > The advantages would be:
> > * Smaller distro size
> > * Faster startup (this is marginal)
> > * Less chance of conflicts with the Hadoop jars
> > * Less CVE-prone JARs in the binary assemblies
> >
> >
> > Thirdly, we could consider excluding the
> > full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
> > binary assemblies. It is not used by the assembly, and AFAIK it is not
> > included in any of the 'hbase classpath' command variants.
> >
> > This would make sure that no Hadoop libraries are included (even in
> shaded
> > form) and would make the HBase distribution fully insulated from Hadoop's
> > CVE issues.
> >
> > (The full-fat hbase-shaded-client works best as direct build-time
> > dependency anyway)
> >
> > best regards
> > Istvan
>




[DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-05 Thread Istvan Toth
DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
to achieve this, this is about discussing whether we even want to make
these changes.
These are also substantial changes, but they could be targeted for HBase
3.0.

One issue I have noticed is that we ship test jars and test dependencies in
the assembly.
I can't see anyone using those, but it bloats the assembly and classpath,
and adds unnecessary JARs with possible CVE issues. (for example Kerby
which is a Hadoop minicluster dependency)

My proposal is to exclude the test jars and the test scope dependencies
from the assembly.
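For illustration, the exclusion could be expressed in the assembly descriptor roughly as below. This is only a sketch: the actual hbase-assembly descriptor layout and the exact exclusion patterns would need to be verified against the Maven Assembly Plugin documentation.

```xml
<dependencySet>
  <!-- only runtime-scope dependencies end up in lib/ -->
  <scope>runtime</scope>
  <excludes>
    <!-- drop test-jar artifacts (e.g. hbase-server test jars) -->
    <exclude>*:*:test-jar</exclude>
  </excludes>
</dependencySet>
```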

The advantages would be:
* Smaller distro size
* Faster startup (this is marginal)
* Less CVE-prone JARs in the binary assemblies

The other issue is that the assembly includes much of the Hadoop
distribution.
The basic assumption in all scripts and instructions is that the node has a
fully configured Hadoop installation, and we include it in the classpath of
HBase.

If that is true, then there is no reason to include Hadoop in the assembly,
HBase and its direct dependencies should be enough.

One could argue that it would simplify the client side, which is true to
some extent (though 95% of the client distro use cases are served better by
simply using hbase-shaded-client).

We could either remove the Hadoop libraries from either or both of the
assemblies unconditionally, or provide two variants for either or both
assemblies, one with Hadoop included, and one without it.
Spark already does this, it has binary distributions both with and without
Hadoop.

The advantages would be:
* Smaller distro size
* Faster startup (this is marginal)
* Less chance of conflicts with the Hadoop jars
* Less CVE-prone JARs in the binary assemblies


Thirdly, we could consider excluding the
full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
binary assemblies. It is not used by the assembly, and AFAIK it is not
included in any of the 'hbase classpath' command variants.

This would make sure that no Hadoop libraries are included (even in shaded
form) and would make the HBase distribution fully insulated from Hadoop's
CVE issues.

(The full-fat hbase-shaded-client works best as direct build-time
dependency anyway)

best regards
Istvan


Re: [DISCUSS] removing hbase-examples from the assembly

2024-03-04 Thread Istvan Toth
I don't have a problem with having an examples module in the main repo, it
can be useful, and this way it is guaranteed to always work with the latest
version, and we don't have to maintain another repo.

Publishing the binary artifact to maven (as we do now) doesn't sound very
useful, but if nothing depends on it then it doesn't hurt either. It's
easier to keep publishing it than it is to disable publishing.

I don't really see the need for a separate download (as long as the
examples can be found easily via the docs).

Thanks,
Istvan


On Mon, Mar 4, 2024 at 7:24 PM Nick Dimiduk  wrote:

> Should we remove hbase-examples from the main repository entirely? Should
> it be its own download? Should we even ship it in binary form at all?
>
> Anyway I’m fine with removing it from the assembly.
>
> Thanks,
> Nick
>
> On Mon, 4 Mar 2024 at 13:27, Istvan Toth  wrote:
>
> > hbase assembly (and consequently the binary distributions) now depends on
> > hbase-examples.
> >
> > I think this is problematic, as
> > * many of those examples are explicitly not production quality.
> > * It adds extra curator dependencies to the assembly and to the various
> > HBase classpaths. (which the rest of HBase does not use)
> >
> > I propose removing hbase-examples and its dependencies from the HBase
> > assembly, starting with HBase 3.0.
> >
> > This would have two effects:
> > - The example code will not be present on the classpath
> > - Curator libraries will not be added to the HBase classpath. Depending
> on
> > the shaded/non shaded classpath, the Curator from Hadoop in relocated or
> > unrelocated form will still be present.
> >
> > Related tickets:
> > HBASE-28416 <https://issues.apache.org/jira/browse/HBASE-28416> : This
> > proposal
> > HBASE-28415 <http://issues.apache.org/jira/browse/HBASE-28415> :
> Removing
> > erroneous curator dependency from hbase-endpoint (no brainer)
> > HBASE-28411 <https://issues.apache.org/jira/browse/HBASE-28411> : The
> > original proposal to remove curator completely
> >
> > best regards
> > Istvan
> >
>




[DISCUSS] removing hbase-examples from the assembly

2024-03-04 Thread Istvan Toth
hbase assembly (and consequently the binary distributions) now depends on
hbase-examples.

I think this is problematic, as
* many of those examples are explicitly not production quality.
* It adds extra curator dependencies to the assembly and to the various
HBase classpaths. (which the rest of HBase does not use)

I propose removing hbase-examples and its dependencies from the HBase
assembly, starting with HBase 3.0.

This would have two effects:
- The example code will not be present on the classpath
- Curator libraries will not be added to the HBase classpath. Depending on
the shaded/non shaded classpath, the Curator from Hadoop in relocated or
unrelocated form will still be present.

Related tickets:
HBASE-28416  : This
proposal
HBASE-28415  : Removing
erroneous curator dependency from hbase-endpoint (no brainer)
HBASE-28411  : The
original proposal to remove curator completely

best regards
Istvan


[jira] [Created] (HBASE-28416) Remove hbase-examples from assembly

2024-03-03 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28416:
---

 Summary: Remove hbase-examples from assembly
 Key: HBASE-28416
 URL: https://issues.apache.org/jira/browse/HBASE-28416
 Project: HBase
  Issue Type: Improvement
Reporter: Istvan Toth
Assignee: Istvan Toth


hbase-examples is supposed to contain programming examples for HBase.
However, it is added to the assembly, and becomes part of the HBase
distributions.

On one hand this adds some potentially useful components and coprocessors to
HBase; on the other hand many of those are not production quality, and were
never meant to be used as-is.

It also adds the Curator libraries to the general HBase classpath, even though
they are used by but a single example.

Removing hbase-examples from the assembly would fix both problems.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Deprecating zookeeper-based client connectivity

2024-03-01 Thread Istvan Toth
I checked our compatibility promise just now:
https://hbase.apache.org/book.html#hbase.versioning

If we consider the way we use properties to define the cluster connection a
part of the client API
(and I personally do) then we cannot remove the ZK registry
functionality before 4.0, even
if it is deprecated in 2.6.

Istvan

On Fri, Mar 1, 2024 at 10:12 AM 张铎(Duo Zhang)  wrote:

> For 3.0.0, after moving the replication things out, there is no
> persistent data on zookeeper now. So it is possible to move off
> zookeeper now, of course, we still need at least something like etcd,
> as we need an external system to track the living region servers...
>
> And I think the registry interface is for connecting to a HBase
> cluster from outside, it does not need to know the internal
> implementation of HBase, i.e, whether to make use of zookeeper.
> For me, I think a possible problem is that we expose the meta location
> in registry interface, since the splittable meta feature has been
> restarted, if later we support multiple meta regions in HBase, we will
> need extra work if we still want to keep the zk based registry...
>
> Thanks.
>
> Nick Dimiduk  wrote on Fri, Mar 1, 2024 at 16:25:
> >
> > On Fri, 1 Mar 2024 at 07:47, Istvan Toth 
> wrote:
> >
> > > That's a pretty fundamental change, and would break a lot of use cases
> and
> > > applications that hard-code the assumption of the ZK registry.
> >
> >
> > To the best of my knowledge, the znode structure in ZooKeeper has never
> > been a part of our public API. I have no sympathy for systems that assume
> > its presence.
> >
> > Making a breaking change like removing the previous default connection
> > > method in a minor version also feels wrong.
> > > (It may go against the compatibility policy, though I haven't checked)
> >
> >
> > This is a fair argument.
> >
> > I think it would be better to deprecate it in 3.0 and remove it in 4.0,
> or
> > > at least deprecate it in 2.6 and remove it in 4.0.
> > > This is how the HBase 2.x API changes were handled, where the removal
> of
> > > the old HBase 1.x APIs were targeted to 3.0.
> > > The ZK registry code is small, and doesn't cost much to keep in the
> > > codebase.
> >
> >
> > And in fact, I now realize that something like it will continue to exist
> > even after the class is removed from our public API because I suspect
> that
> > the HMaster will need to use it in order to bootstrap itself. Still, it
> > could be moved into hbase-server and kept as an internal concern.
> >
> > So then, should we not deprecate it at all? We let the RPC implementation
> > flip over as default in 3.0, but the ZK implementation sticks around into
> > perpetuity? As far as I know, we have no plan to move off of ZooKeeper
> > entirely ; etcd and RAFT are still just talk, right? If there’s nothing
> to
> > motivate its complete removal, I guess there no reason to deprecate it.
> >
> > Thanks,
> > Nick
> >
> > On Fri, Mar 1, 2024 at 12:15 AM Andrew Purtell 
> wrote:
> > >
> > > > +1 for deprecating ZKConnectionRegistry beginning with/in 2.6.0.
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Feb 29, 2024 at 2:30 AM Nick Dimiduk 
> > > wrote:
> > > >
> > > > > Heya,
> > > > >
> > > > > We have long had the ambition to get away from ZooKeeper as the
> means
> > > by
> > > > > which a client interfaces with an HBase cluster. The
> ConnectionRegistry
> > > > was
> > > > > introduced in 2.0 as part of the asynchronous client implementation
> > > [0],
> > > > > then called the ClusterRegistry. The name changed and a new
> > > > implementation
> > > > > backed by an HMaster endpoint was introduced, called the
> > > > > MasterConnectionRegistry. That implementation was made more
> generic as
> > > > the
> > > > > RpcConnectionRegistry, which can be backed by HMaster or
> RegionServer
> > > > > processes. Finally, many of the teething issues [1] with the
> > > > > RpcConnectionRegistry have been worked out. As of now,
> > > > > RpcConnectionRegistry is the default path for client cluster
> access on
> > > > > branch-3 [2].
> > > > >
> > > > > With 2.6 upon us, we'd like to formalize the deprecation cycle for
> > > client
> > > > > implementations connecting to a cluster using the
> ZKConnectionRegistry

Re: [DISCUSS] Deprecating zookeeper-based client connectivity

2024-02-29 Thread Istvan Toth
That's a pretty fundamental change, and would break a lot of use cases and
applications that hard-code the assumption of the ZK registry.
Making a breaking change like removing the previous default connection
method in a minor version also feels wrong.
(It may go against the compatibility policy, though I haven't checked)

I think it would be better to deprecate it in 3.0 and remove it in 4.0, or
at least deprecate it in 2.6 and remove it in 4.0.
This is how the HBase 2.x API changes were handled, where the removal of
the old HBase 1.x APIs were targeted to 3.0.
The ZK registry code is small, and doesn't cost much to keep in the
codebase.

Istvan

On Fri, Mar 1, 2024 at 12:15 AM Andrew Purtell  wrote:

> +1 for deprecating ZKConnectionRegistry beginning with/in 2.6.0.
>
>
>
>
> On Thu, Feb 29, 2024 at 2:30 AM Nick Dimiduk  wrote:
>
> > Heya,
> >
> > We have long had the ambition to get away from ZooKeeper as the means by
> > which a client interfaces with an HBase cluster. The ConnectionRegistry
> was
> > introduced in 2.0 as part of the asynchronous client implementation [0],
> > then called the ClusterRegistry. The name changed and a new
> implementation
> > backed by an HMaster endpoint was introduced, called the
> > MasterConnectionRegistry. That implementation was made more generic as
> the
> > RpcConnectionRegistry, which can be backed by HMaster or RegionServer
> > processes. Finally, many of the teething issues [1] with the
> > RpcConnectionRegistry have been worked out. As of now,
> > RpcConnectionRegistry is the default path for client cluster access on
> > branch-3 [2].
> >
> > With 2.6 upon us, we'd like to formalize the deprecation cycle for client
> > implementations connecting to a cluster using the ZKConnectionRegistry.
> >
> > I have been using the RpcConnectionRegistry in several deployments since
> > the 2.4 release line. In a deployment without using secured connections,
> > it's a drop-in replacement. For secured deployments, it's simpler,
> because
> > clients don't need to be granted ZooKeeper connection credentials.
> Movement
> > of RPC burden from the ZooKeeper cluster to Region Servers is really nice
> > for spreading out the load.
> >
> > Maybe others have deployed the feature as well and have some experience
> to
> > report back?
> >
> > Based on my experience, I am in favor of marking ZKConnectionRegistry as
> > Deprecated starting in 2.6 with a plan to remove it in 3.1 ... or 3.2 if
> > necessary.
> >
> > What do you say? Any objections?
> >
> > Thanks,
> > Nick
> >
> > [0]: https://issues.apache.org/jira/browse/HBASE-15921
> > [1]: https://issues.apache.org/jira/browse/HBASE-26149
> > [2]: https://issues.apache.org/jira/browse/HBASE-26174
> >
>
>
> --
> Best regards,
> Andrew
>
> Unrest, ignorance distilled, nihilistic imbeciles -
> It's what we’ve earned
> Welcome, apocalypse, what’s taken you so long?
> Bring us the fitting end that we’ve been counting on
>- A23, Welcome, Apocalypse
>




[jira] [Created] (HBASE-28411) Remove direct dependency on Curator

2024-02-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28411:
---

 Summary: Remove direct dependency on Curator
 Key: HBASE-28411
 URL: https://issues.apache.org/jira/browse/HBASE-28411
 Project: HBase
  Issue Type: Improvement
Reporter: Istvan Toth


The only place where Curator is used is 
ZooKeeperScanPolicyObserver.java in hbase-examples.

That functionality can be re-implemented without Curator, and a problematic 
dependency can be removed from HBase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28410) Upgrade curator to 5.6.0

2024-02-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28410:
---

 Summary: Upgrade curator to 5.6.0
 Key: HBASE-28410
 URL: https://issues.apache.org/jira/browse/HBASE-28410
 Project: HBase
  Issue Type: Improvement
  Components: Zookeeper
Reporter: Istvan Toth


HBase still uses Curator 4.2.0, because it's the last version to support ZK 3.4.

Now that HBase uses a recent ZK, we can use the latest Curator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28404) Use "set -x" when running release script in debug mode

2024-02-26 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28404:
---

 Summary: Use "set -x" when running release script in debug mode
 Key: HBASE-28404
 URL: https://issues.apache.org/jira/browse/HBASE-28404
 Project: HBase
  Issue Type: Improvement
  Components: scripts
Reporter: Istvan Toth


Phoenix release scripts are forked from HBase.
I found the bash "set -x" command very useful when diagnosing problems.

It is implemented as part of PHOENIX-7236, and could very easily be ported
back to HBase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28353) Close HBase connection on implicit exit from HBase shell

2024-02-09 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28353:
---

 Summary: Close HBase connection on implicit exit from HBase shell
 Key: HBASE-28353
 URL: https://issues.apache.org/jira/browse/HBASE-28353
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Istvan Toth
Assignee: Istvan Toth


The fix in HBASE-28345 only works when the exit function is explicitly called.
It does not work when scripts are piped in in non-interactive mode.

Hook the connection close into Ruby at_exit instead of the exit shell command.
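The HBase shell is JRuby running on the JVM, so Ruby's at_exit is the JRuby counterpart of a JVM shutdown hook, which fires on any normal exit, interactive or piped. A rough Java sketch of the same idea (FakeConnection and registerCloseHook are stand-ins for illustration, not the actual shell code):

```java
public class CloseOnExit {
    // Stand-in for an HBase Connection whose Netty threads are non-daemon.
    static class FakeConnection {
        volatile boolean closed;
        void close() { closed = true; }
    }

    // Register a hook that closes the connection on any normal JVM exit,
    // mirroring what hooking Ruby's at_exit does for the HBase shell.
    static Thread registerCloseHook(FakeConnection conn) {
        Thread hook = new Thread(conn::close);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        FakeConnection conn = new FakeConnection();
        registerCloseHook(conn);
        System.out.println("close hook registered; connection closes on JVM exit");
    }
}
```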



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28345) Close HBase connection on exit from HBase Shell

2024-02-06 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28345:
---

 Summary: Close HBase connection on exit from HBase Shell
 Key: HBASE-28345
 URL: https://issues.apache.org/jira/browse/HBASE-28345
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 2.4.17
Reporter: Istvan Toth
Assignee: Istvan Toth


When using Netty for the HBase client, hbase shell hangs on exit.
This is caused by the non-daemon threads that Netty creates.

Whether ZK should create daemon threads for Netty or not is debatable, but
explicitly closing the connection in hbase shell on exit fixes the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28314) Generate sources artifacts for hbase-server

2024-01-15 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28314:
---

 Summary: Generate sources artifacts for hbase-server
 Key: HBASE-28314
 URL: https://issues.apache.org/jira/browse/HBASE-28314
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.7, 3.0.0-beta-1, 2.4.17, 2.6.0, 4.0.0-alpha-1
Reporter: Istvan Toth


There is no source jar generated for hbase-server.
Enabling maven-source-plugin seems to work fine, and Eclipse can use the 
generated sources jar for debugging etc.

It seems that these source JARs are not generated at least since HBase 2.0.0, 
but we did have source JARs sometime in the 1.x time frame.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28213) Evaluate using hbase-shaded-client-byo-hadoop for Spark connector

2024-01-08 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28213.
-
  Assignee: Istvan Toth
Resolution: Done

Spark has reverted to using unshaded Hadoop due to cloud connector 
compatibility issues.

> Evaluate using hbase-shaded-client-byo-hadoop for Spark connector
> -
>
> Key: HBASE-28213
> URL: https://issues.apache.org/jira/browse/HBASE-28213
> Project: HBase
>  Issue Type: Improvement
>  Components: spark
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>
> Since 3.2 Spark now uses hadoop-client-api and hadoop-client-runtime.
> While we don't actually specify what HBase libraries are needed on the Spark 
> client side for the connector, at least the Cloudera docs specify the classes 
> provided by "hbase mapredcp"
> which includes the full unshaded Hadoop JAR set.
> Investigate whether  *hbase-shaded-client-byo-hadoop* and the 
> *hbase-client-api* and *hbase-client-runtime* is enough for the connector, 
> and if yes, document how to set the Spark classpath.
> Alternatively, if *hbase-shaded-client-byo-hadoop*  is not enough, check if 
> *hbase-shaded-mapreduce* plus the above two shaded Hadoop client JAR provides 
> everything needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28261) Add --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED -version

2023-12-14 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28261:
---

 Summary: Add --add-opens 
java.base/java.util.concurrent.atomic=ALL-UNNAMED -version
 Key: HBASE-28261
 URL: https://issues.apache.org/jira/browse/HBASE-28261
 Project: HBase
  Issue Type: Bug
Reporter: Istvan Toth






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28252) Update JDK11 and add JDK17 options to hbase script

2023-12-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28252:
---

 Summary: Update JDK11 and add JDK17 options to hbase script
 Key: HBASE-28252
 URL: https://issues.apache.org/jira/browse/HBASE-28252
 Project: HBase
  Issue Type: Bug
  Components: scripts
Reporter: Istvan Toth
Assignee: Istvan Toth


As noted in HBASE-28247, HBase can run into module permission issues that are 
not handled in the current JDK11 options in the hbase startup script.

The surefire test config also includes some JDK17 specific options, we should 
also add those as needed.

We are not yet aware of any additional JVM options required by Java 21.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28247) Add java.base/sun.net.dns and java.base/sun.net.util export to jdk11 JVM test flags

2023-12-06 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28247:
---

 Summary: Add java.base/sun.net.dns and java.base/sun.net.util  
export to jdk11 JVM test flags
 Key: HBASE-28247
 URL: https://issues.apache.org/jira/browse/HBASE-28247
 Project: HBase
  Issue Type: Bug
  Components: java
Affects Versions: 2.5.6, 2.4.17, 3.0.0-alpha-4, 2.6.0
Reporter: Istvan Toth
Assignee: Istvan Toth


While testing with JDK17 we have found  that we need to add 
{noformat}
  --add-exports java.base/sun.net.dns=ALL-UNNAMED
  --add-exports java.base/sun.net.util=ALL-UNNAMED
{noformat}
on top of what is already defined in _hbase-surefire.jdk11.flags_ , otherwise 
RS and Master startup fails in the Hadoop security code.

While this does not affect the test suite (at least not the commonly run
tests), I consider hbase-surefire.jdk11.flags to be an unofficial reference
for getting HBase to run on newer JDK versions.
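For reference, the extra exports can be appended to the JVM options in hbase-env.sh along these lines (an illustrative fragment; HBASE_OPTS is the generic options variable, a deployment may use the role-specific variables instead):

```sh
# hbase-env.sh: extra exports on top of the JDK11 flag set
export HBASE_OPTS="$HBASE_OPTS \
  --add-exports java.base/sun.net.dns=ALL-UNNAMED \
  --add-exports java.base/sun.net.util=ALL-UNNAMED"
```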




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Introduce HBase Connection URL - Phoenix solution

2023-11-27 Thread Istvan Toth
Hi,

I have read Nick's recent HBase connection URL email, but I wasn't
subscribed yet, so I cannot properly respond to it.

In Phoenix, we have recently added support for the new registries in
https://issues.apache.org/jira/browse/PHOENIX-6523.
I've documented it at https://phoenix.apache.org/classpath_and_url.html

Much of the new code was copied from HBase, and it does roughly what Nick
has suggested.
The only painful part is the clashes between the ":" as separator, and ":"
used for separating hosts and ports, which requires a lot of escaping.

best regards
Istvan

I'm copying the release notes for PHOENIX-6523 here for quick perusal:


Add support for MasterRegistry and RPCConnectionRegistry to Phoenix.

Introduces the new URL protocol variants:
* jdbc:phoenix+zk: Uses Zookeeper. This is the original registry supported
since the inception of HBase and Phoenix.
* jdbc:phoenix+rpc: Uses RPC to connect to the specified HBase RS/Master
nodes.
* jdbc:phoenix+master: Uses RPC to connect to the specified HBase Master
nodes

The syntax:
"jdbc:phoenix" : uses the default registry and connection from
hbase-site.xml

"jdbc:phoenix:param1:param2...": Protocol/Registry is determined from the HBase
version and hbase-site.xml configuration, and parameters are interpreted
according to the registry.

"jdbc:phoenix+zk:hosts:ports:zknode:principal:keytab;options..." : Behaves
the same as jdbc:phoenix... URL previously. Any missing parameters use
defaults from hbase-site.xml or the environment.

"jdbc:phoenix+rpc:hosts:ports::principal:keytab;options..." : Uses
RPCConnectionRegistry. If more than two options are specified, then the
third one (the unused zkNode parameter) must always be blank.

"jdbc:phoenix+master:hosts:ports::principal:keytab;options..." : Uses
RPCMasterRegistry. If more than two options are specified, then the third
one (the unused zkNode parameter) must always be blank.

Phoenix now also supports the heterogeneous ports defined in HBASE-12706
for every registry.
When specifying the ports for each host separately the colon ":" character
must be escaped with a backslash, i.e.
"jdbc:phoenix+zk:host1\:123,host2\:345:/hbase:principal:keytab", or
"jdbc:phoenix+rpc:host1\:123,host2\:345" You may need to add extra escapes
to preserve the backslashes if defined in java code, etc.
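The colon clash described above can be sketched with a tiny splitter that treats "\:" as a literal colon. This is an illustrative helper only, not Phoenix's actual URL parser:

```java
import java.util.ArrayList;
import java.util.List;

public class PhoenixUrlSplit {
    // Split a URL section on ':' separators, except where the colon is
    // escaped as '\:' (used inside host:port lists).
    static List<String> splitOnUnescapedColons(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean escaped = false;
        for (char c : s.toCharArray()) {
            if (escaped) {
                cur.append(c);          // keep the escaped character literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;         // next character is taken literally
            } else if (c == ':') {
                parts.add(cur.toString());
                cur.setLength(0);       // unescaped ':' ends the current part
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    public static void main(String[] args) {
        List<String> p = splitOnUnescapedColons("host1\\:123,host2\\:345:/hbase");
        System.out.println(p); // [host1:123,host2:345, /hbase]
    }
}
```

Note that in Java source the backslash itself must be escaped, hence the doubled backslashes in the example; this matches the remark above about adding extra escapes in Java code.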

Note that while the phoenix+zk URL handling code has heuristics that try
to handle some omitted parameters, the Master and RPC connection registry
code strictly maps the URL parameters by their ordering.

Note that Phoenix now internally normalizes the URL. Whether you specify an
explicit connection, or use the default "jdbc:phoenix" URL, Phoenix will
internally normalize the connection, and set the properties for the
internal HBase Connection objects appropriately.

Also note that for most non-HA use cases an explicit connection URL should
NOT be used. The preferred way to specify the connection is to have an
up-to-date hbase-site.xml with both HBase and Phoenix client properties set
correctly (with other Hadoop configuration files as needed) on the Phoenix
application classpath, and using the default "jdbc:phoenix" URL.




[jira] [Created] (HBASE-28219) Document spark.hadoopRDD.ignoreEmptySplits issue for Spark Connector

2023-11-23 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28219:
---

 Summary: Document spark.hadoopRDD.ignoreEmptySplits issue for 
Spark Connector
 Key: HBASE-28219
 URL: https://issues.apache.org/jira/browse/HBASE-28219
 Project: HBase
  Issue Type: Bug
  Components: spark
Reporter: Istvan Toth


For Spark 3.2.0, the connector needs 
spark.hadoopRDD.ignoreEmptySplits=false to work correctly.

This crucial piece of information is not documented in either the README or 
the main HBase connector section.






[jira] [Created] (HBASE-28214) Document Spark classpath requirements for the Spark connector

2023-11-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28214:
---

 Summary: Document Spark classpath requirements for the Spark 
connector
 Key: HBASE-28214
 URL: https://issues.apache.org/jira/browse/HBASE-28214
 Project: HBase
  Issue Type: Bug
  Components: spark
Reporter: Istvan Toth


The README for the Spark connector details the classpath requirements for the 
HBase server side, but does not talk about how to set up the Spark classpath 
for HBase.

The Cloudera docs 
[https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/accessing-hbase/topics/hbase-configure-spark-connector.html]
 suggest using "hbase mapredcp". This is, however, inconsistent, as "hbase 
mapredcp" includes the unshaded Hadoop libraries, while the example command 
line omits the Hadoop libraries.

Figure this out, and update the documentation.





[jira] [Created] (HBASE-28213) Evaluate using hbase-shaded-client-byo-hadoop for Spark connector

2023-11-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28213:
---

 Summary: Evaluate using hbase-shaded-client-byo-hadoop for Spark 
connector
 Key: HBASE-28213
 URL: https://issues.apache.org/jira/browse/HBASE-28213
 Project: HBase
  Issue Type: Improvement
  Components: spark
Reporter: Istvan Toth


Since 3.2, Spark uses hadoop-client-api and hadoop-client-runtime.
While we don't actually specify what HBase libraries are needed on the Spark 
client side for the connector, at least the Cloudera docs specify the classes 
provided by "hbase mapredcp"
which includes the full unshaded Hadoop JAR set.

Investigate whether *hbase-shaded-client-byo-hadoop* together with Spark's 
*hadoop-client-api* and *hadoop-client-runtime* is enough for the connector, 
and if yes, document how to set the Spark classpath.





[jira] [Created] (HBASE-28208) HBase build cannot find protoc 2.5.0 for OSX Aarch64

2023-11-16 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28208:
---

 Summary: HBase build cannot find protoc 2.5.0 for OSX Aarch64
 Key: HBASE-28208
 URL: https://issues.apache.org/jira/browse/HBASE-28208
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 2.5.6, 2.4.17, 2.6.0
Reporter: Istvan Toth


HBase has a profile for building on Aarch64 Linux, but it doesn't work on Mac.

We have solved the same problem in Phoenix by using the x86_64 protoc binary 
via emulation in this case.





[jira] [Created] (HBASE-28201) Add Endpoint and Method Name to COPROC_EXEC Spans

2023-11-13 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28201:
---

 Summary: Add Endpoint and Method Name to COPROC_EXEC Spans
 Key: HBASE-28201
 URL: https://issues.apache.org/jira/browse/HBASE-28201
 Project: HBase
  Issue Type: Improvement
  Components: tracing
Reporter: Istvan Toth


If we assume parentBased=on, then it's enough to add this information on the 
client side.
However, we may also want to do this on the server side for stochastic tracing.



We could call these:

db.hbase.endpoint.name
db.hbase.endpoint.method

or 

db.hbase.coprocessor.name
db.hbase.coprocessor.method





[jira] [Created] (HBASE-28200) Enrich trace span information

2023-11-13 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28200:
---

 Summary: Enrich trace span information
 Key: HBASE-28200
 URL: https://issues.apache.org/jira/browse/HBASE-28200
 Project: HBase
  Issue Type: Improvement
  Components: tracing
Affects Versions: 2.5.6, 2.6.0, 3.0.0
Reporter: Istvan Toth
Assignee: Istvan Toth


As I'm working on adding usable Trace data to Phoenix, I find that HBase trace 
spans are missing crucial information.

This is an umbrella ticket to track those issues, and add the crucial information 
that is needed for interpreting trace data.

The goal is to add the most useful data while keeping the Span data as compact 
as possible.





[jira] [Created] (HBASE-28137) Add scala-parser-combinators dependency to connectors for Spark 2.4

2023-10-06 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28137:
---

 Summary: Add scala-parser-combinators dependency to connectors for 
Spark 2.4
 Key: HBASE-28137
 URL: https://issues.apache.org/jira/browse/HBASE-28137
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Istvan Toth


The Spark connector doesn't compile with Spark 2.4 because of a missing 
scala-parser-combinators dependency.





[jira] [Created] (HBASE-28135) Specify -Xms for tests

2023-10-06 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28135:
---

 Summary: Specify -Xms for tests
 Key: HBASE-28135
 URL: https://issues.apache.org/jira/browse/HBASE-28135
 Project: HBase
  Issue Type: Improvement
  Components: test
Reporter: Istvan Toth
Assignee: Istvan Toth


The default -Xmx value is JVM dependent, but the host memory size is usually 
included in the calculation.

-Xmx in turn is used to calculate some GC parameters, for example NewSize and 
OldSize, which affect the behaviour of tests.

As the memory consumption of the tests does not depend on the host VM size, we 
could set -Xms for the tests explicitly, and enjoy more consistent test results.
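The host dependence is easy to observe with a one-line probe. This is for illustration only; run it with and without explicit heap flags to see the ergonomic default change:

```java
public class HeapProbe {
    public static void main(String[] args) {
        // Without an explicit -Xmx, this value is chosen ergonomically from
        // the host's physical memory (often around 1/4 of RAM on HotSpot),
        // and GC parameters such as NewSize/OldSize are derived from it.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("effective max heap: " + maxMb + " MB");
    }
}
```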





[jira] [Created] (HBASE-28133) TestSyncTimeRangeTracker fails with OOM on Aarch64

2023-10-05 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28133:
---

 Summary: TestSyncTimeRangeTracker fails with OOM on Aarch64
 Key: HBASE-28133
 URL: https://issues.apache.org/jira/browse/HBASE-28133
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.17
Reporter: Istvan Toth


This test seems to be cutting real close to the heap size.
On ARM, it consistently fails on my RHEL8.8 Aarch64 VM with Java 8.
{noformat}
mvn test -P runDevTests -Dtest.build.data.basedirectory=/ram2G 
-Dhadoop.profile=3.0 -fn -B -Dtest=TestSyncTimeRangeTracker* -pl hbase-server
...
[ERROR] 
org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker.testConcurrentIncludeTimestampCorrectness
  Time elapsed: 1.969 s  <<< ERROR!
java.lang.OutOfMemoryError: Java heap space
{noformat}
It seems that Java on ARM has some higher memory overhead than x86_64.

Simply bumping -Xmx from the default 2200m to 2300m allows it to pass. 
{noformat}
mvn test -P runDevTests -Dtest.build.data.basedirectory=/ram2G 
-Dhadoop.profile=3.0 -fn -B -Dtest=TestSyncTimeRangeTracker* -pl hbase-server 
-Dsurefire.Xmx=2300m
...
[INFO] Running org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.395 s 
- in org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker
{noformat}
However, the real solution should be reducing the memory usage for this test.

 





[jira] [Created] (HBASE-28132) TestTableShell times out

2023-10-05 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28132:
---

 Summary: TestTableShell times out
 Key: HBASE-28132
 URL: https://issues.apache.org/jira/browse/HBASE-28132
 Project: HBase
  Issue Type: Bug
  Components: shell, test
Affects Versions: 4.0.0-alpha-1
Reporter: Istvan Toth


When I try to run the dev test suite on master HEAD, TestTableShell times out.
The output I attach happens on ARM and master HEAD, but I have seen the same 
error on branch-2.4 and x86_64.

At first I thought that the test hangs forever, but it turns out that there is 
a 1 second timeout (which seems a bit excessive) and it times out after 
almost three hours.
{noformat}
mvn clean install -DskipTests
nohup mvn test -P runDevTests -Dtest.build.data.basedirectory=/ram2G -fn -B &
{noformat}
{noformat}
...
[INFO] --- maven-surefire-plugin:3.1.0:test (secondPartTestsExecution) @ hbase-shell ---
[INFO] Using configured provider org.apache.maven.surefire.junitcore.JUnitCoreProvider
[INFO]
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hbase.client.TestShellNoCluster
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in org.apache.hadoop.hbase.client.TestShellNoCluster
[INFO] Running org.apache.hadoop.hbase.client.TestTableShell
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 10,934.388 s <<< FAILURE! - in org.apache.hadoop.hbase.client.TestTableShell
[ERROR] org.apache.hadoop.hbase.client.TestTableShell  Time elapsed: 10,934.324 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 780 seconds
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	at jline.internal.TerminalLineSettings.exec(TerminalLineSettings.java:308)
	at jline.internal.TerminalLineSettings.stty(TerminalLineSettings.java:282)
	at jline.internal.TerminalLineSettings.undef(TerminalLineSettings.java:158)
	at jline.UnixTerminal.init(UnixTerminal.java:94)
	at jline.TerminalFactory.create(TerminalFactory.java:116)
	at jline.TerminalFactory.get(TerminalFactory.java:180)
	at jline.TerminalFactory.get(TerminalFactory.java:186)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:442)
	at org.jruby.javasupport.JavaMethod.invokeStaticDirect(JavaMethod.java:364)
	at org.jruby.java.invokers.StaticMethodInvoker.call(StaticMethodInvoker.java:31)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:351)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:144)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:345)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:72)
	at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:80)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:164)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:151)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:351)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:144)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:345)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:72)
	at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:80)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:164)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:151)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:362)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:154)
	at org.jruby.RubyClass.newInstance(RubyClass.java:883)
	at org.jruby.RubyClass$INVOKER$i$newInstance.call(RubyClass$INVOKER$i$newInstance.gen)
	at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroOrNBlock.call(JavaMethod.java:332)
	at org.jruby.runtime.callsite.Cachin

[jira] [Created] (HBASE-27801) Remove redundant avro.version property from Kafka connector

2023-04-17 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-27801:
---

 Summary: Remove redundant avro.version property from Kafka 
connector
 Key: HBASE-27801
 URL: https://issues.apache.org/jira/browse/HBASE-27801
 Project: HBase
  Issue Type: Bug
  Components: kafka
Reporter: Istvan Toth


The avro.version property (1.7.7) is defined both in the main connectors pom 
and in the kafka module. This duplication is not useful.





[jira] [Created] (HBASE-27747) Clean up hbase-connector dependencies and assembly

2023-03-23 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-27747:
---

 Summary: Clean up hbase-connector dependencies and assembly
 Key: HBASE-27747
 URL: https://issues.apache.org/jira/browse/HBASE-27747
 Project: HBase
  Issue Type: Bug
  Components: kafka, spark
Reporter: Istvan Toth


We ship a lot of external jars in the /lib directory of the assembly.
I know that for Spark and Hive none of those are needed, because we take 
everything from the Spark/Hive classpath.
I am not familiar with the Kafka connector, but according to the readme, that 
one also adds the jars from 'hbase classpath', which includes most/all of 
the external JARs in the lib directory.





[jira] [Resolved] (HBASE-26491) I can't read/write a Hbase table by spark-hbase connector when the table is in non-default namespace

2023-03-01 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-26491.
-
Resolution: Duplicate

> I can't read/write a Hbase table by spark-hbase connector when the table is  
> in non-default namespace
> -
>
> Key: HBASE-26491
> URL: https://issues.apache.org/jira/browse/HBASE-26491
> Project: HBase
>  Issue Type: Bug
>  Components: hbase-connectors
>Affects Versions: 1.0.0
>Reporter: mengdou
>Priority: Minor
> Attachments: image-2021-11-26-17-32-53-507.png, 
> image-2021-12-01-20-52-11-664.png, image-2021-12-01-20-53-05-405.png
>
>
> I found I can't read/write a Hbase table by spark-hbase connector when the 
> hbase table is  in a non-default namespace.
>  
> Because when spark opens a table(related to a hbase table), it creates a 
> HBaseRelation instance first, and initializes a HBaseTableCatalog from the 
> table definition saved in spark catalog. But in the function 'convert' the 
> field 'tableCatalog' is constructed from a string template, in which the 
> namespace is set as 'default', leading to a wrong namespace. This namespace 
> is not  the one defined when user created the table before.
>  
> Pls have a look:
> !image-2021-11-26-17-32-53-507.png!





[jira] [Created] (HBASE-27633) Add usage docs to HBase-Spark connector

2023-02-10 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-27633:
---

 Summary: Add usage docs to HBase-Spark connector
 Key: HBASE-27633
 URL: https://issues.apache.org/jira/browse/HBASE-27633
 Project: HBase
  Issue Type: Bug
  Components: hbase-connectors, spark
Reporter: Istvan Toth


Docs for the HBase Spark connectors are mostly non-existent.

We could start by adding most of the contents from 





[jira] [Created] (HBASE-27625) Spark Connector test failure with Java 11

2023-02-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-27625:
---

 Summary: Spark Connector test failure with Java 11
 Key: HBASE-27625
 URL: https://issues.apache.org/jira/browse/HBASE-27625
 Project: HBase
  Issue Type: Improvement
  Components: hbase-connectors, spark
Affects Versions: connector-1.0.0
Reporter: Istvan Toth


When trying to run the tests with Java 11, I get:
{noformat}
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.733 s 
<<< FAILURE! - in org.apache.hadoop.hbase.spark.TestJavaHBaseContext
[ERROR] org.apache.hadoop.hbase.spark.TestJavaHBaseContext  Time elapsed: 0.721 
s  <<< ERROR!
java.lang.ExceptionInInitializerError
    at org.apache.hadoop.hbase.spark.TestJavaHBaseContext.setUpBeforeClass(TestJavaHBaseContext.java:86)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.spark.TestJavaHBaseContext.setUpBeforeClass(TestJavaHBaseContext.java:86)
[ERROR] org.apache.hadoop.hbase.spark.TestJavaHBaseContext  Time elapsed: 0.721 s  <<< ERROR!
java.lang.NullPointerException
    at org.apache.hadoop.hbase.spark.TestJavaHBaseContext.tearDownAfterClass(TestJavaHBaseContext.java:106)
{noformat}
The same test runs fine with Java 8.





[jira] [Created] (HBASE-27624) Cannot Specify Namespace via the hbase.table Option in Spark Connector

2023-02-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-27624:
---

 Summary: Cannot Specify Namespace via the hbase.table Option in 
Spark Connector
 Key: HBASE-27624
 URL: https://issues.apache.org/jira/browse/HBASE-27624
 Project: HBase
  Issue Type: Bug
  Components: hbase-connectors, spark
Affects Versions: 1.0.1
Reporter: Istvan Toth


When using the old mapping format and specifying the HBase table via the 
_hbase.table_ option, the connector passes the namespaced string to HBase, and 
we get


{noformat}
Caused by: java.lang.IllegalArgumentException: Illegal character code:58, <:> at 7. User-space table qualifiers may only contain 'alphanumeric characters' and digits: staplesHbaseNamespace:staplesHbaseTableName
	at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:187)
	at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:138)
	at org.apache.hadoop.hbase.TableName.<init>(TableName.java:320)
	at org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:354)
	at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:484){noformat}

This seems to be related to the changes in HBASE-24276
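A minimal sketch of the kind of handling the connector would need: split the namespaced string on the first ':' so that namespace and qualifier can be handed to HBase separately, instead of passing the whole string as a qualifier. The helper name and the "default" fallback are illustrative, not the connector's actual code:

```java
public class TableNameSplit {
    // Hypothetical helper: split "namespace:qualifier" into its two parts.
    // A name without a ':' falls back to the "default" namespace.
    static String[] splitTableName(String name) {
        int idx = name.indexOf(':');
        if (idx < 0) {
            return new String[] { "default", name };
        }
        return new String[] { name.substring(0, idx), name.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitTableName("staplesHbaseNamespace:staplesHbaseTableName");
        // The qualifier alone no longer contains ':', so it passes
        // the isLegalTableQualifierName check seen in the stack trace.
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```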





Re: [DISCUSS] HBase 2.5 / Hadoop 3 artifacts

2022-11-26 Thread Istvan Toth
Thanks a lot, Duo!

Looking at HBASE-27359, the -hadoop3 artifacts should be exactly the same as
the ones we're rebuilding in our build process.

Ran a quick test:

Added the rc repo to my settings.xml
built Phoenix locally with
mvn clean install -Dhbase.profile=2.5 -Dhbase.version=2.5.2-hadoop3

The test suite passed, everything looks good.

Thank you again!




On Fri, Nov 25, 2022 at 5:25 AM 张铎(Duo Zhang)  wrote:

> I've put up 2.5.2RC0, which contains a hadoop3 dist and also hadoop3
> maven artifacts, it is built with hadoop 3.2.4.
>
> The dist is available here
> https://dist.apache.org/repos/dist/dev/hbase/2.5.2RC0/
>
> And the maven artifacts is available here
> https://repository.apache.org/content/repositories/orgapachehbase-1504/
>
> Notice that the version for hadoop3 maven artifacts is 2.5.2-hadoop3.
>
> Please take a look and have a try.
>
> Thanks.
>
>
>
> 张铎(Duo Zhang)  于2022年10月31日周一 12:02写道:
>
>
> >
> > Some progress here.
> > With other developers help(especially Nick, Andrew and Guanghao), I've
> > successfully made the release scripts able to publish binaries and
> > maven artifacts for hadoop3, in a dry run mode,
> >
> > https://github.com/apache/hbase/pull/4856
> >
> > I've put up a discussion thread, for quickly releasing 2.5.2 for the
> > 2.5 release line, with hadoop3 binaries. Please shout if you have any
> > ideas.
> >
> > Thanks.
> >
> > 张铎(Duo Zhang)  于2022年10月24日周一 12:27写道:
> > >
> > > HBASE-27434 has been landed to branch-2.5+. Branch-2.4 does not have a
> > > flatten plugin so do not apply HBASE-27434 to it.
> > >
> > > Filed HBASE-27442 for changing the way of bumping versions in release
> scripts.
> > >
> > > After this change, let's finally go back to HBASE-27359 to make the
> > > release scripts publish different artifacts for hadoop2 and hadoop3.
> > >
> > > Thanks.
> > >
> > > Andrew Purtell  于2022年10月19日周三 23:36写道:
> > > >
> > > > Suggestions:
> > > >
> > > > - For HBase 2.x releases, we should continue to publish default
> builds,
> > > > those without any -hadoop3- or -widgetfoo- modifiers, against Hadoop
> 2.
> > > >
> > > > - For HBase 3, it makes sense to move the default to Hadoop 3, no
> other
> > > > build variants needed there. This is the kind of thing a major
> version
> > > > increment allows us to do per our dependency compatibility
> guidelines.
> > > >
> > > > - While eventually it may be necessary to differentiate between minor
> > > > release lines of Hadoop it would be simpler to pick one Hadoop 3
> version,
> > > > like 3.3.4, and build and publish a -hadoop3- artifact for each
> current
> > > > releasing 2.x code line: 2.4.15-hadoop3, 2.5.2-hadoop3,
> 2.6.0-hadoop3.
> > > >
> > > > - The process of building releases is automated by create-release,
> which
> > > > all RMs use now. create-release automates the process of building and
> > > > signing tarballs and publishing to Nexus. There should be no
> significant
> > > > new burden on the RM, beyond an increase in time for create-release
> > > > execution, to parameterize it and iterate over one or more variant
> builds.
> > > > That is a long way of suggesting we do publish variant tarballs too,
> they
> > > > are almost "for free" if we've gone to the trouble to build for
> publishing
> > > > to Nexus.
> > > >
> > > >
> > > > On Wed, Oct 19, 2022 at 12:52 AM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> > > > wrote:
> > > >
> > > > > After some investigating, I think using the $revision placeholder
> can
> > > > > solve the problem here, i.e, using different command line to
> publish
> > > > > different artifacts for hadoop2 and hadoop3, with the same souce
> code.
> > > > > You can see the comment on HBASE-27359 for more details.
> > > > >
> > > > > Next I will open an issue to land the $revision change. And here, I
> > > > > think first we need to discuss how many new artifacts we want to
> > > > > publish. For example, for 2.6.0, we only want to publish a
> > > > > 2.6.0-hadoop3, with the default hadoop3 version? Or we publish
> > > > > 2.6.0-hadoop3.2, 2.6.0-hadoop3.3 for different hadoop minor release
> > > > > lines? And do we want to publish different tarballs for hadoop2 and
> > > > > hadoop3?
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Andrew Purtell  于2022年8月31日周三 00:19写道:
> > > > > >
> > > > > > I also don't think we should change the defaults in branch-2
> until
> > > > > Hadoop 2
> > > > > > is EOLed.
> > > > > >
> > > > > > On Mon, Aug 29, 2022 at 10:22 AM Sean Busbey 
> wrote:
> > > > > >
> > > > > > > I think changing the default hadoop profile for builds in
> branch-2
> > > > > would
> > > > > > > unnecessarily complicate our compatibility messaging so long
> as Hadoop
> > > > > 2
> > > > > > > hasn't gone EOL.
> > > > > > >
> > > > > > > On Mon, Aug 29, 2022 at 5:30 AM Nick Dimiduk <
> ndimi...@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > > > Should we also make hadoop3 the default active profile for
> branch-2
> > > > > going
> > > > > > > > forward?
> > > > > > > >
> > > 

[jira] [Created] (HBASE-27077) Synchronous API calls for Split, Merge, and Compaction operations for testing

2022-05-30 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-27077:
---

 Summary: Synchronous API calls for Split, Merge, and Compaction 
operations for testing
 Key: HBASE-27077
 URL: https://issues.apache.org/jira/browse/HBASE-27077
 Project: HBase
  Issue Type: Improvement
Reporter: Istvan Toth


While generally split, merge, and compaction operations are too slow for 
synchronous calls, for many tests we do need to wait until these operations 
are finished to be able to check their results.
At least in the Phoenix tests, we also need to do this while the 
EnvironmentEdge clock is stopped.
The polling method Admin.getLastMajorCompactionTimestamp() that we used for 
compactions has stopped working with EnvironmentEdgeManager in 2.5; see 
HBASE-27058 for details. We've also had similar issues in the past, where new 
versions made the previous workaround for synchronous operations fail.

A longer-term solution for the problem would be having Synchronous API calls 
for testing, which block on the client side until the requested operation is 
finished.

These could be added as variants to Admin / AsyncAdmin, or could be somewhere 
else, it doesn't really matter, as these would not be well suited for 
production use anyway.
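A client-side blocking variant of the kind proposed here can be sketched as a generic poll-until-done helper. The names are hypothetical; a real implementation would poll something like Admin.getCompactionState() until it reports that the operation finished:

```java
import java.util.function.BooleanSupplier;

public class SyncWait {
    // Block until 'done' reports true, polling every pollMs, giving up
    // after timeoutMs. Returns whether the condition was met in time.
    static boolean awaitCondition(BooleanSupplier done, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!done.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "operation finished": becomes true after ~50 ms.
        boolean ok = awaitCondition(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println("operation finished: " + ok);
    }
}
```

Note that the helper deliberately uses System.nanoTime() on the client side, so it keeps working even when a test has stopped the server-side EnvironmentEdge clock.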





[jira] [Created] (HBASE-27069) HBase SecureBulkLoad permission regression

2022-05-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-27069:
---

 Summary: HBase SecureBulkLoad permission regression
 Key: HBASE-27069
 URL: https://issues.apache.org/jira/browse/HBASE-27069
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.0, 3.0.0-alpha-3
Reporter: Istvan Toth
Assignee: Istvan Toth


HBASE-26707 has introduced a bug, where setting the permission of the bulk-loaded 
HFile to 777 is made conditional.

However, as discussed in HBASE-15790, that permission is essential for HBase's 
correct operation.





[jira] [Created] (HBASE-27058) Admin#getLastMajorCompactionTimestamp() doesn't get updated when the EnvironmentEdgeManager clock is stopped

2022-05-23 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-27058:
---

 Summary: Admin#getLastMajorCompactionTimestamp() doesn't get 
updated when the EnvironmentEdgeManager clock is stopped
 Key: HBASE-27058
 URL: https://issues.apache.org/jira/browse/HBASE-27058
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Istvan Toth


In HBase 2.0-2.4 it is possible to check for a finished compaction by polling 
Admin.getLastMajorCompactionTimestamp() for the table under compaction, even 
when the clock is stopped via EnvironmentEdgeManager.

However, in HBase 2.5 Admin.getLastMajorCompactionTimestamp() will not be 
updated even after the compaction is finished, and getCompactionState() returns 
NONE.

I am not even sure that this is a bug; however, it has broken one of our 
Phoenix tests, and may cause problems for others.

This is the test code that breaks:
[https://github.com/apache/phoenix/blob/8aa825ed88828a99d40fdb68eb2f930981cd8a6b/phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java#L818]

Admin.getLastMajorCompactionTimestamp() seems to take the value from the 
Metrics, so I guess that the metrics no longer get updated somewhere when the 
clock is stopped.

I did not dig deeper than that.
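The suspected failure mode (a timestamp sourced from a stoppable clock never advances once the clock is frozen) can be reproduced with a minimal injectable clock. This is a hypothetical stand-in, not HBase's actual EnvironmentEdgeManager:

```java
import java.util.concurrent.atomic.AtomicLong;

public class FrozenClockDemo {
    // Minimal stand-in for an injectable clock such as EnvironmentEdgeManager:
    // while 'frozenAt' holds a non-negative value, currentTime() stops advancing.
    static final AtomicLong frozenAt = new AtomicLong(-1);

    static long currentTime() {
        long f = frozenAt.get();
        return f >= 0 ? f : System.currentTimeMillis();
    }

    public static void main(String[] args) {
        frozenAt.set(1000L);
        long recordedAtCompaction = currentTime();
        // Anything stamped from this clock after the freeze is equal, never
        // newer, so a poller waiting for a *newer* timestamp never returns.
        System.out.println(currentTime() > recordedAtCompaction); // prints: false
    }
}
```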





[jira] [Resolved] (HBASE-26991) Phoenix PartialIndexRebuilderIT regression on 2.4.11

2022-05-02 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-26991.
-
Resolution: Invalid

Probably a Phoenix bug.
Will reopen if it turns out to be on the HBase side.

> Phoenix PartialIndexRebuilderIT regression on 2.4.11
> 
>
> Key: HBASE-26991
> URL: https://issues.apache.org/jira/browse/HBASE-26991
> Project: HBase
>  Issue Type: Bug
>  Components: phoenix
>Affects Versions: 2.4.11
>    Reporter: Istvan Toth
>Priority: Major
>
> We have noticed that using HBase 2.4.11 reliably breaks
> _PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild_
> and makes _ConcurrentMutationsExtendedIT.testConcurrentUpserts_ very flaky.
> The same tests run fine with earlier versions, including HBase 2.4.10.
> At best this is a behaviour change in an HBase minor version, at worst this 
> is a plain regression.





[jira] [Created] (HBASE-26991) Phoenix PartialIndexRebuilderIT regression on 2.4.11

2022-04-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-26991:
---

 Summary: Phoenix PartialIndexRebuilderIT regression on 2.4.11
 Key: HBASE-26991
 URL: https://issues.apache.org/jira/browse/HBASE-26991
 Project: HBase
  Issue Type: Bug
  Components: phoenix
Affects Versions: 2.4.11
Reporter: Istvan Toth


We have noticed that using HBase 2.4.11 reliably breaks
_PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild_
and makes _ConcurrentMutationsExtendedIT.testConcurrentUpserts_ very flaky.

The same tests run fine with earlier versions, including HBase 2.4.10.

At best this is a behaviour change in an HBase minor version, at worst this is 
a plain regression.





[jira] [Created] (HBASE-26935) Update httpcommons to version 5.1

2022-04-06 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-26935:
---

 Summary: Update httpcommons to version 5.1
 Key: HBASE-26935
 URL: https://issues.apache.org/jira/browse/HBASE-26935
 Project: HBase
  Issue Type: Improvement
Reporter: Istvan Toth


HttpComponents 5 is a major rewrite.
One of the main improvements is that it uses slf4j for logging, instead of 
log4j.

 





[jira] [Created] (HBASE-26896) list_quota_snapshots fails with ‘ERROR NameError: uninitialized constant Shell::Commands::ListQuotaSnapshots::TABLE’

2022-03-28 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-26896:
---

 Summary: list_quota_snapshots fails with ‘ERROR NameError: 
uninitialized constant Shell::Commands::ListQuotaSnapshots::TABLE’
 Key: HBASE-26896
 URL: https://issues.apache.org/jira/browse/HBASE-26896
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 2.4.11, 2.5.0, 3.0.0-alpha-3
Reporter: Istvan Toth
Assignee: Istvan Toth


The list_quota_snapshots command fails with 
{noformat}
ERROR NameError: uninitialized constant 
Shell::Commands::ListQuotaSnapshots::TABLE{noformat}
regardless of the parameters it's called with.

Apparently, it used to work in HBase 2.2.

I don't know enough about Ruby to tell why it used to work or what broke it 
exactly, but using qualified constants fixes the problem.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26778) Replace HRegion.get() calls

2022-02-28 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-26778:
---

 Summary: Replace HRegion.get() calls
 Key: HBASE-26778
 URL: https://issues.apache.org/jira/browse/HBASE-26778
 Project: HBase
  Issue Type: Bug
Reporter: Istvan Toth


HBASE-26036 made a change where Region.get() always clones Off-Heap cells 
before returning them.


In HBase all non-test occurrences of this code were changed to create their own 
RegionScanner to avoid copying the off-heap cells. We should do the same.
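The cost being avoided is the per-cell copy from a direct (off-heap) buffer onto the heap. A minimal plain-Java sketch of that copy, with no HBase APIs and hypothetical values, looks like this:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class OffHeapCopyDemo {
    // Copy the readable bytes of a (possibly direct, i.e. off-heap) buffer
    // into a fresh heap array -- the kind of extra per-cell copy that
    // deepClone() performs when Region.get() clones off-heap cells.
    static byte[] copyToHeap(ByteBuffer buf) {
        byte[] onHeap = new byte[buf.remaining()];
        buf.duplicate().get(onHeap); // duplicate() leaves the source position intact
        return onHeap;
    }

    public static void main(String[] args) {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(4);
        offHeap.put(new byte[] {1, 2, 3, 4}).flip();
        System.out.println(Arrays.toString(copyToHeap(offHeap)));
    }
}
```

Managing a scanner directly lets the caller keep working against the buffer-backed cells and skip this allocation-plus-copy for every cell in the result.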

This would also let Phoenix run correctly with HBase 2.4.5+ , as the fix in 
HBASE-26036 seems to introduce another bug, HBASE-26777, which causes failures 
in Phoenix.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26777) BufferedDataBlockEncoder$OffheapDecodedExtendedCell.deepClone throws UnsupportedOperationException

2022-02-28 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-26777:
---

 Summary: 
BufferedDataBlockEncoder$OffheapDecodedExtendedCell.deepClone throws 
UnsupportedOperationException
 Key: HBASE-26777
 URL: https://issues.apache.org/jira/browse/HBASE-26777
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 2.4.10
Reporter: Istvan Toth


BufferedDataBlockEncoder$OffheapDecodedExtendedCell.deepClone throws an 
UnsupportedOperationException.

However, org.apache.hadoop.hbase.regionserver.HRegion.get(Get, boolean, long, 
long)

calls the method:
{code:java}
      // Copy EC to heap, then close the scanner.
      // This can be an EXPENSIVE call. It may make an extra copy from offheap
      // to onheap buffers.
      // See more details in HBASE-26036.
      for (Cell cell : tmp) {
        results.add(cell instanceof ByteBufferExtendedCell
          ? ((ByteBufferExtendedCell) cell).deepClone() : cell);
      }
{code}

According to the comment above, this is probably caused by HBASE-26036.



 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26652) Unintuitive AccesControlClient.getUserPermissions() semantics

2022-01-06 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-26652:
---

 Summary: Unintuitive AccesControlClient.getUserPermissions() 
semantics
 Key: HBASE-26652
 URL: https://issues.apache.org/jira/browse/HBASE-26652
 Project: HBase
  Issue Type: Improvement
  Components: acl
Reporter: Istvan Toth


The behaviour of the AccessControlClient.getUserPermissions() calls is 
unintuitive.

It takes a table-name regex and returns the union of all permissions on all 
tables that it matches.
While the returned UserPermission objects do carry the information on the 
object they apply to, this still requires post-processing the results.

To get the permissions for a single table, one has to either do something like

Admin.getTablePermission(conn, 
"^"+tableName.getNameWithNamespaceInclAsString()+"$")

or post-process the results of the call.

We should add some methods that return the permissions for a single table / 
family / qualifier without making the caller jump through hoops, or at least 
call out the non-intuitive behaviour in the Javadoc, and advise on how to use 
the API to get the results the caller likely wants.
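The need for anchoring can be shown with plain java.util.regex, independent of the HBase API (the table names below are hypothetical, and find()-style substring matching is used to illustrate why an unanchored pattern over-matches):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TableRegexDemo {
    // Filter a list of table names by a regex using find() semantics,
    // so an unanchored pattern can match inside a longer table name.
    static List<String> matchTables(List<String> tables, String regex) {
        Pattern p = Pattern.compile(regex);
        return tables.stream()
                     .filter(t -> p.matcher(t).find())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tables = List.of("default:orders", "default:orders_archive");
        // Unanchored: also matches the longer "orders_archive" table.
        System.out.println(matchTables(tables, "default:orders"));
        // Anchored with "^" and "$": only the intended table matches.
        System.out.println(matchTables(tables, "^default:orders$"));
    }
}
```

This is why callers who want a single table's permissions end up building "^" + name + "$" by hand, which is exactly the boilerplate the proposed convenience methods would remove.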




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

