[jira] [Updated] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-28 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28613:

Fix Version/s: 2.7.0
   3.0.0-beta-2
   2.6.1
   2.5.9
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to all active branches.

> Use streaming when marshalling protobuf REST output
> ---
>
> Key: HBASE-28613
> URL: https://issues.apache.org/jira/browse/HBASE-28613
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> We are currently marshalling protobuf into a byte array, and then sending 
> that to the client.
> This is both slow and memory intensive.
> I see a ~25% reduction in the REST server CPU usage for my benchmark with 
> this patch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28622) FilterListWithAND can swallow SEEK_NEXT_USING_HINT

2024-05-28 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849925#comment-17849925
 ] 

Istvan Toth commented on HBASE-28622:
-

Opened a [DISCUSS] thread on the topic.

> FilterListWithAND can swallow SEEK_NEXT_USING_HINT
> --
>
> Key: HBASE-28622
> URL: https://issues.apache.org/jira/browse/HBASE-28622
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> org.apache.hadoop.hbase.filter.FilterListWithAND.filterRowKey(Cell) will 
> return true if ANY of the filters returns true for Filter#filterRowKey().
> However, the SEEK_NEXT_USING_HINT mechanism relies on filterRowKey() 
> returning false, so that filterCell() can return SEEK_NEXT_USING_HINT.
> If none of the filters matches, but one of them returns true for 
> filterRowKey(), then the filter(s) that returned false from filterRowKey() 
> (so that they could return SEEK_NEXT_USING_HINT in filterCell()) never get a 
> chance to do so, and instead of seeking, FilterListWithAND will do a very 
> slow full scan.





[jira] [Updated] (HBASE-28621) PrefixFilter should use SEEK_NEXT_USING_HINT

2024-05-28 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28621:

Description: 
Looking at PrefixFilter, I have noticed that it doesn't use the 
SEEK_NEXT_USING_HINT mechanism.

AFAICT, we could safely set the prefix as a next row hint, which could be a 
huge performance win.

Of course, ideally the user would set the scan startRow to the prefix, which 
avoids the problem, but the user may forget to do that, or may use the filter 
in a filterList that doesn't allow for setting the start/stop rows close to 
the prefix.

  was:
Looking at PrefixFilter, I have noticed that it doesn't use the 
SEEK_NEXT_USING_HINT mechanism.

AFAICT, we could safely set the prefix as a next row hint, which could be a 
huge performance win.

Of course, ideally the user would set the scan startRow to the prefix, which 
avoids the problem; if the user doesn't, then we effectively do a full scan 
until the prefix is reached.


> PrefixFilter should use SEEK_NEXT_USING_HINT 
> -
>
> Key: HBASE-28621
> URL: https://issues.apache.org/jira/browse/HBASE-28621
> Project: HBase
>  Issue Type: Improvement
>  Components: Filters
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: beginner, beginner-friendly
>
> Looking at PrefixFilter, I have noticed that it doesn't use the 
> SEEK_NEXT_USING_HINT mechanism.
> AFAICT, we could safely set the prefix as a next row hint, which could be 
> a huge performance win.
> Of course, ideally the user would set the scan startRow to the prefix, which 
> avoids the problem, but the user may forget to do that, or may use the filter 
> in a filterList that doesn't allow for setting the start/stop rows close to 
> the prefix.





[jira] [Updated] (HBASE-28621) PrefixFilter should use SEEK_NEXT_USING_HINT

2024-05-28 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28621:

Labels: beginner beginner-friendly  (was: )

> PrefixFilter should use SEEK_NEXT_USING_HINT 
> -
>
> Key: HBASE-28621
> URL: https://issues.apache.org/jira/browse/HBASE-28621
> Project: HBase
>  Issue Type: Improvement
>  Components: Filters
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: beginner, beginner-friendly
>
> Looking at PrefixFilter, I have noticed that it doesn't use the 
> SEEK_NEXT_USING_HINT mechanism.
> AFAICT, we could safely set the prefix as a next row hint, which could be 
> a huge performance win.
> Of course, ideally the user would set the scan startRow to the prefix, which 
> avoids the problem; if the user doesn't, then we effectively do a full scan 
> until the prefix is reached.





[jira] [Created] (HBASE-28622) FilterListWithAND can swallow SEEK_NEXT_USING_HINT

2024-05-27 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28622:
---

 Summary: FilterListWithAND can swallow SEEK_NEXT_USING_HINT
 Key: HBASE-28622
 URL: https://issues.apache.org/jira/browse/HBASE-28622
 Project: HBase
  Issue Type: Bug
  Components: Filters
Reporter: Istvan Toth
Assignee: Istvan Toth


org.apache.hadoop.hbase.filter.FilterListWithAND.filterRowKey(Cell) will return 
true if ANY of the filters returns true for Filter#filterRowKey().

However, the SEEK_NEXT_USING_HINT mechanism relies on filterRowKey() returning 
false, so that filterCell() can return SEEK_NEXT_USING_HINT.

If none of the filters matches, but one of them returns true for 
filterRowKey(), then the filter(s) that returned false from filterRowKey() 
(so that they could return SEEK_NEXT_USING_HINT in filterCell()) never get a 
chance to do so, and instead of seeking, FilterListWithAND will do a very 
slow full scan.
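
To make the interaction easier to follow, here is a minimal sketch of the behaviour described above. It is a simplified model, not the HBase code: the `RowFilter` interface, the two sample filters, and `evaluate()` are hypothetical stand-ins, and rows are plain strings rather than Cells.

```java
import java.util.List;

class AndFilterSketch {
  enum ReturnCode { INCLUDE, SKIP, SEEK_NEXT_USING_HINT }

  // Simplified, hypothetical model of a filter; not the real HBase Filter class.
  interface RowFilter {
    boolean filterRowKey(String row); // true means "exclude this row"
    ReturnCode filterCell(String row);
  }

  // Excludes every row by row key alone and never offers a seek hint.
  static final RowFilter REJECT_BY_KEY = new RowFilter() {
    public boolean filterRowKey(String row) { return true; }
    public ReturnCode filterCell(String row) { return ReturnCode.SKIP; }
  };

  // Keeps filterRowKey() false so that filterCell() can ask for a seek.
  static final RowFilter HINTING = new RowFilter() {
    public boolean filterRowKey(String row) { return false; }
    public ReturnCode filterCell(String row) { return ReturnCode.SEEK_NEXT_USING_HINT; }
  };

  // Models the AND semantics from this issue: the row is dropped as soon as
  // any filter's filterRowKey() returns true, so filterCell() is never reached.
  static ReturnCode evaluate(List<RowFilter> filters, String row) {
    for (RowFilter f : filters) {
      if (f.filterRowKey(row)) {
        return ReturnCode.SKIP; // the hint from the other filters is lost
      }
    }
    for (RowFilter f : filters) {
      ReturnCode rc = f.filterCell(row);
      if (rc != ReturnCode.INCLUDE) {
        return rc;
      }
    }
    return ReturnCode.INCLUDE;
  }
}
```

In this model, `HINTING` alone yields SEEK_NEXT_USING_HINT, while combining it with `REJECT_BY_KEY` yields SKIP for every row, which corresponds to the row-by-row scan described in the issue.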






[jira] [Created] (HBASE-28621) PrefixFilter should use SEEK_NEXT_USING_HINT

2024-05-27 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28621:
---

 Summary: PrefixFilter should use SEEK_NEXT_USING_HINT 
 Key: HBASE-28621
 URL: https://issues.apache.org/jira/browse/HBASE-28621
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Reporter: Istvan Toth
Assignee: Istvan Toth


Looking at PrefixFilter, I have noticed that it doesn't use the 
SEEK_NEXT_USING_HINT mechanism.

AFAICT, we could safely set the prefix as a next row hint, which could be a 
huge performance win.

Of course, ideally the user would set the scan startRow to the prefix, which 
avoids the problem; if the user doesn't, then we effectively do a full scan 
until the prefix is reached.
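
A rough sketch of the idea, assuming a forward scan and unsigned lexicographic row ordering. This is not the PrefixFilter code: the `check()` helper and the `PAST_PREFIX` return code are hypothetical names used only for illustration.

```java
import java.util.Arrays;

class PrefixHintSketch {
  // PAST_PREFIX is a hypothetical marker for "no further rows can match".
  enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT, PAST_PREFIX }

  // Decides what to do with a row key for a given prefix (forward scan only).
  static ReturnCode check(byte[] row, byte[] prefix) {
    boolean matches = row.length >= prefix.length
        && Arrays.equals(row, 0, prefix.length, prefix, 0, prefix.length);
    if (matches) {
      return ReturnCode.INCLUDE;
    }
    // Rows sorting before the prefix can be skipped in one seek.
    if (Arrays.compareUnsigned(row, prefix) < 0) {
      return ReturnCode.SEEK_NEXT_USING_HINT;
    }
    return ReturnCode.PAST_PREFIX;
  }

  // The hint is simply the prefix itself: the smallest row key that can match.
  static byte[] nextRowHint(byte[] prefix) {
    return prefix.clone();
  }
}
```

The point of the sketch is that every row sorting before the prefix maps to the same hint, so a scan that starts far from the prefix needs one seek instead of one filter call per row.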





[jira] [Updated] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-23 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28613:

Status: Patch Available  (was: Open)

> Use streaming when marshalling protobuf REST output
> ---
>
> Key: HBASE-28613
> URL: https://issues.apache.org/jira/browse/HBASE-28613
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> We are currently marshalling protobuf into a byte array, and then send that 
> to the client.
> This is both slow and memory intensive.
> I see a ~25% reduction in the REST server CPU usage for my benchmark with 
> this patch.





[jira] [Commented] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-23 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848961#comment-17848961
 ] 

Istvan Toth commented on HBASE-28613:
-

By using CodedOutputStream directly, I was able to further improve performance 
by avoiding pre-computing the object size and tuning the buffer size in 
CodedOutputStream.

> Use streaming when marshalling protobuf REST output
> ---
>
> Key: HBASE-28613
> URL: https://issues.apache.org/jira/browse/HBASE-28613
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> We are currently marshalling protobuf into a byte array, and then send that 
> to the client.
> This is both slow and memory intensive.
> I see a ~25% reduction in the REST server CPU usage for my benchmark with 
> this patch.





[jira] [Updated] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-23 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28613:

Description: 
We are currently marshalling protobuf into a byte array, and then send that to 
the client.
This is both slow and memory intensive.

I see a ~25% reduction in the REST server CPU usage for my benchmark with this 
patch.


  was:
We are currently marshalling protobuf into a byte array, and then send that to 
the client.
This is both slow and memory intensive.

I see a ~15% reduction in the REST server CPU usage for my benchmark with this 
patch.



> Use streaming when marshalling protobuf REST output
> ---
>
> Key: HBASE-28613
> URL: https://issues.apache.org/jira/browse/HBASE-28613
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> We are currently marshalling protobuf into a byte array, and then send that 
> to the client.
> This is both slow and memory intensive.
> I see a ~25% reduction in the REST server CPU usage for my benchmark with 
> this patch.





[jira] [Commented] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-23 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848886#comment-17848886
 ] 

Istvan Toth commented on HBASE-28613:
-

Updated the description with the correct (less spectacular) benchmark results.

> Use streaming when marshalling protobuf REST output
> ---
>
> Key: HBASE-28613
> URL: https://issues.apache.org/jira/browse/HBASE-28613
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> We are currently marshalling protobuf into a byte array, and then send that 
> to the client.
> This is both slow and memory intensive.
> I see a ~15% reduction in the REST server CPU usage for my benchmark with 
> this patch.





[jira] [Updated] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-23 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28613:

Description: 
We are currently marshalling protobuf into a byte array, and then send that to 
the client.
This is both slow and memory intensive.

I see a ~15% reduction in the REST server CPU usage for my benchmark with this 
patch.


  was:
We are currently marshalling protobuf into a byte array, and then sending that 
to the client.
This is both slow and memory intensive.

Using streaming instead results in huge perf improvements. In my benchmark, 
the wall clock time was almost halved, and the REST server CPU usage was 
reduced by 40%.

wall clock: 120s -> 65s
Total REST CPU: 300s -> 180s



> Use streaming when marshalling protobuf REST output
> ---
>
> Key: HBASE-28613
> URL: https://issues.apache.org/jira/browse/HBASE-28613
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> We are currently marshalling protobuf into a byte array, and then send that 
> to the client.
> This is both slow and memory intensive.
> I see a ~15% reduction in the REST server CPU usage for my benchmark with 
> this patch.





[jira] [Commented] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-23 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848862#comment-17848862
 ] 

Istvan Toth commented on HBASE-28613:
-

Hmm, I have re-run the tests, and now I see similar results with the old code.
Will need to investigate further.

> Use streaming when marshalling protobuf REST output
> ---
>
> Key: HBASE-28613
> URL: https://issues.apache.org/jira/browse/HBASE-28613
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> We are currently marshalling protobuf into a byte array, and then sending 
> that to the client.
> This is both slow and memory intensive.
> Using streaming instead results in huge perf improvements. In my benchmark, 
> the wall clock time was almost halved, and the REST server CPU usage was 
> reduced by 40%.
> wall clock: 120s -> 65s
> Total REST CPU: 300s -> 180s





[jira] [Created] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-23 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28613:
---

 Summary: Use streaming when marshalling protobuf REST output
 Key: HBASE-28613
 URL: https://issues.apache.org/jira/browse/HBASE-28613
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


We are currently marshalling protobuf into a byte array, and then sending that 
to the client.
This is both slow and memory intensive.

Using streaming instead results in huge perf improvements. In my benchmark, 
the wall clock time was almost halved, and the REST server CPU usage was 
reduced by 40%.

wall clock: 120s -> 65s
Total REST CPU: 300s -> 180s
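
As a rough illustration of the two approaches (not the actual patch), the sketch below contrasts building the whole response in a byte array with encoding rows straight onto the client stream. The `Row` class, `writeRow()` and both `send*` methods are hypothetical stand-ins for the protobuf messages and the REST marshalling code; plain `java.io` streams are used so that the example is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class StreamingSketch {
  // Hypothetical stand-in for one marshalled row; not an HBase or protobuf type.
  static final class Row {
    final String key;
    final String value;
    Row(String key, String value) { this.key = key; this.value = value; }
  }

  // Writes one length-prefixed row; a stand-in for the protobuf encoding.
  static void writeRow(DataOutputStream out, Row row) throws IOException {
    byte[] k = row.key.getBytes(StandardCharsets.UTF_8);
    byte[] v = row.value.getBytes(StandardCharsets.UTF_8);
    out.writeInt(k.length);
    out.write(k);
    out.writeInt(v.length);
    out.write(v);
  }

  // Buffered approach: the whole response is built in memory, then copied out.
  static void sendBuffered(List<Row> rows, OutputStream client) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    for (Row row : rows) {
      writeRow(out, row);
    }
    out.flush();
    client.write(buffer.toByteArray()); // extra copy of the full payload
  }

  // Streaming approach: rows are encoded directly onto the client stream.
  static void sendStreaming(List<Row> rows, OutputStream client) throws IOException {
    DataOutputStream out = new DataOutputStream(client);
    for (Row row : rows) {
      writeRow(out, row);
    }
    out.flush();
  }
}
```

Both methods put the same bytes on the wire; the difference is that the buffered variant holds the entire payload on the heap and copies it once more before sending.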






[jira] [Commented] (HBASE-28501) Support non-SPNEGO authentication methods and implement session handling in REST java client library

2024-05-21 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848083#comment-17848083
 ] 

Istvan Toth commented on HBASE-28501:
-

Pushed the addendum to all active branches.

> Support non-SPNEGO authentication methods and implement session handling in 
> REST java client library
> 
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The current Java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to Kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards; it 
> seems that most of the Kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting the user set it up), 
> and letting it handle authentication by itself would be a better and more 
> generic solution.
> -Also add support for specifying a prefix for the URL path.-
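
For reference, BASIC authentication amounts to sending a Base64-encoded "user:password" pair in the Authorization header (RFC 7617). The sketch below only shows how that header value is formed; the `basicAuthHeader()` helper is a hypothetical name and is not part of the HBase REST client API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthSketch {
  // Builds the value of an HTTP "Authorization" header for BASIC auth.
  static String basicAuthHeader(String user, String password) {
    String credentials = user + ":" + password;
    byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
    return "Basic " + Base64.getEncoder().encodeToString(raw);
  }
}
```

Since the credentials are only encoded, not encrypted, this scheme is normally used over TLS.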





[jira] [Resolved] (HBASE-28501) Support non-SPNEGO authentication methods and implement session handling in REST java client library

2024-05-21 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28501.
-
Resolution: Fixed

> Support non-SPNEGO authentication methods and implement session handling in 
> REST java client library
> 
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The current Java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to Kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards; it 
> seems that most of the Kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting the user set it up), 
> and letting it handle authentication by itself would be a better and more 
> generic solution.
> -Also add support for specifying a prefix for the URL path.-





[jira] [Commented] (HBASE-28501) Support non-SPNEGO authentication methods and implement session handling in REST java client library

2024-05-20 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848044#comment-17848044
 ] 

Istvan Toth commented on HBASE-28501:
-

Thanks for catching this, [~zhangduo].
I have put up a PR with the fix: https://github.com/apache/hbase/pull/5928

We should probably deprecate the old constructors; maybe I will put up a 
separate PR once all my PerformanceEvaluation related work has been merged.


> Support non-SPNEGO authentication methods and implement session handling in 
> REST java client library
> 
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The current Java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to Kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards; it 
> seems that most of the Kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting the user set it up), 
> and letting it handle authentication by itself would be a better and more 
> generic solution.
> -Also add support for specifying a prefix for the URL path.-





[jira] [Commented] (HBASE-28597) Support native Cell format in REST server and client

2024-05-17 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847206#comment-17847206
 ] 

Istvan Toth commented on HBASE-28597:
-

The fastest solution would be simply taking the CellBlock ByteBuffers from the 
protobuf responses, and directly sending those out in the HTTP body. This would 
require no memory copying, or in fact any processing at all, just straight DMA.

The client would not have control over the encryption and Codec, but we could 
give those in headers, and the client needs to include the native hbase client 
library anyway for this to work.

HBase is not set up to make this feasible now, and AFAICT this would need 
horrible reflection hacks and/or major additions to the HBase API, and I am not 
comfortable enough with the RPC internals to attempt this.

Once we are getting cells from the HBase API, the CellBlocks have already been 
decoded and copied to the heap, so much of the "damage" in memory and GC 
pressure is already done.

A way to mitigate that would be if we were able to use ByteBuffer backed cells 
on the client side, but the client API does not support that. At first glance, 
ByteBuffer backed cells seem to be generated mostly when reading HFiles, 
and in the RPC write (Puts) path. This looks more feasible than copying raw 
CellBlocks, but it would still be a very large change.

The next step where we can perhaps save cycles and GC pressure is encoding and 
sending the data via HTTP.
Even for the current protobuf implementation, if we could use ByteBuffer backed 
CodedOutputStream like the HBase RPC code does, and somehow get Jetty to send 
that ByteBuffer directly, then we may be able to save some overhead.





> Support native Cell format in REST server and client
> 
>
> Key: HBASE-28597
> URL: https://issues.apache.org/jira/browse/HBASE-28597
> Project: HBase
>  Issue Type: Wish
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> REST currently uses its own (outdated) CellSetModel format for transferring 
> cells.
> This is fine for XML and JSON, which are slow anyway and even slower when 
> handling byte arrays, and which are expected to be used in cases where simple 
> client code that does not depend on the HBase Java libraries is more 
> important than raw performance.
> However, we perform the same marshalling and unmarshalling when we are using 
> protobuf, which doesn't really add value, but eats up resources.
> We could add a new encoding for Results which uses the native cell format, by 
> simply dumping the binary cell bytestreams into the REST response body.
> This should save a lot of resources on the server side, and would be either 
> faster, or the same speed on the client.
> As an additional advantage, the resulting Cells would be of native HBase Cell 
> type instead of the REST Cell type.





[jira] [Commented] (HBASE-28597) Support native Cell format in REST server and client

2024-05-16 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847154#comment-17847154
 ] 

Istvan Toth commented on HBASE-28597:
-

I poked around in the code a bit.

Looks like org.apache.hadoop.hbase.ipc.CellBlockBuilder does everything we 
want, including optional compression.

The intrinsic compression of storing the rowKey separately is lost, but if 
that's good enough for the native RPC protocol, it should be good enough for 
REST.



> Support native Cell format in REST server and client
> 
>
> Key: HBASE-28597
> URL: https://issues.apache.org/jira/browse/HBASE-28597
> Project: HBase
>  Issue Type: Wish
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> REST currently uses its own (outdated) CellSetModel format for transferring 
> cells.
> This is fine for XML and JSON, which are slow anyway and even slower when 
> handling byte arrays, and which are expected to be used in cases where simple 
> client code that does not depend on the HBase Java libraries is more 
> important than raw performance.
> However, we perform the same marshalling and unmarshalling when we are using 
> protobuf, which doesn't really add value, but eats up resources.
> We could add a new encoding for Results which uses the native cell format, by 
> simply dumping the binary cell bytestreams into the REST response body.
> This should save a lot of resources on the server side, and would be either 
> faster, or the same speed on the client.
> As an additional advantage, the resulting Cells would be of native HBase Cell 
> type instead of the REST Cell type.





[jira] [Updated] (HBASE-28561) Add separate fields for column family and qualifier in REST message formats

2024-05-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28561:

Issue Type: Wish  (was: Improvement)

> Add separate fields for column family and qualifier in REST message formats
> ---
>
> Key: HBASE-28561
> URL: https://issues.apache.org/jira/browse/HBASE-28561
> Project: HBase
>  Issue Type: Wish
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The current format uses the archaic column field, which requires extra 
> processing and copying to encode/decode the CF and CQ at both the server and 
> client side.
> We need to:
> - Add a version field to the requests, to be enabled by clients that support 
> the new format
> - Add the new fields to the JSON, XML and protobuf formats, and logic to use 
> them.
> This should be doable in a backwards-compatible manner, with the server 
> falling back to the old format if it receives an unversioned request.
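
To illustrate the extra processing the combined column field implies, here is a minimal sketch of splitting and re-joining a "family:qualifier" value. The class and method names are hypothetical and this is not the REST server code; it only shows the scan for the delimiter and the array copies that separate fields would avoid.

```java
import java.util.Arrays;

class ColumnFieldSketch {
  // Splits a combined "family:qualifier" column at the first ':' byte.
  // Returns {family, qualifier}; the qualifier is empty if there is no ':'.
  static byte[][] splitColumn(byte[] column) {
    for (int i = 0; i < column.length; i++) {
      if (column[i] == ':') {
        return new byte[][] {
          Arrays.copyOfRange(column, 0, i),
          Arrays.copyOfRange(column, i + 1, column.length)
        };
      }
    }
    return new byte[][] { column.clone(), new byte[0] };
  }

  // Joins family and qualifier back into the combined form (one more copy).
  static byte[] joinColumn(byte[] family, byte[] qualifier) {
    byte[] out = new byte[family.length + 1 + qualifier.length];
    System.arraycopy(family, 0, out, 0, family.length);
    out[family.length] = ':';
    System.arraycopy(qualifier, 0, out, family.length + 1, qualifier.length);
    return out;
  }
}
```

With separate family and qualifier fields, neither the delimiter scan nor the copies would be needed on either side.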





[jira] (HBASE-28553) SSLContext not used for Kerberos auth negotiation in rest client

2024-05-16 Thread Istvan Toth (Jira)


[ https://issues.apache.org/jira/browse/HBASE-28553 ]


Istvan Toth deleted comment on HBASE-28553:
-

was (Author: stoty):
The fix is part of HBASE-28501.

> SSLContext not used for Kerberos auth negotiation in rest client
> 
>
> Key: HBASE-28553
> URL: https://issues.apache.org/jira/browse/HBASE-28553
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The included REST client now supports specifying a Trust store for SSL 
> connections.
> However, the configured SSL library is not used when the Kerberos negotiation 
> is performed by the Hadoop library, which uses its own client.
> We need to set up the Hadoop auth process to use the same SSLContext.





[jira] [Commented] (HBASE-28553) SSLContext not used for Kerberos auth negotiation in rest client

2024-05-16 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846878#comment-17846878
 ] 

Istvan Toth commented on HBASE-28553:
-

The fix is part of HBASE-28501.

> SSLContext not used for Kerberos auth negotiation in rest client
> 
>
> Key: HBASE-28553
> URL: https://issues.apache.org/jira/browse/HBASE-28553
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The included REST client now supports specifying a Trust store for SSL 
> connections.
> However, the configured SSL library is not used when the Kerberos negotiation 
> is performed by the Hadoop library, which uses its own client.
> We need to set up the Hadoop auth process to use the same SSLContext.





[jira] [Resolved] (HBASE-28553) SSLContext not used for Kerberos auth negotiation in rest client

2024-05-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28553.
-
Resolution: Duplicate

Fix included in HBASE-28501

> SSLContext not used for Kerberos auth negotiation in rest client
> 
>
> Key: HBASE-28553
> URL: https://issues.apache.org/jira/browse/HBASE-28553
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The included REST client now supports specifying a Trust store for SSL 
> connections.
> However, the configured SSL library is not used when the Kerberos negotiation 
> is performed by the Hadoop library, which uses its own client.
> We need to set up the Hadoop auth process to use the same SSLContext.





[jira] [Updated] (HBASE-28501) Support non-SPNEGO authentication methods and implement session handling in REST java client library

2024-05-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28501:

Fix Version/s: 2.4.18
   2.7.0
   3.0.0-beta-2
   2.6.1
   2.5.9
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to all active branches.

> Support non-SPNEGO authentication methods and implement session handling in 
> REST java client library
> 
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The current Java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to Kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards; it 
> seems that most of the Kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting the user set it up), 
> and letting it handle authentication by itself would be a better and more 
> generic solution.
> -Also add support for specifying a prefix for the URL path.-





[jira] [Updated] (HBASE-28597) Support native Cell format in REST server and client

2024-05-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28597:

Summary: Support native Cell format in REST server and client  (was: 
Support native Cell format for protobuf in REST server and client)

> Support native Cell format in REST server and client
> 
>
> Key: HBASE-28597
> URL: https://issues.apache.org/jira/browse/HBASE-28597
> Project: HBase
>  Issue Type: Wish
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> REST currently uses its own (outdated) CellSetModel format for transferring 
> cells.
> This is fine for XML and JSON, which are slow anyway and even slower when 
> handling byte arrays, and which are expected to be used in cases where simple 
> client code that does not depend on the HBase Java libraries is more 
> important than raw performance.
> However, we perform the same marshalling and unmarshalling when we are using 
> protobuf, which doesn't really add value, but eats up resources.
> We could add a new encoding for Results which uses the native cell format, by 
> simply dumping the binary cell bytestreams into the REST response body.
> This should save a lot of resources on the server side, and would be either 
> faster, or the same speed on the client.
> As an additional advantage, the resulting Cells would be of native HBase Cell 
> type instead of the REST Cell type.





[jira] [Updated] (HBASE-28597) Support native Cell format for protobuf in REST server and client

2024-05-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28597:

Description: 
REST currently uses its own (outdated) CellSetModel format for transferring 
cells.

This is fine for XML and JSON, which are slow anyway and even slower when 
handling byte arrays, and are expected to be used in cases where simple client 
code that does not depend on the HBase Java libraries is more important than 
raw performance.

However, we perform the same marshalling and unmarshalling when we are using 
protobuf, which doesn't really add value, but eats up resources.

We could add a new encoding for Results which uses the native cell format, by 
simply dumping the binary cell bytestreams into the REST response body.

This should save a lot of resources on the server side, and would be either 
faster, or the same speed on the client.

As an additional advantage, the resulting Cells would be of native HBase Cell 
type instead of the REST Cell type.



  was:
REST currently uses its own (outdated) CellSetModel format for transferring 
cells.

This is fine for XML and JSON, which are slow anyway and even slower when 
handling byte arrays, and are expected to be used in cases where simple client 
code that does not depend on the HBase Java libraries is more important than 
raw performance.

However, we perform the same marshalling and unmarshalling when we are using 
protobuf, which doesn't really add value, but eats up resources.

We could add a new encoding for Results which uses the native cell format in 
protobuf, by simply dumping the binary cell bytestreams into the REST response 
body.

This should save a lot of resources on the server side, and would be either 
faster, or the same speed on the client.

As an additional advantage, the resulting Cells would be of native HBase Cell 
type instead of the REST Cell type.




> Support native Cell format for protobuf in REST server and client
> -
>
> Key: HBASE-28597
> URL: https://issues.apache.org/jira/browse/HBASE-28597
> Project: HBase
>  Issue Type: Wish
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> REST currently uses its own (outdated) CellSetModel format for transferring 
> cells.
> This is fine for XML and JSON, which are slow anyway and even slower when 
> handling byte arrays, and are expected to be used in cases where simple client 
> code that does not depend on the HBase Java libraries is more important than 
> raw performance.
> However, we perform the same marshalling and unmarshalling when we are using 
> protobuf, which doesn't really add value, but eats up resources.
> We could add a new encoding for Results which uses the native cell format, by 
> simply dumping the binary cell bytestreams into the REST response body.
> This should save a lot of resources on the server side, and would be either 
> faster, or the same speed on the client.
> As an additional advantage, the resulting Cells would be of native HBase Cell 
> type instead of the REST Cell type.





[jira] [Created] (HBASE-28597) Support native Cell format for protobuf in REST server and client

2024-05-16 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28597:
---

 Summary: Support native Cell format for protobuf in REST server 
and client
 Key: HBASE-28597
 URL: https://issues.apache.org/jira/browse/HBASE-28597
 Project: HBase
  Issue Type: Wish
  Components: REST
Reporter: Istvan Toth


REST currently uses its own (outdated) CellSetModel format for transferring 
cells.

This is fine for XML and JSON, which are slow anyway and even slower when 
handling byte arrays, and are expected to be used in cases where simple client 
code that does not depend on the HBase Java libraries is more important than 
raw performance.

However, we perform the same marshalling and unmarshalling when we are using 
protobuf, which doesn't really add value, but eats up resources.

We could add a new encoding for Results which uses the native cell format in 
protobuf, by simply dumping the binary cell bytestreams into the REST response 
body.

This should save a lot of resources on the server side, and would be either 
faster, or the same speed on the client.

As an additional advantage, the resulting Cells would be of native HBase Cell 
type instead of the REST Cell type.







[jira] [Commented] (HBASE-28586) Backport HBASE-24791 to branch-2.6

2024-05-10 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845218#comment-17845218
 ] 

Istvan Toth commented on HBASE-28586:
-

If backporting to branch-2.6, it must also be backported to branch-2.
Also consider backporting to 2.5/2.4 if relevant.

> Backport HBASE-24791 to branch-2.6
> --
>
> Key: HBASE-28586
> URL: https://issues.apache.org/jira/browse/HBASE-28586
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Szucs Villo
>Priority: Major
>






[jira] [Commented] (HBASE-28553) SSLContext not used for Kerberos auth negotiation in rest client

2024-05-08 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844853#comment-17844853
 ] 

Istvan Toth commented on HBASE-28553:
-

The fix for this is part of HBASE-28501, I will not create a separate patch.

> SSLContext not used for Kerberos auth negotiation in rest client
> 
>
> Key: HBASE-28553
> URL: https://issues.apache.org/jira/browse/HBASE-28553
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The included REST client now supports specifying a Trust store for SSL 
> connections.
> However, the configured SSL library is not used when the Kerberos negotiation 
> is performed by the Hadoop library, which uses its own client.
> We need to set up the Hadoop auth process to use the same SSLContext.





[jira] [Updated] (HBASE-28501) Support non-SPNEGO authentication methods and implement session handling in REST java client library

2024-05-08 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28501:

Assignee: Istvan Toth
  Status: Patch Available  (was: Open)

> Support non-SPNEGO authentication methods and implement session handling in 
> REST java client library
> 
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> The current Java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to Kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards; it 
> seems that most of the Kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting the user set it up), and 
> letting it handle authentication by itself would be a better and more generic 
> solution.
> -Also add support for specifying a prefix for the URL path.-
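
For illustration only, the BASIC scheme requested above boils down to one request header. The sketch below is not the REST client's code: the class and method names are hypothetical, and it only shows how the header value is formed from a username and password using the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeaderSketch {
  /**
   * Builds the value of an HTTP "Authorization" header for BASIC auth:
   * the literal "Basic " followed by base64("user:password").
   */
  static String basicAuthHeader(String user, String password) {
    String credentials = user + ":" + password;
    String encoded =
        Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    return "Basic " + encoded;
  }

  public static void main(String[] args) {
    // Hypothetical credentials, for demonstration only.
    System.out.println("Authorization: " + basicAuthHeader("alice", "secret"));
  }
}
```

In practice an HTTP client library would normally add this header itself once credentials are configured, which is the direction the description argues for.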





[jira] [Updated] (HBASE-28501) Support non-SPNEGO authentication methods and implement session handling in REST java client library

2024-05-08 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28501:

Summary: Support non-SPNEGO authentication methods and implement session 
handling in REST java client library  (was: Support non-SPNEGO authentication 
methods in REST java client library)

> Support non-SPNEGO authentication methods and implement session handling in 
> REST java client library
> 
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> The current Java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to Kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards; it 
> seems that most of the Kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting the user set it up), and 
> letting it handle authentication by itself would be a better and more generic 
> solution.
> -Also add support for specifying a prefix for the URL path.-





[jira] [Updated] (HBASE-28561) Add separate fields for column family and qualifier in REST message formats

2024-05-07 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28561:

Summary: Add separate fields for column family and qualifier in REST 
message formats  (was: Add separate fields for column family and qualifier in 
REST message format)

> Add separate fields for column family and qualifier in REST message formats
> ---
>
> Key: HBASE-28561
> URL: https://issues.apache.org/jira/browse/HBASE-28561
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The current format uses the archaic column field, which requires extra 
> processing and copying to encode/decode the CF and CQ at both the server and 
> client side.
> We need to:
> - Add a version field to the requests, to be enabled by clients that support 
> the new format
> - Add the new fields to the JSON, XML and protobuf formats, and logic to use 
> them.
> This should be doable in a backwards-compatible manner, with the server 
> falling back to the old format if it receives an unversioned request.
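
A rough sketch of the two ideas in the description above, under stated assumptions: the legacy column field is taken to be "family:qualifier" split at the first ':' delimiter, and the version check is purely hypothetical (`NEW_FORMAT_VERSION` and `useSeparateFields` are made-up names, and the issue does not define a version number). It is meant to show where the extra processing and copying comes from, not how the patch is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ColumnSplitSketch {
  // Hypothetical version number; the issue does not define one.
  static final int NEW_FORMAT_VERSION = 1;

  /** Legacy path: find the first ':' and copy family and qualifier into new arrays. */
  static byte[][] splitLegacyColumn(byte[] column) {
    int idx = -1;
    for (int i = 0; i < column.length; i++) {
      if (column[i] == ':') {
        idx = i;
        break;
      }
    }
    if (idx < 0) {
      // No delimiter: treat the whole value as the family, with an empty qualifier.
      return new byte[][] { column, new byte[0] };
    }
    byte[] family = Arrays.copyOfRange(column, 0, idx);
    byte[] qualifier = Arrays.copyOfRange(column, idx + 1, column.length);
    return new byte[][] { family, qualifier };
  }

  /** Unversioned (null) or older requests fall back to the legacy column field. */
  static boolean useSeparateFields(Integer requestVersion) {
    return requestVersion != null && requestVersion >= NEW_FORMAT_VERSION;
  }

  public static void main(String[] args) {
    byte[][] parts = splitLegacyColumn("cf:q1".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(parts[0], StandardCharsets.UTF_8) + " / "
        + new String(parts[1], StandardCharsets.UTF_8));
    System.out.println("separate fields for unversioned request: " + useSeparateFields(null));
  }
}
```

With separate family and qualifier fields, the scan for the delimiter and both array copies in `splitLegacyColumn` would not be needed on either side.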





[jira] [Resolved] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-07 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28556.
-
Fix Version/s: 2.4.18
   3.0.0
   2.7.0
   2.6.1
   2.5.9
   Resolution: Fixed

Committed to all active branches.
Thanks for the review [~zhangduo].

> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.4.18, 3.0.0, 2.7.0, 2.6.1, 2.5.9
>
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.
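
The array/offset/length idea from the description can be sketched with the JDK alone. Everything below is an assumption made for illustration: `Slice` is a hypothetical helper, not the actual CellModel, and the backing array layout is invented. The point is only that a field can be handed to a serializer as a view of a shared array, while a copying getter remains available for frameworks that need a standalone byte[].

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CellSliceSketch {
  /** A view of one field inside a shared backing array (hypothetical helper). */
  static final class Slice {
    final byte[] array;
    final int offset;
    final int length;

    Slice(byte[] array, int offset, int length) {
      this.array = array;
      this.offset = offset;
      this.length = length;
    }

    /** Wraps the range without copying; the buffer shares the backing array. */
    ByteBuffer asBuffer() {
      return ByteBuffer.wrap(array, offset, length).asReadOnlyBuffer();
    }

    /** Streams the range directly from the backing array, with no intermediate copy. */
    void writeTo(OutputStream out) throws IOException {
      out.write(array, offset, length);
    }

    /** Copying getter, e.g. for JSON/XML mapping that expects a standalone byte[]. */
    byte[] copy() {
      return Arrays.copyOfRange(array, offset, offset + length);
    }
  }

  public static void main(String[] args) throws IOException {
    // Invented layout: row, column and value packed into one array.
    byte[] backing = "row1cf:qvalue".getBytes(StandardCharsets.UTF_8);
    Slice row = new Slice(backing, 0, 4);
    Slice value = new Slice(backing, 8, 5);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    row.writeTo(out);
    value.writeTo(out);
    System.out.println(out.toString("UTF-8"));
  }
}
```

For protobuf, the equivalent of `asBuffer()` would be a wrapping (non-copying) ByteString factory such as the UnsafeByteOperations class mentioned elsewhere in this thread; the sketch deliberately avoids that dependency.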





[jira] [Comment Edited] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-06 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842961#comment-17842961
 ] 

Istvan Toth edited comment on HBASE-28556 at 5/6/24 10:18 AM:
--

I mixed up ByteString and ByteBuffer. There is nothing wrong with using 
ByteStringer/UnsafeByteOperations.



was (Author: stoty):
I mixed up ByteString and ByteBuffer. There is nothing with using 
ByteStringer/UnsafeByteOperations.


> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.





[jira] [Comment Edited] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-06 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843699#comment-17843699
 ] 

Istvan Toth edited comment on HBASE-28556 at 5/6/24 10:16 AM:
--

I have run somewhat better tests with my fixed PerformanceEvaluationTool 
(patches not yet published):

{noformat}

hbase -Xmx2g org.apache.hadoop.hbase.rest.PerformanceEvaluation 
--host=ccycloud-1.stoty.root.comops.site:20550 ---enableSsl=false --rows=250 
--api=rest_remote --nomapred=true scanRange1 20 

- JDK17, 4GB heap, patch: ~120s wall clock, 303s REST CPU usage
- JDK17, 1GB heap, patch: ~130s wall clock, 340s REST CPU usage
- JDK17, 4GB heap, no patch: ~120s wall clock, 360s REST CPU usage
- JDK17, 1GB heap, no patch: ~150s wall clock, 405s REST CPU usage
{noformat}

So the actual CPU usage improvement is more like ~20%.
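
As a quick check of the arithmetic behind the figures listed above (a sketch only; the class and method names are made up), the relative change can be computed against either baseline:

```java
public class CpuSavingsSketch {
  /** Percentage reduction relative to the unpatched value. */
  static double reductionPercent(double unpatched, double patched) {
    return 100.0 * (unpatched - patched) / unpatched;
  }

  /** Percentage of extra CPU the unpatched run uses relative to the patched one. */
  static double overheadPercent(double unpatched, double patched) {
    return 100.0 * (unpatched - patched) / patched;
  }

  public static void main(String[] args) {
    // REST CPU seconds from the benchmark lines above.
    System.out.printf("4GB heap: %.1f%% reduction, %.1f%% overhead%n",
        reductionPercent(360, 303), overheadPercent(360, 303));
    System.out.printf("1GB heap: %.1f%% reduction, %.1f%% overhead%n",
        reductionPercent(405, 340), overheadPercent(405, 340));
  }
}
```

Measured against the unpatched runs this is roughly a 16% reduction; measured against the patched runs, the unpatched server uses roughly 19% more CPU, which is in line with the rounded figure quoted in the comment.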


was (Author: stoty):
I have run somewhat better tests with my fixed PerformanceEvaluationTool 
(patches not yet published):

{noformat}

hbase -Xmx2g org.apache.hadoop.hbase.rest.PerformanceEvaluation 
--host=ccycloud-1.stoty.root.comops.site:20550 ---enableSsl=false --rows=250 
--api=rest_remote --nomapred=true scanRange1 20 

- JDK17, 4GB heap, patch: ~120s wall clock, 303s REST CPU usage
- JDK17, 1GB heap, patch: ~130s wall clock, 340s REST CPU usage
- JDK17, 4GB heap, no patch: ~120s wall clock, 360s REST CPU usage
- JDK17, 1GB heap, no patch: ~150s wall clock, 405s REST CPU usage
{noformat}


> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.





[jira] [Commented] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-06 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843699#comment-17843699
 ] 

Istvan Toth commented on HBASE-28556:
-

I have run somewhat better tests with my fixed PerformanceEvaluationTool 
(patches not yet published):

{noformat}

hbase -Xmx2g org.apache.hadoop.hbase.rest.PerformanceEvaluation 
--host=ccycloud-1.stoty.root.comops.site:20550 ---enableSsl=false --rows=250 
--api=rest_remote --nomapred=true scanRange1 20 

- JDK17, 4GB heap, patch: ~120s wall clock, 303s REST CPU usage
- JDK17, 1GB heap, patch: ~130s wall clock, 340s REST CPU usage
- JDK17, 4GB heap, no patch: ~120s wall clock, 360s REST CPU usage
- JDK17, 1GB heap, no patch: ~150s wall clock, 405s REST CPU usage
{noformat}


> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.





[jira] [Commented] (HBASE-28563) Closing ZooKeeper in ZKMainServer

2024-05-05 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843627#comment-17843627
 ] 

Istvan Toth commented on HBASE-28563:
-

Background:
The daemon bug is fixed in ZK 3.9+, but ZK has decided not to fix it in 3.8.

> Closing ZooKeeper in ZKMainServer
> -
>
> Key: HBASE-28563
> URL: https://issues.apache.org/jira/browse/HBASE-28563
> Project: HBase
>  Issue Type: Improvement
>Reporter: Minwoo Kang
>Priority: Minor
>  Labels: pull-request-available
>
> Users can switch the ZooKeeper client/server communication framework to Netty.
> The ZKMainServer process fails to terminate when users utilize Netty for 
> ZooKeeper connections.
> Netty threads are identified as non-daemon threads.
> Enforce the calling of close() on ZooKeeper before ZKMainServer termination.
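
The failure mode described above can be reproduced with the JDK alone. In this sketch `FakeClient` is a hypothetical stand-in for a client that owns non-daemon I/O threads; it is not the ZooKeeper or Netty API. Without the close() call, the worker thread would keep the JVM alive after main() returns.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CloseBeforeExitSketch {
  /** Hypothetical stand-in for a client that owns non-daemon I/O threads. */
  static final class FakeClient implements AutoCloseable {
    // The default thread factory creates non-daemon threads.
    private final ExecutorService io = Executors.newSingleThreadExecutor();

    FakeClient() {
      io.submit(() -> { });  // start the worker thread
    }

    boolean terminatedWithin(long millis) {
      try {
        return io.awaitTermination(millis, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }

    @Override
    public void close() {
      io.shutdown();  // lets the worker thread exit once queued work is done
    }
  }

  /** Runs some work and guarantees close() is called, even if the work throws. */
  static boolean runAndClose() {
    FakeClient ref;
    try (FakeClient client = new FakeClient()) {
      ref = client;
      // ... run commands against the client here ...
    }
    return ref.terminatedWithin(5000);
  }

  public static void main(String[] args) {
    System.out.println("I/O threads terminated: " + runAndClose());
  }
}
```

The try-with-resources block is one way to enforce the close() call on every exit path; an explicit finally block would work equally well.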





[jira] [Commented] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-03 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843198#comment-17843198
 ] 

Istvan Toth commented on HBASE-28556:
-

It's not as significant as I had hoped, but I do see a 5-10% performance 
improvement in benchmarks with large resultsets when using protobuf encoding 
with this patch.

> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.





[jira] [Updated] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-03 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28556:

Priority: Minor  (was: Major)

> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.





[jira] [Commented] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-02 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842961#comment-17842961
 ] 

Istvan Toth commented on HBASE-28556:
-

I mixed up ByteString and ByteBuffer. There is nothing wrong with using 
ByteStringer/UnsafeByteOperations.


> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.





[jira] [Updated] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-02 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28556:

Description: 
The REST server does a lot of unnecessary copying, which could be avoided at 
least for protobuf encoding.

- -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses the 
client API, so it should never encounter ByteBuffer backed cells.-
- It clones everything from the cells (sometimes multiple times) before 
serializing to protobuf.

We could mimic the structure in Cell, with array, offset and length for each 
field, in CellModel and use the appropriate protobuf setters to avoid the extra 
copies.

There may or may not be a way to do the same for JSON and XML via jax-rs, I 
don't know the frameworks well enough to tell, but if not, we could just do the 
copying in the getters for them, which would not make things worse.


  was:
The REST server does a lot of unnecessary copying, which could be avoided at 
least for protobuf encoding.

- It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses the 
client API, so it should never encounter ByteBuffer backed cells.
- It clones everything from the cells (sometimes multiple times) before 
serializing to protobuf.

We could mimic the structure in Cell, with array, offset and length for each 
field, in CellModel and use the appropriate protobuf setters to avoid the extra 
copies.

There may or may not be a way to do the same for JSON and XML via jax-rs, I 
don't know the frameworks well enough to tell, but if not, we could just do the 
copying in the getters for them, which would not make things worse.



> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.





[jira] [Commented] (HBASE-28526) hbase-rest jar does not work with hbase-shaded-client with protobuf encoding

2024-05-02 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842948#comment-17842948
 ] 

Istvan Toth commented on HBASE-28526:
-

This has already been done in branch-3+, which does not use vanilla protobuf 
anymore.

I think it would be useful to backport this at least to branch-2.

> hbase-rest jar does not work with hbase-shaded-client with protobuf encoding
> 
>
> Key: HBASE-28526
> URL: https://issues.apache.org/jira/browse/HBASE-28526
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> When trying to decode a protobuf encoded CellSet, I get 
> {noformat}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(Lcom/google/protobuf/Message$Builder;[B)V
>   at 
> org.apache.hadoop.hbase.rest.model.CellSetModel.getObjectFromMessage(CellSetModel.java:129)
>   at RestClientExample.getMulti(RestClientExample.java:191)
>   at RestClientExample.start(RestClientExample.java:138)
>   at RestClientExample.main(RestClientExample.java:124)
> {noformat}
> Seems to be caused by relocating protobuf 2.5 in hbase-shaded-client.
> It works fine with the unrelocated client, i.e. when using the 
> {noformat}
> export CLASSPATH=`hbase --internal-classpath classpath`:
> {noformat}
> command to set up the classpath for the client.





[jira] [Updated] (HBASE-28526) hbase-rest jar does not work with hbase-shaded-client with protobuf encoding

2024-05-02 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28526:

Affects Version/s: 2.5.8
   2.4.17
   2.6.0
   2.7.0

> hbase-rest jar does not work with hbase-shaded-client with protobuf encoding
> 
>
> Key: HBASE-28526
> URL: https://issues.apache.org/jira/browse/HBASE-28526
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 2.6.0, 2.4.17, 2.7.0, 2.5.8
>Reporter: Istvan Toth
>Priority: Major
>
> When trying to decode a protobuf encoded CellSet, I get 
> {noformat}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(Lcom/google/protobuf/Message$Builder;[B)V
>   at 
> org.apache.hadoop.hbase.rest.model.CellSetModel.getObjectFromMessage(CellSetModel.java:129)
>   at RestClientExample.getMulti(RestClientExample.java:191)
>   at RestClientExample.start(RestClientExample.java:138)
>   at RestClientExample.main(RestClientExample.java:124)
> {noformat}
> Seems to be caused by relocating protobuf 2.5 in hbase-shaded-client.
> It works fine with the unrelocated client, i.e. when using the 
> {noformat}
> export CLASSPATH=`hbase --internal-classpath classpath`:
> {noformat}
> command to set up the classpath for the client.





[jira] [Updated] (HBASE-28561) Add separate fields for column family and qualifier in REST message format

2024-05-01 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28561:

Description: 
The current format uses the archaic column field, which requires extra 
processing and copying to encode/decode the CF and CQ at both the server and 
client side.

We need to:
- Add a version field to the requests, to be enabled by clients that support 
the new format
- Add the new fields to the JSON, XML and protobuf formats, and logic to use 
them.

This should be doable in a backwards-compatible manner, with the server falling 
back to the old format if it receives an unversioned request.

  was:
The current format uses the archaic column field, which requires extra 
processing and copying at both the server and client side.

We need to:
- Add a version field to the requests, to be enabled by clients that support 
the new format
- Add the new fields to the JSON, XML and protobuf formats, and logic to use 
them.

This should be doable in a backwards-compatible manner, with the server falling 
back to the old format if it receives an unversioned request.


> Add separate fields for column family and qualifier in REST message format
> --
>
> Key: HBASE-28561
> URL: https://issues.apache.org/jira/browse/HBASE-28561
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> The current format uses the archaic column field, which requires extra 
> processing and copying to encode/decode the CF and CQ at both the server and 
> client side.
> We need to:
> - Add a version field to the requests, to be enabled by clients that support 
> the new format
> - Add the new fields to the JSON, XML and protobuf formats, and logic to use 
> them.
> This should be doable in a backwards-compatible manner, with the server 
> falling back to the old format if it receives an unversioned request.
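
To illustrate the extra processing the combined column field implies, here is a minimal Java sketch. The class and method names (ColumnFieldSketch, joinColumn, splitColumn) are hypothetical and are not part of hbase-rest. It assumes the combined field has the form family:qualifier, with the first ':' acting as the delimiter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not an HBase REST class.
final class ColumnFieldSketch {

    // Combined format: one array holding family, ':' and qualifier (one allocation, two copies).
    static byte[] joinColumn(byte[] family, byte[] qualifier) {
        byte[] column = new byte[family.length + 1 + qualifier.length];
        System.arraycopy(family, 0, column, 0, family.length);
        column[family.length] = ':';
        System.arraycopy(qualifier, 0, column, family.length + 1, qualifier.length);
        return column;
    }

    // Decoding the combined format: scan for the first ':' and copy both halves out.
    static byte[][] splitColumn(byte[] column) {
        int sep = -1;
        for (int i = 0; i < column.length; i++) {
            if (column[i] == ':') {
                sep = i;
                break;
            }
        }
        if (sep < 0) {
            throw new IllegalArgumentException("no family delimiter in column");
        }
        return new byte[][] {
            Arrays.copyOfRange(column, 0, sep),
            Arrays.copyOfRange(column, sep + 1, column.length)
        };
    }

    public static void main(String[] args) {
        byte[] column = joinColumn("cf".getBytes(StandardCharsets.UTF_8),
            "q1".getBytes(StandardCharsets.UTF_8));
        byte[][] parts = splitColumn(column);
        System.out.println(new String(parts[0], StandardCharsets.UTF_8)
            + " / " + new String(parts[1], StandardCharsets.UTF_8));
    }
}
```

With separate family and qualifier fields, neither the join on one side nor the scan and split on the other would be needed.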





[jira] [Assigned] (HBASE-28561) Add separate fields for column family and qualifier in REST message format

2024-05-01 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reassigned HBASE-28561:
---

Assignee: Istvan Toth

> Add separate fields for column family and qualifier in REST message format
> --
>
> Key: HBASE-28561
> URL: https://issues.apache.org/jira/browse/HBASE-28561
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The current format uses the archaic column field, which requires extra 
> processing and copying to encode/decode the CF and CQ at both the server and 
> client side.
> We need to:
> - Add a version field to the requests, to be enabled by clients that support 
> the new format
> - Add the new fields to the JSON, XML and protobuf formats, and logic to use 
> them.
> This should be doable in a backwards-compatible manner, with the server 
> falling back to the old format if it receives an unversioned request.





[jira] [Created] (HBASE-28561) Add separate fields for column family and qualifier in REST message format

2024-05-01 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28561:
---

 Summary: Add separate fields for column family and qualifier in 
REST message format
 Key: HBASE-28561
 URL: https://issues.apache.org/jira/browse/HBASE-28561
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The current format uses the archaic column field, which requires extra 
processing and copying at both the server and client side.

We need to:
- Add a version field to the requests, to be enabled by clients that support 
the new format
- Add the new fields to the JSON, XML and protobuf formats, and logic to use 
them.

This should be doable in a backwards-compatible manner, with the server falling 
back to the old format if it receives an unversioned request.





[jira] [Updated] (HBASE-28525) Document all REST endpoints

2024-04-30 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28525:

Labels: beginner  (was: )

> Document all REST endpoints
> ---
>
> Key: HBASE-28525
> URL: https://issues.apache.org/jira/browse/HBASE-28525
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation, REST
>Reporter: Istvan Toth
>Priority: Major
>  Labels: beginner
>
> The new features added in HBASE-28518 do not have documentation.
> While reviewing, I also found other undocumented interfaces, like TableScan, 
> and options like globbed gets.





[jira] [Resolved] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-30 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28523.
-
Resolution: Fixed

Committed to all active branches.

> Use a single get call in REST multiget endpoint
> ---
>
> Key: HBASE-28523
> URL: https://issues.apache.org/jira/browse/HBASE-28523
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The REST multiget endpoint currently issues a separate HBase GET operation 
> for each key.
> Use the method that accepts a list of keys instead.
> That should be faster.
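
The difference between per-key gets and a single batched get can be sketched as follows. FakeTable is a hypothetical in-memory stand-in that counts each call as one round trip; it is not the HBase client API. The issue does not name the list-accepting method; in the HBase client it is presumably Table.get(List<Get>).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; FakeTable only models the number of calls, not real HBase behavior.
final class MultiGetSketch {

    static final class FakeTable {
        final Map<String, String> rows = new HashMap<>();
        int roundTrips;

        // One call per key.
        String get(String key) {
            roundTrips++;
            return rows.get(key);
        }

        // One call for the whole list of keys.
        List<String> get(List<String> keys) {
            roundTrips++;
            List<String> values = new ArrayList<>();
            for (String key : keys) {
                values.add(rows.get(key));
            }
            return values;
        }
    }

    // Current behavior described in the issue: a separate get for each key.
    static List<String> perKey(FakeTable table, List<String> keys) {
        List<String> values = new ArrayList<>();
        for (String key : keys) {
            values.add(table.get(key));
        }
        return values;
    }

    // Proposed behavior: hand the whole list to a single call.
    static List<String> batched(FakeTable table, List<String> keys) {
        return table.get(keys);
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable();
        table.rows.put("r1", "a");
        table.rows.put("r2", "b");
        List<String> keys = Arrays.asList("r1", "r2");
        perKey(table, keys);
        System.out.println("per-key calls: " + table.roundTrips);
        table.roundTrips = 0;
        batched(table, keys);
        System.out.println("batched calls: " + table.roundTrips);
    }
}
```

In a real cluster a batched get may still be split across region servers, so the actual saving depends on how the keys are distributed.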





[jira] [Updated] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-30 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28523:

Fix Version/s: 2.4.18
   2.5.9

> Use a single get call in REST multiget endpoint
> ---
>
> Key: HBASE-28523
> URL: https://issues.apache.org/jira/browse/HBASE-28523
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The REST multiget endpoint currently issues a separate HBase GET operation 
> for each key.
> Use the method that accepts a list of keys instead.
> That should be faster.





[jira] [Updated] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-30 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28523:

Fix Version/s: 2.7.0
   3.0.0-beta-2
   2.6.1

> Use a single get call in REST multiget endpoint
> ---
>
> Key: HBASE-28523
> URL: https://issues.apache.org/jira/browse/HBASE-28523
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1
>
>
> The REST multiget endpoint currently issues a separate HBase GET operation 
> for each key.
> Use the method that accepts a list of keys instead.
> That should be faster.





[jira] [Updated] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-04-29 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28556:

Description: 
The REST server does a lot of unnecessary copying, which could be avoided at 
least for protobuf encoding.

- It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses the 
client API, so it should never encounter ByteBuffer backed cells.
- It clones everything from the cells (sometimes multiple times) before 
serializing to protobuf.

We could mimic the structure in Cell, with array, offset and length for each 
field, in CellModel and use the appropriate protobuf setters to avoid the extra 
copies.

There may or may not be a way to do the same for JSON and XML via jax-rs, I 
don't know the frameworks well enough to tell, but if not, we could just do the 
copying in the getters for them, which would not make things worse.


  was:
The REST server does a lot of unnecessary copying, which could be avoided at 
least for protobuf encoding.

- It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses the 
client API, so it sjpuld never encounter ByteBuffer backed cells.
- It clones everything from the cells (sometimes multiple times) before 
serializing to protobuf.

We could mimic the structure in Cell, with array, offset and length for each 
field, and use the appropriate protobuf setters to avoid the extra copies.

There may or may not be a way to do the same for JSON and XML via jax-rs, I 
don't know the frameworks well enough to tell, but if not, we could just do the 
copying in the getters for them.



> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.
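
The array, offset and length idea can be sketched in plain Java as below. Slice and the two write methods are hypothetical names for illustration; the issue does not say which protobuf setters would be used, so this sketch writes to a standard ByteArrayOutputStream instead of a protobuf builder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; Slice is not an HBase or protobuf type.
final class CellSliceSketch {

    // Refers to a region of a backing array instead of owning a private copy.
    static final class Slice {
        final byte[] array;
        final int offset;
        final int length;

        Slice(byte[] array, int offset, int length) {
            if (offset < 0 || length < 0 || offset + length > array.length) {
                throw new IndexOutOfBoundsException("slice out of range");
            }
            this.array = array;
            this.offset = offset;
            this.length = length;
        }
    }

    // Cloning approach: copy the field into a new array, then write the copy.
    static void writeWithClone(byte[] backing, int offset, int length, ByteArrayOutputStream out) {
        byte[] copy = Arrays.copyOfRange(backing, offset, offset + length);
        out.write(copy, 0, copy.length);
    }

    // Slice approach: write straight from the backing array, with no intermediate copy.
    static void writeSlice(Slice slice, ByteArrayOutputStream out) {
        out.write(slice.array, slice.offset, slice.length);
    }

    public static void main(String[] args) {
        byte[] backing = "row1cfqualval".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeSlice(new Slice(backing, 4, 2), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Both methods produce the same bytes; the slice variant simply skips the temporary array, which is the saving the description is after.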





[jira] [Assigned] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-04-29 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reassigned HBASE-28556:
---

Assignee: Istvan Toth

> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, and use the appropriate protobuf setters to avoid the extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them.





[jira] [Commented] (HBASE-28526) hbase-rest jar does not work with hbase-shaded-client with protobuf encoding

2024-04-29 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842231#comment-17842231
 ] 

Istvan Toth commented on HBASE-28526:
-

The best solution would be splitting the hbase-rest module in two, a client and 
a server component, and including the client part in hbase-shaded-client.

Switching to the shaded protobuf should also help.

> hbase-rest jar does not work with hbase-shaded-client with protobuf encoding
> 
>
> Key: HBASE-28526
> URL: https://issues.apache.org/jira/browse/HBASE-28526
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> When trying to decode a protobuf encoded CellSet, I get 
> {noformat}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(Lcom/google/protobuf/Message$Builder;[B)V
>   at 
> org.apache.hadoop.hbase.rest.model.CellSetModel.getObjectFromMessage(CellSetModel.java:129)
>   at RestClientExample.getMulti(RestClientExample.java:191)
>   at RestClientExample.start(RestClientExample.java:138)
>   at RestClientExample.main(RestClientExample.java:124)
> {noformat}
> Seems to be caused by relocating protobuf 2.5 in hbase-shaded-client.
> It works fine with the unrelocated client, i.e. when using the 
> {noformat}
> export CLASSPATH=`hbase --internal-classpath classpath`:
> {noformat}
> command to set up the classpath for the client.





[jira] [Updated] (HBASE-28526) hbase-rest jar does not work with hbase-shaded-client with protobuf encoding

2024-04-29 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28526:

Description: 
When trying to decode a protobuf encoded CellSet, I get 
{noformat}
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(Lcom/google/protobuf/Message$Builder;[B)V
at 
org.apache.hadoop.hbase.rest.model.CellSetModel.getObjectFromMessage(CellSetModel.java:129)
at RestClientExample.getMulti(RestClientExample.java:191)
at RestClientExample.start(RestClientExample.java:138)
at RestClientExample.main(RestClientExample.java:124)

{noformat}

Seems to be caused by relocating protobuf 2.5 in hbase-shaded-client.

It works fine with the unrelocated client, i.e. when using the 
{noformat}
export CLASSPATH=`hbase --internal-classpath classpath`:
{noformat}
command to set up the classpath for the client.


  was:
When trying to decode a protobof encoded CellSet, I get 
{noformat}
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(Lcom/google/protobuf/Message$Builder;[B)V
at 
org.apache.hadoop.hbase.rest.model.CellSetModel.getObjectFromMessage(CellSetModel.java:129)
at RestClientExample.getMulti(RestClientExample.java:191)
at RestClientExample.start(RestClientExample.java:138)
at RestClientExample.main(RestClientExample.java:124)

{noformat}

Seems to be caused by relocating protobuf 2.5 in hbase-shaded-client.

It works fine with the unrelocated client, i.e. when using the 
{noformat}
export CLASSPATH=`hbase --internal-classpath classpath`:
{noformat}
command to set up the classpath for the client.



> hbase-rest jar does not work with hbase-shaded-client with protobuf encoding
> 
>
> Key: HBASE-28526
> URL: https://issues.apache.org/jira/browse/HBASE-28526
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> When trying to decode a protobuf encoded CellSet, I get 
> {noformat}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(Lcom/google/protobuf/Message$Builder;[B)V
>   at 
> org.apache.hadoop.hbase.rest.model.CellSetModel.getObjectFromMessage(CellSetModel.java:129)
>   at RestClientExample.getMulti(RestClientExample.java:191)
>   at RestClientExample.start(RestClientExample.java:138)
>   at RestClientExample.main(RestClientExample.java:124)
> {noformat}
> Seems to be caused by relocating protobuf 2.5 in hbase-shaded-client.
> It works fine with the unrelocated client, i.e. when using the 
> {noformat}
> export CLASSPATH=`hbase --internal-classpath classpath`:
> {noformat}
> command to set up the classpath for the client.





[jira] [Updated] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-04-29 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28556:

Summary: Reduce memory copying in Rest server when serializing CellModel to 
Protobuf  (was: Reduce memory copying in Rest server when converting CellModel 
to Protobuf)

> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, and use the appropriate protobuf setters to avoid the extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them.





[jira] [Created] (HBASE-28556) Reduce memory copying in Rest server when converting CellModel to Protobuf

2024-04-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28556:
---

 Summary: Reduce memory copying in Rest server when converting 
CellModel to Protobuf
 Key: HBASE-28556
 URL: https://issues.apache.org/jira/browse/HBASE-28556
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The REST server does a lot of unnecessary copying, which could be avoided at 
least for protobuf encoding.

- It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses the 
client API, so it should never encounter ByteBuffer backed cells.
- It clones everything from the cells (sometimes multiple times) before 
serializing to protobuf.

We could mimic the structure in Cell, with array, offset and length for each 
field, and use the appropriate protobuf setters to avoid the extra copies.

There may or may not be a way to do the same for JSON and XML via jax-rs, I 
don't know the frameworks well enough to tell, but if not, we could just do the 
copying in the getters for them.






[jira] [Assigned] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-28 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reassigned HBASE-28523:
---

Assignee: Istvan Toth

> Use a single get call in REST multiget endpoint
> ---
>
> Key: HBASE-28523
> URL: https://issues.apache.org/jira/browse/HBASE-28523
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: beginner
>
> The REST multiget endpoint currently issues a separate HBase GET operation 
> for each key.
> Use the method that accepts a list of keys instead.
> That should be faster.





[jira] [Created] (HBASE-28553) SSLContext not used for Kerberos auth negotiation in rest client

2024-04-25 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28553:
---

 Summary: SSLContext not used for Kerberos auth negotiation in rest 
client
 Key: HBASE-28553
 URL: https://issues.apache.org/jira/browse/HBASE-28553
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The included REST client now supports specifying a Trust store for SSL 
connections.
However, the configured SSL library is not used when the Kerberos negotiation is 
performed by the Hadoop library, which uses its own client.

We need to set up the Hadoop auth process to use the same SSLContext.





[jira] [Commented] (HBASE-28518) Allow specifying a filter for the REST multiget endpoint

2024-04-24 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840647#comment-17840647
 ] 

Istvan Toth commented on HBASE-28518:
-

Thank you, [~zhangduo]. Please take a look at 
https://github.com/apache/hbase/pull/5852 .

> Allow specifying a filter for the REST multiget endpoint
> 
>
> Key: HBASE-28518
> URL: https://issues.apache.org/jira/browse/HBASE-28518
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The native HBase API allows specifying Filters for get operations.
> The REST interface does not currently expose this functionality.
> Add a parameter to the multiget endpoint to allow specifying filters.





[jira] [Updated] (HBASE-28501) Support non-SPNEGO authentication methods in REST java client library

2024-04-24 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28501:

Summary: Support non-SPNEGO authentication methods in REST java client 
library  (was: Support non-SPNEGO authentication methods and path prefix in 
REST java client library)

> Support non-SPNEGO authentication methods in REST java client library
> -
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> The current java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards, it 
> seems that most of the kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting user set it up) , and 
> letting it handle authentication by itself would be a better and more generic 
> solution.
> Also add support for specifying a prefix for the URL path.
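
For reference, BASIC authentication only requires an Authorization header built from the username and password, as defined in RFC 7617. The sketch below uses only the Java standard library; BasicAuthSketch and basicAuthHeader are hypothetical names and do not reflect how the HBase REST client implements or exposes this.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of the HBase REST client.
final class BasicAuthSketch {

    // Builds the value of the HTTP Authorization header for BASIC auth:
    // "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Credentials taken from the example in RFC 7617.
        System.out.println(basicAuthHeader("Aladdin", "open sesame"));
        // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
    }
}
```

Since the credentials are only base64-encoded, not encrypted, this is normally used over TLS.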





[jira] [Updated] (HBASE-28501) Support non-SPNEGO authentication methods in REST java client library

2024-04-24 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28501:

Description: 
The current java client only supports the SPNEGO authentication method.

This does not support the case when an application proxy like Apache Knox 
performs AAA conversion from BASIC/DIGEST to kerberos authentication.

Add support for BASIC username/password auth to the client.

Generally, the authentication code in the client looks quite backwards, it 
seems that most of the kerberos / auth cookie code duplicates HttpClient 
functionality. AFAICT setting HttpClient up (or letting user set it up) , and 
letting it handle authentication by itself would be a better and more generic 
solution.

-Also add support for specifying a prefix for the URL path.-

  was:
The current java client only supports the SPNEGO authentication method.

This does not support the case when an application proxy like Apache Knox 
performs AAA conversion from BASIC/DIGEST to kerberos authentication.

Add support for BASIC username/password auth to the client.

Generally, the authentication code in the client looks quite backwards, it 
seems that most of the kerberos / auth cookie code duplicates HttpClient 
functionality. AFAICT setting HttpClient up (or letting user set it up) , and 
letting it handle authentication by itself would be a better and more generic 
solution.

Also add support for specifying a prefix for the URL path.


> Support non-SPNEGO authentication methods in REST java client library
> -
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> The current java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards, it 
> seems that most of the kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting user set it up) , and 
> letting it handle authentication by itself would be a better and more generic 
> solution.
> -Also add support for specifying a prefix for the URL path.-





[jira] [Updated] (HBASE-28501) Support non-SPNEGO authentication methods and path prefix in REST java client library

2024-04-24 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28501:

Summary: Support non-SPNEGO authentication methods and path prefix in REST 
java client library  (was: Support non-SPNEGO authentication methods in REST 
java client library)

> Support non-SPNEGO authentication methods and path prefix in REST java client 
> library
> -
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> The current java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards, it 
> seems that most of the kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting user set it up) , and 
> letting it handle authentication by itself would be a better and more generic 
> solution.





[jira] [Updated] (HBASE-28501) Support non-SPNEGO authentication methods and path prefix in REST java client library

2024-04-24 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28501:

Description: 
The current java client only supports the SPNEGO authentication method.

This does not support the case when an application proxy like Apache Knox 
performs AAA conversion from BASIC/DIGEST to kerberos authentication.

Add support for BASIC username/password auth to the client.

Generally, the authentication code in the client looks quite backwards, it 
seems that most of the kerberos / auth cookie code duplicates HttpClient 
functionality. AFAICT setting HttpClient up (or letting user set it up) , and 
letting it handle authentication by itself would be a better and more generic 
solution.

Also add support for specifying a prefix for the URL path.

  was:
The current java client only supports the SPNEGO authentication method.

This does not support the case when an application proxy like Apache Knox 
performs AAA conversion from BASIC/DIGEST to kerberos authentication.

Add support for BASIC username/password auth to the client.

Generally, the authentication code in the client looks quite backwards, it 
seems that most of the kerberos / auth cookie code duplicates HttpClient 
functionality. AFAICT setting HttpClient up (or letting user set it up) , and 
letting it handle authentication by itself would be a better and more generic 
solution.



> Support non-SPNEGO authentication methods and path prefix in REST java client 
> library
> -
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>
> The current java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to kerberos authentication.
> Add support for BASIC username/password auth to the client.
> Generally, the authentication code in the client looks quite backwards, it 
> seems that most of the kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT setting HttpClient up (or letting user set it up) , and 
> letting it handle authentication by itself would be a better and more generic 
> solution.
> Also add support for specifying a prefix for the URL path.





[jira] [Created] (HBASE-28550) Provide working benchmark tool for REST server

2024-04-24 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28550:
---

 Summary: Provide working benchmark tool for REST server
 Key: HBASE-28550
 URL: https://issues.apache.org/jira/browse/HBASE-28550
 Project: HBase
  Issue Type: Umbrella
  Components: REST
Reporter: Istvan Toth


This is an umbrella ticket for the individual changes.

The goal is to be able to test the performance of the REST server, either 
directly or via Knox or other proxies / load balancers, and to compare it with 
the results obtained via the native client.





[jira] [Commented] (HBASE-28540) Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner

2024-04-23 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840107#comment-17840107
 ] 

Istvan Toth commented on HBASE-28540:
-

For 10K Result rows, this improved the execution time from 40 seconds to 2.5.

> Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
> -
>
> Key: HBASE-28540
> URL: https://issues.apache.org/jira/browse/HBASE-28540
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
>
> The implementation of org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
> is very inefficient, as the standard next() method makes a separate HTTP 
> request for each row.
> Performance can be improved by not specifying the row count in the REST call 
> and caching the returned Results.
> Chunk size can still be influenced by scan.setBatch();
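
The caching approach described above can be sketched as follows. CachingScannerSketch and its fetchChunk supplier are hypothetical; the supplier stands in for one REST call that returns a chunk of rows, with an empty list signalling the end of the scan. This is not the actual RemoteHTable.Scanner code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a scanner that caches a chunk of results per request.
final class CachingScannerSketch<T> {
    private final Supplier<List<T>> fetchChunk; // models one HTTP request per call
    private final Deque<T> cache = new ArrayDeque<>();
    private boolean exhausted;
    int requests; // number of simulated requests issued so far

    CachingScannerSketch(Supplier<List<T>> fetchChunk) {
        this.fetchChunk = fetchChunk;
    }

    // Returns the next result, or null when the scan is finished.
    T next() {
        if (cache.isEmpty() && !exhausted) {
            List<T> chunk = fetchChunk.get();
            requests++;
            if (chunk.isEmpty()) {
                exhausted = true;
            } else {
                cache.addAll(chunk);
            }
        }
        return cache.poll();
    }

    public static void main(String[] args) {
        Iterator<List<String>> chunks = Arrays.asList(
            Arrays.asList("r1", "r2"),
            Arrays.asList("r3"),
            Collections.<String>emptyList()).iterator();
        CachingScannerSketch<String> scanner = new CachingScannerSketch<>(chunks::next);
        int rows = 0;
        while (scanner.next() != null) {
            rows++;
        }
        System.out.println(rows + " rows in " + scanner.requests + " requests");
    }
}
```

The number of requests then scales with the number of chunks rather than the number of rows.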





[jira] [Commented] (HBASE-28544) org.apache.hadoop.hbase.rest.PerformanceEvaluation does not evaluate REST performance

2024-04-23 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839955#comment-17839955
 ] 

Istvan Toth commented on HBASE-28544:
-

The write tests using BufferedMutator cannot be rewritten, as that feature is 
not even implemented in the REST server.

> org.apache.hadoop.hbase.rest.PerformanceEvaluation does not evaluate REST 
> performance
> -
>
> Key: HBASE-28544
> URL: https://issues.apache.org/jira/browse/HBASE-28544
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> org.apache.hadoop.hbase.rest.PerformanceEvaluation only uses the REST 
> interface for Admin tasks like creating tables.
> All data access is done via the native RPC client, which makes the whole tool 
> a big red herring.





[jira] [Assigned] (HBASE-28544) org.apache.hadoop.hbase.rest.PerformanceEvaluation does not evaluate REST performance

2024-04-22 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reassigned HBASE-28544:
---

Assignee: Istvan Toth

> org.apache.hadoop.hbase.rest.PerformanceEvaluation does not evaluate REST 
> performance
> -
>
> Key: HBASE-28544
> URL: https://issues.apache.org/jira/browse/HBASE-28544
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> org.apache.hadoop.hbase.rest.PerformanceEvaluation only uses the REST 
> interface for Admin tasks like creating tables.
> All data access is done via the native RPC client, which makes the whole tool 
> a big red herring.





[jira] [Created] (HBASE-28544) org.apache.hadoop.hbase.rest.PerformanceEvaluation does not evaluate REST performance

2024-04-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28544:
---

 Summary: org.apache.hadoop.hbase.rest.PerformanceEvaluation does 
not evaluate REST performance
 Key: HBASE-28544
 URL: https://issues.apache.org/jira/browse/HBASE-28544
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


org.apache.hadoop.hbase.rest.PerformanceEvaluation only uses the REST interface 
for Admin tasks like creating tables.

All data access is done via the native RPC client, which makes the whole tool a 
big red herring.





[jira] [Updated] (HBASE-28543) Multiple issues preventing starting org.apache.hadoop.hbase.rest.PerformanceEvaluation

2024-04-22 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28543:

Summary: Multiple issues preventing starting 
org.apache.hadoop.hbase.rest.PerformanceEvaluation  (was: 
org.apache.hadoop.hbase.rest.PerformanceEvaluation does not read hbase-site.xml)

> Multiple issues preventing starting 
> org.apache.hadoop.hbase.rest.PerformanceEvaluation
> --
>
> Key: HBASE-28543
> URL: https://issues.apache.org/jira/browse/HBASE-28543
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> I am trying to run org.apache.hadoop.hbase.rest.PerformanceEvaluation.
> It cannot connect to the ZK quorum specified in hbase-site.xml.
> It implements the Configurable interface incorrectly.
> Fixing the Configurable implementation results in connecting to ZK properly.





[jira] [Created] (HBASE-28543) org.apache.hadoop.hbase.rest.PerformanceEvaluation does not read hbase-site.xml

2024-04-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28543:
---

 Summary: org.apache.hadoop.hbase.rest.PerformanceEvaluation does 
not read hbase-site.xml
 Key: HBASE-28543
 URL: https://issues.apache.org/jira/browse/HBASE-28543
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


I am trying to run org.apache.hadoop.hbase.rest.PerformanceEvaluation.
It cannot connect to the ZK quorum specified in hbase-site.xml.

It implements the Configurable interface incorrectly.
Fixing the Configurable implementation results in connecting to ZK properly.





[jira] [Updated] (HBASE-28543) Multiple issues preventing starting org.apache.hadoop.hbase.rest.PerformanceEvaluation

2024-04-22 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28543:

Description: 
I am trying to run org.apache.hadoop.hbase.rest.PerformanceEvaluation.
It cannot connect to the ZK quorum specified in hbase-site.xml.

It implements the Configurable interface incorrectly.
Fixing the Configurable implementation results in connecting to ZK properly.

--host option does not work because it conflicts with --h for help

  was:
I am trying to run org.apache.hadoop.hbase.rest.PerformanceEvaluation.
It cannot connect to the ZK quorum specified in hbase-site.xml.

It implements the Configurable interface incorrectly.
Fixing the Configurable implementation results in connecting to ZK properly.


> Multiple issues preventing starting 
> org.apache.hadoop.hbase.rest.PerformanceEvaluation
> --
>
> Key: HBASE-28543
> URL: https://issues.apache.org/jira/browse/HBASE-28543
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> I am trying to run org.apache.hadoop.hbase.rest.PerformanceEvaluation.
> It cannot connect to the ZK quorum specified in hbase-site.xml.
> It implements the Configurable interface incorrectly.
> Fixing the Configurable implementation results in connecting to ZK properly.
> --host option does not work because it conflicts with --h for help
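
The `--host` / `--h` conflict mentioned above can be illustrated with a small sketch. It assumes the tool matches the help flag by prefix, which is an assumption about the parser and not something stated in the issue. The class and method names are hypothetical.

```java
// Hypothetical argument classifier, for illustration only.
final class OptionPrefixSketch {
  // Problematic order: the help check is a prefix match, so it also swallows "--host=...".
  static String classifyBuggy(String arg) {
    if (arg.equals("-h") || arg.startsWith("--h")) {
      return "help";
    }
    if (arg.startsWith("--host=")) {
      return "host:" + arg.substring("--host=".length());
    }
    return "unknown";
  }

  // One possible fix: match the help flag exactly so "--host=" can be reached.
  static String classifyFixed(String arg) {
    if (arg.equals("-h") || arg.equals("--help")) {
      return "help";
    }
    if (arg.startsWith("--host=")) {
      return "host:" + arg.substring("--host=".length());
    }
    return "unknown";
  }
}
```

Checking the longer option first would work as well; the exact-match variant is shown only because it is the smaller change.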





[jira] [Created] (HBASE-28540) Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner

2024-04-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28540:
---

 Summary: Cache Results in 
org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
 Key: HBASE-28540
 URL: https://issues.apache.org/jira/browse/HBASE-28540
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The implementation of org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
is very inefficient, as the standard next() method makes a separate HTTP 
request for each row.

Performance can be improved by not specifying the row count in the REST call 
and caching the returned Results.

Chunk size can still be influenced by scan.setBatch();






[jira] [Resolved] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-17 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28500.
-
Resolution: Fixed

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.
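
A minimal sketch of why scans need a sticky server, assuming a hypothetical `StickyNodeSketch` picker. This is not the REST client's actual Cluster or Client code, and it is not necessarily how the committed fix works.

```java
import java.util.List;
import java.util.Random;

// Hypothetical node picker; names and structure are assumptions for illustration.
final class StickyNodeSketch {
  private final List<String> nodes;
  private final Random random;
  private String stickyNode; // null until a stateful operation pins a node

  StickyNodeSketch(List<String> nodes, long seed) {
    this.nodes = List.copyOf(nodes);
    this.random = new Random(seed);
  }

  // Stateless requests may go to any node.
  String pickForStatelessRequest() {
    return nodes.get(random.nextInt(nodes.size()));
  }

  // Scanner requests must keep hitting the node that holds the scanner state.
  String pickForScan() {
    if (stickyNode == null) {
      stickyNode = pickForStatelessRequest();
    }
    return stickyNode;
  }
}
```

With purely random selection, a follow-up scanner request could land on a server that has never seen the scanner. Pinning the node for the lifetime of the scan avoids that.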





[jira] [Commented] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-17 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838175#comment-17838175
 ] 

Istvan Toth commented on HBASE-28500:
-

Pushed the addendum to all active branches.

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.





[jira] [Reopened] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reopened HBASE-28500:
-

The spotbugs warning makes the daily builds go red.
Gonna push an addendum for it.

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.





[jira] [Created] (HBASE-28526) hbase-rest jar does not work with hbase-shaded-client with protobuf encoding

2024-04-16 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28526:
---

 Summary: hbase-rest jar does not work with hbase-shaded-client 
with protobuf encoding
 Key: HBASE-28526
 URL: https://issues.apache.org/jira/browse/HBASE-28526
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


When trying to decode a protobuf-encoded CellSet, I get 
{noformat}
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(Lcom/google/protobuf/Message$Builder;[B)V
at 
org.apache.hadoop.hbase.rest.model.CellSetModel.getObjectFromMessage(CellSetModel.java:129)
at RestClientExample.getMulti(RestClientExample.java:191)
at RestClientExample.start(RestClientExample.java:138)
at RestClientExample.main(RestClientExample.java:124)

{noformat}

Seems to be caused by relocating protobuf 2.5 in hbase-shaded-client.

It works fine with the unrelocated client, i.e. when using the 
{noformat}
export CLASSPATH=`hbase --internal-classpath classpath`:
{noformat}
command to set up the classpath for the client.






[jira] [Updated] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28500:

Fix Version/s: 2.6.0
   2.4.18
   4.0.0-alpha-1
   2.7.0
   3.0.0-beta-2
   2.5.9
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to all active branches.
Thanks for the review [~zhangduo] and [~psomogyi].

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.





[jira] [Assigned] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reassigned HBASE-28500:
---

Assignee: Istvan Toth

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.





[jira] [Created] (HBASE-28525) Document all REST endpoints

2024-04-16 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28525:
---

 Summary: Document all REST endpoints
 Key: HBASE-28525
 URL: https://issues.apache.org/jira/browse/HBASE-28525
 Project: HBase
  Issue Type: Improvement
  Components: documentation, REST
Reporter: Istvan Toth


The new features added in HBASE-28518 do not have documentation.
While reviewing, I also found other undocumented interfaces, like TableScan, 
and options like globbed gets.





[jira] [Resolved] (HBASE-28518) Allow specifying a filter for the REST multiget endpoint

2024-04-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28518.
-
Fix Version/s: 2.6.0
   2.4.18
   4.0.0-alpha-1
   2.7.0
   3.0.0-beta-2
   2.5.9
   Resolution: Fixed

Committed to all active branches.
Thanks for the review [~ankit].

> Allow specifying a filter for the REST multiget endpoint
> 
>
> Key: HBASE-28518
> URL: https://issues.apache.org/jira/browse/HBASE-28518
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The native HBase API allows specifying Filters for get operations.
> The REST interface does not currently expose this functionality.
> Add a parameter to the multiget endpoint to allow specifying filters.





[jira] [Updated] (HBASE-28504) Implement eviction logic for scanners in Rest APIs to prevent scanner leakage

2024-04-15 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28504:

Fix Version/s: 2.6.0
   2.4.18
   2.7.0
   3.0.0-beta-2
   2.5.9
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to all active branches.
Thanks for the review [~psomogyi].

> Implement eviction logic for scanners in Rest APIs to prevent scanner leakage
> -
>
> Key: HBASE-28504
> URL: https://issues.apache.org/jira/browse/HBASE-28504
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The REST API maintains a map of _ScannerInstanceResource_s (which are 
> ultimately tracking Scanner objects).
> The user is supposed to delete these after using them, but if for any reason 
> it does not, then these objects are maintained indefinitely.
> Implement logic to evict old scanners automatically.
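
One way to picture the eviction logic described above is an idle-time check over a registry. This is a sketch under stated assumptions: `ScannerRegistrySketch`, `touch`, and `evictIdle` are hypothetical names, and the real REST server tracks ScannerInstanceResource objects rather than timestamps keyed by string.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry; the real server-side data structure is not shown in the issue.
final class ScannerRegistrySketch {
  private final Map<String, Long> lastAccessMillis = new ConcurrentHashMap<>();
  private final long maxIdleMillis;

  ScannerRegistrySketch(long maxIdleMillis) {
    this.maxIdleMillis = maxIdleMillis;
  }

  // Records that a scanner was created or used at time nowMillis.
  void touch(String scannerId, long nowMillis) {
    lastAccessMillis.put(scannerId, nowMillis);
  }

  // Removes every scanner idle for longer than maxIdleMillis; returns how many were evicted.
  int evictIdle(long nowMillis) {
    int evicted = 0;
    Iterator<Map.Entry<String, Long>> it = lastAccessMillis.entrySet().iterator();
    while (it.hasNext()) {
      if (nowMillis - it.next().getValue() > maxIdleMillis) {
        it.remove(); // a real implementation would also close the underlying scanner here
        evicted++;
      }
    }
    return evicted;
  }

  boolean contains(String scannerId) {
    return lastAccessMillis.containsKey(scannerId);
  }
}
```

Calling `evictIdle` periodically, or relying on a cache library with access-based expiry, would keep abandoned scanners from accumulating. Which approach the patch takes is not stated in this message.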





[jira] [Updated] (HBASE-28504) Implement eviction logic for scanners in Rest APIs to prevent scanner leakage

2024-04-15 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28504:

Fix Version/s: 4.0.0-alpha-1

> Implement eviction logic for scanners in Rest APIs to prevent scanner leakage
> -
>
> Key: HBASE-28504
> URL: https://issues.apache.org/jira/browse/HBASE-28504
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>
> The REST API maintains a map of _ScannerInstanceResource_s (which are 
> ultimately tracking Scanner objects).
> The user is supposed to delete these after using them, but if for any reason 
> it does not, then these objects are maintained indefinitely.
> Implement logic to evict old scanners automatically.





[jira] [Resolved] (HBASE-28524) Backport HBASE-28174 to branch-2.4 and branch-2.5

2024-04-15 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28524.
-
Fix Version/s: 2.4.18
   2.5.9
 Release Note: Done.
   Resolution: Fixed

> Backport HBASE-28174 to branch-2.4 and branch-2.5
> -
>
> Key: HBASE-28524
> URL: https://issues.apache.org/jira/browse/HBASE-28524
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Fix For: 2.4.18, 2.5.9
>
>
> The changes are backwards compatible and the REST interface is super limited 
> without them.





[jira] [Commented] (HBASE-28174) DELETE endpoint in REST API does not support deleting binary row keys/columns

2024-04-15 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837304#comment-17837304
 ] 

Istvan Toth commented on HBASE-28174:
-

Backported to branch-2.4 and branch-2.5.
The backport applied cleanly and the hbase-tests completed without error.

> DELETE endpoint in REST API does not support deleting binary row keys/columns
> -
>
> Key: HBASE-28174
> URL: https://issues.apache.org/jira/browse/HBASE-28174
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: James Udiljak
>Assignee: James Udiljak
>Priority: Blocker
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.9
>
> Attachments: delete_base64_1.png
>
>
> h2. Notes
> This is the first time I have raised an issue in the ASF Jira. Please let me 
> know if there's anything I need to adjust on the issue to fit in with your 
> development flow.
> I have marked the priority as "blocker" because this issue blocks me as a 
> user of the HBase REST API from deploying an effective solution for our 
> setup. Please feel free to change this if the Priority field has another 
> meaning to you.
> I have also chosen 2.4.17 as the affected version because this is the version 
> I am running, however looking at the source code on GitHub in the default 
> branch, I think many other versions would be affected.
> h2. Description of Issue
> The DELETE operation in the [HBase REST 
> API|https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#operation_delete]
>  requires specifying row keys and column families/offsets in the URI (i.e. as 
> UTF-8 text). This makes it impossible to specify a delete operation via the 
> REST API for a binary row key or column family/offset, as single bytes with a 
> decimal value greater than 127 are not valid in UTF-8.
> Percent-encoding these "high" values does not work around the issue, as the 
> HBase REST API uses Java's {{URLDecoder.decode(percentEncodedString, 
> "UTF-8")}} function, which replaces any percent-encoded byte in the range 
> {{%80}} to {{%FF}} with the [replacement 
> character|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character].
>  Even if this were not the case, the row-key is ultimately [converted to a 
> byte 
> array|https://github.com/apache/hbase/blob/rel/2.4.17/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/RowSpec.java#L60-L100]
>  using UTF-8 encoding, wherein code points >127 are encoded across multiple 
> bytes, corrupting the user-supplied row key.
> h2. Proposed Solution
> I do not believe it is possible to allow encoding of arbitrary bytes in the 
> URL for the DELETE endpoint without breaking compatibility for any users who 
> may have been unknowingly UTF-8 encoding their binary row keys. Even if it 
> were possible, the syntax would likely be terse.
> Instead, I propose a new version of the DELETE endpoint that would accept row 
> keys and column families/offsets in the request _body_ (using Base64 encoding 
> for the JSON and XML formats, and bare binary for protobuf). This new 
> endpoint would follow the same conventions as the PUT operations, except that 
> cell values would not need to be specified (unless the user is performing a 
> check-and-delete operation).
> As an additional benefit, using the request body could potentially allow for 
> deleting multiple rows in a single request, which would drastically improve 
> the efficiency of my use case.
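
The corruption described above can be reproduced with the JDK alone. The sketch below is an illustration, not HBase code: `BinaryRowKeySketch` and its methods are hypothetical names. It decodes a percent-encoded byte as UTF-8 text and converts it back to bytes, then contrasts that with decoding the same byte from Base64.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only; the REST server's own decoding path lives in RowSpec.
final class BinaryRowKeySketch {
  // Decodes a percent-encoded path segment as UTF-8 text, then turns the text back into bytes.
  static byte[] viaUrl(String percentEncoded) {
    try {
      String text = URLDecoder.decode(percentEncoded, "UTF-8");
      return text.getBytes(StandardCharsets.UTF_8);
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always available", e);
    }
  }

  // Decodes a Base64 string straight to bytes, with no text step in between.
  static byte[] viaBase64(String base64) {
    return Base64.getDecoder().decode(base64);
  }
}
```

For the single byte 0xFF, `viaUrl("%FF")` yields the three-byte UTF-8 encoding of the replacement character, while `viaBase64("/w==")` yields the original byte. That difference is why carrying binary keys in the request body is attractive.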





[jira] [Updated] (HBASE-28174) DELETE endpoint in REST API does not support deleting binary row keys/columns

2024-04-15 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28174:

Fix Version/s: 2.4.18

> DELETE endpoint in REST API does not support deleting binary row keys/columns
> -
>
> Key: HBASE-28174
> URL: https://issues.apache.org/jira/browse/HBASE-28174
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: James Udiljak
>Assignee: James Udiljak
>Priority: Blocker
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.9
>
> Attachments: delete_base64_1.png
>
>
> h2. Notes
> This is the first time I have raised an issue in the ASF Jira. Please let me 
> know if there's anything I need to adjust on the issue to fit in with your 
> development flow.
> I have marked the priority as "blocker" because this issue blocks me as a 
> user of the HBase REST API from deploying an effective solution for our 
> setup. Please feel free to change this if the Priority field has another 
> meaning to you.
> I have also chosen 2.4.17 as the affected version because this is the version 
> I am running, however looking at the source code on GitHub in the default 
> branch, I think many other versions would be affected.
> h2. Description of Issue
> The DELETE operation in the [HBase REST 
> API|https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#operation_delete]
>  requires specifying row keys and column families/offsets in the URI (i.e. as 
> UTF-8 text). This makes it impossible to specify a delete operation via the 
> REST API for a binary row key or column family/offset, as single bytes with a 
> decimal value greater than 127 are not valid in UTF-8.
> Percent-encoding these "high" values does not work around the issue, as the 
> HBase REST API uses Java's {{URLDecoder.decode(percentEncodedString, 
> "UTF-8")}} function, which replaces any percent-encoded byte in the range 
> {{%80}} to {{%FF}} with the [replacement 
> character|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character].
>  Even if this were not the case, the row-key is ultimately [converted to a 
> byte 
> array|https://github.com/apache/hbase/blob/rel/2.4.17/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/RowSpec.java#L60-L100]
>  using UTF-8 encoding, wherein code points >127 are encoded across multiple 
> bytes, corrupting the user-supplied row key.
> h2. Proposed Solution
> I do not believe it is possible to allow encoding of arbitrary bytes in the 
> URL for the DELETE endpoint without breaking compatibility for any users who 
> may have been unknowingly UTF-8 encoding their binary row keys. Even if it 
> were possible, the syntax would likely be terse.
> Instead, I propose a new version of the DELETE endpoint that would accept row 
> keys and column families/offsets in the request _body_ (using Base64 encoding 
> for the JSON and XML formats, and bare binary for protobuf). This new 
> endpoint would follow the same conventions as the PUT operations, except that 
> cell values would not need to be specified (unless the user is performing a 
> check-and-delete operation).
> As an additional benefit, using the request body could potentially allow for 
> deleting multiple rows in a single request, which would drastically improve 
> the efficiency of my use case.





[jira] [Updated] (HBASE-28174) DELETE endpoint in REST API does not support deleting binary row keys/columns

2024-04-15 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28174:

Fix Version/s: 2.5.9

> DELETE endpoint in REST API does not support deleting binary row keys/columns
> -
>
> Key: HBASE-28174
> URL: https://issues.apache.org/jira/browse/HBASE-28174
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: James Udiljak
>Assignee: James Udiljak
>Priority: Blocker
> Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.9
>
> Attachments: delete_base64_1.png
>
>
> h2. Notes
> This is the first time I have raised an issue in the ASF Jira. Please let me 
> know if there's anything I need to adjust on the issue to fit in with your 
> development flow.
> I have marked the priority as "blocker" because this issue blocks me as a 
> user of the HBase REST API from deploying an effective solution for our 
> setup. Please feel free to change this if the Priority field has another 
> meaning to you.
> I have also chosen 2.4.17 as the affected version because this is the version 
> I am running, however looking at the source code on GitHub in the default 
> branch, I think many other versions would be affected.
> h2. Description of Issue
> The DELETE operation in the [HBase REST 
> API|https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#operation_delete]
>  requires specifying row keys and column families/offsets in the URI (i.e. as 
> UTF-8 text). This makes it impossible to specify a delete operation via the 
> REST API for a binary row key or column family/offset, as single bytes with a 
> decimal value greater than 127 are not valid in UTF-8.
> Percent-encoding these "high" values does not work around the issue, as the 
> HBase REST API uses Java's {{URLDecoder.decode(percentEncodedString, 
> "UTF-8")}} function, which replaces any percent-encoded byte in the range 
> {{%80}} to {{%FF}} with the [replacement 
> character|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character].
>  Even if this were not the case, the row-key is ultimately [converted to a 
> byte 
> array|https://github.com/apache/hbase/blob/rel/2.4.17/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/RowSpec.java#L60-L100]
>  using UTF-8 encoding, wherein code points >127 are encoded across multiple 
> bytes, corrupting the user-supplied row key.
> h2. Proposed Solution
> I do not believe it is possible to allow encoding of arbitrary bytes in the 
> URL for the DELETE endpoint without breaking compatibility for any users who 
> may have been unknowingly UTF-8 encoding their binary row keys. Even if it 
> were possible, the syntax would likely be terse.
> Instead, I propose a new version of the DELETE endpoint that would accept row 
> keys and column families/offsets in the request _body_ (using Base64 encoding 
> for the JSON and XML formats, and bare binary for protobuf). This new 
> endpoint would follow the same conventions as the PUT operations, except that 
> cell values would not need to be specified (unless the user is performing a 
> check-and-delete operation).
> As an additional benefit, using the request body could potentially allow for 
> deleting multiple rows in a single request, which would drastically improve 
> the efficiency of my use case.





[jira] [Created] (HBASE-28524) Backport HBASE-28174 to branch-2.4 and branch-2.5

2024-04-15 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28524:
---

 Summary: Backport HBASE-28174 to branch-2.4 and branch-2.5
 Key: HBASE-28524
 URL: https://issues.apache.org/jira/browse/HBASE-28524
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.5.8, 2.4.17
Reporter: Istvan Toth
Assignee: Istvan Toth


The changes are backwards compatible and the REST interface is super limited 
without them.





[jira] [Created] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-14 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28523:
---

 Summary: Use a single get call in REST multiget endpoint
 Key: HBASE-28523
 URL: https://issues.apache.org/jira/browse/HBASE-28523
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The REST multiget endpoint currently issues a separate HBase GET operation for 
each key.

Use the method that accepts a list of keys instead.
That should be faster.
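
The difference between per-key and batched lookups can be sketched with a stand-in store that counts calls. Everything here is hypothetical (`MultiGetSketch`, `Store`, `CountingStore`); in the HBase client API the list-accepting overload is `Table.get(List<Get>)`, which this toy interface only imitates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model: each method call on Store is treated as one round trip.
final class MultiGetSketch {
  interface Store {
    String get(String key);
    List<String> getAll(List<String> keys);
  }

  // In-memory store that counts how many calls it receives.
  static final class CountingStore implements Store {
    private final Map<String, String> data;
    int calls = 0;

    CountingStore(Map<String, String> data) {
      this.data = data;
    }

    public String get(String key) {
      calls++;
      return data.get(key);
    }

    public List<String> getAll(List<String> keys) {
      calls++;
      List<String> out = new ArrayList<>();
      for (String key : keys) {
        out.add(data.get(key));
      }
      return out;
    }
  }

  // Behaviour described in the issue: one call per key.
  static List<String> perKey(Store store, List<String> keys) {
    List<String> out = new ArrayList<>();
    for (String key : keys) {
      out.add(store.get(key));
    }
    return out;
  }

  // Proposed behaviour: one call for the whole list.
  static List<String> batched(Store store, List<String> keys) {
    return store.getAll(keys);
  }
}
```

Both paths return the same values; only the number of calls differs, which is where the expected speedup would come from.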





[jira] [Updated] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-14 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28523:

Labels: beginner  (was: )

> Use a single get call in REST multiget endpoint
> ---
>
> Key: HBASE-28523
> URL: https://issues.apache.org/jira/browse/HBASE-28523
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>  Labels: beginner
>
> The REST multiget endpoint currently issues a separate HBase GET operation 
> for each key.
> Use the method that accepts a list of keys instead.
> That should be faster.





[jira] [Comment Edited] (HBASE-25108) checkAndPut (or checkAndMutate) might return false when the row is mutated successfully

2024-04-14 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837046#comment-17837046
 ] 

Istvan Toth edited comment on HBASE-25108 at 4/15/24 4:00 AM:
--

Maybe we could add an option to auto-disable retries for non-idempotent 
operations? 

Only increment and checkAndMutate variants come to my mind, though anything 
implemented in coprocessors could also be non-idempotent.


was (Author: stoty):
Maybe we could add an option to auto-disable retries for non-idempotent 
operations? 

Only increment and checkAndPut come to my mind, though anything implemented in 
coprocessors could also be non-idempotent.

> checkAndPut (or checkAndMutate) might return false when the row is mutated 
> successfully
> ---
>
> Key: HBASE-25108
> URL: https://issues.apache.org/jira/browse/HBASE-25108
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 1.2.11
>Reporter: Murilo Giacometti Rocha
>Priority: Major
>
> In the client, when the MutateRequest times out, we retry the operation in 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries. If the 
> server received the request but the client failed to get a response, the 
> server returns processed=false  because the value is already there. So the 
> value false is returned, even though the checkAndPut was successful in the 
> first attempt. It should return true if processed=false because it's already 
> been processed AND it is a retry operation.
>  
> Example RpcRetryingCallerImpl inside checkAndPut:
>                request
> client  ------------------> server
>
> client                      server (processing)
>
> client                      server (processed)
> (timed out)
>
>             retry request
> client  ------------------> server
>
> client                      server (processing)
>
> client                      server (already processed)
>
>      response (processed = false)
> client  <------------------ server
>  
> checkAndPut returns false, even though it's successful.
>  
> In 2.1.0, I could only reproduce it three times by accident, but it always 
> happened in 1.2.11. In 2.1.0, I could only reproduce it systematically by 
> cleaning the response with the debugger before it got to the hconnection 
> thread.
> Repro steps
>  * Create a breakpoint in the exception in 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries to make 
> sure we get the exception and retry.
>  * Create an infinite loop to create different rows with checkAndPut.
>  * Start running with the disabled breakpoints.
>  * Enable the breakpoint.
>  * Pause all threads and verify that we are waiting for a response in the IPC 
> thread. Wait for 1-2 minutes. This will cause a timeout.
>  * Continue and verify that an exception is triggered.
>  * Add a breakpoint to verify the response.
>  * Continue and check the response and the returned value.
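
The sequence in the description can be reproduced with a toy model. This is only a sketch: ToyServer and callWithRetries below are hypothetical stand-ins, not the HBase client or RpcRetryingCallerImpl, and the "lost response" is simulated by simply discarding the result of the first attempt.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the lost-response scenario; none of these names are HBase APIs. */
class CheckAndPutRetrySketch {

    /** Hypothetical in-memory "server" holding one value per row. */
    static final class ToyServer {
        private final Map<String, String> rows = new HashMap<>();

        /** Writes the value only if the row is absent; returns true if the write was applied. */
        boolean checkAndPutIfAbsent(String row, String value) {
            return rows.putIfAbsent(row, value) == null;
        }
    }

    /**
     * Sends the request up to maxAttempts times. The responses to the first
     * lostResponses attempts are dropped after the server has processed the
     * request, which models the client-side timeout.
     */
    static boolean callWithRetries(ToyServer server, String row, String value,
                                   int lostResponses, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean processed = server.checkAndPutIfAbsent(row, value);
            if (attempt <= lostResponses) {
                continue; // response lost: the client times out and retries
            }
            return processed;
        }
        throw new IllegalStateException("retries exhausted");
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        // The first attempt mutates the row but its response is lost; the retry
        // finds the value already present and reports false to the caller.
        boolean result = callWithRetries(server, "row1", "v", 1, 3);
        System.out.println("client sees: " + result);
        System.out.println("row written: " + server.rows.containsKey("row1"));
    }
}
```

With one lost response the caller gets false although the row was written, which is the behaviour reported here. With no lost responses the same call returns true.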





[jira] [Commented] (HBASE-25108) checkAndPut (or checkAndMutate) might return false when the row is mutated successfully

2024-04-14 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837046#comment-17837046
 ] 

Istvan Toth commented on HBASE-25108:
-

Maybe we could add an option to auto-disable retries for non-idempotent 
operations?

Only increment and checkAndPut come to mind, though anything implemented in 
coprocessors could also be non-idempotent.

> checkAndPut (or checkAndMutate) might return false when the row is mutated 
> successfully
> ---
>
> Key: HBASE-25108
> URL: https://issues.apache.org/jira/browse/HBASE-25108
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 1.2.11
>Reporter: Murilo Giacometti Rocha
>Priority: Major
>
> In the client, when the MutateRequest times out, we retry the operation in 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries. If the 
> server received the request but the client failed to get a response, the 
> server returns processed=false  because the value is already there. So the 
> value false is returned, even though the checkAndPut was successful in the 
> first attempt. It should return true if processed=false because it's already 
> been processed AND it is a retry operation.
>  
> Example RpcRetryingCallerImpl inside checkAndPut:
>                request
> client  -----o--> server
>   
> client  > server (processing)
>  
> client  > server (processed)
> (timed out)
>  
>               retry request
> client  -----o--> server
>   
> client  > server (processing)
>  
> client   --> server (already processed)
>  
>  response (processed = false)
> client  <---o----- server
>  
> checkAndPut returns false, even though it's successful.
>  
> In 2.1.0, I could only reproduce it three times by accident, but it always 
> happened in 1.2.11. In 2.1.0, I could only reproduce it systematically by 
> cleaning the response with the debugger before it got to the hconnection 
> thread.
> Repro steps
>  * Create a breakpoint in the exception in 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries to make 
> sure we get the exception and retry.
>  * Create an infinite loop to create different rows with checkAndPut.
>  * Start running with the disabled breakpoints.
>  * Enable the breakpoint.
>  * Pause all threads and verify that we are waiting for a response in the IPC 
> thread. Wait for 1-2 minutes. This will cause a timeout.
>  * Continue and verify that an exception is triggered.
>  * Add a breakpoint to verify the response.
>  * Continue and check the response and the returned value.





[jira] [Created] (HBASE-28518) Allow specifying a filter for the REST multiget endpoint

2024-04-12 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28518:
---

 Summary: Allow specifying a filter for the REST multiget endpoint
 Key: HBASE-28518
 URL: https://issues.apache.org/jira/browse/HBASE-28518
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The native HBase API allows specifying Filters for get operations.
The REST interface does not currently expose this functionality.

Add a parameter to the multiget endpoint to allow specifying filters.






[jira] [Updated] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-09 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28500:

Status: Patch Available  (was: Open)

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.
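
One possible direction, sketched under assumptions: keep random selection for stateless requests, but remember which server created each scanner and send all follow-up calls for that scanner to the same server. The class and method names below are hypothetical and are not part of the existing REST client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative routing helper; not the actual REST client implementation. */
class StickyScannerRoutingSketch {
    private final List<String> servers;
    private final Random random;
    // scanner id -> server that holds the scanner state
    private final Map<String, String> scannerOwner = new ConcurrentHashMap<>();

    StickyScannerRoutingSketch(List<String> servers, long seed) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers);
        this.random = new Random(seed);
    }

    /** Stateless requests can go to any server. */
    String pickForStatelessRequest() {
        return servers.get(random.nextInt(servers.size()));
    }

    /** Picks a server for a new scanner and records it as the owner. */
    String openScanner(String scannerId) {
        String server = pickForStatelessRequest();
        scannerOwner.put(scannerId, server);
        return server;
    }

    /** Follow-up scanner calls must go to the server that created the scanner. */
    String serverForScanner(String scannerId) {
        String server = scannerOwner.get(scannerId);
        if (server == null) {
            throw new IllegalStateException("unknown scanner: " + scannerId);
        }
        return server;
    }

    void closeScanner(String scannerId) {
        scannerOwner.remove(scannerId);
    }
}
```

This does not address failover: if the owning server goes away, the scanner state is lost regardless of how requests are routed.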





[jira] [Updated] (HBASE-28504) Implement eviction logic for scanners in Rest APIs to prevent scanner leakage

2024-04-09 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28504:

Status: Patch Available  (was: Open)

> Implement eviction logic for scanners in Rest APIs to prevent scanner leakage
> -
>
> Key: HBASE-28504
> URL: https://issues.apache.org/jira/browse/HBASE-28504
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> The REST API maintains a map of _ScannerInstanceResource_s (which are 
> ultimately tracking Scanner objects).
> The user is supposed to delete these after using them, but if for any reason 
> it does not, then these objects are maintained indefinitely.
> Implement logic to evict old scanners automatically.
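
A minimal sketch of what idle-based eviction could look like, assuming each scanner is tracked together with its last access time. The registry class, its method names and the explicit nowMillis parameter are assumptions made for illustration; the actual patch may use a different mechanism (for example a cache with expiry).

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative idle-eviction registry; class and method names are hypothetical. */
class ScannerEvictionSketch {

    /** A tracked scanner plus the time it was last used. */
    private static final class Entry {
        final AutoCloseable scanner;
        volatile long lastAccessMillis;

        Entry(AutoCloseable scanner, long nowMillis) {
            this.scanner = scanner;
            this.lastAccessMillis = nowMillis;
        }
    }

    private final Map<String, Entry> scanners = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    ScannerEvictionSketch(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    void register(String id, AutoCloseable scanner, long nowMillis) {
        scanners.put(id, new Entry(scanner, nowMillis));
    }

    /** Returns the scanner and refreshes its last-access time, or null if unknown or evicted. */
    AutoCloseable get(String id, long nowMillis) {
        Entry entry = scanners.get(id);
        if (entry == null) {
            return null;
        }
        entry.lastAccessMillis = nowMillis;
        return entry.scanner;
    }

    /** Closes and removes every scanner idle for longer than maxIdleMillis; returns the count. */
    int evictIdle(long nowMillis) {
        int evicted = 0;
        Iterator<Entry> it = scanners.values().iterator();
        while (it.hasNext()) {
            Entry entry = it.next();
            if (nowMillis - entry.lastAccessMillis > maxIdleMillis) {
                it.remove();
                try {
                    entry.scanner.close();
                } catch (Exception e) {
                    // A failing close must not stop eviction of the remaining scanners.
                }
                evicted++;
            }
        }
        return evicted;
    }
}
```

In a server, evictIdle would be called periodically (for example from a scheduled task) with System.currentTimeMillis(); passing the time in explicitly just keeps the sketch easy to test.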





[jira] [Commented] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-09 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835308#comment-17835308
 ] 

Istvan Toth commented on HBASE-28500:
-

The whole API and implementation is kind of hopeless; it cannot be properly 
fixed while maintaining backwards bug compatibility.


> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.





[jira] [Updated] (HBASE-28499) Use the latest Httpclient/Httpcore 5.x in HBase

2024-04-09 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HBASE-28499:

Description: 
HttpClient 4.x is not actively developed.

We use HttpClient directly in the REST client code, and in the tests for 
several modules.

HttpClient 4.5 is a transitive dependency at least from Hadoop and Thrift, but 
HttpClient 5.x uses a separate Java package, so 4.5 and 5.x should be able to 
co-exist fine.

As of now, HttpClient 4.5 is in maintenance mode:
https://hc.apache.org/status.html


  was:
HttpClient 4.x is not actively maintained.

We use Httpclient directly in the REST client code, and in the tests for 
several modules.

Httpclient 4.5 is a transitive dependency at least from Hadoop and Thrift, but 
httpclient 5.x uses a separate java package, so 4.5 and 5.x  should be able to 
co-exist fine.

As of now, Httpclient 4.5 is in maintenance mode:
https://hc.apache.org/status.html



> Use the latest Httpclient/Httpcore 5.x  in HBase
> 
>
> Key: HBASE-28499
> URL: https://issues.apache.org/jira/browse/HBASE-28499
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Priority: Minor
>
> HttpClient 4.x is not actively developed.
> We use HttpClient directly in the REST client code, and in the tests for 
> several modules.
> HttpClient 4.5 is a transitive dependency at least from Hadoop and Thrift, 
> but HttpClient 5.x uses a separate Java package, so 4.5 and 5.x should be 
> able to co-exist fine.
> As of now, HttpClient 4.5 is in maintenance mode:
> https://hc.apache.org/status.html





[jira] [Commented] (HBASE-28504) Implement eviction logic for scanners in Rest APIs to prevent scanner leakage

2024-04-08 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835143#comment-17835143
 ] 

Istvan Toth commented on HBASE-28504:
-

This was originally reported by [~ankit].

> Implement eviction logic for scanners in Rest APIs to prevent scanner leakage
> -
>
> Key: HBASE-28504
> URL: https://issues.apache.org/jira/browse/HBASE-28504
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The REST API maintains a map of _ScannerInstanceResource_s (which are 
> ultimately tracking Scanner objects).
> The user is supposed to delete these after using them, but if for any reason 
> it does not, then these objects are maintained indefinitely.
> Implement logic to evict old scanners automatically.





[jira] [Created] (HBASE-28504) Implement eviction logic for scanners in Rest APIs to prevent scanner leakage

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28504:
---

 Summary: Implement eviction logic for scanners in Rest APIs to 
prevent scanner leakage
 Key: HBASE-28504
 URL: https://issues.apache.org/jira/browse/HBASE-28504
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The REST API maintains a map of _ScannerInstanceResource_s (which are 
ultimately tracking Scanner objects).

The user is supposed to delete these after using them, but if for any reason it 
does not, then these objects are maintained indefinitely.

Implement logic to evict old scanners automatically.





[jira] [Created] (HBASE-28501) Support non-SPNEGO authentication methods in REST java client library

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28501:
---

 Summary: Support non-SPNEGO authentication methods in REST java 
client library
 Key: HBASE-28501
 URL: https://issues.apache.org/jira/browse/HBASE-28501
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The current Java client only supports the SPNEGO authentication method.

This does not support the case when an application proxy like Apache Knox 
performs AAA conversion from BASIC/DIGEST to kerberos authentication.

Add support for BASIC username/password auth to the client.

Generally, the authentication code in the client looks quite backwards; it 
seems that most of the Kerberos / auth cookie code duplicates HttpClient 
functionality. AFAICT, setting HttpClient up (or letting the user set it up) and 
letting it handle authentication by itself would be a better and more generic 
solution.
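
For reference, BASIC auth itself is small: the client sends an Authorization header containing the Base64 encoding of "user:password". The helper below is a standalone sketch using only the JDK; the class and method names are made up and it is not wired into the REST client or into HttpClient.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds the value of an HTTP "Authorization" header for BASIC auth (illustrative helper). */
class BasicAuthHeaderSketch {

    static String basicAuthorizationValue(String user, String password) {
        if (user.indexOf(':') >= 0) {
            // The first colon separates user and password, so the user name cannot contain one.
            throw new IllegalArgumentException("user name must not contain ':'");
        }
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

BASIC auth sends the credentials in an easily reversible form, so it only makes sense over TLS.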






[jira] [Created] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28500:
---

 Summary: Rest Java client library assumes stateless servers
 Key: HBASE-28500
 URL: https://issues.apache.org/jira/browse/HBASE-28500
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


The Rest Java client library accepts a list of rest servers, and does random 
load balancing between them for each request.
This does not work for scans, which do have state on the rest server instance.





[jira] [Comment Edited] (HBASE-28489) Implement HTTP session support in REST server and client for default auth

2024-04-08 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834814#comment-17834814
 ] 

Istvan Toth edited comment on HBASE-28489 at 4/8/24 7:37 AM:
-

Which means that the default/BASIC auth cannot be used with HA/LB now. 


was (Author: stoty):
Which means that default/BASIC auth cannot be used with HA/LB now. 

> Implement HTTP session support in REST server and client for default auth
> -
>
> Key: HBASE-28489
> URL: https://issues.apache.org/jira/browse/HBASE-28489
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> The REST server (and Java client) currently does not implement sessions.
> While this is not necessary for the REST API to work, implementing sessions 
> would be a big improvement in throughput and resource usage.
> * It would make load balancing with sticky sessions possible
> * It would save the overhead of performing authentication for each request
> The gains are particularly big when using SPNEGO:
> * The full SPNEGO handshake can be skipped for subsequent requests
> * When Knox performs SPNEGO authentication for the proxied client, it accesses 
> the identity store each time. When the session is set, this step is only 
> performed on the initial request.
> The same change has resulted in spectacular performance improvements for 
> Phoenix Query Server when implemented in Avatica.
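
The saving can be illustrated with a toy model that counts full authentications with and without a session token. Everything here is hypothetical (ToyServer, the token handling, the credentials string, which the toy never verifies); a real implementation would rely on HTTP session cookies handled by the servlet container and the HTTP client.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Toy model of session reuse; names and behaviour are assumptions for illustration only. */
class SessionReuseSketch {

    /** Stand-in server that counts how often it has to run a full authentication. */
    static final class ToyServer {
        private final Set<String> sessions = new HashSet<>();
        int fullAuthentications = 0;

        /** Handles one request and returns the session token to present on later requests. */
        String handle(String sessionToken, String credentials) {
            if (sessionToken != null && sessions.contains(sessionToken)) {
                return sessionToken; // known session: authentication is skipped
            }
            fullAuthentications++; // stands in for e.g. a complete SPNEGO handshake
            String token = UUID.randomUUID().toString();
            sessions.add(token);
            return token;
        }
    }

    /** Issues the given number of requests and reports how many full authentications ran. */
    static int authenticationsFor(int requests, boolean keepSession) {
        ToyServer server = new ToyServer();
        String token = null;
        for (int i = 0; i < requests; i++) {
            String issued = server.handle(token, "user:secret");
            if (keepSession) {
                token = issued; // the client stores the token, like a session cookie
            }
        }
        return server.fullAuthentications;
    }
}
```

Without a stored token every request triggers a full authentication; with it only the first request does.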




