[jira] [Commented] (HBASE-27264) Add options to consider compressed size when delimiting blocks during hfile writes

2022-08-17 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580604#comment-17580604
 ] 

Hudson commented on HBASE-27264:


Results for branch branch-2
[build #619 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add options to consider compressed size when delimiting blocks during hfile 
> writes
> --
>
> Key: HBASE-27264
> URL: https://issues.apache.org/jira/browse/HBASE-27264
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 3.0.0-alpha-4
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> In HBASE-27232 we modified the "hbase.writer.unified.encoded.blocksize.ratio" 
> property so that the encoded size can be considered when delimiting hfile 
> blocks during writes.
> -Here we propose two additional properties, 
> "hbase.block.size.limit.compressed" and "hbase.block.size.max.compressed", 
> that would allow considering the compressed size (if compression is in use) 
> when delimiting blocks during hfile writes. When compression is enabled, 
> certain datasets can have very high compression efficiency, so the default 
> 64KB block size and 10GB max file size can lead to hfiles with a very large 
> number of blocks.-
> -In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that 
> switches to the compressed size for delimiting blocks, and 
> "hbase.block.size.max.compressed" is an int defining the limit, in bytes, for 
> the compressed block size, in order to avoid very large uncompressed blocks 
> (defaulting to 320KB).-
> Note: As of 15/08/2022, the original proposal above has been modified to 
> define a pluggable strategy for predicting block compression rate. Please 
> refer to the release notes for more details. 
>  
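(For illustration only: this thread does not spell out the pluggable strategy's 
API, so the interface name and method signatures below are assumptions drawn 
from the note above, not the committed design. Consult the release notes for 
the actual API.)

{code:java}
// Hedged sketch of what a pluggable block compression rate predicator
// could look like. Names and signatures here are assumptions, not the
// committed HBase API.
public interface BlockCompressedSizePredicator {

  /** Feed back the uncompressed and compressed sizes of the last finished block. */
  void updateLatestBlockSizes(int uncompressedSize, int compressedSize);

  /**
   * Called by the writer while appending cells: decide whether the block
   * currently being written should be closed, given its uncompressed size.
   */
  boolean shouldFinishBlock(int uncompressedSize);
}
{code}

A default implementation could simply compare the uncompressed size against the 
configured block size, while a compression-aware one could extrapolate from the 
compression rate observed on previously written blocks.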



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27264) Add options to consider compressed size when delimiting blocks during hfile writes

2022-08-16 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580470#comment-17580470
 ] 

Hudson commented on HBASE-27264:


Results for branch master
[build #659 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/659/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/659/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/659/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/659/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add options to consider compressed size when delimiting blocks during hfile 
> writes
> --
>
> Key: HBASE-27264
> URL: https://issues.apache.org/jira/browse/HBASE-27264
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 3.0.0-alpha-4
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0-alpha-4
>
>
> In HBASE-27232 we modified the "hbase.writer.unified.encoded.blocksize.ratio" 
> property so that the encoded size can be considered when delimiting hfile 
> blocks during writes.
> -Here we propose two additional properties, 
> "hbase.block.size.limit.compressed" and "hbase.block.size.max.compressed", 
> that would allow considering the compressed size (if compression is in use) 
> when delimiting blocks during hfile writes. When compression is enabled, 
> certain datasets can have very high compression efficiency, so the default 
> 64KB block size and 10GB max file size can lead to hfiles with a very large 
> number of blocks.-
> -In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that 
> switches to the compressed size for delimiting blocks, and 
> "hbase.block.size.max.compressed" is an int defining the limit, in bytes, for 
> the compressed block size, in order to avoid very large uncompressed blocks 
> (defaulting to 320KB).-
> Note: As of 15/08/2022, the original proposal above has been modified to 
> define a pluggable strategy for predicting block compression rate. Please 
> refer to the release notes for more details. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27264) Add options to consider compressed size when delimiting blocks during hfile writes

2022-08-02 Thread Wellington Chevreuil (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574194#comment-17574194
 ] 

Wellington Chevreuil commented on HBASE-27264:
--

[~andrea-rockt] FYI

> Add options to consider compressed size when delimiting blocks during hfile 
> writes
> --
>
> Key: HBASE-27264
> URL: https://issues.apache.org/jira/browse/HBASE-27264
> Project: HBase
>  Issue Type: New Feature
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> In HBASE-27232 we modified the "hbase.writer.unified.encoded.blocksize.ratio" 
> property so that the encoded size can be considered when delimiting hfile 
> blocks during writes.
> Here we propose two additional properties, "hbase.block.size.limit.compressed" 
> and "hbase.block.size.max.compressed", that would allow considering the 
> compressed size (if compression is in use) when delimiting blocks during hfile 
> writes. When compression is enabled, certain datasets can have very high 
> compression efficiency, so the default 64KB block size and 10GB max file size 
> can lead to hfiles with a very large number of blocks. 
> In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that 
> switches to the compressed size for delimiting blocks, and 
> "hbase.block.size.max.compressed" is an int defining the limit, in bytes, for 
> the compressed block size, in order to avoid very large uncompressed blocks 
> (defaulting to 320KB).
>  
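(A minimal sketch of how the two proposed properties could be set, assuming 
they ship with the names and default described above. The proposal was later 
reworked into a pluggable strategy, so treat these keys as illustrative, not 
final.)

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompressedBlockDelimitingExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Proposed boolean flag: delimit blocks on the compressed size
    // instead of the uncompressed one.
    conf.setBoolean("hbase.block.size.limit.compressed", true);
    // Proposed cap, in bytes, on the compressed block size, to avoid
    // very large uncompressed blocks (320KB default per the description).
    conf.setInt("hbase.block.size.max.compressed", 320 * 1024);
  }
}
{code}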



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27264) Add options to consider compressed size when delimiting blocks during hfile writes

2022-08-01 Thread Wellington Chevreuil (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573801#comment-17573801
 ] 

Wellington Chevreuil commented on HBASE-27264:
--

{quote}
Can we have a single unified config for them?
{quote}
 Such "unified" behaviour would already be achieved by the 
"hbase.block.size.limit.compressed" and  "hbase.block.size.max.compressed". 
Because we compress the block after encoding, so even when encoding is on, we 
really are checking the encoded and compressed size here.

The original goal of unified.encoded.blocksize.ratio property was to give 
consistent block sizes in order to avoid fragmentation in the bucket cache. It 
had a bug, though, where if the encoded compression efficiency was higher than 
the configured unified.encoded.blocksize.ratio value, we would still se varying 
block sizes and fragmentation. HBASE-27232 fixed this problem. Now, with the 
fix, if the ratio is set to 1, we will have blocks of the actual encoded size 
(which was not possible before because of the bug).

So keeping these separately gives the ability for choosing at which level we 
want to delimit blocks, with the extra control over fragmentation in the case 
of encoding only.
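
(A rough sketch of the two delimiting levels being contrasted here, with 
simplified names and logic; the real writer code paths are more involved.)

{code:java}
// Simplified illustration only; names and structure are assumptions,
// not the actual HFile writer implementation.
final class BlockDelimitingSketch {

  /**
   * Encoding-only level: close the block once the encoded size reaches
   * blockSize * ratio. With ratio = 1.0 (possible after HBASE-27232),
   * blocks close at the actual encoded size, keeping sizes uniform and
   * avoiding bucket cache fragmentation.
   */
  static boolean finishOnEncodedSize(int encodedSize, int blockSize, double ratio) {
    return encodedSize >= blockSize * ratio;
  }

  /**
   * Compressed level: since compression runs after encoding, checking the
   * compressed size effectively checks the encoded-and-compressed size,
   * capped to avoid very large uncompressed blocks.
   */
  static boolean finishOnCompressedSize(int compressedSize, int maxCompressed) {
    return compressedSize >= maxCompressed;
  }
}
{code}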

> Add options to consider compressed size when delimiting blocks during hfile 
> writes
> --
>
> Key: HBASE-27264
> URL: https://issues.apache.org/jira/browse/HBASE-27264
> Project: HBase
>  Issue Type: New Feature
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> In HBASE-27232 we modified the "hbase.writer.unified.encoded.blocksize.ratio" 
> property so that the encoded size can be considered when delimiting hfile 
> blocks during writes.
> Here we propose two additional properties, "hbase.block.size.limit.compressed" 
> and "hbase.block.size.max.compressed", that would allow considering the 
> compressed size (if compression is in use) when delimiting blocks during hfile 
> writes. When compression is enabled, certain datasets can have very high 
> compression efficiency, so the default 64KB block size and 10GB max file size 
> can lead to hfiles with a very large number of blocks. 
> In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that 
> switches to the compressed size for delimiting blocks, and 
> "hbase.block.size.max.compressed" is an int defining the limit, in bytes, for 
> the compressed block size, in order to avoid very large uncompressed blocks 
> (defaulting to 320KB).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27264) Add options to consider compressed size when delimiting blocks during hfile writes

2022-08-01 Thread Bryan Beaudreault (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573733#comment-17573733
 ] 

Bryan Beaudreault commented on HBASE-27264:
---

My only thought here is that the existing unified.encoded.blocksize.ratio 
config is a bit hard to configure, and now we're adding 2 more configs in a 
similar area. I wonder if there's some sort of simplification we can do here to 
make it easier on users. Often block encoding and compression go hand-in-hand. 
Can we have a single unified config for them? Or is there some other, easier 
way to auto-tune these for users, or at least logs/metrics to make it easier 
to know what to set them to?

> Add options to consider compressed size when delimiting blocks during hfile 
> writes
> --
>
> Key: HBASE-27264
> URL: https://issues.apache.org/jira/browse/HBASE-27264
> Project: HBase
>  Issue Type: New Feature
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> In HBASE-27232 we modified the "hbase.writer.unified.encoded.blocksize.ratio" 
> property so that the encoded size can be considered when delimiting hfile 
> blocks during writes.
> Here we propose two additional properties, "hbase.block.size.limit.compressed" 
> and "hbase.block.size.max.compressed", that would allow considering the 
> compressed size (if compression is in use) when delimiting blocks during hfile 
> writes. When compression is enabled, certain datasets can have very high 
> compression efficiency, so the default 64KB block size and 10GB max file size 
> can lead to hfiles with a very large number of blocks. 
> In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that 
> switches to the compressed size for delimiting blocks, and 
> "hbase.block.size.max.compressed" is an int defining the limit, in bytes, for 
> the compressed block size, in order to avoid very large uncompressed blocks 
> (defaulting to 320KB).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)