[jira] [Commented] (HBASE-27264) Add options to consider compressed size when delimiting blocks during hfile writes
[ https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580604#comment-17580604 ]

Hudson commented on HBASE-27264:

Results for branch branch-2 [build #619 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/]: (/) *{color:green}+1 overall{color}*

details (if available):
(/) {color:green}+1 general checks{color} -- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
(/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color} -- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/619/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.
(/) {color:green}+1 client integration test{color}

> Add options to consider compressed size when delimiting blocks during hfile
> writes
> --
>
>             Key: HBASE-27264
>             URL: https://issues.apache.org/jira/browse/HBASE-27264
>         Project: HBase
>      Issue Type: New Feature
> Affects Versions: 3.0.0-alpha-4
>        Reporter: Wellington Chevreuil
>        Assignee: Wellington Chevreuil
>        Priority: Major
>         Fix For: 2.6.0, 3.0.0-alpha-4
>
> In HBASE-27232 we modified the "hbase.writer.unified.encoded.blocksize.ratio"
> property so that it allows the encoded size to be considered when
> delimiting hfile blocks during writes.
> -Here we propose two additional properties, "hbase.block.size.limit.compressed"
> and "hbase.block.size.max.compressed", that would allow the compressed size
> (if compression is in use) to be considered when delimiting blocks during
> hfile writing. When compression is enabled, certain datasets can have very
> high compression efficiency, so the default 64KB block size and 10GB max file
> size can lead to hfiles with a very large number of blocks.-
> -In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that
> switches to the compressed size for delimiting blocks, and
> "hbase.block.size.max.compressed" is an int with the limit, in bytes, for the
> compressed block size, in order to avoid very large uncompressed blocks
> (defaulting to 320KB).-
> Note: As of 15/08/2022, the original proposal above has been modified to
> define a pluggable strategy for predicting the block compression rate. Please
> refer to the release notes for more details.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
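The two settings in the original proposal above (later superseded by the pluggable-strategy approach mentioned in the note) would amount to a boolean flag plus a byte limit. The sketch below only illustrates those key/value pairs using plain java.util.Properties as a stand-in, not HBase's actual Configuration class:

```java
import java.util.Properties;

// Illustration only: java.util.Properties standing in for HBase's
// Configuration class. The property names and the 320KB default are the
// ones proposed in this issue; they may differ in the final implementation.
public class ProposedBlockConfig {
    public static Properties proposedSettings() {
        Properties conf = new Properties();
        // Boolean flag: switch block delimiting to the compressed (post-encoding) size.
        conf.setProperty("hbase.block.size.limit.compressed", "true");
        // Limit, in bytes, used to avoid very large uncompressed blocks
        // (proposed default: 320KB).
        conf.setProperty("hbase.block.size.max.compressed", String.valueOf(320 * 1024));
        return conf;
    }

    public static void main(String[] args) {
        proposedSettings().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```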
[ https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580470#comment-17580470 ]

Hudson commented on HBASE-27264:

Results for branch master [build #659 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/659/]: (/) *{color:green}+1 overall{color}*

details (if available):
(/) {color:green}+1 general checks{color} -- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/659/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/659/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color} -- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/659/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.
(/) {color:green}+1 client integration test{color}
[ https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574194#comment-17574194 ]

Wellington Chevreuil commented on HBASE-27264:

[~andrea-rockt] FYI
[ https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573801#comment-17573801 ]

Wellington Chevreuil commented on HBASE-27264:

{quote}Can we have a single unified config for them?{quote}

Such "unified" behaviour would already be achieved by "hbase.block.size.limit.compressed" and "hbase.block.size.max.compressed": because we compress the block after encoding, even when encoding is on we are really checking the encoded and compressed size here.

The original goal of the unified.encoded.blocksize.ratio property was to give consistent block sizes in order to avoid fragmentation in the bucket cache. It had a bug, though, where if the encoding compression efficiency was higher than the configured unified.encoded.blocksize.ratio value, we would still see varying block sizes and fragmentation. HBASE-27232 fixed this problem. Now, with the fix, if the ratio is set to 1, we will have blocks of the actual encoded size (which was not possible before because of the bug). So keeping these separate gives the ability to choose at which level we want to delimit blocks, with extra control over fragmentation in the case of encoding only.
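The delimiting behaviour discussed in this thread can be sketched roughly as below. The class, method, and field names are hypothetical, not HBase's actual HFileBlock writer API, and the treatment of the max.compressed cap (closing the block once the uncompressed size reaches it) is one plausible reading of the proposal, not the confirmed semantics:

```java
// Hypothetical sketch of the block-delimiting decision discussed in this
// thread; not the real HFileBlock writer code or the final pluggable strategy.
public class BlockLimitSketch {
    // When true, compare the compressed (post-encoding) size against the
    // configured block size instead of the uncompressed size.
    static boolean limitCompressed = true;
    static int blockSize = 64 * 1024;       // default 64KB block size
    static int maxCompressed = 320 * 1024;  // proposed 320KB cap

    /** Returns true when the current block should be closed and a new one started. */
    static boolean shouldFinishBlock(int uncompressedSize, int compressedSize) {
        if (limitCompressed) {
            // Delimit on compressed size, but also stop the uncompressed block
            // from growing past the cap, to avoid very large blocks after
            // decompression (assumed interpretation of the cap).
            return compressedSize >= blockSize || uncompressedSize >= maxCompressed;
        }
        return uncompressedSize >= blockSize;
    }

    public static void main(String[] args) {
        // Highly compressible data: 300KB raw compressing to 30KB keeps writing,
        // instead of closing a block every 64KB of raw data.
        System.out.println(shouldFinishBlock(300 * 1024, 30 * 1024));
        // Uncompressed size reached the 320KB cap: close the block.
        System.out.println(shouldFinishBlock(320 * 1024, 33 * 1024));
    }
}
```

With the flag off, the second check reduces to today's behaviour of closing every block at 64KB of uncompressed data, which is what produces the very large block counts described above for highly compressible datasets.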
[ https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573733#comment-17573733 ]

Bryan Beaudreault commented on HBASE-27264:

My only thought here is that the existing unified.encoded.blocksize.ratio config is a bit hard to configure, and now we're adding two more configs in a similar area. I wonder if there's some sort of simplification we can do here to make it easier on users. Often block encoding and compression go hand-in-hand. Can we have a single unified config for them? Or is there some other easier way to auto-tune these for users, or at least add logs/metrics to make it easier to know what to set them to?