[ https://issues.apache.org/jira/browse/HBASE-27386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wellington Chevreuil resolved HBASE-27386.
------------------------------------------
    Resolution: Fixed

Merged into master and branch-2. Thanks for reviewing, [~ankit.singhal]!

> Use encoded size for calculating compression ratio in block size predicator
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-27386
>                 URL: https://issues.apache.org/jira/browse/HBASE-27386
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha-3
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>             Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> In HBASE-27264 we introduced the notion of block size predicators to
> define hfile block boundaries when writing a new hfile, and provided the
> PreviousBlockCompressionRatePredicator implementation for calculating block
> sizes based on a compression ratio. It used the raw data size written to
> the block so far to calculate the compression ratio, but when encoding is
> enabled, the raw size is larger than what is actually handed to the
> compressor. This could lead to a very high compression ratio and, therefore,
> larger block sizes. We should use the encoded size to calculate the
> compression ratio instead.
> Here's an example scenario:
> 1) Sample block size when not using the
> PreviousBlockCompressionRatePredicator as implemented by HBASE-27264:
> {noformat}
> onDiskSizeWithoutHeader=6613, uncompressedSizeWithoutHeader=32928
> {noformat}
> 2) Sample block size when using PreviousBlockCompressionRatePredicator as
> implemented by HBASE-27264 (uses raw data size to calculate compression rate):
> {noformat}
> onDiskSizeWithoutHeader=126920, uncompressedSizeWithoutHeader=655393
> {noformat}
> 3) Sample block size when using PreviousBlockCompressionRatePredicator with
> encoded size for calculating compression rate:
> {noformat}
> onDiskSizeWithoutHeader=54299, uncompressedSizeWithoutHeader=328051
> {noformat}
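
For readers who want to see why the choice of numerator matters, here is a minimal sketch of a ratio-based block size limit. It is not the HBase implementation: the class name, method names, the 64 KB configured block size, and all byte counts in `main` are hypothetical and chosen only to illustrate the effect. It assumes the limit for the next block is the configured block size scaled by the previous block's compression ratio.

```java
/** Illustrative sketch only; these are not HBase classes or signatures. */
final class CompressionRatioSketch {

    /** Ratio of bytes before compression to bytes on disk for a finished block. */
    static double compressionRatio(long preCompressionBytes, long onDiskBytes) {
        if (onDiskBytes <= 0) {
            return 1.0; // nothing written yet, so do not scale the limit
        }
        return (double) preCompressionBytes / onDiskBytes;
    }

    /** Pre-compression size limit for the next block, scaled by the previous ratio. */
    static long adjustedLimit(long configuredBlockSize, double ratio) {
        // Never shrink below the configured size in this sketch.
        return Math.round(configuredBlockSize * Math.max(1.0, ratio));
    }

    public static void main(String[] args) {
        long configured = 65_536; // hypothetical configured block size (64 KB)
        long raw = 100_000;       // hypothetical bytes before data block encoding
        long encoded = 50_000;    // hypothetical bytes after encoding, before compression
        long onDisk = 10_000;     // hypothetical bytes after compression

        // Using the raw size overstates what the compressor achieved.
        long rawBased = adjustedLimit(configured, compressionRatio(raw, onDisk));
        // Using the encoded size measures the compressor's own contribution.
        long encodedBased = adjustedLimit(configured, compressionRatio(encoded, onDisk));

        System.out.println("raw-based limit:     " + rawBased);     // 655360
        System.out.println("encoded-based limit: " + encodedBased); // 327680
    }
}
```

In this sketch the raw-based ratio folds the encoder's savings into the compression ratio, so the limit for the next block doubles relative to the encoded-based one.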