Wellington Chevreuil created HBASE-27386:
--------------------------------------------

             Summary: Use encoded size for calculating compression ratio in block size predicator
                 Key: HBASE-27386
                 URL: https://issues.apache.org/jira/browse/HBASE-27386
             Project: HBase
          Issue Type: Bug
            Reporter: Wellington Chevreuil
            Assignee: Wellington Chevreuil


In HBASE-27264 we introduced the notion of block size predicators to define
hfile block boundaries when writing a new hfile, and provided the
PreviousBlockCompressionRatePredicator implementation for calculating block
sizes based on a compression ratio. It uses the raw data size written to the
block so far to calculate the compression ratio, but when encoding is enabled
this inflates the computed ratio and therefore produces much larger block
sizes. We should use the encoded size to calculate the compression ratio
instead.
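
To make the mechanism concrete, here is a minimal, self-contained sketch of
the ratio logic (class and method names below are illustrative only, not the
actual HBASE-27264 code). The bug is which value gets passed as
{{sizeBeforeCompression}}: the raw unencoded size inflates the ratio when
encoding is enabled, while the encoded size keeps it accurate.
{code:java}
/**
 * Hypothetical sketch of a compression-ratio based block size predicator.
 * Not the actual HBase implementation.
 */
public class RatioPredicatorSketch {

  /** Configured target block size, e.g. the default 65536 bytes. */
  private final int configuredBlockSize;

  /** Compression ratio observed on the previously flushed block. */
  private double compressionRatio = 1.0;

  /** Size at which the block currently being written should be closed. */
  private int adjustedBlockSize;

  public RatioPredicatorSketch(int configuredBlockSize) {
    this.configuredBlockSize = configuredBlockSize;
    this.adjustedBlockSize = configuredBlockSize;
  }

  /**
   * Called after a block is flushed. With encoding enabled,
   * sizeBeforeCompression must be the encoded size; passing the raw
   * unencoded size here is the bug this issue fixes, because the raw
   * size is much larger and so overstates the compression ratio.
   */
  public void updateLatestBlockSizes(int sizeBeforeCompression, int onDiskSize) {
    compressionRatio = (double) sizeBeforeCompression / onDiskSize;
    adjustedBlockSize = (int) (configuredBlockSize * compressionRatio);
  }

  /** Decide whether the block currently being written should be closed. */
  public boolean shouldFinishBlock(int currentBlockSize) {
    return currentBlockSize >= adjustedBlockSize;
  }
}
{code}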

Here's an example scenario:

1) Sample block size when not using the PreviousBlockCompressionRatePredicator 
as implemented by HBASE-27264:
{noformat}
onDiskSizeWithoutHeader=6613, uncompressedSizeWithoutHeader=32928
{noformat}

2) Sample block size when using PreviousBlockCompressionRatePredicator as 
implemented by HBASE-27264 (uses raw data size to calculate compression rate):
{noformat}
onDiskSizeWithoutHeader=126920, uncompressedSizeWithoutHeader=655393
{noformat}

3) Sample block size when using PreviousBlockCompressionRatePredicator with 
encoded size for calculating compression rate:
{noformat}
onDiskSizeWithoutHeader=54299, uncompressedSizeWithoutHeader=328051
{noformat}
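
For reference, these are the compression ratios implied by the samples above
(assuming the default 65536-byte configured block size):
{noformat}
(1) baseline:     32928 / 6613    ≈ 5.0  (6613 bytes on disk)
(2) raw size:     655393 / 126920 ≈ 5.2  (126920 bytes on disk, ~2x the target)
(3) encoded size: 328051 / 54299  ≈ 6.0  (54299 bytes on disk, under the target)
{noformat}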



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
