Christian Kosmowski created KAFKA-9716:
------------------------------------------

             Summary: Values of compression-rate and compression-rate-avg are misleading
                 Key: KAFKA-9716
                 URL: https://issues.apache.org/jira/browse/KAFKA-9716
             Project: Kafka
          Issue Type: Bug
          Components: clients, compression
    Affects Versions: 2.4.1
            Reporter: Christian Kosmowski


The values of the following metrics are confusing:

compression-rate, compression-rate-avg, and basically every other 
compression-rate metric (e.g. the per-topic compression rate).

They are calculated as follows:
{code:java}
if (numRecords == 0L) {
    buffer().position(initialPosition);
    builtRecords = MemoryRecords.EMPTY;
} else {
    if (magic > RecordBatch.MAGIC_VALUE_V1)
        this.actualCompressionRatio = (float) writeDefaultBatchHeader() / this.uncompressedRecordsSizeInBytes;
    else if (compressionType != CompressionType.NONE)
        this.actualCompressionRatio = (float) writeLegacyCompressedWrapperHeader() / this.uncompressedRecordsSizeInBytes;

    ByteBuffer buffer = buffer().duplicate();
    buffer.flip();
    buffer.position(initialPosition);
    builtRecords = MemoryRecords.readableRecords(buffer.slice());
}
{code}
Basically, the compressed size is divided by the uncompressed size, which leads 
to a value < 1 when compression is effective (good if you want compression) or 
> 1 when the compressed output is larger than the input (bad if you want 
compression).

From the name "compression rate" I would expect the exact opposite. Apart from 
the fact that the word "rate" usually refers to comparisons between values of 
different units (e.g. miles per hour), the correct word, "ratio", would refer to 
the uncompressed size divided by the compressed size.

So if the compressed data takes half the space of the uncompressed data, the 
correct value for the compression ratio (or rate) would be 2, and not 0.5 as 
Kafka reports it. That is really confusing, and I would AT LEAST expect this 
behaviour to be documented somewhere, but it is not: all documentation sources 
just say "the compression rate".
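
To make the difference concrete, here is a minimal, self-contained sketch of the 
two conventions. It is not Kafka code: the class name, method names, and byte 
counts are hypothetical and chosen only to illustrate the arithmetic described 
in this report.

```java
public class CompressionRatioDemo {

    // Convention used in the snippet quoted in this report:
    // compressed size divided by uncompressed size (smaller is better).
    static float compressedOverUncompressed(int compressedBytes, int uncompressedBytes) {
        return (float) compressedBytes / uncompressedBytes;
    }

    // Convention expected from the name "compression ratio":
    // uncompressed size divided by compressed size (larger is better).
    static float uncompressedOverCompressed(int compressedBytes, int uncompressedBytes) {
        return (float) uncompressedBytes / compressedBytes;
    }

    public static void main(String[] args) {
        // Hypothetical batch: 1000 bytes of records compressed down to 500 bytes.
        int uncompressedBytes = 1000;
        int compressedBytes = 500;

        // Prints 0.5, the kind of value the metric reports today.
        System.out.println(compressedOverUncompressed(compressedBytes, uncompressedBytes));

        // Prints 2.0, the value expected from the metric's name.
        System.out.println(uncompressedOverCompressed(compressedBytes, uncompressedBytes));
    }
}
```

The two values are reciprocals of each other, so no information is lost either 
way; the problem is only that the name and the documentation do not say which 
convention is used.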



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
