[ 
https://issues.apache.org/jira/browse/KAFKA-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rens Groothuijsen reassigned KAFKA-9716:
----------------------------------------

    Assignee: Rens Groothuijsen

> Values of compression-rate and compression-rate-avg are misleading
> ------------------------------------------------------------------
>
>                 Key: KAFKA-9716
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9716
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, compression
>    Affects Versions: 2.4.1
>            Reporter: Christian Kosmowski
>            Assignee: Rens Groothuijsen
>            Priority: Minor
>
> The values of the following metrics are confusing:
> compression-rate, compression-rate-avg, and basically every other
> compression-rate metric (e.g. the per-topic compression rate).
> They are calculated as follows:
> {code:java}
> if (numRecords == 0L) {
>     buffer().position(initialPosition);
>     builtRecords = MemoryRecords.EMPTY;
> } else {
>     if (magic > RecordBatch.MAGIC_VALUE_V1)
>         this.actualCompressionRatio = (float) writeDefaultBatchHeader() / this.uncompressedRecordsSizeInBytes;
>     else if (compressionType != CompressionType.NONE)
>         this.actualCompressionRatio = (float) writeLegacyCompressedWrapperHeader() / this.uncompressedRecordsSizeInBytes;
>     ByteBuffer buffer = buffer().duplicate();
>     buffer.flip();
>     buffer.position(initialPosition);
>     builtRecords = MemoryRecords.readableRecords(buffer.slice());
> }
> {code}
> In other words, the compressed size is divided by the uncompressed size, which
> yields a value < 1 for high compression (good if you want compression) or > 1
> for poor compression (bad if you want compression).
> From the name "compression rate" I would expect the exact opposite. Apart
> from the fact that the word "rate" usually refers to comparisons between
> values of different units (e.g. miles per hour), the correct word "ratio" would
> refer to the uncompressed size divided by the compressed size. (The code
> uses the correct word, but the metric names do not.)
> So if the compressed data takes half the space of the uncompressed data, the
> correct value for the compression ratio (or rate) would be 2 and not 0.5 as
> Kafka reports it. That is really confusing, and I would AT LEAST expect this
> behaviour to be documented somewhere, but it is not: all documentation
> sources just say "the compression rate".
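
To make the two conventions from the description concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical and are not part of the Kafka client; only the direction of each division is taken from the description above.

```java
/**
 * Illustrative only: this class and its methods are hypothetical names,
 * not Kafka APIs. They restate the arithmetic from the issue description.
 */
final class CompressionRatioExample {

    private CompressionRatioExample() {}

    /** Compressed size divided by uncompressed size, as the issue says the metric is computed. */
    static float reportedRate(long compressedBytes, long uncompressedBytes) {
        if (uncompressedBytes <= 0) {
            throw new IllegalArgumentException("uncompressedBytes must be positive");
        }
        return (float) compressedBytes / uncompressedBytes;
    }

    /** Uncompressed size divided by compressed size, the direction the reporter expects. */
    static float conventionalRatio(long compressedBytes, long uncompressedBytes) {
        if (compressedBytes <= 0) {
            throw new IllegalArgumentException("compressedBytes must be positive");
        }
        return (float) uncompressedBytes / compressedBytes;
    }

    public static void main(String[] args) {
        long uncompressed = 1000L; // example sizes, chosen so the data halves in size
        long compressed = 500L;

        System.out.println(reportedRate(compressed, uncompressed));      // prints 0.5
        System.out.println(conventionalRatio(compressed, uncompressed)); // prints 2.0
    }
}
```

The two values are reciprocals of each other, so a dashboard could derive the conventional ratio as 1 / compression-rate-avg if the metric is left unchanged.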



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
