[ https://issues.apache.org/jira/browse/KAFKA-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rens Groothuijsen reassigned KAFKA-9716:
----------------------------------------

    Assignee: Rens Groothuijsen

> Values of compression-rate and compression-rate-avg are misleading
> ------------------------------------------------------------------
>
>                 Key: KAFKA-9716
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9716
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, compression
>    Affects Versions: 2.4.1
>            Reporter: Christian Kosmowski
>            Assignee: Rens Groothuijsen
>            Priority: Minor
>
> The values of the following metrics are confusing:
> compression-rate, compression-rate-avg, and basically every other
> compression-rate metric (e.g. the per-topic compression rate).
> They are calculated as follows:
> {code:java}
> if (numRecords == 0L) {
>     buffer().position(initialPosition);
>     builtRecords = MemoryRecords.EMPTY;
> } else {
>     if (magic > RecordBatch.MAGIC_VALUE_V1)
>         this.actualCompressionRatio = (float) writeDefaultBatchHeader() / this.uncompressedRecordsSizeInBytes;
>     else if (compressionType != CompressionType.NONE)
>         this.actualCompressionRatio = (float) writeLegacyCompressedWrapperHeader() / this.uncompressedRecordsSizeInBytes;
>     ByteBuffer buffer = buffer().duplicate();
>     buffer.flip();
>     buffer.position(initialPosition);
>     builtRecords = MemoryRecords.readableRecords(buffer.slice());
> }
> {code}
> Basically, the compressed size is divided by the uncompressed size, which leads
> to a value < 1 for high compression (good if you want compression) or > 1 for
> poor compression (bad if you want compression).
> From the name "compression rate" I would expect the exact opposite. Apart
> from the fact that the word "rate" usually refers to comparisons based on
> values of different units (miles per hour), the correct word "ratio" would
> refer to the uncompressed size divided by the compressed size.
> (In the code this is correct, but not in the metric names.)
> So if the compressed data takes half the space of the uncompressed data, the
> correct value for the compression ratio (or rate) would be 2 and not 0.5 as
> Kafka reports it. That is really confusing, and I would AT LEAST expect this
> behaviour to be documented somewhere, but it is not: all documentation
> sources just say "the compression rate".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
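
The arithmetic described in the report can be illustrated with a small standalone sketch. It is not Kafka code: the class name CompressionMetricDemo, both method names, and the byte counts are hypothetical and chosen only for illustration. The first method mirrors the division quoted in the issue (compressed size over uncompressed size), the second computes the conventional ratio the reporter expected (uncompressed size over compressed size).

```java
// Hypothetical illustration only; none of these names exist in Kafka.
class CompressionMetricDemo {

    // Mirrors the division quoted in the issue: compressed / uncompressed.
    // Smaller values mean better compression.
    static double reportedRate(long compressedBytes, long uncompressedBytes) {
        if (uncompressedBytes <= 0) {
            throw new IllegalArgumentException("uncompressedBytes must be positive");
        }
        return (double) compressedBytes / uncompressedBytes;
    }

    // Conventional compression ratio: uncompressed / compressed.
    // Larger values mean better compression.
    static double conventionalRatio(long compressedBytes, long uncompressedBytes) {
        if (compressedBytes <= 0) {
            throw new IllegalArgumentException("compressedBytes must be positive");
        }
        return (double) uncompressedBytes / compressedBytes;
    }

    public static void main(String[] args) {
        // Assumed example: a batch that compresses to half its original size.
        long uncompressedBytes = 1000L;
        long compressedBytes = 500L;

        System.out.println("reported rate      = "
                + reportedRate(compressedBytes, uncompressedBytes));      // 0.5
        System.out.println("conventional ratio = "
                + conventionalRatio(compressedBytes, uncompressedBytes)); // 2.0
    }
}
```

For the half-size example from the report, the first method yields 0.5 and the second yields 2.0. The two values are reciprocals of each other, so a dashboard can convert between them provided the reported value is non-zero.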