Christian Kosmowski created KAFKA-9716:
------------------------------------------

             Summary: Values of compression-rate and compression-rate-avg are misleading
                 Key: KAFKA-9716
                 URL: https://issues.apache.org/jira/browse/KAFKA-9716
             Project: Kafka
          Issue Type: Bug
          Components: clients, compression
    Affects Versions: 2.4.1
            Reporter: Christian Kosmowski


The values of the metrics compression-rate and compression-rate-avg, and of basically every other compression-rate metric (e.g. the per-topic compression rate), are confusing. They are calculated as follows:

{code:java}
if (numRecords == 0L) {
    buffer().position(initialPosition);
    builtRecords = MemoryRecords.EMPTY;
} else {
    if (magic > RecordBatch.MAGIC_VALUE_V1)
        this.actualCompressionRatio = (float) writeDefaultBatchHeader() / this.uncompressedRecordsSizeInBytes;
    else if (compressionType != CompressionType.NONE)
        this.actualCompressionRatio = (float) writeLegacyCompressedWrapperHeader() / this.uncompressedRecordsSizeInBytes;

    ByteBuffer buffer = buffer().duplicate();
    buffer.flip();
    buffer.position(initialPosition);
    builtRecords = MemoryRecords.readableRecords(buffer.slice());
}
{code}

In other words, the compressed size is divided by the uncompressed size, which yields a value < 1 for high compression (good if you want compression) or > 1 for poor compression (bad if you want compression).

From the name "compression rate" I would expect the exact opposite. Apart from the fact that the word "rate" usually refers to a comparison between values of different units (e.g. miles per hour), the correct word, "ratio", would refer to the uncompressed size divided by the compressed size. So if the compressed data takes half the space of the uncompressed data, the correct value for the compression ratio (or rate) would be 2, and not 0.5 as Kafka reports it.

That is really confusing, and I would AT LEAST expect this behaviour to be documented somewhere, but it is not: all documentation sources just say "the compression rate".
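
To make the difference concrete, here is a minimal sketch of the two conventions side by side. It is only an illustration and not Kafka code: the class name, method names, and byte sizes are hypothetical, and the first method merely mirrors the division shown in the excerpt above.

```java
// Hypothetical illustration, not part of Kafka.
final class CompressionRatioExample {

    // Mirrors the division in the excerpt: compressed size / uncompressed size.
    static float reportedRate(long compressedBytes, long uncompressedBytes) {
        return (float) compressedBytes / uncompressedBytes;
    }

    // Conventional compression ratio: uncompressed size / compressed size.
    static double conventionalRatio(long compressedBytes, long uncompressedBytes) {
        return (double) uncompressedBytes / compressedBytes;
    }

    public static void main(String[] args) {
        // Example sizes chosen for illustration: the data compresses to half its size.
        long uncompressed = 1024;
        long compressed = 512;

        System.out.println("reported metric:    " + reportedRate(compressed, uncompressed));      // prints 0.5
        System.out.println("conventional ratio: " + conventionalRatio(compressed, uncompressed)); // prints 2.0
    }
}
```

With these numbers, a dashboard that shows 0.5 for compression-rate-avg means the batches shrank to half their size, i.e. a conventional ratio of 2.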