You can read the data from the disk and see the compression type. https://thehoard.blog/how-kafkas-storage-internals-work-3a29b02e026
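If you want to do the same check programmatically, here is a minimal sketch that opens a segment file and prints each batch's codec and on-disk size. Note it leans on internal classes from the kafka-clients jar (org.apache.kafka.common.record), which are not a supported public API, and the segment path below is just an example:

=======================
import java.io.File;
import java.io.IOException;

import org.apache.kafka.common.record.FileRecords;
import org.apache.kafka.common.record.RecordBatch;

public class SegmentCompressionCheck {
    public static void main(String[] args) throws IOException {
        // Example path, adjust to your broker's log directory.
        File segment = new File("/data/compressed-string-test-0/00000000000000000000.log");
        // FileRecords is an internal kafka-clients class, used here for illustration only.
        try (FileRecords records = FileRecords.open(segment)) {
            for (RecordBatch batch : records.batches()) {
                // compressionType() tells you which codec the batch was written with.
                System.out.printf("baseOffset=%d codec=%s sizeInBytes=%d%n",
                        batch.baseOffset(), batch.compressionType(), batch.sizeInBytes());
            }
        }
    }
}
=======================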
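You can also sanity-check what ratio each codec can achieve on your payload outside Kafka entirely. This sketch assumes snappy-java (org.xerial.snappy:snappy-java, the library the Kafka client itself uses for snappy) is on the classpath, and compresses roughly 1 MB of random lowercase letters, similar to the test payload, with both codecs:

=======================
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

import org.xerial.snappy.Snappy;

public class CodecRatioCheck {
    public static void main(String[] args) throws IOException {
        // Roughly 1 MB of random lowercase letters, similar to the test payload.
        Random rnd = new Random();
        StringBuilder sb = new StringBuilder(1_000_000);
        for (int i = 0; i < 1_000_000; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        byte[] payload = sb.toString().getBytes(StandardCharsets.UTF_8);

        // Snappy compression of the whole payload in one shot.
        byte[] snappyBytes = Snappy.compress(payload);

        // GZIP compression of the same payload.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(payload);
        }
        byte[] gzipBytes = bos.toByteArray();

        System.out.printf("original=%d snappy=%d gzip=%d%n",
                payload.length, snappyBytes.length, gzipBytes.length);
    }
}
=======================

On random text like this, an LZ-family codec such as snappy finds few repeated sequences to exploit, while gzip's entropy coding can still take advantage of the small alphabet, which would be consistent with the segment sizes measured below.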
Thanks,
Nitin

On Wed, May 12, 2021 at 11:10 AM Shantanu Deshmukh <shantanu...@gmail.com> wrote:

> I am trying snappy compression on my producer. Here's my setup:
>
> Kafka - 2.0.0
> Spring-Kafka - 2.1.2
>
> Here's my producer config.
>
> compressed producer
> ==========
> configProps.put(
>     ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>     bootstrapServer);
> configProps.put(
>     ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
>     StringSerializer.class);
> configProps.put(
>     ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
>     StringSerializer.class);
> configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
> configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);
>
> config of un-compressed producer
> ============
> configProps.put(
>     ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>     bootstrapServer);
> configProps.put(
>     ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
>     StringSerializer.class);
> configProps.put(
>     ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
>     StringSerializer.class);
>
> My payload is almost 1 MB worth of string. After sending 1000 compressed and
> 1000 uncompressed such messages, this is the result:
> =======================
> [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc /data/compressed-string-test-0/*
> 8.0K  /data/compressed-string-test-0/00000000000000000000.index
> 990M  /data/compressed-string-test-0/00000000000000000000.log
> 12K   /data/compressed-string-test-0/00000000000000000000.timeindex
> 4.0K  /data/compressed-string-test-0/leader-epoch-checkpoint
> 990M  total
>
> [shantanu@oc0148610736 uncompressed-string-test-0]$ du -shc /data/uncompressed-string-test-0/*
> 8.0K  /data/uncompressed-string-test-0/00000000000000000000.index
> 992M  /data/uncompressed-string-test-0/00000000000000000000.log
> 12K   /data/uncompressed-string-test-0/00000000000000000000.timeindex
> 4.0K  /data/uncompressed-string-test-0/leader-epoch-checkpoint
> 992M  total
> =======================
>
> Here we can see the difference is merely 2 MB. Is compression even working?
> I used the DumpLogSegments tool:
> =======================
> [shantanu@oc0148610736 kafka_2.11-2.0.0]$ sh bin/kafka-run-class.sh \
>     kafka.tools.DumpLogSegments --files \
>     /data/compressed-string-test-0/00000000000000000000.log \
>     --print-data-log | head | grep compresscodec
>
> offset: 0 position: 0 CreateTime: 1620744081357 isvalid: true keysize: -1
> valuesize: 1039999 magic: 2 compresscodec: SNAPPY producerId: -1
> producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []
> payload: klxhbpyxmcazvhekqnltuenwhsewjjfmctcqyrppellyfqglfnvhqctlfplslhpuulknsncbgzzndizwmlnelotcbniyprdgihdazwn
> =======================
>
> I can see SNAPPY mentioned as the compression codec, but the difference
> between the compressed and uncompressed disk sizes is negligible.
>
> I tried gzip later on, and the results are:
> =======================
> [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc /data/compressed-string-test-0/*
> 8.0K  /data/compressed-string-test-0/00000000000000000000.index
> 640M  /data/compressed-string-test-0/00000000000000000000.log
> 12K   /data/compressed-string-test-0/00000000000000000000.timeindex
> 4.0K  /data/compressed-string-test-0/leader-epoch-checkpoint
> 640M  total
> =======================
>
> So gzip seems to have worked somehow. I tried lz4 compression as well;
> the results were the same as with snappy.
>
> Is snappy/lz4 compression really working here? Gzip seems to be working, but
> I have read in many places that snappy gives the best balance between CPU
> usage and compression ratio, so we want to go ahead with snappy.
>
> Please help.
>
> Thanks & Regards,
> Shantanu