Hey Nitin, I have already done that. I used the DumpLogSegments tool, and I can see that the codec reported is snappy, gzip or lz4 in the respective test. My question is why only gzip is giving me any compression. With the other two, the log is about the same size as uncompressed storage.

On Wed, May 12, 2021 at 11:16 AM nitin agarwal <nitingarg...@gmail.com> wrote:

> You can read the data from the disk and see compression type.
> https://thehoard.blog/how-kafkas-storage-internals-work-3a29b02e026
>
> Thanks,
> Nitin
>
> On Wed, May 12, 2021 at 11:10 AM Shantanu Deshmukh <shantanu...@gmail.com>
> wrote:
>
> > I am trying snappy compression on my producer. Here's my setup
> >
> > Kafka - 2.0.0
> > Spring-Kafka - 2.1.2
> >
> > Here's my producer config
> >
> > compressed producer ==========
> >
> > configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
> > configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
> > configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
> > configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
> > configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);
> >
> > config of un-compressed producer ============
> >
> > configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
> > configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
> > configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
> >
> > My payload is almost 1mb worth of string. After sending 1000 compressed and
> > 1000 uncompressed such messages this is the result
> > =======================
> > [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc /data/compressed-string-test-0/*
> > 8.0K /data/compressed-string-test-0/00000000000000000000.index
> > 990M /data/compressed-string-test-0/00000000000000000000.log
> > 12K  /data/compressed-string-test-0/00000000000000000000.timeindex
> > 4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
> > 990M total
> >
> > [shantanu@oc0148610736 uncompressed-string-test-0]$ du -shc /data/uncompressed-string-test-0/*
> > 8.0K /data/uncompressed-string-test-0/00000000000000000000.index
> > 992M /data/uncompressed-string-test-0/00000000000000000000.log
> > 12K  /data/uncompressed-string-test-0/00000000000000000000.timeindex
> > 4.0K /data/uncompressed-string-test-0/leader-epoch-checkpoint
> > 992M total
> > =======================
> >
> > Here we can see the difference is merely 2MB. Is compression even working?
> > I used dump-log-segment tool
> > =======================
> > [shantanu@oc0148610736 kafka_2.11-2.0.0]$ sh bin/kafka-run-class.sh
> > kafka.tools.DumpLogSegments --files
> > /data/compressed-string-test-0/00000000000000000000.log --print-data-log |
> > head | grep compresscodec
> >
> > offset: 0 position: 0 CreateTime: 1620744081357 isvalid: true keysize:
> > -1 valuesize: 1039999 magic: 2 compresscodec: SNAPPY producerId: -1
> > producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []
> > payload:
> > klxhbpyxmcazvhekqnltuenwhsewjjfmctcqyrppellyfqglfnvhqctlfplslhpuulknsncbgzzndizwmlnelotcbniyprdgihdazwn
> > =======================
> >
> > I can see SNAPPY is mentioned as compression codec. But the difference
> > between compressed and uncompressed disk size is negligible.
> >
> > I tried gzip later on. And results are
> > =======================
> > [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc /data/compressed-string-test-0/*
> > 8.0K /data/compressed-string-test-0/00000000000000000000.index
> > 640M /data/compressed-string-test-0/00000000000000000000.log
> > 12K  /data/compressed-string-test-0/00000000000000000000.timeindex
> > 4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
> > 640M total
> > =======================
> >
> > So gzip seems to have worked somehow. I tried lz4 compression as well.
> > Results were same as that of snappy.
> >
> > Is snappy/lz4 compression really working here? Gzip seems to be working but
> > I have read a lot that snappy gives best CPU usage to compression ratio
> > balance. So we want to go ahead with snappy.
> >
> > Please help
> >
> > *Thanks & Regards,*
> > *Shantanu*
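
One thing that may be worth checking is the test payload itself. The snippet in the dump output looks like random lowercase letters. If the whole 1 MB string is generated that way (an assumption, since only the first bytes are visible), it contains almost no repeated sequences. Snappy and LZ4 are LZ77-style codecs with no entropy-coding stage, so they gain only from repeats. DEFLATE, which gzip uses, adds Huffman coding and can still shrink data drawn from a 26-letter alphabet, though not below log2(26)/8, about 0.59 of the original size. The sketch below uses only the JDK's `java.util.zip.Deflater`; the class name, helper names and sample strings are made up for illustration, and it does not exercise Snappy or LZ4, which would need their own libraries.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class CompressibilityCheck {

    /** Builds n random lowercase ASCII letters (assumed stand-in for the test payload). */
    static byte[] randomLowercase(int n, long seed) {
        Random rnd = new Random(seed);
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            out[i] = (byte) ('a' + rnd.nextInt(26));
        }
        return out;
    }

    /** Builds a highly repetitive payload by repeating one unit string. */
    static byte[] repeated(String unit, int times) {
        StringBuilder sb = new StringBuilder(unit.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns compressedSize / originalSize using DEFLATE, the algorithm gzip is built on. */
    static double deflateRatio(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[8192];
        long compressed = 0;
        // Drain the compressor; only the byte count matters here, not the output.
        while (!deflater.finished()) {
            compressed += deflater.deflate(buf);
        }
        deflater.end();
        return (double) compressed / data.length;
    }

    public static void main(String[] args) {
        byte[] random = randomLowercase(1_000_000, 42L);
        byte[] repetitive = repeated("{\"status\":\"OK\",\"region\":\"eu-west\"}", 30_000);
        System.out.printf("random letters : %.2f%n", deflateRatio(random));
        System.out.printf("repetitive JSON: %.2f%n", deflateRatio(repetitive));
    }
}
```

If the random-letter ratio comes out in the same range as the 640M versus 992M seen with gzip, that would suggest the codecs are working and the payload is the limiting factor. Rerunning the producer test with a realistic message body (JSON, logs, and so on) should then give a fairer picture of what snappy or lz4 achieve.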