You can read the data from the disk and see the compression type:
https://thehoard.blog/how-kafkas-storage-internals-work-3a29b02e026
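
For example, here is a minimal sketch that opens a segment file with Kafka's
FileRecords class and prints the codec of each record batch. FileRecords and
RecordBatch are internal kafka-clients classes, not public API, so treat this
as illustrative; the segment path is passed as an argument.
=======================
import java.io.File;
import java.io.IOException;

import org.apache.kafka.common.record.FileRecords;
import org.apache.kafka.common.record.RecordBatch;

// Sketch: print the compression codec of every batch in a segment file.
public class SegmentCodecCheck {
    public static void main(String[] args) throws IOException {
        FileRecords records = FileRecords.open(new File(args[0]));
        try {
            for (RecordBatch batch : records.batches()) {
                System.out.printf("offset=%d codec=%s%n",
                        batch.baseOffset(), batch.compressionType());
            }
        } finally {
            records.close();
        }
    }
}
=======================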

Thanks,
Nitin

On Wed, May 12, 2021 at 11:10 AM Shantanu Deshmukh <shantanu...@gmail.com>
wrote:

> I am trying snappy compression on my producer. Here's my setup:
>
> Kafka - 2.0.0
> Spring-Kafka - 2.1.2
>
> Here are my producer configs:
>
> compressed producer ==========
>
> configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
> configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
> configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
> configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
> configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);
>
> config of uncompressed producer ============
>
> configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
> configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
> configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
>
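> (A minimal sketch of the test loop around these configs, for reference;
> the bootstrap address and the random-letter payload generator below are
> assumptions for illustration, not the original code.)
> =======================
> import java.util.HashMap;
> import java.util.Map;
> import java.util.Random;
>
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerConfig;
> import org.apache.kafka.clients.producer.ProducerRecord;
> import org.apache.kafka.common.serialization.StringSerializer;
>
> public class CompressionTest {
>     public static void main(String[] args) {
>         Map<String, Object> configProps = new HashMap<>();
>         // "localhost:9092" is a placeholder for the real bootstrap server
>         configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
>         configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
>         configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
>         configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
>         configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);
>
>         // ~1 MB of random lowercase letters, matching the dumped payload below
>         Random rnd = new Random();
>         StringBuilder sb = new StringBuilder(1_000_000);
>         for (int i = 0; i < 1_000_000; i++) {
>             sb.append((char) ('a' + rnd.nextInt(26)));
>         }
>         String payload = sb.toString();
>
>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(configProps)) {
>             for (int i = 0; i < 1000; i++) {
>                 producer.send(new ProducerRecord<>("compressed-string-test", payload));
>             }
>             producer.flush();
>         }
>     }
> }
> =======================
>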
> My payload is almost 1 MB of string data. After sending 1000 compressed
> and 1000 uncompressed messages, these are the results:
> =======================
> [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc
> /data/compressed-string-test-0/*
> 8.0K /data/compressed-string-test-0/00000000000000000000.index
> 990M /data/compressed-string-test-0/00000000000000000000.log
> 12K /data/compressed-string-test-0/00000000000000000000.timeindex
> 4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
> 990M total
>
> [shantanu@oc0148610736 uncompressed-string-test-0]$ du -shc
> /data/uncompressed-string-test-0/*
> 8.0K    /data/uncompressed-string-test-0/00000000000000000000.index
> 992M    /data/uncompressed-string-test-0/00000000000000000000.log
> 12K /data/uncompressed-string-test-0/00000000000000000000.timeindex
> 4.0K    /data/uncompressed-string-test-0/leader-epoch-checkpoint
> 992M    total
> =======================
>
> Here we can see the difference is merely 2 MB. Is compression even
> working? I used the DumpLogSegments tool:
> =======================
> [shantanu@oc0148610736 kafka_2.11-2.0.0]$ sh bin/kafka-run-class.sh
> kafka.tools.DumpLogSegments --files
> /data/compressed-string-test-0/00000000000000000000.log --print-data-log |
> head | grep compresscodec
>
> offset: 0 position: 0 CreateTime: 1620744081357 isvalid: true keysize:
> -1 valuesize: 1039999 magic: 2 compresscodec: SNAPPY producerId: -1
> producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []
> payload:
> klxhbpyxmcazvhekqnltuenwhsewjjfmctcqyrppellyfqglfnvhqctlfplslhpuulknsncbgzzndizwmlnelotcbniyprdgihdazwn
> =======================
>
> I can see SNAPPY is mentioned as the compression codec. But the
> difference between the compressed and uncompressed disk sizes is
> negligible.
>
> I tried gzip later on, and these are the results:
> =======================
> [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc
> /data/compressed-string-test-0/*
> 8.0K /data/compressed-string-test-0/00000000000000000000.index
> 640M /data/compressed-string-test-0/00000000000000000000.log
> 12K /data/compressed-string-test-0/00000000000000000000.timeindex
> 4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
> 640M total
> =======================
>
> So gzip seems to have worked somehow. I tried lz4 compression as well;
> the results were the same as with snappy.
>
> Is snappy/lz4 compression really working here? Gzip seems to be
> working, but I have read in many places that snappy gives the best
> balance between CPU usage and compression ratio, so we want to go
> ahead with snappy.
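>
> One way to sanity-check this outside Kafka is to compress the same kind
> of payload directly; snappy-java (org.xerial.snappy) is the library
> Kafka itself uses. A sketch, with a random-letter generator standing in
> for the real payload:
> =======================
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.nio.charset.StandardCharsets;
> import java.util.Random;
> import java.util.zip.GZIPOutputStream;
>
> import org.xerial.snappy.Snappy;
>
> public class CodecRatioCheck {
>     public static void main(String[] args) throws IOException {
>         // ~1 MB of random lowercase letters, same shape as the test payload
>         Random rnd = new Random();
>         StringBuilder sb = new StringBuilder(1_000_000);
>         for (int i = 0; i < 1_000_000; i++) {
>             sb.append((char) ('a' + rnd.nextInt(26)));
>         }
>         byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
>
>         // Snappy: LZ-style matching only, no entropy coding
>         byte[] snappyOut = Snappy.compress(raw);
>
>         // Gzip: DEFLATE, i.e. LZ77 plus Huffman coding
>         ByteArrayOutputStream gzipOut = new ByteArrayOutputStream();
>         try (GZIPOutputStream gz = new GZIPOutputStream(gzipOut)) {
>             gz.write(raw);
>         }
>
>         System.out.printf("raw=%d snappy=%d gzip=%d%n",
>                 raw.length, snappyOut.length, gzipOut.size());
>     }
> }
> =======================
> If snappy's output is close to the raw size here while gzip's is much
> smaller, that would match the segment sizes above.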
>
> Please help
>
> Thanks & Regards,
> Shantanu
>
