I am trying snappy compression on my producer. Here's my setup

Kafka - 2.0.0
Spring-Kafka - 2.1.2

Here's my producer config

config of compressed producer ==========

Map<String, Object> configProps = new HashMap<>();
configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);

config of un-compressed producer ============

Map<String, Object> configProps = new HashMap<>();
configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
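In case it matters, each configProps map is wired into a KafkaTemplate in the standard Spring-Kafka way (a sketch; the bean and method names here are illustrative, not from my actual code):

```java
// Hypothetical wiring (bean names are mine): turning the configProps map
// above into a KafkaTemplate, as is standard in Spring-Kafka 2.1.x.
@Bean
public ProducerFactory<String, String> producerFactory() {
    return new DefaultKafkaProducerFactory<>(configProps);
}

@Bean
public KafkaTemplate<String, String> kafkaTemplate() {
    return new KafkaTemplate<>(producerFactory());
}
```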

My payload is almost 1 MB worth of string. After sending 1000 compressed and
1000 uncompressed such messages, this is the result
=======================
[shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc
/data/compressed-string-test-0/*
8.0K /data/compressed-string-test-0/00000000000000000000.index
990M /data/compressed-string-test-0/00000000000000000000.log
12K /data/compressed-string-test-0/00000000000000000000.timeindex
4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
990M total

[shantanu@oc0148610736 uncompressed-string-test-0]$ du -shc
/data/uncompressed-string-test-0/*
8.0K    /data/uncompressed-string-test-0/00000000000000000000.index
992M    /data/uncompressed-string-test-0/00000000000000000000.log
12K /data/uncompressed-string-test-0/00000000000000000000.timeindex
4.0K    /data/uncompressed-string-test-0/leader-epoch-checkpoint
992M    total
=======================

Here we can see the difference is merely 2 MB. Is compression even working?
I used the DumpLogSegments tool
=======================
[shantanu@oc0148610736 kafka_2.11-2.0.0]$ sh bin/kafka-run-class.sh
kafka.tools.DumpLogSegments --files
/data/compressed-string-test-0/00000000000000000000.log --print-data-log |
head | grep compresscodec

offset: 0 position: 0 CreateTime: 1620744081357 isvalid: true keysize:
-1 valuesize: 1039999 magic: 2 compresscodec: SNAPPY producerId: -1
producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []
payload: 
klxhbpyxmcazvhekqnltuenwhsewjjfmctcqyrppellyfqglfnvhqctlfplslhpuulknsncbgzzndizwmlnelotcbniyprdgihdazwn
=======================

I can see SNAPPY is mentioned as compression codec. But the difference
between compressed and uncompressed disk size is negligible.

I tried gzip later on, and the results are
=======================
[shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc
/data/compressed-string-test-0/*
8.0K /data/compressed-string-test-0/00000000000000000000.index
640M /data/compressed-string-test-0/00000000000000000000.log
12K /data/compressed-string-test-0/00000000000000000000.timeindex
4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
640M total
=======================

So gzip seems to have worked. I tried lz4 compression as well; the results
were the same as with snappy.

Is snappy/lz4 compression really working here? Gzip seems to work, but I
have read in many places that snappy gives the best balance of CPU usage to
compression ratio, so we want to go ahead with snappy.
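One observation while digging: the dumped payload above looks like random lowercase text, so an LZ-style codec (snappy, lz4) finds no repeated substrings to match, while gzip's DEFLATE additionally entropy-codes the 26-letter alphabet. A minimal local sketch of that effect, using only the JDK (class name and seed are mine, for illustration; no Kafka involved):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Sketch: how much DEFLATE (the algorithm behind gzip) shaves off a
// random lowercase string shaped like the dumped payload above.
public class RandomStringDeflate {
    public static void main(String[] args) {
        // Build ~1 MB of random lowercase letters.
        Random rnd = new Random(42);
        StringBuilder sb = new StringBuilder(1_000_000);
        for (int i = 0; i < 1_000_000; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        byte[] input = sb.toString().getBytes(StandardCharsets.US_ASCII);

        // DEFLATE entropy-codes the 26-letter alphabet down to roughly
        // 4.7 bits per character even though there are no repeats to match.
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();

        int compressedLen = out.size();
        System.out.printf("original=%d compressed=%d ratio=%.2f%n",
                input.length, compressedLen, (double) compressedLen / input.length);
    }
}
```

On data like this, DEFLATE lands at roughly 60% of the original size, which lines up with the 640M gzip segment versus the 990M uncompressed one; snappy and lz4 have no entropy coder, which may explain why random text passes through them nearly uncompressed.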

Please help.

*Thanks & Regards,*
*Shantanu*
