Hi Everyone,

In the past few days, I’ve been benchmarking Storm using a simple topology consisting of a KafkaSpout and a KafkaBolt. For the benchmark, I produced 100,000,000 messages into Kafka, each 100 bytes in size. The configuration of Kafka, Zookeeper and Storm was intentionally left at the defaults.

An interesting observation I’ve made concerns the KafkaBolt throughput. Run standalone, the KafkaProducer has a uniform throughput of approximately 650,000 messages per second. The KafkaBolt, by contrast, reaches at most 206,000 messages per second, with a skewed distribution in which some seconds have zero throughput, i.e. no tuples emitted. For an overview of the distribution while running the benchmark on a cluster, take a look at the attached graph.

Now, my question is: why does the KafkaBolt have such a decreased throughput compared to a standalone KafkaProducer? In your experience, which factors influence its throughput?

I’ve repeated the measurement with various configuration changes, such as setting topology.executor.(receive | send).buffer.size, disabling acknowledgements, etc. Although the result improved in some cases, the throughput remained skewed throughout the benchmark.
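
For reference, the kind of settings meant here could be assembled roughly as in the sketch below. It is only an illustration under stated assumptions: the class name `TuningSketch`, the method `build`, and the buffer size 16384 are hypothetical placeholders, not the values used in the benchmark. The keys are written as plain strings so the sketch compiles without Storm on the classpath, and setting topology.acker.executors to 0 is assumed here as the way acknowledgements are disabled.

```java
import java.util.HashMap;
import java.util.Map;

class TuningSketch {
    // Storm configuration keys, written as plain strings (no Storm dependency needed).
    static final String RECEIVE_BUFFER = "topology.executor.receive.buffer.size";
    static final String SEND_BUFFER = "topology.executor.send.buffer.size";
    static final String ACKERS = "topology.acker.executors";

    // Builds a topology configuration map; all argument values are caller-supplied
    // placeholders, not recommended settings.
    static Map<String, Object> build(int receiveBufferSize, int sendBufferSize, boolean acking) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(RECEIVE_BUFFER, receiveBufferSize);
        conf.put(SEND_BUFFER, sendBufferSize);
        if (!acking) {
            // Assumption: zero acker executors turns tuple acknowledgement off.
            conf.put(ACKERS, 0);
        }
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical values, chosen only to show the shape of the configuration.
        Map<String, Object> conf = build(16384, 16384, false);
        conf.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a real topology, a map like this would be passed along when submitting the topology; each variation of the benchmark then differs only in the arguments given to `build`.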

Thanks in advance for sharing your experience and advice!

Dominik

Attachment: KafkaBoltThroughput.pdf
Description: Adobe PDF document
