Replication bandwidth double what is expected

Theo Hultberg Mon, 01 Sep 2014 02:53:02 -0700

Hi,

We're evaluating Kafka and have a problem: it uses more bandwidth than we
can explain. From what we can tell, replication uses at least twice the
bandwidth it should.


We have four producer nodes and three broker nodes. We have enabled 3x
replication, so each node will get a copy of all data in this setup. The
producers have Snappy compression enabled and send batches of 200 messages.
The messages are around 1 KiB each. The cluster runs using mostly default
configuration, and the Kafka version is 0.8.1.1.

When we run iftop on the broker nodes we see that each Kafka node receives
around 6-7 Mbit/s from each producer node (around 25-30 Mbit/s in total),
but then sends around 50 Mbit/s to each of the other two Kafka nodes (100
Mbit/s in total). This is twice what we expected to see, and it seems to
saturate the bandwidth on our m1.xlarge machines. In other words, with two
replicas to feed, we expected the incoming 25 Mbit/s to be amplified to
50 Mbit/s, not 100.
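
To make that expectation concrete, here is a minimal sketch of the arithmetic.
It assumes each follower fetches a byte-for-byte copy of what the leader
received from the producers; the function name is made up for illustration,
and 25 is just the low end of the ingress figure above.

```python
def expected_replication_out(ingress_mbit, replication_factor):
    """Return (per-follower, total) outbound replication traffic for a broker.

    Assumption: followers receive exactly the bytes the leader received,
    so each follower costs one extra copy of the ingress stream.
    """
    followers = replication_factor - 1
    per_follower = ingress_mbit
    return per_follower, per_follower * followers


per_follower, total = expected_replication_out(25.0, 3)
print(per_follower, total)   # expected: 25.0 to each follower, 50.0 in total
print(100.0 / total)         # observed total over expected total: 2.0
```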

One thing that could explain it, and that we don't really know how to
verify, is that the inter-node communication is not compressed. We aren't
sure what compression ratio we get on the incoming data, but 50% sounds
reasonable, and that would account for the factor of two. Could this
explain what we're seeing? Is there a configuration property to enable
compression on the replication traffic that we've missed?
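
As a rough check of that hypothesis (and only a hypothesis, we have not
confirmed that replication traffic is uncompressed), the sketch below works
out what the outbound traffic would be if the brokers forwarded uncompressed
data, and which compression ratio the observed numbers would imply. The
function names are made up for illustration, and "ratio" here means
compressed size divided by uncompressed size.

```python
def replication_out_if_uncompressed(ingress_mbit, replication_factor, ratio):
    """Total outbound replication traffic if followers got uncompressed data.

    ratio = compressed size / uncompressed size (0.5 means Snappy halves it).
    """
    uncompressed = ingress_mbit / ratio
    return uncompressed * (replication_factor - 1)


def implied_ratio(ingress_mbit, observed_out_mbit, replication_factor):
    """Compression ratio that would make the hypothesis match the observation."""
    return ingress_mbit * (replication_factor - 1) / observed_out_mbit


print(replication_out_if_uncompressed(25.0, 3, 0.5))  # 100.0
print(implied_ratio(25.0, 100.0, 3))                  # 0.5
```

With 25 Mbit in and a 50% ratio this gives the 100 Mbit we observe, so the
numbers are at least consistent with the idea.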

yours
Theo
