Hi there,
We are running into a strange situation when using MirrorMaker to replicate messages between Kafka clusters across datacenters, and we are reaching out in case you have encountered this kind of problem before or have some insight into it.

Here is the scenario. We have a deployment that runs 30 MirrorMaker instances on 30 different nodes. Each MirrorMaker instance is configured with num.streams=1, so only one consumer runs per instance. The topics to replicate are configured with 100 partitions, and data is almost evenly distributed across all partitions.

After running for a while, some of the MirrorMaker instances slow down and consume from the source Kafka cluster at a relatively low rate. The output of tcptrack shows that the consume rate of the problematic instances drops to ~1MB/s, while the healthy instances consume at ~3MB/s. As a result, the consumer lag for the corresponding partitions keeps growing.

After taking a tcpdump, we noticed that the traffic pattern in the TCP connections of the problematic MirrorMaker instances is very different from the others. Packets in the problematic TCP connections are relatively small, and seq and ack packets basically alternate one after another. In the healthy TCP connections the pattern is different: several seq packets arrive for each ack packet.

The screenshots below show the situation. Both captures were taken on the same MirrorMaker node.

Problematic connection (10.kfk.kfk.kfk is the Kafka broker, 10.mm.mm.mm is the MirrorMaker node):
https://imgur.com/Z3odjjT

Healthy connection:
https://imgur.com/w0A6qHT

If we stop the problematic MirrorMaker instance, the other instances that take over the lagged partitions consume messages quickly and soon catch up on the lag. So the brokers in the source Kafka cluster appear to be healthy.
But if MirrorMaker itself is causing the issue, how can one TCP connection be healthy while others are problematic, given that the connections are all established in the same way by the Kafka library?

The consumer configuration for the MirrorMaker instances is as follows:

auto.offset.reset=earliest
partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
heartbeat.interval.ms=10000
session.timeout.ms=120000
request.timeout.ms=150000
receive.buffer.bytes=1048576
max.partition.fetch.bytes=2097152
fetch.min.bytes=1048576

The Kafka version is 0.10.0.0, and both Kafka and MirrorMaker run on Ubuntu 14.04.

Any response is appreciated.

Regards,
Tao
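
P.S. To help rule out plain assignment skew, here is a rough sketch of how 100 partitions would be spread over our 30 single-consumer instances under a round-robin strategy. It is a simplified model and not the actual RoundRobinAssignor code: it assumes a single topic, and the instance names (mm-00 to mm-29) and the helper round_robin_assign are made up for illustration.

```python
from collections import Counter


def round_robin_assign(num_partitions, consumers):
    """Deal partitions to consumers in turn (simplified, single-topic model)."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        # Partition p goes to the next consumer in the cycle.
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# Hypothetical instance names; 30 instances with num.streams=1 each.
consumers = [f"mm-{i:02d}" for i in range(30)]
assignment = round_robin_assign(100, consumers)

# Count how many consumers own each number of partitions.
sizes = Counter(len(parts) for parts in assignment.values())
print(dict(sizes))
```

Under these assumptions, 10 instances own 4 partitions and 20 own 3, so the most loaded instance should carry only about a third more traffic than the least loaded one. That alone does not seem to account for the ~1MB/s versus ~3MB/s difference we see.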