Hi there,


We are running into a strange situation when using MirrorMaker to replicate
messages between Kafka clusters across datacenters, and we are reaching out in
case you have encountered this kind of problem before or have some insight
into it.



Here is the scenario. We have set up a deployment where we run 30
MirrorMaker instances on 30 different nodes. Each MirrorMaker instance is
configured with num.streams=1, so only one consumer runs per instance. The
topic to replicate is configured with 100 partitions, and data is almost
evenly distributed across all partitions. After running for a period of time,
some of the MirrorMaker instances seem to slow down and consume at a
relatively slow speed from the source Kafka cluster. The output of tcptrack
shows the consume rate of the problematic instances dropping to ~1MB/s, while
the healthy instances consume at a rate of ~3MB/s. As a result, the consumer
lag for the corresponding partitions keeps growing.
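
To rule out uneven partition assignment as the explanation, here is a
minimal sketch of how 100 partitions spread over 30 single-consumer
instances. It assumes a single topic and models round-robin assignment as
"partition p goes to consumer p mod 30", which is a simplification of what
RoundRobinAssignor does; the function and variable names are ours, for
illustration only.

```python
from collections import Counter

NUM_PARTITIONS = 100  # partitions in the replicated topic (from our setup)
NUM_CONSUMERS = 30    # one consumer per MirrorMaker instance (num.streams=1)


def round_robin_assignment(num_partitions, num_consumers):
    """Simplified model: partition p is owned by consumer p % num_consumers."""
    assignment = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        assignment[p % num_consumers].append(p)
    return assignment


assignment = round_robin_assignment(NUM_PARTITIONS, NUM_CONSUMERS)

# Map "number of partitions owned" -> "number of consumers owning that many".
load_histogram = Counter(len(parts) for parts in assignment.values())
print(dict(load_histogram))
```

Under this model, 10 instances own 4 partitions and 20 own 3, so uneven
assignment alone would give at most a 4:3 difference in consume rate. That
does not seem to account for the ~1MB/s versus ~3MB/s gap we observe.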



After capturing traffic with tcpdump, we noticed that the traffic pattern on
the TCP connections of the problematic MirrorMaker instances is very
different from the others. Packets flowing on the problematic TCP connections
are relatively small, and seq and ack packets basically alternate one after
another. On the healthy TCP connections the pattern is different: several seq
packets arrive for each ack packet. The screenshots below show the situation;
both captures were taken on the same MirrorMaker node.



Problematic connection (10.kfk.kfk.kfk is the Kafka broker, 10.mm.mm.mm is
the MirrorMaker node):

https://imgur.com/Z3odjjT


Healthy connection:

https://imgur.com/w0A6qHT


If we stop the problematic MirrorMaker instance and other instances take over
the lagged partitions, they consume messages quickly and catch up on the lag
soon. So the broker in the source Kafka cluster appears to be fine. But if
MirrorMaker itself causes the issue, how can one TCP connection be good while
others are problematic, given that the connections are all established in the
same manner by the Kafka library?



The consumer configuration for the MirrorMaker instances is as follows.

auto.offset.reset=earliest

partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor

heartbeat.interval.ms=10000

session.timeout.ms=120000

request.timeout.ms=150000

receive.buffer.bytes=1048576

max.partition.fetch.bytes=2097152

fetch.min.bytes=1048576
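
For reference, here is a rough sanity check of these values as a small Python
sketch. It relies on assumptions that are not part of our configuration:
that fetch.max.wait.ms is left at its documented default of 500 ms, that
heartbeat.interval.ms is typically kept at or below one third of
session.timeout.ms, and that request.timeout.ms should be larger than both
session.timeout.ms and fetch.max.wait.ms. The helper name and report keys are
ours, purely for illustration.

```python
import math

# Consumer settings copied from the list above.
consumer_config = {
    "heartbeat.interval.ms": 10000,
    "session.timeout.ms": 120000,
    "request.timeout.ms": 150000,
    "receive.buffer.bytes": 1048576,
    "max.partition.fetch.bytes": 2097152,
    "fetch.min.bytes": 1048576,
}

# Assumption: we do not set fetch.max.wait.ms, so the default (500 ms) applies.
FETCH_MAX_WAIT_MS = 500
NUM_PARTITIONS = 100  # from our setup
NUM_CONSUMERS = 30    # from our setup


def sanity_check(cfg, fetch_max_wait_ms, max_partitions):
    """Return a few derived numbers and rule-of-thumb checks for the config."""
    # Upper bound on one fetch response if every owned partition returns a
    # full max.partition.fetch.bytes chunk from the same broker.
    worst_case_fetch = max_partitions * cfg["max.partition.fetch.bytes"]
    return {
        "heartbeat_within_third_of_session":
            cfg["heartbeat.interval.ms"] * 3 <= cfg["session.timeout.ms"],
        "request_timeout_above_session_and_wait":
            cfg["request.timeout.ms"]
            > max(cfg["session.timeout.ms"], fetch_max_wait_ms),
        "worst_case_fetch_bytes": worst_case_fetch,
        "worst_case_fetch_exceeds_receive_buffer":
            worst_case_fetch > cfg["receive.buffer.bytes"],
        # Inflow (bytes/s) needed to fill fetch.min.bytes before the wait expires.
        "min_rate_to_fill_fetch_bytes_per_s":
            cfg["fetch.min.bytes"] * 1000 // fetch_max_wait_ms,
    }


# Largest number of partitions any single consumer can own in our setup.
max_partitions = math.ceil(NUM_PARTITIONS / NUM_CONSUMERS)
report = sanity_check(consumer_config, FETCH_MAX_WAIT_MS, max_partitions)
for name, value in report.items():
    print(f"{name}: {value}")
```

Under these assumptions the timeout relations hold, so nothing looks wrong
there. The fetch.min.bytes figure should only matter for a consumer that has
already caught up, since a lagging consumer normally has more than 1MB
available immediately; we mention it only to show what we have already
considered.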



The Kafka version is 0.10.0.0, and both Kafka and MirrorMaker run on Ubuntu
14.04.



Any response is appreciated.

Regards,

Tao
