Hi, any pointers would be highly appreciated.

On Thu, 30 Nov 2017 at 14:56 tao xiao <xiaotao...@gmail.com> wrote:

> Hi There,
>
> We are running into a weird situation when using Mirrormaker to
> replicate messages between Kafka clusters across datacenters, and are
> reaching out for help in case you have encountered this kind of
> problem before or have some insight into it.
>
> Here is the scenario. We have set up a deployment where we run 30
> Mirrormaker instances on 30 different nodes. Each Mirrormaker instance
> is configured with num.streams=1, so only one consumer runs per
> instance. The topics to replicate are configured with 100 partitions,
> and data is almost evenly distributed across all partitions. After
> running for a period of time, some of the Mirrormaker instances seem
> to slow down and consume at a relatively slow speed from the source
> Kafka cluster. The output of tcptrack shows the consume rate of the
> problematic instances dropped to ~1MB/s, while the other healthy
> instances consume at a rate of ~3MB/s. As a result, the consumer lag
> for the corresponding partitions keeps growing.
>
> After triggering a tcpdump, we noticed that the traffic pattern in the
> TCP connections of the problematic Mirrormaker instances is very
> different from the others. Packets flowing in the problematic TCP
> connections are relatively small, and seq and ack packets basically
> come in one after another. In the healthy TCP connections, on the
> other hand, several seq packets come with one ack packet. The
> screenshots below show the situation; both captures were taken on the
> same Mirrormaker node.
>
> Problematic connection (PS: 10.kfk.kfk.kfk is the Kafka broker,
> 10.mm.mm.mm is the Mirrormaker node):
>
> https://imgur.com/Z3odjjT
>
> Healthy connection:
>
> https://imgur.com/w0A6qHT
>
> If we stop the problematic Mirrormaker instance, the other instances
> that take over the lagged partitions can consume messages quickly and
> soon catch up on the lag. So the broker in the source Kafka cluster
> appears to be fine. But if Mirrormaker itself causes the issue, how
> can one TCP connection be good while others are problematic, given
> that the connections are all established in the same manner by the
> Kafka library?
>
> The consumer configuration for the Mirrormaker instances is as
> follows:
>
> auto.offset.reset=earliest
> partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
> heartbeat.interval.ms=10000
> session.timeout.ms=120000
> request.timeout.ms=150000
> receive.buffer.bytes=1048576
> max.partition.fetch.bytes=2097152
> fetch.min.bytes=1048576
>
> Kafka version is 0.10.0.0, and Kafka and Mirrormaker run on Ubuntu
> 14.04.
>
> Any response is appreciated.
>
> Regards,
> Tao
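
For anyone picking this up, here is a rough sketch of how 100 partitions spread over 30 single-stream consumers when they are dealt out round-robin. It assumes a single topic and that all 30 consumers are members of the same group. The helper `round_robin_assign` is hypothetical and only imitates the dealing order; it is not Kafka's RoundRobinAssignor code.

```python
from collections import Counter


def round_robin_assign(num_partitions, num_consumers):
    """Deal partitions out to consumers one at a time, in order.

    Hypothetical helper for illustration only; it assumes one topic and
    a fixed consumer ordering.
    """
    assignment = {consumer: [] for consumer in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


# Numbers from the setup described above: 100 partitions, 30 instances
# with num.streams=1 (one consumer each).
assignment = round_robin_assign(100, 30)

# Map "partitions owned" -> "number of consumers owning that many".
load = Counter(len(partitions) for partitions in assignment.values())
for owned, consumers in sorted(load.items()):
    print(f"{consumers} consumers own {owned} partitions")
```

Under these assumptions, 10 instances own 4 partitions and 20 own 3, so with evenly distributed data the uneven share alone would explain at most a 4:3 difference in consume rate, not the roughly 3:1 gap seen in tcptrack.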