Hey everyone,

Here's my crosspost from IRC.

Our setup:
3 Kafka 0.8.2 brokers with ZooKeeper on powerful hardware (20 cores and 27 log disks each). We use a handful of topics, but only one of them is heavily used. It has a replication factor of 2 and 600 partitions.
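
To put rough numbers on that layout, here is a back-of-the-envelope sketch for the busy topic. It assumes replicas are spread evenly across brokers and log dirs (an assumption, not something measured), and the variable names are only for illustration.

```python
# Rough replica layout for the heavily used topic.
# Assumption: replicas are distributed evenly over brokers and log dirs.
brokers = 3
partitions = 600
replication_factor = 2
log_dirs_per_broker = 27

total_replicas = partitions * replication_factor           # replicas cluster-wide
replicas_per_broker = total_replicas // brokers            # replicas hosted per broker
replicas_per_log_dir = replicas_per_broker / log_dirs_per_broker

print(total_replicas, replicas_per_broker, round(replicas_per_log_dir, 1))
```

Under that assumption, each broker hosts about 400 replicas of this topic, roughly 15 per log dir, and all of them have to rejoin the ISR after a restart.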

Our issue:
If one Kafka broker was down, it takes a very long time (from 1 to more than 10 hours) until all partitions show their full ISR again. This seems to depend heavily on the amount of data in the log.dirs (I have configured 27 threads, one for each dir, each on its own drive).
It takes this long even though there is NO data flowing into Kafka.

We seem to be missing something critical here. It might be some option that is set wrong, or maybe we are thinking about this the wrong way and it's not critical to have the replicas in sync.

Any pointers would be great.

Cheers
Jörg
