Painfully slow kafka recovery

Jörg Wagner Fri, 21 Aug 2015 06:46:34 -0700

Hey everyone,

Here's my crosspost from IRC.


Our setup:

3 Kafka 0.8.2 brokers with ZooKeeper, powerful hardware (20 cores, 27 log disks each). We use a handful of topics, but only one topic is utilized heavily. It has a replication factor of 2 and 600 partitions.


Our issue:

After one Kafka broker has been down, it takes a very long time (from 1 to more than 10 hours) until all partitions show a complete ISR again. This seems to depend heavily on the amount of data in the log.dirs (I have configured 27 threads, one for each dir, each on its own drive).

This all takes this long while there is NO data flowing into kafka.
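
For reference, here is a rough sketch of one way to check from the outside which partitions are still missing replicas from their ISR. It assumes the tab-separated text layout that `kafka-topics.sh --describe` prints (one summary line per topic, then one line per partition with `Replicas:` and `Isr:` fields). The topic name, broker ids and sample output below are made up for illustration, and the helper `under_replicated` is a hypothetical name, not part of Kafka.

```python
import re

# Matches per-partition lines such as:
#   Topic: events  Partition: 1  Leader: 2  Replicas: 2,3  Isr: 2
# The topic summary line ("Topic:events  PartitionCount:3 ...") does not match.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*\S+\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def under_replicated(describe_output):
    """Return (topic, partition, missing_broker_ids) for every partition
    whose ISR does not contain all of its assigned replicas."""
    result = []
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # summary line or unrelated output
        replicas = set(match.group("replicas").split(","))
        isr = set(filter(None, match.group("isr").split(",")))
        missing = sorted(replicas - isr)
        if missing:
            result.append(
                (match.group("topic"), int(match.group("partition")), missing)
            )
    return result


# Made-up sample in the assumed layout: partition 1 is missing broker 3.
SAMPLE = """\
Topic:events\tPartitionCount:3\tReplicationFactor:2\tConfigs:
\tTopic: events\tPartition: 0\tLeader: 1\tReplicas: 1,2\tIsr: 1,2
\tTopic: events\tPartition: 1\tLeader: 2\tReplicas: 2,3\tIsr: 2
\tTopic: events\tPartition: 2\tLeader: 3\tReplicas: 3,1\tIsr: 3,1
"""

if __name__ == "__main__":
    for topic, partition, missing in under_replicated(SAMPLE):
        print(topic, partition, "missing from ISR:", ",".join(missing))
```

Running something like this periodically against the real describe output would give a count of partitions still catching up, which makes it easier to see whether recovery is progressing steadily or stalling.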

We seem to be missing something critical here. It might be an option that is set wrong, or perhaps we are thinking about this the wrong way and it's not critical to have the replicas in sync?


Any pointers would be great.

Cheers
Jörg
