One of my Kafka clusters (3 brokers,
default.replication.factor=2) had been working fine until yesterday.
The message volume was pretty low. There were no obvious problems except....

The first symptom was *kafka-consumer-groups.sh* failing with an empty.head

When I used *kafka-topics --describe* I saw that one of the brokers was no
longer part of the appropriate ISRs.
Restarting that broker did not appear to solve the problem.
In fact, I got the impression that the broker was temporarily in the ISR
and then left again.
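
To catch the broker dropping out of the ISR again, something like the following could be run periodically against the output of *kafka-topics --describe*. It is only a rough sketch: it assumes the usual tab-separated "Topic: / Partition: / Leader: / Replicas: / Isr:" line layout, and the function name is made up for illustration.

```python
import re


def under_replicated(describe_output):
    """Return (topic, partition, missing_broker_ids) for every partition
    whose ISR is smaller than its replica list.

    Assumes the text layout printed by `kafka-topics --describe`.
    """
    results = []
    for line in describe_output.splitlines():
        # Turn "Key: value" pairs on the line into a dict.
        fields = dict(re.findall(r"(\w+): ?(\S*)", line))
        # The per-topic summary line has no Partition/Isr fields; skip it.
        if "Partition" not in fields or "Isr" not in fields:
            continue
        replicas = set(fields.get("Replicas", "").split(",")) - {""}
        isr = set(fields["Isr"].split(",")) - {""}
        missing = replicas - isr
        if missing:
            results.append(
                (fields["Topic"], int(fields["Partition"]), sorted(missing))
            )
    return results
```

Feeding it the captured describe output (for example via a cron job that logs the result with a timestamp) would show exactly when, and for which partitions, a broker leaves the ISR.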

I think I restarted each broker and eventually things returned to normal.
The problem then reoccurred a couple of hours later.

During this time, I also had a problem with one of the Kafka

ERROR kafka.consumer.ConsumerFetcherThread -
Current offset 5488 for partition [mytopic,5] out of range; reset offset to

This topic partition had >5488 messages so the offset was definitely not
out of range. The result was that the consumer reprocessed old messages.
The lag as reported by kafka-consumer-groups.sh went from <10 to > 2500

Thoughts? Recommendations for debugging this problem when it occurs again?


