Hi, One of my Kafka 0.9.0.1 clusters (3 brokers, default.replication.factor=2) had been working fine until yesterday. The message volume was pretty low. There were no obvious problems except...
The first symptom was *kafka-consumer-groups.sh* failing with an empty.head exception. When I ran *kafka-topics --describe*, I saw that one of the brokers was no longer part of the appropriate ISRs. Restarting that broker did not appear to solve the problem. In fact, I got the impression that the broker was temporarily in the ISR and then left again. I think I restarted each broker and eventually things returned to normal. The problem then recurred a couple of hours later.

During this time, I also had a problem with one of the Kafka 0.8.2.1 clients:

ERROR kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-mytopic-consumer-81a939d49903-1474231957102-90b6fc16-0-6], Current offset 5488 for partition [mytopic,5] out of range; reset offset to 2340\n","stream":"stdout","time":"2016-09-18T21:13:54.151815047Z"}

This topic partition had >5488 messages, so the offset was definitely not out of range. The result was that the consumer reprocessed old messages. The lag as reported by kafka-consumer-groups.sh went from <10 to >2500.

Thoughts? Recommendations for debugging this problem when it occurs again?

Chris