Hi folks,

Recently we ran into an odd issue where the latest offset of some partitions becomes 0. Here is a snapshot from Kafka Manager. As you can see, the latest offset of partitions 2 and 3 has become zero.

*Partition*  *Latest Offset*  *Leader*  *Replicas*  *In Sync Replicas*  *Preferred Leader?*  *Under Replicated?*
0  25822061  3 <http://10.1.49.4:9000/clusters/ppe/brokers/3>  (3,4,5)  (3,5,4)  true   false
1  25822388  4 <http://10.1.49.4:9000/clusters/ppe/brokers/4>  (4,5,1)  (4,1,5)  true   false
2  0         2 <http://10.1.49.4:9000/clusters/ppe/brokers/2>  (5,1,2)  (2)      false  true
3  0         2 <http://10.1.49.4:9000/clusters/ppe/brokers/2>  (1,2,3)  (3,2)    false  true

On the Kafka controller node, I saw errors like the one below in the state-change log. The timing seems to match, but I am not sure whether it is related.

[2016-04-14 19:59:21,800] ERROR Controller 3 epoch 74174 initiated state change for partition [topic,2] from OnlinePartition to OnlinePartition failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [topic,2] due to: Preferred replica 1 for partition [topic,2] is either not alive or not in the isr. Current leader and ISR: [{"leader":2,"leader_epoch":169,"isr":[2]}].

When this happens, all of the partitions with a zero latest offset stop receiving new data. After we restarted the controller, everything went back to normal.

Has anyone seen a similar issue before, and do you have any idea about the root cause? What other information would you suggest collecting to get to the root cause?

Thanks,
Qi
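
P.S. In case it helps with collecting more data, here is a minimal Python sketch for pulling the relevant fields out of state-change log lines like the one quoted above. The regular expression is inferred from that single log line, so the pattern is an assumption and may not match other Kafka versions. The function name `parse_election_failure` is made up for illustration and is not part of any Kafka tooling.

```python
import json
import re

# Pattern inferred from the one error line quoted in this mail (an assumption,
# not an official log format): it captures the preferred replica, the
# topic/partition, and the JSON blob after "Current leader and ISR:".
PATTERN = re.compile(
    r"Preferred replica (?P<preferred>\d+) for partition "
    r"\[(?P<topic>[^,\]]+),(?P<partition>\d+)\]"
    r".*?Current leader and ISR: \[(?P<state>\{.*?\})\]"
)


def parse_election_failure(line):
    """Return a dict describing a failed preferred-replica election, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    state = json.loads(match.group("state"))
    preferred = int(match.group("preferred"))
    return {
        "topic": match.group("topic"),
        "partition": int(match.group("partition")),
        "preferred": preferred,
        "leader": state["leader"],
        "leader_epoch": state["leader_epoch"],
        "isr": state["isr"],
        # True only if the preferred replica is listed in the ISR.
        "preferred_in_isr": preferred in state["isr"],
    }


if __name__ == "__main__":
    sample = (
        "kafka.common.StateChangeFailedException: encountered error while "
        "electing leader for partition [topic,2] due to: Preferred replica 1 "
        "for partition [topic,2] is either not alive or not in the isr. "
        'Current leader and ISR: [{"leader":2,"leader_epoch":169,"isr":[2]}].'
    )
    print(parse_election_failure(sample))
```

Running this over the whole state-change log and grouping the results by partition should show which partitions repeatedly fail the preferred-replica election and what their ISR looked like at the time.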