Hi folks,
We recently ran into an odd issue where the latest offset of some
partitions becomes 0. Here's a snapshot from Kafka Manager. As you can see,
the latest offset of partitions 2 and 3 is zero.

Partition  Latest Offset  Leader  Replicas  In Sync Replicas  Preferred Leader?  Under Replicated?
0          25822061       3       (3,4,5)   (3,5,4)           true               false
1          25822388       4       (4,5,1)   (4,1,5)           true               false
2          0              2       (5,1,2)   (2)               false              true
3          0              2       (1,2,3)   (3,2)             false              true

(Each leader links to http://10.1.49.4:9000/clusters/ppe/brokers/<leader id>.)

On the Kafka controller node, I saw errors like the one below in the
state-change log. The timing seems to match, but I'm not sure whether it's
related.

[2016-04-14 19:59:21,800] ERROR Controller 3 epoch 74174 initiated state
change for partition [topic,2] from OnlinePartition to OnlinePartition
failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing
leader for partition [topic,2] due to: Preferred replica 1 for partition
[topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":2,"leader_epoch":169,"isr":[2]}].
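In case it helps with collecting more of these, here is a rough sketch of how such errors could be pulled out of the state-change log and checked against the ISR. It only parses the message format shown above; `parse_election_failure`, `FAILURE_RE`, and `SAMPLE` are hypothetical names made up for this example, not part of Kafka, and other Kafka versions may word the message differently.

```python
import json
import re

# Hypothetical helper (not part of Kafka): matches the
# "Preferred replica ... not in the isr" error from the state-change log.
FAILURE_RE = re.compile(
    r"Preferred replica (?P<preferred>\d+) for partition "
    r"\[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"is either not alive or not in the isr\. "
    r"Current leader and ISR: \[(?P<state>\{.*?\})\]"
)


def parse_election_failure(message):
    """Return the parsed fields as a dict, or None if the text does not match."""
    # The log entry may be wrapped over several lines, so collapse whitespace.
    text = " ".join(message.split())
    match = FAILURE_RE.search(text)
    if match is None:
        return None
    state = json.loads(match.group("state"))
    preferred = int(match.group("preferred"))
    return {
        "topic": match.group("topic"),
        "partition": int(match.group("partition")),
        "preferred_replica": preferred,
        "leader": state["leader"],
        "leader_epoch": state["leader_epoch"],
        "isr": state["isr"],
        # False means the preferred replica has dropped out of the ISR.
        "preferred_in_isr": preferred in state["isr"],
    }


# The exception text from the log entry above, wrapped as in this mail.
SAMPLE = """kafka.common.StateChangeFailedException: encountered error while electing
leader for partition [topic,2] due to: Preferred replica 1 for partition
[topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":2,"leader_epoch":169,"isr":[2]}]."""

if __name__ == "__main__":
    print(parse_election_failure(SAMPLE))
```

Running this over the whole state-change log would give the list of partitions whose preferred replica has left the ISR, which could then be compared with the partitions showing a zero latest offset.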


When this happens, all the partitions with a zero latest offset stop
receiving new data. After we restarted the controller, everything went
back to normal.

Have you seen a similar issue before, and do you have any idea about the
root cause? What other information would you suggest collecting to get to
the root cause?

Thanks,
Qi
