Killing last replica for partition doesn't change ISR/Leadership if replica is running controller

Alex Demidko Tue, 13 May 2014 12:06:09 -0700

Hi,

Kafka version is 0.8.1.1. We have three machines: A, B, C. Let’s say there is a 
topic with replication factor 2 and one of its partitions, partition 1, is 
placed on brokers A and B. If broker A is already down, then for partition 1 we 
have: Leader: B, ISR: [B]. If the current controller is node C, then killing 
broker B will turn partition 1 into the state: Leader: -1, ISR: []. But if the 
current controller is node B, then killing it won’t update leadership/ISR for 
partition 1 even after the controller is restarted on node C, so partition 1 
will forever think its leader is node B, which is dead.
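
To make the two cases concrete, here is a toy model in Python. It is not Kafka 
code: the function names and the dict layout are assumptions made up for 
illustration, and it only reproduces the outcomes reported above.

```python
# Toy model of the scenario described above. This is not Kafka code; all
# function names and the state layout are assumptions for illustration.

def on_broker_failure(state, dead_broker):
    """A surviving controller reacts to a broker death (controller on C)."""
    isr = [b for b in state["isr"] if b != dead_broker]
    leader = state["leader"]
    if leader == dead_broker:
        # Pick another in-sync replica if one exists, otherwise no leader.
        leader = isr[0] if isr else -1
    return {"leader": leader, "isr": isr}


def on_controller_failover(state):
    """A new controller takes over after the old one died (controller was on B).
    Models the reported behavior: leader and ISR are left as they were."""
    return dict(state)


partition_1 = {"leader": "B", "isr": ["B"]}  # broker A is already down

print(on_broker_failure(partition_1, "B"))   # case 1: controller on C
print(on_controller_failover(partition_1))   # case 2: controller was on B
```

In the first case the model ends with leader -1 and an empty ISR; in the second 
the stale leader B is kept, which is the state we observe.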


It looks like KafkaController.onBrokerFailure handles the situation when the 
broker that went down is the partition leader: it sets the new leader value to 
-1. In contrast, KafkaController.onControllerFailover never removes the leader 
from a partition with all replicas offline, apparently because the partition 
gets into the ReplicaDeletionIneligible state. Is this intended behavior?

This behavior affects DefaultEventHandler.getPartition in the null key case: 
it can’t detect that partition 1 has no leader, and this results in event send 
failures.
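
A simplified sketch of why the stale leader matters, assuming the null key path 
picks at random among partitions whose metadata reports a leader. The function 
name choose_partition and the use of -1 for "no leader" are illustrative 
assumptions, not the actual producer code.

```python
import random

# Simplified model of null-key partition selection; not the real
# DefaultEventHandler. Names and the -1 convention are assumptions.

def choose_partition(leaders, rng=random):
    """leaders maps partition id -> leader broker id, or -1 if none is known."""
    available = [p for p, leader in sorted(leaders.items()) if leader != -1]
    if not available:
        raise RuntimeError("no partition with a known leader")
    return rng.choice(available)


# Correct metadata: partition 1 is skipped because it has no leader.
correct = {0: "C", 1: -1}

# Stale metadata: partition 1 still claims dead broker B as its leader,
# so it stays in the candidate list and sends to it will fail.
stale = {0: "C", 1: "B"}
```

With the correct metadata only partition 0 is ever chosen. With the stale 
metadata partition 1 remains a candidate even though nobody can serve it.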


What we are trying to achieve is to be able to write data even if some 
partitions have lost all replicas, which is a rare yet still possible scenario. 
Using a null key looked suitable with minor DefaultEventHandler modifications 
(like getting rid of DefaultEventHandler.sendPartitionPerTopicCache to avoid 
caching and uneven event distribution), as we neither use log compaction nor 
rely on partitioning of the data. We had such behavior with Kafka 0.7: if a 
node is down, simply produce to a different one.
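
Roughly, the behavior we want could be sketched like this. It is a minimal 
illustration only: send_with_failover and the try_send callback are 
hypothetical names, and nothing here is an existing Kafka API.

```python
import random

# Sketch of "if a node is down, produce to a different one".
# send_with_failover and try_send are hypothetical, not Kafka APIs.

def send_with_failover(partitions, try_send, rng=random):
    """Try partitions in a fresh random order until one accepts the message.

    try_send(partition) returns True on success and False on failure.
    No choice is cached between calls, so load spreads across partitions.
    """
    order = list(partitions)
    rng.shuffle(order)
    for partition in order:
        if try_send(partition):
            return partition
    raise RuntimeError("all partitions failed")
```

For example, with partitions [0, 1, 2] and a try_send that fails only for 
partition 1, every call lands on partition 0 or 2.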


Thanks, 
Alex
