Igor Soarez created KAFKA-15650:
-----------------------------------

             Summary: Data-loss on leader shutdown right after partition creation?
                 Key: KAFKA-15650
                 URL: https://issues.apache.org/jira/browse/KAFKA-15650
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Igor Soarez

As per KIP-858, when a replica is created, the broker selects a log directory to host the replica and queues the propagation of the directory assignment to the controller. The replica becomes active immediately; it is not blocked until the controller confirms the metadata change. If the replica is the leader replica, it can start accepting writes right away.

Consider the following scenario:
# A partition is created in some selected log directory, and some produce traffic is accepted.
# Before the broker is able to notify the controller of the directory assignment, the broker shuts down.
# Upon coming back online, the broker has an offline directory, the same directory that was chosen to host the replica.
# The broker assumes leadership for the replica, but cannot find it in any available directory and has no way of knowing it was already created, because the directory assignment is still missing.
# The replica is created again and the previously produced records are lost.

Step 4 may seem unlikely because ISR membership gates leadership, but even assuming acks=all and more than one replica, the broker may still gain leadership if all other replicas are also offline. Perhaps KIP-966 is relevant here.

We may need to delay new replica activation until the assignment is propagated successfully.
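
To make the race easier to reason about, the scenario can be replayed in a toy model. This is only a sketch: `DirAssignmentRace`, `Broker`, `Outcome`, `lostRecords` and the `delayActivation` flag are hypothetical names invented for illustration, not Kafka classes or configs, and the single-process model ignores replication, ISR and the real controller RPCs. It assumes that the pending assignment lives only in broker memory and that a restarted broker recreates any replica it cannot locate on disk or in controller metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the scenario above; none of these types exist in Kafka. */
public class DirAssignmentRace {
    enum Outcome { FOUND, KNOWN_OFFLINE, RECREATED_EMPTY }

    static final class Broker {
        // On-disk state, survives restarts: directory -> partition -> records.
        final Map<String, Map<String, List<String>>> disks = new HashMap<>();
        // Directories the broker can currently use.
        final Set<String> online = new HashSet<>();
        // Controller metadata, survives broker restarts: partition -> directory.
        final Map<String, String> controllerView = new HashMap<>();
        // Assignments queued but not yet sent; in memory only, lost on shutdown.
        final Map<String, String> pending = new HashMap<>();
        // Hypothetical mitigation: reject writes until the assignment is propagated.
        final boolean delayActivation;

        Broker(boolean delayActivation, String... dirs) {
            this.delayActivation = delayActivation;
            for (String d : dirs) {
                disks.put(d, new HashMap<>());
                online.add(d);
            }
        }

        void createReplica(String partition, String dir) {
            disks.get(dir).put(partition, new ArrayList<>());
            pending.put(partition, dir); // propagation is only queued here
        }

        void propagateAssignments() {
            controllerView.putAll(pending);
            pending.clear();
        }

        /** Returns true if the write was accepted. */
        boolean produce(String partition, String record) {
            if (delayActivation && pending.containsKey(partition)) {
                return false; // replica not active yet
            }
            for (String d : online) {
                List<String> log = disks.get(d).get(partition);
                if (log != null) {
                    log.add(record);
                    return true;
                }
            }
            return false;
        }

        void restart(String failedDir) {
            pending.clear();          // queued assignments are gone
            online.remove(failedDir); // the directory is offline after restart
        }

        Outcome becomeLeader(String partition) {
            for (String d : online) {
                if (disks.get(d).containsKey(partition)) return Outcome.FOUND;
            }
            String assigned = controllerView.get(partition);
            if (assigned != null && !online.contains(assigned)) {
                return Outcome.KNOWN_OFFLINE; // known to live in an offline directory
            }
            // No trace of the replica anywhere: create it again, empty.
            String target = online.iterator().next();
            disks.get(target).put(partition, new ArrayList<>());
            return Outcome.RECREATED_EMPTY;
        }
    }

    /** Replays steps 1-5; returns how many accepted records are gone afterwards. */
    public static int lostRecords(boolean delayActivation) {
        Broker b = new Broker(delayActivation, "d1", "d2");
        b.createReplica("t-0", "d1");
        int accepted = b.produce("t-0", "r1") ? 1 : 0;
        b.restart("d1"); // shutdown before propagateAssignments() ever ran
        Outcome o = b.becomeLeader("t-0");
        return o == Outcome.RECREATED_EMPTY ? accepted : 0;
    }

    public static void main(String[] args) {
        System.out.println("immediate activation, lost = " + lostRecords(false));
        System.out.println("delayed activation, lost = " + lostRecords(true));
    }
}
```

In this model, immediate activation loses the one accepted record, while delayed activation never accepts it, so nothing acknowledged is lost. Had `propagateAssignments()` run before the shutdown, `becomeLeader` would return `KNOWN_OFFLINE` instead of recreating the replica.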