Jason Gustafson created KAFKA-13944:
---------------------------------------

             Summary: Shutting down broker can be elected as partition leader in KRaft
                 Key: KAFKA-13944
                 URL: https://issues.apache.org/jira/browse/KAFKA-13944
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Gustafson


When a broker requests shutdown, it transitions to the CONTROLLED_SHUTDOWN
state in the controller. The broker can remain unfenced in this state until
the controlled shutdown completes. During an election, the only thing we
generally check is that the broker is unfenced, so we can elect a broker that
is already in controlled shutdown.
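
The gap described above can be illustrated with a minimal sketch. This is not the controller's actual code: the class LeaderElectionSketch, the Broker type, its fields, and the electLeader method are all hypothetical names made up for this example. It only contrasts an "unfenced only" eligibility check with one possible stricter check that also skips brokers in controlled shutdown, which is not necessarily the fix that gets adopted.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical model for illustration only; these are not Kafka's controller classes.
final class LeaderElectionSketch {

    /** Simplified view of what the controller knows about a broker. */
    static final class Broker {
        final int id;
        final boolean fenced;
        final boolean inControlledShutdown;

        Broker(int id, boolean fenced, boolean inControlledShutdown) {
            this.id = id;
            this.fenced = fenced;
            this.inControlledShutdown = inControlledShutdown;
        }
    }

    /**
     * Picks the first eligible replica as leader.
     * With excludeShuttingDown == false, only fencing is checked (the behavior
     * described in this issue). With true, brokers in controlled shutdown are
     * skipped as well (an assumed stricter rule).
     */
    static OptionalInt electLeader(List<Broker> replicas, boolean excludeShuttingDown) {
        for (Broker broker : replicas) {
            if (broker.fenced) {
                continue; // fenced brokers are never eligible
            }
            if (excludeShuttingDown && broker.inControlledShutdown) {
                continue; // stricter rule: do not hand leadership to a departing broker
            }
            return OptionalInt.of(broker.id);
        }
        return OptionalInt.empty(); // no eligible replica, i.e. leader -1
    }

    public static void main(String[] args) {
        // Single replica on broker 2, which is unfenced but in controlled shutdown.
        List<Broker> replicas = List.of(new Broker(2, false, true));

        System.out.println(electLeader(replicas, false)); // OptionalInt[2]
        System.out.println(electLeader(replicas, true));  // OptionalInt.empty
    }
}
```

With the fencing-only check, the shutting-down broker 2 is chosen again; with the stricter check, the partition stays leaderless until another replica becomes eligible.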

Here are a few snippets from a recent system test in which this occurred:
{code:java}
// broker 2 starts controlled shutdown
[2022-05-26 21:17:26,451] INFO [Controller 3001] Unfenced broker 2 has 
requested and been granted a controlled shutdown. 
(org.apache.kafka.controller.BrokerHeartbeatManager)
 
// there is only one replica, so we set leader to -1
[2022-05-26 21:17:26,452] DEBUG [Controller 3001] partition change for _foo-1 
with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: 2 -> -1, leaderEpoch: 0 -> 1, 
partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)

// controlled shutdown cannot complete immediately
[2022-05-26 21:17:26,529] DEBUG [Controller 3001] The request from broker 2 to 
shut down can not yet be granted because the lowest active offset 177 is not 
greater than the broker's shutdown offset 244. 
(org.apache.kafka.controller.BrokerHeartbeatManager)
[2022-05-26 21:17:26,530] DEBUG [Controller 3001] Updated the controlled 
shutdown offset for broker 2 to 244. 
(org.apache.kafka.controller.BrokerHeartbeatManager)

// later on we elect leader 2 again
[2022-05-26 21:17:27,703] DEBUG [Controller 3001] partition change for _foo-1 
with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: -1 -> 2, leaderEpoch: 1 -> 2, 
partitionEpoch: 1 -> 2 (org.apache.kafka.controller.ReplicationControlManager)

// now controlled shutdown is stuck because of the newly elected leader
[2022-05-26 21:17:28,531] DEBUG [Controller 3001] Broker 2 is in controlled 
shutdown state, but can not shut down because more leaders still need to be 
moved. (org.apache.kafka.controller.BrokerHeartbeatManager)
{code}



