[ https://issues.apache.org/jira/browse/KAFKA-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma resolved KAFKA-6879.
--------------------------------
    Resolution: Fixed

> Controller deadlock following session expiration
> ------------------------------------------------
>
>                 Key: KAFKA-6879
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6879
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.1.0
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 2.0.0, 1.1.1
>
>
> We have observed an apparent deadlock that occurs following a session
> expiration. The suspected deadlock is between the ZooKeeper
> "initializationLock" and the latch inside the Expire event, which we use to
> ensure all events have been handled.
> In the logs, we see the "Session expired" message following acquisition of
> the initialization lock:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/zookeeper/ZooKeeperClient.scala#L358
> But we never see any logs indicating that the new session is being
> initialized. In fact, the controller logs are essentially empty from that
> point on. We suspect the problem is that the
> {{beforeInitializingSession}} callback cannot complete until all events have
> finished processing and the latch has been counted down:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L1525.
> However, an event that was dequeued just before the write lock was acquired
> may be unable to complete, because it is itself waiting to acquire the
> initialization lock:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/zookeeper/ZooKeeperClient.scala#L137.
> The impact is that the broker continues in a zombie state. It keeps
> fetching and is periodically added to ISRs, but it never receives any further
> requests from the controller since it is not registered.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
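The lock/latch cycle described in KAFKA-6879 can be reproduced in miniature with plain `java.util.concurrent` primitives. This is a hypothetical sketch, not Kafka's code: it assumes the initialization lock behaves like a `ReentrantReadWriteLock` (ZooKeeper requests take the read side, session re-initialization takes the write side) and models the Expire event's latch as a `CountDownLatch`. The class, method, and variable names are invented for illustration, and a timeout is added so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ExpireDeadlockSketch {

    // Returns true if the (hypothetical) expiry handler saw the in-flight event
    // finish before the timeout, false if the two threads were stuck on each other.
    static boolean simulateExpiry(boolean holdWriteLockWhileWaiting, long timeoutMs) {
        try {
            return runScenario(holdWriteLockWhileWaiting, timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    private static boolean runScenario(boolean holdWriteLockWhileWaiting, long timeoutMs)
            throws InterruptedException {
        // Stand-ins for ZooKeeperClient's initializationLock and the Expire event's latch.
        ReentrantReadWriteLock initializationLock = new ReentrantReadWriteLock();
        CountDownLatch eventsProcessed = new CountDownLatch(1);
        CountDownLatch eventDequeued = new CountDownLatch(1);

        // Controller event thread: an event already dequeued that must issue a
        // ZooKeeper request (read lock) before it can finish and count down the latch.
        Thread eventThread = new Thread(() -> {
            try {
                eventDequeued.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            initializationLock.readLock().lock(); // blocks while the write lock is held
            try {
                // a ZooKeeper request would be sent here
            } finally {
                initializationLock.readLock().unlock();
            }
            eventsProcessed.countDown(); // lets the expiry handler proceed
        });
        eventThread.start();

        boolean drained;
        if (holdWriteLockWhileWaiting) {
            // Ordering suspected in the report: write lock first, then wait for events.
            initializationLock.writeLock().lock();
            try {
                eventDequeued.countDown();
                // Real code would wait forever here; the timeout only ends the demo.
                drained = eventsProcessed.await(timeoutMs, TimeUnit.MILLISECONDS);
            } finally {
                initializationLock.writeLock().unlock();
            }
        } else {
            // Illustrative alternative: wait for in-flight events before taking the write lock.
            eventDequeued.countDown();
            drained = eventsProcessed.await(timeoutMs, TimeUnit.MILLISECONDS);
            initializationLock.writeLock().lock();
            try {
                // the new session would be initialized here
            } finally {
                initializationLock.writeLock().unlock();
            }
        }
        eventThread.join(); // the event thread finishes once the write lock is released
        return drained;
    }

    public static void main(String[] args) {
        System.out.println("write lock held while waiting: " + simulateExpiry(true, 500));
        System.out.println("events drained first: " + simulateExpiry(false, 5000));
    }
}
```

Running `main` should print `false` for the first scenario, because the latch can only be counted down after the event thread obtains the read lock, which the waiting thread itself is blocking, and `true` for the second. The second ordering is just one way to break the cycle for illustration; it is not necessarily how the fix for this issue was implemented.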