[ 
https://issues.apache.org/jira/browse/KAFKA-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated KAFKA-6618:
----------------------------
    Issue Type: Bug  (was: Improvement)

> Prevent two controllers from updating znodes concurrently
> ---------------------------------------------------------
>
>                 Key: KAFKA-6618
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6618
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>            Priority: Major
>
> Kafka controller may fail to function properly (even after repeated 
> controller movement) due to the following sequence of events:
> - User requests topic deletion
> - Controller A deletes the partition znode
> - Controller B becomes controller and reads the topic znode
> - Controller A deletes the topic znode and remove the topic from the topic 
> deletion znode
> - Controller B reads the partition znode and topic deletion znode
> - According to controller B's context, the topic znode exists, the topic is 
> not listed for deletion, and some partition is not found for the given topic. 
> Then controller B will create topic znode with empty data (i.e. partition 
> assignment) and create the partition znodes.
> - All controller after controller B will fail because there is not data in 
> the topic znode.
> The long term solution is to have a way to prevent old controller from 
> writing to zookeeper if it is not the active controller. One idea is to use 
> the zookeeper multi API (See 
> [https://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi(java.lang.Iterable))]
>  such that controller only writes to zookeeper if the zk version of the 
> controller znode has not been changed.
> The short term solution is to let controller reads the topic deletion znode 
> first. If the topic is still listed in the topic deletion znode, then the new 
> controller will properly handle partition states of this topic without 
> creating partition znodes for this topic. And if the topic is not listed in 
> the topic deletion znode, then both the topic znode and the partition znodes 
> of this topic should have been deleted by the time the new controller tries 
> to read them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to