[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong
[ https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899306#comment-15899306 ]

Dan commented on KAFKA-4834:
----------------------------

No. I found in another broker's log that it changed the same partition from New to Online and succeeded. So I guess there exist two controllers watching for the ZooKeeper events.

> Kafka cannot delete topic with ReplicaStateMachine went wrong
> -------------------------------------------------------------
>
>                 Key: KAFKA-4834
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4834
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.1.1
>            Reporter: Dan
>              Labels: reliability
>
> It happened several times that some topics could not be deleted in our production environment. By analyzing the log, we found that ReplicaStateMachine went wrong. Here are the error messages:
> In state-change.log:
> ERROR Controller 2 epoch 201 initiated state change of replica 1 for partition [test_create_topic1,1] from OnlineReplica to ReplicaDeletionStarted failed (state.change.logger)
> java.lang.AssertionError: assertion failed: Replica [Topic=test_create_topic1,Partition=1,Replica=1] should be in the OfflineReplica states before moving to ReplicaDeletionStarted state. Instead it is in OnlineReplica state
>         at scala.Predef$.assert(Predef.scala:179)
>         at kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:309)
>         at kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:190)
>         at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
>         at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
>         at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
>         at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
>         at kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:114)
>         at kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:344)
>         at kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:334)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>         at kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:334)
>         at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:367)
>         at kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:313)
>         at kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:312)
>         at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
>         at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:312)
>         at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:431)
>         at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:403)
>         at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
>         at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
>         at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
>         at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply$mcV$sp(TopicDeletionManager.scala:403)
>         at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
>         at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>         at kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:397)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> In controller.log:
> INFO Leader not yet assigned for partition [test_create_topic1,1]. Skip sending UpdateMetadataRequest. (kafka.controller.ControllerBrokerRequestBatch)
> There may exist two controllers in the cluster, because creating a new topic may trigger two machines to change the state of the same partition, e.g. NonExistentPartition -> NewPartition.
> On the other controller, we found the following messages in controller.log from several days earlier:
> [2017-02-25 16:51:22,353] INFO [Topic Deletion Manager 0], Topic deletion callback for
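One way to look for evidence of the two-controller hypothesis is to scan the state-change logs for a controller that keeps acting with an older epoch after a newer epoch has already appeared. The following is only a rough sketch under stated assumptions: the class `StaleControllerScan` and its method are hypothetical names, the input is assumed to be the state-change.log lines of all brokers merged in timestamp order, and the only log format relied on is the `Controller <id> epoch <n>` prefix visible in the excerpt above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Kafka: flags log lines written under an outdated controller epoch. */
final class StaleControllerScan {
    // Matches the prefix seen in the excerpt, e.g. "Controller 2 epoch 201".
    private static final Pattern CONTROLLER = Pattern.compile("Controller \\d+ epoch (\\d+)");

    /**
     * Assumes mergedLines holds the lines of every broker's state-change.log in time order.
     * Returns the lines whose epoch is lower than an epoch already seen earlier in the input.
     */
    static List<String> staleLines(List<String> mergedLines) {
        List<String> stale = new ArrayList<>();
        int maxEpoch = -1;
        for (String line : mergedLines) {
            Matcher m = CONTROLLER.matcher(line);
            if (!m.find()) {
                continue; // not a controller state-change line
            }
            int epoch = Integer.parseInt(m.group(1));
            if (epoch < maxEpoch) {
                stale.add(line); // an older controller is still making changes
            } else {
                maxEpoch = epoch;
            }
        }
        return stale;
    }
}
```

A non-empty result would only be a hint that deserves a closer look, since clock skew between brokers can reorder merged lines.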
[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong
[ https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898790#comment-15898790 ]

huxi commented on KAFKA-4834:
-----------------------------

Did you delete the zookeeper nodes manually before issuing this delete topics?
[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong
[ https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896740#comment-15896740 ]

Dan commented on KAFKA-4834:
----------------------------

By invoking AdminUtils.deleteTopic(zkUtils, topic).
It seemed something wrong when the partition state changed from New to Online. Here are the error messages:

[2017-03-01 13:38:51,610] ERROR Controller 2 epoch 201 encountered error while changing partition [test_create_topic1,1]'s state from New to Online since LeaderAndIsr path already exists with value {"leader":3,"leader_epoch":0,"isr":[3,1,2]} and controller epoch 200 (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while changing partition [test_create_topic1,1]'s state from New to Online since LeaderAndIsr path already exists with value {"leader":3,"leader_epoch":0,"isr":[3,1,2]} and controller epoch 200
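The error above reads like a create-if-absent step that lost a race: controller epoch 201 tried to initialize the LeaderAndIsr path and found a value already written under controller epoch 200. The sketch below models only that behavior in memory. It is not Kafka's code; `LeaderAndIsrSketch`, its nested types, and the use of a map in place of ZooKeeper are all assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory model of the New -> Online initialization; ZooKeeper is replaced by a map. */
final class LeaderAndIsrSketch {
    /** Stand-in for the LeaderAndIsr payload shown in the log message. */
    static final class LeaderAndIsr {
        final int leader;
        final int leaderEpoch;
        final List<Integer> isr;
        final int controllerEpoch;

        LeaderAndIsr(int leader, int leaderEpoch, List<Integer> isr, int controllerEpoch) {
            this.leader = leader;
            this.leaderEpoch = leaderEpoch;
            this.isr = isr;
            this.controllerEpoch = controllerEpoch;
        }
    }

    /** Stand-in for kafka.common.StateChangeFailedException. */
    static final class StateChangeFailed extends RuntimeException {
        StateChangeFailed(String message) {
            super(message);
        }
    }

    private final ConcurrentMap<String, LeaderAndIsr> paths = new ConcurrentHashMap<>();

    /** Creates the entry only if none exists; a second writer fails and reports the existing value. */
    void initializeLeaderAndIsr(String partition, LeaderAndIsr proposed) {
        LeaderAndIsr existing = paths.putIfAbsent(partition, proposed);
        if (existing != null) {
            throw new StateChangeFailed("LeaderAndIsr path already exists for " + partition
                    + " with leader " + existing.leader
                    + " and controller epoch " + existing.controllerEpoch);
        }
    }
}
```

In this model, whichever controller writes first wins, and the later one sees the earlier writer's controller epoch in the failure message, which matches the shape of the logged error.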
[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong
[ https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893619#comment-15893619 ]

huxi commented on KAFKA-4834:
-----------------------------

How did you delete the topic? Seems the partitions' state is still Online which is weird a little bit since delete thread should firstly put them into Offline.
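The point about the expected order (Offline first, then deletion) can be illustrated with a small precondition check in the spirit of the assertion in the stack trace. This is a simplified sketch, not the real ReplicaStateMachine: `ReplicaStateSketch` is a hypothetical class, only the rule for ReplicaDeletionStarted comes from the logged assertion message, and the rule for OfflineReplica is an assumption added so that the example can run.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a state machine that checks the previous state before each transition. */
final class ReplicaStateSketch {
    enum ReplicaState { NewReplica, OnlineReplica, OfflineReplica, ReplicaDeletionStarted }

    private static final Map<ReplicaState, Set<ReplicaState>> VALID_PREVIOUS =
            new EnumMap<>(ReplicaState.class);

    static {
        // From the assertion message: only OfflineReplica may move to ReplicaDeletionStarted.
        VALID_PREVIOUS.put(ReplicaState.ReplicaDeletionStarted,
                EnumSet.of(ReplicaState.OfflineReplica));
        // Assumption for illustration: a new, online, or offline replica may be taken offline.
        VALID_PREVIOUS.put(ReplicaState.OfflineReplica,
                EnumSet.of(ReplicaState.NewReplica, ReplicaState.OnlineReplica,
                        ReplicaState.OfflineReplica));
    }

    private ReplicaState current;

    ReplicaStateSketch(ReplicaState initial) {
        this.current = initial;
    }

    ReplicaState current() {
        return current;
    }

    /** Moves to the target state, or throws if the current state is not an allowed predecessor. */
    void transitionTo(ReplicaState target) {
        Set<ReplicaState> allowed =
                VALID_PREVIOUS.getOrDefault(target, EnumSet.noneOf(ReplicaState.class));
        if (!allowed.contains(current)) {
            throw new IllegalStateException("Replica should be in " + allowed
                    + " before moving to " + target + "; instead it is in " + current);
        }
        current = target;
    }
}
```

A replica that starts in OnlineReplica is rejected when asked to move straight to ReplicaDeletionStarted, which is the situation the log reports. Moving it to OfflineReplica first lets the second transition succeed.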