It seems the topic would get deleted, but the request in ZK to delete the
topic would not get cleared, even after restarting the Kafka cluster.

I'm still investigating why deletion did not complete in the first place
without restarting any nodes. Something smelly seems to happen when there
is a request to delete more than one topic.

Anyway, I think I found one potential bug in the
ReplicaStateMachine.areAllReplicasForTopicDeleted check, which could be the
cause of the deletion request not being cleared from ZK even after a restart
of the whole cluster. Line ReplicaStateMachine.scala#L285
<https://github.com/sslavic/kafka/blob/trunk/core/src/main/scala/kafka/controller/ReplicaStateMachine.scala#L285>

replicaStatesForTopic.forall(_._2 == ReplicaDeletionSuccessful)

which is the return value of that function/check, should probably be
checking for

replicaStatesForTopic.isEmpty || replicaStatesForTopic.forall(_._2 ==
ReplicaDeletionSuccessful)

I noticed it because in controller logs I found entries like:

[2016-03-04 13:27:29,115] DEBUG [Replica state machine on controller 1]:
Are all replicas for topic foo deleted Map()
(kafka.controller.ReplicaStateMachine)

even though normally they look like:

[2016-03-04 09:33:41,036] DEBUG [Replica state machine on controller 1]:
Are all replicas for topic foo deleted
Map([Topic=foo,Partition=0,Replica=0] -> ReplicaDeletionStarted,
[Topic=foo,Partition=0,Replica=3] -> ReplicaDeletionStarted,
[Topic=foo,Partition=0,Replica=1] -> ReplicaDeletionSuccessful)
(kafka.controller.ReplicaStateMachine)
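
To compare the two checks on the states seen in these logs, here is a minimal
sketch. It assumes simplified stand-ins for the controller types: the
ReplicaState trait, the string replica keys and the DeletionCheckSketch object
are hypothetical and are not Kafka's actual classes. One thing worth
double-checking with it: in Scala, forall on an empty collection returns true,
so for Map() both forms evaluate to true, and for the in-progress map both
evaluate to false.

```scala
// Hypothetical, simplified stand-ins for the controller's replica states.
sealed trait ReplicaState
case object ReplicaDeletionStarted extends ReplicaState
case object ReplicaDeletionSuccessful extends ReplicaState

object DeletionCheckSketch {
  // Replicas are keyed by a plain string here; Kafka uses its own type.
  type States = Map[String, ReplicaState]

  // The expression currently returned by the check.
  def currentCheck(states: States): Boolean =
    states.forall(_._2 == ReplicaDeletionSuccessful)

  // The variant proposed in this mail.
  def proposedCheck(states: States): Boolean =
    states.isEmpty || states.forall(_._2 == ReplicaDeletionSuccessful)

  def main(args: Array[String]): Unit = {
    // Mirrors the "Map()" log entry.
    val empty: States = Map.empty
    // Mirrors the log entry with deletion still in progress.
    val inProgress: States = Map(
      "foo-0-replica-0" -> ReplicaDeletionStarted,
      "foo-0-replica-3" -> ReplicaDeletionStarted,
      "foo-0-replica-1" -> ReplicaDeletionSuccessful)

    println(s"empty:      current=${currentCheck(empty)} proposed=${proposedCheck(empty)}")
    println(s"inProgress: current=${currentCheck(inProgress)} proposed=${proposedCheck(inProgress)}")
  }
}
```

If that holds in the controller as well, the empty map alone would not explain
the leftover request, and the cause may be somewhere else in the deletion flow.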

Kind regards,
Stevo Slavic.

On Sun, Mar 6, 2016 at 12:31 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Thanks Stevo,
>
> Feel free to paste your findings in KAFKA-2937, we can re-open that ticket
> if necessary.
>
> Guozhang
>
> On Fri, Mar 4, 2016 at 4:38 AM, Stevo Slavić <ssla...@gmail.com> wrote:
>
> > Hello Apache Kafka community,
> >
> > I'm still investigating an incident; from initial findings, topic deletion
> > still doesn't seem to work well with Kafka 0.9.0.1, likely due to some
> > edge case that is not covered.
> >
> > Before, with 0.8.2.x, a non-leader replica would sometimes get stuck in
> > the topic deletion process, and the workaround was just to restart that node.
> >
> > If I'm not mistaken, that edge case got (or at least is expected to be)
> > fixed in 0.9.0.1 via KAFKA-2937
> > <https://issues.apache.org/jira/browse/KAFKA-2937>
> >
> > The request to delete the topic remained in ZK even after a restart of
> > the whole cluster. The topic seemed not to exist, so it seemed to actually
> > be deleted, but the request to delete it would remain. I had to manually
> > delete the request node in ZK.
> >
> > When I have more details and a reproducible use case, I will report back.
> >
> > Kind regards,
> > Stevo Slavic.
> >
>
>
>
> --
> -- Guozhang
>
