Hi Sanaa,
I actually ran a migration twice.
First locally, just following the procedure described in the official Kafka
documentation https://kafka.apache.org/documentation/#kraft_zk_migration
and then on Kubernetes, because I noticed you are talking about a StatefulSet.
In that case, though, I used the Strimzi operator https://strimzi.io/ to do so
(disclaimer: I am one of the maintainers, and we recently added the
automatic migration feature in the latest 0.40.0 release).
Coming back to your problem, you also mention a dashboard, but it is not
clear where its data comes from or which metrics you are looking at to
conclude that you have two controllers at the same time (a broker and a
KRaft controller). That would be weird, because the source of truth is the
/controller znode on ZooKeeper.
I think you should look at the broker and controller logs and investigate
whether anything is going wrong that prevents a KRaft node from taking over
as controller.
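
To make the /controller check concrete, here is a minimal sketch of how the
znode payload could be interpreted in a script. It is illustrative only and
not part of Kafka or Strimzi: the function name describe_controller is
hypothetical, it assumes the payload has the shape of the `get /controller`
outputs quoted below, and it assumes a missing kraftControllerEpoch field can
be treated as -1.

```python
import json


def describe_controller(znode_payload: str) -> str:
    """Classify the current holder of the /controller znode from its JSON payload."""
    data = json.loads(znode_payload)
    node_id = data["brokerid"]
    # Assumption: a payload without the field is treated like a ZooKeeper-based broker (-1).
    epoch = data.get("kraftControllerEpoch", -1)
    if epoch > -1:
        return f"KRaft controller {node_id} (epoch {epoch})"
    return f"ZooKeeper-based broker {node_id}"


# Payloads copied from the `get /controller` outputs in this thread.
before = '{"version":2,"brokerid":1,"timestamp":"1710876891432","kraftControllerEpoch":-1}'
after = '{"version":2,"brokerid":4,"timestamp":"1710844690234","kraftControllerEpoch":10}'

print(describe_controller(before))
print(describe_controller(after))
```

If the result still reports a ZooKeeper-based broker after the brokers have
been rolled, the KRaft controller has not taken over, whatever the dashboard
shows.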

Thanks
Paolo


On Tue, 19 Mar 2024, 23:49 Sanaa Syed, <sanaa.s...@shopify.com.invalid>
wrote:

> Hi Paolo,
>
> Thank you for your response! I tested out a different theory today where I
> deployed the kraft controller statefulset and waited to see which brokers
> would be elected as controllers.
>
> Here is an example of my migration right after I have provisioned the kraft
> controller brokers/statefulset. At this point, the brokers haven't been
> restarted.
>
> get /controller
>
> {"version":2,"brokerid":1,"timestamp":"1710876891432","kraftControllerEpoch":-1}
>
> get /migration
>
> {"version":0,"kraft_metadata_offset":-1,"kraft_controller_id":-1,"kraft_metadata_epoch":-1,"kraft_controller_epoch":-1}
>
> At this point, a dashboard I have shows that a Kafka broker is a
> controller and that a KRaft controller broker is also a controller (although
> that is not what I see in ZooKeeper, as shown above). One thing to note is
> that I am doing this migration on a stretched cluster, so this may alter the
> way the quorum is set up (I have three KRaft controller brokers across three
> regions).
>
> After I roll the brokers, I find that this is still the case (the KRaft
> controller epoch has not increased in either znode). If you don't mind
> sharing, what were the steps that you followed to migrate to KRaft?
>
> Thank you,
> Sanaa
>
> On Tue, Mar 19, 2024 at 7:05 AM Paolo Patierno <paolo.patie...@gmail.com>
> wrote:
>
> > Hi Sanaa,
> > in my experience running migrations this has never happened to me, and it
> > should not happen anyway.
> >
> > When a (ZooKeeper-based) broker registers as the controller at the
> > beginning, you can see that the corresponding /controller znode has
> > -1 as the KRaft controller epoch.
> > Something like:
> >
> > {"version":2,"brokerid":0,"timestamp":"1710845218527","kraftControllerEpoch":-1}
> > When you deploy the KRaft controller quorum and roll the brokers so that
> > they register and the migration starts, the controller role is taken by
> > one of the KRaft controllers, and its epoch will certainly be greater
> > than -1.
> > Something like:
> >
> > {"version":2,"brokerid":4,"timestamp":"1710844690234","kraftControllerEpoch":10}
> > A KRaft controller is able to "steal" the controller role precisely
> > because its epoch will certainly be greater than -1.
> > So during or after a migration, a broker cannot take the controller role,
> > because only a node with an epoch greater than the current one can do that
> > (and for a ZooKeeper-based broker it will always be -1).
> > A broker can become the controller again only when you roll back the
> > migration: you delete the controllers, and you have to delete the
> > /controller znode (as the procedure describes). Only in this case is a
> > broker able to "win" the /controller znode by using -1 as epoch (because
> > the /controller znode doesn't exist anymore).
> > I am not sure whether in your case you made some mistake during the
> > migration or while rolling the brokers.
> >
> > Thanks,
> > Paolo.
> >
> >
> > On Mon, 18 Mar 2024 at 21:22, Sanaa Syed <sanaa.s...@shopify.com.invalid>
> > wrote:
> >
> > > Hello,
> > >
> > > I've begun migrating some of my ZooKeeper-based Kafka clusters to KRaft.
> > > A behaviour I've noticed twice across two different Kafka cluster
> > > environments is that, after provisioning a KRaft controller quorum in
> > > migration mode, it is possible for a Kafka broker to become an active
> > > controller alongside a KRaft controller broker.
> > >
> > > For example, here are the steps I follow and the behaviour I notice
> > > (I'm currently using Kafka v3.6):
> > > 1. Enable the KRaft migration on the existing Kafka brokers (set the
> > > `controller.quorum.voters`, `controller.listener.names` and
> > > `zookeeper.metadata.migration.enable` configs in the server.properties
> > > file).
> > > 2. Deploy a kraft controller statefulset and service with the migration
> > > enabled so that data is copied over from Zookeeper and we enter a
> > > dual-write mode.
> > > 3. After a few minutes, I see that the migration has completed (it's a
> > > pretty small cluster). At this point, the KRaft controller pod has been
> > > elected to be the controller (and I see this in ZooKeeper when I run
> > > `get /controller`). If the Kafka brokers or KRaft controller pods are
> > > restarted at any point after the migration is completed, a Kafka broker
> > > is elected to be the controller and this is reflected in ZooKeeper as
> > > well. Now, I have two active controllers: one is a Kafka broker and one
> > > is a KRaft controller broker.
> > >
> > > A couple of questions I have:
> > > 1. Is this the expected behaviour? If so, how long after a migration
> > > has been completed should we hold off on restarting Kafka brokers to
> > > avoid this situation?
> > > 2. Why is it possible for a Kafka broker to be a controller again
> > > post-migration?
> > > 3. How do we come back to a state where a KRaft controller broker is
> > > the only controller again in the least disruptive way possible?
> > >
> > > Thank you,
> > > Sanaa
> > >
> >
> >
> > --
> > Paolo Patierno
> >
> > Senior Principal Software Engineer @ Red Hat
> > Microsoft MVP on Azure
> >
> > Twitter : @ppatierno <http://twitter.com/ppatierno>
> > Linkedin : paolopatierno <http://it.linkedin.com/in/paolopatierno>
> > GitHub : ppatierno <https://github.com/ppatierno>
> >
>
>
> --
> Sanaa Syed
>
