Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2023-03-15 Thread David Arthur
Hey folks, I just wanted to update this thread with some minor changes I've
made to the KIP based on the implementation in 3.4.

* No new ApiVersionsResponse, we use the "metadata.version" feature flag
instead
* Pre-Migration instead of "Soft Start" for the phase of the controller
before it can accept metadata changes
* Add "kraftControllerEpoch" to /controller ZNode instead just the
"isKRaft" boolean
* Rename ZkMigrationRecord to ZkMigrationStateRecord
* In RegisterBrokerRecord, rename "IsZkBroker" to "IsMigratingZkBroker"
* In BrokerRegistrationRequest, rename "ZkMigrationReady" to
"IsMigratingZkBroker"
(no longer a tagged field)
* In the controller RPCs, use "isKraftController" boolean field instead of
"KRaftControllerId"


All of these changes were made prior to the release, so there's no
compatibility issues. I just wanted to update the KIP to make it reflect
reality.


Cheers,

David

On Tue, Nov 29, 2022 at 10:14 PM Luke Chen  wrote:

> Thanks for the reply.
> Yes, I think adding a section in Rejected Alternatives to explain the
> rationale why we don't support combined mode upgrade in this KIP is
> helpful.
>
> Thank you.
> Luke
>
> On Wed, Nov 30, 2022 at 8:47 AM Jun Rao  wrote:
>
> > Hi, David,
> >
> > Thanks for the reply. No other comments from me.
> >
> > Jun
> >
> > On Tue, Nov 29, 2022 at 2:57 PM David Arthur
> >  wrote:
> >
> > > Jun,
> > >
> > > 51. You're right, I missed that in the latest update. It's fixed now.
> > >
> > > 54. I was thinking we could update meta.properties to v1 as the brokers
> > > were being restarted in KRaft mode without the migration config set.
> > > However, as you mentioned, it is still possible to downgrade even then
> > (as
> > > long as the controller has not exited dual-write mode). Upgrading the
> > > meta.properties after seeing the final ZkMigrationRecord sounds like a
> > good
> > > idea to me. I've updated the KIP to include this detail under
> > > "Meta.Properties" section.
> > >
> > > 58. Yes, the metadata migration from ZK to KRaft will not migrate the
> > > contents of /brokers.
> > >
> > > Thanks,
> > > David
> > >
> > > On Tue, Nov 29, 2022 at 4:50 PM Jun Rao 
> > wrote:
> > >
> > > > Hi, David,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 51. Is this reflected in the KIP? It seems that ZkMigrationState
> still
> > > has
> > > > the None value.
> > > >
> > > > 54. Supporting both v0 and v1 indefinitely in a KRaft broker could
> be a
> > > bit
> > > > confusing and may complicate future upgrades. Another approach is to
> > let
> > > > KRaft broker write the v1 meta.properties after the KRaft controller
> > > exits
> > > > the dual write mode. We could extend ZkMigrationRecord to 3 states
> like
> > > > migration, dualWrite and KRaftOnly. Once a broker sees KRaftOnly, it
> > will
> > > > write meta.properties in v1 format. At that point, downgrade could
> > cause
> > > > metadata loss and require manual work. Will that work?
> > > >
> > > > 58. When copying metadata from ZK to KRaft, I guess we will ignore
> > broker
> > > > registration since the KRaft controller has already generated a
> > > > BrokerRegistrationRecord based on BrokerRegistrationRequest?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Nov 29, 2022 at 7:14 AM David Arthur
> > > >  wrote:
> > > >
> > > > > Jun, Thanks for the comments. Igor, please see 54 below for some
> > > > additional
> > > > > discussion on the meta.properties
> > > > >
> > > > > 50.1 Yes, that field name sounds fine to me.
> > > > >
> > > > > 50.2 Ok, I'll add something to the KIP under the Controller
> section.
> > To
> > > > > your other question, NoOpRecords are used as part of our liveness
> > check
> > > > for
> > > > > the quorum. It doesn't produce any metadata really, so I don't
> think
> > it
> > > > > causes any harm to let it happen before the migration.  KIP-835 has
> > the
> > > > > details on the NoOpRecords
> > > > >
> > > > > 54. Colin and I discussed the meta.properties issue last night. How
> > > about
> > > > > we simply let the KRaft broker accept v0 or v1 meta.properties. At
> > this
> > > > > point, the two versions have the same contents, but different field
> > > > names.
> > > > > By leaving the meta.properties intact, we can simplify the
> downgrade
> > > > > process. If we care to, we could rewrite meta.properties once a
> > broker
> > > is
> > > > > restarted after the migration is finalized (migration config
> > disabled).
> > > > >
> > > > > 57. If a ZK broker can't send a BrokerRegistrationRequest because
> the
> > > > > quorum is unavailable, it should just continue operating normally.
> > > Once a
> > > > > leader is available, the broker will send the registration and
> start
> > > > > heart-beating. Unlike KRaft mode, we won't block startup on a
> > > successful
> > > > > BrokerRegistration response. Concretely, BrokerLifecycleManager
> will
> > > keep
> > > > > trying to contact the quorum in its own thread until the
> > > > > 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-29 Thread Luke Chen
Thanks for the reply.
Yes, I think adding a section in Rejected Alternatives to explain the
rationale why we don't support combined mode upgrade in this KIP is helpful.

Thank you.
Luke

On Wed, Nov 30, 2022 at 8:47 AM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the reply. No other comments from me.
>
> Jun
>
> On Tue, Nov 29, 2022 at 2:57 PM David Arthur
>  wrote:
>
> > Jun,
> >
> > 51. You're right, I missed that in the latest update. It's fixed now.
> >
> > 54. I was thinking we could update meta.properties to v1 as the brokers
> > were being restarted in KRaft mode without the migration config set.
> > However, as you mentioned, it is still possible to downgrade even then
> (as
> > long as the controller has not exited dual-write mode). Upgrading the
> > meta.properties after seeing the final ZkMigrationRecord sounds like a
> good
> > idea to me. I've updated the KIP to include this detail under
> > "Meta.Properties" section.
> >
> > 58. Yes, the metadata migration from ZK to KRaft will not migrate the
> > contents of /brokers.
> >
> > Thanks,
> > David
> >
> > On Tue, Nov 29, 2022 at 4:50 PM Jun Rao 
> wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the reply.
> > >
> > > 51. Is this reflected in the KIP? It seems that ZkMigrationState still
> > has
> > > the None value.
> > >
> > > 54. Supporting both v0 and v1 indefinitely in a KRaft broker could be a
> > bit
> > > confusing and may complicate future upgrades. Another approach is to
> let
> > > KRaft broker write the v1 meta.properties after the KRaft controller
> > exits
> > > the dual write mode. We could extend ZkMigrationRecord to 3 states like
> > > migration, dualWrite and KRaftOnly. Once a broker sees KRaftOnly, it
> will
> > > write meta.properties in v1 format. At that point, downgrade could
> cause
> > > metadata loss and require manual work. Will that work?
> > >
> > > 58. When copying metadata from ZK to KRaft, I guess we will ignore
> broker
> > > registration since the KRaft controller has already generated a
> > > BrokerRegistrationRecord based on BrokerRegistrationRequest?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Nov 29, 2022 at 7:14 AM David Arthur
> > >  wrote:
> > >
> > > > Jun, Thanks for the comments. Igor, please see 54 below for some
> > > additional
> > > > discussion on the meta.properties
> > > >
> > > > 50.1 Yes, that field name sounds fine to me.
> > > >
> > > > 50.2 Ok, I'll add something to the KIP under the Controller section.
> To
> > > > your other question, NoOpRecords are used as part of our liveness
> check
> > > for
> > > > the quorum. It doesn't produce any metadata really, so I don't think
> it
> > > > causes any harm to let it happen before the migration.  KIP-835 has
> the
> > > > details on the NoOpRecords
> > > >
> > > > 54. Colin and I discussed the meta.properties issue last night. How
> > about
> > > > we simply let the KRaft broker accept v0 or v1 meta.properties. At
> this
> > > > point, the two versions have the same contents, but different field
> > > names.
> > > > By leaving the meta.properties intact, we can simplify the downgrade
> > > > process. If we care to, we could rewrite meta.properties once a
> broker
> > is
> > > > restarted after the migration is finalized (migration config
> disabled).
> > > >
> > > > 57. If a ZK broker can't send a BrokerRegistrationRequest because the
> > > > quorum is unavailable, it should just continue operating normally.
> > Once a
> > > > leader is available, the broker will send the registration and start
> > > > heart-beating. Unlike KRaft mode, we won't block startup on a
> > successful
> > > > BrokerRegistration response. Concretely, BrokerLifecycleManager will
> > keep
> > > > trying to contact the quorum in its own thread until the
> > > > BrokerToChannelManager gets a controller ID from KafkaRaftManager.
> This
> > > > shouldn't interfere with other ZK broker activity.
> > > >
> > > > -David
> > > >
> > > > >
> > > >
> > > > --
> > > > -David
> > > >
> > >
> >
> >
> > --
> > -David
> >
>


Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-29 Thread Jun Rao
Hi, David,

Thanks for the reply. No other comments from me.

Jun

On Tue, Nov 29, 2022 at 2:57 PM David Arthur
 wrote:

> Jun,
>
> 51. You're right, I missed that in the latest update. It's fixed now.
>
> 54. I was thinking we could update meta.properties to v1 as the brokers
> were being restarted in KRaft mode without the migration config set.
> However, as you mentioned, it is still possible to downgrade even then (as
> long as the controller has not exited dual-write mode). Upgrading the
> meta.properties after seeing the final ZkMigrationRecord sounds like a good
> idea to me. I've updated the KIP to include this detail under
> "Meta.Properties" section.
>
> 58. Yes, the metadata migration from ZK to KRaft will not migrate the
> contents of /brokers.
>
> Thanks,
> David
>
> On Tue, Nov 29, 2022 at 4:50 PM Jun Rao  wrote:
>
> > Hi, David,
> >
> > Thanks for the reply.
> >
> > 51. Is this reflected in the KIP? It seems that ZkMigrationState still
> has
> > the None value.
> >
> > 54. Supporting both v0 and v1 indefinitely in a KRaft broker could be a
> bit
> > confusing and may complicate future upgrades. Another approach is to let
> > KRaft broker write the v1 meta.properties after the KRaft controller
> exits
> > the dual write mode. We could extend ZkMigrationRecord to 3 states like
> > migration, dualWrite and KRaftOnly. Once a broker sees KRaftOnly, it will
> > write meta.properties in v1 format. At that point, downgrade could cause
> > metadata loss and require manual work. Will that work?
> >
> > 58. When copying metadata from ZK to KRaft, I guess we will ignore broker
> > registration since the KRaft controller has already generated a
> > BrokerRegistrationRecord based on BrokerRegistrationRequest?
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Nov 29, 2022 at 7:14 AM David Arthur
> >  wrote:
> >
> > > Jun, Thanks for the comments. Igor, please see 54 below for some
> > additional
> > > discussion on the meta.properties
> > >
> > > 50.1 Yes, that field name sounds fine to me.
> > >
> > > 50.2 Ok, I'll add something to the KIP under the Controller section. To
> > > your other question, NoOpRecords are used as part of our liveness check
> > for
> > > the quorum. It doesn't produce any metadata really, so I don't think it
> > > causes any harm to let it happen before the migration.  KIP-835 has the
> > > details on the NoOpRecords
> > >
> > > 54. Colin and I discussed the meta.properties issue last night. How
> about
> > > we simply let the KRaft broker accept v0 or v1 meta.properties. At this
> > > point, the two versions have the same contents, but different field
> > names.
> > > By leaving the meta.properties intact, we can simplify the downgrade
> > > process. If we care to, we could rewrite meta.properties once a broker
> is
> > > restarted after the migration is finalized (migration config disabled).
> > >
> > > 57. If a ZK broker can't send a BrokerRegistrationRequest because the
> > > quorum is unavailable, it should just continue operating normally.
> Once a
> > > leader is available, the broker will send the registration and start
> > > heart-beating. Unlike KRaft mode, we won't block startup on a
> successful
> > > BrokerRegistration response. Concretely, BrokerLifecycleManager will
> keep
> > > trying to contact the quorum in its own thread until the
> > > BrokerToChannelManager gets a controller ID from KafkaRaftManager. This
> > > shouldn't interfere with other ZK broker activity.
> > >
> > > -David
> > >
> > > >
> > >
> > > --
> > > -David
> > >
> >
>
>
> --
> -David
>


Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-29 Thread David Arthur
Jun,

51. You're right, I missed that in the latest update. It's fixed now.

54. I was thinking we could update meta.properties to v1 as the brokers
were being restarted in KRaft mode without the migration config set.
However, as you mentioned, it is still possible to downgrade even then (as
long as the controller has not exited dual-write mode). Upgrading the
meta.properties after seeing the final ZkMigrationRecord sounds like a good
idea to me. I've updated the KIP to include this detail under
"Meta.Properties" section.

58. Yes, the metadata migration from ZK to KRaft will not migrate the
contents of /brokers.

Thanks,
David

On Tue, Nov 29, 2022 at 4:50 PM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the reply.
>
> 51. Is this reflected in the KIP? It seems that ZkMigrationState still has
> the None value.
>
> 54. Supporting both v0 and v1 indefinitely in a KRaft broker could be a bit
> confusing and may complicate future upgrades. Another approach is to let
> KRaft broker write the v1 meta.properties after the KRaft controller exits
> the dual write mode. We could extend ZkMigrationRecord to 3 states like
> migration, dualWrite and KRaftOnly. Once a broker sees KRaftOnly, it will
> write meta.properties in v1 format. At that point, downgrade could cause
> metadata loss and require manual work. Will that work?
>
> 58. When copying metadata from ZK to KRaft, I guess we will ignore broker
> registration since the KRaft controller has already generated a
> BrokerRegistrationRecord based on BrokerRegistrationRequest?
>
> Thanks,
>
> Jun
>
> On Tue, Nov 29, 2022 at 7:14 AM David Arthur
>  wrote:
>
> > Jun, Thanks for the comments. Igor, please see 54 below for some
> additional
> > discussion on the meta.properties
> >
> > 50.1 Yes, that field name sounds fine to me.
> >
> > 50.2 Ok, I'll add something to the KIP under the Controller section. To
> > your other question, NoOpRecords are used as part of our liveness check
> for
> > the quorum. It doesn't produce any metadata really, so I don't think it
> > causes any harm to let it happen before the migration.  KIP-835 has the
> > details on the NoOpRecords
> >
> > 54. Colin and I discussed the meta.properties issue last night. How about
> > we simply let the KRaft broker accept v0 or v1 meta.properties. At this
> > point, the two versions have the same contents, but different field
> names.
> > By leaving the meta.properties intact, we can simplify the downgrade
> > process. If we care to, we could rewrite meta.properties once a broker is
> > restarted after the migration is finalized (migration config disabled).
> >
> > 57. If a ZK broker can't send a BrokerRegistrationRequest because the
> > quorum is unavailable, it should just continue operating normally. Once a
> > leader is available, the broker will send the registration and start
> > heart-beating. Unlike KRaft mode, we won't block startup on a successful
> > BrokerRegistration response. Concretely, BrokerLifecycleManager will keep
> > trying to contact the quorum in its own thread until the
> > BrokerToChannelManager gets a controller ID from KafkaRaftManager. This
> > shouldn't interfere with other ZK broker activity.
> >
> > -David
> >
> > >
> >
> > --
> > -David
> >
>


-- 
-David


Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-29 Thread Jun Rao
Hi, David,

Thanks for the reply.

51. Is this reflected in the KIP? It seems that ZkMigrationState still has
the None value.

54. Supporting both v0 and v1 indefinitely in a KRaft broker could be a bit
confusing and may complicate future upgrades. Another approach is to let
KRaft broker write the v1 meta.properties after the KRaft controller exits
the dual write mode. We could extend ZkMigrationRecord to 3 states like
migration, dualWrite and KRaftOnly. Once a broker sees KRaftOnly, it will
write meta.properties in v1 format. At that point, downgrade could cause
metadata loss and require manual work. Will that work?

58. When copying metadata from ZK to KRaft, I guess we will ignore broker
registration since the KRaft controller has already generated a
BrokerRegistrationRecord based on BrokerRegistrationRequest?

Thanks,

Jun

On Tue, Nov 29, 2022 at 7:14 AM David Arthur
 wrote:

> Jun, Thanks for the comments. Igor, please see 54 below for some additional
> discussion on the meta.properties
>
> 50.1 Yes, that field name sounds fine to me.
>
> 50.2 Ok, I'll add something to the KIP under the Controller section. To
> your other question, NoOpRecords are used as part of our liveness check for
> the quorum. It doesn't produce any metadata really, so I don't think it
> causes any harm to let it happen before the migration.  KIP-835 has the
> details on the NoOpRecords
>
> 54. Colin and I discussed the meta.properties issue last night. How about
> we simply let the KRaft broker accept v0 or v1 meta.properties. At this
> point, the two versions have the same contents, but different field names.
> By leaving the meta.properties intact, we can simplify the downgrade
> process. If we care to, we could rewrite meta.properties once a broker is
> restarted after the migration is finalized (migration config disabled).
>
> 57. If a ZK broker can't send a BrokerRegistrationRequest because the
> quorum is unavailable, it should just continue operating normally. Once a
> leader is available, the broker will send the registration and start
> heart-beating. Unlike KRaft mode, we won't block startup on a successful
> BrokerRegistration response. Concretely, BrokerLifecycleManager will keep
> trying to contact the quorum in its own thread until the
> BrokerToChannelManager gets a controller ID from KafkaRaftManager. This
> shouldn't interfere with other ZK broker activity.
>
> -David
>
> >
>
> --
> -David
>


Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-29 Thread David Arthur
Jun, Thanks for the comments. Igor, please see 54 below for some additional
discussion on the meta.properties

50.1 Yes, that field name sounds fine to me.

50.2 Ok, I'll add something to the KIP under the Controller section. To
your other question, NoOpRecords are used as part of our liveness check for
the quorum. It doesn't produce any metadata really, so I don't think it
causes any harm to let it happen before the migration.  KIP-835 has the
details on the NoOpRecords

54. Colin and I discussed the meta.properties issue last night. How about
we simply let the KRaft broker accept v0 or v1 meta.properties. At this
point, the two versions have the same contents, but different field names.
By leaving the meta.properties intact, we can simplify the downgrade
process. If we care to, we could rewrite meta.properties once a broker is
restarted after the migration is finalized (migration config disabled).

57. If a ZK broker can't send a BrokerRegistrationRequest because the
quorum is unavailable, it should just continue operating normally. Once a
leader is available, the broker will send the registration and start
heart-beating. Unlike KRaft mode, we won't block startup on a successful
BrokerRegistration response. Concretely, BrokerLifecycleManager will keep
trying to contact the quorum in its own thread until the
BrokerToChannelManager gets a controller ID from KafkaRaftManager. This
shouldn't interfere with other ZK broker activity.

-David

>

-- 
-David


Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-24 Thread Igor Soarez
Hi David,

Zookeeper mode writes meta.properties with version=0. KRaft mode requires 
version=1 in meta.properties.

Will a manual step be required to update meta.properties or will brokers 
somehow update meta.properties files to version 1?

Thanks,

--
Igor



Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-09 Thread Jun Rao
Hi, David,

Thanks for the reply.

Now that we have added KRaftControllerId to both UpdateMetdataRequest and
LeaderAndIsrRequest, should we add it to StopReplicaRequest to make it more
consistent?

Thanks,

Jun

On Wed, Nov 9, 2022 at 8:17 AM David Arthur  wrote:

> Jun, I've updated the doc with clarification of combined mode under
> the ApiVersionsResponse section. I also added your suggestion to
> rename the config to something more explicit. Colin had some feedback
> in the VOTE thread which I've also incorporated.
>
> Thanks!
> David
>
> On Tue, Nov 8, 2022 at 1:18 PM Jun Rao  wrote:
> >
> > Hi, David,
> >
> > Thanks for the reply.
> >
> > 20/21. Thanks for the explanation. The suggestion sounds good. Could you
> > include that in the doc?
> >
> > 40. metadata.migration.enable: We may do future metadata migrations
> within
> > KRaft. Could we make the name more specific to ZK migration like
> > zookeeper.metadata.migration.enable?
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Nov 8, 2022 at 9:47 AM David Arthur  wrote:
> >
> > > Ah, sorry about that, you're right. Since we won't support ZK
> > > migrations in combined mode, is this issue avoided?
> > >
> > > Essentially, we would only set ZkMigrationReady in ApiVersionsResponse
> if
> > >
> > > * process.roles=controller
> > > * kafka.metadata.migration.enabled=true
> > >
> > > Also, the following would be an invalid config that should prevent
> startup:
> > >
> > > * process.roles=broker,controller
> > > * kafka.metadata.migration.enabled=true
> > >
> > > Does this seem reasonable?
> > >
> > > Thanks!
> > > David
> > >
> > > On Tue, Nov 8, 2022 at 11:12 AM Jun Rao 
> wrote:
> > > >
> > > > Hi, David,
> > > >
> > > > I am not sure that we are fully settled on the following.
> > > >
> > > > 20/21. Since separate listeners are optional, it seems that the
> broker
> > > > can't distinguish between ApiVersion requests coming from the client
> or
> > > > other brokers. This means the clients will get ZkMigrationReady in
> the
> > > > ApiVersion response, which is weird.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Nov 8, 2022 at 7:18 AM David Arthur 
> wrote:
> > > >
> > > > > Thanks for the discussion everyone, I'm going to move ahead with
> the
> > > > > vote for this KIP.
> > > > >
> > > > > -David
> > > > >
> > > > > On Thu, Nov 3, 2022 at 1:20 PM Jun Rao 
> > > wrote:
> > > > > >
> > > > > > Hi, David,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > 20/21 Yes, but separate listeners are optional. It's possible
> for the
> > > > > nodes
> > > > > > to use a single port for both client and server side
> communications.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Thu, Nov 3, 2022 at 9:59 AM David Arthur
> > > > > >  wrote:
> > > > > >
> > > > > > > 20/21, in combined mode we still have a separate listener for
> the
> > > > > > > controller APIs, e.g.,
> > > > > > >
> > > > > > > listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> > > > > > >
> > > > > > > inter.broker.listener.name=PLAINTEXT
> > > > > > >
> > > > > > > controller.listener.names=CONTROLLER
> > > > > > >
> > > > > > > advertised.listeners=PLAINTEXT://localhost:9092
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Clients still talk to the broker through the advertised
> listener,
> > > and
> > > > > only
> > > > > > > brokers and other controllers will communicate over the
> controller
> > > > > > > listener.
> > > > > > >
> > > > > > > 40. Sounds good, I updated the KIP
> > > > > > >
> > > > > > > Thanks!
> > > > > > > David
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Nov 3, 2022 at 12:14 PM Jun Rao
> 
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi, David,
> > > > > > > >
> > > > > > > > Thanks for the reply.
> > > > > > > >
> > > > > > > > 20/21. When KRaft runs in the combined mode, does a
> controller
> > > know
> > > > > > > whether
> > > > > > > > an ApiRequest is from a client or another broker?
> > > > > > > >
> > > > > > > > 40. Adding a "None" state sounds reasonable.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Thu, Nov 3, 2022 at 8:39 AM David Arthur
> > > > > > > >  wrote:
> > > > > > > >
> > > > > > > > > Jun,
> > > > > > > > >
> > > > > > > > > 20/21 If we use a tagged field, then I don't think clients
> > > need to
> > > > > be
> > > > > > > > > concerned with it, right?. In ApiVersionsResponse sent by
> > > brokers
> > > > > to
> > > > > > > > > clients, this field would be omitted. Clients can't connect
> > > > > directly to
> > > > > > > > the
> > > > > > > > > KRaft controller nodes. Also, we have a precedent of
> sending
> > > > > controller
> > > > > > > > > node state between controllers through ApiVersions
> > > > > ("metadata.version"
> > > > > > > > > feature), so I think it fits well.
> > > > > > > > >
> > > > > > > > > 40. For new KRaft clusters, we could add another state to
> > > 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-09 Thread David Arthur
Jun, I've updated the doc with clarification of combined mode under
the ApiVersionsResponse section. I also added your suggestion to
rename the config to something more explicit. Colin had some feedback
in the VOTE thread which I've also incorporated.

Thanks!
David

On Tue, Nov 8, 2022 at 1:18 PM Jun Rao  wrote:
>
> Hi, David,
>
> Thanks for the reply.
>
> 20/21. Thanks for the explanation. The suggestion sounds good. Could you
> include that in the doc?
>
> 40. metadata.migration.enable: We may do future metadata migrations within
> KRaft. Could we make the name more specific to ZK migration like
> zookeeper.metadata.migration.enable?
>
> Thanks,
>
> Jun
>
> On Tue, Nov 8, 2022 at 9:47 AM David Arthur  wrote:
>
> > Ah, sorry about that, you're right. Since we won't support ZK
> > migrations in combined mode, is this issue avoided?
> >
> > Essentially, we would only set ZkMigrationReady in ApiVersionsResponse if
> >
> > * process.roles=controller
> > * kafka.metadata.migration.enabled=true
> >
> > Also, the following would be an invalid config that should prevent startup:
> >
> > * process.roles=broker,controller
> > * kafka.metadata.migration.enabled=true
> >
> > Does this seem reasonable?
> >
> > Thanks!
> > David
> >
> > On Tue, Nov 8, 2022 at 11:12 AM Jun Rao  wrote:
> > >
> > > Hi, David,
> > >
> > > I am not sure that we are fully settled on the following.
> > >
> > > 20/21. Since separate listeners are optional, it seems that the broker
> > > can't distinguish between ApiVersion requests coming from the client or
> > > other brokers. This means the clients will get ZkMigrationReady in the
> > > ApiVersion response, which is weird.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Nov 8, 2022 at 7:18 AM David Arthur  wrote:
> > >
> > > > Thanks for the discussion everyone, I'm going to move ahead with the
> > > > vote for this KIP.
> > > >
> > > > -David
> > > >
> > > > On Thu, Nov 3, 2022 at 1:20 PM Jun Rao 
> > wrote:
> > > > >
> > > > > Hi, David,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > 20/21 Yes, but separate listeners are optional. It's possible for the
> > > > nodes
> > > > > to use a single port for both client and server side communications.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Nov 3, 2022 at 9:59 AM David Arthur
> > > > >  wrote:
> > > > >
> > > > > > 20/21, in combined mode we still have a separate listener for the
> > > > > > controller APIs, e.g.,
> > > > > >
> > > > > > listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> > > > > >
> > > > > > inter.broker.listener.name=PLAINTEXT
> > > > > >
> > > > > > controller.listener.names=CONTROLLER
> > > > > >
> > > > > > advertised.listeners=PLAINTEXT://localhost:9092
> > > > > >
> > > > > >
> > > > > >
> > > > > > Clients still talk to the broker through the advertised listener,
> > and
> > > > only
> > > > > > brokers and other controllers will communicate over the controller
> > > > > > listener.
> > > > > >
> > > > > > 40. Sounds good, I updated the KIP
> > > > > >
> > > > > > Thanks!
> > > > > > David
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Nov 3, 2022 at 12:14 PM Jun Rao 
> > > > wrote:
> > > > > >
> > > > > > > Hi, David,
> > > > > > >
> > > > > > > Thanks for the reply.
> > > > > > >
> > > > > > > 20/21. When KRaft runs in the combined mode, does a controller
> > know
> > > > > > whether
> > > > > > > an ApiRequest is from a client or another broker?
> > > > > > >
> > > > > > > 40. Adding a "None" state sounds reasonable.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Thu, Nov 3, 2022 at 8:39 AM David Arthur
> > > > > > >  wrote:
> > > > > > >
> > > > > > > > Jun,
> > > > > > > >
> > > > > > > > 20/21 If we use a tagged field, then I don't think clients
> > need to
> > > > be
> > > > > > > > concerned with it, right?. In ApiVersionsResponse sent by
> > brokers
> > > > to
> > > > > > > > clients, this field would be omitted. Clients can't connect
> > > > directly to
> > > > > > > the
> > > > > > > > KRaft controller nodes. Also, we have a precedent of sending
> > > > controller
> > > > > > > > node state between controllers through ApiVersions
> > > > ("metadata.version"
> > > > > > > > feature), so I think it fits well.
> > > > > > > >
> > > > > > > > 40. For new KRaft clusters, we could add another state to
> > indicate
> > > > it
> > > > > > was
> > > > > > > > not migrated from ZK, but created as a KRaft cluster. Maybe
> > > > > > > > "kafka.controller:type=KafkaController,name=MigrationState" =>
> > > > "None" ?
> > > > > > > We
> > > > > > > > could also omit that metric for unmigrated clusters, but I'm
> > not a
> > > > fan
> > > > > > of
> > > > > > > > using the absence of a value to signal something.
> > > > > > > >
> > > > > > > > -
> > > > > > > >
> > > > > > > > Akhilesh, thanks for reviewing the KIP!
> > > > > > > >
> > > > > > > > 1. MigrationState and MetadataType are mostly the same 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-08 Thread Jun Rao
Hi, David,

Thanks for the reply.

20/21. Thanks for the explanation. The suggestion sounds good. Could you
include that in the doc?

40. metadata.migration.enable: We may do future metadata migrations within
KRaft. Could we make the name more specific to ZK migration like
zookeeper.metadata.migration.enable?

Thanks,

Jun

On Tue, Nov 8, 2022 at 9:47 AM David Arthur  wrote:

> Ah, sorry about that, you're right. Since we won't support ZK
> migrations in combined mode, is this issue avoided?
>
> Essentially, we would only set ZkMigrationReady in ApiVersionsResponse if
>
> * process.roles=controller
> * kafka.metadata.migration.enabled=true
>
> Also, the following would be an invalid config that should prevent startup:
>
> * process.roles=broker,controller
> * kafka.metadata.migration.enabled=true
>
> Does this seem reasonable?
>
> Thanks!
> David
>
> On Tue, Nov 8, 2022 at 11:12 AM Jun Rao  wrote:
> >
> > Hi, David,
> >
> > I am not sure that we are fully settled on the following.
> >
> > 20/21. Since separate listeners are optional, it seems that the broker
> > can't distinguish between ApiVersion requests coming from the client or
> > other brokers. This means the clients will get ZkMigrationReady in the
> > ApiVersion response, which is weird.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Nov 8, 2022 at 7:18 AM David Arthur  wrote:
> >
> > > Thanks for the discussion everyone, I'm going to move ahead with the
> > > vote for this KIP.
> > >
> > > -David
> > >
> > > On Thu, Nov 3, 2022 at 1:20 PM Jun Rao 
> wrote:
> > > >
> > > > Hi, David,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 20/21 Yes, but separate listeners are optional. It's possible for the
> > > nodes
> > > > to use a single port for both client and server side communications.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Nov 3, 2022 at 9:59 AM David Arthur
> > > >  wrote:
> > > >
> > > > > 20/21, in combined mode we still have a separate listener for the
> > > > > controller APIs, e.g.,
> > > > >
> > > > > listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> > > > >
> > > > > inter.broker.listener.name=PLAINTEXT
> > > > >
> > > > > controller.listener.names=CONTROLLER
> > > > >
> > > > > advertised.listeners=PLAINTEXT://localhost:9092
> > > > >
> > > > >
> > > > >
> > > > > Clients still talk to the broker through the advertised listener,
> and
> > > only
> > > > > brokers and other controllers will communicate over the controller
> > > > > listener.
> > > > >
> > > > > 40. Sounds good, I updated the KIP
> > > > >
> > > > > Thanks!
> > > > > David
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Nov 3, 2022 at 12:14 PM Jun Rao 
> > > wrote:
> > > > >
> > > > > > Hi, David,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > 20/21. When KRaft runs in the combined mode, does a controller
> know
> > > > > whether
> > > > > > an ApiRequest is from a client or another broker?
> > > > > >
> > > > > > 40. Adding a "None" state sounds reasonable.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Thu, Nov 3, 2022 at 8:39 AM David Arthur
> > > > > >  wrote:
> > > > > >
> > > > > > > Jun,
> > > > > > >
> > > > > > > 20/21 If we use a tagged field, then I don't think clients
> need to
> > > be
> > > > > > > concerned with it, right?. In ApiVersionsResponse sent by
> brokers
> > > to
> > > > > > > clients, this field would be omitted. Clients can't connect
> > > directly to
> > > > > > the
> > > > > > > KRaft controller nodes. Also, we have a precedent of sending
> > > controller
> > > > > > > node state between controllers through ApiVersions
> > > ("metadata.version"
> > > > > > > feature), so I think it fits well.
> > > > > > >
> > > > > > > 40. For new KRaft clusters, we could add another state to
> indicate
> > > it
> > > > > was
> > > > > > > not migrated from ZK, but created as a KRaft cluster. Maybe
> > > > > > > "kafka.controller:type=KafkaController,name=MigrationState" =>
> > > "None" ?
> > > > > > We
> > > > > > > could also omit that metric for unmigrated clusters, but I'm
> not a
> > > fan
> > > > > of
> > > > > > > using the absence of a value to signal something.
> > > > > > >
> > > > > > > -
> > > > > > >
> > > > > > > Akhilesh, thanks for reviewing the KIP!
> > > > > > >
> > > > > > > 1. MigrationState and MetadataType are mostly the same on the
> > > > > controller,
> > > > > > > but we do have the "MigratingZkData" state that seems useful to
> > > report
> > > > > > as a
> > > > > > > metric. Aside from looking at logs, observing the controller in
> > > this
> > > > > > state
> > > > > > > is the only way to see how long its taking to copy data from
> ZK.
> > > > > > >
> > > > > > > "KRaftMode" instead of "MigrationFinalized" is similar to Jun's
> > > > > question
> > > > > > > about non-migrated clusters. I think it's useful to have a
> distinct
> > > > > > > MigrationState for clusters that have been migrated and those
> that
> > > were
> > > > > > 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-08 Thread David Arthur
Ah, sorry about that, you're right. Since we won't support ZK
migrations in combined mode, is this issue avoided?

Essentially, we would only set ZkMigrationReady in ApiVersionsResponse if

* process.roles=controller
* kafka.metadata.migration.enabled=true

Also, the following would be an invalid config that should prevent startup:

* process.roles=broker,controller
* kafka.metadata.migration.enabled=true

Does this seem reasonable?

Thanks!
David

On Tue, Nov 8, 2022 at 11:12 AM Jun Rao  wrote:
>
> Hi, David,
>
> I am not sure that we are fully settled on the following.
>
> 20/21. Since separate listeners are optional, it seems that the broker
> can't distinguish between ApiVersion requests coming from the client or
> other brokers. This means the clients will get ZkMigrationReady in the
> ApiVersion response, which is weird.
>
> Thanks,
>
> Jun
>
> On Tue, Nov 8, 2022 at 7:18 AM David Arthur  wrote:
>
> > Thanks for the discussion everyone, I'm going to move ahead with the
> > vote for this KIP.
> >
> > -David
> >
> > On Thu, Nov 3, 2022 at 1:20 PM Jun Rao  wrote:
> > >
> > > Hi, David,
> > >
> > > Thanks for the reply.
> > >
> > > 20/21 Yes, but separate listeners are optional. It's possible for the
> > nodes
> > > to use a single port for both client and server side communications.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Nov 3, 2022 at 9:59 AM David Arthur
> > >  wrote:
> > >
> > > > 20/21, in combined mode we still have a separate listener for the
> > > > controller APIs, e.g.,
> > > >
> > > > listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> > > >
> > > > inter.broker.listener.name=PLAINTEXT
> > > >
> > > > controller.listener.names=CONTROLLER
> > > >
> > > > advertised.listeners=PLAINTEXT://localhost:9092
> > > >
> > > >
> > > >
> > > > Clients still talk to the broker through the advertised listener, and
> > only
> > > > brokers and other controllers will communicate over the controller
> > > > listener.
> > > >
> > > > 40. Sounds good, I updated the KIP
> > > >
> > > > Thanks!
> > > > David
> > > >
> > > >
> > > >
> > > > On Thu, Nov 3, 2022 at 12:14 PM Jun Rao 
> > wrote:
> > > >
> > > > > Hi, David,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > 20/21. When KRaft runs in the combined mode, does a controller know
> > > > whether
> > > > > an ApiRequest is from a client or another broker?
> > > > >
> > > > > 40. Adding a "None" state sounds reasonable.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Nov 3, 2022 at 8:39 AM David Arthur
> > > > >  wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > 20/21 If we use a tagged field, then I don't think clients need to
> > be
> > > > > > concerned with it, right?. In ApiVersionsResponse sent by brokers
> > to
> > > > > > clients, this field would be omitted. Clients can't connect
> > directly to
> > > > > the
> > > > > > KRaft controller nodes. Also, we have a precedent of sending
> > controller
> > > > > > node state between controllers through ApiVersions
> > ("metadata.version"
> > > > > > feature), so I think it fits well.
> > > > > >
> > > > > > 40. For new KRaft clusters, we could add another state to indicate
> > it
> > > > was
> > > > > > not migrated from ZK, but created as a KRaft cluster. Maybe
> > > > > > "kafka.controller:type=KafkaController,name=MigrationState" =>
> > "None" ?
> > > > > We
> > > > > > could also omit that metric for unmigrated clusters, but I'm not a
> > fan
> > > > of
> > > > > > using the absence of a value to signal something.
> > > > > >
> > > > > > -
> > > > > >
> > > > > > Akhilesh, thanks for reviewing the KIP!
> > > > > >
> > > > > > 1. MigrationState and MetadataType are mostly the same on the
> > > > controller,
> > > > > > but we do have the "MigratingZkData" state that seems useful to
> > report
> > > > > as a
> > > > > > metric. Aside from looking at logs, observing the controller in
> > this
> > > > > state
> > > > > > is the only way to see how long its taking to copy data from ZK.
> > > > > >
> > > > > > "KRaftMode" instead of "MigrationFinalized" is similar to Jun's
> > > > question
> > > > > > about non-migrated clusters. I think it's useful to have a distinct
> > > > > > MigrationState for clusters that have been migrated and those that
> > were
> > > > > > never migrated. This does mean we'll report the MigrationState long
> > > > after
> > > > > > the migration is complete, but we can drop these metrics in 4.0
> > once ZK
> > > > > is
> > > > > > removed.
> > > > > >
> > > > > > 2. The "ZkMigrationReady" will indicate that the controller has
> > > > > > "kafka.metadata.migration.enable" _and_ the ZK configs set. We need
> > > > some
> > > > > > way to indicate that the whole quorum is correctly configured to
> > handle
> > > > > the
> > > > > > migration so we don't failover to a controller that's not
> > configured
> > > > for
> > > > > > ZK. Did I understand your question correctly?
> > > > > >
> > > > > > 3. Yea, good idea. 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-08 Thread Jun Rao
Hi, David,

I am not sure that we are fully settled on the following.

20/21. Since separate listeners are optional, it seems that the broker
can't distinguish between ApiVersion requests coming from the client or
other brokers. This means the clients will get ZkMigrationReady in the
ApiVersion response, which is weird.

Thanks,

Jun

On Tue, Nov 8, 2022 at 7:18 AM David Arthur  wrote:

> Thanks for the discussion everyone, I'm going to move ahead with the
> vote for this KIP.
>
> -David
>
> On Thu, Nov 3, 2022 at 1:20 PM Jun Rao  wrote:
> >
> > Hi, David,
> >
> > Thanks for the reply.
> >
> > 20/21 Yes, but separate listeners are optional. It's possible for the
> nodes
> > to use a single port for both client and server side communications.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Nov 3, 2022 at 9:59 AM David Arthur
> >  wrote:
> >
> > > 20/21, in combined mode we still have a separate listener for the
> > > controller APIs, e.g.,
> > >
> > > listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> > >
> > > inter.broker.listener.name=PLAINTEXT
> > >
> > > controller.listener.names=CONTROLLER
> > >
> > > advertised.listeners=PLAINTEXT://localhost:9092
> > >
> > >
> > >
> > > Clients still talk to the broker through the advertised listener, and
> only
> > > brokers and other controllers will communicate over the controller
> > > listener.
> > >
> > > 40. Sounds good, I updated the KIP
> > >
> > > Thanks!
> > > David
> > >
> > >
> > >
> > > On Thu, Nov 3, 2022 at 12:14 PM Jun Rao 
> wrote:
> > >
> > > > Hi, David,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 20/21. When KRaft runs in the combined mode, does a controller know
> > > whether
> > > > an ApiRequest is from a client or another broker?
> > > >
> > > > 40. Adding a "None" state sounds reasonable.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Nov 3, 2022 at 8:39 AM David Arthur
> > > >  wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > 20/21 If we use a tagged field, then I don't think clients need to
> be
> > > > > concerned with it, right?. In ApiVersionsResponse sent by brokers
> to
> > > > > clients, this field would be omitted. Clients can't connect
> directly to
> > > > the
> > > > > KRaft controller nodes. Also, we have a precedent of sending
> controller
> > > > > node state between controllers through ApiVersions
> ("metadata.version"
> > > > > feature), so I think it fits well.
> > > > >
> > > > > 40. For new KRaft clusters, we could add another state to indicate
> it
> > > was
> > > > > not migrated from ZK, but created as a KRaft cluster. Maybe
> > > > > "kafka.controller:type=KafkaController,name=MigrationState" =>
> "None" ?
> > > > We
> > > > > could also omit that metric for unmigrated clusters, but I'm not a
> fan
> > > of
> > > > > using the absence of a value to signal something.
> > > > >
> > > > > -
> > > > >
> > > > > Akhilesh, thanks for reviewing the KIP!
> > > > >
> > > > > 1. MigrationState and MetadataType are mostly the same on the
> > > controller,
> > > > > but we do have the "MigratingZkData" state that seems useful to
> report
> > > > as a
> > > > > metric. Aside from looking at logs, observing the controller in
> this
> > > > state
> > > > > is the only way to see how long its taking to copy data from ZK.
> > > > >
> > > > > "KRaftMode" instead of "MigrationFinalized" is similar to Jun's
> > > question
> > > > > about non-migrated clusters. I think it's useful to have a distinct
> > > > > MigrationState for clusters that have been migrated and those that
> were
> > > > > never migrated. This does mean we'll report the MigrationState long
> > > after
> > > > > the migration is complete, but we can drop these metrics in 4.0
> once ZK
> > > > is
> > > > > removed.
> > > > >
> > > > > 2. The "ZkMigrationReady" will indicate that the controller has
> > > > > "kafka.metadata.migration.enable" _and_ the ZK configs set. We need
> > > some
> > > > > way to indicate that the whole quorum is correctly configured to
> handle
> > > > the
> > > > > migration so we don't failover to a controller that's not
> configured
> > > for
> > > > > ZK. Did I understand your question correctly?
> > > > >
> > > > > 3. Yea, good idea. While the KRaft controller has
> > > > > MigrationState=MigrationIneligible, we could also report
> > > > >
> > > >
> > >
> "kafka.controller:type=KafkaController,name=MigrationInelgibleBrokerCount".
> > > > > It might be useful to report ineligible controllers as well since
> that
> > > > can
> > > > > prevent the migration from starting.
> > > > >
> > > > > 4. I think I covered this in "Incompatible Brokers". We effectively
> > > fence
> > > > > these brokers by not sending them metadata RPCs.
> > > > >
> > > > > Thanks!
> > > > > David
> > > > >
> > > > >
> > > > > On Tue, Nov 1, 2022 at 3:58 PM Akhilesh Chaganti <
> > > akhilesh@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi David,
> > > > > >
> > > > > >
> > > > > > Thanks for the KIP. I have some 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-08 Thread David Arthur
Thanks for the discussion everyone, I'm going to move ahead with the
vote for this KIP.

-David

On Thu, Nov 3, 2022 at 1:20 PM Jun Rao  wrote:
>
> Hi, David,
>
> Thanks for the reply.
>
> 20/21 Yes, but separate listeners are optional. It's possible for the nodes
> to use a single port for both client and server side communications.
>
> Thanks,
>
> Jun
>
> On Thu, Nov 3, 2022 at 9:59 AM David Arthur
>  wrote:
>
> > 20/21, in combined mode we still have a separate listener for the
> > controller APIs, e.g.,
> >
> > listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> >
> > inter.broker.listener.name=PLAINTEXT
> >
> > controller.listener.names=CONTROLLER
> >
> > advertised.listeners=PLAINTEXT://localhost:9092
> >
> >
> >
> > Clients still talk to the broker through the advertised listener, and only
> > brokers and other controllers will communicate over the controller
> > listener.
> >
> > 40. Sounds good, I updated the KIP
> >
> > Thanks!
> > David
> >
> >
> >
> > On Thu, Nov 3, 2022 at 12:14 PM Jun Rao  wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the reply.
> > >
> > > 20/21. When KRaft runs in the combined mode, does a controller know
> > whether
> > > an ApiRequest is from a client or another broker?
> > >
> > > 40. Adding a "None" state sounds reasonable.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Nov 3, 2022 at 8:39 AM David Arthur
> > >  wrote:
> > >
> > > > Jun,
> > > >
> > > > 20/21 If we use a tagged field, then I don't think clients need to be
> > > > concerned with it, right?. In ApiVersionsResponse sent by brokers to
> > > > clients, this field would be omitted. Clients can't connect directly to
> > > the
> > > > KRaft controller nodes. Also, we have a precedent of sending controller
> > > > node state between controllers through ApiVersions ("metadata.version"
> > > > feature), so I think it fits well.
> > > >
> > > > 40. For new KRaft clusters, we could add another state to indicate it
> > was
> > > > not migrated from ZK, but created as a KRaft cluster. Maybe
> > > > "kafka.controller:type=KafkaController,name=MigrationState" => "None" ?
> > > We
> > > > could also omit that metric for unmigrated clusters, but I'm not a fan
> > of
> > > > using the absence of a value to signal something.
> > > >
> > > > -
> > > >
> > > > Akhilesh, thanks for reviewing the KIP!
> > > >
> > > > 1. MigrationState and MetadataType are mostly the same on the
> > controller,
> > > > but we do have the "MigratingZkData" state that seems useful to report
> > > as a
> > > > metric. Aside from looking at logs, observing the controller in this
> > > state
> > > > is the only way to see how long its taking to copy data from ZK.
> > > >
> > > > "KRaftMode" instead of "MigrationFinalized" is similar to Jun's
> > question
> > > > about non-migrated clusters. I think it's useful to have a distinct
> > > > MigrationState for clusters that have been migrated and those that were
> > > > never migrated. This does mean we'll report the MigrationState long
> > after
> > > > the migration is complete, but we can drop these metrics in 4.0 once ZK
> > > is
> > > > removed.
> > > >
> > > > 2. The "ZkMigrationReady" will indicate that the controller has
> > > > "kafka.metadata.migration.enable" _and_ the ZK configs set. We need
> > some
> > > > way to indicate that the whole quorum is correctly configured to handle
> > > the
> > > > migration so we don't failover to a controller that's not configured
> > for
> > > > ZK. Did I understand your question correctly?
> > > >
> > > > 3. Yea, good idea. While the KRaft controller has
> > > > MigrationState=MigrationIneligible, we could also report
> > > >
> > >
> > "kafka.controller:type=KafkaController,name=MigrationInelgibleBrokerCount".
> > > > It might be useful to report ineligible controllers as well since that
> > > can
> > > > prevent the migration from starting.
> > > >
> > > > 4. I think I covered this in "Incompatible Brokers". We effectively
> > fence
> > > > these brokers by not sending them metadata RPCs.
> > > >
> > > > Thanks!
> > > > David
> > > >
> > > >
> > > > On Tue, Nov 1, 2022 at 3:58 PM Akhilesh Chaganti <
> > akhilesh@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > >
> > > > > Thanks for the KIP. I have some questions/suggestions.
> > > > >
> > > > >
> > > > > 1) I see two new metrics:
> > > > > kafka.controller:type=KafkaController,name=MetadataType and
> > > > > kafka.controller:type=KafkaController,name=MigrationState. Won't the
> > > > second
> > > > > metric already cover the cases of the first metric? Also, instead of
> > > > > MigrationFinalized, we could directly say the state is KRaftMode. So
> > we
> > > > can
> > > > > use the same value for default KRaft clusters.
> > > > >
> > > > >
> > > > > 2) ZkMigrationReady in ApiVersionsResponse from KRaft Controller. By
> > > > > default, we plan to start the Controller quorum in "
> > > > > *kafka.metadata.migration.enable*" config set to true. Then do we
> > 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-03 Thread Jun Rao
Hi, David,

Thanks for the reply.

20/21 Yes, but separate listeners are optional. It's possible for the nodes
to use a single port for both client and server side communications.

Thanks,

Jun

On Thu, Nov 3, 2022 at 9:59 AM David Arthur
 wrote:

> 20/21, in combined mode we still have a separate listener for the
> controller APIs, e.g.,
>
> listeners=PLAINTEXT://:9092,CONTROLLER://:9093
>
> inter.broker.listener.name=PLAINTEXT
>
> controller.listener.names=CONTROLLER
>
> advertised.listeners=PLAINTEXT://localhost:9092
>
>
>
> Clients still talk to the broker through the advertised listener, and only
> brokers and other controllers will communicate over the controller
> listener.
>
> 40. Sounds good, I updated the KIP
>
> Thanks!
> David
>
>
>
> On Thu, Nov 3, 2022 at 12:14 PM Jun Rao  wrote:
>
> > Hi, David,
> >
> > Thanks for the reply.
> >
> > 20/21. When KRaft runs in the combined mode, does a controller know
> whether
> > an ApiRequest is from a client or another broker?
> >
> > 40. Adding a "None" state sounds reasonable.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Nov 3, 2022 at 8:39 AM David Arthur
> >  wrote:
> >
> > > Jun,
> > >
> > > 20/21 If we use a tagged field, then I don't think clients need to be
> > > concerned with it, right?. In ApiVersionsResponse sent by brokers to
> > > clients, this field would be omitted. Clients can't connect directly to
> > the
> > > KRaft controller nodes. Also, we have a precedent of sending controller
> > > node state between controllers through ApiVersions ("metadata.version"
> > > feature), so I think it fits well.
> > >
> > > 40. For new KRaft clusters, we could add another state to indicate it
> was
> > > not migrated from ZK, but created as a KRaft cluster. Maybe
> > > "kafka.controller:type=KafkaController,name=MigrationState" => "None" ?
> > We
> > > could also omit that metric for unmigrated clusters, but I'm not a fan
> of
> > > using the absence of a value to signal something.
> > >
> > > -
> > >
> > > Akhilesh, thanks for reviewing the KIP!
> > >
> > > 1. MigrationState and MetadataType are mostly the same on the
> controller,
> > > but we do have the "MigratingZkData" state that seems useful to report
> > as a
> > > metric. Aside from looking at logs, observing the controller in this
> > state
> > > is the only way to see how long its taking to copy data from ZK.
> > >
> > > "KRaftMode" instead of "MigrationFinalized" is similar to Jun's
> question
> > > about non-migrated clusters. I think it's useful to have a distinct
> > > MigrationState for clusters that have been migrated and those that were
> > > never migrated. This does mean we'll report the MigrationState long
> after
> > > the migration is complete, but we can drop these metrics in 4.0 once ZK
> > is
> > > removed.
> > >
> > > 2. The "ZkMigrationReady" will indicate that the controller has
> > > "kafka.metadata.migration.enable" _and_ the ZK configs set. We need
> some
> > > way to indicate that the whole quorum is correctly configured to handle
> > the
> > > migration so we don't failover to a controller that's not configured
> for
> > > ZK. Did I understand your question correctly?
> > >
> > > 3. Yea, good idea. While the KRaft controller has
> > > MigrationState=MigrationIneligible, we could also report
> > >
> >
> "kafka.controller:type=KafkaController,name=MigrationInelgibleBrokerCount".
> > > It might be useful to report ineligible controllers as well since that
> > can
> > > prevent the migration from starting.
> > >
> > > 4. I think I covered this in "Incompatible Brokers". We effectively
> fence
> > > these brokers by not sending them metadata RPCs.
> > >
> > > Thanks!
> > > David
> > >
> > >
> > > On Tue, Nov 1, 2022 at 3:58 PM Akhilesh Chaganti <
> akhilesh@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi David,
> > > >
> > > >
> > > > Thanks for the KIP. I have some questions/suggestions.
> > > >
> > > >
> > > > 1) I see two new metrics:
> > > > kafka.controller:type=KafkaController,name=MetadataType and
> > > > kafka.controller:type=KafkaController,name=MigrationState. Won't the
> > > second
> > > > metric already cover the cases of the first metric? Also, instead of
> > > > MigrationFinalized, we could directly say the state is KRaftMode. So
> we
> > > can
> > > > use the same value for default KRaft clusters.
> > > >
> > > >
> > > > 2) ZkMigrationReady in ApiVersionsResponse from KRaft Controller. By
> > > > default, we plan to start the Controller quorum in "
> > > > *kafka.metadata.migration.enable*" config set to true. Then do we
> need
> > > this
> > > > additional information again to make sure The controllers are ready
> for
> > > > migration? What would happen if the Controller assumes it is ready
> for
> > > > migration from 3.4 by default if it doesn't see both
> MigrationMetadata
> > > > records?
> > > >
> > > >
> > > > 3) I see that we do not impose order on rolling the brokers with
> > > migration
> > > > flags and provisioning the controller quorum. 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-03 Thread David Arthur
20/21, in combined mode we still have a separate listener for the
controller APIs, e.g.,

listeners=PLAINTEXT://:9092,CONTROLLER://:9093

inter.broker.listener.name=PLAINTEXT

controller.listener.names=CONTROLLER

advertised.listeners=PLAINTEXT://localhost:9092



Clients still talk to the broker through the advertised listener, and only
brokers and other controllers will communicate over the controller listener.

40. Sounds good, I updated the KIP

Thanks!
David



On Thu, Nov 3, 2022 at 12:14 PM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the reply.
>
> 20/21. When KRaft runs in the combined mode, does a controller know whether
> an ApiRequest is from a client or another broker?
>
> 40. Adding a "None" state sounds reasonable.
>
> Thanks,
>
> Jun
>
> On Thu, Nov 3, 2022 at 8:39 AM David Arthur
>  wrote:
>
> > Jun,
> >
> > 20/21 If we use a tagged field, then I don't think clients need to be
> > concerned with it, right?. In ApiVersionsResponse sent by brokers to
> > clients, this field would be omitted. Clients can't connect directly to
> the
> > KRaft controller nodes. Also, we have a precedent of sending controller
> > node state between controllers through ApiVersions ("metadata.version"
> > feature), so I think it fits well.
> >
> > 40. For new KRaft clusters, we could add another state to indicate it was
> > not migrated from ZK, but created as a KRaft cluster. Maybe
> > "kafka.controller:type=KafkaController,name=MigrationState" => "None" ?
> We
> > could also omit that metric for unmigrated clusters, but I'm not a fan of
> > using the absence of a value to signal something.
> >
> > -
> >
> > Akhilesh, thanks for reviewing the KIP!
> >
> > 1. MigrationState and MetadataType are mostly the same on the controller,
> > but we do have the "MigratingZkData" state that seems useful to report
> as a
> > metric. Aside from looking at logs, observing the controller in this
> state
> > is the only way to see how long its taking to copy data from ZK.
> >
> > "KRaftMode" instead of "MigrationFinalized" is similar to Jun's question
> > about non-migrated clusters. I think it's useful to have a distinct
> > MigrationState for clusters that have been migrated and those that were
> > never migrated. This does mean we'll report the MigrationState long after
> > the migration is complete, but we can drop these metrics in 4.0 once ZK
> is
> > removed.
> >
> > 2. The "ZkMigrationReady" will indicate that the controller has
> > "kafka.metadata.migration.enable" _and_ the ZK configs set. We need some
> > way to indicate that the whole quorum is correctly configured to handle
> the
> > migration so we don't failover to a controller that's not configured for
> > ZK. Did I understand your question correctly?
> >
> > 3. Yea, good idea. While the KRaft controller has
> > MigrationState=MigrationIneligible, we could also report
> >
> "kafka.controller:type=KafkaController,name=MigrationInelgibleBrokerCount".
> > It might be useful to report ineligible controllers as well since that
> can
> > prevent the migration from starting.
> >
> > 4. I think I covered this in "Incompatible Brokers". We effectively fence
> > these brokers by not sending them metadata RPCs.
> >
> > Thanks!
> > David
> >
> >
> > On Tue, Nov 1, 2022 at 3:58 PM Akhilesh Chaganti  >
> > wrote:
> >
> > > Hi David,
> > >
> > >
> > > Thanks for the KIP. I have some questions/suggestions.
> > >
> > >
> > > 1) I see two new metrics:
> > > kafka.controller:type=KafkaController,name=MetadataType and
> > > kafka.controller:type=KafkaController,name=MigrationState. Won't the
> > second
> > > metric already cover the cases of the first metric? Also, instead of
> > > MigrationFinalized, we could directly say the state is KRaftMode. So we
> > can
> > > use the same value for default KRaft clusters.
> > >
> > >
> > > 2) ZkMigrationReady in ApiVersionsResponse from KRaft Controller. By
> > > default, we plan to start the Controller quorum in "
> > > *kafka.metadata.migration.enable*" config set to true. Then do we need
> > this
> > > additional information again to make sure The controllers are ready for
> > > migration? What would happen if the Controller assumes it is ready for
> > > migration from 3.4 by default if it doesn't see both MigrationMetadata
> > > records?
> > >
> > >
> > > 3) I see that we do not impose order on rolling the brokers with
> > migration
> > > flags and provisioning the controller quorum. Along with the KRaft
> > > controller emitting migration state metrics, it may be better to emit
> the
> > > broker count for the brokers not ready for migration yet. This will
> give
> > us
> > > more insight into any roll issues.
> > >
> > >
> > > 4) Once the KRaft controller is in migration mode, we should also
> > > prevent/handle ZkBrokerRegistrations that don't enable migration mode.
> > >
> > >
> > > Thanks
> > > Akhilesh
> > >
> > >
> > > On Tue, Nov 1, 2022 at 10:49 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, David,
> > > >
> > > > Thanks for the 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-03 Thread Jun Rao
Hi, David,

Thanks for the reply.

20/21. When KRaft runs in the combined mode, does a controller know whether
an ApiRequest is from a client or another broker?

40. Adding a "None" state sounds reasonable.

Thanks,

Jun

On Thu, Nov 3, 2022 at 8:39 AM David Arthur
 wrote:

> Jun,
>
> 20/21 If we use a tagged field, then I don't think clients need to be
> concerned with it, right?. In ApiVersionsResponse sent by brokers to
> clients, this field would be omitted. Clients can't connect directly to the
> KRaft controller nodes. Also, we have a precedent of sending controller
> node state between controllers through ApiVersions ("metadata.version"
> feature), so I think it fits well.
>
> 40. For new KRaft clusters, we could add another state to indicate it was
> not migrated from ZK, but created as a KRaft cluster. Maybe
> "kafka.controller:type=KafkaController,name=MigrationState" => "None" ? We
> could also omit that metric for unmigrated clusters, but I'm not a fan of
> using the absence of a value to signal something.
>
> -
>
> Akhilesh, thanks for reviewing the KIP!
>
> 1. MigrationState and MetadataType are mostly the same on the controller,
> but we do have the "MigratingZkData" state that seems useful to report as a
> metric. Aside from looking at logs, observing the controller in this state
> is the only way to see how long its taking to copy data from ZK.
>
> "KRaftMode" instead of "MigrationFinalized" is similar to Jun's question
> about non-migrated clusters. I think it's useful to have a distinct
> MigrationState for clusters that have been migrated and those that were
> never migrated. This does mean we'll report the MigrationState long after
> the migration is complete, but we can drop these metrics in 4.0 once ZK is
> removed.
>
> 2. The "ZkMigrationReady" will indicate that the controller has
> "kafka.metadata.migration.enable" _and_ the ZK configs set. We need some
> way to indicate that the whole quorum is correctly configured to handle the
> migration so we don't failover to a controller that's not configured for
> ZK. Did I understand your question correctly?
>
> 3. Yea, good idea. While the KRaft controller has
> MigrationState=MigrationIneligible, we could also report
> "kafka.controller:type=KafkaController,name=MigrationInelgibleBrokerCount".
> It might be useful to report ineligible controllers as well since that can
> prevent the migration from starting.
>
> 4. I think I covered this in "Incompatible Brokers". We effectively fence
> these brokers by not sending them metadata RPCs.
>
> Thanks!
> David
>
>
> On Tue, Nov 1, 2022 at 3:58 PM Akhilesh Chaganti 
> wrote:
>
> > Hi David,
> >
> >
> > Thanks for the KIP. I have some questions/suggestions.
> >
> >
> > 1) I see two new metrics:
> > kafka.controller:type=KafkaController,name=MetadataType and
> > kafka.controller:type=KafkaController,name=MigrationState. Won't the
> second
> > metric already cover the cases of the first metric? Also, instead of
> > MigrationFinalized, we could directly say the state is KRaftMode. So we
> can
> > use the same value for default KRaft clusters.
> >
> >
> > 2) ZkMigrationReady in ApiVersionsResponse from KRaft Controller. By
> > default, we plan to start the Controller quorum in "
> > *kafka.metadata.migration.enable*" config set to true. Then do we need
> this
> > additional information again to make sure The controllers are ready for
> > migration? What would happen if the Controller assumes it is ready for
> > migration from 3.4 by default if it doesn't see both MigrationMetadata
> > records?
> >
> >
> > 3) I see that we do not impose order on rolling the brokers with
> migration
> > flags and provisioning the controller quorum. Along with the KRaft
> > controller emitting migration state metrics, it may be better to emit the
> > broker count for the brokers not ready for migration yet. This will give
> us
> > more insight into any roll issues.
> >
> >
> > 4) Once the KRaft controller is in migration mode, we should also
> > prevent/handle ZkBrokerRegistrations that don't enable migration mode.
> >
> >
> > Thanks
> > Akhilesh
> >
> >
> > On Tue, Nov 1, 2022 at 10:49 AM Jun Rao 
> wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the reply.
> > >
> > > 20/21. Regarding the new ZkMigrationReady field in ApiVersionsResponse,
> > it
> > > seems that this is a bit intrusive since it exposes unneeded info to
> the
> > > clients. Another option is to add that field as part of the Fetch
> > request.
> > > We can choose to only set that field in the very first Fetch request
> > from a
> > > Quorum follower.
> > >
> > > 40. For kafka.controller:type=KafkaController,name=MigrationState, what
> > is
> > > the value for a brand new KRaft cluster?
> > >
> > > Jun
> > >
> > > On Mon, Oct 31, 2022 at 2:35 PM David Arthur
> > >  wrote:
> > >
> > > > 30. I think we can keep the single ControllerId field in those
> requests
> > > > since they are only used for fencing (as far as I know). Internally,
> > the
> > 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-03 Thread David Arthur
Jun,

20/21 If we use a tagged field, then I don't think clients need to be
concerned with it, right?. In ApiVersionsResponse sent by brokers to
clients, this field would be omitted. Clients can't connect directly to the
KRaft controller nodes. Also, we have a precedent of sending controller
node state between controllers through ApiVersions ("metadata.version"
feature), so I think it fits well.

40. For new KRaft clusters, we could add another state to indicate it was
not migrated from ZK, but created as a KRaft cluster. Maybe
"kafka.controller:type=KafkaController,name=MigrationState" => "None" ? We
could also omit that metric for unmigrated clusters, but I'm not a fan of
using the absence of a value to signal something.

-

Akhilesh, thanks for reviewing the KIP!

1. MigrationState and MetadataType are mostly the same on the controller,
but we do have the "MigratingZkData" state that seems useful to report as a
metric. Aside from looking at logs, observing the controller in this state
is the only way to see how long its taking to copy data from ZK.

"KRaftMode" instead of "MigrationFinalized" is similar to Jun's question
about non-migrated clusters. I think it's useful to have a distinct
MigrationState for clusters that have been migrated and those that were
never migrated. This does mean we'll report the MigrationState long after
the migration is complete, but we can drop these metrics in 4.0 once ZK is
removed.

2. The "ZkMigrationReady" will indicate that the controller has
"kafka.metadata.migration.enable" _and_ the ZK configs set. We need some
way to indicate that the whole quorum is correctly configured to handle the
migration so we don't failover to a controller that's not configured for
ZK. Did I understand your question correctly?

3. Yea, good idea. While the KRaft controller has
MigrationState=MigrationIneligible, we could also report
"kafka.controller:type=KafkaController,name=MigrationInelgibleBrokerCount".
It might be useful to report ineligible controllers as well since that can
prevent the migration from starting.

4. I think I covered this in "Incompatible Brokers". We effectively fence
these brokers by not sending them metadata RPCs.

Thanks!
David


On Tue, Nov 1, 2022 at 3:58 PM Akhilesh Chaganti 
wrote:

> Hi David,
>
>
> Thanks for the KIP. I have some questions/suggestions.
>
>
> 1) I see two new metrics:
> kafka.controller:type=KafkaController,name=MetadataType and
> kafka.controller:type=KafkaController,name=MigrationState. Won't the second
> metric already cover the cases of the first metric? Also, instead of
> MigrationFinalized, we could directly say the state is KRaftMode. So we can
> use the same value for default KRaft clusters.
>
>
> 2) ZkMigrationReady in ApiVersionsResponse from KRaft Controller. By
> default, we plan to start the Controller quorum in "
> *kafka.metadata.migration.enable*" config set to true. Then do we need this
> additional information again to make sure The controllers are ready for
> migration? What would happen if the Controller assumes it is ready for
> migration from 3.4 by default if it doesn't see both MigrationMetadata
> records?
>
>
> 3) I see that we do not impose order on rolling the brokers with migration
> flags and provisioning the controller quorum. Along with the KRaft
> controller emitting migration state metrics, it may be better to emit the
> broker count for the brokers not ready for migration yet. This will give us
> more insight into any roll issues.
>
>
> 4) Once the KRaft controller is in migration mode, we should also
> prevent/handle ZkBrokerRegistrations that don't enable migration mode.
>
>
> Thanks
> Akhilesh
>
>
> On Tue, Nov 1, 2022 at 10:49 AM Jun Rao  wrote:
>
> > Hi, David,
> >
> > Thanks for the reply.
> >
> > 20/21. Regarding the new ZkMigrationReady field in ApiVersionsResponse,
> it
> > seems that this is a bit intrusive since it exposes unneeded info to the
> > clients. Another option is to add that field as part of the Fetch
> request.
> > We can choose to only set that field in the very first Fetch request
> from a
> > Quorum follower.
> >
> > 40. For kafka.controller:type=KafkaController,name=MigrationState, what
> is
> > the value for a brand new KRaft cluster?
> >
> > Jun
> >
> > On Mon, Oct 31, 2022 at 2:35 PM David Arthur
> >  wrote:
> >
> > > 30. I think we can keep the single ControllerId field in those requests
> > > since they are only used for fencing (as far as I know). Internally,
> the
> > > broker components that handle those requests will compare the
> > ControllerId
> > > with that of MetadataCache (which is updated via UMR).
> > >
> > > The reason we need the separate KRaftControllerId in the UpdateMetadata
> > > code path so that we can have different connection behavior for a KRaft
> > > controller vs ZK controller.
> > >
> > > 31. It seems reasonable to keep the MigrationRecord in the snapshot. I
> > was
> > > thinking the same thing in terms of understanding the loss for a
> > > 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-01 Thread Akhilesh Chaganti
Hi David,


Thanks for the KIP. I have some questions/suggestions.


1) I see two new metrics:
kafka.controller:type=KafkaController,name=MetadataType and
kafka.controller:type=KafkaController,name=MigrationState. Won't the second
metric already cover the cases of the first metric? Also, instead of
MigrationFinalized, we could directly say the state is KRaftMode. So we can
use the same value for default KRaft clusters.


2) ZkMigrationReady in ApiVersionsResponse from KRaft Controller. By
default, we plan to start the Controller quorum in "
*kafka.metadata.migration.enable*" config set to true. Then do we need this
additional information again to make sure The controllers are ready for
migration? What would happen if the Controller assumes it is ready for
migration from 3.4 by default if it doesn't see both MigrationMetadata
records?


3) I see that we do not impose order on rolling the brokers with migration
flags and provisioning the controller quorum. Along with the KRaft
controller emitting migration state metrics, it may be better to emit the
broker count for the brokers not ready for migration yet. This will give us
more insight into any roll issues.


4) Once the KRaft controller is in migration mode, we should also
prevent/handle ZkBrokerRegistrations that don't enable migration mode.


Thanks
Akhilesh


On Tue, Nov 1, 2022 at 10:49 AM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the reply.
>
> 20/21. Regarding the new ZkMigrationReady field in ApiVersionsResponse, it
> seems that this is a bit intrusive since it exposes unneeded info to the
> clients. Another option is to add that field as part of the Fetch request.
> We can choose to only set that field in the very first Fetch request from a
> Quorum follower.
>
> 40. For kafka.controller:type=KafkaController,name=MigrationState, what is
> the value for a brand new KRaft cluster?
>
> Jun
>
> On Mon, Oct 31, 2022 at 2:35 PM David Arthur
>  wrote:
>
> > 30. I think we can keep the single ControllerId field in those requests
> > since they are only used for fencing (as far as I know). Internally, the
> > broker components that handle those requests will compare the
> ControllerId
> > with that of MetadataCache (which is updated via UMR).
> >
> > The reason we need the separate KRaftControllerId in the UpdateMetadata
> > code path so that we can have different connection behavior for a KRaft
> > controller vs ZK controller.
> >
> > 31. It seems reasonable to keep the MigrationRecord in the snapshot. I
> was
> > thinking the same thing in terms of understanding the loss for a
> > migration-after-finalization. However, once a snapshot has been taken
> that
> > includes the final MigrationRecord, we can't easily see which records
> came
> > after it.
> >
> > 32. You're correct, we can just use the modify time from the Stat. The
> > other two fields are primarily informational and are there for operators
> > who want to inspect the state of the migration. They aren't required for
> > correctness
> >
> > 33. Yes that's right. I detail that in "Controller Leadership" section
> >
> > 34. Right, I'll fix that.
> >
> > Thanks,
> > David
> >
> > On Mon, Oct 31, 2022 at 2:55 PM Jun Rao 
> wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the updated KIP. A few more comments.
> > >
> > > 30. LeaderAndIsrRequest/StopReplicaRequest both have a controllerId
> > field.
> > > Should we add a KRaftControllerId field like UpdateMetadata?
> > >
> > > 31. "If a migration has been finalized, but the KRaft quroum comes up
> > with
> > > kafka.metadata.migration.enable, we must not re-enter the migration
> mode.
> > > In this case, while replaying the log, the controller can see the
> second
> > > MigrationRecord and know that the migration is finalized and should not
> > be
> > > resumed. " Hmm, do we want to keep the MigrationRecord in the snapshot
> > and
> > > the metadata log forever after migration is finalized? If not, we can't
> > > know for sure whether a migration has happened or not. Also, it might
> be
> > > useful to support switching back to ZK mode after the migration is
> > > finalized, with the understanding of potential metadata loss. In that
> > case,
> > > we could just trim all metadata log and recopy the ZK metadata back.
> > >
> > > 32. The /migration node in ZK: Do we need last_update_time_ms since ZK
> > > Stats already has an MTime? Also, how do we plan to use the
> > > kraft_controller_id and kraft_controller_epoch fields?
> > >
> > > 33. Controller migration: We will force a write to the "/controller"
> and
> > > "/controller_epoch" ZNodes before copying ZK data, right?
> > >
> > > 34. "Operator can remove the persistent "/controller" and
> > > "/controller_epoch" nodes allowing for ZK controller election to take
> > > place". I guess the operator only needs to remove the /controller path?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Oct 31, 2022 at 7:17 AM David Arthur
> > >  wrote:
> > >
> > > > Happy Monday, everyone! I've updated 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-11-01 Thread Jun Rao
Hi, David,

Thanks for the reply.

20/21. Regarding the new ZkMigrationReady field in ApiVersionsResponse, it
seems that this is a bit intrusive since it exposes unneeded info to the
clients. Another option is to add that field as part of the Fetch request.
We can choose to only set that field in the very first Fetch request from a
Quorum follower.

40. For kafka.controller:type=KafkaController,name=MigrationState, what is
the value for a brand new KRaft cluster?

Jun

On Mon, Oct 31, 2022 at 2:35 PM David Arthur
 wrote:

> 30. I think we can keep the single ControllerId field in those requests
> since they are only used for fencing (as far as I know). Internally, the
> broker components that handle those requests will compare the ControllerId
> with that of MetadataCache (which is updated via UMR).
>
> The reason we need the separate KRaftControllerId in the UpdateMetadata
> code path so that we can have different connection behavior for a KRaft
> controller vs ZK controller.
>
> 31. It seems reasonable to keep the MigrationRecord in the snapshot. I was
> thinking the same thing in terms of understanding the loss for a
> migration-after-finalization. However, once a snapshot has been taken that
> includes the final MigrationRecord, we can't easily see which records came
> after it.
>
> 32. You're correct, we can just use the modify time from the Stat. The
> other two fields are primarily informational and are there for operators
> who want to inspect the state of the migration. They aren't required for
> correctness
>
> 33. Yes that's right. I detail that in "Controller Leadership" section
>
> 34. Right, I'll fix that.
>
> Thanks,
> David
>
> On Mon, Oct 31, 2022 at 2:55 PM Jun Rao  wrote:
>
> > Hi, David,
> >
> > Thanks for the updated KIP. A few more comments.
> >
> > 30. LeaderAndIsrRequest/StopReplicaRequest both have a controllerId
> field.
> > Should we add a KRaftControllerId field like UpdateMetadata?
> >
> > 31. "If a migration has been finalized, but the KRaft quroum comes up
> with
> > kafka.metadata.migration.enable, we must not re-enter the migration mode.
> > In this case, while replaying the log, the controller can see the second
> > MigrationRecord and know that the migration is finalized and should not
> be
> > resumed. " Hmm, do we want to keep the MigrationRecord in the snapshot
> and
> > the metadata log forever after migration is finalized? If not, we can't
> > know for sure whether a migration has happened or not. Also, it might be
> > useful to support switching back to ZK mode after the migration is
> > finalized, with the understanding of potential metadata loss. In that
> case,
> > we could just trim all metadata log and recopy the ZK metadata back.
> >
> > 32. The /migration node in ZK: Do we need last_update_time_ms since ZK
> > Stats already has an MTime? Also, how do we plan to use the
> > kraft_controller_id and kraft_controller_epoch fields?
> >
> > 33. Controller migration: We will force a write to the "/controller" and
> > "/controller_epoch" ZNodes before copying ZK data, right?
> >
> > 34. "Operator can remove the persistent "/controller" and
> > "/controller_epoch" nodes allowing for ZK controller election to take
> > place". I guess the operator only needs to remove the /controller path?
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Oct 31, 2022 at 7:17 AM David Arthur
> >  wrote:
> >
> > > Happy Monday, everyone! I've updated the KIP with the following
> changes:
> > >
> > > * Clarified MetadataType metric usages (broker vs controller)
> > > * Added ZkMigrationReady tagged field to ApiVersionsResponse (for use
> by
> > > KRaft controller quorum)
> > > * Added MigrationRecord with two states: Started and Finished
> > > * Documented ZK configs for KRaft controller
> > > * Simplified state machine description (internally, more states will
> > exist,
> > > but only the four documented are interesting to operators)
> > > * Clarified some things in Controller Migration section
> > > * Removed KRaft -> ZK parts of Broker Registration
> > > * Added Misconfigurations section to Failure Modes
> > >
> > > Let me know if I've missed anything from the past two weeks of
> > discussion.
> > >
> > > Thanks again to everyone who has reviewed this KIP so far!
> > >
> > > -David
> > >
> > > On Fri, Oct 28, 2022 at 2:26 PM Jun Rao 
> > wrote:
> > >
> > > > Hi, David,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 20/21. Sounds good.
> > > >
> > > > Could you update the doc with all the changes being discussed?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Oct 28, 2022 at 10:11 AM David Arthur
> > > >  wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > 20/21. I was also wondering about a "migration" record. In addition
> > to
> > > > the
> > > > > scenario you mentioned, we also need a way to prevent the cluster
> > from
> > > > > re-entering the dual write mode after the migration has been
> > > finalized. I
> > > > > could see this happening inadvertently via 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-31 Thread David Arthur
30. I think we can keep the single ControllerId field in those requests
since they are only used for fencing (as far as I know). Internally, the
broker components that handle those requests will compare the ControllerId
with that of MetadataCache (which is updated via UMR).

The reason we need the separate KRaftControllerId in the UpdateMetadata
code path so that we can have different connection behavior for a KRaft
controller vs ZK controller.

31. It seems reasonable to keep the MigrationRecord in the snapshot. I was
thinking the same thing in terms of understanding the loss for a
migration-after-finalization. However, once a snapshot has been taken that
includes the final MigrationRecord, we can't easily see which records came
after it.

32. You're correct, we can just use the modify time from the Stat. The
other two fields are primarily informational and are there for operators
who want to inspect the state of the migration. They aren't required for
correctness

33. Yes that's right. I detail that in "Controller Leadership" section

34. Right, I'll fix that.

Thanks,
David

On Mon, Oct 31, 2022 at 2:55 PM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the updated KIP. A few more comments.
>
> 30. LeaderAndIsrRequest/StopReplicaRequest both have a controllerId field.
> Should we add a KRaftControllerId field like UpdateMetadata?
>
> 31. "If a migration has been finalized, but the KRaft quroum comes up with
> kafka.metadata.migration.enable, we must not re-enter the migration mode.
> In this case, while replaying the log, the controller can see the second
> MigrationRecord and know that the migration is finalized and should not be
> resumed. " Hmm, do we want to keep the MigrationRecord in the snapshot and
> the metadata log forever after migration is finalized? If not, we can't
> know for sure whether a migration has happened or not. Also, it might be
> useful to support switching back to ZK mode after the migration is
> finalized, with the understanding of potential metadata loss. In that case,
> we could just trim all metadata log and recopy the ZK metadata back.
>
> 32. The /migration node in ZK: Do we need last_update_time_ms since ZK
> Stats already has an MTime? Also, how do we plan to use the
> kraft_controller_id and kraft_controller_epoch fields?
>
> 33. Controller migration: We will force a write to the "/controller" and
> "/controller_epoch" ZNodes before copying ZK data, right?
>
> 34. "Operator can remove the persistent "/controller" and
> "/controller_epoch" nodes allowing for ZK controller election to take
> place". I guess the operator only needs to remove the /controller path?
>
> Thanks,
>
> Jun
>
> On Mon, Oct 31, 2022 at 7:17 AM David Arthur
>  wrote:
>
> > Happy Monday, everyone! I've updated the KIP with the following changes:
> >
> > * Clarified MetadataType metric usages (broker vs controller)
> > * Added ZkMigrationReady tagged field to ApiVersionsResponse (for use by
> > KRaft controller quorum)
> > * Added MigrationRecord with two states: Started and Finished
> > * Documented ZK configs for KRaft controller
> > * Simplified state machine description (internally, more states will
> exist,
> > but only the four documented are interesting to operators)
> > * Clarified some things in Controller Migration section
> > * Removed KRaft -> ZK parts of Broker Registration
> > * Added Misconfigurations section to Failure Modes
> >
> > Let me know if I've missed anything from the past two weeks of
> discussion.
> >
> > Thanks again to everyone who has reviewed this KIP so far!
> >
> > -David
> >
> > On Fri, Oct 28, 2022 at 2:26 PM Jun Rao 
> wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the reply.
> > >
> > > 20/21. Sounds good.
> > >
> > > Could you update the doc with all the changes being discussed?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Oct 28, 2022 at 10:11 AM David Arthur
> > >  wrote:
> > >
> > > > Jun,
> > > >
> > > > 20/21. I was also wondering about a "migration" record. In addition
> to
> > > the
> > > > scenario you mentioned, we also need a way to prevent the cluster
> from
> > > > re-entering the dual write mode after the migration has been
> > finalized. I
> > > > could see this happening inadvertently via a change in some
> > configuration
> > > > management system. How about we add a record that marks the beginning
> > and
> > > > end of the dual-write mode. The first occurrence of the record could
> be
> > > > included in the metadata transaction when we migrate data from ZK.
> > > >
> > > > With this, the active controller would decide whether to enter dual
> > write
> > > > mode, finalize the migration based, or fail based on:
> > > >
> > > > * Metadata log state
> > > > * It's own configuration ("kafka.metadata.migration.enable",
> > > > "zookeeper.connect", etc)
> > > > * The other controllers configuration (via ApiVersionsResponse)
> > > >
> > > > WDYT?
> > > >
> > > > 22. Since we will need the fencing anyways as a safe-guard, then I
> > agree

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-31 Thread Jun Rao
Hi, David,

Thanks for the updated KIP. A few more comments.

30. LeaderAndIsrRequest/StopReplicaRequest both have a controllerId field.
Should we add a KRaftControllerId field like UpdateMetadata?

31. "If a migration has been finalized, but the KRaft quroum comes up with
kafka.metadata.migration.enable, we must not re-enter the migration mode.
In this case, while replaying the log, the controller can see the second
MigrationRecord and know that the migration is finalized and should not be
resumed. " Hmm, do we want to keep the MigrationRecord in the snapshot and
the metadata log forever after migration is finalized? If not, we can't
know for sure whether a migration has happened or not. Also, it might be
useful to support switching back to ZK mode after the migration is
finalized, with the understanding of potential metadata loss. In that case,
we could just trim all metadata log and recopy the ZK metadata back.

32. The /migration node in ZK: Do we need last_update_time_ms since ZK
Stats already has an MTime? Also, how do we plan to use the
kraft_controller_id and kraft_controller_epoch fields?

33. Controller migration: We will force a write to the "/controller" and
"/controller_epoch" ZNodes before copying ZK data, right?

34. "Operator can remove the persistent "/controller" and
"/controller_epoch" nodes allowing for ZK controller election to take
place". I guess the operator only needs to remove the /controller path?

Thanks,

Jun

On Mon, Oct 31, 2022 at 7:17 AM David Arthur
 wrote:

> Happy Monday, everyone! I've updated the KIP with the following changes:
>
> * Clarified MetadataType metric usages (broker vs controller)
> * Added ZkMigrationReady tagged field to ApiVersionsResponse (for use by
> KRaft controller quorum)
> * Added MigrationRecord with two states: Started and Finished
> * Documented ZK configs for KRaft controller
> * Simplified state machine description (internally, more states will exist,
> but only the four documented are interesting to operators)
> * Clarified some things in Controller Migration section
> * Removed KRaft -> ZK parts of Broker Registration
> * Added Misconfigurations section to Failure Modes
>
> Let me know if I've missed anything from the past two weeks of discussion.
>
> Thanks again to everyone who has reviewed this KIP so far!
>
> -David
>
> On Fri, Oct 28, 2022 at 2:26 PM Jun Rao  wrote:
>
> > Hi, David,
> >
> > Thanks for the reply.
> >
> > 20/21. Sounds good.
> >
> > Could you update the doc with all the changes being discussed?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Oct 28, 2022 at 10:11 AM David Arthur
> >  wrote:
> >
> > > Jun,
> > >
> > > 20/21. I was also wondering about a "migration" record. In addition to
> > the
> > > scenario you mentioned, we also need a way to prevent the cluster from
> > > re-entering the dual write mode after the migration has been
> finalized. I
> > > could see this happening inadvertently via a change in some
> configuration
> > > management system. How about we add a record that marks the beginning
> and
> > > end of the dual-write mode. The first occurrence of the record could be
> > > included in the metadata transaction when we migrate data from ZK.
> > >
> > > With this, the active controller would decide whether to enter dual
> write
> > > mode, finalize the migration based, or fail based on:
> > >
> > > * Metadata log state
> > > * It's own configuration ("kafka.metadata.migration.enable",
> > > "zookeeper.connect", etc)
> > > * The other controllers configuration (via ApiVersionsResponse)
> > >
> > > WDYT?
> > >
> > > 22. Since we will need the fencing anyways as a safe-guard, then I
> agree
> > > would could skip the registration of KRaft brokers in ZK to simply
> > things a
> > > bit.
> > >
> > > Thanks,
> > > David
> > >
> > >
> > >
> > > On Thu, Oct 27, 2022 at 5:11 PM Jun Rao 
> > wrote:
> > >
> > > > Hi, David,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 20/21. Relying upon the presence of ZK configs to determine whether
> the
> > > > KRaft controller is in a dual write mode seems a bit error prone. If
> > > > someone accidentally adds a ZK configuration to a brand new KRaft
> > > cluster,
> > > > ideally it shouldn't cause the controller to get into a weird state.
> > Have
> > > > we considered storing the migration state in a metadata record?
> > > >
> > > > 22. If we have the broker fencing logic, do we need to write the
> broker
> > > > registration path in ZK for KRaft brokers at all?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, Oct 27, 2022 at 1:02 PM David Arthur
> > > >  wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > 20/21. A KRaft controller will recover the migration state by
> reading
> > > the
> > > > > "/migration" ZNode. If the migration enable config is set, and the
> ZK
> > > > > migration is complete, it will enter the dual-write mode. Before an
> > > > > operator can decommission ZK, they will need to finalize the
> > migration
> > > > > which 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-31 Thread David Arthur
Happy Monday, everyone! I've updated the KIP with the following changes:

* Clarified MetadataType metric usages (broker vs controller)
* Added ZkMigrationReady tagged field to ApiVersionsResponse (for use by
KRaft controller quorum)
* Added MigrationRecord with two states: Started and Finished
* Documented ZK configs for KRaft controller
* Simplified state machine description (internally, more states will exist,
but only the four documented are interesting to operators)
* Clarified some things in Controller Migration section
* Removed KRaft -> ZK parts of Broker Registration
* Added Misconfigurations section to Failure Modes

Let me know if I've missed anything from the past two weeks of discussion.

Thanks again to everyone who has reviewed this KIP so far!

-David

On Fri, Oct 28, 2022 at 2:26 PM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the reply.
>
> 20/21. Sounds good.
>
> Could you update the doc with all the changes being discussed?
>
> Thanks,
>
> Jun
>
> On Fri, Oct 28, 2022 at 10:11 AM David Arthur
>  wrote:
>
> > Jun,
> >
> > 20/21. I was also wondering about a "migration" record. In addition to
> the
> > scenario you mentioned, we also need a way to prevent the cluster from
> > re-entering the dual write mode after the migration has been finalized. I
> > could see this happening inadvertently via a change in some configuration
> > management system. How about we add a record that marks the beginning and
> > end of the dual-write mode. The first occurrence of the record could be
> > included in the metadata transaction when we migrate data from ZK.
> >
> > With this, the active controller would decide whether to enter dual write
> > mode, finalize the migration based, or fail based on:
> >
> > * Metadata log state
> > * It's own configuration ("kafka.metadata.migration.enable",
> > "zookeeper.connect", etc)
> > * The other controllers configuration (via ApiVersionsResponse)
> >
> > WDYT?
> >
> > 22. Since we will need the fencing anyways as a safe-guard, then I agree
> > would could skip the registration of KRaft brokers in ZK to simply
> things a
> > bit.
> >
> > Thanks,
> > David
> >
> >
> >
> > On Thu, Oct 27, 2022 at 5:11 PM Jun Rao 
> wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the reply.
> > >
> > > 20/21. Relying upon the presence of ZK configs to determine whether the
> > > KRaft controller is in a dual write mode seems a bit error prone. If
> > > someone accidentally adds a ZK configuration to a brand new KRaft
> > cluster,
> > > ideally it shouldn't cause the controller to get into a weird state.
> Have
> > > we considered storing the migration state in a metadata record?
> > >
> > > 22. If we have the broker fencing logic, do we need to write the broker
> > > registration path in ZK for KRaft brokers at all?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Thu, Oct 27, 2022 at 1:02 PM David Arthur
> > >  wrote:
> > >
> > > > Jun,
> > > >
> > > > 20/21. A KRaft controller will recover the migration state by reading
> > the
> > > > "/migration" ZNode. If the migration enable config is set, and the ZK
> > > > migration is complete, it will enter the dual-write mode. Before an
> > > > operator can decommission ZK, they will need to finalize the
> migration
> > > > which involves removing the migration config and the ZK config. I'll
> > > > clarify this in the KIP.
> > > >
> > > > 22. Yea, we could see an incorrect broker ID during that window.  If
> we
> > > > ended up with a state where we saw a ZK broker ID that conflicted
> with
> > a
> > > > KRaft broker ID, we would need to fence one of them. I would probably
> > opt
> > > > to fence the KRaft broker in that case since broker registration and
> > > > fencing is more robust in KRaft. Hopefully this is a rare case.
> > > >
> > > > 26. Sounds good.
> > > >
> > > > Thanks!
> > > > David
> > > >
> > > >
> > > > On Thu, Oct 27, 2022 at 1:34 PM Jun Rao 
> > > wrote:
> > > >
> > > > > Hi, David,
> > > > >
> > > > > Thanks for the reply. A few more comments.
> > > > >
> > > > > 20/21. Using a tagged field in ApiVersionRequest could work.
> Related
> > to
> > > > > this, how does a KRaft controller know that it's in the dual write
> > > mode?
> > > > > Does it need to read the /controller path from ZK? After the
> > migration,
> > > > > people may have the ZK cluster decommissioned, but still have the
> ZK
> > > > > configs left in the KRaft controller. Will this cause the KRaft
> > > > controller
> > > > > to be stuck because it doesn't know which mode it is in?
> > > > >
> > > > > 22. Using the ephemeral node matches the current ZK-based broker
> > > behavior
> > > > > better. However, it leaves a window for incorrect broker
> registration
> > > to
> > > > > sneak in during KRaft controller failover.
> > > > >
> > > > > 26. Then, we could just remove Broker Registration in that section.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Wed, Oct 26, 2022 at 2:21 PM David Arthur
> > > > >  wrote:
> 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-28 Thread Jun Rao
Hi, David,

Thanks for the reply.

20/21. Sounds good.

Could you update the doc with all the changes being discussed?

Thanks,

Jun

On Fri, Oct 28, 2022 at 10:11 AM David Arthur
 wrote:

> Jun,
>
> 20/21. I was also wondering about a "migration" record. In addition to the
> scenario you mentioned, we also need a way to prevent the cluster from
> re-entering the dual write mode after the migration has been finalized. I
> could see this happening inadvertently via a change in some configuration
> management system. How about we add a record that marks the beginning and
> end of the dual-write mode. The first occurrence of the record could be
> included in the metadata transaction when we migrate data from ZK.
>
> With this, the active controller would decide whether to enter dual write
> mode, finalize the migration based, or fail based on:
>
> * Metadata log state
> * It's own configuration ("kafka.metadata.migration.enable",
> "zookeeper.connect", etc)
> * The other controllers configuration (via ApiVersionsResponse)
>
> WDYT?
>
> 22. Since we will need the fencing anyways as a safe-guard, then I agree
> would could skip the registration of KRaft brokers in ZK to simply things a
> bit.
>
> Thanks,
> David
>
>
>
> On Thu, Oct 27, 2022 at 5:11 PM Jun Rao  wrote:
>
> > Hi, David,
> >
> > Thanks for the reply.
> >
> > 20/21. Relying upon the presence of ZK configs to determine whether the
> > KRaft controller is in a dual write mode seems a bit error prone. If
> > someone accidentally adds a ZK configuration to a brand new KRaft
> cluster,
> > ideally it shouldn't cause the controller to get into a weird state. Have
> > we considered storing the migration state in a metadata record?
> >
> > 22. If we have the broker fencing logic, do we need to write the broker
> > registration path in ZK for KRaft brokers at all?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Oct 27, 2022 at 1:02 PM David Arthur
> >  wrote:
> >
> > > Jun,
> > >
> > > 20/21. A KRaft controller will recover the migration state by reading
> the
> > > "/migration" ZNode. If the migration enable config is set, and the ZK
> > > migration is complete, it will enter the dual-write mode. Before an
> > > operator can decommission ZK, they will need to finalize the migration
> > > which involves removing the migration config and the ZK config. I'll
> > > clarify this in the KIP.
> > >
> > > 22. Yea, we could see an incorrect broker ID during that window.  If we
> > > ended up with a state where we saw a ZK broker ID that conflicted with
> a
> > > KRaft broker ID, we would need to fence one of them. I would probably
> opt
> > > to fence the KRaft broker in that case since broker registration and
> > > fencing is more robust in KRaft. Hopefully this is a rare case.
> > >
> > > 26. Sounds good.
> > >
> > > Thanks!
> > > David
> > >
> > >
> > > On Thu, Oct 27, 2022 at 1:34 PM Jun Rao 
> > wrote:
> > >
> > > > Hi, David,
> > > >
> > > > Thanks for the reply. A few more comments.
> > > >
> > > > 20/21. Using a tagged field in ApiVersionRequest could work. Related
> to
> > > > this, how does a KRaft controller know that it's in the dual write
> > mode?
> > > > Does it need to read the /controller path from ZK? After the
> migration,
> > > > people may have the ZK cluster decommissioned, but still have the ZK
> > > > configs left in the KRaft controller. Will this cause the KRaft
> > > controller
> > > > to be stuck because it doesn't know which mode it is in?
> > > >
> > > > 22. Using the ephemeral node matches the current ZK-based broker
> > behavior
> > > > better. However, it leaves a window for incorrect broker registration
> > to
> > > > sneak in during KRaft controller failover.
> > > >
> > > > 26. Then, we could just remove Broker Registration in that section.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, Oct 26, 2022 at 2:21 PM David Arthur
> > > >  wrote:
> > > >
> > > > > Jun
> > > > >
> > > > > 20/21 It could definitely cause problems if we failover to a
> > controller
> > > > > without "kafka.metadata.migration.enable". The only mechanism I
> know
> > of
> > > > for
> > > > > controllers to learn things about one another is ApiVersions. We
> > > > currently
> > > > > use this for checking support for "metadata.version" (in KRaft
> mode).
> > > We
> > > > > could add a "zk.migration" feature flag that's enabled on a
> > controller
> > > > only
> > > > > if the config is set. Another possibility would be a tagged field
> on
> > > > > ApiVersionResponse that indicated if the config was set. Both seem
> > > > somewhat
> > > > > inelegant. I think a tagged field would be a bit simpler (and
> > arguably
> > > > less
> > > > > hacky).
> > > > >
> > > > > For 20, we could avoid entering the migration state unless the
> whole
> > > > quorum
> > > > > had the field present in their NodeApiVersions. For 21, we could
> > avoid
> > > > > leaving the migration state unless the whole quorum did not have
> the
> > > > field
> > > > > in 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-28 Thread David Arthur
Jun,

20/21. I was also wondering about a "migration" record. In addition to the
scenario you mentioned, we also need a way to prevent the cluster from
re-entering the dual write mode after the migration has been finalized. I
could see this happening inadvertently via a change in some configuration
management system. How about we add a record that marks the beginning and
end of the dual-write mode. The first occurrence of the record could be
included in the metadata transaction when we migrate data from ZK.

With this, the active controller would decide whether to enter dual write
mode, finalize the migration based, or fail based on:

* Metadata log state
* It's own configuration ("kafka.metadata.migration.enable",
"zookeeper.connect", etc)
* The other controllers configuration (via ApiVersionsResponse)

WDYT?

22. Since we will need the fencing anyways as a safe-guard, then I agree
would could skip the registration of KRaft brokers in ZK to simply things a
bit.

Thanks,
David



On Thu, Oct 27, 2022 at 5:11 PM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the reply.
>
> 20/21. Relying upon the presence of ZK configs to determine whether the
> KRaft controller is in a dual write mode seems a bit error prone. If
> someone accidentally adds a ZK configuration to a brand new KRaft cluster,
> ideally it shouldn't cause the controller to get into a weird state. Have
> we considered storing the migration state in a metadata record?
>
> 22. If we have the broker fencing logic, do we need to write the broker
> registration path in ZK for KRaft brokers at all?
>
> Thanks,
>
> Jun
>
>
> On Thu, Oct 27, 2022 at 1:02 PM David Arthur
>  wrote:
>
> > Jun,
> >
> > 20/21. A KRaft controller will recover the migration state by reading the
> > "/migration" ZNode. If the migration enable config is set, and the ZK
> > migration is complete, it will enter the dual-write mode. Before an
> > operator can decommission ZK, they will need to finalize the migration
> > which involves removing the migration config and the ZK config. I'll
> > clarify this in the KIP.
> >
> > 22. Yea, we could see an incorrect broker ID during that window.  If we
> > ended up with a state where we saw a ZK broker ID that conflicted with a
> > KRaft broker ID, we would need to fence one of them. I would probably opt
> > to fence the KRaft broker in that case since broker registration and
> > fencing is more robust in KRaft. Hopefully this is a rare case.
> >
> > 26. Sounds good.
> >
> > Thanks!
> > David
> >
> >
> > On Thu, Oct 27, 2022 at 1:34 PM Jun Rao 
> wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the reply. A few more comments.
> > >
> > > 20/21. Using a tagged field in ApiVersionRequest could work. Related to
> > > this, how does a KRaft controller know that it's in the dual write
> mode?
> > > Does it need to read the /controller path from ZK? After the migration,
> > > people may have the ZK cluster decommissioned, but still have the ZK
> > > configs left in the KRaft controller. Will this cause the KRaft
> > controller
> > > to be stuck because it doesn't know which mode it is in?
> > >
> > > 22. Using the ephemeral node matches the current ZK-based broker
> behavior
> > > better. However, it leaves a window for incorrect broker registration
> to
> > > sneak in during KRaft controller failover.
> > >
> > > 26. Then, we could just remove Broker Registration in that section.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Oct 26, 2022 at 2:21 PM David Arthur
> > >  wrote:
> > >
> > > > Jun
> > > >
> > > > 20/21 It could definitely cause problems if we failover to a
> controller
> > > > without "kafka.metadata.migration.enable". The only mechanism I know
> of
> > > for
> > > > controllers to learn things about one another is ApiVersions. We
> > > currently
> > > > use this for checking support for "metadata.version" (in KRaft mode).
> > We
> > > > could add a "zk.migration" feature flag that's enabled on a
> controller
> > > only
> > > > if the config is set. Another possibility would be a tagged field on
> > > > ApiVersionResponse that indicated if the config was set. Both seem
> > > somewhat
> > > > inelegant. I think a tagged field would be a bit simpler (and
> arguably
> > > less
> > > > hacky).
> > > >
> > > > For 20, we could avoid entering the migration state unless the whole
> > > quorum
> > > > had the field present in their NodeApiVersions. For 21, we could
> avoid
> > > > leaving the migration state unless the whole quorum did not have the
> > > field
> > > > in their NodeApiVersions. Do you think this would be sufficient?
> > > >
> > > > 22. Right, we need to write the broker info back to ZK just as a
> > > safeguard
> > > > against incorrect broker IDs getting registered into ZK. I was
> thinking
> > > > these would be persistent nodes, but it's probably fine to make them
> > > > ephemeral and have the active KRaft controller keep them up to date.
> > > >
> > > > 23. Right. When the broker comes up as a KRaft broker, 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-27 Thread Jun Rao
Hi, David,

Thanks for the reply.

20/21. Relying upon the presence of ZK configs to determine whether the
KRaft controller is in a dual write mode seems a bit error prone. If
someone accidentally adds a ZK configuration to a brand new KRaft cluster,
ideally it shouldn't cause the controller to get into a weird state. Have
we considered storing the migration state in a metadata record?

22. If we have the broker fencing logic, do we need to write the broker
registration path in ZK for KRaft brokers at all?

Thanks,

Jun


On Thu, Oct 27, 2022 at 1:02 PM David Arthur
 wrote:

> Jun,
>
> 20/21. A KRaft controller will recover the migration state by reading the
> "/migration" ZNode. If the migration enable config is set, and the ZK
> migration is complete, it will enter the dual-write mode. Before an
> operator can decommission ZK, they will need to finalize the migration
> which involves removing the migration config and the ZK config. I'll
> clarify this in the KIP.
>
> 22. Yea, we could see an incorrect broker ID during that window.  If we
> ended up with a state where we saw a ZK broker ID that conflicted with a
> KRaft broker ID, we would need to fence one of them. I would probably opt
> to fence the KRaft broker in that case since broker registration and
> fencing is more robust in KRaft. Hopefully this is a rare case.
>
> 26. Sounds good.
>
> Thanks!
> David
>
>
> On Thu, Oct 27, 2022 at 1:34 PM Jun Rao  wrote:
>
> > Hi, David,
> >
> > Thanks for the reply. A few more comments.
> >
> > 20/21. Using a tagged field in ApiVersionRequest could work. Related to
> > this, how does a KRaft controller know that it's in the dual write mode?
> > Does it need to read the /controller path from ZK? After the migration,
> > people may have the ZK cluster decommissioned, but still have the ZK
> > configs left in the KRaft controller. Will this cause the KRaft
> controller
> > to be stuck because it doesn't know which mode it is in?
> >
> > 22. Using the ephemeral node matches the current ZK-based broker behavior
> > better. However, it leaves a window for incorrect broker registration to
> > sneak in during KRaft controller failover.
> >
> > 26. Then, we could just remove Broker Registration in that section.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Oct 26, 2022 at 2:21 PM David Arthur
> >  wrote:
> >
> > > Jun
> > >
> > > 20/21 It could definitely cause problems if we failover to a controller
> > > without "kafka.metadata.migration.enable". The only mechanism I know of
> > for
> > > controllers to learn things about one another is ApiVersions. We
> > currently
> > > use this for checking support for "metadata.version" (in KRaft mode).
> We
> > > could add a "zk.migration" feature flag that's enabled on a controller
> > only
> > > if the config is set. Another possibility would be a tagged field on
> > > ApiVersionResponse that indicated if the config was set. Both seem
> > somewhat
> > > inelegant. I think a tagged field would be a bit simpler (and arguably
> > less
> > > hacky).
> > >
> > > For 20, we could avoid entering the migration state unless the whole
> > quorum
> > > had the field present in their NodeApiVersions. For 21, we could avoid
> > > leaving the migration state unless the whole quorum did not have the
> > field
> > > in their NodeApiVersions. Do you think this would be sufficient?
> > >
> > > 22. Right, we need to write the broker info back to ZK just as a
> > safeguard
> > > against incorrect broker IDs getting registered into ZK. I was thinking
> > > these would be persistent nodes, but it's probably fine to make them
> > > ephemeral and have the active KRaft controller keep them up to date.
> > >
> > > 23. Right. When the broker comes up as a KRaft broker, it's old
> > > /brokers/ids ZNode will be gone and it will register itself with the
> > KRaft
> > > controller. The controller will know that it is now in KRaft mode and
> > will
> > > stop sending it the old RPCs.
> > >
> > > 24. Ok, I'll add these
> > >
> > > 25. I realize now the "/controller_epoch" node is already persistent.
> It
> > > should be sufficient to remove the "/controller" node to trigger an
> > > election.
> > >
> > > 26. Hmm, not sure, but I don't think it uses watches. KafkaServer
> > registers
> > > the broker info and later loads all brokers in the cluster to check
> that
> > > the endpoints don't conflict. Things that do use watches are dynamic
> > > configs, ACLs, and some others (listed in the KIP).
> > >
> > >
> > > On Wed, Oct 26, 2022 at 4:26 PM David Arthur <
> david.art...@confluent.io>
> > > wrote:
> > >
> > > > Luke and Andrew, thanks for taking a look!
> > > >
> > > > I think the names of the state machine in the KIP aren't the best.
> I'll
> > > > try to improve that section in general.
> > > >
> > > > 1. "MigrationReady" is probably better named "MigratingFromZk" or
> > > > something. It's meant to be the state when the KRaft controller is
> > > actively
> > > > migrating the data out of ZK. This happens 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-27 Thread David Arthur
Jun,

20/21. A KRaft controller will recover the migration state by reading the
"/migration" ZNode. If the migration enable config is set, and the ZK
migration is complete, it will enter the dual-write mode. Before an
operator can decommission ZK, they will need to finalize the migration
which involves removing the migration config and the ZK config. I'll
clarify this in the KIP.

22. Yea, we could see an incorrect broker ID during that window.  If we
ended up with a state where we saw a ZK broker ID that conflicted with a
KRaft broker ID, we would need to fence one of them. I would probably opt
to fence the KRaft broker in that case since broker registration and
fencing is more robust in KRaft. Hopefully this is a rare case.

26. Sounds good.

Thanks!
David


On Thu, Oct 27, 2022 at 1:34 PM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the reply. A few more comments.
>
> 20/21. Using a tagged field in ApiVersionRequest could work. Related to
> this, how does a KRaft controller know that it's in the dual write mode?
> Does it need to read the /controller path from ZK? After the migration,
> people may have the ZK cluster decommissioned, but still have the ZK
> configs left in the KRaft controller. Will this cause the KRaft controller
> to be stuck because it doesn't know which mode it is in?
>
> 22. Using the ephemeral node matches the current ZK-based broker behavior
> better. However, it leaves a window for incorrect broker registration to
> sneak in during KRaft controller failover.
>
> 26. Then, we could just remove Broker Registration in that section.
>
> Thanks,
>
> Jun
>
> On Wed, Oct 26, 2022 at 2:21 PM David Arthur
>  wrote:
>
> > Jun
> >
> > 20/21 It could definitely cause problems if we failover to a controller
> > without "kafka.metadata.migration.enable". The only mechanism I know of
> for
> > controllers to learn things about one another is ApiVersions. We
> currently
> > use this for checking support for "metadata.version" (in KRaft mode). We
> > could add a "zk.migration" feature flag that's enabled on a controller
> only
> > if the config is set. Another possibility would be a tagged field on
> > ApiVersionResponse that indicated if the config was set. Both seem
> somewhat
> > inelegant. I think a tagged field would be a bit simpler (and arguably
> less
> > hacky).
> >
> > For 20, we could avoid entering the migration state unless the whole
> quorum
> > had the field present in their NodeApiVersions. For 21, we could avoid
> > leaving the migration state unless the whole quorum did not have the
> field
> > in their NodeApiVersions. Do you think this would be sufficient?
> >
> > 22. Right, we need to write the broker info back to ZK just as a
> safeguard
> > against incorrect broker IDs getting registered into ZK. I was thinking
> > these would be persistent nodes, but it's probably fine to make them
> > ephemeral and have the active KRaft controller keep them up to date.
> >
> > 23. Right. When the broker comes up as a KRaft broker, it's old
> > /brokers/ids ZNode will be gone and it will register itself with the
> KRaft
> > controller. The controller will know that it is now in KRaft mode and
> will
> > stop sending it the old RPCs.
> >
> > 24. Ok, I'll add these
> >
> > 25. I realize now the "/controller_epoch" node is already persistent. It
> > should be sufficient to remove the "/controller" node to trigger an
> > election.
> >
> > 26. Hmm, not sure, but I don't think it uses watches. KafkaServer
> registers
> > the broker info and later loads all brokers in the cluster to check that
> > the endpoints don't conflict. Things that do use watches are dynamic
> > configs, ACLs, and some others (listed in the KIP).
> >
> >
> > On Wed, Oct 26, 2022 at 4:26 PM David Arthur 
> > wrote:
> >
> > > Luke and Andrew, thanks for taking a look!
> > >
> > > I think the names of the state machine in the KIP aren't the best. I'll
> > > try to improve that section in general.
> > >
> > > 1. "MigrationReady" is probably better named "MigratingFromZk" or
> > > something. It's meant to be the state when the KRaft controller is
> > actively
> > > migrating the data out of ZK. This happens after we have detected that
> > the
> > > cluster is eligible and before we enter the dual write mode.
> > >
> > > 2. "MigrationActive" should probably be called "BrokerMigration".
> > > Transitioning to "MigrationFinished" can happen automatically when all
> > the
> > > known ZK brokers have been migrated. Since we don't have permanent
> > > registrations of ZK brokers, we can use the partition assignments as a
> > > proxy for this.
> > >
> > > 3. A metric for unready brokers makes sense. We can also log a message
> on
> > > the controller when it tries to start the migration
> > >
> > > 4. The MetadataType metric is meant to help see this. E.g., some
> brokers
> > > would have MetadataType=ZK, some would have MetadataType=Dual
> > >
> > > 5. On ZK brokers, not having this config set would prevent the new
> broker
> > > 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-27 Thread Jun Rao
Hi, David,

Thanks for the reply. A few more comments.

20/21. Using a tagged field in ApiVersionRequest could work. Related to
this, how does a KRaft controller know that it's in the dual write mode?
Does it need to read the /controller path from ZK? After the migration,
people may have the ZK cluster decommissioned, but still have the ZK
configs left in the KRaft controller. Will this cause the KRaft controller
to be stuck because it doesn't know which mode it is in?

22. Using the ephemeral node matches the current ZK-based broker behavior
better. However, it leaves a window for incorrect broker registration to
sneak in during KRaft controller failover.

26. Then, we could just remove Broker Registration in that section.

Thanks,

Jun

On Wed, Oct 26, 2022 at 2:21 PM David Arthur
 wrote:

> Jun
>
> 20/21 It could definitely cause problems if we failover to a controller
> without "kafka.metadata.migration.enable". The only mechanism I know of for
> controllers to learn things about one another is ApiVersions. We currently
> use this for checking support for "metadata.version" (in KRaft mode). We
> could add a "zk.migration" feature flag that's enabled on a controller only
> if the config is set. Another possibility would be a tagged field on
> ApiVersionResponse that indicated if the config was set. Both seem somewhat
> inelegant. I think a tagged field would be a bit simpler (and arguably less
> hacky).
>
> For 20, we could avoid entering the migration state unless the whole quorum
> had the field present in their NodeApiVersions. For 21, we could avoid
> leaving the migration state unless the whole quorum did not have the field
> in their NodeApiVersions. Do you think this would be sufficient?
>
> 22. Right, we need to write the broker info back to ZK just as a safeguard
> against incorrect broker IDs getting registered into ZK. I was thinking
> these would be persistent nodes, but it's probably fine to make them
> ephemeral and have the active KRaft controller keep them up to date.
>
> 23. Right. When the broker comes up as a KRaft broker, it's old
> /brokers/ids ZNode will be gone and it will register itself with the KRaft
> controller. The controller will know that it is now in KRaft mode and will
> stop sending it the old RPCs.
>
> 24. Ok, I'll add these
>
> 25. I realize now the "/controller_epoch" node is already persistent. It
> should be sufficient to remove the "/controller" node to trigger an
> election.
>
> 26. Hmm, not sure, but I don't think it uses watches. KafkaServer registers
> the broker info and later loads all brokers in the cluster to check that
> the endpoints don't conflict. Things that do use watches are dynamic
> configs, ACLs, and some others (listed in the KIP).
>
>
> On Wed, Oct 26, 2022 at 4:26 PM David Arthur 
> wrote:
>
> > Luke and Andrew, thanks for taking a look!
> >
> > I think the names of the state machine in the KIP aren't the best. I'll
> > try to improve that section in general.
> >
> > 1. "MigrationReady" is probably better named "MigratingFromZk" or
> > something. It's meant to be the state when the KRaft controller is
> actively
> > migrating the data out of ZK. This happens after we have detected that
> the
> > cluster is eligible and before we enter the dual write mode.
> >
> > 2. "MigrationActive" should probably be called "BrokerMigration".
> > Transitioning to "MigrationFinished" can happen automatically when all
> the
> > known ZK brokers have been migrated. Since we don't have permanent
> > registrations of ZK brokers, we can use the partition assignments as a
> > proxy for this.
> >
> > 3. A metric for unready brokers makes sense. We can also log a message on
> > the controller when it tries to start the migration
> >
> > 4. The MetadataType metric is meant to help see this. E.g., some brokers
> > would have MetadataType=ZK, some would have MetadataType=Dual
> >
> > 5. On ZK brokers, not having this config set would prevent the new broker
> > registration data from being written to ZK. This means the KRaft
> controller
> > won't send RPCs to it. If a broker has been migrated to KRaft already,
> I'm
> > not sure there is any harm in removing this config. If we decide that we
> > need to guarantee that KRaft brokers have this config set during the
> > migration, we can include it in the broker registration that's sent to
> > KRaft. This would let the controller keep that broker fenced.
> >
> > 6. Once the KRaft controller takes leadership, the ZK controller won't be
> > active any more and will stop reporting the MigrationState metric.
> >
> > 7. The "Dual" MetadataType is reported by brokers running in KRaft mode
> > when the migration enable config is set. I think the controller should
> also
> > report this, not just the brokers. I'll clarify in the KIP.
> >
> >
> >
> > Andrew:
> >
> > > How will the code get all the ZooKeeper config? Will it do some sort of
> > scan of the ZooKeeper data store? ... Do we need to do anything special
> to
> > make 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-26 Thread David Arthur
Jun

20/21 It could definitely cause problems if we failover to a controller
without "kafka.metadata.migration.enable". The only mechanism I know of for
controllers to learn things about one another is ApiVersions. We currently
use this for checking support for "metadata.version" (in KRaft mode). We
could add a "zk.migration" feature flag that's enabled on a controller only
if the config is set. Another possibility would be a tagged field on
ApiVersionResponse that indicated if the config was set. Both seem somewhat
inelegant. I think a tagged field would be a bit simpler (and arguably less
hacky).

For 20, we could avoid entering the migration state unless the whole quorum
had the field present in their NodeApiVersions. For 21, we could avoid
leaving the migration state unless the whole quorum did not have the field
in their NodeApiVersions. Do you think this would be sufficient?

22. Right, we need to write the broker info back to ZK just as a safeguard
against incorrect broker IDs getting registered into ZK. I was thinking
these would be persistent nodes, but it's probably fine to make them
ephemeral and have the active KRaft controller keep them up to date.

23. Right. When the broker comes up as a KRaft broker, it's old
/brokers/ids ZNode will be gone and it will register itself with the KRaft
controller. The controller will know that it is now in KRaft mode and will
stop sending it the old RPCs.

24. Ok, I'll add these

25. I realize now the "/controller_epoch" node is already persistent. It
should be sufficient to remove the "/controller" node to trigger an
election.

26. Hmm, not sure, but I don't think it uses watches. KafkaServer registers
the broker info and later loads all brokers in the cluster to check that
the endpoints don't conflict. Things that do use watches are dynamic
configs, ACLs, and some others (listed in the KIP).


On Wed, Oct 26, 2022 at 4:26 PM David Arthur 
wrote:

> Luke and Andrew, thanks for taking a look!
>
> I think the names of the state machine in the KIP aren't the best. I'll
> try to improve that section in general.
>
> 1. "MigrationReady" is probably better named "MigratingFromZk" or
> something. It's meant to be the state when the KRaft controller is actively
> migrating the data out of ZK. This happens after we have detected that the
> cluster is eligible and before we enter the dual write mode.
>
> 2. "MigrationActive" should probably be called "BrokerMigration".
> Transitioning to "MigrationFinished" can happen automatically when all the
> known ZK brokers have been migrated. Since we don't have permanent
> registrations of ZK brokers, we can use the partition assignments as a
> proxy for this.
>
> 3. A metric for unready brokers makes sense. We can also log a message on
> the controller when it tries to start the migration
>
> 4. The MetadataType metric is meant to help see this. E.g., some brokers
> would have MetadataType=ZK, some would have MetadataType=Dual
>
> 5. On ZK brokers, not having this config set would prevent the new broker
> registration data from being written to ZK. This means the KRaft controller
> won't send RPCs to it. If a broker has been migrated to KRaft already, I'm
> not sure there is any harm in removing this config. If we decide that we
> need to guarantee that KRaft brokers have this config set during the
> migration, we can include it in the broker registration that's sent to
> KRaft. This would let the controller keep that broker fenced.
>
> 6. Once the KRaft controller takes leadership, the ZK controller won't be
> active any more and will stop reporting the MigrationState metric.
>
> 7. The "Dual" MetadataType is reported by brokers running in KRaft mode
> when the migration enable config is set. I think the controller should also
> report this, not just the brokers. I'll clarify in the KIP.
>
>
>
> Andrew:
>
> > How will the code get all the ZooKeeper config? Will it do some sort of
> scan of the ZooKeeper data store? ... Do we need to do anything special to
> make sure we get all data from ZooKeeper?
>
> With a few exceptions, all data written to ZK by Kafka happens on the
> controller (single writer principle). In our migration code, we can
> unconditionally update the "/controller" and "/controller_epoch" ZNodes and
> effectively force the migration component to gain leadership of the ZK
> controller. Once this happens, we don't expect any data to be written to
> ZK, so we can read it iteratively without worrying about any concurrent
> writes.
>
> As for the linearizable thing, I believe this just means that reads may be
> served by a quorum follower which has stale data. We'll be reading from ZK
> the same way the ZK controller does, so I think we will be fine regarding
> consistency. If we wanted to be extra careful, we could add a delay prior
> to iterating through the znodes to give partitioned ZK followers a chance
> to get kicked out of the quorum. I don't think we'll need that though
>
> > What happens if we commit the 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-26 Thread David Arthur
Luke and Andrew, thanks for taking a look!

I think the names of the state machine in the KIP aren't the best. I'll try
to improve that section in general.

1. "MigrationReady" is probably better named "MigratingFromZk" or
something. It's meant to be the state when the KRaft controller is actively
migrating the data out of ZK. This happens after we have detected that the
cluster is eligible and before we enter the dual write mode.

2. "MigrationActive" should probably be called "BrokerMigration".
Transitioning to "MigrationFinished" can happen automatically when all the
known ZK brokers have been migrated. Since we don't have permanent
registrations of ZK brokers, we can use the partition assignments as a
proxy for this.

3. A metric for unready brokers makes sense. We can also log a message on
the controller when it tries to start the migration

4. The MetadataType metric is meant to help see this. E.g., some brokers
would have MetadataType=ZK, some would have MetadataType=Dual

5. On ZK brokers, not having this config set would prevent the new broker
registration data from being written to ZK. This means the KRaft controller
won't send RPCs to it. If a broker has been migrated to KRaft already, I'm
not sure there is any harm in removing this config. If we decide that we
need to guarantee that KRaft brokers have this config set during the
migration, we can include it in the broker registration that's sent to
KRaft. This would let the controller keep that broker fenced.

6. Once the KRaft controller takes leadership, the ZK controller won't be
active any more and will stop reporting the MigrationState metric.

7. The "Dual" MetadataType is reported by brokers running in KRaft mode
when the migration enable config is set. I think the controller should also
report this, not just the brokers. I'll clarify in the KIP.



Andrew:

> How will the code get all the ZooKeeper config? Will it do some sort of
scan of the ZooKeeper data store? ... Do we need to do anything special to
make sure we get all data from ZooKeeper?

With a few exceptions, all data written to ZK by Kafka happens on the
controller (single writer principle). In our migration code, we can
unconditionally update the "/controller" and "/controller_epoch" ZNodes and
effectively force the migration component to gain leadership of the ZK
controller. Once this happens, we don't expect any data to be written to
ZK, so we can read it iteratively without worrying about any concurrent
writes.

As for the linearizable thing, I believe this just means that reads may be
served by a quorum follower which has stale data. We'll be reading from ZK
the same way the ZK controller does, so I think we will be fine regarding
consistency. If we wanted to be extra careful, we could add a delay prior
to iterating through the znodes to give partitioned ZK followers a chance
to get kicked out of the quorum. I don't think we'll need that though

> What happens if we commit the transaction then fail right after that and
restart?

If we commit the "migration" transaction to the metadata log, it won't be a
problem if we failover or restart. We can recover our state based on the
metadata log and what exists in ZK. If we see a migration transaction in
the kraft log, but no "/migration" ZNode, we'll know we failed before
writing to ZK. If we see a partial transaction in the log, then we can
abort it and restart the migration.

> This sort of leads to me wondering if we will/should check the metadata
log state before doing the migration? That could also catch mistakes such
as if a KRaft quorum is used that already had some metadata which I assume
we want to prevent.

Yes, we should check that the metadata log is empty before attempting a
migration (perhaps excepting the bootstrap metadata -- need to think on
that one still).




On Mon, Oct 24, 2022 at 8:57 AM Andrew Grant 
wrote:

> Hey David,
>
>
> Thanks for the KIP. I had a few small questions.
>
>
> “The ZK data migration will copy the existing ZK data into the KRaft
> metadata log and establish the new KRaft active controller as the active
> controller from a ZK perspective.”
>
> How will the code get all the ZooKeeper config? Will it do some sort of
> scan of the ZooKeeper data store? Also I’m not a ZooKeeper expert but per
> https://zookeeper.apache.org/doc/current/zookeeperInternals.html “Read
> operations in ZooKeeper are *not linearizable* since they can return
> potentially stale data.” Do we need to do anything special to make sure we
> get all data from ZooKeeper?
>
>
> “For the initial migration, the controller will utilize KIP-868 Metadata
> Transactions
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions
> >
> to
> write all of the ZK metadata in a single transaction. If the controller
> fails before this transaction is finalized, the next active controller will
> abort the transaction and restart the migration process.” What happens if
> we commit the transaction then fail 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-26 Thread Jun Rao
Hi, David,

Thanks for the updated KIP. A few more comments.

20. "Setting this to "true" on the KRaft controllers is the trigger for
starting the migration (more on that below)." Should we wait until all
quorum controller nodes set kafka.metadata.migration.enable to true before
enabling the migration? Otherwise, if the quorum controller fails over to
another node with kafka.metadata.migration.enable set to true, the new
controller info won't be propagated to the ZK based brokers.

21. "Once the operator has decided to commit to KRaft mode, the final step
is to restart the controller quorum and take it out of migration mode by
setting kafka.metadata.migration.enable to "false" (or unsetting it). Once
the controller leaves migration mode, it will no longer perform writes to
ZK and it will disable its special ZK handling of ZK RPCs." Similar to the
above, I am wondering if we need to take the KRaft controller out of the
dual write mode only after all quorum controllers have set
kafka.metadata.migration.enable to "false". Otherwise, it's possible for
the controller to switch from dual writes to single write and to dual
writes again. The second dual writes may not have the update to date
metadata in ZK.

22. "Inversely, as KRaft brokers register with the KRaft controller, we
must write this data back to ZK to prevent ZK brokers from registering with
the same node ID." I guess the controller needs to write the broker
registration for the KRaft brokers in ZK? Is that an ephemeral node or
persistent node? If it's the latter, who should clean it up during
downgrade?

23. "Following the migration of metadata and controller leadership to
KRaft, the brokers are restarted one-by-one in KRaft mode." When the broker
is restarted in KRaft mode, the controller is still in dual writes mode.
But I guess it won't send metadata PRC to the broker in KRaft mode?

24. During the migration, the KRaft controller and broker need the ZK
related configs. It would be useful to document those.

25. "Operator can remove the persistent "/controller" and
"/controller_epoch" nodes allowing for ZK controller election to take
place." Do we need to do that using a ZK multi-operation? Should we provide
a tool to make this easier?

26. "The ZK brokers still rely on the watch mechanism to learn about
changes to these metadata. By performing dual writes, we cover these
cases." Does the ZK broker rely on the watch mechanism for the Broker
Registration ZK path?

Thanks,

Jun


On Mon, Oct 24, 2022 at 5:57 AM Andrew Grant 
wrote:

> Hey David,
>
>
> Thanks for the KIP. I had a few small questions.
>
>
> “The ZK data migration will copy the existing ZK data into the KRaft
> metadata log and establish the new KRaft active controller as the active
> controller from a ZK perspective.”
>
> How will the code get all the ZooKeeper config? Will it do some sort of
> scan of the ZooKeeper data store? Also I’m not a ZooKeeper expert but per
> https://zookeeper.apache.org/doc/current/zookeeperInternals.html “Read
> operations in ZooKeeper are *not linearizable* since they can return
> potentially stale data.” Do we need to do anything special to make sure we
> get all data from ZooKeeper?
>
>
> “For the initial migration, the controller will utilize KIP-868 Metadata
> Transactions
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions
> >
> to
> write all of the ZK metadata in a single transaction. If the controller
> fails before this transaction is finalized, the next active controller will
> abort the transaction and restart the migration process.” What happens if
> we commit the transaction then fail right after that and restart? This sort
> of leads to me wondering if we will/should check the metadata log state
> before doing the migration? That could also catch mistakes such as if a
> KRaft quorum is used that already had some metadata which I assume we want
> to prevent.
>
>
> For the Test Plan section do you think it’s worth calling out performance
> testing migrations of ZooKeeper deployments that have “large” amounts of
> metadata?
>
>
> Thanks,
>
> Andrew
>
> On Mon, Oct 24, 2022 at 3:20 AM Luke Chen  wrote:
>
> > Hi David,
> >
> > Thanks for the good and complicated proposal! :)
> > Some questions:
> >
> > 1. "MigrationReady" state: The KIP said:
> > The KRaft quorum has been started
> > I don't think we'll automatically enter this state after KRaft quorum
> > started, do we?
> > I think KRaft active controller should take over leadership by writing a
> > znode in /controller path before we entering this sate, is that correct?
> >
> > 2. "MigrationActive" state: the KIP said:
> > ZK state has been migrated, controller is in dual-write mode, brokers are
> > being restarted in KRaft mode
> > What confuses me is the last part: "brokers are being restarted in KRaft
> > mode".
> > How could we detect brokers are being restarted in KRaft mode? Old ZK
> > broker is removed and new KRaft broker is up within N minutes?
> > I 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-24 Thread Andrew Grant
Hey David,


Thanks for the KIP. I had a few small questions.


“The ZK data migration will copy the existing ZK data into the KRaft
metadata log and establish the new KRaft active controller as the active
controller from a ZK perspective.”

How will the code get all the ZooKeeper config? Will it do some sort of
scan of the ZooKeeper data store? Also I’m not a ZooKeeper expert but per
https://zookeeper.apache.org/doc/current/zookeeperInternals.html “Read
operations in ZooKeeper are *not linearizable* since they can return
potentially stale data.” Do we need to do anything special to make sure we
get all data from ZooKeeper?


“For the initial migration, the controller will utilize KIP-868 Metadata
Transactions

to
write all of the ZK metadata in a single transaction. If the controller
fails before this transaction is finalized, the next active controller will
abort the transaction and restart the migration process.” What happens if
we commit the transaction then fail right after that and restart? This sort
of leads to me wondering if we will/should check the metadata log state
before doing the migration? That could also catch mistakes such as if a
KRaft quorum is used that already had some metadata which I assume we want
to prevent.


For the Test Plan section do you think it’s worth calling out performance
testing migrations of ZooKeeper deployments that have “large” amounts of
metadata?


Thanks,

Andrew

On Mon, Oct 24, 2022 at 3:20 AM Luke Chen  wrote:

> Hi David,
>
> Thanks for the good and complicated proposal! :)
> Some questions:
>
> 1. "MigrationReady" state: The KIP said:
> The KRaft quorum has been started
> I don't think we'll automatically enter this state after KRaft quorum
> started, do we?
> I think KRaft active controller should take over leadership by writing a
> znode in /controller path before we entering this sate, is that correct?
>
> 2. "MigrationActive" state: the KIP said:
> ZK state has been migrated, controller is in dual-write mode, brokers are
> being restarted in KRaft mode
> What confuses me is the last part: "brokers are being restarted in KRaft
> mode".
> How could we detect brokers are being restarted in KRaft mode? Old ZK
> broker is removed and new KRaft broker is up within N minutes?
> I think we don't have to rely on the condition "brokers are being restarted
> in KRaft mode" to enter this state.
> "brokers are being restarted in KRaft mode" should be a process happened
> between "MigrationActive" and "MigrationFinished". Does that make sense?
>
> 3. When "Zookeeper" mode trying to enter "MigrationEligible", if there is a
> cluster with tens of brokers, how could users know why this cluster cannot
> be in "MigrationEligible" state, yet? Check that znode manually one by one?
> Or do we plan to create a tool to help them? Or maybe expose the "unready
> ZK brokers" metrics?
>
> 4. Same for "MigrationActive" entering "MigrationFinished" state. Since
> that will be some process for restarting ZK broker into KRaft broker one by
> one, could we expose the "remaining ZK brokers" as metrics?
>
> 5. When users are in "MigrationReady"/"MigrationActive"/"MigrationFinished"
> states, and they accidentally change the KRaft controller config:
> "kafka.metadata.migration.enable"
> to false. What will happen to this cluster? No dual-write for it? Will we
> have any protection for it?
>
> 6. About the "MigrationState" metric:
> The "ZooKeeper" and "MigrationEligible" is reported by ZK controller, and
> the rest of states are reported by KRaft controller -->  makes sense to me.
> One question from it is, when KRaft controller takes over the leadership
> from ZK controller, what will the "MigrationState" value in old ZK
> controller? keep in "MigrationEligible" doesn't make sense. Will there be a
> empty or null state?
>
> 7. About the "MetadataType" metric:
> An enumeration of: ZooKeeper (1), Dual (2), KRaft (3). Each broker reports
> this.
> I don't know how we could map the migration state to these 3 types.
> What is the metadataType when cluster in "MigrationReady" state? Still
> Zookeeper?
> When will brokers enter Dual type?
> This is unclear in the KIP.
>
> Thank you.
> Luke
>
> On Thu, Oct 20, 2022 at 11:33 PM David Arthur
>  wrote:
>
> > Igor, thanks for taking a look! Since JBOD in KRaft is still under
> > discussion and not likely to land before the ZK migration, I think we'll
> > need to defer it. For migrating JBOD clusters from ZK to KRaft, we'll
> also
> > need to address the log dir failure mechanism which currently uses a
> > special ZNode written to by the brokers. There is an old KIP
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > which proposes a new RPC to move that ZK write to the controller, but I'm
> > not sure if that's the approach we'll want to take. I read through the
> > discussion over in KIP-858 and it 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-24 Thread Luke Chen
Hi David,

Thanks for the good and complicated proposal! :)
Some questions:

1. "MigrationReady" state: The KIP said:
The KRaft quorum has been started
I don't think we'll automatically enter this state after KRaft quorum
started, do we?
I think KRaft active controller should take over leadership by writing a
znode in /controller path before we entering this sate, is that correct?

2. "MigrationActive" state: the KIP said:
ZK state has been migrated, controller is in dual-write mode, brokers are
being restarted in KRaft mode
What confuses me is the last part: "brokers are being restarted in KRaft
mode".
How could we detect brokers are being restarted in KRaft mode? Old ZK
broker is removed and new KRaft broker is up within N minutes?
I think we don't have to rely on the condition "brokers are being restarted
in KRaft mode" to enter this state.
"brokers are being restarted in KRaft mode" should be a process happened
between "MigrationActive" and "MigrationFinished". Does that make sense?

3. When "Zookeeper" mode trying to enter "MigrationEligible", if there is a
cluster with tens of brokers, how could users know why this cluster cannot
be in "MigrationEligible" state, yet? Check that znode manually one by one?
Or do we plan to create a tool to help them? Or maybe expose the "unready
ZK brokers" metrics?

4. Same for "MigrationActive" entering "MigrationFinished" state. Since
that will be some process for restarting ZK broker into KRaft broker one by
one, could we expose the "remaining ZK brokers" as metrics?

5. When users are in "MigrationReady"/"MigrationActive"/"MigrationFinished"
states, and they accidentally change the KRaft controller config:
"kafka.metadata.migration.enable"
to false. What will happen to this cluster? No dual-write for it? Will we
have any protection for it?

6. About the "MigrationState" metric:
The "ZooKeeper" and "MigrationEligible" is reported by ZK controller, and
the rest of states are reported by KRaft controller -->  makes sense to me.
One question from it is, when KRaft controller takes over the leadership
from ZK controller, what will the "MigrationState" value in old ZK
controller? keep in "MigrationEligible" doesn't make sense. Will there be a
empty or null state?

7. About the "MetadataType" metric:
An enumeration of: ZooKeeper (1), Dual (2), KRaft (3). Each broker reports
this.
I don't know how we could map the migration state to these 3 types.
What is the metadataType when cluster in "MigrationReady" state? Still
Zookeeper?
When will brokers enter Dual type?
This is unclear in the KIP.

Thank you.
Luke

On Thu, Oct 20, 2022 at 11:33 PM David Arthur
 wrote:

> Igor, thanks for taking a look! Since JBOD in KRaft is still under
> discussion and not likely to land before the ZK migration, I think we'll
> need to defer it. For migrating JBOD clusters from ZK to KRaft, we'll also
> need to address the log dir failure mechanism which currently uses a
> special ZNode written to by the brokers. There is an old KIP
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> which proposes a new RPC to move that ZK write to the controller, but I'm
> not sure if that's the approach we'll want to take. I read through the
> discussion over in KIP-858 and it sounds like there are some good ideas
> there.
>
> To answer your question more directly, migrating ZK clusters using JBOD is
> out of scope for this KIP. It might be possible to stop using JBOD using
> reassignments, but I'm not sure about that.
>
> -David
>
> On Tue, Oct 18, 2022 at 12:17 PM Igor Soarez  wrote:
>
> > Hi David,
> >
> > Thanks for the KIP, this is very exciting!
> >
> > How does JBOD relate to this work? KRaft mode doesn't yet support
> > configuring brokers with multiple log directories. If the brokers in the
> > existing cluster are configured with multiple log dirs, does the
> migration
> > imply that the existing brokers need to drop use of that feature? Or is
> > there some way to upgrade them later?
> >
> > Thanks,
> >
> > --
> > Igor
> >
> > On Mon, Oct 17, 2022, at 10:07 PM, David Arthur wrote:
> > > I've updated the KIP with the following changes (the confluence diff is
> > not
> > > helpful here since i rearranged some things)
> > >
> > > * Added ZooKeeperBlockingKRaftMillis metric
> > > * Added section on new broker registration JSON
> > > * Removed section on MigrationCheck RPC
> > > * Added change to UpdateMetadataRequest
> > > * Added section "Additional ZK Broker Configs" (includes configs to
> > connect
> > > to KRaft quorum)
> > > * Added section on "Incompatible Brokers" under Failure Modes
> > > * Clarified many things per this discussion thread
> > >
> > > I realized we need the KRaft controller to pick the correct
> > > "metadata.version" when initializing the migration. I included the IBP
> > of a
> > > broker in its registration data so the KRaft controller can verify the
> > IBP
> > > and pick the correct "metadata.version" when starting 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-20 Thread David Arthur
Igor, thanks for taking a look! Since JBOD in KRaft is still under
discussion and not likely to land before the ZK migration, I think we'll
need to defer it. For migrating JBOD clusters from ZK to KRaft, we'll also
need to address the log dir failure mechanism which currently uses a
special ZNode written to by the brokers. There is an old KIP
https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
which proposes a new RPC to move that ZK write to the controller, but I'm
not sure if that's the approach we'll want to take. I read through the
discussion over in KIP-858 and it sounds like there are some good ideas
there.

To answer your question more directly, migrating ZK clusters using JBOD is
out of scope for this KIP. It might be possible to stop using JBOD using
reassignments, but I'm not sure about that.

-David

On Tue, Oct 18, 2022 at 12:17 PM Igor Soarez  wrote:

> Hi David,
>
> Thanks for the KIP, this is very exciting!
>
> How does JBOD relate to this work? KRaft mode doesn't yet support
> configuring brokers with multiple log directories. If the brokers in the
> existing cluster are configured with multiple log dirs, does the migration
> imply that the existing brokers need to drop use of that feature? Or is
> there some way to upgrade them later?
>
> Thanks,
>
> --
> Igor
>
> On Mon, Oct 17, 2022, at 10:07 PM, David Arthur wrote:
> > I've updated the KIP with the following changes (the confluence diff is
> not
> > helpful here since i rearranged some things)
> >
> > * Added ZooKeeperBlockingKRaftMillis metric
> > * Added section on new broker registration JSON
> > * Removed section on MigrationCheck RPC
> > * Added change to UpdateMetadataRequest
> > * Added section "Additional ZK Broker Configs" (includes configs to
> connect
> > to KRaft quorum)
> > * Added section on "Incompatible Brokers" under Failure Modes
> > * Clarified many things per this discussion thread
> >
> > I realized we need the KRaft controller to pick the correct
> > "metadata.version" when initializing the migration. I included the IBP
> of a
> > broker in its registration data so the KRaft controller can verify the
> IBP
> > and pick the correct "metadata.version" when starting the migration.
> > Otherwise, we could inadvertently downgrade the IBP/metadata.version as
> > part of the migration.
> >
> > I also added "clusterId" to the broker registration data so the KRaft
> could
> > verify it.
> >
> > -David
> >
> >
> >
> > On Mon, Oct 17, 2022 at 12:14 PM Colin McCabe 
> wrote:
> >
> >> On Fri, Oct 14, 2022, at 11:39, Jun Rao wrote:
> >> > Hi, Colin,
> >> >
> >> > 10. "That all goes away in the new mode, and we just have some code
> which
> >> > analyzes __cluster_metadata and reflects it in 1) updates to ZK and 2)
> >> > messages sent out to brokers."
> >> > Hmm, I am not sure it's that simple. Some of the complexity of the
> >> ZK-based
> >> > controller are (1) using the pipelining approach to write to ZK for
> >> better
> >> > throughput and using conditional writes for correctness; (2) sending
> the
> >> > proper LeaderAndIsr and UpdateMetadata requests. For example, during
> >> > controller failover, the full metadata needs to be sent while during
> >> > individual broker failure, only some of the metadata needs to be
> updated.
> >> > The controlled shutdown handling sometimes uses StopReplicaRequest
> and
> >> > some other times uses LeaderAndIsrRequest. (3) triggering new events
> >> based
> >> > on the responses of LeaderAndIsr (e.g. for topic deletion). Some of
> those
> >> > complexity could be re-implemented in a more efficient way, but we
> need
> >> to
> >> > be really careful not to generate regression. Some of the other
> >> complexity
> >> > just won't go away. Reimplementing all those logic for the 30 or so
> >> events
> >> > in the ZK-based controller is possible, but seems a bit daunting and
> >> risky.
> >>
> >> Hi Jun,
> >>
> >> Thanks for the response.  I agree that there is some work here but I
> don't
> >> think it's as bad as it might seem. Most of the code for writing to ZK
> can
> >> be reused from the old controller. We should at least evaluate using the
> >> old ControllerChannelManager as well. I don't know if we'll end up
> doing it
> >> or not but it's a possibility. (The main reason to not do it is that the
> >> response handling will be a bit different)
> >>
> >> The question of what to do during a controller failover is an
> interesting
> >> one. Technically, we should be able to continue sending incremental
> updates
> >> to the legacy nodes, for the same reason we can in KRaft mode. However,
> >> probably we should just copy what ZK mode does and send them a full
> >> metadata update. This will allow us to have an easy way to handle
> >> divergences caused by lost RPCs by changing the controller (just as we
> do
> >> in ZK mode, unfortunately). I suppose we should document this in the
> KIP...
> >>
> >> I agree controlled shutdown is 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-18 Thread Igor Soarez
Hi David,

Thanks for the KIP, this is very exciting!

How does JBOD relate to this work? KRaft mode doesn't yet support configuring 
brokers with multiple log directories. If the brokers in the existing cluster 
are configured with multiple log dirs, does the migration imply that the 
existing brokers need to drop use of that feature? Or is there some way to 
upgrade them later?

Thanks,

--
Igor

On Mon, Oct 17, 2022, at 10:07 PM, David Arthur wrote:
> I've updated the KIP with the following changes (the confluence diff is not
> helpful here since i rearranged some things)
>
> * Added ZooKeeperBlockingKRaftMillis metric
> * Added section on new broker registration JSON
> * Removed section on MigrationCheck RPC
> * Added change to UpdateMetadataRequest
> * Added section "Additional ZK Broker Configs" (includes configs to connect
> to KRaft quorum)
> * Added section on "Incompatible Brokers" under Failure Modes
> * Clarified many things per this discussion thread
>
> I realized we need the KRaft controller to pick the correct
> "metadata.version" when initializing the migration. I included the IBP of a
> broker in its registration data so the KRaft controller can verify the IBP
> and pick the correct "metadata.version" when starting the migration.
> Otherwise, we could inadvertently downgrade the IBP/metadata.version as
> part of the migration.
>
> I also added "clusterId" to the broker registration data so the KRaft could
> verify it.
>
> -David
>
>
>
> On Mon, Oct 17, 2022 at 12:14 PM Colin McCabe  wrote:
>
>> On Fri, Oct 14, 2022, at 11:39, Jun Rao wrote:
>> > Hi, Colin,
>> >
>> > 10. "That all goes away in the new mode, and we just have some code which
>> > analyzes __cluster_metadata and reflects it in 1) updates to ZK and 2)
>> > messages sent out to brokers."
>> > Hmm, I am not sure it's that simple. Some of the complexity of the
>> ZK-based
>> > controller are (1) using the pipelining approach to write to ZK for
>> better
>> > throughput and using conditional writes for correctness; (2) sending the
>> > proper LeaderAndIsr and UpdateMetadata requests. For example, during
>> > controller failover, the full metadata needs to be sent while during
>> > individual broker failure, only some of the metadata needs to be updated.
>> > The controlled shutdown handling sometimes uses StopReplicaRequest  and
>> > some other times uses LeaderAndIsrRequest. (3) triggering new events
>> based
>> > on the responses of LeaderAndIsr (e.g. for topic deletion). Some of those
>> > complexity could be re-implemented in a more efficient way, but we need
>> to
>> > be really careful not to generate regression. Some of the other
>> complexity
>> > just won't go away. Reimplementing all those logic for the 30 or so
>> events
>> > in the ZK-based controller is possible, but seems a bit daunting and
>> risky.
>>
>> Hi Jun,
>>
>> Thanks for the response.  I agree that there is some work here but I don't
>> think it's as bad as it might seem. Most of the code for writing to ZK can
>> be reused from the old controller. We should at least evaluate using the
>> old ControllerChannelManager as well. I don't know if we'll end up doing it
>> or not but it's a possibility. (The main reason to not do it is that the
>> response handling will be a bit different)
>>
>> The question of what to do during a controller failover is an interesting
>> one. Technically, we should be able to continue sending incremental updates
>> to the legacy nodes, for the same reason we can in KRaft mode. However,
>> probably we should just copy what ZK mode does and send them a full
>> metadata update. This will allow us to have an easy way to handle
>> divergences caused by lost RPCs by changing the controller (just as we do
>> in ZK mode, unfortunately). I suppose we should document this in the KIP...
>>
>> I agree controlled shutdown is tricky to get just right. I suppose this is
>> a case where the RPCs we send out are not purely "fire and forget"; we have
>> to listen for the response. But that can be done in an event-based way.
>> Controlled shutdown is also probably the last thing we'll implement once we
>> have the basic lift and shift.
>>
>> Topic deletion is another annoying thing. I wonder if we could just delete
>> immediately here and skip the complexity of implementing "deleting state."
>> Topic IDs will exist in IBP 3.4, in both ZK and KRaft mode, so it should be
>> OK, right?
>>
>> best,
>> Colin
>>
>>
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Fri, Oct 14, 2022 at 9:29 AM Colin McCabe  wrote:
>> >
>> >> On Thu, Oct 13, 2022, at 11:44, Jun Rao wrote:
>> >> > Hi, Colin,
>> >> >
>> >> > Thanks for the reply.
>> >> >
>> >> > 10. This is a bit on the implementation side. If you look at the
>> existing
>> >> > ZK-based controller, most of the logic is around maintaining an
>> in-memory
>> >> > state of all the resources (broker, topic, partition, etc),
>> >> reading/writing
>> >> > to ZK, sending LeaderAndIsr and UpdateMetadata requests and 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-18 Thread Igor Soarez
Hi Colin,

> I agree controlled shutdown is tricky to get just right. I suppose this is a 
> case where the RPCs we send out are not purely "fire and forget"; we have to 
> listen for the response. But that can be done in an event-based way. 
> Controlled shutdown is also probably the last thing we'll implement once we 
> have the basic lift and shift.

LeaderAndIsr requests are also not "fire and forget" either as the broker 
response can include per-partition errors which trigger ISR and leadership 
changes.

> Topic deletion is another annoying thing. I wonder if we could just delete 
> immediately here and skip the complexity of implementing "deleting state." 
> Topic IDs will exist in IBP 3.4, in both ZK and KRaft mode, so it should be 
> OK, right?

I don’t think this is OK because we still use topic names (not topic IDs) to 
persist records under log directories. If one of the brokers is unavailable 
when a topic is deleted and re-created with the same name that broker might 
come back later with conflicting state/records on that topic. 

Best,
--
Igor


Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-17 Thread David Arthur
I've updated the KIP with the following changes (the confluence diff is not
helpful here since i rearranged some things)

* Added ZooKeeperBlockingKRaftMillis metric
* Added section on new broker registration JSON
* Removed section on MigrationCheck RPC
* Added change to UpdateMetadataRequest
* Added section "Additional ZK Broker Configs" (includes configs to connect
to KRaft quorum)
* Added section on "Incompatible Brokers" under Failure Modes
* Clarified many things per this discussion thread

I realized we need the KRaft controller to pick the correct
"metadata.version" when initializing the migration. I included the IBP of a
broker in its registration data so the KRaft controller can verify the IBP
and pick the correct "metadata.version" when starting the migration.
Otherwise, we could inadvertently downgrade the IBP/metadata.version as
part of the migration.

I also added "clusterId" to the broker registration data so the KRaft could
verify it.

-David



On Mon, Oct 17, 2022 at 12:14 PM Colin McCabe  wrote:

> On Fri, Oct 14, 2022, at 11:39, Jun Rao wrote:
> > Hi, Colin,
> >
> > 10. "That all goes away in the new mode, and we just have some code which
> > analyzes __cluster_metadata and reflects it in 1) updates to ZK and 2)
> > messages sent out to brokers."
> > Hmm, I am not sure it's that simple. Some of the complexity of the
> ZK-based
> > controller are (1) using the pipelining approach to write to ZK for
> better
> > throughput and using conditional writes for correctness; (2) sending the
> > proper LeaderAndIsr and UpdateMetadata requests. For example, during
> > controller failover, the full metadata needs to be sent while during
> > individual broker failure, only some of the metadata needs to be updated.
> > The controlled shutdown handling sometimes uses StopReplicaRequest  and
> > some other times uses LeaderAndIsrRequest. (3) triggering new events
> based
> > on the responses of LeaderAndIsr (e.g. for topic deletion). Some of those
> > complexity could be re-implemented in a more efficient way, but we need
> to
> > be really careful not to generate regression. Some of the other
> complexity
> > just won't go away. Reimplementing all those logic for the 30 or so
> events
> > in the ZK-based controller is possible, but seems a bit daunting and
> risky.
>
> Hi Jun,
>
> Thanks for the response.  I agree that there is some work here but I don't
> think it's as bad as it might seem. Most of the code for writing to ZK can
> be reused from the old controller. We should at least evaluate using the
> old ControllerChannelManager as well. I don't know if we'll end up doing it
> or not but it's a possibility. (The main reason to not do it is that the
> response handling will be a bit different)
>
> The question of what to do during a controller failover is an interesting
> one. Technically, we should be able to continue sending incremental updates
> to the legacy nodes, for the same reason we can in KRaft mode. However,
> probably we should just copy what ZK mode does and send them a full
> metadata update. This will allow us to have an easy way to handle
> divergences caused by lost RPCs by changing the controller (just as we do
> in ZK mode, unfortunately). I suppose we should document this in the KIP...
>
> I agree controlled shutdown is tricky to get just right. I suppose this is
> a case where the RPCs we send out are not purely "fire and forget"; we have
> to listen for the response. But that can be done in an event-based way.
> Controlled shutdown is also probably the last thing we'll implement once we
> have the basic lift and shift.
>
> Topic deletion is another annoying thing. I wonder if we could just delete
> immediately here and skip the complexity of implementing "deleting state."
> Topic IDs will exist in IBP 3.4, in both ZK and KRaft mode, so it should be
> OK, right?
>
> best,
> Colin
>
>
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Oct 14, 2022 at 9:29 AM Colin McCabe  wrote:
> >
> >> On Thu, Oct 13, 2022, at 11:44, Jun Rao wrote:
> >> > Hi, Colin,
> >> >
> >> > Thanks for the reply.
> >> >
> >> > 10. This is a bit on the implementation side. If you look at the
> existing
> >> > ZK-based controller, most of the logic is around maintaining an
> in-memory
> >> > state of all the resources (broker, topic, partition, etc),
> >> reading/writing
> >> > to ZK, sending LeaderAndIsr and UpdateMetadata requests and handling
> the
> >> > responses to brokers. So we need all that logic in the dual write
> mode.
> >> One
> >> > option is to duplicate all that logic in some new code. This can be a
> bit
> >> > error prone and makes the code a bit harder to maintain if we need to
> fix
> >> > some critical issues in ZK-based controllers. Another option is to try
> >> > reusing the existing code in the ZK-based controller. For example, we
> >> could
> >> > start the EventManager in the ZK-based controller, but let the KRaft
> >> > controller ingest new events. This has its own challenges: (1) the

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-17 Thread Colin McCabe
On Fri, Oct 14, 2022, at 11:39, Jun Rao wrote:
> Hi, Colin,
>
> 10. "That all goes away in the new mode, and we just have some code which
> analyzes __cluster_metadata and reflects it in 1) updates to ZK and 2)
> messages sent out to brokers."
> Hmm, I am not sure it's that simple. Some of the complexity of the ZK-based
> controller are (1) using the pipelining approach to write to ZK for better
> throughput and using conditional writes for correctness; (2) sending the
> proper LeaderAndIsr and UpdateMetadata requests. For example, during
> controller failover, the full metadata needs to be sent while during
> individual broker failure, only some of the metadata needs to be updated.
> The controlled shutdown handling sometimes uses StopReplicaRequest  and
> some other times uses LeaderAndIsrRequest. (3) triggering new events based
> on the responses of LeaderAndIsr (e.g. for topic deletion). Some of those
> complexity could be re-implemented in a more efficient way, but we need to
> be really careful not to generate regression. Some of the other complexity
> just won't go away. Reimplementing all those logic for the 30 or so events
> in the ZK-based controller is possible, but seems a bit daunting and risky.

Hi Jun,

Thanks for the response.  I agree that there is some work here but I don't 
think it's as bad as it might seem. Most of the code for writing to ZK can be 
reused from the old controller. We should at least evaluate using the old 
ControllerChannelManager as well. I don't know if we'll end up doing it or not 
but it's a possibility. (The main reason to not do it is that the response 
handling will be a bit different)

The question of what to do during a controller failover is an interesting one. 
Technically, we should be able to continue sending incremental updates to the 
legacy nodes, for the same reason we can in KRaft mode. However, probably we 
should just copy what ZK mode does and send them a full metadata update. This 
will allow us to have an easy way to handle divergences caused by lost RPCs by 
changing the controller (just as we do in ZK mode, unfortunately). I suppose we 
should document this in the KIP...

I agree controlled shutdown is tricky to get just right. I suppose this is a 
case where the RPCs we send out are not purely "fire and forget"; we have to 
listen for the response. But that can be done in an event-based way. Controlled 
shutdown is also probably the last thing we'll implement once we have the basic 
lift and shift.

Topic deletion is another annoying thing. I wonder if we could just delete 
immediately here and skip the complexity of implementing "deleting state." 
Topic IDs will exist in IBP 3.4, in both ZK and KRaft mode, so it should be OK, 
right?

best,
Colin


>
> Thanks,
>
> Jun
>
> On Fri, Oct 14, 2022 at 9:29 AM Colin McCabe  wrote:
>
>> On Thu, Oct 13, 2022, at 11:44, Jun Rao wrote:
>> > Hi, Colin,
>> >
>> > Thanks for the reply.
>> >
>> > 10. This is a bit on the implementation side. If you look at the existing
>> > ZK-based controller, most of the logic is around maintaining an in-memory
>> > state of all the resources (broker, topic, partition, etc),
>> reading/writing
>> > to ZK, sending LeaderAndIsr and UpdateMetadata requests and handling the
>> > responses to brokers. So we need all that logic in the dual write mode.
>> One
>> > option is to duplicate all that logic in some new code. This can be a bit
>> > error prone and makes the code a bit harder to maintain if we need to fix
>> > some critical issues in ZK-based controllers. Another option is to try
>> > reusing the existing code in the ZK-based controller. For example, we
>> could
>> > start the EventManager in the ZK-based controller, but let the KRaft
>> > controller ingest new events. This has its own challenges: (1) the
>> existing
>> > logic only logs ZK failures and doesn't expose them to the caller; (2)
>> the
>> > existing logic may add new events to the queue itself and we probably
>> need
>> > to think through how this is coordinated with the KRaft controller; (3)
>> it
>> > registers some ZK listeners unnecessarily (may not be a big concern). So
>> we
>> > need to get around those issues somehow. I am wondering if we have
>> > considered both options and which approach we are leaning towards for the
>> > implementation.
>> >
>>
>> Yes, this is a good question. My take is that a big part of the complexity
>> in the old controller code results from the fact that we use ZK as a
>> multi-writer database for propagating information between different
>> components. So in the old controller, every write to ZK needs to be
>> structured as a compare and swap to be fully correct. Every time we get
>> notified about something, it's usually in the form of "this znode changed"
>> which prompts a full reload of part of the data in ZK (which itself has
>> multiple parts, loading, deserializing, reconciling, etc.) That all goes
>> away in the new mode, and we just have some code which 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-14 Thread David Arthur
Jun, a few thoughts on 10

> using the pipelining approach to write to ZK for better throughput and
using conditional writes for correctness;

For writes to ZK from the KRaft controller, I think we can reuse some or
all of the code in KafkaZkClient which does the MultiOp correctness CheckOp
along with retries, error handling, and pipelining. For the common
pipelining cases (partition state changes and topic creation), we should be
able to preserve the existing pipelining semantics since those are in the
context of a single record batch in KRaft.

> (2) sending the proper LeaderAndIsr and UpdateMetadata requests.

I agree this is one of the more challenging aspects of this design. In
fact, this is really the critical piece of the design that allows us to do
these migrations online. We either need to make the ZK controller
understand the metadata log and KRaft semantics, or we need to make the
KRaft controller understand ZK broker semantics. Both are complex, but I
think the latter is easier :) I do expect we'll want to extract and reuse
some ZK controller code where it makes sense (e.g.,
ControllerChannelManager, ControllerBrokerRequestBatch, KafkaZkClient).

Thanks!
David



On Fri, Oct 14, 2022 at 2:39 PM Jun Rao  wrote:

> Hi, Colin,
>
> 10. "That all goes away in the new mode, and we just have some code which
> analyzes __cluster_metadata and reflects it in 1) updates to ZK and 2)
> messages sent out to brokers."
> Hmm, I am not sure it's that simple. Some of the complexity of the ZK-based
> controller are (1) using the pipelining approach to write to ZK for better
> throughput and using conditional writes for correctness; (2) sending the
> proper LeaderAndIsr and UpdateMetadata requests. For example, during
> controller failover, the full metadata needs to be sent while during
> individual broker failure, only some of the metadata needs to be updated.
> The controlled shutdown handling sometimes uses StopReplicaRequest  and
> some other times uses LeaderAndIsrRequest. (3) triggering new events based
> on the responses of LeaderAndIsr (e.g. for topic deletion). Some of those
> complexity could be re-implemented in a more efficient way, but we need to
> be really careful not to generate regression. Some of the other complexity
> just won't go away. Reimplementing all those logic for the 30 or so events
> in the ZK-based controller is possible, but seems a bit daunting and risky.
>
> Thanks,
>
> Jun
>
> On Fri, Oct 14, 2022 at 9:29 AM Colin McCabe  wrote:
>
> > On Thu, Oct 13, 2022, at 11:44, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > Thanks for the reply.
> > >
> > > 10. This is a bit on the implementation side. If you look at the
> existing
> > > ZK-based controller, most of the logic is around maintaining an
> in-memory
> > > state of all the resources (broker, topic, partition, etc),
> > reading/writing
> > > to ZK, sending LeaderAndIsr and UpdateMetadata requests and handling
> the
> > > responses to brokers. So we need all that logic in the dual write mode.
> > One
> > > option is to duplicate all that logic in some new code. This can be a
> bit
> > > error prone and makes the code a bit harder to maintain if we need to
> fix
> > > some critical issues in ZK-based controllers. Another option is to try
> > > reusing the existing code in the ZK-based controller. For example, we
> > could
> > > start the EventManager in the ZK-based controller, but let the KRaft
> > > controller ingest new events. This has its own challenges: (1) the
> > existing
> > > logic only logs ZK failures and doesn't expose them to the caller; (2)
> > the
> > > existing logic may add new events to the queue itself and we probably
> > need
> > > to think through how this is coordinated with the KRaft controller; (3)
> > it
> > > registers some ZK listeners unnecessarily (may not be a big concern).
> So
> > we
> > > need to get around those issues somehow. I am wondering if we have
> > > considered both options and which approach we are leaning towards for
> the
> > > implementation.
> > >
> >
> > Yes, this is a good question. My take is that a big part of the
> complexity
> > in the old controller code results from the fact that we use ZK as a
> > multi-writer database for propagating information between different
> > components. So in the old controller, every write to ZK needs to be
> > structured as a compare and swap to be fully correct. Every time we get
> > notified about something, it's usually in the form of "this znode
> changed"
> > which prompts a full reload of part of the data in ZK (which itself has
> > multiple parts, loading, deserializing, reconciling, etc.) That all goes
> > away in the new mode, and we just have some code which analyzes
> > __cluster_metadata and reflects it in 1) updates to ZK and 2) messages
> sent
> > out to brokers.
> >
> > This is pretty decoupled from the other logic in QuorumController and
> > should be easy to unit test, since the same inputs from the log always
> > produce the same 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-14 Thread Jun Rao
Hi, Colin,

10. "That all goes away in the new mode, and we just have some code which
analyzes __cluster_metadata and reflects it in 1) updates to ZK and 2)
messages sent out to brokers."
Hmm, I am not sure it's that simple. Some of the complexity of the ZK-based
controller are (1) using the pipelining approach to write to ZK for better
throughput and using conditional writes for correctness; (2) sending the
proper LeaderAndIsr and UpdateMetadata requests. For example, during
controller failover, the full metadata needs to be sent while during
individual broker failure, only some of the metadata needs to be updated.
The controlled shutdown handling sometimes uses StopReplicaRequest  and
some other times uses LeaderAndIsrRequest. (3) triggering new events based
on the responses of LeaderAndIsr (e.g. for topic deletion). Some of those
complexity could be re-implemented in a more efficient way, but we need to
be really careful not to generate regression. Some of the other complexity
just won't go away. Reimplementing all those logic for the 30 or so events
in the ZK-based controller is possible, but seems a bit daunting and risky.

Thanks,

Jun

On Fri, Oct 14, 2022 at 9:29 AM Colin McCabe  wrote:

> On Thu, Oct 13, 2022, at 11:44, Jun Rao wrote:
> > Hi, Colin,
> >
> > Thanks for the reply.
> >
> > 10. This is a bit on the implementation side. If you look at the existing
> > ZK-based controller, most of the logic is around maintaining an in-memory
> > state of all the resources (broker, topic, partition, etc),
> reading/writing
> > to ZK, sending LeaderAndIsr and UpdateMetadata requests and handling the
> > responses to brokers. So we need all that logic in the dual write mode.
> One
> > option is to duplicate all that logic in some new code. This can be a bit
> > error prone and makes the code a bit harder to maintain if we need to fix
> > some critical issues in ZK-based controllers. Another option is to try
> > reusing the existing code in the ZK-based controller. For example, we
> could
> > start the EventManager in the ZK-based controller, but let the KRaft
> > controller ingest new events. This has its own challenges: (1) the
> existing
> > logic only logs ZK failures and doesn't expose them to the caller; (2)
> the
> > existing logic may add new events to the queue itself and we probably
> need
> > to think through how this is coordinated with the KRaft controller; (3)
> it
> > registers some ZK listeners unnecessarily (may not be a big concern). So
> we
> > need to get around those issues somehow. I am wondering if we have
> > considered both options and which approach we are leaning towards for the
> > implementation.
> >
>
> Yes, this is a good question. My take is that a big part of the complexity
> in the old controller code results from the fact that we use ZK as a
> multi-writer database for propagating information between different
> components. So in the old controller, every write to ZK needs to be
> structured as a compare and swap to be fully correct. Every time we get
> notified about something, it's usually in the form of "this znode changed"
> which prompts a full reload of part of the data in ZK (which itself has
> multiple parts, loading, deserializing, reconciling, etc.) That all goes
> away in the new mode, and we just have some code which analyzes
> __cluster_metadata and reflects it in 1) updates to ZK and 2) messages sent
> out to brokers.
>
> This is pretty decoupled from the other logic in QuorumController and
> should be easy to unit test, since the same inputs from the log always
> produce the same output in ZK. Basically, ZK is write-only for us, we do
> not read it (with the big exception of broker registration znodes) and I
> think that will greatly simplify things.
>
> So I think dual-write mode as described here will be substantially simpler
> than trying to run part or all of the old controller in parallel. I do
> think we will reuse a bunch of the serialization / deserialization code for
> znodes and possibly the code for communicating with ZK.
>
> best,
> Colin
>
>
> >
> > 14. Good point and make sense.
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> >
> > On Wed, Oct 12, 2022 at 3:27 PM Colin McCabe  wrote:
> >
> >> Hi Jun,
> >>
> >> Thanks for taking a look. I can answer some questions here because I
> >> collaborated on this a bit, and David is on vacation for a few days.
> >>
> >> On Wed, Oct 12, 2022, at 14:41, Jun Rao wrote:
> >> > Hi, David,
> >> >
> >> > Thanks for the KIP. A few comments below.
> >> >
> >> > 10. It's still not very clear to me how the KRaft controller works in
> the
> >> > dual writes mode to KRaft log and ZK when the brokers still run in ZK
> >> mode.
> >> > Does the KRaft controller run a ZK based controller in parallel or do
> we
> >> > derive what needs to be written to ZK based on KRaft controller logic?
> >>
> >> We derive what needs to be written to ZK based on KRaft controller
> logic.
> >>
> >> > I am also not sure how the KRaft controller 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-14 Thread David Arthur
Thanks for the discussion so far, everyone!

Mickael

1) In this case, I think the KRaft controller could avoid sending UMR
or LISR to this broker and not include it in LiveBrokers that are sent
to the rest of the cluster. This would effectively fence it off from
the rest of the cluster. The controller can report this as an error.

2) I agree with Colin's suggestion here. At some point, we'll need to
block writes on the kraft controller. We might want a "time that ZK is
blocking KRaft" metric here to accompany the
"ZooKeeperWriteBehindLag". As we get on with the implementation, I
think we'll be able to pick a reasonable size for the allowable
backlog of writes to ZK.



Jun

10) Colin answered this

11) Yes, that's right -- I'll clarify that.

12) +1

13) Colin answered this

14) As Colin mentioned, we achieve ZK controller fencing by bumping
the controller epoch. I'll clarify this

15) The text is a bit unclear here. What I meant is we will return a
random broker as ControllerId in the MetadataResponse sent to clients,
but the active KRaft controller would be sent as the ControllerId in
UpdateMetadataRequest. However, Colin raises a good point that we'll
need a different code path on the ZK brokers for connecting to a KRaft
controller. We should add a check on "controller.quorum.voters" as a
readiness check on the ZK brokers.

16) I think we can remove this. This is a remnant of an earlier design
without forwarding



Colin

1) Yea, I like this. Since we should be able to statically determine
our "readiness" on a broker, we can put this in the registration data.

2) It's not strictly required, no. However, it might be useful for the
case of explicitly disallowing migrations on a cluster. For example,
if someone brought up a KRaft quorum and misconfigured the
"zk.connect", it could trigger a migration on some existing/unrelated
cluster. This would be mostly harmless, but could cause some trouble
for operators. I think of this config like an arming switch for the
migration.

3) Sounds good

4) Sure, sounds good

5) Yea, agreed. We should fence these brokers -- see my response to
Mickael above.



Colin/Jun

As Colin mentioned in his reply, we can react to records in the
metadata log and update ZK and send out RPCs to brokers. Something
like an asynchronous listener to the metadata log.

Another important detail here is that in the KRaft controller we can
work off a consistent in-memory snapshot of the metadata. This could
be helpful for the case where our operation might take some time (like
sending UMR to all the brokers, or making a lot of partition updates
in ZK). During the time we are processing a record, we don't have to
worry about the metadata changing out from under us.



On Thu, Oct 13, 2022 at 2:44 PM Jun Rao  wrote:
>
> Hi, Colin,
>
> Thanks for the reply.
>
> 10. This is a bit on the implementation side. If you look at the existing
> ZK-based controller, most of the logic is around maintaining an in-memory
> state of all the resources (broker, topic, partition, etc), reading/writing
> to ZK, sending LeaderAndIsr and UpdateMetadata requests and handling the
> responses to brokers. So we need all that logic in the dual write mode. One
> option is to duplicate all that logic in some new code. This can be a bit
> error prone and makes the code a bit harder to maintain if we need to fix
> some critical issues in ZK-based controllers. Another option is to try
> reusing the existing code in the ZK-based controller. For example, we could
> start the EventManager in the ZK-based controller, but let the KRaft
> controller ingest new events. This has its own challenges: (1) the existing
> logic only logs ZK failures and doesn't expose them to the caller; (2) the
> existing logic may add new events to the queue itself and we probably need
> to think through how this is coordinated with the KRaft controller; (3) it
> registers some ZK listeners unnecessarily (may not be a big concern). So we
> need to get around those issues somehow. I am wondering if we have
> considered both options and which approach we are leaning towards for the
> implementation.
>
> 14. Good point and make sense.
>
> Thanks,
>
> Jun
>
>
>
>
> On Wed, Oct 12, 2022 at 3:27 PM Colin McCabe  wrote:
>
> > Hi Jun,
> >
> > Thanks for taking a look. I can answer some questions here because I
> > collaborated on this a bit, and David is on vacation for a few days.
> >
> > On Wed, Oct 12, 2022, at 14:41, Jun Rao wrote:
> > > Hi, David,
> > >
> > > Thanks for the KIP. A few comments below.
> > >
> > > 10. It's still not very clear to me how the KRaft controller works in the
> > > dual writes mode to KRaft log and ZK when the brokers still run in ZK
> > mode.
> > > Does the KRaft controller run a ZK based controller in parallel or do we
> > > derive what needs to be written to ZK based on KRaft controller logic?
> >
> > We derive what needs to be written to ZK based on KRaft controller logic.
> >
> > > I am 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-14 Thread Colin McCabe
On Thu, Oct 13, 2022, at 11:44, Jun Rao wrote:
> Hi, Colin,
>
> Thanks for the reply.
>
> 10. This is a bit on the implementation side. If you look at the existing
> ZK-based controller, most of the logic is around maintaining an in-memory
> state of all the resources (broker, topic, partition, etc), reading/writing
> to ZK, sending LeaderAndIsr and UpdateMetadata requests and handling the
> responses to brokers. So we need all that logic in the dual write mode. One
> option is to duplicate all that logic in some new code. This can be a bit
> error prone and makes the code a bit harder to maintain if we need to fix
> some critical issues in ZK-based controllers. Another option is to try
> reusing the existing code in the ZK-based controller. For example, we could
> start the EventManager in the ZK-based controller, but let the KRaft
> controller ingest new events. This has its own challenges: (1) the existing
> logic only logs ZK failures and doesn't expose them to the caller; (2) the
> existing logic may add new events to the queue itself and we probably need
> to think through how this is coordinated with the KRaft controller; (3) it
> registers some ZK listeners unnecessarily (may not be a big concern). So we
> need to get around those issues somehow. I am wondering if we have
> considered both options and which approach we are leaning towards for the
> implementation.
>

Yes, this is a good question. My take is that a big part of the complexity in 
the old controller code results from the fact that we use ZK as a multi-writer 
database for propagating information between different components. So in the 
old controller, every write to ZK needs to be structured as a compare and swap 
to be fully correct. Every time we get notified about something, it's usually 
in the form of "this znode changed" which prompts a full reload of part of the 
data in ZK (which itself has multiple parts, loading, deserializing, 
reconciling, etc.) That all goes away in the new mode, and we just have some 
code which analyzes __cluster_metadata and reflects it in 1) updates to ZK and 
2) messages sent out to brokers.

This is pretty decoupled from the other logic in QuorumController and should be 
easy to unit test, since the same inputs from the log always produce the same 
output in ZK. Basically, ZK is write-only for us, we do not read it (with the 
big exception of broker registration znodes) and I think that will greatly 
simplify things.

So I think dual-write mode as described here will be substantially simpler than 
trying to run part or all of the old controller in parallel. I do think we will 
reuse a bunch of the serialization / deserialization code for znodes and 
possibly the code for communicating with ZK.

best,
Colin


>
> 14. Good point and make sense.
>
> Thanks,
>
> Jun
>
>
>
>
> On Wed, Oct 12, 2022 at 3:27 PM Colin McCabe  wrote:
>
>> Hi Jun,
>>
>> Thanks for taking a look. I can answer some questions here because I
>> collaborated on this a bit, and David is on vacation for a few days.
>>
>> On Wed, Oct 12, 2022, at 14:41, Jun Rao wrote:
>> > Hi, David,
>> >
>> > Thanks for the KIP. A few comments below.
>> >
>> > 10. It's still not very clear to me how the KRaft controller works in the
>> > dual writes mode to KRaft log and ZK when the brokers still run in ZK
>> mode.
>> > Does the KRaft controller run a ZK based controller in parallel or do we
>> > derive what needs to be written to ZK based on KRaft controller logic?
>>
>> We derive what needs to be written to ZK based on KRaft controller logic.
>>
>> > I am also not sure how the KRaft controller handles broker
>> > registration/deregistration, since brokers are still running in ZK mode
>> and
>> > are not heartbeating to the KRaft controller.
>>
>> The new controller will listen for broker registrations under /brokers.
>> This is the only znode watch that the new controller will do.
>>
>> We did consider changing how ZK-based broker registration worked, but it
>> just ended up being too much work for not enough gain.
>>
>> >
>> > 12. "A new set of nodes will be provisioned to host the controller
>> quorum."
>> > I guess we don't support starting the KRaft controller quorum on existing
>> > brokers. It would be useful to make that clear.
>> >
>>
>> Agreed
>>
>> > 13. "Once the quorum is established and a leader is elected, the
>> controller
>> > will check the state of the cluster using the MigrationCheck RPC." How
>> does
>> > the quorum controller detect other brokers? Does the controller node need
>> > to be configured with ZK connection string? If so, it would be useful to
>> > document the additional configs that the quorum controller needs to set.
>> >
>>
>> Yes, the controllers monitor ZK for broker registrations, as I mentioned
>> above. So they need zk.connect and the other ZK connection configurations.
>>
>> > 14. "In order to prevent further writes to ZK, the first thing the new
>> > KRaft quorum must do is take over leadership of the ZK 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-13 Thread Jun Rao
Hi, Colin,

Thanks for the reply.

10. This is a bit on the implementation side. If you look at the existing
ZK-based controller, most of the logic is around maintaining an in-memory
state of all the resources (broker, topic, partition, etc), reading/writing
to ZK, sending LeaderAndIsr and UpdateMetadata requests and handling the
responses to brokers. So we need all that logic in the dual write mode. One
option is to duplicate all that logic in some new code. This can be a bit
error prone and makes the code a bit harder to maintain if we need to fix
some critical issues in ZK-based controllers. Another option is to try
reusing the existing code in the ZK-based controller. For example, we could
start the EventManager in the ZK-based controller, but let the KRaft
controller ingest new events. This has its own challenges: (1) the existing
logic only logs ZK failures and doesn't expose them to the caller; (2) the
existing logic may add new events to the queue itself and we probably need
to think through how this is coordinated with the KRaft controller; (3) it
registers some ZK listeners unnecessarily (may not be a big concern). So we
need to get around those issues somehow. I am wondering if we have
considered both options and which approach we are leaning towards for the
implementation.

14. Good point and make sense.

Thanks,

Jun




On Wed, Oct 12, 2022 at 3:27 PM Colin McCabe  wrote:

> Hi Jun,
>
> Thanks for taking a look. I can answer some questions here because I
> collaborated on this a bit, and David is on vacation for a few days.
>
> On Wed, Oct 12, 2022, at 14:41, Jun Rao wrote:
> > Hi, David,
> >
> > Thanks for the KIP. A few comments below.
> >
> > 10. It's still not very clear to me how the KRaft controller works in the
> > dual writes mode to KRaft log and ZK when the brokers still run in ZK
> mode.
> > Does the KRaft controller run a ZK based controller in parallel or do we
> > derive what needs to be written to ZK based on KRaft controller logic?
>
> We derive what needs to be written to ZK based on KRaft controller logic.
>
> > I am also not sure how the KRaft controller handles broker
> > registration/deregistration, since brokers are still running in ZK mode
> and
> > are not heartbeating to the KRaft controller.
>
> The new controller will listen for broker registrations under /brokers.
> This is the only znode watch that the new controller will do.
>
> We did consider changing how ZK-based broker registration worked, but it
> just ended up being too much work for not enough gain.
>
> >
> > 12. "A new set of nodes will be provisioned to host the controller
> quorum."
> > I guess we don't support starting the KRaft controller quorum on existing
> > brokers. It would be useful to make that clear.
> >
>
> Agreed
>
> > 13. "Once the quorum is established and a leader is elected, the
> controller
> > will check the state of the cluster using the MigrationCheck RPC." How
> does
> > the quorum controller detect other brokers? Does the controller node need
> > to be configured with ZK connection string? If so, it would be useful to
> > document the additional configs that the quorum controller needs to set.
> >
>
> Yes, the controllers monitor ZK for broker registrations, as I mentioned
> above. So they need zk.connect and the other ZK connection configurations.
>
> > 14. "In order to prevent further writes to ZK, the first thing the new
> > KRaft quorum must do is take over leadership of the ZK controller. " The
> ZK
> > controller processing changes to /controller update asynchronously. How
> > does the KRaft controller know when the ZK controller has resigned before
> > it can safely copy the ZK data?
> >
>
> This should be done through expectedControllerEpochZkVersion, just like in
> ZK mode, right? We should bump this epoch value so that any writes from the
> old controller will not go through. I agree we should spell this out in the
> KIP.
>
> > 15. We have the following sentences. One says ControllerId is a random
> > KRaft broker and the other says it's the active controller. Which one is
> > correct?
> > "UpdateMetadata: for certain metadata changes, the KRaft controller will
> > need to send UpdateMetadataRequests to the ZK brokers. For the
> > “ControllerId” field in this request, the controller should specify a
> > random KRaft broker."
> > "In the UpdateMetadataRequest sent by the KRaft controller to the ZK
> > brokers, the ControllerId will point to the active controller which will
> be
> > used for the inter-broker requests."
> >
>
> Yeah, this seems like an error to me as well. A random value is not really
> useful. Plus the text here is self-contradictory, as you pointed out.
>
> I suspect what we should do here is add a new field, KRaftControllerId,
> and populate it with the real controller ID, and leave the old controllerId
> field as -1. A ZK-based broker that sees this can then consult its
> controller.quorum.voters configuration to see where it should send
> 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-12 Thread Colin McCabe
On Wed, Oct 5, 2022, at 10:06, Mickael Maison wrote:
> Hi David,
>
> Thanks for starting this important KIP.
>
> I've just taken a quick look so far but I've got a couple of initial 
> questions:
>
> 1) What happens if a non KRaft compatible broker (or with
> kafka.metadata.migration.enable set to false) joins the cluster after
> the migration is triggered?
>

Hi Mickael,

Yes, I also asked this :)

I think the answer is that it's an administrative error, and the broker will 
not be able to function. Maybe we should spell out that the broker should shut 
itself down in this scenario. Obviously this will not be possible for already 
released versions, but a broker running the new 3.4 code but with something 
that disqualifies it from upgrade-from-ZK (like the wrong IBP) could shut 
itself down if it knew it was in the wrong spot.

>
> 2) In the Failure Modes section you mention a scenario where a write
> to ZK fails. What happens when the divergence limit is reached? Is
> this a fatal condition? How much divergence should we allow?
>

Yes, this is one of the complexities of supporting async mirroring to ZK. I 
don't think it's quite as bad as it sounds because we still have the ordered 
__cluster_metadata log.

I would suggest that if our backlog grows too high, we just block the kraft 
controller thread until it reduces. We really cannot afford to have an 
unbounded backlog. I would expected the performance to be comparable to the 
current controller's performance since we are doing the same stuff (less stuff 
actually, since we aren't doing reads...)

best,
Colin


> Thanks,
> Mickael
>
> On Wed, Oct 5, 2022 at 12:20 AM David Arthur  wrote:
>>
>> Hey folks, I wanted to get the ball rolling on the discussion for the
>> ZooKeeper migration KIP. This KIP details how we plan to do an online
>> migration of metadata from ZooKeeper to KRaft as well as a rolling
>> upgrade of brokers to KRaft mode.
>>
>> The general idea is to keep KRaft and ZooKeeper in sync during the
>> migration, so both types of brokers can exist simultaneously. Then,
>> once everything is migrated and updated, we can turn off ZooKeeper
>> writes.
>>
>> This is a pretty complex KIP, so please take a look :)
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
>>
>> Thanks!
>> David


Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-12 Thread Colin McCabe
Thanks for putting this together, David. A few questions (maybe some of them 
overlap with Jun's?)

1. Rather than having MigrationCheckRequest, can we just add a new JSON field 
to the broker registration like "kraftReady": true? After all, we are already 
going to have to read the broker registration from ZK as we discussed earlier 
(and as the KIP mentions).

2. Do we really need kafka.metadata.migration.enable? It's not clear to me what 
harm would result if we just left it out of the design. Presumably standing up 
a new KRaft controller quorum and pointing zk.connect at the old ZK cluster is 
a very intentional action without much chance that we did it accidentally.

3. We should probably be explicit that the new metadata.version the KIP 
mentions corresponds to an inter.broker.protocol of 3.4 or later (since we're 
targetting 3.4) and that using IBP 3.4 means ALWAYS forwarding. I realize 
that's implicit if you read some of the earlier KIPs, but it would be good to 
make it clear. Also presumably the controller will copy its statically 
configured value of inter.broker.protocol into the metadata.version used in the 
new log, so it all matches up. Again, probably clear to us but good to spell 
out.

4. Migration state names: should "ZooKeeper" be renamed to 
"MigrationIneligible"? Since it does mean you can't upgrade (in effect)

5. If we get a new ineligible broker registering in ZK after the migration 
begins, I assume we will just have to ignore it and keep going. This would be 
an operator error, of course, and this broker will not be able to participate 
in the cluster. It would be good to spell out that behavior a bit.

best,
Colin


On Tue, Oct 4, 2022, at 15:19, David Arthur wrote:
> Hey folks, I wanted to get the ball rolling on the discussion for the
> ZooKeeper migration KIP. This KIP details how we plan to do an online
> migration of metadata from ZooKeeper to KRaft as well as a rolling
> upgrade of brokers to KRaft mode.
>
> The general idea is to keep KRaft and ZooKeeper in sync during the
> migration, so both types of brokers can exist simultaneously. Then,
> once everything is migrated and updated, we can turn off ZooKeeper
> writes.
>
> This is a pretty complex KIP, so please take a look :)
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
>
> Thanks!
> David


Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-12 Thread Colin McCabe
Hi Jun,

Thanks for taking a look. I can answer some questions here because I 
collaborated on this a bit, and David is on vacation for a few days.

On Wed, Oct 12, 2022, at 14:41, Jun Rao wrote:
> Hi, David,
>
> Thanks for the KIP. A few comments below.
>
> 10. It's still not very clear to me how the KRaft controller works in the
> dual writes mode to KRaft log and ZK when the brokers still run in ZK mode.
> Does the KRaft controller run a ZK based controller in parallel or do we
> derive what needs to be written to ZK based on KRaft controller logic?

We derive what needs to be written to ZK based on KRaft controller logic.

> I am also not sure how the KRaft controller handles broker
> registration/deregistration, since brokers are still running in ZK mode and
> are not heartbeating to the KRaft controller.

The new controller will listen for broker registrations under /brokers. This is 
the only znode watch that the new controller will do.

We did consider changing how ZK-based broker registration worked, but it just 
ended up being too much work for not enough gain.

>
> 12. "A new set of nodes will be provisioned to host the controller quorum."
> I guess we don't support starting the KRaft controller quorum on existing
> brokers. It would be useful to make that clear.
>

Agreed

> 13. "Once the quorum is established and a leader is elected, the controller
> will check the state of the cluster using the MigrationCheck RPC." How does
> the quorum controller detect other brokers? Does the controller node need
> to be configured with ZK connection string? If so, it would be useful to
> document the additional configs that the quorum controller needs to set.
>

Yes, the controllers monitor ZK for broker registrations, as I mentioned above. 
So they need zk.connect and the other ZK connection configurations.

> 14. "In order to prevent further writes to ZK, the first thing the new
> KRaft quorum must do is take over leadership of the ZK controller. " The ZK
> controller processing changes to /controller update asynchronously. How
> does the KRaft controller know when the ZK controller has resigned before
> it can safely copy the ZK data?
>

This should be done through expectedControllerEpochZkVersion, just like in ZK 
mode, right? We should bump this epoch value so that any writes from the old 
controller will not go through. I agree we should spell this out in the KIP.

> 15. We have the following sentences. One says ControllerId is a random
> KRaft broker and the other says it's the active controller. Which one is
> correct?
> "UpdateMetadata: for certain metadata changes, the KRaft controller will
> need to send UpdateMetadataRequests to the ZK brokers. For the
> “ControllerId” field in this request, the controller should specify a
> random KRaft broker."
> "In the UpdateMetadataRequest sent by the KRaft controller to the ZK
> brokers, the ControllerId will point to the active controller which will be
> used for the inter-broker requests."
>

Yeah, this seems like an error to me as well. A random value is not really 
useful. Plus the text here is self-contradictory, as you pointed out.

I suspect what we should do here is add a new field, KRaftControllerId, and 
populate it with the real controller ID, and leave the old controllerId field 
as -1. A ZK-based broker that sees this can then consult its 
controller.quorum.voters configuration to see where it should send 
controller-bound RPCs. That (static) configuration lets us map between 
controller ID and host:port.

We should still keep our existing epoch logic for deciding when 
UpdateMetadataRequest / LeaderAndIsrRequests are stale, with the caveat that 
any kraft-based epoch should be treated as greater than any ZK-based epoch. 
After all, the kraft epoch is coming from the epoch of __cluster_metadata, 
whereas the ZK epoch comes from ZK.

>
> 16. "Additionally, the controller must specify if a broker in “LiveBrokers”
> is KRaft or ZK." Does that require any protocol changes to UpdateMetadata?
>

Yeah, I am also curious why the we need to care whether brokers are ZK or KRaft 
in UpdateMetadataRequest. We don't reveal this to clients, so can we just leave 
this out?

best,
Colin

> Thanks,
>
> Jun
>
> On Wed, Oct 5, 2022 at 10:07 AM Mickael Maison 
> wrote:
>
>> Hi David,
>>
>> Thanks for starting this important KIP.
>>
>> I've just taken a quick look so far but I've got a couple of initial
>> questions:
>>
>> 1) What happens if a non KRaft compatible broker (or with
>> kafka.metadata.migration.enable set to false) joins the cluster after
>> the migration is triggered?
>>
>> 2) In the Failure Modes section you mention a scenario where a write
>> to ZK fails. What happens when the divergence limit is reached? Is
>> this a fatal condition? How much divergence should we allow?
>>
>> Thanks,
>> Mickael
>>
>> On Wed, Oct 5, 2022 at 12:20 AM David Arthur  wrote:
>> >
>> > Hey folks, I wanted to get the ball rolling on the discussion for the
>> > ZooKeeper 

Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-12 Thread Jun Rao
Hi, David,

Thanks for the KIP. A few comments below.

10. It's still not very clear to me how the KRaft controller works in the
dual writes mode to KRaft log and ZK when the brokers still run in ZK mode.
Does the KRaft controller run a ZK based controller in parallel or do we
derive what needs to be written to ZK based on KRaft controller logic? I am
also not sure how the KRaft controller handles broker
registration/deregistration, since brokers are still running in ZK mode and
are not heartbeating to the KRaft controller.

11. When preparing the Cluster, each broker needs
kafka.metadata.migration.enable set to “true”, right? Could we document
that?

12. "A new set of nodes will be provisioned to host the controller quorum."
I guess we don't support starting the KRaft controller quorum on existing
brokers. It would be useful to make that clear.

13. "Once the quorum is established and a leader is elected, the controller
will check the state of the cluster using the MigrationCheck RPC." How does
the quorum controller detect other brokers? Does the controller node need
to be configured with ZK connection string? If so, it would be useful to
document the additional configs that the quorum controller needs to set.

14. "In order to prevent further writes to ZK, the first thing the new
KRaft quorum must do is take over leadership of the ZK controller. " The ZK
controller processing changes to /controller update asynchronously. How
does the KRaft controller know when the ZK controller has resigned before
it can safely copy the ZK data?

15. We have the following sentences. One says ControllerId is a random
KRaft broker and the other says it's the active controller. Which one is
correct?
"UpdateMetadata: for certain metadata changes, the KRaft controller will
need to send UpdateMetadataRequests to the ZK brokers. For the
“ControllerId” field in this request, the controller should specify a
random KRaft broker."
"In the UpdateMetadataRequest sent by the KRaft controller to the ZK
brokers, the ControllerId will point to the active controller which will be
used for the inter-broker requests."

16. "Additionally, the controller must specify if a broker in “LiveBrokers”
is KRaft or ZK." Does that require any protocol changes to UpdateMetadata?

Thanks,

Jun

On Wed, Oct 5, 2022 at 10:07 AM Mickael Maison 
wrote:

> Hi David,
>
> Thanks for starting this important KIP.
>
> I've just taken a quick look so far but I've got a couple of initial
> questions:
>
> 1) What happens if a non KRaft compatible broker (or with
> kafka.metadata.migration.enable set to false) joins the cluster after
> the migration is triggered?
>
> 2) In the Failure Modes section you mention a scenario where a write
> to ZK fails. What happens when the divergence limit is reached? Is
> this a fatal condition? How much divergence should we allow?
>
> Thanks,
> Mickael
>
> On Wed, Oct 5, 2022 at 12:20 AM David Arthur  wrote:
> >
> > Hey folks, I wanted to get the ball rolling on the discussion for the
> > ZooKeeper migration KIP. This KIP details how we plan to do an online
> > migration of metadata from ZooKeeper to KRaft as well as a rolling
> > upgrade of brokers to KRaft mode.
> >
> > The general idea is to keep KRaft and ZooKeeper in sync during the
> > migration, so both types of brokers can exist simultaneously. Then,
> > once everything is migrated and updated, we can turn off ZooKeeper
> > writes.
> >
> > This is a pretty complex KIP, so please take a look :)
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> >
> > Thanks!
> > David
>


Re: [DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-05 Thread Mickael Maison
Hi David,

Thanks for starting this important KIP.

I've just taken a quick look so far but I've got a couple of initial questions:

1) What happens if a non KRaft compatible broker (or with
kafka.metadata.migration.enable set to false) joins the cluster after
the migration is triggered?

2) In the Failure Modes section you mention a scenario where a write
to ZK fails. What happens when the divergence limit is reached? Is
this a fatal condition? How much divergence should we allow?

Thanks,
Mickael

On Wed, Oct 5, 2022 at 12:20 AM David Arthur  wrote:
>
> Hey folks, I wanted to get the ball rolling on the discussion for the
> ZooKeeper migration KIP. This KIP details how we plan to do an online
> migration of metadata from ZooKeeper to KRaft as well as a rolling
> upgrade of brokers to KRaft mode.
>
> The general idea is to keep KRaft and ZooKeeper in sync during the
> migration, so both types of brokers can exist simultaneously. Then,
> once everything is migrated and updated, we can turn off ZooKeeper
> writes.
>
> This is a pretty complex KIP, so please take a look :)
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
>
> Thanks!
> David


[DISCUSS] KIP-866 ZooKeeper to KRaft Migration

2022-10-04 Thread David Arthur
Hey folks, I wanted to get the ball rolling on the discussion for the
ZooKeeper migration KIP. This KIP details how we plan to do an online
migration of metadata from ZooKeeper to KRaft as well as a rolling
upgrade of brokers to KRaft mode.

The general idea is to keep KRaft and ZooKeeper in sync during the
migration, so both types of brokers can exist simultaneously. Then,
once everything is migrated and updated, we can turn off ZooKeeper
writes.

This is a pretty complex KIP, so please take a look :)

https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration

Thanks!
David