[jira] [Assigned] (KAFKA-16516) Fix the controller node provider for broker to control channel
[ https://issues.apache.org/jira/browse/KAFKA-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe reassigned KAFKA-16516:
    Assignee: Colin McCabe (was: José Armando García Sancio)

Key: KAFKA-16516
URL: https://issues.apache.org/jira/browse/KAFKA-16516
Project: Kafka
Issue Type: Sub-task
Components: core
Reporter: José Armando García Sancio
Assignee: Colin McCabe
Priority: Major
Fix For: 3.8.0

The broker-to-controller channel gets the set of voters directly from the static configuration. This needs to change so that the leader nodes come from the KRaft client/manager. The relevant code is in KafkaServer, where it constructs the RaftControllerNodeProvider.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16515) Fix the ZK Metadata cache use of voter static configuration
[ https://issues.apache.org/jira/browse/KAFKA-16515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe reassigned KAFKA-16515:
    Assignee: Colin McCabe (was: José Armando García Sancio)

Key: KAFKA-16515
URL: https://issues.apache.org/jira/browse/KAFKA-16515
Project: Kafka
Issue Type: Sub-task
Components: core
Reporter: José Armando García Sancio
Assignee: Colin McCabe
Priority: Major
Fix For: 3.8.0

As part of the ZK-to-KRaft migration, the ZK metadata cache was changed to read the static voter configuration. This needs to change to use the voter nodes reported by the Raft manager or the KRaft client. The injection code is in KafkaServer, where it constructs MetadataCache.zkMetadata.
[jira] [Assigned] (KAFKA-16469) Metadata Schema Checker
[ https://issues.apache.org/jira/browse/KAFKA-16469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe reassigned KAFKA-16469:
    Assignee: Colin McCabe

Key: KAFKA-16469
URL: https://issues.apache.org/jira/browse/KAFKA-16469
Project: Kafka
Issue Type: New Feature
Reporter: Colin McCabe
Assignee: Colin McCabe
Priority: Major
[jira] [Created] (KAFKA-16757) Fix broker re-registration issues around MV 3.7-IV2
Colin McCabe created KAFKA-16757:

Summary: Fix broker re-registration issues around MV 3.7-IV2
Key: KAFKA-16757
URL: https://issues.apache.org/jira/browse/KAFKA-16757
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe

When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend the broker registration so that the controller can record the storage directories. The current code for doing this has several problems, however. One is that it tends to trigger even in cases where we don't actually need it. Another is that when re-registering the broker, the broker is marked as fenced.

This PR moves the handling of the re-registration case out of BrokerMetadataPublisher and into BrokerRegistrationTracker. The re-registration code there will only trigger in the case where the broker sees an existing registration for itself with no directories set. This is much more targeted than the original code.

Additionally, in ClusterControlManager, when re-registering the same broker, we now preserve its fencing and shutdown state, rather than clearing them. (There isn't any good reason re-registering the same broker should clear these things; this was purely an oversight.) Note that we can tell the broker is "the same" because it has the same IncarnationId.
[jira] [Updated] (KAFKA-16649) Remove lock from DynamicBrokerConfig.removeReconfigurable
[ https://issues.apache.org/jira/browse/KAFKA-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16649:
    Summary: Remove lock from DynamicBrokerConfig.removeReconfigurable (was: Fix potential deadlock in DynamicBrokerConfig)

Key: KAFKA-16649
URL: https://issues.apache.org/jira/browse/KAFKA-16649
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Priority: Major
[jira] [Updated] (KAFKA-16649) Remove lock from DynamicBrokerConfig.removeReconfigurable
[ https://issues.apache.org/jira/browse/KAFKA-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16649:
    Description: Do not acquire the DynamicBrokerConfig lock in DynamicBrokerConfig.removeReconfigurable. It's not necessary, because the list that these functions are modifying is a thread-safe CopyOnWriteArrayList.

Key: KAFKA-16649
URL: https://issues.apache.org/jira/browse/KAFKA-16649
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Priority: Major
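[Editor's note: not the actual Kafka code. A minimal Java sketch, with a hypothetical ReconfigurableRegistry class, of why the description says no external lock is needed: CopyOnWriteArrayList copies its backing array on every mutation, so concurrent add/remove/iterate are safe without synchronization.]

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for DynamicBrokerConfig's reconfigurable list.
public class ReconfigurableRegistry {
    // CopyOnWriteArrayList snapshots the backing array on each mutation,
    // so readers iterate a stable snapshot and writers need no external lock.
    private final List<String> reconfigurables = new CopyOnWriteArrayList<>();

    public void addReconfigurable(String r) {
        reconfigurables.add(r);    // thread-safe without synchronized
    }

    public void removeReconfigurable(String r) {
        reconfigurables.remove(r); // also thread-safe; no lock needed
    }

    public int size() {
        return reconfigurables.size();
    }
}
```

The trade-off of copy-on-write is that mutations are O(n), which is acceptable here because reconfigurables are added and removed rarely compared to how often they are read.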
[jira] [Created] (KAFKA-16649) Fix potential deadlock in DynamicBrokerConfig
Colin McCabe created KAFKA-16649:

Summary: Fix potential deadlock in DynamicBrokerConfig
Key: KAFKA-16649
URL: https://issues.apache.org/jira/browse/KAFKA-16649
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
[jira] [Created] (KAFKA-16624) Don't generate useless PartitionChangeRecord on older MV
Colin McCabe created KAFKA-16624:

Summary: Don't generate useless PartitionChangeRecord on older MV
Key: KAFKA-16624
URL: https://issues.apache.org/jira/browse/KAFKA-16624
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe

Fix a case where we could generate useless PartitionChangeRecords on metadata versions older than 3.6-IV0. This could happen when the ISR contained only one broker and we were trying to shrink it to a fully empty ISR. In this case, PartitionChangeBuilder would prevent the ISR from shrinking to empty (since that is not valid in these pre-KIP-966 metadata versions), but it would still emit the record, even though it had no effect.
[jira] [Updated] (KAFKA-16003) The znode /config/topics is not updated during KRaft migration in "dual-write" mode
[ https://issues.apache.org/jira/browse/KAFKA-16003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16003:
    Fix Version/s: 3.7.1

Key: KAFKA-16003
URL: https://issues.apache.org/jira/browse/KAFKA-16003
Project: Kafka
Issue Type: Bug
Components: controller
Affects Versions: 3.6.1
Reporter: Paolo Patierno
Assignee: Mickael Maison
Priority: Major
Fix For: 3.8.0, 3.7.1

I tried the following scenario ...

I have a ZooKeeper-based cluster and create a my-topic-1 topic (without specifying any specific configuration for it). The correct znodes are created under /config/topics and /brokers/topics.

I start a migration to KRaft but don't move forward from "dual write" mode. While in this mode, I create a new my-topic-2 topic (still without any specific config). I see that a new znode is created under /brokers/topics but NOT under /config/topics. It seems that the KRaft controller is not updating this information in ZooKeeper during the dual write. The controller log shows that the write to ZooKeeper was done, but not all of it, I would say:

{code:java}
2023-12-13 10:23:26,229 TRACE [KRaftMigrationDriver id=3] Create Topic my-topic-2, ID Macbp8BvQUKpzmq2vG_8dA. Transitioned migration state from ZkMigrationLeadershipState{kraftControllerId=3, kraftControllerEpoch=7, kraftMetadataOffset=445, kraftMetadataEpoch=7, lastUpdatedTimeMs=1702462785587, migrationZkVersion=236, controllerZkEpoch=3, controllerZkVersion=3} to ZkMigrationLeadershipState{kraftControllerId=3, kraftControllerEpoch=7, kraftMetadataOffset=445, kraftMetadataEpoch=7, lastUpdatedTimeMs=1702462785587, migrationZkVersion=237, controllerZkEpoch=3, controllerZkVersion=3} (org.apache.kafka.metadata.migration.KRaftMigrationDriver) [controller-3-migration-driver-event-handler]
2023-12-13 10:23:26,229 DEBUG [KRaftMigrationDriver id=3] Made the following ZK writes when handling KRaft delta: {CreateTopic=1} (org.apache.kafka.metadata.migration.KRaftMigrationDriver) [controller-3-migration-driver-event-handler]
{code}
[jira] [Assigned] (KAFKA-16509) CurrentControllerId metric is unreliable in ZK mode
[ https://issues.apache.org/jira/browse/KAFKA-16509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe reassigned KAFKA-16509:
    Assignee: Colin McCabe

Key: KAFKA-16509
URL: https://issues.apache.org/jira/browse/KAFKA-16509
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe
Priority: Major

The CurrentControllerId metric added by KIP-1001 is unreliable in ZK mode. Sometimes when there is no active ZK-based controller, it still shows the previous controller ID. Instead, it should show -1 in that situation.
[jira] [Created] (KAFKA-16509) CurrentControllerId metric is unreliable in ZK mode
Colin McCabe created KAFKA-16509:

Summary: CurrentControllerId metric is unreliable in ZK mode
Key: KAFKA-16509
URL: https://issues.apache.org/jira/browse/KAFKA-16509
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe

The CurrentControllerId metric added by KIP-1001 is unreliable in ZK mode. Sometimes when there is no active ZK-based controller, it still shows the previous controller ID. Instead, it should show -1 in that situation.
[jira] [Created] (KAFKA-16475) Create unit test for TopicImageNode
Colin McCabe created KAFKA-16475:

Summary: Create unit test for TopicImageNode
Key: KAFKA-16475
URL: https://issues.apache.org/jira/browse/KAFKA-16475
Project: Kafka
Issue Type: Improvement
Reporter: Colin McCabe
[jira] [Created] (KAFKA-16469) Metadata Schema Checker
Colin McCabe created KAFKA-16469:

Summary: Metadata Schema Checker
Key: KAFKA-16469
URL: https://issues.apache.org/jira/browse/KAFKA-16469
Project: Kafka
Issue Type: New Feature
Reporter: Colin McCabe
[jira] [Resolved] (KAFKA-16411) Correctly migrate default client quota entities in KRaft migration
[ https://issues.apache.org/jira/browse/KAFKA-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe resolved KAFKA-16411.
    Resolution: Fixed

Key: KAFKA-16411
URL: https://issues.apache.org/jira/browse/KAFKA-16411
Project: Kafka
Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Colin McCabe
Assignee: Colin McCabe
Priority: Blocker
Fix For: 3.6.2, 3.8.0, 3.7.1
[jira] [Resolved] (KAFKA-16428) Fix bug where config change notification znode may not get created during migration
[ https://issues.apache.org/jira/browse/KAFKA-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe resolved KAFKA-16428.
    Resolution: Fixed

Key: KAFKA-16428
URL: https://issues.apache.org/jira/browse/KAFKA-16428
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0, 3.6.1
Reporter: Colin McCabe
Assignee: Colin McCabe
Priority: Major
Fix For: 3.6.2, 3.8.0, 3.7.1
[jira] [Created] (KAFKA-16435) Add test for KAFKA-16428
Colin McCabe created KAFKA-16435:

Summary: Add test for KAFKA-16428
Key: KAFKA-16435
URL: https://issues.apache.org/jira/browse/KAFKA-16435
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe

Add a test for KAFKA-16428 (Fix bug where config change notification znode may not get created during migration, #15608).
[jira] [Created] (KAFKA-16428) Fix bug where config change notification znode may not get created during migration
Colin McCabe created KAFKA-16428:

Summary: Fix bug where config change notification znode may not get created during migration
Key: KAFKA-16428
URL: https://issues.apache.org/jira/browse/KAFKA-16428
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe
[jira] [Updated] (KAFKA-16411) Correctly migrate default client quota entities in KRaft migration
[ https://issues.apache.org/jira/browse/KAFKA-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16411:
    Fix Version/s: 3.6.2
    Affects Version/s: 3.4.0
    Priority: Blocker (was: Major)

Key: KAFKA-16411
URL: https://issues.apache.org/jira/browse/KAFKA-16411
Project: Kafka
Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Colin McCabe
Assignee: Colin McCabe
Priority: Blocker
Fix For: 3.6.2
[jira] [Updated] (KAFKA-16411) Correctly migrate default client quota entities in KRaft migration
[ https://issues.apache.org/jira/browse/KAFKA-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16411:
    Summary: Correctly migrate default client quota entities in KRaft migration (was: Correctly migrate default entities in KRaft migration)

Key: KAFKA-16411
URL: https://issues.apache.org/jira/browse/KAFKA-16411
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe
Priority: Major
[jira] [Created] (KAFKA-16411) Correctly migrate default entities in KRaft migration
Colin McCabe created KAFKA-16411:

Summary: Correctly migrate default entities in KRaft migration
Key: KAFKA-16411
URL: https://issues.apache.org/jira/browse/KAFKA-16411
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe
[jira] [Updated] (KAFKA-16222) KRaft Migration: desanitize entity name when migrate client quotas
[ https://issues.apache.org/jira/browse/KAFKA-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16222:
    Summary: KRaft Migration: desanitize entity name when migrating client quotas (was: KRaft Migration: Incorrect default user-principal quota after migration)

Key: KAFKA-16222
URL: https://issues.apache.org/jira/browse/KAFKA-16222
Project: Kafka
Issue Type: Bug
Components: kraft, migration
Affects Versions: 3.7.0, 3.6.1
Reporter: Dominik
Assignee: PoAn Yang
Priority: Blocker
Fix For: 3.6.2, 3.8.0, 3.7.1

We observed that our default user quota seems not to be migrated correctly.

Before migration:
bin/kafka-configs.sh --describe --all --entity-type users
Quota configs for the default user-principal are consumer_byte_rate=100.0, producer_byte_rate=100.0
Quota configs for user-principal 'myuser@prod' are consumer_byte_rate=1.5E8, producer_byte_rate=1.5E8

After migration:
bin/kafka-configs.sh --describe --all --entity-type users
Quota configs for user-principal '' are consumer_byte_rate=100.0, producer_byte_rate=100.0
Quota configs for user-principal 'myuser%40prod' are consumer_byte_rate=1.5E8, producer_byte_rate=1.5E8

Additional finding: our user names contain a "@", which also leads to an incorrect state after migration.
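[Editor's note: for context, ZooKeeper-safe entity names store special characters percent-encoded, which is why '@' shows up as '%40' in the report above. A minimal Java sketch of the desanitization step the fix title refers to, using java.net.URLDecoder as a stand-in rather than Kafka's own sanitizer utility; the method name is illustrative.]

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DesanitizeDemo {
    // Reverse the percent-encoding used for ZooKeeper-safe entity names,
    // so "myuser%40prod" round-trips back to "myuser@prod".
    // Note: URLDecoder also maps '+' to a space, which a real sanitizer
    // for this purpose may treat differently.
    static String desanitize(String sanitized) {
        return URLDecoder.decode(sanitized, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(desanitize("myuser%40prod")); // prints myuser@prod
    }
}
```

The bug report is consistent with the migration copying the sanitized (encoded) name verbatim into KRaft instead of decoding it first.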
[jira] [Created] (KAFKA-16321) Default directory ids to MIGRATING, not UNASSIGNED
Colin McCabe created KAFKA-16321:

Summary: Default directory ids to MIGRATING, not UNASSIGNED
Key: KAFKA-16321
URL: https://issues.apache.org/jira/browse/KAFKA-16321
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe

Directory ids should be defaulted to MIGRATING, not UNASSIGNED.
[jira] [Resolved] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration
[ https://issues.apache.org/jira/browse/KAFKA-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe resolved KAFKA-16216.
    Fix Version/s: 3.7.0
    Reviewer: Colin McCabe
    Assignee: David Arthur (was: Colin McCabe)
    Resolution: Fixed

Key: KAFKA-16216
URL: https://issues.apache.org/jira/browse/KAFKA-16216
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: David Arthur
Priority: Major
Fix For: 3.7.0
[jira] [Assigned] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration
[ https://issues.apache.org/jira/browse/KAFKA-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe reassigned KAFKA-16216:
    Assignee: Colin McCabe (was: David Arthur)

Key: KAFKA-16216
URL: https://issues.apache.org/jira/browse/KAFKA-16216
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe
Priority: Major
[jira] [Created] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration
Colin McCabe created KAFKA-16216:

Summary: Reduce batch size for initial metadata load during ZK migration
Key: KAFKA-16216
URL: https://issues.apache.org/jira/browse/KAFKA-16216
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: David Arthur
[jira] [Updated] (KAFKA-16180) Full metadata request sometimes fails during zk migration
[ https://issues.apache.org/jira/browse/KAFKA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16180:
    Description:

Example:

{code}
java.util.NoSuchElementException: dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
	at scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
	at scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
	at scala.collection.mutable.AnyRefMap.apply(AnyRefMap.scala:207)
	at kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2(ZkMetadataCache.scala:112)
	at kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2$adapted(ZkMetadataCache.scala:105)
	at scala.collection.immutable.HashSet.foreach(HashSet.scala:958)
	at kafka.server.metadata.ZkMetadataCache$.maybeInjectDeletedPartitionsFromFullMetadataRequest(ZkMetadataCache.scala:105)
	at kafka.server.metadata.ZkMetadataCache.$anonfun$updateMetadata$1(ZkMetadataCache.scala:506)
	at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:183)
	at kafka.server.metadata.ZkMetadataCache.updateMetadata(ZkMetadataCache.scala:496)
	at kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:2482)
	at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:733)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:349)
	at kafka.server.KafkaRequestHandler.$anonfun$poll$8(KafkaRequestHandler.scala:210)
	at kafka.server.KafkaRequestHandler.$anonfun$poll$8$adapted(KafkaRequestHandler.scala:210)
	at io.confluent.kafka.availability.ThreadCountersManager.wrapEngine(ThreadCountersManager.java:146)
	at kafka.server.KafkaRequestHandler.poll(KafkaRequestHandler.scala:210)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:151)
	at java.base/java.lang.Thread.run(Thread.java:1583)
	at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66)
{code}

    (was: the same example trace, with the changelog topic named lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog)

Key: KAFKA-16180
URL: https://issues.apache.org/jira/browse/KAFKA-16180
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Priority: Blocker
[jira] [Updated] (KAFKA-16180) Full metadata request sometimes fails during zk migration
[ https://issues.apache.org/jira/browse/KAFKA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16180:
    Description: the example stack trace, previously pasted as inline {{...}} monospace text, is now wrapped in a {code} block:

{code}
java.util.NoSuchElementException: lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
	at scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
	[remaining frames identical to the KAFKA-16180 stack trace above]
{code}

Key: KAFKA-16180
URL: https://issues.apache.org/jira/browse/KAFKA-16180
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Priority: Blocker
[jira] [Created] (KAFKA-16180) Full metadata request sometimes fails during zk migration
Colin McCabe created KAFKA-16180:

Summary: Full metadata request sometimes fails during zk migration
Key: KAFKA-16180
URL: https://issues.apache.org/jira/browse/KAFKA-16180
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe

Example:

{code}
java.util.NoSuchElementException: lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
	at scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
	[remaining frames identical to the KAFKA-16180 stack trace above]
{code}
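[Editor's note: the trace shows Scala's AnyRefMap.apply throwing NoSuchElementException for a changelog topic that is missing from the map, i.e. a lookup that assumes the key exists. A minimal Java sketch of that failure pattern and a defensive alternative; the class, method, and map contents are illustrative, not the Kafka code.]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class TopicLookupDemo {
    // Mimics AnyRefMap.apply: throws if the topic was never registered.
    static int unsafePartitionCount(Map<String, Integer> topics, String topic) {
        Integer count = topics.get(topic);
        if (count == null) {
            throw new NoSuchElementException(topic);
        }
        return count;
    }

    // Defensive variant: tolerate topics that are absent from the cache
    // (e.g. deleted concurrently during migration) instead of failing
    // the whole metadata request.
    static int safePartitionCount(Map<String, Integer> topics, String topic) {
        return topics.getOrDefault(topic, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = new HashMap<>();
        topics.put("orders", 3);
        System.out.println(safePartitionCount(topics, "missing-changelog")); // prints 0
    }
}
```

Whether returning a default or skipping the entry is the right fix depends on the caller's invariants; the sketch only shows why the unguarded lookup can take down an entire UpdateMetadata request.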
[jira] [Updated] (KAFKA-16078) InterBrokerProtocolVersion defaults to non-production MetadataVersion
[ https://issues.apache.org/jira/browse/KAFKA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16078:
    Description: The InterBrokerProtocolVersion currently defaults to a non-production MetadataVersion. We should be more consistent about getting the latest MetadataVersion. (was: Be more consistent about getting the latest MetadataVersion)

Key: KAFKA-16078
URL: https://issues.apache.org/jira/browse/KAFKA-16078
Project: Kafka
Issue Type: Bug
Reporter: David Arthur
Assignee: David Arthur
Priority: Major
[jira] [Resolved] (KAFKA-16078) Be more consistent about getting the latest MetadataVersion
[ https://issues.apache.org/jira/browse/KAFKA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe resolved KAFKA-16078.
    Fix Version/s: 3.7.0
    Reviewer: Colin Patrick McCabe
    Resolution: Fixed

Key: KAFKA-16078
URL: https://issues.apache.org/jira/browse/KAFKA-16078
Project: Kafka
Issue Type: Bug
Reporter: David Arthur
Assignee: David Arthur
Priority: Major
Fix For: 3.7.0

The InterBrokerProtocolVersion currently defaults to a non-production MetadataVersion. We should be more consistent about getting the latest MetadataVersion.
[jira] [Updated] (KAFKA-16078) Be more consistent about getting the latest MetadataVersion
[ https://issues.apache.org/jira/browse/KAFKA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin McCabe updated KAFKA-16078:
    Summary: Be more consistent about getting the latest MetadataVersion (was: InterBrokerProtocolVersion defaults to non-production MetadataVersion)

Key: KAFKA-16078
URL: https://issues.apache.org/jira/browse/KAFKA-16078
Project: Kafka
Issue Type: Bug
Reporter: David Arthur
Assignee: David Arthur
Priority: Major
[jira] [Updated] (KAFKA-16078) InterBrokerProtocolVersion defaults to non-production MetadataVersion
[ https://issues.apache.org/jira/browse/KAFKA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-16078: - Description: Be more consistent about getting the latest MetadataVersion > InterBrokerProtocolVersion defaults to non-production MetadataVersion > - > > Key: KAFKA-16078 > URL: https://issues.apache.org/jira/browse/KAFKA-16078 > Project: Kafka > Issue Type: Bug >Reporter: David Arthur >Assignee: David Arthur >Priority: Major > > Be more consistent about getting the latest MetadataVersion -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16131) Repeated UnsupportedVersionException logged when running Kafka 3.7.0-RC2 KRaft cluster with metadata version 3.6
[ https://issues.apache.org/jira/browse/KAFKA-16131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-16131. -- Resolution: Fixed > Repeated UnsupportedVersionException logged when running Kafka 3.7.0-RC2 > KRaft cluster with metadata version 3.6 > > > Key: KAFKA-16131 > URL: https://issues.apache.org/jira/browse/KAFKA-16131 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Jakub Scholz >Assignee: Proven Provenzano >Priority: Blocker > Fix For: 3.7.0 > > > When running Kafka 3.7.0-RC2 as a KRaft cluster with metadata version set to > 3.6-IV2 metadata version, it throws repeated errors like this in the > controller logs: > {quote}2024-01-13 16:58:01,197 INFO [QuorumController id=0] > assignReplicasToDirs: event failed with UnsupportedVersionException in 15 > microseconds. (org.apache.kafka.controller.QuorumController) > [quorum-controller-0-event-handler] > 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error > handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, apiVersion=0, > clientId=1000, correlationId=14, headerVersion=2) – > AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5, > directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ, > topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ, > partitions=[PartitionData(partitionIndex=2), > PartitionData(partitionIndex=1)]), TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ, > partitions=[PartitionData(partitionIndex=0)])])]) with context > RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2), > connectionId='172.16.14.219:9090-172.16.14.217:53590-7', > clientAddress=/[172.16.14.217|http://172.16.14.217/], > principal=User:CN=my-cluster-kafka,O=io.strimzi, > listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL, > clientInformation=ClientInformation(softwareName=apache-kafka-java, > softwareVersion=3.7.0), 
fromPrivilegedListener=false, > principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2]) > (kafka.server.ControllerApis) [quorum-controller-0-event-handler] > java.util.concurrent.CompletionException: > org.apache.kafka.common.errors.UnsupportedVersionException: Directory > assignment is not supported yet. > at > java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) > at > java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181) > at java.base/java.lang.Thread.run(Thread.java:840) > Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: > Directory assignment is not supported yet. > {quote} > > With the metadata version set to 3.6-IV2, it makes sense that the request is > not supported. But the request should in such case not be sent at all. -- This message was sent by Atlassian Jira (v8.20.10#820010)
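The fix direction implied by the report above is to gate the request on the active metadata version instead of sending it and logging an error. A minimal sketch, in which the class name and the feature-level constant are assumptions for illustration:

```java
// Hedged sketch: check whether the active metadata version supports directory
// assignment before sending AssignReplicasToDirs, rather than letting the
// controller reject it with UnsupportedVersionException on every attempt.
class DirAssignmentGate {
    // Assumed feature level at which ASSIGN_REPLICAS_TO_DIRS becomes supported
    // (the 3.7 JBOD metadata version); the real constant lives in MetadataVersion.
    static final int DIR_ASSIGNMENT_MIN_FEATURE_LEVEL = 19;

    static boolean shouldSendAssignReplicasToDirs(int activeFeatureLevel) {
        return activeFeatureLevel >= DIR_ASSIGNMENT_MIN_FEATURE_LEVEL;
    }
}
```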
[jira] [Updated] (KAFKA-16101) KRaft migration rollback documentation is incorrect
[ https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-16101: - Summary: KRaft migration rollback documentation is incorrect (was: KRaft migration documentation is incorrect) > KRaft migration rollback documentation is incorrect > --- > > Key: KAFKA-16101 > URL: https://issues.apache.org/jira/browse/KAFKA-16101 > Project: Kafka > Issue Type: Bug > Components: kraft >Affects Versions: 3.6.1 >Reporter: Paolo Patierno >Assignee: Colin McCabe >Priority: Blocker > Fix For: 3.7.0 > > > Hello, > I was trying the KRaft migration rollback procedure locally and I came across a potential bug, or at least a situation where the cluster is not usable/available for a certain amount of time. > To test the procedure, I start with a cluster of one broker (broker ID = 0) and one ZooKeeper node. Then I start the migration with one KRaft controller node (broker ID = 1). The migration runs fine and reaches the "dual write" state. > From this point, I try to run the rollback procedure as described in the documentation. > As a first step, this involves ... 
> * stopping the broker > * removing the __cluster_metadata folder > * removing the ZooKeeper migration flag and controller-related configuration from the broker > * restarting the broker > With the above steps done, the broker starts in ZooKeeper mode (no migration, no knowledge of KRaft controllers) and keeps logging the following messages at DEBUG level: > {code:java} > [2024-01-08 11:51:20,608] DEBUG > [zk-broker-0-to-controller-forwarding-channel-manager]: Controller isn't > cached, looking for local metadata changes > (kafka.server.BrokerToControllerRequestThread) > [2024-01-08 11:51:20,608] DEBUG > [zk-broker-0-to-controller-forwarding-channel-manager]: No controller > provided, retrying after backoff > (kafka.server.BrokerToControllerRequestThread) > [2024-01-08 11:51:20,629] DEBUG > [zk-broker-0-to-controller-alter-partition-channel-manager]: Controller isn't > cached, looking for local metadata changes > (kafka.server.BrokerToControllerRequestThread) > [2024-01-08 11:51:20,629] DEBUG > [zk-broker-0-to-controller-alter-partition-channel-manager]: No controller > provided, retrying after backoff > (kafka.server.BrokerToControllerRequestThread) {code} > What's happening should be clear. > The /controller znode in ZooKeeper still reports the KRaft controller (broker ID = 1) as the controller. The broker reads it from the znode but doesn't know how to reach that node. > The issue is that until the procedure is fully completed with the next steps (shutting down the KRaft controller, deleting the /controller znode), the cluster is unusable. Any admin or client operation against the broker just hangs; the broker doesn't reply. > Extending this scenario to a more complex one, with 10-20-50 brokers and partition replicas spread across them, as the brokers are rolled one by one (in ZK mode) and start reporting the above error, the topics become unavailable one after the other, until all brokers are in this state and nothing works. 
This is because, from the perspective of the still-running KRaft controller, the brokers are no longer available and the partition replicas are out of sync. > Of course, as soon as you complete the rollback procedure by deleting the /controller znode, the brokers are able to elect a new controller among themselves and everything recovers. > My first question: isn't the cluster supposed to remain available during the rollback, even while the procedure is not yet completed? Or is cluster unavailability an accepted assumption during the rollback, until it is fully completed? > This unavailability window could be reduced by deleting the /controller znode before shutting down the KRaft controllers, allowing the brokers to elect a new controller among themselves; but in that case, could there be a race condition where still-running KRaft controllers steal leadership again? > Or is something missing in the documentation that is driving this problem? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16132) Upgrading from 3.6 to 3.7 in KRaft will have seconds of partitions unavailable
[ https://issues.apache.org/jira/browse/KAFKA-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807044#comment-17807044 ] Colin McCabe commented on KAFKA-16132: -- I think we need to look at this more, but it may be a blocker. > Upgrading from 3.6 to 3.7 in KRaft will have seconds of partitions unavailable > -- > > Key: KAFKA-16132 > URL: https://issues.apache.org/jira/browse/KAFKA-16132 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Luke Chen >Priority: Major > > When upgrading from 3.6 to 3.7, we noticed that after upgrading the metadata version, all partitions are reset at once, which causes a short period of unavailability. This didn't happen before. > {code:java} > [2024-01-15 20:45:19,757] INFO [BrokerMetadataPublisher id=2] Updating > metadata.version to 19 at offset OffsetAndEpoch(offset=229, epoch=2). > (kafka.server.metadata.BrokerMetadataPublisher) > [2024-01-15 20:45:29,915] INFO [ReplicaFetcherManager on broker 2] Removed > fetcher for partitions Set(t1-29, t1-25, t1-21, t1-17, t1-46, t1-13, t1-42, > t1-9, t1-38, t1-5, t1-34, t1-1, t1-30, t1-26, t1-22, t1-18, t1-47, t1-14, > t1-43, t1-10, t1-39, t1-6, t1-35, t1-2, t1-31, t1-27, t1-23, t1-19, t1-48, > t1-15, t1-44, t1-11, t1-40, t1-7, t1-36, t1-3, t1-32, t1-28, t1-24, t1-20, > t1-49, t1-16, t1-45, t1-12, t1-41, t1-8, t1-37, t1-4, t1-33, t1-0) > (kafka.server.ReplicaFetcherManager) > {code} > Complete log: > https://gist.github.com/showuon/665aa3ce6afd59097a2662f8260ecc10 > Steps: > 1. start up a 3.6 kafka cluster in KRaft with 1 broker > 2. create a topic > 3. upgrade the binary to 3.7 > 4. use kafka-features.sh to upgrade to 3.7 metadata version > 5. check the log (and metrics if interested) > Analysis: > In 3.7, we have JBOD support in KRaft, so the partitionRegistration gained a > new directory field, which causes a diff to be found when comparing deltas. 
We might be able to recognize that this directory-only change doesn't require resetting the leader/follower state, and instead just update the metadata, to avoid the unavailability. -- This message was sent by Atlassian Jira (v8.20.10#820010)
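The proposed mitigation above can be sketched as a predicate over two partition registrations. The record shape and method name below are assumptions for illustration, not Kafka's actual PartitionRegistration API:

```java
import java.util.List;
import java.util.Objects;

// Hedged sketch: a change only in the directories field is a metadata-only update
// and should not trigger removing and re-adding replica fetchers.
record PartitionRegistration(List<Integer> replicas, int leader, int leaderEpoch,
                             List<String> directories) {
    /** True if the only difference from {@code other} is the directories field. */
    boolean isDirectoryOnlyChange(PartitionRegistration other) {
        return replicas().equals(other.replicas())
                && leader() == other.leader()
                && leaderEpoch() == other.leaderEpoch()
                && !Objects.equals(directories(), other.directories());
    }
}
```

A delta processor could use this predicate to skip the fetcher reset path for registrations that differ only in their assigned log directories.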
[jira] [Resolved] (KAFKA-16121) Partition reassignments in ZK migration dual write mode stalled until leader epoch incremented
[ https://issues.apache.org/jira/browse/KAFKA-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-16121. -- Fix Version/s: 3.7.0 Reviewer: Colin McCabe Assignee: David Mao Resolution: Duplicate > Partition reassignments in ZK migration dual write mode stalled until leader > epoch incremented > -- > > Key: KAFKA-16121 > URL: https://issues.apache.org/jira/browse/KAFKA-16121 > Project: Kafka > Issue Type: Bug >Reporter: David Mao >Assignee: David Mao >Priority: Major > Fix For: 3.7.0 > > > I noticed this in an integration test in > https://github.com/apache/kafka/pull/15184 > In ZK mode, partition leaders rely on the LeaderAndIsr request to be notified > of new replicas as part of a reassignment. In ZK mode, we ignore any > LeaderAndIsr request where the partition leader epoch is less than or equal > to the current partition leader epoch. > In KRaft mode, we do not bump the leader epoch when starting a new > reassignment, see: `triggerLeaderEpochBumpIfNeeded`. This means that the > leader will ignore the LISR request initiating the reassignment until a > leader epoch bump is triggered through another means, for instance preferred > leader election. -- This message was sent by Atlassian Jira (v8.20.10#820010)
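The ZK-mode guard described in the issue reduces to a one-line predicate. This is an assumed shape for illustration, not the actual Kafka code:

```java
// Sketch: in ZK mode a LeaderAndIsr request is processed only when it carries a
// strictly newer leader epoch; requests with an equal or older epoch are ignored
// as stale. A KRaft-driven reassignment that never bumps the epoch therefore
// stalls until something else (e.g. preferred leader election) increments it.
class LeaderEpochGuard {
    static boolean shouldProcessLeaderAndIsr(int requestLeaderEpoch, int currentLeaderEpoch) {
        return requestLeaderEpoch > currentLeaderEpoch;
    }
}
```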
[jira] [Updated] (KAFKA-16120) Fix partition reassignment during ZK migration
[ https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-16120: - Description: When a reassignment is completed in ZK migration hybrid mode, the `StopReplica` sent by the kraft quorum migration propagator is sent with `delete = false` for deleted replicas when processing the topic delta. This results in stray replicas. (was: When a reassignment is completed in ZK migration dual-write mode, the `StopReplica` sent by the kraft quorum migration propagator is sent with `delete = false` for deleted replicas when processing the topic delta. This results in stray replicas.) > Fix partition reassignment during ZK migration > -- > > Key: KAFKA-16120 > URL: https://issues.apache.org/jira/browse/KAFKA-16120 > Project: Kafka > Issue Type: Bug >Reporter: David Mao >Assignee: David Mao >Priority: Major > Fix For: 3.7.0 > > > When a reassignment is completed in ZK migration hybrid mode, the > `StopReplica` sent by the kraft quorum migration propagator is sent with > `delete = false` for deleted replicas when processing the topic delta. This > results in stray replicas. -- This message was sent by Atlassian Jira (v8.20.10#820010)
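The fix shape for the stray-replica bug can be sketched as computing the StopReplica delete flag from the topic delta. The class and method names are assumptions for illustration:

```java
import java.util.List;

// Hedged sketch: a replica that the topic delta removes from this broker must be
// stopped with delete=true so its log does not linger as a stray replica;
// delete=false is only appropriate when the replica should merely stop fetching.
class StopReplicaPlanner {
    static boolean deleteFlagFor(int brokerId, List<Integer> oldReplicas, List<Integer> newReplicas) {
        return oldReplicas.contains(brokerId) && !newReplicas.contains(brokerId);
    }
}
```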
[jira] [Updated] (KAFKA-16120) Fix partition reassignment during ZK migration
[ https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-16120: - Summary: Fix partition reassignment during ZK migration (was: Partition reassignments in ZK migration dual write leaves stray partitions) > Fix partition reassignment during ZK migration > -- > > Key: KAFKA-16120 > URL: https://issues.apache.org/jira/browse/KAFKA-16120 > Project: Kafka > Issue Type: Bug >Reporter: David Mao >Assignee: David Mao >Priority: Major > Fix For: 3.7.0 > > > When a reassignment is completed in ZK migration dual-write mode, the > `StopReplica` sent by the kraft quorum migration propagator is sent with > `delete = false` for deleted replicas when processing the topic delta. This > results in stray replicas. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16101) KRaft migration documentation is incorrect
[ https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe reassigned KAFKA-16101: Fix Version/s: 3.7.0 Assignee: Colin McCabe Priority: Blocker (was: Major) > KRaft migration documentation is incorrect > -- > > Key: KAFKA-16101 > URL: https://issues.apache.org/jira/browse/KAFKA-16101 > Project: Kafka > Issue Type: Bug > Components: kraft >Affects Versions: 3.6.1 >Reporter: Paolo Patierno >Assignee: Colin McCabe >Priority: Blocker > Fix For: 3.7.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16101) KRaft migration documentation is incorrect
[ https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806678#comment-17806678 ] Colin McCabe commented on KAFKA-16101: -- Hi Luke, Thanks for testing rollback. I think this is a case where the documentation is wrong. The intention was for the steps to basically be: 1. roll all the brokers into zk mode, but with migration enabled 2. take down the kraft quorum 3. rmr /controller, allowing a hybrid broker to take over. 4. roll all the brokers into zk mode without migration enabled (if desired) With these steps, there isn't really unavailability since a ZK controller can be elected quickly after the kraft quorum is gone. I will update the docs. > KRaft migration documentation is incorrect > -- > > Key: KAFKA-16101 > URL: https://issues.apache.org/jira/browse/KAFKA-16101 > Project: Kafka > Issue Type: Bug > Components: kraft >Affects Versions: 3.6.1 >Reporter: Paolo Patierno >Assignee: Colin McCabe >Priority: Blocker > Fix For: 3.7.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
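The corrected rollback order can be sketched as an operator runbook. This is a hedged sketch: the hostname, port, and paths are assumptions for illustration, and the exact configuration keys should be checked against the migration documentation:

```shell
# 1. Roll every broker back into ZK mode, keeping the migration flag
#    (zookeeper.metadata.migration.enable=true) and controller quorum config in place.
# 2. Shut down all KRaft controllers.
# 3. Delete the /controller znode so one of the hybrid ZK brokers can win the election:
bin/zookeeper-shell.sh localhost:2181 deleteall /controller
# 4. If desired, roll the brokers one final time with the migration flag and
#    controller.quorum.voters configuration removed.
```

Deleting /controller only after the KRaft quorum is down avoids the race, noted in the issue, where a still-running KRaft controller could reclaim leadership.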
[jira] [Updated] (KAFKA-16101) KRaft migration documentation is incorrect
[ https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-16101: - Summary: KRaft migration documentation is incorrect (was: Kafka cluster unavailable during KRaft migration rollback procedure) > KRaft migration documentation is incorrect > -- > > Key: KAFKA-16101 > URL: https://issues.apache.org/jira/browse/KAFKA-16101 > Project: Kafka > Issue Type: Bug > Components: kraft >Affects Versions: 3.6.1 >Reporter: Paolo Patierno >Priority: Major > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16120) Partition reassignments in ZK migration dual write leaves stray partitions
[ https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-16120. -- Fix Version/s: 3.7.0 Reviewer: Colin McCabe Assignee: David Mao Resolution: Fixed > Partition reassignments in ZK migration dual write leaves stray partitions > -- > > Key: KAFKA-16120 > URL: https://issues.apache.org/jira/browse/KAFKA-16120 > Project: Kafka > Issue Type: Bug >Reporter: David Mao >Assignee: David Mao >Priority: Major > Fix For: 3.7.0 > > > When a reassignment is completed in ZK migration dual-write mode, the > `StopReplica` sent by the kraft quorum migration propagator is sent with > `delete = false` for deleted replicas when processing the topic delta. This > results in stray replicas. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16120) Partition reassignments in ZK migration dual write leaves stray partitions
[ https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806565#comment-17806565 ] Colin McCabe commented on KAFKA-16120: -- KAFKA-14616 is a separate bug, unfortunately. I am working on a fix for that one > Partition reassignments in ZK migration dual write leaves stray partitions > -- > > Key: KAFKA-16120 > URL: https://issues.apache.org/jira/browse/KAFKA-16120 > Project: Kafka > Issue Type: Bug >Reporter: David Mao >Priority: Major > > When a reassignment is completed in ZK migration dual-write mode, the > `StopReplica` sent by the kraft quorum migration propagator is sent with > `delete = false` for deleted replicas when processing the topic delta. This > results in stray replicas. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16126) Kcontroller dynamic configurations may fail to apply at startup
Colin McCabe created KAFKA-16126: Summary: Kcontroller dynamic configurations may fail to apply at startup Key: KAFKA-16126 URL: https://issues.apache.org/jira/browse/KAFKA-16126 Project: Kafka Issue Type: Bug Affects Versions: 3.7.0 Reporter: Colin McCabe Assignee: Colin McCabe Some kcontroller dynamic configurations may fail to apply at startup. This happens because there is a race between registering the reconfigurables with the DynamicBrokerConfig class and receiving the first update from the metadata publisher. We can fix this by registering the reconfigurables first. This seems to have been introduced by the "MINOR: Install ControllerServer metadata publishers sooner" change. -- This message was sent by Atlassian Jira (v8.20.10#820010)
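The race above, and one way to close it, can be sketched in a few lines. All names here are assumptions; the sketch additionally replays the latest snapshot to late registrants, which is a more general fix than just reordering startup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hedged sketch: if the metadata publisher delivers the first dynamic-config
// snapshot before a reconfigurable is registered, that component never sees it.
// Registering first closes the race; replaying the last snapshot on registration
// makes the ordering irrelevant.
class DynamicConfigRegistry {
    private final List<Consumer<String>> reconfigurables = new ArrayList<>();
    private String lastSnapshot;

    synchronized void register(Consumer<String> reconfigurable) {
        reconfigurables.add(reconfigurable);
        if (lastSnapshot != null) {
            reconfigurable.accept(lastSnapshot); // catch up a late registrant
        }
    }

    synchronized void publish(String snapshot) {
        lastSnapshot = snapshot;
        reconfigurables.forEach(r -> r.accept(snapshot));
    }
}
```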
[jira] [Resolved] (KAFKA-16094) BrokerRegistrationRequest.logDirs field must be ignorable
[ https://issues.apache.org/jira/browse/KAFKA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-16094. -- Fix Version/s: 3.7.0 Resolution: Fixed > BrokerRegistrationRequest.logDirs field must be ignorable > - > > Key: KAFKA-16094 > URL: https://issues.apache.org/jira/browse/KAFKA-16094 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Colin McCabe >Assignee: Colin McCabe >Priority: Blocker > Fix For: 3.7.0 > > > 3.7 brokers must be able to register with 3.6 and earlier controllers. So > this means that the logDirs field must be ignorable (aka, not sent) if the > highest BrokerRegistrationRequest version we can negotiate is older than v2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16094) BrokerRegistrationRequest.logDirs field must be ignorable
[ https://issues.apache.org/jira/browse/KAFKA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-16094: - Summary: BrokerRegistrationRequest.logDirs field must be ignorable (was: 3.7 brokers must be able to register with 3.6 and earlier controllers) > BrokerRegistrationRequest.logDirs field must be ignorable > - > > Key: KAFKA-16094 > URL: https://issues.apache.org/jira/browse/KAFKA-16094 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Colin McCabe >Assignee: Colin McCabe >Priority: Blocker > > 3.7 brokers must be able to register with 3.6 and earlier controllers. So > this means that the logDirs field must be ignorable (aka, not sent) if the > highest BrokerRegistrationRequest version we can negotiate is older than v2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16094) 3.7 brokers must be able to register with 3.6 and earlier controllers
Colin McCabe created KAFKA-16094: Summary: 3.7 brokers must be able to register with 3.6 and earlier controllers Key: KAFKA-16094 URL: https://issues.apache.org/jira/browse/KAFKA-16094 Project: Kafka Issue Type: Bug Affects Versions: 3.7.0 Reporter: Colin McCabe Assignee: Colin McCabe 3.7 brokers must be able to register with 3.6 and earlier controllers. So this means that the logDirs field must be ignorable (aka, not sent) if the highest BrokerRegistrationRequest version we can negotiate is older than v2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
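For context, Kafka's RPC messages are defined in JSON schema files, and a field marked ignorable is silently dropped by the serializer when the negotiated version predates it, instead of raising an error. A sketch of what such a declaration looks like (the `about` text and exact attributes here are illustrative of the schema format, not copied from BrokerRegistrationRequest.json):

```json
{ "name": "LogDirs", "type": "[]uuid", "versions": "2+", "ignorable": true,
  "about": "Log directories configured in this broker which are available." }
```

With `"ignorable": true`, a 3.7 broker talking to a 3.6 controller (which only supports versions below v2) simply omits the field rather than failing registration.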
[jira] [Resolved] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft
[ https://issues.apache.org/jira/browse/KAFKA-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14127. -- Resolution: Fixed > KIP-858: Handle JBOD broker disk failure in KRaft > - > > Key: KAFKA-14127 > URL: https://issues.apache.org/jira/browse/KAFKA-14127 > Project: Kafka > Issue Type: Improvement > Components: jbod, kraft >Reporter: Igor Soarez >Assignee: Igor Soarez >Priority: Major > Labels: 4.0-blocker, kip-500, kraft > Fix For: 3.7.0 > > > Supporting configurations with multiple storage directories in KRaft mode -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15955) Migrating ZK brokers send dir assignments
[ https://issues.apache.org/jira/browse/KAFKA-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15955: - Parent: KAFKA-16061 Issue Type: Sub-task (was: Bug) > Migrating ZK brokers send dir assignments > - > > Key: KAFKA-15955 > URL: https://issues.apache.org/jira/browse/KAFKA-15955 > Project: Kafka > Issue Type: Sub-task >Reporter: Igor Soarez >Assignee: Proven Provenzano >Priority: Major > > Broker in ZooKeeper mode, while in migration mode, should start sending > directory assignments to the KRaft Controller using AssignmentsManager. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15955) Migrating ZK brokers send dir assignments
[ https://issues.apache.org/jira/browse/KAFKA-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15955: - Parent: (was: KAFKA-14127) Issue Type: Bug (was: Sub-task) > Migrating ZK brokers send dir assignments > - > > Key: KAFKA-15955 > URL: https://issues.apache.org/jira/browse/KAFKA-15955 > Project: Kafka > Issue Type: Bug >Reporter: Igor Soarez >Assignee: Proven Provenzano >Priority: Major > > Broker in ZooKeeper mode, while in migration mode, should start sending > directory assignments to the KRaft Controller using AssignmentsManager. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15650) Data-loss on leader shutdown right after partition creation?
[ https://issues.apache.org/jira/browse/KAFKA-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801078#comment-17801078 ] Colin McCabe commented on KAFKA-15650: -- Based on our follow-up discussions, this is not an issue because partitions initially are in state UNASSIGNED, and only later get a directory. (Unless there is only a single directory -- then the controller assigns.) > Data-loss on leader shutdown right after partition creation? > > > Key: KAFKA-15650 > URL: https://issues.apache.org/jira/browse/KAFKA-15650 > Project: Kafka > Issue Type: Sub-task >Reporter: Igor Soarez >Priority: Major > > As per KIP-858, when a replica is created, the broker selects a log directory > to host the replica and queues the propagation of the directory assignment to > the controller. The replica becomes immediately active, it isn't blocked > until the controller confirms the metadata change. If the replica is the > leader replica it can immediately start accepting writes. > Consider the following scenario: > # A partition is created in some selected log directory, and some produce > traffic is accepted > # Before the broker is able to notify the controller of the directory > assignment, the broker shuts down > # Upon coming back online, the broker has an offline directory, the same > directory which was chosen to host the replica > # The broker assumes leadership for the replica, but cannot find it in any > available directory and has no way of knowing it was already created because > the directory assignment is still missing > # The replica is created and the previously produced records are lost > Step 4. may seem unlikely due to ISR membership gating leadership, but even > assuming acks=all and replicas>1, if all other replicas are also offline the > broker may still gain leadership. Perhaps KIP-966 is relevant here. > We may need to delay new replica activation until the assignment is > propagated successfully. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15650) Data-loss on leader shutdown right after partition creation?
[ https://issues.apache.org/jira/browse/KAFKA-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15650. -- Resolution: Not A Problem > Data-loss on leader shutdown right after partition creation? > > > Key: KAFKA-15650 > URL: https://issues.apache.org/jira/browse/KAFKA-15650 > Project: Kafka > Issue Type: Sub-task >Reporter: Igor Soarez >Priority: Major > > As per KIP-858, when a replica is created, the broker selects a log directory > to host the replica and queues the propagation of the directory assignment to > the controller. The replica becomes immediately active, it isn't blocked > until the controller confirms the metadata change. If the replica is the > leader replica it can immediately start accepting writes. > Consider the following scenario: > # A partition is created in some selected log directory, and some produce > traffic is accepted > # Before the broker is able to notify the controller of the directory > assignment, the broker shuts down > # Upon coming back online, the broker has an offline directory, the same > directory which was chosen to host the replica > # The broker assumes leadership for the replica, but cannot find it in any > available directory and has no way of knowing it was already created because > the directory assignment is still missing > # The replica is created and the previously produced records are lost > Step 4. may seem unlikely due to ISR membership gating leadership, but even > assuming acks=all and replicas>1, if all other replicas are also offline the > broker may still gain leadership. Perhaps KIP-966 is relevant here. > We may need to delay new replica activation until the assignment is > propagated successfully. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15649) Handle directory failure timeout
[ https://issues.apache.org/jira/browse/KAFKA-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15649: - Parent: KAFKA-16061 Issue Type: Sub-task (was: Bug) > Handle directory failure timeout > - > > Key: KAFKA-15649 > URL: https://issues.apache.org/jira/browse/KAFKA-15649 > Project: Kafka > Issue Type: Sub-task >Reporter: Igor Soarez >Priority: Minor > > If a broker with an offline log directory continues to fail to notify the > controller of either: > * the fact that the directory is offline; or > * of any replica assignment into a failed directory > then the controller will not check if a leadership change is required, and > this may lead to partitions remaining indefinitely offline. > KIP-858 proposes that the broker should shut down after a configurable > timeout to force a leadership change. Alternatively, the broker could also > request to be fenced, as long as there's a path for it to later become > unfenced. > While this unavailability is possible in theory, in practice it's not easy to > entertain a scenario where a broker continues to appear as healthy before the > controller, but fails to send this information. So it's not clear if this is > a real problem. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15367) Test KRaft non-JBOD -> JBOD migration
[ https://issues.apache.org/jira/browse/KAFKA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15367: - Parent: KAFKA-16061 Issue Type: Sub-task (was: Bug) > Test KRaft non-JBOD -> JBOD migration > - > > Key: KAFKA-15367 > URL: https://issues.apache.org/jira/browse/KAFKA-15367 > Project: Kafka > Issue Type: Sub-task >Reporter: Igor Soarez >Priority: Major > > A cluster running in KRaft without JBOD should be able to transition into > JBOD mode without issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15649) Handle directory failure timeout
[ https://issues.apache.org/jira/browse/KAFKA-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15649: - Parent: (was: KAFKA-14127) Issue Type: Bug (was: Sub-task) > Handle directory failure timeout > - > > Key: KAFKA-15649 > URL: https://issues.apache.org/jira/browse/KAFKA-15649 > Project: Kafka > Issue Type: Bug >Reporter: Igor Soarez >Priority: Minor > > If a broker with an offline log directory continues to fail to notify the > controller of either: > * the fact that the directory is offline; or > * of any replica assignment into a failed directory > then the controller will not check if a leadership change is required, and > this may lead to partitions remaining indefinitely offline. > KIP-858 proposes that the broker should shut down after a configurable > timeout to force a leadership change. Alternatively, the broker could also > request to be fenced, as long as there's a path for it to later become > unfenced. > While this unavailability is possible in theory, in practice it's not easy to > entertain a scenario where a broker continues to appear as healthy before the > controller, but fails to send this information. So it's not clear if this is > a real problem. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15368) Test ZK JBOD to KRaft migration
[ https://issues.apache.org/jira/browse/KAFKA-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15368: - Parent: KAFKA-16061 Issue Type: Sub-task (was: Bug) > Test ZK JBOD to KRaft migration > --- > > Key: KAFKA-15368 > URL: https://issues.apache.org/jira/browse/KAFKA-15368 > Project: Kafka > Issue Type: Sub-task >Reporter: Igor Soarez >Priority: Major > > A ZK cluster running JBOD should be able to migrate to KRaft mode without > issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15368) Test ZK JBOD to KRaft migration
[ https://issues.apache.org/jira/browse/KAFKA-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15368: - Parent: (was: KAFKA-14127) Issue Type: Bug (was: Sub-task) > Test ZK JBOD to KRaft migration > --- > > Key: KAFKA-15368 > URL: https://issues.apache.org/jira/browse/KAFKA-15368 > Project: Kafka > Issue Type: Bug >Reporter: Igor Soarez >Priority: Major > > A ZK cluster running JBOD should be able to migrate to KRaft mode without > issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15367) Test KRaft non-JBOD -> JBOD migration
[ https://issues.apache.org/jira/browse/KAFKA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15367: - Parent: (was: KAFKA-14127) Issue Type: Bug (was: Sub-task) > Test KRaft non-JBOD -> JBOD migration > - > > Key: KAFKA-15367 > URL: https://issues.apache.org/jira/browse/KAFKA-15367 > Project: Kafka > Issue Type: Bug >Reporter: Igor Soarez >Priority: Major > > A cluster running in KRaft without JBOD should be able to transition into > JBOD mode without issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft
[ https://issues.apache.org/jira/browse/KAFKA-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801077#comment-17801077 ] Colin McCabe commented on KAFKA-14127: -- JBOD is a feature that is in 3.7, so the fix version needs to be 3.7 here. I'll move all the remaining work to a follow-up JIRA for clarity. Some of it is "nice to have" features, some of it is testing. > KIP-858: Handle JBOD broker disk failure in KRaft > - > > Key: KAFKA-14127 > URL: https://issues.apache.org/jira/browse/KAFKA-14127 > Project: Kafka > Issue Type: Improvement > Components: jbod, kraft >Reporter: Igor Soarez >Assignee: Igor Soarez >Priority: Major > Labels: 4.0-blocker, kip-500, kraft > Fix For: 3.7.0 > > > Supporting configurations with multiple storage directories in KRaft mode -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft
[ https://issues.apache.org/jira/browse/KAFKA-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-14127: - Fix Version/s: 3.7.0 (was: 3.8.0) > KIP-858: Handle JBOD broker disk failure in KRaft > - > > Key: KAFKA-14127 > URL: https://issues.apache.org/jira/browse/KAFKA-14127 > Project: Kafka > Issue Type: Improvement > Components: jbod, kraft >Reporter: Igor Soarez >Assignee: Igor Soarez >Priority: Major > Labels: 4.0-blocker, kip-500, kraft > Fix For: 3.7.0 > > > Supporting configurations with multiple storage directories in KRaft mode -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15359) log.dir.failure.timeout.ms configuration
[ https://issues.apache.org/jira/browse/KAFKA-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15359: - Parent: (was: KAFKA-14127) Issue Type: Improvement (was: Sub-task) > log.dir.failure.timeout.ms configuration > > > Key: KAFKA-15359 > URL: https://issues.apache.org/jira/browse/KAFKA-15359 > Project: Kafka > Issue Type: Improvement >Reporter: Igor Soarez >Assignee: Igor Soarez >Priority: Major > > If the Broker repeatedly fails to communicate a log > directory failure after a configurable amount of time — > {{log.dir.failure.timeout.ms}} — and it is the leader for any replicas in the > failed log directory, the broker will shut down, as that is the only other way > to guarantee that the controller will elect a new leader for those partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15359) Support log.dir.failure.timeout.ms configuration for JBOD
[ https://issues.apache.org/jira/browse/KAFKA-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15359: - Summary: Support log.dir.failure.timeout.ms configuration for JBOD (was: log.dir.failure.timeout.ms configuration) > Support log.dir.failure.timeout.ms configuration for JBOD > - > > Key: KAFKA-15359 > URL: https://issues.apache.org/jira/browse/KAFKA-15359 > Project: Kafka > Issue Type: Sub-task >Reporter: Igor Soarez >Assignee: Igor Soarez >Priority: Major > > If the Broker repeatedly fails to communicate a log > directory failure after a configurable amount of time — > {{log.dir.failure.timeout.ms}} — and it is the leader for any replicas in the > failed log directory, the broker will shut down, as that is the only other way > to guarantee that the controller will elect a new leader for those partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16061) KRaft JBOD follow-ups and improvements
[ https://issues.apache.org/jira/browse/KAFKA-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-16061: - Summary: KRaft JBOD follow-ups and improvements (was: JBOD follow-ups) > KRaft JBOD follow-ups and improvements > -- > > Key: KAFKA-16061 > URL: https://issues.apache.org/jira/browse/KAFKA-16061 > Project: Kafka > Issue Type: Improvement >Reporter: Colin McCabe >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15359) log.dir.failure.timeout.ms configuration
[ https://issues.apache.org/jira/browse/KAFKA-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15359: - Parent: KAFKA-16061 Issue Type: Sub-task (was: Improvement) > log.dir.failure.timeout.ms configuration > > > Key: KAFKA-15359 > URL: https://issues.apache.org/jira/browse/KAFKA-15359 > Project: Kafka > Issue Type: Sub-task >Reporter: Igor Soarez >Assignee: Igor Soarez >Priority: Major > > If the Broker repeatedly fails to communicate a log > directory failure after a configurable amount of time — > {{log.dir.failure.timeout.ms}} — and it is the leader for any replicas in the > failed log directory, the broker will shut down, as that is the only other way > to guarantee that the controller will elect a new leader for those partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
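As a sketch of how this behavior would be configured, the property name comes from KIP-858; the value below is illustrative, not necessarily the default:

```properties
# broker server.properties (illustrative value)
# If a log-directory failure cannot be communicated to the controller within
# this many milliseconds, and the broker leads replicas in that directory,
# the broker shuts down to force the controller to elect new leaders.
log.dir.failure.timeout.ms=30000
```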
[jira] [Created] (KAFKA-16061) JBOD follow-ups
Colin McCabe created KAFKA-16061: Summary: JBOD follow-ups Key: KAFKA-16061 URL: https://issues.apache.org/jira/browse/KAFKA-16061 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15979) Add KIP-1001 CurrentControllerId metric
Colin McCabe created KAFKA-15979: Summary: Add KIP-1001 CurrentControllerId metric Key: KAFKA-15979 URL: https://issues.apache.org/jira/browse/KAFKA-15979 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Assignee: Colin McCabe -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15980) Add KIP-1001 CurrentControllerId metric
Colin McCabe created KAFKA-15980: Summary: Add KIP-1001 CurrentControllerId metric Key: KAFKA-15980 URL: https://issues.apache.org/jira/browse/KAFKA-15980 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Assignee: Colin McCabe -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15956) MetadataShell must take the directory lock when reading
Colin McCabe created KAFKA-15956: Summary: MetadataShell must take the directory lock when reading Key: KAFKA-15956 URL: https://issues.apache.org/jira/browse/KAFKA-15956 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe MetadataShell must take the directory lock when reading files, to avoid unpleasant surprises from concurrent reads and writes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
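Kafka guards each data directory with a lock file so that two processes cannot operate on it concurrently; a read-only tool like MetadataShell can take the same lock to avoid racing a live broker or controller. A minimal sketch using java.nio file locking (the ".lock" file name and the method are illustrative, not MetadataShell's actual code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirLockSketch {
    // Try to take the exclusive OS-level lock on a directory's ".lock" file.
    // Returns true if acquired, i.e. no other process holds the directory.
    static boolean tryLockDir(Path dir) throws IOException {
        FileChannel channel = FileChannel.open(dir.resolve(".lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.tryLock();  // null if another process holds it
        if (lock == null) {
            channel.close();
            return false;
        }
        lock.release();
        channel.close();
        return true;
    }
}
```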
[jira] [Resolved] (KAFKA-15311) Fix docs about reverting to ZooKeeper mode during KRaft migration
[ https://issues.apache.org/jira/browse/KAFKA-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15311. -- Fix Version/s: 3.7.0 Resolution: Fixed > Fix docs about reverting to ZooKeeper mode during KRaft migration > - > > Key: KAFKA-15311 > URL: https://issues.apache.org/jira/browse/KAFKA-15311 > Project: Kafka > Issue Type: Bug >Reporter: Colin McCabe >Assignee: Colin McCabe >Priority: Minor > Fix For: 3.7.0 > > > The docs incorrectly state that reverting to ZooKeeper mode during KRaft > migration is not possible -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15922) Add MetadataVersion for JBOD
Colin McCabe created KAFKA-15922: Summary: Add MetadataVersion for JBOD Key: KAFKA-15922 URL: https://issues.apache.org/jira/browse/KAFKA-15922 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15860) ControllerRegistration must be written out to the metadata image
[ https://issues.apache.org/jira/browse/KAFKA-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15860. -- Fix Version/s: 3.7.0 Resolution: Fixed > ControllerRegistration must be written out to the metadata image > > > Key: KAFKA-15860 > URL: https://issues.apache.org/jira/browse/KAFKA-15860 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Colin McCabe >Assignee: Colin McCabe >Priority: Major > Fix For: 3.7.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14552) Remove no longer required server protocol versions in Kafka 4.0
[ https://issues.apache.org/jira/browse/KAFKA-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788891#comment-17788891 ] Colin McCabe commented on KAFKA-14552: -- I could go either way. I think most of the configuration key removals are "implied" by other KIPs (or sometimes stated directly there) but I thought it would be good to gather them somewhere. > Remove no longer required server protocol versions in Kafka 4.0 > --- > > Key: KAFKA-14552 > URL: https://issues.apache.org/jira/browse/KAFKA-14552 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Priority: Blocker > Fix For: 4.0.0 > > > Kafka 4.0 will remove support for zk mode and kraft mode became production > ready in Kafka 3.3. Furthermore, migration from zk mode to kraft mode will > require upgrading to the bridge release first (likely 3.5, but could also be > 3.6). > This provides an opportunity to remove exclusively server side protocols > versions that only exist to allow direct upgrades from versions older than > 3.n where n is either 0 (KRaft preview), 3 (KRaft production ready) or 5 > (bridge release). We should decide on the right `n` and make the change as > part of 4.0. > Note that this is complementary to the protocols that will be completely > removed as part of zk mode removal. Step one would be to create a list of > protocols that will be completely removed due to zk mode removal and the list > of exclusively server side protocols remaining after that (one example is > ControlledShutdown). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15860) ControllerRegistration must be written out to the metadata image
Colin McCabe created KAFKA-15860: Summary: ControllerRegistration must be written out to the metadata image Key: KAFKA-15860 URL: https://issues.apache.org/jira/browse/KAFKA-15860 Project: Kafka Issue Type: Bug Affects Versions: 3.7.0 Reporter: Colin McCabe Assignee: Colin McCabe -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15532) ZkWriteBehindLag should not be reported by inactive controllers
[ https://issues.apache.org/jira/browse/KAFKA-15532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15532. -- Resolution: Fixed > ZkWriteBehindLag should not be reported by inactive controllers > --- > > Key: KAFKA-15532 > URL: https://issues.apache.org/jira/browse/KAFKA-15532 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: David Arthur >Assignee: David Arthur >Priority: Minor > > Since only the active controller is performing the dual-write to ZK during a > migration, it should be the only controller to report the ZkWriteBehindLag > metric. > > Currently, if the controller fails over during a migration, the previous > active controller will incorrectly report its last value for ZkWriteBehindLag > forever. Instead, it should report zero. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15532) ZkWriteBehindLag should not be reported by inactive controllers
[ https://issues.apache.org/jira/browse/KAFKA-15532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe reassigned KAFKA-15532: Assignee: David Arthur > ZkWriteBehindLag should not be reported by inactive controllers > --- > > Key: KAFKA-15532 > URL: https://issues.apache.org/jira/browse/KAFKA-15532 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: David Arthur >Assignee: David Arthur >Priority: Minor > > Since only the active controller is performing the dual-write to ZK during a > migration, it should be the only controller to report the ZkWriteBehindLag > metric. > > Currently, if the controller fails over during a migration, the previous > active controller will incorrectly report its last value for ZkWriteBehindLag > forever. Instead, it should report zero. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15782) Establish concrete project conventions to define public APIs that require a KIP
[ https://issues.apache.org/jira/browse/KAFKA-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783369#comment-17783369 ] Colin McCabe commented on KAFKA-15782: -- I think the rules are already quite clear. The main source of unclarity is that we have a bunch of things which are public / protected, but not actually intended to be used by end-users. This happens because of some of the technical limitations of Java. There are just cases where something in package A needs to be visible to package B, even though end-users are not supposed to be directly using either package. In these situations, "interface annotations" are supposed to enforce the rules. But this is only a partial solution because people can easily ignore the annotations. Also, annotations are relatively new in the history of the project, so a lot of older classes don't have them at all. The best solution in the long term is to move as much code as possible out of the "clients" module. While people can technically access the broker / controller jars and start messing with them, it tends to be much less of a problem in practice. People mostly understand that if they pull server code and start subclassing it, that's on them. A lot of things in clients should really be in server-common. That being said, KAFKA-15781 doesn't seem like a grey area to me at all. ProducerConfig is very obviously a user-visible class, and always has been. The theory that we don't need a KIP for changes to public classes if they're just "one line changes" doesn't make sense to me. I could very clearly break compatibility for everyone just with one line. > Establish concrete project conventions to define public APIs that require a > KIP > --- > > Key: KAFKA-15782 > URL: https://issues.apache.org/jira/browse/KAFKA-15782 > Project: Kafka > Issue Type: Improvement >Reporter: A. 
Sophie Blee-Goldman >Priority: Major > Labels: needs-kip > > There seems to be no concrete definition that establishes project-specific > conventions for what is and is not considered a public API change that > requires a KIP. This results in frequent drawn-out debates that revisit the > same topic and slow things down, and often ends up forcing trivial changes > through the KIP process. For a recent example, KIP-998 was required for a > one-line change just to add the "protected" access modifier to an otherwise > package-private class. See [this comment > thread|https://github.com/apache/kafka/pull/14681#discussion_r1378591228] for > the full debate on this subject. > It would be beneficial and in the long run save us all time to just sit down > and hash out the project conventions, such as whether a > package-private/protected method on a non-final java class is to be > considered a public API, even if the method itself is/was never a public > method. This will of course require a KIP, but should help to establish some > ground rules to avoid any more superfluous KIPs in the future -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781842#comment-17781842 ] Colin McCabe commented on KAFKA-15754: -- bq. Going to close this again, even if it's a mystery why this call Uuid.randomUuid().toString() produced a UUID starting with "-" in our code. My guess would be that you are depending on an older version of the Kafka client libraries where this was possible. bq. Going to close this again Ack. > The kafka-storage tool can generate UUID starting with "-" > -- > > Key: KAFKA-15754 > URL: https://issues.apache.org/jira/browse/KAFKA-15754 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: Paolo Patierno >Assignee: Paolo Patierno >Priority: Major > > Using the kafka-storage.sh tool, it seems that it can still generate a UUID > starting with a dash "-", which then breaks how the argparse4j library works. > With such a UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with > the following error: > kafka-storage: error: argument --cluster-id/-t: expected one argument > That said, it seems that this problem was already addressed in the > Uuid.randomUuid method, which keeps generating a new UUID until it doesn't > start with "-". 
This is the commit addressing it > [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce] > The problem is that when toString is called on the Uuid instance, it's > going to do a Base64 encoding on the generated UUID this way: > {code:java} > Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); > {code} > Not sure why, but the code is using a URL-safe encoder which, taking a > look at the Base64 class in Java, is using an RFC4648_URLSAFE encoder with > the following alphabet: > > {code:java} > private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', > 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', > 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', > 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', > 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code} > which as you can see includes the "-" character. > So even though the current Uuid.randomUuid avoids generating a UUID > containing a dash, the Base64 encoding operation can still return a final UUID > starting with the dash. > > I was wondering if there is any good reason for using a Base64 URL encoder > and not just the RFC4648 (not URL safe) encoder, which uses the common Base64 alphabet > not containing the "-". -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781155#comment-17781155 ] Colin McCabe commented on KAFKA-15754: -- {quote} I was wondering if there is any good reason for using a Base64 URL encoder and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet not containing the "-". {quote} At one point, I did raise the question of why dash was used to serialize Kafka Uuids. But by the time I did so we were already using it in a few places so the question was not relevant. We're not going to change Uuid serialization now. I think the general rationale was that dash and underscore were friendlier than slash and plus sign. But that's debatable, of course. Slash, at least, is not filesystem-safe. > The kafka-storage tool can generate UUID starting with "-" > -- > > Key: KAFKA-15754 > URL: https://issues.apache.org/jira/browse/KAFKA-15754 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: Paolo Patierno >Assignee: Paolo Patierno >Priority: Major > > Using the kafka-storage.sh tool, it seems that it can still generate a UUID > starting with a dash "-", which then breaks how the argparse4j library works. > With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with > the following error: > kafka-storage: error: argument --cluster-id/-t: expected one argument > Said that, it seems that this problem was already addressed in the > Uuid.randomUuid method which keeps generating a new UUID until it doesn't > start with "-". 
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781152#comment-17781152 ] Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:21 PM:
--
You can run this code yourself if you are curious. Here it is. You will need bash 4 or better. (My version is {{GNU bash, version 5.2.15(1)-release (aarch64-apple-darwin21.6.0)}}.)

{code}
#!/usr/bin/env bash

# Tally the first character of each generated cluster ID.
declare -A IDS_PER_INITIAL_LETTER
for ((i = 0; i < 10000; i++)); do
  ./kafka-storage.sh random-uuid > /tmp/out 2> /dev/null
  FIRST_LETTER=$(head -c 1 /tmp/out)
  IDS_PER_INITIAL_LETTER[$FIRST_LETTER]=$((IDS_PER_INITIAL_LETTER[$FIRST_LETTER]+1))
done
for k in "${!IDS_PER_INITIAL_LETTER[@]}"; do
  echo "IDs starting with $k : ${IDS_PER_INITIAL_LETTER[$k]}"
done
{code}
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781153#comment-17781153 ] Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:21 PM:
--
I am closing this JIRA because {{kafka-storage.sh random-uuid}} cannot, in fact, generate UUIDs starting with {{-}}. You can see this via analysis of the code, or by just running it as I did.
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781153#comment-17781153 ] Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:20 PM:
--
I am closing this JIRA because {{kafka-storage.sh random-uuid}} cannot, in fact, generate UUIDs starting with '-'.
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781151#comment-17781151 ] Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:20 PM:
--
I ran {{kafka-storage.sh random-uuid}} 10,000 times and got the following distribution of first characters:

{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}

No IDs were generated with a first character of {{-}}, as expected.
[jira] [Resolved] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15754.
--
Resolution: Invalid

The kafka-storage tool cannot, in fact, generate UUIDs starting with '-'.
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781153#comment-17781153 ] Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:19 PM:
--
I am closing this JIRA because `kafka-storage.sh` cannot, in fact, generate UUIDs starting with '-'.
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781151#comment-17781151 ] Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:19 PM:
--
I ran {kafka-storage.sh random-uuid} 10,000 times and got the following distribution of first characters:

{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}

No IDs were generated with a first character of {-}, as expected.
[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781152#comment-17781152 ] Colin McCabe commented on KAFKA-15754:
--
You can run this code yourself if you are curious. Here it is. You will need bash 4 or better. (My version is `GNU bash, version 5.2.15(1)-release (aarch64-apple-darwin21.6.0)`.)

{code}
#!/usr/bin/env bash

# Tally the first character of each generated cluster ID.
declare -A IDS_PER_INITIAL_LETTER
for ((i = 0; i < 10000; i++)); do
  ./kafka-storage.sh random-uuid > /tmp/out 2> /dev/null
  FIRST_LETTER=$(head -c 1 /tmp/out)
  IDS_PER_INITIAL_LETTER[$FIRST_LETTER]=$((IDS_PER_INITIAL_LETTER[$FIRST_LETTER]+1))
done
for k in "${!IDS_PER_INITIAL_LETTER[@]}"; do
  echo "IDs starting with $k : ${IDS_PER_INITIAL_LETTER[$k]}"
done
{code}
[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781151#comment-17781151 ] Colin McCabe commented on KAFKA-15754: -- I ran `kafka-storage.sh random-uuid` 10,000 times and got the following distribution of first characters: {code} IDs starting with 0 : 166 IDs starting with 1 : 174 IDs starting with 2 : 135 IDs starting with 3 : 172 IDs starting with 4 : 155 IDs starting with 5 : 154 IDs starting with 6 : 152 IDs starting with 7 : 172 IDs starting with 8 : 170 IDs starting with 9 : 166 IDs starting with A : 147 IDs starting with B : 161 IDs starting with C : 172 IDs starting with D : 158 IDs starting with E : 164 IDs starting with F : 164 IDs starting with G : 146 IDs starting with H : 156 IDs starting with I : 166 IDs starting with J : 172 IDs starting with K : 177 IDs starting with L : 143 IDs starting with M : 171 IDs starting with N : 144 IDs starting with O : 157 IDs starting with P : 162 IDs starting with Q : 144 IDs starting with R : 157 IDs starting with S : 161 IDs starting with T : 158 IDs starting with U : 174 IDs starting with V : 166 IDs starting with W : 166 IDs starting with X : 159 IDs starting with Y : 165 IDs starting with Z : 161 IDs starting with _ : 159 IDs starting with a : 145 IDs starting with b : 169 IDs starting with c : 166 IDs starting with d : 171 IDs starting with e : 162 IDs starting with f : 154 IDs starting with g : 132 IDs starting with h : 152 IDs starting with i : 136 IDs starting with j : 166 IDs starting with k : 159 IDs starting with l : 156 IDs starting with m : 154 IDs starting with n : 155 IDs starting with o : 154 IDs starting with p : 158 IDs starting with q : 141 IDs starting with r : 165 IDs starting with s : 154 IDs starting with t : 162 IDs starting with u : 146 IDs starting with v : 161 IDs starting with w : 164 IDs starting with x : 154 IDs starting with y : 164 IDs starting with z : 154 {code} No IDs were generated with a first character of 
`-`, as expected. > The kafka-storage tool can generate UUID starting with "-" > -- > > Key: KAFKA-15754 > URL: https://issues.apache.org/jira/browse/KAFKA-15754 > Project: Kafka > Issue Type: Bug > Affects Versions: 3.6.0 > Reporter: Paolo Patierno > Assignee: Paolo Patierno > Priority: Major > > Using the kafka-storage.sh tool, it seems that it can still generate a UUID starting with a dash "-", which then breaks how the argparse4j library works. With such a UUID (i.e. -rmdB0m4T4--Y4thlNXk4Q in my case) the tool exits with the following error: > kafka-storage: error: argument --cluster-id/-t: expected one argument > That said, it seems this problem was already addressed in the Uuid.randomUuid method, which keeps generating a new UUID until it doesn't start with "-". This is the commit addressing it: [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce] > The problem is that when toString is called on the Uuid instance, it does a Base64 encoding of the generated UUID this way: > {code:java} > Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); > {code} > Not sure why, but the code is using a URL-safe encoder which, looking at the Base64 class in Java, is an RFC4648_URLSAFE encoder with the following alphabet: > > {code:java} > private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', > 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', > 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', > 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', > 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code} > which, as you can see, includes the "-" character. > So although the current Uuid.randomUuid avoids generating a UUID containing a dash, the Base64 encoding operation can still return a final string starting with a dash. > > I was wondering if there is any good reason for using a Base64 URL encoder rather than the plain RFC4648 (non-URL-safe) encoder, which uses the common Base64 alphabet that does not contain "-". -- This message was sent by Atlassian Jira (v8.20.10#820010)
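The reported behavior is easy to reproduce with plain JDK classes. The sketch below is not Kafka's actual Uuid code (the class and helper names are invented for illustration); it simply runs a 16-byte UUID through the same URL-safe, unpadded Base64 encoder. A value whose top six bits are 111110 maps to index 62 of the URL-safe alphabet, which is '-':

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class UuidEncodingDemo {
    // Encode the 16 UUID bytes the same way the issue describes:
    // URL-safe Base64 without padding.
    static String encodeUrlSafe(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        // Top byte 0xF8 = 11111000: the first 6 bits are 111110 = 62,
        // which is '-' in the URL-safe alphabet ('+' in the standard one).
        UUID uuid = new UUID(0xF800000000000000L, 0L);
        // Prints a 22-character string beginning with '-'.
        System.out.println(encodeUrlSafe(uuid));
    }
}
```

This is why filtering the raw UUID value alone is not enough: the dash is introduced by the encoding step, so the check has to be applied to the encoded string (as the linked commit does) or the encoder has to be swapped.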
[jira] [Commented] (KAFKA-14349) Support dynamically resizing the KRaft controller's thread pools
[ https://issues.apache.org/jira/browse/KAFKA-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780551#comment-17780551 ] Colin McCabe commented on KAFKA-14349: -- This was fixed as part of KAFKA-14351, but we forgot to close the JIRA. Closing now. > Support dynamically resizing the KRaft controller's thread pools > > > Key: KAFKA-14349 > URL: https://issues.apache.org/jira/browse/KAFKA-14349 > Project: Kafka > Issue Type: Improvement >Reporter: Colin McCabe >Priority: Major > Labels: 4.0-blocker, kip-500 > > Support dynamically resizing the KRaft controller's request handler and > network handler thread pools. See {{DynamicBrokerConfig.scala}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
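For readers unfamiliar with what "dynamically resizing" a thread pool means in practice: the JDK's ThreadPoolExecutor already supports in-place resizing, which is the general mechanism a dynamic config handler can invoke. This is a minimal generic sketch, not Kafka's DynamicBrokerConfig code:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ResizablePoolDemo {
    public static void main(String[] args) {
        // Start with 3 handler threads.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                3, 3, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // A dynamic config change can grow the pool without restarting the
        // process. When growing, raise the maximum before the core size
        // (setCorePoolSize rejects values above the current maximum).
        pool.setMaximumPoolSize(8);
        pool.setCorePoolSize(8);

        System.out.println(pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

When shrinking, the order is reversed (lower the core size first, then the maximum), and idle threads are retired as they time out.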
[jira] [Commented] (KAFKA-14369) Docs - KRAFT controller authentication example
[ https://issues.apache.org/jira/browse/KAFKA-14369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780549#comment-17780549 ] Colin McCabe commented on KAFKA-14369: -- Thanks [~dbove]. I agree that it would be helpful to have an example config file with non-PLAINTEXT auth. If you have one, please post it here. > Docs - KRAFT controller authentication example > -- > > Key: KAFKA-14369 > URL: https://issues.apache.org/jira/browse/KAFKA-14369 > Project: Kafka > Issue Type: Bug > Components: docs > Affects Versions: 3.3.1 > Reporter: Domenic Bove > Priority: Minor > Labels: kraft > > The [Kafka Listener docs|https://kafka.apache.org/documentation/#listener_configuration] mention how to handle Kafka protocols (other than PLAINTEXT) on the KRaft controller listener, but there is no working example, and I found that I was missing this property: > {code:java} > sasl.mechanism.controller.protocol {code} > when attempting to use SASL_PLAINTEXT on the controller listener. I see that property here: > [https://kafka.apache.org/documentation/#brokerconfigs_sasl.mechanism.controller.protocol] > but nowhere else. > I wonder if a complete working example would be better. Here are my working configs for SASL/PLAIN on the controller: > {code:java} > process.roles=controller > listeners=CONTROLLER://:9093 > node.id=1 > controller.quorum.voters=1@localhost:9093 > controller.listener.names=CONTROLLER > listener.security.protocol.map=CONTROLLER:SASL_PLAINTEXT > listener.name.controller.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule > required username="admin" password="admin-secret" user_admin="admin-secret" > user_alice="alice-secret"; > listener.name.controller.sasl.enabled.mechanisms=PLAIN > listener.name.controller.sasl.mechanism=PLAIN > sasl.enabled.mechanisms=PLAIN > sasl.mechanism.controller.protocol=PLAIN{code} > Or maybe just a callout of that property in the existing docs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14369) Docs - KRAFT controller authentication example
[ https://issues.apache.org/jira/browse/KAFKA-14369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-14369: - Labels: kraft (was: 4.0-blocker) > Docs - KRAFT controller authentication example > -- > > Key: KAFKA-14369 > URL: https://issues.apache.org/jira/browse/KAFKA-14369 > Project: Kafka > Issue Type: Bug > Components: docs > Affects Versions: 3.3.1 > Reporter: Domenic Bove > Priority: Minor > Labels: kraft -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14927) Prevent kafka-configs.sh from setting non-alphanumeric config key names
[ https://issues.apache.org/jira/browse/KAFKA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-14927: - Labels: (was: 4.0-blocker) > Prevent kafka-configs.sh from setting non-alphanumeric config key names > --- > > Key: KAFKA-14927 > URL: https://issues.apache.org/jira/browse/KAFKA-14927 > Project: Kafka > Issue Type: Bug > Components: tools >Affects Versions: 3.3.2 >Reporter: Justin Daines >Assignee: Aman Singh >Priority: Minor > Fix For: 3.7.0 > > > Using {{kafka-configs}} should validate dynamic configurations before > applying. It is possible to send a file with invalid configurations. > For example a file containing the following: > {code:java} > { > "routes": { > "crn:///kafka=*": { > "management": { > "allowed": "confluent-audit-log-events_audit", > "denied": "confluent-audit-log-events-denied" > }, > "describe": { > "allowed": "", > "denied": "confluent-audit-log-events-denied" > }, > "authentication": { > "allowed": "confluent-audit-log-events_audit", > "denied": "confluent-audit-log-events-denied-authn" > }, > "authorize": { > "allowed": "confluent-audit-log-events_audit", > "denied": "confluent-audit-log-events-denied-authz" > }, > "interbroker": { > "allowed": "", > "denied": "" > } > }, > "crn:///kafka=*/group=*": { > "consume": { > "allowed": "confluent-audit-log-events_audit", > "denied": "confluent-audit-log-events" > } > }, > "crn:///kafka=*/topic=*": { > "produce": { > "allowed": "confluent-audit-log-events_audit", > "denied": "confluent-audit-log-events" > }, > "consume": { > "allowed": "confluent-audit-log-events_audit", > "denied": "confluent-audit-log-events" > } > } > }, > "destinations": { > "topics": { > "confluent-audit-log-events": { > "retention_ms": 777600 > }, > "confluent-audit-log-events-denied": { > "retention_ms": 777600 > }, > "confluent-audit-log-events-denied-authn": { > "retention_ms": 777600 > }, > "confluent-audit-log-events-denied-authz": { > "retention_ms": 777600 > }, > 
"confluent-audit-log-events_audit": { > "retention_ms": 777600 > } > } > }, > "default_topics": { > "allowed": "confluent-audit-log-events_audit", > "denied": "confluent-audit-log-events" > }, > "excluded_principals": [ > "User:schemaregistryUser", > "User:ANONYMOUS", > "User:appSA", > "User:admin", > "User:connectAdmin", > "User:connectorSubmitter", > "User:connectorSA", > "User:schemaregistryUser", > "User:ksqlDBAdmin", > "User:ksqlDBUser", > "User:controlCenterAndKsqlDBServer", > "User:controlcenterAdmin", > "User:restAdmin", > "User:appSA", > "User:clientListen", > "User:superUser" > ] > } {code} > {code:java} > kafka-configs --bootstrap-server $KAFKA_BOOTSTRAP --entity-type brokers > --entity-default --alter --add-config-file audit-log.json {code} > Yields the following dynamic configs: > {code:java} > Default configs for brokers in the cluster are: > "destinations"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"destinations"=null} > "confluent-audit-log-events-denied-authn"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events-denied-authn"=null} > "routes"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"routes"=null} > "User=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"User=null} > },=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:},=null} > "excluded_principals"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"excluded_principals"=null} > "confluent-audit-log-events_audit"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events_audit"=null} > "authorize"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"authorize"=null} > "default_topics"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"default_topics"=null} > "topics"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"topics"=null} > ]=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:]=null} > "interbroker"=null 
sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"interbroker"=null} > "produce"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"produce"=null} > "denied"=null sensitive=true > synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"denied"=null} >
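The garbage output above comes from JSON syntax fragments being accepted as config key names. The fix the summary calls for is to validate key names before applying them; a minimal sketch of such a check (hypothetical, not the validation Kafka actually ships) could look like:

```java
import java.util.regex.Pattern;

public class ConfigKeyValidator {
    // Hypothetical rule: config key names are limited to alphanumerics
    // plus '.', '_' and '-', which covers normal Kafka property names.
    private static final Pattern VALID_KEY = Pattern.compile("[a-zA-Z0-9._-]+");

    static boolean isValidKey(String key) {
        return VALID_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        // A real property name passes.
        System.out.println(isValidKey("log.retention.ms"));
        // JSON fragments like "\"routes\"" or "}," would be rejected
        // instead of becoming broken dynamic configs.
        System.out.println(isValidKey("\"routes\""));
    }
}
```

With a check like this in the --add-config-file path, a JSON file passed by mistake would fail fast instead of polluting the cluster's dynamic default configs.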
[jira] [Commented] (KAFKA-14927) Prevent kafka-configs.sh from setting non-alphanumeric config key names
[ https://issues.apache.org/jira/browse/KAFKA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780541#comment-17780541 ] Colin McCabe commented on KAFKA-14927: -- It looks like this change was committed. I will close the JIRA then. > Prevent kafka-configs.sh from setting non-alphanumeric config key names > --- > > Key: KAFKA-14927 > URL: https://issues.apache.org/jira/browse/KAFKA-14927 > Project: Kafka > Issue Type: Bug > Components: tools > Affects Versions: 3.3.2 > Reporter: Justin Daines > Assignee: Aman Singh > Priority: Minor > Labels: 4.0-blocker > Fix For: 3.7.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] (KAFKA-14927) Prevent kafka-configs.sh from setting non-alphanumeric config key names
[ https://issues.apache.org/jira/browse/KAFKA-14927 ] Colin McCabe deleted comment on KAFKA-14927: -- was (Author: cmccabe): It looks like this change was committed. I will close the JIRA then. > Prevent kafka-configs.sh from setting non-alphanumeric config key names > --- > > Key: KAFKA-14927 > URL: https://issues.apache.org/jira/browse/KAFKA-14927 > Project: Kafka > Issue Type: Bug > Components: tools > Affects Versions: 3.3.2 > Reporter: Justin Daines > Assignee: Aman Singh > Priority: Minor > Labels: 4.0-blocker > Fix For: 3.7.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14927) Prevent kafka-configs.sh from setting non-alphanumeric config key names
[ https://issues.apache.org/jira/browse/KAFKA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-14927: - Summary: Prevent kafka-configs.sh from setting non-alphanumeric config key names (was: Dynamic configs not validated when using kafka-configs and --add-config-file) > Prevent kafka-configs.sh from setting non-alphanumeric config key names > --- > > Key: KAFKA-14927 > URL: https://issues.apache.org/jira/browse/KAFKA-14927 > Project: Kafka > Issue Type: Bug > Components: tools > Affects Versions: 3.3.2 > Reporter: Justin Daines > Assignee: Aman Singh > Priority: Minor > Labels: 4.0-blocker > Fix For: 3.7.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-14941) Document which configuration options are applicable only to processes with broker role or controller role
[ https://issues.apache.org/jira/browse/KAFKA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780540#comment-17780540 ] Colin McCabe commented on KAFKA-14941: -- I'm not sure that I totally understand the goal here. If the goal is to be able to dynamically change configurations, that does not require leaving the configuration out of the static broker or controller config file. The dynamic configuration always takes precedence. If the goal is to understand what the configuration does, the help text of the configuration should explain that. Can you explain a bit more about the goal? > Document which configuration options are applicable only to processes with > broker role or controller role > - > > Key: KAFKA-14941 > URL: https://issues.apache.org/jira/browse/KAFKA-14941 > Project: Kafka > Issue Type: Improvement > Reporter: Jakub Scholz > Priority: Major > > When running in KRaft mode, some configuration options are applicable only to nodes with the broker process role and some only to nodes with the controller process role. It would be great if this information were part of the documentation (e.g. in the [Broker Configs|https://kafka.apache.org/documentation/#brokerconfigs] table on the website), and also part of the config classes, so that it could be used, for example, to filter the options applicable to different nodes when configuration is assembled dynamically. This would allow having configuration files with only the actually used configuration options and, for example, help reduce unnecessary restarts when rolling out new configurations. > For some options it seems clear, and the Kafka node will refuse to start if they are set - for example, the configurations of the non-controller listeners on controller-only nodes. For others, it seems less clear (Does the {{compression.type}} option apply to controller-only nodes? Or the configurations for the offsets topic? etc.). -- This message was sent by Atlassian Jira (v8.20.10#820010)
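The reporter's idea of putting role applicability "in the config classes" can be sketched as metadata attached to each config definition. The sketch below is purely illustrative: the enum, map, and the role assignments for these example keys are assumptions, not Kafka's actual ConfigDef API:

```java
import java.util.EnumSet;
import java.util.Map;

public class RoleTaggedConfigs {
    enum ProcessRole { BROKER, CONTROLLER }

    // Hypothetical metadata: which process roles each config applies to.
    // The role assignments here are examples, not authoritative.
    static final Map<String, EnumSet<ProcessRole>> APPLICABLE_ROLES = Map.of(
            "num.network.threads", EnumSet.of(ProcessRole.BROKER, ProcessRole.CONTROLLER),
            "log.retention.hours", EnumSet.of(ProcessRole.BROKER),
            "controller.quorum.voters", EnumSet.of(ProcessRole.BROKER, ProcessRole.CONTROLLER));

    static boolean appliesTo(String config, ProcessRole role) {
        return APPLICABLE_ROLES
                .getOrDefault(config, EnumSet.noneOf(ProcessRole.class))
                .contains(role);
    }

    public static void main(String[] args) {
        // A controller-only node could filter its config file down to
        // the keys that actually apply to it.
        System.out.println(appliesTo("log.retention.hours", ProcessRole.CONTROLLER));
    }
}
```

With metadata like this, tooling could warn about (or drop) broker-only options in a controller-only config file, which is exactly the filtering use case the issue describes.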
[jira] [Updated] (KAFKA-14941) Document which configuration options are applicable only to processes with broker role or controller role
[ https://issues.apache.org/jira/browse/KAFKA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-14941: - Labels: (was: 4.0-blocker) > Document which configuration options are applicable only to processes with > broker role or controller role > - > > Key: KAFKA-14941 > URL: https://issues.apache.org/jira/browse/KAFKA-14941 > Project: Kafka > Issue Type: Improvement > Reporter: Jakub Scholz > Priority: Major -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15489) split brain in KRaft cluster
[ https://issues.apache.org/jira/browse/KAFKA-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15489: - Labels: (was: 4.0-blocker) > split brain in KRaft cluster > - > > Key: KAFKA-15489 > URL: https://issues.apache.org/jira/browse/KAFKA-15489 > Project: Kafka > Issue Type: Bug > Components: kraft > Affects Versions: 3.5.1 > Reporter: Luke Chen > Assignee: Luke Chen > Priority: Major > > I found that in the current KRaft implementation, when a network partition happens between the current controller leader and the other controller nodes, a "split brain" issue can occur: two leaders exist in the controller cluster, and two inconsistent sets of metadata are returned to clients. > > *Root cause* > In [KIP-595|https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-Vote], we said a voter will begin a new election under three conditions: > 1. If it fails to receive a FetchResponse from the current leader before expiration of quorum.fetch.timeout.ms > 2. If it receives an EndQuorumEpoch request from the current leader > 3. If it fails to receive a majority of votes before expiration of quorum.election.timeout.ms after declaring itself a candidate. > And that's exactly how the current KRaft implementation behaves. > > However, when the leader is isolated by a network partition, there's no way for it to resign the leadership and start a new election. So the leader will remain the leader even though all other nodes are unreachable, and this makes the split-brain issue possible. > When reading further in KIP-595, I found we had indeed considered this situation, and a solution is described
in [this > section|https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-LeaderProgressTimeout], > it said: > {quote}In the pull-based model, however, say a new leader has been elected > with a new epoch and everyone has learned about it except the old leader > (e.g. that leader was not in the voters anymore and hence not receiving the > BeginQuorumEpoch as well), then that old leader would not be notified by > anyone about the new leader / epoch and become a pure "zombie leader", as > there is no regular heartbeats being pushed from leader to the follower. This > could lead to stale information being served to the observers and clients > inside the cluster. > {quote} > {quote}To resolve this issue, we will piggy-back on the > "quorum.fetch.timeout.ms" config, such that if the leader did not receive > Fetch requests from a majority of the quorum for that amount of time, it > would begin a new election and start sending VoteRequest to voter nodes in > the cluster to understand the latest quorum. If it couldn't connect to any > known voter, the old leader shall keep starting new elections and bump the > epoch. > {quote} > > But we missed this implementation in current KRaft. > > *The flow is like this:* > 1. 3 controller nodes, A(leader), B(follower), C(follower) > 2. network partition happened between [A] and [B, C]. > 3. B and C starts new election since fetch timeout expired before receiving > fetch response from leader A. > 4. B (or C) is elected as a leader in new epoch, while A is still the leader > in old epoch. > 5. broker D creates a topic "new", and updates to leader B. > 6. broker E describe topic "new", but got nothing because it is connecting to > the old leader A. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
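The "leader progress timeout" described in the quoted KIP-595 text can be sketched as bookkeeping of the last Fetch received from each voter. This is a toy illustration of the rule, not Kafka's actual KafkaRaftClient code; the class, field, and method names are invented:

```java
public class LeaderProgressTimeout {
    private final long fetchTimeoutMs;
    // Per-voter timestamp of the last Fetch the leader received.
    // 0 means "never heard from" in this sketch.
    private final long[] lastFetchTimeMs;
    private final int localVoterIndex;

    LeaderProgressTimeout(int voterCount, int localVoterIndex, long fetchTimeoutMs) {
        this.fetchTimeoutMs = fetchTimeoutMs;
        this.lastFetchTimeMs = new long[voterCount];
        this.localVoterIndex = localVoterIndex;
    }

    void onFetchReceived(int voterIndex, long nowMs) {
        lastFetchTimeMs[voterIndex] = nowMs;
    }

    // Per the quoted rule: if the leader has not heard a Fetch from a
    // majority of the quorum (counting itself) within fetchTimeoutMs,
    // it should resign and start a new election.
    boolean shouldResign(long nowMs) {
        int heardFrom = 1; // the leader counts toward its own majority
        for (int i = 0; i < lastFetchTimeMs.length; i++) {
            if (i != localVoterIndex && nowMs - lastFetchTimeMs[i] < fetchTimeoutMs) {
                heardFrom++;
            }
        }
        return heardFrom <= lastFetchTimeMs.length / 2;
    }
}
```

In the issue's 3-node scenario, the isolated leader A stops receiving Fetch requests from B and C, so a check like this fires after quorum.fetch.timeout.ms and A steps down instead of serving stale metadata indefinitely.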
[jira] [Updated] (KAFKA-15513) KRaft cluster fails with SCRAM authentication enabled for control-plane
[ https://issues.apache.org/jira/browse/KAFKA-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe updated KAFKA-15513: - Labels: (was: 4.0-blocker) > KRaft cluster fails with SCRAM authentication enabled for control-plane > --- > > Key: KAFKA-15513 > URL: https://issues.apache.org/jira/browse/KAFKA-15513 > Project: Kafka > Issue Type: Bug > Components: kraft >Affects Versions: 3.6.0, 3.5.1 >Reporter: migruiz4 >Priority: Major > > We have observed a scenario where a KRaft cluster fails to bootstrap when > using SCRAM authentication for controller-to-controller communications. > The steps to reproduce are simple: > * Deploy (at least) 2 Kafka servers using latest version 3.5.1. > * Configure a KRaft cluster, where the controller listener uses > SASL_PLAINTEXT + SCRAM-SHA-256 or SCRAM-SHA-512. In my case, I'm using the > recommended in-line jaas config > '{{{}listener.name..scram-sha-512.sasl.jaas.config{}}}' > * Run 'kafka-storage.sh' in both nodes using option '--add-scram' to create > the SCRAM user. 
> When initialized, Controllers will fail to connect to each other with an > authentication error: > > {code:java} > [2023-08-01 11:12:45,295] ERROR [kafka-1-raft-outbound-request-thread]: > Failed to send the following request due to authentication error: > ClientRequest(expectResponse=true, > callback=kafka.raft.KafkaNetworkChannel$$Lambda$687/0x7f27d443fc60@2aba6075, > destination=0, correlationId=129, clientId=raft-client-1, > createdTimeMs=1690888364960, > requestBuilder=VoteRequestData(clusterId='abcdefghijklmnopqrstug', > topics=[TopicData(topicName='__cluster_metadata', > partitions=[PartitionData(partitionIndex=0, candidateEpoch=4, candidateId=1, > lastOffsetEpoch=0, lastOffset=0)])])) (kafka.raft.RaftSendThread) {code} > Some additional details about the scenario that we tested out: > * Controller listener does work when configured with SASL+PLAIN > * The issue only affects the Controller listener, SCRAM users created using > the same method work for data-plane listeners and inter-broker listeners. 
> > Below you can find the exact configuration and command used to deploy: > * server.properties > {code:java} > listeners=INTERNAL://:9092,CLIENT://:9091,CONTROLLER://:9093 > advertised.listeners=INTERNAL://kafka-0:9092,CLIENT://:9091 > listener.security.protocol.map=INTERNAL:PLAINTEXT,CLIENT:PLAINTEXT,CONTROLLER:SASL_PLAINTEXT > num.network.threads=3 > num.io.threads=8 > socket.send.buffer.bytes=102400 > socket.receive.buffer.bytes=102400 > socket.request.max.bytes=104857600 > log.dirs=/bitnami/kafka/data > num.partitions=1 > num.recovery.threads.per.data.dir=1 > offsets.topic.replication.factor=1 > transaction.state.log.replication.factor=1 > transaction.state.log.min.isr=1 > log.retention.hours=168 > log.retention.check.interval.ms=30 > controller.listener.names=CONTROLLER > controller.quorum.voters=0@kafka-0:9093,1@kafka-1:9093 > inter.broker.listener.name=INTERNAL > node.id=0 > process.roles=controller,broker > sasl.enabled.mechanisms=PLAIN,SCRAM-SHA-256,SCRAM-SHA-512 > sasl.mechanism.controller.protocol=SCRAM-SHA-512 > listener.name.controller.sasl.enabled.mechanisms=SCRAM-SHA-512 > listener.name.controller.scram-sha-512.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule > required username="controller_user" password="controller_password";{code} > * kafka-storage.sh command > {code:java} > kafka-storage.sh format --config /path/to/server.properties > --ignore-formatted --cluster-id abcdefghijklmnopqrstuv --add-scram > SCRAM-SHA-512=[name=controller_user,password=controller_password] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15513) KRaft cluster fails with SCRAM authentication enabled for control-plane
[ https://issues.apache.org/jira/browse/KAFKA-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780539#comment-17780539 ] Colin McCabe commented on KAFKA-15513: -- To be more concrete, you need to use the {{--add-scram}} argument to the {{kafka-storage.sh format}} command.
[jira] (KAFKA-15513) KRaft cluster fails with SCRAM authentication enabled for control-plane
[ https://issues.apache.org/jira/browse/KAFKA-15513 ] Colin McCabe deleted comment on KAFKA-15513: -- was (Author: cmccabe): Currently, you need to add the controller principal to `super.users` rather than relying on SCRAM to configure it. This is no different than how in ZK mode, you must have working ZK auth before you can configure Kafka. In the future, we will probably support configuring SCRAM prior to controller startup via the `kafka-format.sh` command. The mechanism is all there (in the form of the bootstrap file) but we haven't finished implementing it yet...
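The `super.users` workaround mentioned in the deleted comment can be sketched as a server.properties fragment. This is a hypothetical illustration, not a confirmed fix: it assumes the SCRAM principal name {{controller_user}} from the reporter's configuration above, and that granting that principal superuser rights lets quorum requests be authorized before SCRAM metadata is readable at startup.

{code:java}
# Hypothetical fragment (sketch only); principal name taken from the
# reporter's configuration above. Listing the controller's SCRAM identity
# in super.users means its requests are authorized without consulting
# SCRAM metadata that may not yet be loaded during bootstrap.
super.users=User:controller_user
{code}

Note that this sidesteps authorization for the controller principal only; the SASL/SCRAM handshake itself is unchanged.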