[jira] [Assigned] (KAFKA-16516) Fix the controller node provider for broker to control channel

2024-05-23 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe reassigned KAFKA-16516:


Assignee: Colin McCabe  (was: José Armando García Sancio)

> Fix the controller node provider for broker to control channel
> --
>
> Key: KAFKA-16516
> URL: https://issues.apache.org/jira/browse/KAFKA-16516
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: José Armando García Sancio
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.8.0
>
>
> The broker-to-controller channel gets the set of voters directly from the 
> static configuration. This needs to change so that the leader node comes 
> from the kraft client/manager.
> The code is in KafkaServer, where it constructs the RaftControllerNodeProvider.
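For illustration, a minimal Java sketch of the intended shape (the RaftClientView interface and its methods below are simplified assumptions, not the actual Kafka API): the provider asks the live raft client for the current leader instead of parsing the static voter list.

{code:java}
import java.util.Optional;
import java.util.OptionalInt;

// Simplified stand-in for the kraft client/manager view the provider would consult.
interface RaftClientView {
    OptionalInt currentLeaderId();          // leader id from the latest LeaderAndEpoch, if known
    Optional<String> voterAddress(int id);  // host:port for a given node id
}

// Sketch: resolve the controller from live raft state, not from static configuration.
final class RaftControllerNodeProviderSketch {
    private final RaftClientView raft;

    RaftControllerNodeProviderSketch(RaftClientView raft) {
        this.raft = raft;
    }

    /** Returns the current controller's address, or empty if there is no leader yet. */
    Optional<String> controllerAddress() {
        OptionalInt leaderId = raft.currentLeaderId();
        if (leaderId.isEmpty()) {
            return Optional.empty();   // the broker-to-controller channel retries later
        }
        return raft.voterAddress(leaderId.getAsInt());
    }
}
{code}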



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16515) Fix the ZK Metadata cache use of voter static configuration

2024-05-20 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe reassigned KAFKA-16515:


Assignee: Colin McCabe  (was: José Armando García Sancio)

> Fix the ZK Metadata cache use of voter static configuration
> ---
>
> Key: KAFKA-16515
> URL: https://issues.apache.org/jira/browse/KAFKA-16515
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: José Armando García Sancio
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.8.0
>
>
> It looks like, because of the ZK-to-KRaft migration, the ZK metadata cache was changed 
> to read the static voter configuration. This needs to change to use the voter 
> nodes reported by the raft manager or the kraft client.
> The injection code is in KafkaServer, where it constructs 
> MetadataCache.zkMetadata.
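As a brief illustration (names are assumptions, not the real ZkMetadataCache constructor), the cache could be handed a supplier backed by the raft manager, so the voter set it sees is always the live one:

{code:java}
import java.util.List;
import java.util.function.Supplier;

final class ZkMetadataCacheWiringSketch {
    // Supplier backed by the raft manager / kraft client; the voter set can change at runtime.
    private final Supplier<List<Integer>> voterNodeIds;

    ZkMetadataCacheWiringSketch(Supplier<List<Integer>> voterNodeIds) {
        this.voterNodeIds = voterNodeIds;   // injected from KafkaServer instead of static config
    }

    List<Integer> currentVoterIds() {
        return voterNodeIds.get();
    }
}
{code}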



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16469) Metadata Schema Checker

2024-05-19 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe reassigned KAFKA-16469:


Assignee: Colin McCabe

> Metadata Schema Checker
> ---
>
> Key: KAFKA-16469
> URL: https://issues.apache.org/jira/browse/KAFKA-16469
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16757) Fix broker re-registration issues around MV 3.7-IV2

2024-05-13 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16757:


 Summary: Fix broker re-registration issues around MV 3.7-IV2
 Key: KAFKA-16757
 URL: https://issues.apache.org/jira/browse/KAFKA-16757
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend the 
broker registration, so that the controller can record the storage directories. 
The current code for doing this has several problems, however. One is that it 
tends to trigger even in cases where we don't actually need it. Another is that 
when re-registering the broker, the broker is marked as fenced.

This PR moves the handling of the re-registration case out of 
BrokerMetadataPublisher and into BrokerRegistrationTracker. The re-registration 
code there will only trigger in the case where the broker sees an existing 
registration for itself with no directories set. This is much more targeted 
than the original code.

Additionally, in ClusterControlManager, when re-registering the same broker, we 
now preserve its fencing and shutdown state, rather than clearing those. (There 
isn't any good reason re-registering the same broker should clear these 
things... this was purely an oversight.) Note that we can tell the broker is 
"the same" because it has the same IncarnationId.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16649) Remove lock from DynamicBrokerConfig.removeReconfigurable

2024-04-30 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16649:
-
Summary: Remove lock from DynamicBrokerConfig.removeReconfigurable  (was: 
Fix potential deadlock in DynamicBrokerConfig)

> Remove lock from DynamicBrokerConfig.removeReconfigurable
> -
>
> Key: KAFKA-16649
> URL: https://issues.apache.org/jira/browse/KAFKA-16649
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16649) Remove lock from DynamicBrokerConfig.removeReconfigurable

2024-04-30 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16649:
-
Description: Do not acquire the DynamicBrokerConfig lock in 
DynamicBrokerConfig.removeReconfigurable. It's not necessary, because the list 
that these functions are modifying is a thread-safe CopyOnWriteArrayList.

> Remove lock from DynamicBrokerConfig.removeReconfigurable
> -
>
> Key: KAFKA-16649
> URL: https://issues.apache.org/jira/browse/KAFKA-16649
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Priority: Major
>
> Do not acquire the DynamicBrokerConfig lock in 
> DynamicBrokerConfig.removeReconfigurable. It's not necessary, because the 
> list that these functions are modifying is a thread-safe CopyOnWriteArrayList.
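A minimal sketch of the point above (field and method names are illustrative, not the exact DynamicBrokerConfig internals): because the backing list is a CopyOnWriteArrayList, removal is already thread-safe on its own, so taking the broader config lock only adds a deadlock opportunity.

{code:java}
import java.util.concurrent.CopyOnWriteArrayList;

final class ReconfigurableRegistrySketch {
    // Thread-safe list: every mutation copies the underlying array.
    private final CopyOnWriteArrayList<Object> reconfigurables = new CopyOnWriteArrayList<>();

    void addReconfigurable(Object r) {
        reconfigurables.add(r);
    }

    // No synchronized block / lock needed here: CopyOnWriteArrayList.remove is atomic.
    void removeReconfigurable(Object r) {
        reconfigurables.remove(r);
    }
}
{code}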



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16649) Fix potential deadlock in DynamicBrokerConfig

2024-04-30 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16649:


 Summary: Fix potential deadlock in DynamicBrokerConfig
 Key: KAFKA-16649
 URL: https://issues.apache.org/jira/browse/KAFKA-16649
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16624) Don't generate useless PartitionChangeRecord on older MV

2024-04-25 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16624:


 Summary: Don't generate useless PartitionChangeRecord on older MV
 Key: KAFKA-16624
 URL: https://issues.apache.org/jira/browse/KAFKA-16624
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


Fix a case where we could generate useless PartitionChangeRecords on metadata 
versions older than 3.6-IV0. This could happen when the ISR contained only one 
broker and we were trying to go down to a fully empty ISR. In this case, 
PartitionChangeBuilder would block the ISR from going down to fully empty 
(since that is not valid in these pre-KIP-966 metadata versions), but it would 
still emit the record, even though it had no effect.
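A hedged sketch of the guard (method names and parameters are illustrative, not the actual PartitionChangeBuilder code): when the old metadata version forces the last ISR member to stay and nothing else changed, no record should be generated at all.

{code:java}
import java.util.List;
import java.util.Optional;

final class PartitionChangeSketch {
    /**
     * Returns the ISR change to record, or empty when the only "change" was an
     * empty target ISR that the pre-3.6-IV0 metadata version disallows anyway.
     */
    static Optional<List<Integer>> isrChangeToRecord(List<Integer> currentIsr,
                                                     List<Integer> targetIsr,
                                                     boolean emptyIsrSupported) {
        List<Integer> effective = (!emptyIsrSupported && targetIsr.isEmpty())
                ? currentIsr          // keep the last replica in the ISR on older metadata versions
                : targetIsr;
        if (effective.equals(currentIsr)) {
            return Optional.empty();  // no effect, so emit no PartitionChangeRecord
        }
        return Optional.of(effective);
    }
}
{code}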



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16003) The znode /config/topics is not updated during KRaft migration in "dual-write" mode

2024-04-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16003:
-
Fix Version/s: 3.7.1

> The znode /config/topics is not updated during KRaft migration in 
> "dual-write" mode
> ---
>
> Key: KAFKA-16003
> URL: https://issues.apache.org/jira/browse/KAFKA-16003
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 3.6.1
>Reporter: Paolo Patierno
>Assignee: Mickael Maison
>Priority: Major
> Fix For: 3.8.0, 3.7.1
>
>
> I tried the following scenario ...
> I have a ZooKeeper-based cluster and create a my-topic-1 topic (without 
> specifying any specific configuration for it). The correct znodes are created 
> under /config/topics and /brokers/topics.
> I start a migration to KRaft but do not move forward from the "dual write" mode. 
> While in this mode, I create a new my-topic-2 topic (again without any 
> specific config). I see that a new znode is created under /brokers/topics but 
> NOT under /config/topics. It seems that the KRaft controller is not updating 
> this information in ZooKeeper during the dual write. The controller log shows 
> that a write to ZooKeeper was done, but not everything I would expect:
> {code:java}
> 2023-12-13 10:23:26,229 TRACE [KRaftMigrationDriver id=3] Create Topic 
> my-topic-2, ID Macbp8BvQUKpzmq2vG_8dA. Transitioned migration state from 
> ZkMigrationLeadershipState{kraftControllerId=3, kraftControllerEpoch=7, 
> kraftMetadataOffset=445, kraftMetadataEpoch=7, 
> lastUpdatedTimeMs=1702462785587, migrationZkVersion=236, controllerZkEpoch=3, 
> controllerZkVersion=3} to ZkMigrationLeadershipState{kraftControllerId=3, 
> kraftControllerEpoch=7, kraftMetadataOffset=445, kraftMetadataEpoch=7, 
> lastUpdatedTimeMs=1702462785587, migrationZkVersion=237, controllerZkEpoch=3, 
> controllerZkVersion=3} 
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver) 
> [controller-3-migration-driver-event-handler]
> 2023-12-13 10:23:26,229 DEBUG [KRaftMigrationDriver id=3] Made the following 
> ZK writes when handling KRaft delta: {CreateTopic=1} 
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver) 
> [controller-3-migration-driver-event-handler] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16509) CurrentControllerId metric is unreliable in ZK mode

2024-04-10 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe reassigned KAFKA-16509:


Assignee: Colin McCabe

> CurrentControllerId metric is unreliable in ZK mode
> ---
>
> Key: KAFKA-16509
> URL: https://issues.apache.org/jira/browse/KAFKA-16509
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>
> The CurrentControllerId metric added by KIP-1001 is unreliable in ZK mode. 
> Sometimes when there is no active ZK-based controller, it still shows the 
> previous controller ID. Instead, it should show -1 in that situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16509) CurrentControllerId metric is unreliable in ZK mode

2024-04-10 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16509:


 Summary: CurrentControllerId metric is unreliable in ZK mode
 Key: KAFKA-16509
 URL: https://issues.apache.org/jira/browse/KAFKA-16509
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe


The CurrentControllerId metric added by KIP-1001 is unreliable in ZK mode. 
Sometimes when there is no active ZK-based controller, it still shows the 
previous controller ID. Instead, it should show -1 in that situation.
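For illustration, a small sketch of the desired gauge behavior (the class and wiring are assumptions, not the actual metric code): whenever no controller is active, the metric should read -1 rather than the last known id.

{code:java}
import java.util.OptionalInt;
import java.util.concurrent.atomic.AtomicReference;

final class CurrentControllerIdGaugeSketch {
    private final AtomicReference<OptionalInt> activeControllerId =
            new AtomicReference<>(OptionalInt.empty());

    /** Called on controller election and on controller loss (with OptionalInt.empty()). */
    void onControllerChange(OptionalInt controllerId) {
        activeControllerId.set(controllerId);
    }

    /** Value exposed as CurrentControllerId: -1 when there is no active controller. */
    int metricValue() {
        return activeControllerId.get().orElse(-1);
    }
}
{code}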



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16475) Create unit test for TopicImageNode

2024-04-04 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16475:


 Summary: Create unit test for TopicImageNode
 Key: KAFKA-16475
 URL: https://issues.apache.org/jira/browse/KAFKA-16475
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16469) Metadata Schema Checker

2024-04-03 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16469:


 Summary: Metadata Schema Checker
 Key: KAFKA-16469
 URL: https://issues.apache.org/jira/browse/KAFKA-16469
 Project: Kafka
  Issue Type: New Feature
Reporter: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16411) Correctly migrate default client quota entities in KRaft migration

2024-03-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16411.
--
Resolution: Fixed

> Correctly migrate default client quota entities in KRaft migration
> --
>
> Key: KAFKA-16411
> URL: https://issues.apache.org/jira/browse/KAFKA-16411
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16428) Fix bug where config change notification znode may not get created during migration

2024-03-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16428.
--
Resolution: Fixed

> Fix bug where config change notification znode may not get created during 
> migration
> ---
>
> Key: KAFKA-16428
> URL: https://issues.apache.org/jira/browse/KAFKA-16428
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.6.1
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16435) Add test for KAFKA-16428

2024-03-27 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16435:


 Summary: Add test for KAFKA-16428
 Key: KAFKA-16435
 URL: https://issues.apache.org/jira/browse/KAFKA-16435
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe


Add a test for KAFKA-16428: Fix bug where config change notification znode may 
not get created during migration #15608



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16428) Fix bug where config change notification znode may not get created during migration

2024-03-26 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16428:


 Summary: Fix bug where config change notification znode may not 
get created during migration
 Key: KAFKA-16428
 URL: https://issues.apache.org/jira/browse/KAFKA-16428
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16411) Correctly migrate default client quota entities in KRaft migration

2024-03-22 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16411:
-
Fix Version/s: 3.6.2
Affects Version/s: 3.4.0
 Priority: Blocker  (was: Major)

> Correctly migrate default client quota entities in KRaft migration
> --
>
> Key: KAFKA-16411
> URL: https://issues.apache.org/jira/browse/KAFKA-16411
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.6.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16411) Correctly migrate default client quota entities in KRaft migration

2024-03-22 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16411:
-
Summary: Correctly migrate default client quota entities in KRaft migration 
 (was: Correctly migrate default entities in KRaft migration)

> Correctly migrate default client quota entities in KRaft migration
> --
>
> Key: KAFKA-16411
> URL: https://issues.apache.org/jira/browse/KAFKA-16411
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16411) Correctly migrate default entities in KRaft migration

2024-03-22 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16411:


 Summary: Correctly migrate default entities in KRaft migration
 Key: KAFKA-16411
 URL: https://issues.apache.org/jira/browse/KAFKA-16411
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16222) KRaft Migration: desanitize entity name when migrate client quotas

2024-03-22 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16222:
-
Summary: KRaft Migration: desanitize entity name when migrate client quotas 
 (was: KRaft Migration: Incorrect default user-principal quota after migration)

> KRaft Migration: desanitize entity name when migrate client quotas
> --
>
> Key: KAFKA-16222
> URL: https://issues.apache.org/jira/browse/KAFKA-16222
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft, migration
>Affects Versions: 3.7.0, 3.6.1
>Reporter: Dominik
>Assignee: PoAn Yang
>Priority: Blocker
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
>
> We observed that our default user quota seems not to be migrated correctly.
> Before Migration:
> bin/kafka-configs.sh --describe --all --entity-type users
> Quota configs for the *default user-principal* are 
> consumer_byte_rate=100.0, producer_byte_rate=100.0
> Quota configs for user-principal 'myuser@prod' 
> are consumer_byte_rate=1.5E8, producer_byte_rate=1.5E8
> After Migration:
> bin/kafka-configs.sh --describe --all --entity-type users
> Quota configs for *user-principal ''* are consumer_byte_rate=100.0, 
> producer_byte_rate=100.0
> Quota configs for user-principal 'myuser%40prod' 
> are consumer_byte_rate=1.5E8, producer_byte_rate=1.5E8
>  
> Additional finding: our user names contain a "@", which also leads to an 
> incorrect state after migration.
>  
>  
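As an illustration of the renamed issue (a sketch; the migration wiring around it is assumed, but Sanitizer is the existing Kafka utility for quota entity names): names stored in ZooKeeper are percent-encoded, so the migration must desanitize them before writing them into KRaft.

{code:java}
import org.apache.kafka.common.utils.Sanitizer;

public final class QuotaEntityNameExample {
    public static void main(String[] args) {
        // ZooKeeper stores the quota entity under its sanitized (percent-encoded) name.
        String zkNodeName = "myuser%40prod";

        // Copying zkNodeName into KRaft as-is reproduces the bug ('myuser%40prod').
        // Desanitizing first recovers the real principal.
        String principal = Sanitizer.desanitize(zkNodeName);
        System.out.println(principal); // prints: myuser@prod
    }
}
{code}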



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16321) Default directory ids to MIGRATING, not UNASSIGNED

2024-03-01 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16321:


 Summary: Default directory ids to MIGRATING, not UNASSIGNED
 Key: KAFKA-16321
 URL: https://issues.apache.org/jira/browse/KAFKA-16321
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


Directory ids should be defaulted to MIGRATING, not UNASSIGNED.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration

2024-02-01 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16216.
--
Fix Version/s: 3.7.0
 Reviewer: Colin McCabe
 Assignee: David Arthur  (was: Colin McCabe)
   Resolution: Fixed

> Reduce batch size for initial metadata load during ZK migration
> ---
>
> Key: KAFKA-16216
> URL: https://issues.apache.org/jira/browse/KAFKA-16216
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: David Arthur
>Priority: Major
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration

2024-02-01 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe reassigned KAFKA-16216:


Assignee: Colin McCabe  (was: David Arthur)

> Reduce batch size for initial metadata load during ZK migration
> ---
>
> Key: KAFKA-16216
> URL: https://issues.apache.org/jira/browse/KAFKA-16216
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration

2024-02-01 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16216:


 Summary: Reduce batch size for initial metadata load during ZK 
migration
 Key: KAFKA-16216
 URL: https://issues.apache.org/jira/browse/KAFKA-16216
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: David Arthur






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16180) Full metadata request sometimes fails during zk migration

2024-01-19 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16180:
-
Description: 
Example:

{code}
java.util.NoSuchElementException: 
dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
at scala.collection.mutable.AnyRefMap.apply(AnyRefMap.scala:207)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2(ZkMetadataCache.scala:112)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2$adapted(ZkMetadataCache.scala:105)
at scala.collection.immutable.HashSet.foreach(HashSet.scala:958)
at 
kafka.server.metadata.ZkMetadataCache$.maybeInjectDeletedPartitionsFromFullMetadataRequest(ZkMetadataCache.scala:105)
at 
kafka.server.metadata.ZkMetadataCache.$anonfun$updateMetadata$1(ZkMetadataCache.scala:506)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:183)
at 
kafka.server.metadata.ZkMetadataCache.updateMetadata(ZkMetadataCache.scala:496)
at 
kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:2482)
at 
kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:733)
at kafka.server.KafkaApis.handle(KafkaApis.scala:349)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8(KafkaRequestHandler.scala:210)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8$adapted(KafkaRequestHandler.scala:210)
at 
io.confluent.kafka.availability.ThreadCountersManager.wrapEngine(ThreadCountersManager.java:146)
at kafka.server.KafkaRequestHandler.poll(KafkaRequestHandler.scala:210)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:151)
at java.base/java.lang.Thread.run(Thread.java:1583)
at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66)
{code}

  was:
Example:

{code}
java.util.NoSuchElementException: 
lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
at scala.collection.mutable.AnyRefMap.apply(AnyRefMap.scala:207)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2(ZkMetadataCache.scala:112)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2$adapted(ZkMetadataCache.scala:105)
at scala.collection.immutable.HashSet.foreach(HashSet.scala:958)
at 
kafka.server.metadata.ZkMetadataCache$.maybeInjectDeletedPartitionsFromFullMetadataRequest(ZkMetadataCache.scala:105)
at 
kafka.server.metadata.ZkMetadataCache.$anonfun$updateMetadata$1(ZkMetadataCache.scala:506)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:183)
at 
kafka.server.metadata.ZkMetadataCache.updateMetadata(ZkMetadataCache.scala:496)
at 
kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:2482)
at 
kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:733)
at kafka.server.KafkaApis.handle(KafkaApis.scala:349)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8(KafkaRequestHandler.scala:210)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8$adapted(KafkaRequestHandler.scala:210)
at 
io.confluent.kafka.availability.ThreadCountersManager.wrapEngine(ThreadCountersManager.java:146)
at kafka.server.KafkaRequestHandler.poll(KafkaRequestHandler.scala:210)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:151)
at java.base/java.lang.Thread.run(Thread.java:1583)
at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66)
{code}


> Full metadata request sometimes fails during zk migration
> -
>
> Key: KAFKA-16180
> URL: https://issues.apache.org/jira/browse/KAFKA-16180
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Colin McCabe
>Priority: Blocker
>
> Example:
> {code}
> java.util.NoSuchElementException: 
> dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
> at 
> scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
> at 
> scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
> at 

[jira] [Updated] (KAFKA-16180) Full metadata request sometimes fails during zk migration

2024-01-19 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16180:
-
Description: 
Example:

{code}
java.util.NoSuchElementException: 
lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
at scala.collection.mutable.AnyRefMap.apply(AnyRefMap.scala:207)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2(ZkMetadataCache.scala:112)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2$adapted(ZkMetadataCache.scala:105)
at scala.collection.immutable.HashSet.foreach(HashSet.scala:958)
at 
kafka.server.metadata.ZkMetadataCache$.maybeInjectDeletedPartitionsFromFullMetadataRequest(ZkMetadataCache.scala:105)
at 
kafka.server.metadata.ZkMetadataCache.$anonfun$updateMetadata$1(ZkMetadataCache.scala:506)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:183)
at 
kafka.server.metadata.ZkMetadataCache.updateMetadata(ZkMetadataCache.scala:496)
at 
kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:2482)
at 
kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:733)
at kafka.server.KafkaApis.handle(KafkaApis.scala:349)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8(KafkaRequestHandler.scala:210)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8$adapted(KafkaRequestHandler.scala:210)
at 
io.confluent.kafka.availability.ThreadCountersManager.wrapEngine(ThreadCountersManager.java:146)
at kafka.server.KafkaRequestHandler.poll(KafkaRequestHandler.scala:210)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:151)
at java.base/java.lang.Thread.run(Thread.java:1583)
at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66)
{code}

  was:
Example:

{{java.util.NoSuchElementException: 
lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
at scala.collection.mutable.AnyRefMap.apply(AnyRefMap.scala:207)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2(ZkMetadataCache.scala:112)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2$adapted(ZkMetadataCache.scala:105)
at scala.collection.immutable.HashSet.foreach(HashSet.scala:958)
at 
kafka.server.metadata.ZkMetadataCache$.maybeInjectDeletedPartitionsFromFullMetadataRequest(ZkMetadataCache.scala:105)
at 
kafka.server.metadata.ZkMetadataCache.$anonfun$updateMetadata$1(ZkMetadataCache.scala:506)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:183)
at 
kafka.server.metadata.ZkMetadataCache.updateMetadata(ZkMetadataCache.scala:496)
at 
kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:2482)
at 
kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:733)
at kafka.server.KafkaApis.handle(KafkaApis.scala:349)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8(KafkaRequestHandler.scala:210)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8$adapted(KafkaRequestHandler.scala:210)
at 
io.confluent.kafka.availability.ThreadCountersManager.wrapEngine(ThreadCountersManager.java:146)
at kafka.server.KafkaRequestHandler.poll(KafkaRequestHandler.scala:210)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:151)
at java.base/java.lang.Thread.run(Thread.java:1583)
at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66)}}


> Full metadata request sometimes fails during zk migration
> -
>
> Key: KAFKA-16180
> URL: https://issues.apache.org/jira/browse/KAFKA-16180
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Colin McCabe
>Priority: Blocker
>
> Example:
> {code}
> java.util.NoSuchElementException: 
> lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
> at 
> scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
> at 
> scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
> at 

[jira] [Created] (KAFKA-16180) Full metadata request sometimes fails during zk migration

2024-01-19 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16180:


 Summary: Full metadata request sometimes fails during zk migration
 Key: KAFKA-16180
 URL: https://issues.apache.org/jira/browse/KAFKA-16180
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe


Example:

{{java.util.NoSuchElementException: 
lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
at scala.collection.mutable.AnyRefMap.apply(AnyRefMap.scala:207)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2(ZkMetadataCache.scala:112)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2$adapted(ZkMetadataCache.scala:105)
at scala.collection.immutable.HashSet.foreach(HashSet.scala:958)
at 
kafka.server.metadata.ZkMetadataCache$.maybeInjectDeletedPartitionsFromFullMetadataRequest(ZkMetadataCache.scala:105)
at 
kafka.server.metadata.ZkMetadataCache.$anonfun$updateMetadata$1(ZkMetadataCache.scala:506)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:183)
at 
kafka.server.metadata.ZkMetadataCache.updateMetadata(ZkMetadataCache.scala:496)
at 
kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:2482)
at 
kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:733)
at kafka.server.KafkaApis.handle(KafkaApis.scala:349)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8(KafkaRequestHandler.scala:210)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8$adapted(KafkaRequestHandler.scala:210)
at 
io.confluent.kafka.availability.ThreadCountersManager.wrapEngine(ThreadCountersManager.java:146)
at kafka.server.KafkaRequestHandler.poll(KafkaRequestHandler.scala:210)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:151)
at java.base/java.lang.Thread.run(Thread.java:1583)
at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66)}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16078) InterBrokerProtocolVersion defaults to non-production MetadataVersion

2024-01-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16078:
-
Description: The InterBrokerProtocolVersion currently defaults to a 
non-production MetadataVersion. We should be more consistent about getting the 
latest MetadataVersion.  (was: Be more consistent about getting the latest 
MetadataVersion)

> InterBrokerProtocolVersion defaults to non-production MetadataVersion
> -
>
> Key: KAFKA-16078
> URL: https://issues.apache.org/jira/browse/KAFKA-16078
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Major
>
> The InterBrokerProtocolVersion currently defaults to a non-production 
> MetadataVersion. We should be more consistent about getting the latest 
> MetadataVersion.
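A minimal sketch of the intent (the Version record and method are illustrative assumptions, not the actual MetadataVersion API): defaults such as inter.broker.protocol.version should resolve to the newest production-ready version, not simply the newest version defined in code.

{code:java}
import java.util.List;

final class LatestVersionSketch {
    record Version(String name, boolean production) {}

    /** Pick the newest version that is marked production-ready. */
    static Version latestProduction(List<Version> allVersionsInOrder) {
        for (int i = allVersionsInOrder.size() - 1; i >= 0; i--) {
            Version v = allVersionsInOrder.get(i);
            if (v.production()) {
                return v;   // skip any newer, still-unstable versions at the end of the list
            }
        }
        throw new IllegalStateException("no production version defined");
    }
}
{code}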



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16078) Be more consistent about getting the latest MetadataVersion

2024-01-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16078.
--
Fix Version/s: 3.7.0
 Reviewer: Colin Patrick McCabe
   Resolution: Fixed

> Be more consistent about getting the latest MetadataVersion
> ---
>
> Key: KAFKA-16078
> URL: https://issues.apache.org/jira/browse/KAFKA-16078
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Major
> Fix For: 3.7.0
>
>
> The InterBrokerProtocolVersion currently defaults to a non-production 
> MetadataVersion. We should be more consistent about getting the latest 
> MetadataVersion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16078) Be more consistent about getting the latest MetadataVersion

2024-01-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16078:
-
Summary: Be more consistent about getting the latest MetadataVersion  (was: 
InterBrokerProtocolVersion defaults to non-production MetadataVersion)

> Be more consistent about getting the latest MetadataVersion
> ---
>
> Key: KAFKA-16078
> URL: https://issues.apache.org/jira/browse/KAFKA-16078
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Major
>
> The InterBrokerProtocolVersion currently defaults to a non-production 
> MetadataVersion. We should be more consistent about getting the latest 
> MetadataVersion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16078) InterBrokerProtocolVersion defaults to non-production MetadataVersion

2024-01-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16078:
-
Description: Be more consistent about getting the latest MetadataVersion

> InterBrokerProtocolVersion defaults to non-production MetadataVersion
> -
>
> Key: KAFKA-16078
> URL: https://issues.apache.org/jira/browse/KAFKA-16078
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Major
>
> Be more consistent about getting the latest MetadataVersion



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16131) Repeated UnsupportedVersionException logged when running Kafka 3.7.0-RC2 KRaft cluster with metadata version 3.6

2024-01-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16131.
--
Resolution: Fixed

> Repeated UnsupportedVersionException logged when running Kafka 3.7.0-RC2 
> KRaft cluster with metadata version 3.6
> 
>
> Key: KAFKA-16131
> URL: https://issues.apache.org/jira/browse/KAFKA-16131
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Jakub Scholz
>Assignee: Proven Provenzano
>Priority: Blocker
> Fix For: 3.7.0
>
>
> When running Kafka 3.7.0-RC2 as a KRaft cluster with the metadata version set to 
> 3.6-IV2, it repeatedly throws errors like this in the 
> controller logs:
> {quote}2024-01-13 16:58:01,197 INFO [QuorumController id=0] 
> assignReplicasToDirs: event failed with UnsupportedVersionException in 15 
> microseconds. (org.apache.kafka.controller.QuorumController) 
> [quorum-controller-0-event-handler]
> 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error 
> handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, apiVersion=0, 
> clientId=1000, correlationId=14, headerVersion=2) – 
> AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5, 
> directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ, 
> topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ, 
> partitions=[PartitionData(partitionIndex=2), 
> PartitionData(partitionIndex=1)]), TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ, 
> partitions=[PartitionData(partitionIndex=0)])])]) with context 
> RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, 
> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2), 
> connectionId='172.16.14.219:9090-172.16.14.217:53590-7', 
> clientAddress=/[172.16.14.217|http://172.16.14.217/], 
> principal=User:CN=my-cluster-kafka,O=io.strimzi, 
> listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL, 
> clientInformation=ClientInformation(softwareName=apache-kafka-java, 
> softwareVersion=3.7.0), fromPrivilegedListener=false, 
> principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2])
>  (kafka.server.ControllerApis) [quorum-controller-0-event-handler]
> java.util.concurrent.CompletionException: 
> org.apache.kafka.common.errors.UnsupportedVersionException: Directory 
> assignment is not supported yet.
> at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
>  at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
>  at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
>  at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>  at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>  at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
>  at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
>  at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
>  at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
>  at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>  at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>  at java.base/java.lang.Thread.run(Thread.java:840)
> Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: 
> Directory assignment is not supported yet.
> {quote}
>  
> With the metadata version set to 3.6-IV2, it makes sense that the request is 
> not supported. But in that case, the request should not be sent at all.
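A hedged sketch of the client-side guard the last sentence suggests (the MetadataVersionView interface is an illustrative stand-in for the broker's metadata-version lookup): skip the RPC entirely when the controller cannot support it.

{code:java}
final class AssignReplicasToDirsGuardSketch {
    interface MetadataVersionView {
        boolean directoryAssignmentSupported();   // true only for metadata versions that support directory assignment
    }

    /** Only enqueue an AssignReplicasToDirs request when the metadata version supports it. */
    static boolean shouldSendAssignReplicasToDirs(MetadataVersionView mv, boolean pendingAssignments) {
        if (!mv.directoryAssignmentSupported()) {
            return false;   // avoids the repeated UnsupportedVersionException on the controller
        }
        return pendingAssignments;
    }
}
{code}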



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16101) KRaft migration rollback documentation is incorrect

2024-01-16 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16101:
-
Summary: KRaft migration rollback documentation is incorrect  (was: KRaft 
migration documentation is incorrect)

> KRaft migration rollback documentation is incorrect
> ---
>
> Key: KAFKA-16101
> URL: https://issues.apache.org/jira/browse/KAFKA-16101
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.6.1
>Reporter: Paolo Patierno
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Hello,
> I was trying the KRaft migration rollback procedure locally and I came across 
> a potential bug, or at least a situation where the cluster is not 
> usable/available for a certain amount of time.
> In order to test the procedure, I start with a one broker (broker ID = 0) and 
> one zookeeper node cluster. Then I start the migration with a one KRaft 
> controller node (broker ID = 1). The migration runs fine and it reaches the 
> point of "dual write" state.
> From this point, I try to run the rollback procedure as described in the 
> documentation.
> As first step, this involves ...
>  * stopping the broker
>  * removing the __cluster_metadata folder
>  * removing ZooKeeper migration flag and controller(s) related configuration 
> from the broker
>  * restarting the broker
> With the above steps done, the broker starts in ZooKeeper mode (no migration, 
> no KRaft controllers knowledge) and it keeps logging the following messages 
> in DEBUG:
> {code:java}
> [2024-01-08 11:51:20,608] DEBUG 
> [zk-broker-0-to-controller-forwarding-channel-manager]: Controller isn't 
> cached, looking for local metadata changes 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,608] DEBUG 
> [zk-broker-0-to-controller-forwarding-channel-manager]: No controller 
> provided, retrying after backoff 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG 
> [zk-broker-0-to-controller-alter-partition-channel-manager]: Controller isn't 
> cached, looking for local metadata changes 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG 
> [zk-broker-0-to-controller-alter-partition-channel-manager]: No controller 
> provided, retrying after backoff 
> (kafka.server.BrokerToControllerRequestThread) {code}
> What's happening should be clear.
> The /controller znode in ZooKeeper still reports the KRaft controller (broker 
> ID = 1) as controller. The broker gets it from the znode but doesn't know how 
> to reach it.
> The issue is that until the procedure is fully completed with the next 
> steps (shutting down the KRaft controller, deleting the /controller znode), the 
> cluster is unusable. Any admin or client operation against the broker doesn't 
> work; it just hangs, and the broker doesn't reply.
> Extending this scenario to a more complex one, with 10-20-50 brokers and 
> partition replicas spread across them: as the brokers are rolled one by 
> one (in ZK mode) and hit the above error, the topics become 
> unavailable one after the other, until all brokers are in that state and 
> nothing works. This is because, from the perspective of the (still running) 
> KRaft controller, the brokers are no longer available and the partitions' 
> replicas are out of sync.
> Of course, as soon as you complete the rollback procedure, after deleting the 
> /controller znode, the brokers are able to elect a new controller among 
> themselves and everything recovers.
> My first question: isn't the cluster supposed to keep working, and remain 
> available, during the rollback while the procedure is not yet completed? Or is 
> cluster unavailability an accepted assumption during the rollback, until it's 
> fully completed?
> This "unavailability" time window could be reduced by deleting the 
> /controller znode before shutting down the KRaft controllers, to allow the 
> brokers to elect a new controller among themselves, but in that case could 
> there be a race condition where the still-running KRaft controllers steal 
> leadership again?
> Or is there perhaps something missing in the documentation that is causing 
> this problem?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16132) Upgrading from 3.6 to 3.7 in KRaft will have seconds of partitions unavailable

2024-01-15 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807044#comment-17807044
 ] 

Colin McCabe commented on KAFKA-16132:
--

I think we need to look at this more, but it may be a blocker.

> Upgrading from 3.6 to 3.7 in KRaft will have seconds of partitions unavailable
> --
>
> Key: KAFKA-16132
> URL: https://issues.apache.org/jira/browse/KAFKA-16132
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>Priority: Major
>
> When upgrading from 3.6 to 3.7, we noticed that after upgrading the metadata 
> version, all the partitions are reset at one time, which causes a short 
> period of unavailability. This didn't happen before. 
> {code:java}
> [2024-01-15 20:45:19,757] INFO [BrokerMetadataPublisher id=2] Updating 
> metadata.version to 19 at offset OffsetAndEpoch(offset=229, epoch=2). 
> (kafka.server.metadata.BrokerMetadataPublisher)
> [2024-01-15 20:45:29,915] INFO [ReplicaFetcherManager on broker 2] Removed 
> fetcher for partitions Set(t1-29, t1-25, t1-21, t1-17, t1-46, t1-13, t1-42, 
> t1-9, t1-38, t1-5, t1-34, t1-1, t1-30, t1-26, t1-22, t1-18, t1-47, t1-14, 
> t1-43, t1-10, t1-39, t1-6, t1-35, t1-2, t1-31, t1-27, t1-23, t1-19, t1-48, 
> t1-15, t1-44, t1-11, t1-40, t1-7, t1-36, t1-3, t1-32, t1-28, t1-24, t1-20, 
> t1-49, t1-16, t1-45, t1-12, t1-41, t1-8, t1-37, t1-4, t1-33, t1-0) 
> (kafka.server.ReplicaFetcherManager)
> {code}
> Complete log:
> https://gist.github.com/showuon/665aa3ce6afd59097a2662f8260ecc10
> Steps:
> 1. start up a 3.6 kafka cluster in KRaft with 1 broker
> 2. create a topic
> 3. upgrade the binary to 3.7
> 4. use kafka-features.sh to upgrade to 3.7 metadata version
> 5. check the log (and metrics if interested)
> Analysis:
> In 3.7, we have JBOD support in KRaft, so the partitionRegistration gained a 
> new directory field, which causes a diff to be found while comparing the delta. 
> We might be able to recognize that this directory-only change doesn't need to 
> reset the leader/follower state, and just update the metadata, to avoid causing 
> unavailability. 
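A hedged sketch of that idea (types and methods are simplified assumptions about the topic-delta handling): if the only difference for a partition is the new directory field, update the metadata without removing and re-adding fetchers.

{code:java}
import java.util.List;
import java.util.Objects;

final class PartitionDeltaSketch {
    record PartitionState(int leader, int leaderEpoch, List<Integer> isr, List<String> directories) {}

    /** True when leader, epoch and ISR are unchanged and only the directory field differs. */
    static boolean isDirectoryOnlyChange(PartitionState before, PartitionState after) {
        return before.leader() == after.leader()
                && before.leaderEpoch() == after.leaderEpoch()
                && before.isr().equals(after.isr())
                && !Objects.equals(before.directories(), after.directories());
    }
    // Callers could then skip the fetcher removal/re-add for such partitions and just
    // update the in-memory metadata, avoiding the brief unavailability described above.
}
{code}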



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16121) Partition reassignments in ZK migration dual write mode stalled until leader epoch incremented

2024-01-15 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16121.
--
Fix Version/s: 3.7.0
 Reviewer: Colin McCabe
 Assignee: David Mao
   Resolution: Duplicate

> Partition reassignments in ZK migration dual write mode stalled until leader 
> epoch incremented
> --
>
> Key: KAFKA-16121
> URL: https://issues.apache.org/jira/browse/KAFKA-16121
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Assignee: David Mao
>Priority: Major
> Fix For: 3.7.0
>
>
> I noticed this in an integration test in 
> https://github.com/apache/kafka/pull/15184
> In ZK mode, partition leaders rely on the LeaderAndIsr request to be notified 
> of new replicas as part of a reassignment. In ZK mode, we ignore any 
> LeaderAndIsr request where the partition leader epoch is less than or equal 
> to the current partition leader epoch.
> In KRaft mode, we do not bump the leader epoch when starting a new 
> reassignment, see: `triggerLeaderEpochBumpIfNeeded`. This means that the 
> leader will ignore the LISR request initiating the reassignment until a 
> leader epoch bump is triggered through another means, for instance preferred 
> leader election.
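For context, a minimal sketch of the ZK-mode rule described above (simplified, not the actual ReplicaManager code): partition state is only applied when the request carries a strictly newer leader epoch, which is why a reassignment that does not bump the epoch stalls until something else (for instance, preferred leader election) increments it.

{code:java}
final class LeaderAndIsrEpochCheckSketch {
    /** ZK-mode brokers only apply partition state with a strictly newer leader epoch. */
    static boolean shouldApply(int requestLeaderEpoch, int currentLeaderEpoch) {
        return requestLeaderEpoch > currentLeaderEpoch;
    }
}
{code}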



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16120) Fix partition reassignment during ZK migration

2024-01-15 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16120:
-
Description: When a reassignment is completed in ZK migration hybrid mode, 
the `StopReplica` sent by the kraft quorum migration propagator is sent with 
`delete = false` for deleted replicas when processing the topic delta. This 
results in stray replicas.  (was: When a reassignment is completed in ZK 
migration dual-write mode, the `StopReplica` sent by the kraft quorum migration 
propagator is sent with `delete = false` for deleted replicas when processing 
the topic delta. This results in stray replicas.)
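A hedged sketch of the fix's intent (simplified types, not the actual migration propagator): replicas dropped by a completed reassignment should receive StopReplica with delete = true so they do not linger as stray logs.

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StopReplicaDeleteFlagSketch {
    /** Replicas present before but not after the reassignment must be deleted, not just stopped. */
    static Set<Integer> replicasToDelete(List<Integer> before, List<Integer> after) {
        Set<Integer> removed = new HashSet<>(before);
        removed.removeAll(after);
        return removed;   // these brokers should get StopReplica(delete = true)
    }
}
{code}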

> Fix partition reassignment during ZK migration
> --
>
> Key: KAFKA-16120
> URL: https://issues.apache.org/jira/browse/KAFKA-16120
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Assignee: David Mao
>Priority: Major
> Fix For: 3.7.0
>
>
> When a reassignment is completed in ZK migration hybrid mode, the 
> `StopReplica` sent by the kraft quorum migration propagator is sent with 
> `delete = false` for deleted replicas when processing the topic delta. This 
> results in stray replicas.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16120) Fix partition reassignment during ZK migration

2024-01-15 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16120:
-
Summary: Fix partition reassignment during ZK migration  (was: Partition 
reassignments in ZK migration dual write leaves stray partitions)

> Fix partition reassignment during ZK migration
> --
>
> Key: KAFKA-16120
> URL: https://issues.apache.org/jira/browse/KAFKA-16120
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Assignee: David Mao
>Priority: Major
> Fix For: 3.7.0
>
>
> When a reassignment is completed in ZK migration dual-write mode, the 
> `StopReplica` sent by the kraft quorum migration propagator is sent with 
> `delete = false` for deleted replicas when processing the topic delta. This 
> results in stray replicas.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16101) KRaft migration documentation is incorrect

2024-01-15 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe reassigned KAFKA-16101:


Fix Version/s: 3.7.0
 Assignee: Colin McCabe
 Priority: Blocker  (was: Major)

> KRaft migration documentation is incorrect
> --
>
> Key: KAFKA-16101
> URL: https://issues.apache.org/jira/browse/KAFKA-16101
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.6.1
>Reporter: Paolo Patierno
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Hello,
> I was trying the KRaft migration rollback procedure locally and I came across 
> a potential bug, or at least a situation where the cluster is not 
> usable/available for a certain amount of time.
> In order to test the procedure, I start with a one broker (broker ID = 0) and 
> one zookeeper node cluster. Then I start the migration with a one KRaft 
> controller node (broker ID = 1). The migration runs fine and it reaches the 
> point of "dual write" state.
> From this point, I try to run the rollback procedure as described in the 
> documentation.
> As first step, this involves ...
>  * stopping the broker
>  * removing the __cluster_metadata folder
>  * removing ZooKeeper migration flag and controller(s) related configuration 
> from the broker
>  * restarting the broker
> With the above steps done, the broker starts in ZooKeeper mode (no migration, 
> no KRaft controllers knowledge) and it keeps logging the following messages 
> in DEBUG:
> {code:java}
> [2024-01-08 11:51:20,608] DEBUG 
> [zk-broker-0-to-controller-forwarding-channel-manager]: Controller isn't 
> cached, looking for local metadata changes 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,608] DEBUG 
> [zk-broker-0-to-controller-forwarding-channel-manager]: No controller 
> provided, retrying after backoff 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG 
> [zk-broker-0-to-controller-alter-partition-channel-manager]: Controller isn't 
> cached, looking for local metadata changes 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG 
> [zk-broker-0-to-controller-alter-partition-channel-manager]: No controller 
> provided, retrying after backoff 
> (kafka.server.BrokerToControllerRequestThread) {code}
> What's happening should be clear.
> The /controller znode in ZooKeeper still reports the KRaft controller (broker 
> ID = 1) as controller. The broker gets it from the znode but doesn't know how 
> to reach it.
> The issue is that until the procedure is fully completed with the next 
> steps (shutting down the KRaft controller, deleting the /controller znode), the 
> cluster is unusable. Any admin or client operation against the broker doesn't 
> work; it just hangs, and the broker doesn't reply.
> Extending this scenario to a more complex one, with 10-20-50 brokers and 
> partition replicas spread across them: as the brokers are rolled one by 
> one (in ZK mode) and hit the above error, the topics become 
> unavailable one after the other, until all brokers are in that state and 
> nothing works. This is because, from the perspective of the (still running) 
> KRaft controller, the brokers are no longer available and the partitions' 
> replicas are out of sync.
> Of course, as soon as you complete the rollback procedure, after deleting the 
> /controller znode, the brokers are able to elect a new controller among 
> themselves and everything recovers.
> My first question: isn't the cluster supposed to keep working, and remain 
> available, during the rollback while the procedure is not yet completed? Or is 
> cluster unavailability an accepted assumption during the rollback, until it's 
> fully completed?
> This "unavailability" time window could be reduced by deleting the 
> /controller znode before shutting down the KRaft controllers, to allow the 
> brokers to elect a new controller among themselves, but in that case could 
> there be a race condition where the still-running KRaft controllers steal 
> leadership again?
> Or is there perhaps something missing in the documentation that is causing 
> this problem?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16101) KRaft migration documentation is incorrect

2024-01-15 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806678#comment-17806678
 ] 

Colin McCabe commented on KAFKA-16101:
--

Hi Luke,

Thanks for testing rollback. I think this is a case where the documentation is 
wrong. The intention was for the steps to basically be:

1. roll all the brokers into zk mode, but with migration enabled
2. take down the kraft quorum
3. rmr /controller, allowing a hybrid broker to take over.
4. roll all the brokers into zk mode without migration enabled (if desired)

With these steps, there isn't really unavailability since a ZK controller can 
be elected quickly after the kraft quorum is gone.

I will update the docs.

> KRaft migration documentation is incorrect
> --
>
> Key: KAFKA-16101
> URL: https://issues.apache.org/jira/browse/KAFKA-16101
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.6.1
>Reporter: Paolo Patierno
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Hello,
> I was trying the KRaft migration rollback procedure locally and I came across 
> a potential bug, or at least a situation where the cluster is not 
> usable/available for a certain amount of time.
> In order to test the procedure, I start with a one broker (broker ID = 0) and 
> one zookeeper node cluster. Then I start the migration with a one KRaft 
> controller node (broker ID = 1). The migration runs fine and it reaches the 
> point of "dual write" state.
> From this point, I try to run the rollback procedure as described in the 
> documentation.
> As first step, this involves ...
>  * stopping the broker
>  * removing the __cluster_metadata folder
>  * removing ZooKeeper migration flag and controller(s) related configuration 
> from the broker
>  * restarting the broker
> With the above steps done, the broker starts in ZooKeeper mode (no migration, 
> no KRaft controllers knowledge) and it keeps logging the following messages 
> in DEBUG:
> {code:java}
> [2024-01-08 11:51:20,608] DEBUG 
> [zk-broker-0-to-controller-forwarding-channel-manager]: Controller isn't 
> cached, looking for local metadata changes 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,608] DEBUG 
> [zk-broker-0-to-controller-forwarding-channel-manager]: No controller 
> provided, retrying after backoff 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG 
> [zk-broker-0-to-controller-alter-partition-channel-manager]: Controller isn't 
> cached, looking for local metadata changes 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG 
> [zk-broker-0-to-controller-alter-partition-channel-manager]: No controller 
> provided, retrying after backoff 
> (kafka.server.BrokerToControllerRequestThread) {code}
> What's happening should be clear.
> The /controller znode in ZooKeeper still reports the KRaft controller (broker 
> ID = 1) as controller. The broker gets it from the znode but doesn't know how 
> to reach it.
> The issue is that until the procedure isn't fully completed with the next 
> steps (shutting down KRaft controller, deleting /controller znode), the 
> cluster is unusable. Any admin or client operation against the broker doesn't 
> work, just hangs, the broker doesn't reply.
> Imagining this scenario to a more complex one with 10-20-50 brokers and 
> partitions' replicas spread across them, when the brokers are rolled one by 
> one (in ZK mode) reporting the above error, the topics will become not 
> available one after the other, until all brokers are in such a state and 
> nothing can work. This is because from a KRaft controller perspective (still 
> running), the brokers are not available anymore and the partitions' replicas 
> are out of sync.
> Of course, as soon as you complete the rollback procedure, after deleting the 
> /controller znode, the brokers are able to elect a new controller among them 
> and everything recovers to work.
> My first question ... isn't the cluster supposed to work during rollback and 
> being always available during the rollback when the procedure is not 
> completed yet? Or having the cluster not available is an assumption during 
> the rollback, until it's fully completed?
> This "unavailability" time window could be reduced by deleting the 
> /controller znode before shutting down the KRaft controllers to allow the 
> brokers electing a new controller among them, but in this case, could there 
> be a race condition where KRaft controllers still running could steal 
> leadership again?
> Or is there anything missing in the documentation maybe which is driving to 
> this problem?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16101) KRaft migration documentation is incorrect

2024-01-15 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16101:
-
Summary: KRaft migration documentation is incorrect  (was: Kafka cluster 
unavailable during KRaft migration rollback procedure)

> KRaft migration documentation is incorrect
> --
>
> Key: KAFKA-16101
> URL: https://issues.apache.org/jira/browse/KAFKA-16101
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.6.1
>Reporter: Paolo Patierno
>Priority: Major
>
> Hello,
> I was trying the KRaft migration rollback procedure locally and I came across 
> a potential bug or anyway a situation where the cluster is not 
> usable/available for a certain amount of time.
> In order to test the procedure, I start with a one broker (broker ID = 0) and 
> one zookeeper node cluster. Then I start the migration with a one KRaft 
> controller node (broker ID = 1). The migration runs fine and it reaches the 
> point of "dual write" state.
> From this point, I try to run the rollback procedure as described in the 
> documentation.
> As first step, this involves ...
>  * stopping the broker
>  * removing the __cluster_metadata folder
>  * removing ZooKeeper migration flag and controller(s) related configuration 
> from the broker
>  * restarting the broker
> With the above steps done, the broker starts in ZooKeeper mode (no migration, 
> no KRaft controllers knowledge) and it keeps logging the following messages 
> in DEBUG:
> {code:java}
> [2024-01-08 11:51:20,608] DEBUG 
> [zk-broker-0-to-controller-forwarding-channel-manager]: Controller isn't 
> cached, looking for local metadata changes 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,608] DEBUG 
> [zk-broker-0-to-controller-forwarding-channel-manager]: No controller 
> provided, retrying after backoff 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG 
> [zk-broker-0-to-controller-alter-partition-channel-manager]: Controller isn't 
> cached, looking for local metadata changes 
> (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG 
> [zk-broker-0-to-controller-alter-partition-channel-manager]: No controller 
> provided, retrying after backoff 
> (kafka.server.BrokerToControllerRequestThread) {code}
> What's happening should be clear.
> The /controller znode in ZooKeeper still reports the KRaft controller (broker 
> ID = 1) as controller. The broker gets it from the znode but doesn't know how 
> to reach it.
> The issue is that until the procedure isn't fully completed with the next 
> steps (shutting down KRaft controller, deleting /controller znode), the 
> cluster is unusable. Any admin or client operation against the broker doesn't 
> work, just hangs, the broker doesn't reply.
> Imagining this scenario to a more complex one with 10-20-50 brokers and 
> partitions' replicas spread across them, when the brokers are rolled one by 
> one (in ZK mode) reporting the above error, the topics will become not 
> available one after the other, until all brokers are in such a state and 
> nothing can work. This is because from a KRaft controller perspective (still 
> running), the brokers are not available anymore and the partitions' replicas 
> are out of sync.
> Of course, as soon as you complete the rollback procedure, after deleting the 
> /controller znode, the brokers are able to elect a new controller among them 
> and everything recovers to work.
> My first question ... isn't the cluster supposed to work during rollback and 
> being always available during the rollback when the procedure is not 
> completed yet? Or having the cluster not available is an assumption during 
> the rollback, until it's fully completed?
> This "unavailability" time window could be reduced by deleting the 
> /controller znode before shutting down the KRaft controllers to allow the 
> brokers electing a new controller among them, but in this case, could there 
> be a race condition where KRaft controllers still running could steal 
> leadership again?
> Or is there anything missing in the documentation maybe which is driving to 
> this problem?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16120) Partition reassignments in ZK migration dual write leaves stray partitions

2024-01-14 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16120.
--
Fix Version/s: 3.7.0
 Reviewer: Colin McCabe
 Assignee: David Mao
   Resolution: Fixed

> Partition reassignments in ZK migration dual write leaves stray partitions
> --
>
> Key: KAFKA-16120
> URL: https://issues.apache.org/jira/browse/KAFKA-16120
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Assignee: David Mao
>Priority: Major
> Fix For: 3.7.0
>
>
> When a reassignment is completed in ZK migration dual-write mode, the 
> `StopReplica` sent by the kraft quorum migration propagator is sent with 
> `delete = false` for deleted replicas when processing the topic delta. This 
> results in stray replicas.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16120) Partition reassignments in ZK migration dual write leaves stray partitions

2024-01-14 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806565#comment-17806565
 ] 

Colin McCabe commented on KAFKA-16120:
--

KAFKA-14616 is a separate bug, unfortunately. I am working on a fix for that one.

> Partition reassignments in ZK migration dual write leaves stray partitions
> --
>
> Key: KAFKA-16120
> URL: https://issues.apache.org/jira/browse/KAFKA-16120
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Priority: Major
>
> When a reassignment is completed in ZK migration dual-write mode, the 
> `StopReplica` sent by the kraft quorum migration propagator is sent with 
> `delete = false` for deleted replicas when processing the topic delta. This 
> results in stray replicas.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16126) Kcontroller dynamic configurations may fail to apply at startup

2024-01-14 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16126:


 Summary: Kcontroller dynamic configurations may fail to apply at 
startup
 Key: KAFKA-16126
 URL: https://issues.apache.org/jira/browse/KAFKA-16126
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe


Some kcontroller dynamic configurations may fail to apply at startup. This 
happens because there is a race between registering the reconfigurables to the 
DynamicBrokerConfig class, and receiving the first update from the metadata 
publisher. We can fix this by registering the reconfigurables first. This seems 
to have been introduced by the "MINOR: Install ControllerServer metadata 
publishers sooner" change.
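
As a minimal illustration of the race (plain Java, not the actual ControllerServer or DynamicBrokerConfig code), a listener registered after the first update has already been published simply never sees it:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of the startup ordering problem: updates delivered before registration are lost.
public class RegistrationOrderDemo {
    static class Publisher {
        private final List<Consumer<String>> listeners = new ArrayList<>();
        void register(Consumer<String> listener) { listeners.add(listener); }
        void publish(String update) { listeners.forEach(l -> l.accept(update)); }
    }

    public static void main(String[] args) {
        // Broken ordering: the first metadata update arrives before the reconfigurable is registered.
        Publisher broken = new Publisher();
        broken.publish("dynamic config v1");                       // nobody is listening yet
        broken.register(u -> System.out.println("applied " + u));  // too late for v1

        // Fixed ordering (what registering the reconfigurables first achieves conceptually).
        Publisher fixed = new Publisher();
        fixed.register(u -> System.out.println("applied " + u));
        fixed.publish("dynamic config v1");                        // printed
    }
}
{code}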



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16094) BrokerRegistrationRequest.logDirs field must be ignorable

2024-01-09 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16094.
--
Fix Version/s: 3.7.0
   Resolution: Fixed

> BrokerRegistrationRequest.logDirs field must be ignorable
> -
>
> Key: KAFKA-16094
> URL: https://issues.apache.org/jira/browse/KAFKA-16094
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.7.0
>
>
> 3.7 brokers must be able to register with 3.6 and earlier controllers. So 
> this means that the logDirs field must be ignorable (aka, not sent) if the 
> highest BrokerRegistrationRequest version we can negotiate is older than v2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16094) BrokerRegistrationRequest.logDirs field must be ignorable

2024-01-08 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16094:
-
Summary: BrokerRegistrationRequest.logDirs field must be ignorable  (was: 
3.7 brokers must be able to register with 3.6 and earlier controllers)

> BrokerRegistrationRequest.logDirs field must be ignorable
> -
>
> Key: KAFKA-16094
> URL: https://issues.apache.org/jira/browse/KAFKA-16094
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
>
> 3.7 brokers must be able to register with 3.6 and earlier controllers. So 
> this means that the logDirs field must be ignorable (aka, not sent) if the 
> highest BrokerRegistrationRequest version we can negotiate is older than v2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16094) 3.7 brokers must be able to register with 3.6 and earlier controllers

2024-01-08 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16094:


 Summary: 3.7 brokers must be able to register with 3.6 and earlier 
controllers
 Key: KAFKA-16094
 URL: https://issues.apache.org/jira/browse/KAFKA-16094
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe


3.7 brokers must be able to register with 3.6 and earlier controllers. So this 
means that the logDirs field must be ignorable (aka, not sent) if the highest 
BrokerRegistrationRequest version we can negotiate is older than v2.
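
As a rough sketch of what "ignorable" means in practice (illustrative Java only, not the generated protocol code), the field is silently dropped when the negotiated version is too old to carry it, rather than failing the request:

{code:java}
import java.util.List;

// Toy illustration: an ignorable field is omitted for old versions instead of raising an error.
public class IgnorableFieldDemo {
    public static void main(String[] args) {
        short negotiatedVersion = 1;                    // e.g. registering with a 3.6 controller
        List<String> logDirs = List.of("d1", "d2");     // hypothetical directory ids

        if (negotiatedVersion >= 2) {
            System.out.println("sending logDirs=" + logDirs);
        } else {
            System.out.println("logDirs omitted: version " + negotiatedVersion + " cannot carry it");
        }
    }
}
{code}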



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14127.
--
Resolution: Fixed

> KIP-858: Handle JBOD broker disk failure in KRaft
> -
>
> Key: KAFKA-14127
> URL: https://issues.apache.org/jira/browse/KAFKA-14127
> Project: Kafka
>  Issue Type: Improvement
>  Components: jbod, kraft
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
>  Labels: 4.0-blocker, kip-500, kraft
> Fix For: 3.7.0
>
>
> Supporting configurations with multiple storage directories in KRaft mode



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15955) Migrating ZK brokers send dir assignments

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15955:
-
Parent: KAFKA-16061
Issue Type: Sub-task  (was: Bug)

> Migrating ZK brokers send dir assignments
> -
>
> Key: KAFKA-15955
> URL: https://issues.apache.org/jira/browse/KAFKA-15955
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Assignee: Proven Provenzano
>Priority: Major
>
> Broker in ZooKeeper mode, while in migration mode, should start sending 
> directory assignments to the KRaft Controller using AssignmentsManager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15955) Migrating ZK brokers send dir assignments

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15955:
-
Parent: (was: KAFKA-14127)
Issue Type: Bug  (was: Sub-task)

> Migrating ZK brokers send dir assignments
> -
>
> Key: KAFKA-15955
> URL: https://issues.apache.org/jira/browse/KAFKA-15955
> Project: Kafka
>  Issue Type: Bug
>Reporter: Igor Soarez
>Assignee: Proven Provenzano
>Priority: Major
>
> Broker in ZooKeeper mode, while in migration mode, should start sending 
> directory assignments to the KRaft Controller using AssignmentsManager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15650) Data-loss on leader shutdown right after partition creation?

2023-12-28 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801078#comment-17801078
 ] 

Colin McCabe commented on KAFKA-15650:
--

Based on our follow-up discussions, this is not an issue because partitions 
initially are in state UNASSIGNED, and only later get a directory. (Unless 
there is only a single directory -- then the controller assigns.)

> Data-loss on leader shutdown right after partition creation?
> 
>
> Key: KAFKA-15650
> URL: https://issues.apache.org/jira/browse/KAFKA-15650
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Priority: Major
>
> As per KIP-858, when a replica is created, the broker selects a log directory 
> to host the replica and queues the propagation of the directory assignment to 
> the controller. The replica becomes immediately active, it isn't blocked 
> until the controller confirms the metadata change. If the replica is the 
> leader replica it can immediately start accepting writes. 
> Consider the following scenario:
>  # A partition is created in some selected log directory, and some produce 
> traffic is accepted
>  # Before the broker is able to notify the controller of the directory 
> assignment, the broker shuts down
>  # Upon coming back online, the broker has an offline directory, the same 
> directory which was chosen to host the replica
>  # The broker assumes leadership for the replica, but cannot find it in any 
> available directory and has no way of knowing it was already created because 
> the directory assignment is still missing
>  # The replica is created and the previously produced records are lost
> Step 4. may seem unlikely due to ISR membership gating leadership, but even 
> assuming acks=all and replicas>1, if all other replicas are also offline the 
> broker may still gain leadership. Perhaps KIP-966 is relevant here.
> We may need to delay new replica activation until the assignment is 
> propagated successfully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15650) Data-loss on leader shutdown right after partition creation?

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15650.
--
Resolution: Not A Problem

> Data-loss on leader shutdown right after partition creation?
> 
>
> Key: KAFKA-15650
> URL: https://issues.apache.org/jira/browse/KAFKA-15650
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Priority: Major
>
> As per KIP-858, when a replica is created, the broker selects a log directory 
> to host the replica and queues the propagation of the directory assignment to 
> the controller. The replica becomes immediately active, it isn't blocked 
> until the controller confirms the metadata change. If the replica is the 
> leader replica it can immediately start accepting writes. 
> Consider the following scenario:
>  # A partition is created in some selected log directory, and some produce 
> traffic is accepted
>  # Before the broker is able to notify the controller of the directory 
> assignment, the broker shuts down
>  # Upon coming back online, the broker has an offline directory, the same 
> directory which was chosen to host the replica
>  # The broker assumes leadership for the replica, but cannot find it in any 
> available directory and has no way of knowing it was already created because 
> the directory assignment is still missing
>  # The replica is created and the previously produced records are lost
> Step 4. may seem unlikely due to ISR membership gating leadership, but even 
> assuming acks=all and replicas>1, if all other replicas are also offline the 
> broker may still gain leadership. Perhaps KIP-966 is relevant here.
> We may need to delay new replica activation until the assignment is 
> propagated successfully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15649) Handle directory failure timeout

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15649:
-
Parent: KAFKA-16061
Issue Type: Sub-task  (was: Bug)

> Handle directory failure timeout 
> -
>
> Key: KAFKA-15649
> URL: https://issues.apache.org/jira/browse/KAFKA-15649
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Priority: Minor
>
> If a broker with an offline log directory continues to fail to notify the 
> controller of either:
>  * the fact that the directory is offline; or
>  * of any replica assignment into a failed directory
> then the controller will not check if a leadership change is required, and 
> this may lead to partitions remaining indefinitely offline.
> KIP-858 proposes that the broker should shut down after a configurable 
> timeout to force a leadership change. Alternatively, the broker could also 
> request to be fenced, as long as there's a path for it to later become 
> unfenced.
> While this unavailability is possible in theory, in practice it's not easy to 
> entertain a scenario where a broker continues to appear as healthy before the 
> controller, but fails to send this information. So it's not clear if this is 
> a real problem. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15367) Test KRaft non-JBOD -> JBOD migration

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15367:
-
Parent: KAFKA-16061
Issue Type: Sub-task  (was: Bug)

> Test KRaft non-JBOD -> JBOD migration
> -
>
> Key: KAFKA-15367
> URL: https://issues.apache.org/jira/browse/KAFKA-15367
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Priority: Major
>
> A cluster running in KRaft without JBOD should be able to transition into 
> JBOD mode without issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15649) Handle directory failure timeout

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15649:
-
Parent: (was: KAFKA-14127)
Issue Type: Bug  (was: Sub-task)

> Handle directory failure timeout 
> -
>
> Key: KAFKA-15649
> URL: https://issues.apache.org/jira/browse/KAFKA-15649
> Project: Kafka
>  Issue Type: Bug
>Reporter: Igor Soarez
>Priority: Minor
>
> If a broker with an offline log directory continues to fail to notify the 
> controller of either:
>  * the fact that the directory is offline; or
>  * of any replica assignment into a failed directory
> then the controller will not check if a leadership change is required, and 
> this may lead to partitions remaining indefinitely offline.
> KIP-858 proposes that the broker should shut down after a configurable 
> timeout to force a leadership change. Alternatively, the broker could also 
> request to be fenced, as long as there's a path for it to later become 
> unfenced.
> While this unavailability is possible in theory, in practice it's not easy to 
> entertain a scenario where a broker continues to appear as healthy before the 
> controller, but fails to send this information. So it's not clear if this is 
> a real problem. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15368) Test ZK JBOD to KRaft migration

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15368:
-
Parent: KAFKA-16061
Issue Type: Sub-task  (was: Bug)

> Test ZK JBOD to KRaft migration
> ---
>
> Key: KAFKA-15368
> URL: https://issues.apache.org/jira/browse/KAFKA-15368
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Priority: Major
>
> A ZK cluster running JBOD should be able to migrate to KRaft mode without 
> issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15368) Test ZK JBOD to KRaft migration

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15368:
-
Parent: (was: KAFKA-14127)
Issue Type: Bug  (was: Sub-task)

> Test ZK JBOD to KRaft migration
> ---
>
> Key: KAFKA-15368
> URL: https://issues.apache.org/jira/browse/KAFKA-15368
> Project: Kafka
>  Issue Type: Bug
>Reporter: Igor Soarez
>Priority: Major
>
> A ZK cluster running JBOD should be able to migrate to KRaft mode without 
> issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15367) Test KRaft non-JBOD -> JBOD migration

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15367:
-
Parent: (was: KAFKA-14127)
Issue Type: Bug  (was: Sub-task)

> Test KRaft non-JBOD -> JBOD migration
> -
>
> Key: KAFKA-15367
> URL: https://issues.apache.org/jira/browse/KAFKA-15367
> Project: Kafka
>  Issue Type: Bug
>Reporter: Igor Soarez
>Priority: Major
>
> A cluster running in KRaft without JBOD should be able to transition into 
> JBOD mode without issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft

2023-12-28 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801077#comment-17801077
 ] 

Colin McCabe commented on KAFKA-14127:
--

JBOD is a feature that is in 3.7, so the fix version needs to be 3.7 here.

I'll move all the remaining work to a follow-up JIRA for clarity. Some of it is 
"nice to have" features, some of it is testing.

> KIP-858: Handle JBOD broker disk failure in KRaft
> -
>
> Key: KAFKA-14127
> URL: https://issues.apache.org/jira/browse/KAFKA-14127
> Project: Kafka
>  Issue Type: Improvement
>  Components: jbod, kraft
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
>  Labels: 4.0-blocker, kip-500, kraft
> Fix For: 3.7.0
>
>
> Supporting configurations with multiple storage directories in KRaft mode



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-14127:
-
Fix Version/s: 3.7.0
   (was: 3.8.0)

> KIP-858: Handle JBOD broker disk failure in KRaft
> -
>
> Key: KAFKA-14127
> URL: https://issues.apache.org/jira/browse/KAFKA-14127
> Project: Kafka
>  Issue Type: Improvement
>  Components: jbod, kraft
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
>  Labels: 4.0-blocker, kip-500, kraft
> Fix For: 3.7.0
>
>
> Supporting configurations with multiple storage directories in KRaft mode



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15359) log.dir.failure.timeout.ms configuration

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15359:
-
Parent: (was: KAFKA-14127)
Issue Type: Improvement  (was: Sub-task)

> log.dir.failure.timeout.ms configuration
> 
>
> Key: KAFKA-15359
> URL: https://issues.apache.org/jira/browse/KAFKA-15359
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
>
> If the broker repeatedly fails to communicate a log directory failure within a 
> configurable amount of time ({{log.dir.failure.timeout.ms}}), and it is the 
> leader for any replicas in the failed log directory, the broker will shut down, 
> as that is the only other way to guarantee that the controller will elect a new 
> leader for those partitions.
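
A broker-side override along these lines might look like the following (the 30-second value is purely illustrative; the KIP defines the actual default):

{code}
# Hypothetical example only; the value shown is not the real default.
log.dir.failure.timeout.ms=30000
{code}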



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15359) Support log.dir.failure.timeout.ms configuration for JBOD

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15359:
-
Summary: Support log.dir.failure.timeout.ms configuration for JBOD  (was: 
log.dir.failure.timeout.ms configuration)

> Support log.dir.failure.timeout.ms configuration for JBOD
> -
>
> Key: KAFKA-15359
> URL: https://issues.apache.org/jira/browse/KAFKA-15359
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
>
> If the broker repeatedly fails to communicate a log directory failure within a 
> configurable amount of time ({{log.dir.failure.timeout.ms}}), and it is the 
> leader for any replicas in the failed log directory, the broker will shut down, 
> as that is the only other way to guarantee that the controller will elect a new 
> leader for those partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16061) KRaft JBOD follow-ups and improvements

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-16061:
-
Summary: KRaft JBOD follow-ups and improvements  (was: JBOD follow-ups)

> KRaft JBOD follow-ups and improvements
> --
>
> Key: KAFKA-16061
> URL: https://issues.apache.org/jira/browse/KAFKA-16061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15359) log.dir.failure.timeout.ms configuration

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15359:
-
Parent: KAFKA-16061
Issue Type: Sub-task  (was: Improvement)

> log.dir.failure.timeout.ms configuration
> 
>
> Key: KAFKA-15359
> URL: https://issues.apache.org/jira/browse/KAFKA-15359
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
>
> If the broker repeatedly fails to communicate a log directory failure within a 
> configurable amount of time ({{log.dir.failure.timeout.ms}}), and it is the 
> leader for any replicas in the failed log directory, the broker will shut down, 
> as that is the only other way to guarantee that the controller will elect a new 
> leader for those partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16061) JBOD follow-ups

2023-12-28 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16061:


 Summary: JBOD follow-ups
 Key: KAFKA-16061
 URL: https://issues.apache.org/jira/browse/KAFKA-16061
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15979) Add KIP-1001 CurrentControllerId metric

2023-12-06 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15979:


 Summary: Add KIP-1001 CurrentControllerId metric
 Key: KAFKA-15979
 URL: https://issues.apache.org/jira/browse/KAFKA-15979
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15980) Add KIP-1001 CurrentControllerId metric

2023-12-06 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15980:


 Summary: Add KIP-1001 CurrentControllerId metric
 Key: KAFKA-15980
 URL: https://issues.apache.org/jira/browse/KAFKA-15980
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15956) MetadataShell must take the directory lock when reading

2023-12-01 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15956:


 Summary: MetadataShell must take the directory lock when reading
 Key: KAFKA-15956
 URL: https://issues.apache.org/jira/browse/KAFKA-15956
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe


MetadataShell must take the directory lock when reading files, to avoid 
unpleasant surprises from concurrent reads and writes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15311) Fix docs about reverting to ZooKeeper mode during KRaft migration

2023-11-29 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15311.
--
Fix Version/s: 3.7.0
   Resolution: Fixed

> Fix docs about reverting to ZooKeeper mode during KRaft migration
> -
>
> Key: KAFKA-15311
> URL: https://issues.apache.org/jira/browse/KAFKA-15311
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Minor
> Fix For: 3.7.0
>
>
> The docs incorrectly state that reverting to ZooKeeper mode during KRaft 
> migration is not possible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15922) Add MetadataVersion for JBOD

2023-11-28 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15922:


 Summary: Add MetadataVersion for JBOD
 Key: KAFKA-15922
 URL: https://issues.apache.org/jira/browse/KAFKA-15922
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15860) ControllerRegistration must be written out to the metadata image

2023-11-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15860.
--
Fix Version/s: 3.7.0
   Resolution: Fixed

> ControllerRegistration must be written out to the metadata image
> 
>
> Key: KAFKA-15860
> URL: https://issues.apache.org/jira/browse/KAFKA-15860
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14552) Remove no longer required server protocol versions in Kafka 4.0

2023-11-22 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788891#comment-17788891
 ] 

Colin McCabe commented on KAFKA-14552:
--

I could go either way. I think most of the configuration key removals are 
"implied" by other KIPs (or sometimes stated directly there) but I thought it 
would be good to gather them somewhere.

> Remove no longer required server protocol versions in Kafka 4.0
> ---
>
> Key: KAFKA-14552
> URL: https://issues.apache.org/jira/browse/KAFKA-14552
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Priority: Blocker
> Fix For: 4.0.0
>
>
> Kafka 4.0 will remove support for zk mode, and kraft mode became production 
> ready in Kafka 3.3. Furthermore, migration from zk mode to kraft mode will 
> require upgrading to the bridge release first (likely 3.5, but could also be 
> 3.6).
> This provides an opportunity to remove exclusively server side protocols 
> versions that only exist to allow direct upgrades from versions older than 
> 3.n where n is either 0 (KRaft preview), 3 (KRaft production ready) or 5 
> (bridge release). We should decide on the right `n` and make the change as 
> part of 4.0.
> Note that this is complementary to the protocols that will be completely 
> removed as part of zk mode removal. Step one would be to create a list of 
> protocols that will be completely removed due to zk mode removal and the list 
> of exclusively server side protocols remaining after that (one example is 
> ControlledShutdown).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15860) ControllerRegistration must be written out to the metadata image

2023-11-20 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15860:


 Summary: ControllerRegistration must be written out to the 
metadata image
 Key: KAFKA-15860
 URL: https://issues.apache.org/jira/browse/KAFKA-15860
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15532) ZkWriteBehindLag should not be reported by inactive controllers

2023-11-13 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15532.
--
Resolution: Fixed

> ZkWriteBehindLag should not be reported by inactive controllers
> ---
>
> Key: KAFKA-15532
> URL: https://issues.apache.org/jira/browse/KAFKA-15532
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Minor
>
> Since only the active controller is performing the dual-write to ZK during a 
> migration, it should be the only controller to report the ZkWriteBehindLag 
> metric. 
>  
> Currently, if the controller fails over during a migration, the previous 
> active controller will incorrectly report its last value for ZkWriteBehindLag 
> forever. Instead, it should report zero.
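
Conceptually the fix is just to gate the gauge on whether this controller is currently active (illustrative Java, not the actual controller metrics code):

{code:java}
// Toy sketch: an inactive controller reports zero rather than its stale last value.
public class ZkWriteBehindLagDemo {
    static long gaugeValue(boolean isActiveController, long lastComputedLag) {
        return isActiveController ? lastComputedLag : 0L;
    }

    public static void main(String[] args) {
        System.out.println(gaugeValue(true, 42));   // active controller: reports its real lag
        System.out.println(gaugeValue(false, 42));  // after failover: reports 0, not the stale 42
    }
}
{code}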



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15532) ZkWriteBehindLag should not be reported by inactive controllers

2023-11-13 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe reassigned KAFKA-15532:


Assignee: David Arthur

> ZkWriteBehindLag should not be reported by inactive controllers
> ---
>
> Key: KAFKA-15532
> URL: https://issues.apache.org/jira/browse/KAFKA-15532
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Minor
>
> Since only the active controller is performing the dual-write to ZK during a 
> migration, it should be the only controller to report the ZkWriteBehindLag 
> metric. 
>  
> Currently, if the controller fails over during a migration, the previous 
> active controller will incorrectly report its last value for ZkWriteBehindLag 
> forever. Instead, it should report zero.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15782) Establish concrete project conventions to define public APIs that require a KIP

2023-11-06 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783369#comment-17783369
 ] 

Colin McCabe commented on KAFKA-15782:
--

I think the rules are already quite clear.

The main source of unclarity is that we have a bunch of things which are public 
/ protected, but not actually intended to be used by end-users. This happens 
because of some of the technical limitations of Java. There are just cases 
where something in package A needs to be visible to package B, even though 
end-users are not supposed to be directly using either package.

In these situations, "interface annotations" are supposed to enforce the rules. 
But this is only a partial solution because people can easily ignore the 
annotations. Also, annotations are relatively new in the history of the 
project, so a lot of older classes don't have them at all.
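
For example, markers along these lines exist in the clients module (the exact class and usage here are from memory and only illustrative):

{code:java}
import org.apache.kafka.common.annotation.InterfaceStability;

// The annotation documents intent only; nothing at compile time stops end-users
// from depending on or subclassing the class anyway.
@InterfaceStability.Evolving
public class SomeInternallyFacingHelper {
}
{code}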

The best solution in the long term is to move as much code as possible out of 
the "clients" module. While people can technically access the broker / 
controller jars and start messing with them, it tends to be much less of a 
problem in practice. People mostly understand that if they pull server code and 
start subclassing it, that's on them. A lot of things in clients should really 
be in server-common.

That being said, KAFKA-15781 doesn't seem like a grey area to me at all. 
ProducerConfig is very obviously a user-visible class, and always has been. The 
theory that we don't need a KIP for changes to public classes if they're just 
"one line changes" doesn't make sense to me. I could very clearly break 
compatibility for everyone just with one line.

> Establish concrete project conventions to define public APIs that require a 
> KIP
> ---
>
> Key: KAFKA-15782
> URL: https://issues.apache.org/jira/browse/KAFKA-15782
> Project: Kafka
>  Issue Type: Improvement
>Reporter: A. Sophie Blee-Goldman
>Priority: Major
>  Labels: needs-kip
>
> There seems to be no concrete definition that establishes project-specific 
> conventions for what is and is not considered a public API change that 
> requires a KIP. This results in frequent drawn-out debates that revisit the 
> same topic and slow things down, and often ends up forcing trivial changes 
> through the KIP process. For a recent example, KIP-998 was required for a 
> one-line change just to add the "protected" access modifier to an otherwise 
> package-private class. See [this comment 
> thread|https://github.com/apache/kafka/pull/14681#discussion_r1378591228] for 
> the full debate on this subject.
> It would be beneficial and in the long run save us all time to just sit down 
> and hash out the project conventions, such as whether a 
> package-private/protected method on a non-final java class is to be 
> considered a public API, even if the method itself is/was never a public 
> method. This will of course require a KIP, but should help to establish some 
> ground rules to avoid any more superfluous KIPs in the future



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-11-01 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781842#comment-17781842
 ] 

Colin McCabe commented on KAFKA-15754:
--

bq. Going to close this again, even if it's a mystery why this call 
Uuid.randomUuid().toString() produced a UUID starting with "-" in our code.

My guess would be that you are depending on an older version of the Kafka 
client libraries where this was possible.

bq. Going to close this again

Ack.
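
For context, the retry approach in current {{Uuid.randomUuid()}} amounts to something like this (illustrative Java only, not the actual org.apache.kafka.common.Uuid source):

{code:java}
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Toy version of "regenerate until the URL-safe Base64 form does not start with '-'".
public class SafeUuidDemo {
    public static void main(String[] args) {
        String encoded;
        do {
            UUID u = UUID.randomUUID();
            ByteBuffer buf = ByteBuffer.allocate(16);
            buf.putLong(u.getMostSignificantBits());
            buf.putLong(u.getLeastSignificantBits());
            encoded = Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
        } while (encoded.startsWith("-"));
        System.out.println(encoded);
    }
}
{code}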

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using an URL (safe) encoder which, taking a 
> look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So despite the current Uuid.randomUuid is avoiding the generation of a UUID 
> containing a dash, the Base64 encoding operation can return a final UUID 
> starting with the dash instead.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781155#comment-17781155
 ] 

Colin McCabe commented on KAFKA-15754:
--

{quote}
I was wondering if there is any good reason for using a Base64 URL encoder and 
not just the RFC4648 (not URL safe) which uses the common Base64 alphabet not 
containing the "-".
{quote}

At one point, I did raise the question of why dash was used to serialize Kafka 
Uuids. But by the time I did so we were already using it in a few places so the 
question was not relevant. We're not going to change Uuid serialization now.

I think the general rationale was that dash and underscore were friendlier than 
slash and plus sign. But that's debatable, of course. Slash, at least, is not 
filesystem-safe.

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using an URL (safe) encoder which, taking a 
> look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So despite the current Uuid.randomUuid is avoiding the generation of a UUID 
> containing a dash, the Base64 encoding operation can return a final UUID 
> starting with the dash instead.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781152#comment-17781152
 ] 

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:21 PM:
-

You can run this code yourself if you are curious. Here it is. You will need 
bash 4 or better. (my version is {{GNU bash, version 5.2.15(1)-release 
(aarch64-apple-darwin21.6.0)}})

{code}
#!/usr/bin/env bash

# Count how many generated cluster IDs start with each character.
declare -A IDS_PER_INITIAL_LETTER
for ((i = 0; i < 10000 ; i++)); do
    ./kafka-storage.sh random-uuid > /tmp/out 2> /dev/null
    FIRST_LETTER=$(head -c 1 /tmp/out)
    IDS_PER_INITIAL_LETTER[$FIRST_LETTER]=$((IDS_PER_INITIAL_LETTER[$FIRST_LETTER]+1))
done

for k in "${!IDS_PER_INITIAL_LETTER[@]}"; do
    echo "IDs starting with $k : ${IDS_PER_INITIAL_LETTER[$k]}"
done
{code}


was (Author: cmccabe):
You can run this code yourself if you are curious. Here it is. You will need 
bash 4 or better. (my version is `GNU bash, version 5.2.15(1)-release 
(aarch64-apple-darwin21.6.0)`)

{code}
#!/usr/bin/env bash

# Count how many generated cluster IDs start with each character.
declare -A IDS_PER_INITIAL_LETTER
for ((i = 0; i < 10000 ; i++)); do
    ./kafka-storage.sh random-uuid > /tmp/out 2> /dev/null
    FIRST_LETTER=$(head -c 1 /tmp/out)
    IDS_PER_INITIAL_LETTER[$FIRST_LETTER]=$((IDS_PER_INITIAL_LETTER[$FIRST_LETTER]+1))
done

for k in "${!IDS_PER_INITIAL_LETTER[@]}"; do
    echo "IDs starting with $k : ${IDS_PER_INITIAL_LETTER[$k]}"
done
{code}

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using an URL (safe) encoder which, taking a 
> look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So despite the current Uuid.randomUuid is avoiding the generation of a UUID 
> containing a dash, the Base64 encoding operation can return a final UUID 
> starting with the dash instead.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781153#comment-17781153
 ] 

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:20 PM:
-

I am closing this JIRA because {{kafka-storage.sh random-uuid}} can not, in 
fact, generate uuids starting with {{-}}.

You can see this via analysis of the code or by just running it as I did


was (Author: cmccabe):
I am closing this JIRA because {{kafka-storage.sh random-uuid}} can not, in 
fact, generate uuids starting with '-'

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using an URL (safe) encoder which, taking a 
> look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So despite the current Uuid.randomUuid is avoiding the generation of a UUID 
> containing a dash, the Base64 encoding operation can return a final UUID 
> starting with the dash instead.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781153#comment-17781153
 ] 

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:20 PM:
-

I am closing this JIRA because {{kafka-storage.sh random-uuid}} can not, in 
fact, generate uuids starting with '-'


was (Author: cmccabe):
I am closing this JIRA because `kafka-storage.sh` can not, in fact, generate 
uuids starting with '-'

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using an URL (safe) encoder which, taking a 
> look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So despite the current Uuid.randomUuid is avoiding the generation of a UUID 
> containing a dash, the Base64 encoding operation can return a final UUID 
> starting with the dash instead.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781151#comment-17781151
 ] 

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:20 PM:
-

I ran {{kafka-storage.sh random-uuid}} 10,000 times and got the following 
distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}

No IDs were generated with a first character of {{-}}, as expected. 


was (Author: cmccabe):
I ran {kafka-storage.sh random-uuid} 10,000 times and got the following 
distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}

No IDs were generated with a first character of {-}, as expected. 

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> 

[jira] [Resolved] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15754.
--
Resolution: Invalid

kafka-storage tool can not, in fact, generate uuids starting with '-'

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using an URL (safe) encoder which, taking a 
> look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So despite the current Uuid.randomUuid is avoiding the generation of a UUID 
> containing a dash, the Base64 encoding operation can return a final UUID 
> starting with the dash instead.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781153#comment-17781153
 ] 

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:19 PM:
-

I am closing this JIRA because `kafka-storage.sh` can not, in fact, generate 
uuids starting with '-'


was (Author: cmccabe):
kafka-storage tool can not, in fact, generate uuids starting with '-'

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using an URL (safe) encoder which, taking a 
> look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So despite the current Uuid.randomUuid is avoiding the generation of a UUID 
> containing a dash, the Base64 encoding operation can return a final UUID 
> starting with the dash instead.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781151#comment-17781151
 ] 

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:19 PM:
-

I ran {kafka-storage.sh random-uuid} 10,000 times and got the following 
distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}

No IDs were generated with a first character of {-}, as expected. 


was (Author: cmccabe):
I ran `kafka-storage.sh random-uuid` 10,000 times and got the following 
distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}

No IDs were generated with a first character of `-`, as expected. 

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> 

[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781152#comment-17781152
 ] 

Colin McCabe commented on KAFKA-15754:
--

You can run this code yourself if you are curious. Here it is. You will need 
bash 4 or better. (my version is `GNU bash, version 5.2.15(1)-release 
(aarch64-apple-darwin21.6.0)`)

{code}
#!/usr/bin/env bash

declare -A IDS_PER_INITIAL_LETTER
for ((i = 0; i < 10000 ; i++)); do
    ./kafka-storage.sh random-uuid > /tmp/out 2> /dev/null
    FIRST_LETTER=$(head -c 1 /tmp/out)
    IDS_PER_INITIAL_LETTER[$FIRST_LETTER]=$((IDS_PER_INITIAL_LETTER[$FIRST_LETTER]+1))
done

for k in "${!IDS_PER_INITIAL_LETTER[@]}"; do
    echo "IDs starting with $k : ${IDS_PER_INITIAL_LETTER[$k]}"
done
{code}

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using an URL (safe) encoder which, taking a 
> look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So despite the current Uuid.randomUuid is avoiding the generation of a UUID 
> containing a dash, the Base64 encoding operation can return a final UUID 
> starting with the dash instead.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781151#comment-17781151
 ] 

Colin McCabe commented on KAFKA-15754:
--

I ran `kafka-storage.sh random-uuid` 10,000 times and got the following 
distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}

No IDs were generated with a first character of `-`, as expected. 
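
For anyone wondering why the first character can never be {{-}}: the issue description notes that Uuid.randomUuid keeps generating a new id until its string form does not start with a dash. A simplified, stand-alone sketch of that retry (hypothetical class name, not the actual Kafka implementation) looks like this:

{code:java}
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class RandomIdSketch {
    // Keep drawing random 128-bit ids until the Base64 URL-safe string form
    // does not start with '-', mirroring the retry described in the issue.
    public static String randomId() {
        while (true) {
            UUID uuid = UUID.randomUUID();
            ByteBuffer buf = ByteBuffer.allocate(16);
            buf.putLong(uuid.getMostSignificantBits());
            buf.putLong(uuid.getLeastSignificantBits());
            String encoded = Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(buf.array());
            if (!encoded.startsWith("-")) {
                return encoded;
            }
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(randomId());
        }
    }
}
{code}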

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> Said that, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using an URL (safe) encoder which, taking a 
> look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So despite the current Uuid.randomUuid is avoiding the generation of a UUID 
> containing a dash, the Base64 encoding operation can return a final UUID 
> starting with the dash instead.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14349) Support dynamically resizing the KRaft controller's thread pools

2023-10-27 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780551#comment-17780551
 ] 

Colin McCabe commented on KAFKA-14349:
--

This was fixed as part of KAFKA-14351, but we forgot to close the JIRA. Closing 
now.

> Support dynamically resizing the KRaft controller's thread pools
> 
>
> Key: KAFKA-14349
> URL: https://issues.apache.org/jira/browse/KAFKA-14349
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Priority: Major
>  Labels: 4.0-blocker, kip-500
>
> Support dynamically resizing the KRaft controller's request handler and 
> network handler thread pools. See {{DynamicBrokerConfig.scala}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14369) Docs - KRAFT controller authentication example

2023-10-27 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780549#comment-17780549
 ] 

Colin McCabe commented on KAFKA-14369:
--

Thanks [~dbove]. I agree that it would be helpful to have an example config 
file with non-PLAINTEXT auth. If you have one, please post it here.

> Docs - KRAFT controller authentication example
> --
>
> Key: KAFKA-14369
> URL: https://issues.apache.org/jira/browse/KAFKA-14369
> Project: Kafka
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.3.1
>Reporter: Domenic Bove
>Priority: Minor
>  Labels: kraft
>
> The [Kafka Listener docs 
> |https://kafka.apache.org/documentation/#listener_configuration] mention how 
> to handle kafka protocols (other than PLAINTEXT) on the KRAFT controller 
> listener, but it is not a working example and I found that I was missing this 
> property: 
> {code:java}
> sasl.mechanism.controller.protocol {code}
> when attempting to do SASL_PLAINTEXT on the controller listener. I see that 
> property here: 
> [https://kafka.apache.org/documentation/#brokerconfigs_sasl.mechanism.controller.protocol]
> But nowhere else. 
> I wonder if a complete working example would be better. Here are my working 
> configs for sasl plain on the controller
> {code:java}
> process.roles=controller
> listeners=CONTROLLER://:9093 
> node.id=1
> controller.quorum.voters=1@localhost:9093
> controller.listener.names=CONTROLLER
> listener.security.protocol.map=CONTROLLER:SASL_PLAINTEXT
> listener.name.controller.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule
>  required username="admin" password="admin-secret" user_admin="admin-secret" 
> user_alice="alice-secret";
> listener.name.controller.sasl.enabled.mechanisms=PLAIN
> listener.name.controller.sasl.mechanism=PLAIN
> sasl.enabled.mechanisms=PLAIN
> sasl.mechanism.controller.protocol=PLAIN{code}
> Or maybe just a callout of that property in the existing docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14369) Docs - KRAFT controller authentication example

2023-10-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-14369:
-
Labels: kraft  (was: 4.0-blocker)

> Docs - KRAFT controller authentication example
> --
>
> Key: KAFKA-14369
> URL: https://issues.apache.org/jira/browse/KAFKA-14369
> Project: Kafka
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.3.1
>Reporter: Domenic Bove
>Priority: Minor
>  Labels: kraft
>
> The [Kafka Listener docs 
> |https://kafka.apache.org/documentation/#listener_configuration] mention how 
> to handle kafka protocols (other than PLAINTEXT) on the KRAFT controller 
> listener, but it is not a working example and I found that I was missing this 
> property: 
> {code:java}
> sasl.mechanism.controller.protocol {code}
> when attempting to do SASL_PLAINTEXT on the controller listener. I see that 
> property here: 
> [https://kafka.apache.org/documentation/#brokerconfigs_sasl.mechanism.controller.protocol]
> But nowhere else. 
> I wonder if a complete working example would be better. Here are my working 
> configs for sasl plain on the controller
> {code:java}
> process.roles=controller
> listeners=CONTROLLER://:9093 
> node.id=1
> controller.quorum.voters=1@localhost:9093
> controller.listener.names=CONTROLLER
> listener.security.protocol.map=CONTROLLER:SASL_PLAINTEXT
> listener.name.controller.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule
>  required username="admin" password="admin-secret" user_admin="admin-secret" 
> user_alice="alice-secret";
> listener.name.controller.sasl.enabled.mechanisms=PLAIN
> listener.name.controller.sasl.mechanism=PLAIN
> sasl.enabled.mechanisms=PLAIN
> sasl.mechanism.controller.protocol=PLAIN{code}
> Or maybe just a callout of that property in the existing docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14927) Prevent kafka-configs.sh from setting non-alphanumeric config key names

2023-10-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-14927:
-
Labels:   (was: 4.0-blocker)

> Prevent kafka-configs.sh from setting non-alphanumeric config key names
> ---
>
> Key: KAFKA-14927
> URL: https://issues.apache.org/jira/browse/KAFKA-14927
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.3.2
>Reporter: Justin Daines
>Assignee: Aman Singh
>Priority: Minor
> Fix For: 3.7.0
>
>
> Using {{kafka-configs}} should validate dynamic configurations before 
> applying. It is possible to send a file with invalid configurations. 
> For example a file containing the following:
> {code:java}
> {
>   "routes": {
>     "crn:///kafka=*": {
>       "management": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied"
>       },
>       "describe": {
>         "allowed": "",
>         "denied": "confluent-audit-log-events-denied"
>       },
>       "authentication": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied-authn"
>       },
>       "authorize": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied-authz"
>       },
>       "interbroker": {
>         "allowed": "",
>         "denied": ""
>       }
>     },
>     "crn:///kafka=*/group=*": {
>       "consume": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       }
>     },
>     "crn:///kafka=*/topic=*": {
>       "produce": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       },
>       "consume": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       }
>     }
>   },
>   "destinations": {
>     "topics": {
>       "confluent-audit-log-events": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied-authn": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied-authz": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events_audit": {
>         "retention_ms": 777600
>       }
>     }
>   },
>   "default_topics": {
>     "allowed": "confluent-audit-log-events_audit",
>     "denied": "confluent-audit-log-events"
>   },
>   "excluded_principals": [
>     "User:schemaregistryUser",
>     "User:ANONYMOUS",
>     "User:appSA",
>     "User:admin",
>     "User:connectAdmin",
>     "User:connectorSubmitter",
>     "User:connectorSA",
>     "User:schemaregistryUser",
>     "User:ksqlDBAdmin",
>     "User:ksqlDBUser",
>     "User:controlCenterAndKsqlDBServer",
>     "User:controlcenterAdmin",
>     "User:restAdmin",
>     "User:appSA",
>     "User:clientListen",
>     "User:superUser"
>   ]
> } {code}
> {code:java}
> kafka-configs --bootstrap-server $KAFKA_BOOTSTRAP --entity-type brokers 
> --entity-default --alter --add-config-file audit-log.json {code}
> Yields the following dynamic configs:
> {code:java}
> Default configs for brokers in the cluster are:
>   "destinations"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"destinations"=null}
>   "confluent-audit-log-events-denied-authn"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events-denied-authn"=null}
>   "routes"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"routes"=null}
>   "User=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"User=null}
>   },=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:},=null}
>   "excluded_principals"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"excluded_principals"=null}
>   "confluent-audit-log-events_audit"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events_audit"=null}
>   "authorize"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"authorize"=null}
>   "default_topics"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"default_topics"=null}
>   "topics"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"topics"=null}
>   ]=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:]=null}
>   "interbroker"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"interbroker"=null}
>   "produce"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"produce"=null}
>   "denied"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"denied"=null}
>   

[jira] [Commented] (KAFKA-14927) Prevent kafka-configs.sh from setting non-alphanumeric config key names

2023-10-27 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780541#comment-17780541
 ] 

Colin McCabe commented on KAFKA-14927:
--

It looks like this change was committed. I will close the JIRA then.

> Prevent kafka-configs.sh from setting non-alphanumeric config key names
> ---
>
> Key: KAFKA-14927
> URL: https://issues.apache.org/jira/browse/KAFKA-14927
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.3.2
>Reporter: Justin Daines
>Assignee: Aman Singh
>Priority: Minor
>  Labels: 4.0-blocker
> Fix For: 3.7.0
>
>
> Using {{kafka-configs}} should validate dynamic configurations before 
> applying. It is possible to send a file with invalid configurations. 
> For example a file containing the following:
> {code:java}
> {
>   "routes": {
>     "crn:///kafka=*": {
>       "management": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied"
>       },
>       "describe": {
>         "allowed": "",
>         "denied": "confluent-audit-log-events-denied"
>       },
>       "authentication": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied-authn"
>       },
>       "authorize": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied-authz"
>       },
>       "interbroker": {
>         "allowed": "",
>         "denied": ""
>       }
>     },
>     "crn:///kafka=*/group=*": {
>       "consume": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       }
>     },
>     "crn:///kafka=*/topic=*": {
>       "produce": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       },
>       "consume": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       }
>     }
>   },
>   "destinations": {
>     "topics": {
>       "confluent-audit-log-events": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied-authn": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied-authz": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events_audit": {
>         "retention_ms": 777600
>       }
>     }
>   },
>   "default_topics": {
>     "allowed": "confluent-audit-log-events_audit",
>     "denied": "confluent-audit-log-events"
>   },
>   "excluded_principals": [
>     "User:schemaregistryUser",
>     "User:ANONYMOUS",
>     "User:appSA",
>     "User:admin",
>     "User:connectAdmin",
>     "User:connectorSubmitter",
>     "User:connectorSA",
>     "User:schemaregistryUser",
>     "User:ksqlDBAdmin",
>     "User:ksqlDBUser",
>     "User:controlCenterAndKsqlDBServer",
>     "User:controlcenterAdmin",
>     "User:restAdmin",
>     "User:appSA",
>     "User:clientListen",
>     "User:superUser"
>   ]
> } {code}
> {code:java}
> kafka-configs --bootstrap-server $KAFKA_BOOTSTRAP --entity-type brokers 
> --entity-default --alter --add-config-file audit-log.json {code}
> Yields the following dynamic configs:
> {code:java}
> Default configs for brokers in the cluster are:
>   "destinations"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"destinations"=null}
>   "confluent-audit-log-events-denied-authn"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events-denied-authn"=null}
>   "routes"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"routes"=null}
>   "User=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"User=null}
>   },=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:},=null}
>   "excluded_principals"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"excluded_principals"=null}
>   "confluent-audit-log-events_audit"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events_audit"=null}
>   "authorize"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"authorize"=null}
>   "default_topics"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"default_topics"=null}
>   "topics"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"topics"=null}
>   ]=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:]=null}
>   "interbroker"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"interbroker"=null}
>   "produce"=null sensitive=true 
> 

[jira] (KAFKA-14927) Prevent kafka-configs.sh from setting non-alphanumeric config key names

2023-10-27 Thread Colin McCabe (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-14927 ]


Colin McCabe deleted comment on KAFKA-14927:
--

was (Author: cmccabe):
It looks like this change was committed. I will close the JIRA then.

> Prevent kafka-configs.sh from setting non-alphanumeric config key names
> ---
>
> Key: KAFKA-14927
> URL: https://issues.apache.org/jira/browse/KAFKA-14927
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.3.2
>Reporter: Justin Daines
>Assignee: Aman Singh
>Priority: Minor
>  Labels: 4.0-blocker
> Fix For: 3.7.0
>
>
> Using {{kafka-configs}} should validate dynamic configurations before 
> applying. It is possible to send a file with invalid configurations. 
> For example a file containing the following:
> {code:java}
> {
>   "routes": {
>     "crn:///kafka=*": {
>       "management": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied"
>       },
>       "describe": {
>         "allowed": "",
>         "denied": "confluent-audit-log-events-denied"
>       },
>       "authentication": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied-authn"
>       },
>       "authorize": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied-authz"
>       },
>       "interbroker": {
>         "allowed": "",
>         "denied": ""
>       }
>     },
>     "crn:///kafka=*/group=*": {
>       "consume": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       }
>     },
>     "crn:///kafka=*/topic=*": {
>       "produce": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       },
>       "consume": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       }
>     }
>   },
>   "destinations": {
>     "topics": {
>       "confluent-audit-log-events": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied-authn": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied-authz": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events_audit": {
>         "retention_ms": 777600
>       }
>     }
>   },
>   "default_topics": {
>     "allowed": "confluent-audit-log-events_audit",
>     "denied": "confluent-audit-log-events"
>   },
>   "excluded_principals": [
>     "User:schemaregistryUser",
>     "User:ANONYMOUS",
>     "User:appSA",
>     "User:admin",
>     "User:connectAdmin",
>     "User:connectorSubmitter",
>     "User:connectorSA",
>     "User:schemaregistryUser",
>     "User:ksqlDBAdmin",
>     "User:ksqlDBUser",
>     "User:controlCenterAndKsqlDBServer",
>     "User:controlcenterAdmin",
>     "User:restAdmin",
>     "User:appSA",
>     "User:clientListen",
>     "User:superUser"
>   ]
> } {code}
> {code:java}
> kafka-configs --bootstrap-server $KAFKA_BOOTSTRAP --entity-type brokers 
> --entity-default --alter --add-config-file audit-log.json {code}
> Yields the following dynamic configs:
> {code:java}
> Default configs for brokers in the cluster are:
>   "destinations"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"destinations"=null}
>   "confluent-audit-log-events-denied-authn"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events-denied-authn"=null}
>   "routes"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"routes"=null}
>   "User=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"User=null}
>   },=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:},=null}
>   "excluded_principals"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"excluded_principals"=null}
>   "confluent-audit-log-events_audit"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events_audit"=null}
>   "authorize"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"authorize"=null}
>   "default_topics"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"default_topics"=null}
>   "topics"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"topics"=null}
>   ]=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:]=null}
>   "interbroker"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"interbroker"=null}
>   "produce"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"produce"=null}
>   "denied"=null sensitive=true 
> 

[jira] [Updated] (KAFKA-14927) Prevent kafka-configs.sh from setting non-alphanumeric config key names

2023-10-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-14927:
-
Summary: Prevent kafka-configs.sh from setting non-alphanumeric config key 
names  (was: Dynamic configs not validated when using kafka-configs and 
--add-config-file)

> Prevent kafka-configs.sh from setting non-alphanumeric config key names
> ---
>
> Key: KAFKA-14927
> URL: https://issues.apache.org/jira/browse/KAFKA-14927
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.3.2
>Reporter: Justin Daines
>Assignee: Aman Singh
>Priority: Minor
>  Labels: 4.0-blocker
> Fix For: 3.7.0
>
>
> Using {{kafka-configs}} should validate dynamic configurations before 
> applying. It is possible to send a file with invalid configurations. 
> For example a file containing the following:
> {code:java}
> {
>   "routes": {
>     "crn:///kafka=*": {
>       "management": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied"
>       },
>       "describe": {
>         "allowed": "",
>         "denied": "confluent-audit-log-events-denied"
>       },
>       "authentication": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied-authn"
>       },
>       "authorize": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events-denied-authz"
>       },
>       "interbroker": {
>         "allowed": "",
>         "denied": ""
>       }
>     },
>     "crn:///kafka=*/group=*": {
>       "consume": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       }
>     },
>     "crn:///kafka=*/topic=*": {
>       "produce": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       },
>       "consume": {
>         "allowed": "confluent-audit-log-events_audit",
>         "denied": "confluent-audit-log-events"
>       }
>     }
>   },
>   "destinations": {
>     "topics": {
>       "confluent-audit-log-events": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied-authn": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events-denied-authz": {
>         "retention_ms": 777600
>       },
>       "confluent-audit-log-events_audit": {
>         "retention_ms": 777600
>       }
>     }
>   },
>   "default_topics": {
>     "allowed": "confluent-audit-log-events_audit",
>     "denied": "confluent-audit-log-events"
>   },
>   "excluded_principals": [
>     "User:schemaregistryUser",
>     "User:ANONYMOUS",
>     "User:appSA",
>     "User:admin",
>     "User:connectAdmin",
>     "User:connectorSubmitter",
>     "User:connectorSA",
>     "User:schemaregistryUser",
>     "User:ksqlDBAdmin",
>     "User:ksqlDBUser",
>     "User:controlCenterAndKsqlDBServer",
>     "User:controlcenterAdmin",
>     "User:restAdmin",
>     "User:appSA",
>     "User:clientListen",
>     "User:superUser"
>   ]
> } {code}
> {code:java}
> kafka-configs --bootstrap-server $KAFKA_BOOTSTRAP --entity-type brokers 
> --entity-default --alter --add-config-file audit-log.json {code}
> Yields the following dynamic configs:
> {code:java}
> Default configs for brokers in the cluster are:
>   "destinations"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"destinations"=null}
>   "confluent-audit-log-events-denied-authn"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events-denied-authn"=null}
>   "routes"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"routes"=null}
>   "User=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"User=null}
>   },=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:},=null}
>   "excluded_principals"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"excluded_principals"=null}
>   "confluent-audit-log-events_audit"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"confluent-audit-log-events_audit"=null}
>   "authorize"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"authorize"=null}
>   "default_topics"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"default_topics"=null}
>   "topics"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"topics"=null}
>   ]=null sensitive=true synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:]=null}
>   "interbroker"=null sensitive=true 
> synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:"interbroker"=null}
>   "produce"=null 

[jira] [Commented] (KAFKA-14941) Document which configuration options are applicable only to processes with broker role or controller role

2023-10-27 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780540#comment-17780540
 ] 

Colin McCabe commented on KAFKA-14941:
--

I'm not sure that I totally understand the goal here.

If the goal is to be able to dynamically change configurations, that does not 
require leaving the configuration out of the static broker or controller config 
file. The dynamic configuration always takes precedence.

If the goal is to understand what the configuration does, the help text of the 
configuration should explain that.

Can you explain a bit more about the goal?
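
As an illustration of the precedence point (example values only, not a recommendation):

{code}
# Set a dynamic per-broker override; it takes precedence over whatever
# num.io.threads says in the static server.properties file.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 0 \
  --alter --add-config num.io.threads=16

# Delete the override; the statically configured value applies again.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 0 \
  --alter --delete-config num.io.threads
{code}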

> Document which configuration options are applicable only to processes with 
> broker role or controller role
> -
>
> Key: KAFKA-14941
> URL: https://issues.apache.org/jira/browse/KAFKA-14941
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jakub Scholz
>Priority: Major
>
> When running in KRaft mode, some of the configuration options are applicable 
> only to nodes with the broker process role and some are applicable only to 
> the nodes with the controller process roles. It would be great if this 
> information was part of the documentation (e.g. in the [Broker 
> Configs|https://kafka.apache.org/documentation/#brokerconfigs] table on the 
> website), but if it was also part of the config classes so that it can be 
> used in situations when the configuration is dynamically configured to for 
> example filter the options applicable to different nodes. This would allow 
> having configuration files with only the actually used configuration options 
> and for example, help to reduce unnecessary restarts when rolling out new 
> configurations etc.
> For some options, it seems clear and the Kafka node would refuse to start if 
> they are set - for example the configurations of the non-controller listeners 
> in controller-only nodes. For others, it seems a bit less clear (Does 
> {{compression.type}} option apply to controller-only nodes? Or the 
> configurations for the offset topic? etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14941) Document which configuration options are applicable only to processes with broker role or controller role

2023-10-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-14941:
-
Labels:   (was: 4.0-blocker)

> Document which configuration options are applicable only to processes with 
> broker role or controller role
> -
>
> Key: KAFKA-14941
> URL: https://issues.apache.org/jira/browse/KAFKA-14941
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jakub Scholz
>Priority: Major
>
> When running in KRaft mode, some of the configuration options are applicable 
> only to nodes with the broker process role and some are applicable only to 
> the nodes with the controller process roles. It would be great if this 
> information was part of the documentation (e.g. in the [Broker 
> Configs|https://kafka.apache.org/documentation/#brokerconfigs] table on the 
> website), but if it was also part of the config classes so that it can be 
> used in situations when the configuration is dynamically configured to for 
> example filter the options applicable to different nodes. This would allow 
> having configuration files with only the actually used configuration options 
> and for example, help to reduce unnecessary restarts when rolling out new 
> configurations etc.
> For some options, it seems clear and the Kafka node would refuse to start if 
> they are set - for example the configurations of the non-controller listeners 
> in controller-only nodes. For others, it seems a bit less clear (Does 
> {{compression.type}} option apply to controller-only nodes? Or the 
> configurations for the offset topic? etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15489) split brain in KRaft cluster

2023-10-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15489:
-
Labels:   (was: 4.0-blocker)

> split brain in KRaft cluster 
> -
>
> Key: KAFKA-15489
> URL: https://issues.apache.org/jira/browse/KAFKA-15489
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.5.1
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Major
>
> I found in the current KRaft implementation, when network partition happened 
> between the current controller leader and the other controller nodes, the 
> "split brain" issue will happen. It causes 2 leaders will exist in the 
> controller cluster, and 2 inconsistent sets of metadata will return to the 
> clients.
>  
> *Root cause*
> In 
> [KIP-595|https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-Vote],
>  we said A voter will begin a new election under three conditions:
> 1. If it fails to receive a FetchResponse from the current leader before 
> expiration of quorum.fetch.timeout.ms
> 2. If it receives a EndQuorumEpoch request from the current leader
> 3. If it fails to receive a majority of votes before expiration of 
> quorum.election.timeout.ms after declaring itself a candidate.
> And that's exactly what the current KRaft's implementation.
>  
> However, when the leader is isolated from the network partition, there's no 
> way for it to resign from the leadership and start a new election. So the 
> leader will always be the leader even though all other nodes are down. And 
> this makes the split brain issue possible.
> When reading further in the KIP-595, I found we indeed considered this 
> situation and have solution for that. in [this 
> section|https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-LeaderProgressTimeout],
>  it said:
> {quote}In the pull-based model, however, say a new leader has been elected 
> with a new epoch and everyone has learned about it except the old leader 
> (e.g. that leader was not in the voters anymore and hence not receiving the 
> BeginQuorumEpoch as well), then that old leader would not be notified by 
> anyone about the new leader / epoch and become a pure "zombie leader", as 
> there is no regular heartbeats being pushed from leader to the follower. This 
> could lead to stale information being served to the observers and clients 
> inside the cluster.
> {quote}
> {quote}To resolve this issue, we will piggy-back on the 
> "quorum.fetch.timeout.ms" config, such that if the leader did not receive 
> Fetch requests from a majority of the quorum for that amount of time, it 
> would begin a new election and start sending VoteRequest to voter nodes in 
> the cluster to understand the latest quorum. If it couldn't connect to any 
> known voter, the old leader shall keep starting new elections and bump the 
> epoch.
> {quote}
>  
> But we missed this implementation in current KRaft.
>  
> *The flow is like this:*
> 1. 3 controller nodes, A(leader), B(follower), C(follower)
> 2. network partition happened between [A] and [B, C].
> 3. B and C starts new election since fetch timeout expired before receiving 
> fetch response from leader A.
> 4. B (or C) is elected as a leader in new epoch, while A is still the leader 
> in old epoch.
> 5. broker D creates a topic "new", and updates to leader B.
> 6. broker E describe topic "new", but got nothing because it is connecting to 
> the old leader A.
>  
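
A minimal sketch of the missing check described in the KIP-595 excerpt above (hypothetical names, not the actual KafkaRaftClient code) would be something like:

{code:java}
import java.util.Map;

public class LeaderFetchTimeoutSketch {
    // Sketch of the rule quoted from KIP-595: the leader resigns and starts a
    // new election if it has not seen Fetch requests from a majority of the
    // voters within quorum.fetch.timeout.ms. For illustration only.
    static boolean shouldResign(long nowMs,
                                Map<Integer, Long> lastFetchTimeMsByVoter,
                                int voterCount,
                                long fetchTimeoutMs) {
        long activeFollowers = lastFetchTimeMsByVoter.values().stream()
                .filter(lastFetchMs -> nowMs - lastFetchMs <= fetchTimeoutMs)
                .count();
        // The leader counts toward the majority itself; resign when the leader
        // plus recently-fetching followers no longer form a majority.
        return activeFollowers + 1 <= voterCount / 2;
    }

    public static void main(String[] args) {
        // Leader A isolated from B and C in a 3-voter quorum: no recent fetches,
        // so the leader should resign instead of staying a "zombie leader".
        System.out.println(shouldResign(10_000L, Map.of(2, 1_000L, 3, 1_200L), 3, 2_000L));
    }
}
{code}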



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15513) KRaft cluster fails with SCRAM authentication enabled for control-plane

2023-10-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-15513:
-
Labels:   (was: 4.0-blocker)

> KRaft cluster fails with SCRAM authentication enabled for control-plane
> ---
>
> Key: KAFKA-15513
> URL: https://issues.apache.org/jira/browse/KAFKA-15513
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.6.0, 3.5.1
>Reporter: migruiz4
>Priority: Major
>
> We have observed a scenario where a KRaft cluster fails to bootstrap when 
> using SCRAM authentication for controller-to-controller communications.
> The steps to reproduce are simple:
>  * Deploy (at least) 2 Kafka servers using latest version 3.5.1.
>  * Configure a KRaft cluster, where the controller listener uses 
> SASL_PLAINTEXT + SCRAM-SHA-256 or SCRAM-SHA-512. In my case, I'm using the 
> recommended in-line jaas config 
> '{{{}listener.name..scram-sha-512.sasl.jaas.config{}}}'
>  * Run 'kafka-storage.sh' in both nodes using option '--add-scram' to create 
> the SCRAM user.
> When initialized, Controllers will fail to connect to each other with an 
> authentication error:
>  
> {code:java}
> [2023-08-01 11:12:45,295] ERROR [kafka-1-raft-outbound-request-thread]: 
> Failed to send the following request due to authentication error: 
> ClientRequest(expectResponse=true, 
> callback=kafka.raft.KafkaNetworkChannel$$Lambda$687/0x7f27d443fc60@2aba6075,
>  destination=0, correlationId=129, clientId=raft-client-1, 
> createdTimeMs=1690888364960, 
> requestBuilder=VoteRequestData(clusterId='abcdefghijklmnopqrstug', 
> topics=[TopicData(topicName='__cluster_metadata', 
> partitions=[PartitionData(partitionIndex=0, candidateEpoch=4, candidateId=1, 
> lastOffsetEpoch=0, lastOffset=0)])])) (kafka.raft.RaftSendThread) {code}
> Some additional details about the scenario that we tested out:
>  *  Controller listener does work when configured with SASL+PLAIN
>  * The issue only affects the Controller listener, SCRAM users created using 
> the same method work for data-plane listeners and inter-broker listeners.
>  
> Below you can find the exact configuration and command used to deploy:
>  * server.properties
> {code:java}
> listeners=INTERNAL://:9092,CLIENT://:9091,CONTROLLER://:9093
> advertised.listeners=INTERNAL://kafka-0:9092,CLIENT://:9091
> listener.security.protocol.map=INTERNAL:PLAINTEXT,CLIENT:PLAINTEXT,CONTROLLER:SASL_PLAINTEXT
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> log.dirs=/bitnami/kafka/data
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=1
> transaction.state.log.replication.factor=1
> transaction.state.log.min.isr=1
> log.retention.hours=168
> log.retention.check.interval.ms=30
> controller.listener.names=CONTROLLER
> controller.quorum.voters=0@kafka-0:9093,1@kafka-1:9093
> inter.broker.listener.name=INTERNAL
> node.id=0
> process.roles=controller,broker
> sasl.enabled.mechanisms=PLAIN,SCRAM-SHA-256,SCRAM-SHA-512
> sasl.mechanism.controller.protocol=SCRAM-SHA-512
> listener.name.controller.sasl.enabled.mechanisms=SCRAM-SHA-512
> listener.name.controller.scram-sha-512.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
>  required username="controller_user" password="controller_password";{code}
>  * kafka-storage.sh command
> {code:java}
> kafka-storage.sh format --config /path/to/server.properties 
> --ignore-formatted --cluster-id abcdefghijklmnopqrstuv --add-scram 
> SCRAM-SHA-512=[name=controller_user,password=controller_password] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15513) KRaft cluster fails with SCRAM authentication enabled for control-plane

2023-10-27 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780539#comment-17780539
 ] 

Colin McCabe commented on KAFKA-15513:
--

To be more concrete, you need to use the {{--add-scram}} argument to the 
{{kafka-storage.sh format}} command.
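For example (reusing the placeholder cluster id and credentials from the report 
above), the storage would be formatted with something along the lines of:

{code:java}
kafka-storage.sh format --config /path/to/server.properties \
  --cluster-id abcdefghijklmnopqrstuv \
  --add-scram 'SCRAM-SHA-512=[name=controller_user,password=controller_password]'
{code}

so that the controller's SCRAM credentials are written into the bootstrap 
metadata before the node is started for the first time.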

> KRaft cluster fails with SCRAM authentication enabled for control-plane
> ---
>
> Key: KAFKA-15513
> URL: https://issues.apache.org/jira/browse/KAFKA-15513
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.6.0, 3.5.1
>Reporter: migruiz4
>Priority: Major
>  Labels: 4.0-blocker
>
> We have observed a scenario where a KRaft cluster fails to bootstrap when 
> using SCRAM authentication for controller-to-controller communications.
> The steps to reproduce are simple:
>  * Deploy (at least) 2 Kafka servers using latest version 3.5.1.
>  * Configure a KRaft cluster, where the controller listener uses 
> SASL_PLAINTEXT + SCRAM-SHA-256 or SCRAM-SHA-512. In my case, I'm using the 
> recommended in-line jaas config 
> '{{{}listener.name..scram-sha-512.sasl.jaas.config{}}}'
>  * Run 'kafka-storage.sh' in both nodes using option '--add-scram' to create 
> the SCRAM user.
> When initialized, Controllers will fail to connect to each other with an 
> authentication error:
>  
> {code:java}
> [2023-08-01 11:12:45,295] ERROR [kafka-1-raft-outbound-request-thread]: 
> Failed to send the following request due to authentication error: 
> ClientRequest(expectResponse=true, 
> callback=kafka.raft.KafkaNetworkChannel$$Lambda$687/0x7f27d443fc60@2aba6075,
>  destination=0, correlationId=129, clientId=raft-client-1, 
> createdTimeMs=1690888364960, 
> requestBuilder=VoteRequestData(clusterId='abcdefghijklmnopqrstug', 
> topics=[TopicData(topicName='__cluster_metadata', 
> partitions=[PartitionData(partitionIndex=0, candidateEpoch=4, candidateId=1, 
> lastOffsetEpoch=0, lastOffset=0)])])) (kafka.raft.RaftSendThread) {code}
> Some additional details about the scenario that we tested out:
>  *  Controller listener does work when configured with SASL+PLAIN
>  * The issue only affects the Controller listener, SCRAM users created using 
> the same method work for data-plane listeners and inter-broker listeners.
>  
> Below you can find the exact configuration and command used to deploy:
>  * server.properties
> {code:java}
> listeners=INTERNAL://:9092,CLIENT://:9091,CONTROLLER://:9093
> advertised.listeners=INTERNAL://kafka-0:9092,CLIENT://:9091
> listener.security.protocol.map=INTERNAL:PLAINTEXT,CLIENT:PLAINTEXT,CONTROLLER:SASL_PLAINTEXT
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> log.dirs=/bitnami/kafka/data
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=1
> transaction.state.log.replication.factor=1
> transaction.state.log.min.isr=1
> log.retention.hours=168
> log.retention.check.interval.ms=30
> controller.listener.names=CONTROLLER
> controller.quorum.voters=0@kafka-0:9093,1@kafka-1:9093
> inter.broker.listener.name=INTERNAL
> node.id=0
> process.roles=controller,broker
> sasl.enabled.mechanisms=PLAIN,SCRAM-SHA-256,SCRAM-SHA-512
> sasl.mechanism.controller.protocol=SCRAM-SHA-512
> listener.name.controller.sasl.enabled.mechanisms=SCRAM-SHA-512
> listener.name.controller.scram-sha-512.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
>  required username="controller_user" password="controller_password";{code}
>  * kafka-storage.sh command
> {code:java}
> kafka-storage.sh format --config /path/to/server.properties 
> --ignore-formatted --cluster-id abcdefghijklmnopqrstuv --add-scram 
> SCRAM-SHA-512=[name=controller_user,password=controller_password] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (KAFKA-15513) KRaft cluster fails with SCRAM authentication enabled for control-plane

2023-10-27 Thread Colin McCabe (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-15513 ]


Colin McCabe deleted comment on KAFKA-15513:
--

was (Author: cmccabe):
Currently, you need to add the controller principal to `super.users` rather 
than relying on SCRAM to configure it. This is no different than how in ZK 
mode, you must have working ZK auth before you can configure Kafka.

In the future, we will probably support configuring SCRAM prior to controller 
startup via the `kafka-format.sh` command. The mechanism is all there (in the 
form of the bootstrap file) but we haven't finished implementing it yet...
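A minimal sketch of that interim workaround, assuming the controller 
authenticates as the {{controller_user}} principal from the reporter's 
configuration, would be to add the following to server.properties:

{code:java}
# Illustrative only: grant the controller principal superuser rights so that
# controller-to-controller requests are authorized without depending on
# dynamically managed SCRAM credentials. Adjust the principal to your setup.
super.users=User:controller_user
{code}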

> KRaft cluster fails with SCRAM authentication enabled for control-plane
> ---
>
> Key: KAFKA-15513
> URL: https://issues.apache.org/jira/browse/KAFKA-15513
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.6.0, 3.5.1
>Reporter: migruiz4
>Priority: Major
>  Labels: 4.0-blocker
>
> We have observed a scenario where a KRaft cluster fails to bootstrap when 
> using SCRAM authentication for controller-to-controller communications.
> The steps to reproduce are simple:
>  * Deploy (at least) 2 Kafka servers using latest version 3.5.1.
>  * Configure a KRaft cluster, where the controller listener uses 
> SASL_PLAINTEXT + SCRAM-SHA-256 or SCRAM-SHA-512. In my case, I'm using the 
> recommended in-line jaas config 
> '{{{}listener.name..scram-sha-512.sasl.jaas.config{}}}'
>  * Run 'kafka-storage.sh' in both nodes using option '--add-scram' to create 
> the SCRAM user.
> When initialized, Controllers will fail to connect to each other with an 
> authentication error:
>  
> {code:java}
> [2023-08-01 11:12:45,295] ERROR [kafka-1-raft-outbound-request-thread]: 
> Failed to send the following request due to authentication error: 
> ClientRequest(expectResponse=true, 
> callback=kafka.raft.KafkaNetworkChannel$$Lambda$687/0x7f27d443fc60@2aba6075,
>  destination=0, correlationId=129, clientId=raft-client-1, 
> createdTimeMs=1690888364960, 
> requestBuilder=VoteRequestData(clusterId='abcdefghijklmnopqrstug', 
> topics=[TopicData(topicName='__cluster_metadata', 
> partitions=[PartitionData(partitionIndex=0, candidateEpoch=4, candidateId=1, 
> lastOffsetEpoch=0, lastOffset=0)])])) (kafka.raft.RaftSendThread) {code}
> Some additional details about the scenario that we tested out:
>  *  Controller listener does work when configured with SASL+PLAIN
>  * The issue only affects the Controller listener, SCRAM users created using 
> the same method work for data-plane listeners and inter-broker listeners.
>  
> Below you can find the exact configuration and command used to deploy:
>  * server.properties
> {code:java}
> listeners=INTERNAL://:9092,CLIENT://:9091,CONTROLLER://:9093
> advertised.listeners=INTERNAL://kafka-0:9092,CLIENT://:9091
> listener.security.protocol.map=INTERNAL:PLAINTEXT,CLIENT:PLAINTEXT,CONTROLLER:SASL_PLAINTEXT
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> log.dirs=/bitnami/kafka/data
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=1
> transaction.state.log.replication.factor=1
> transaction.state.log.min.isr=1
> log.retention.hours=168
> log.retention.check.interval.ms=30
> controller.listener.names=CONTROLLER
> controller.quorum.voters=0@kafka-0:9093,1@kafka-1:9093
> inter.broker.listener.name=INTERNAL
> node.id=0
> process.roles=controller,broker
> sasl.enabled.mechanisms=PLAIN,SCRAM-SHA-256,SCRAM-SHA-512
> sasl.mechanism.controller.protocol=SCRAM-SHA-512
> listener.name.controller.sasl.enabled.mechanisms=SCRAM-SHA-512
> listener.name.controller.scram-sha-512.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
>  required username="controller_user" password="controller_password";{code}
>  * kafka-storage.sh command
> {code:java}
> kafka-storage.sh format --config /path/to/server.properties 
> --ignore-formatted --cluster-id abcdefghijklmnopqrstuv --add-scram 
> SCRAM-SHA-512=[name=controller_user,password=controller_password] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

