[jira] [Created] (KAFKA-16757) Fix broker re-registration issues around MV 3.7-IV2
Colin McCabe created KAFKA-16757:

Summary: Fix broker re-registration issues around MV 3.7-IV2
Key: KAFKA-16757
URL: https://issues.apache.org/jira/browse/KAFKA-16757
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe

When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend the broker registration so that the controller can record the storage directories. The current code for doing this has several problems, however. One is that it tends to trigger even in cases where we don't actually need it. Another is that when re-registering the broker, the broker is marked as fenced.

This PR moves the handling of the re-registration case out of BrokerMetadataPublisher and into BrokerRegistrationTracker. The re-registration code there will only trigger in the case where the broker sees an existing registration for itself with no directories set. This is much more targeted than the original code.

Additionally, in ClusterControlManager, when re-registering the same broker, we now preserve its fencing and shutdown state, rather than clearing them. (There isn't any good reason re-registering the same broker should clear these things; this was purely an oversight.) Note that we can tell the broker is "the same" because it has the same IncarnationId.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
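The fix described above can be sketched in standalone Python (illustrative only, not Kafka's actual Java code; the function and field names here are assumptions):

```python
# Illustrative sketch of preserving fenced/shutdown state when a broker with
# the same IncarnationId re-registers, per the KAFKA-16757 description.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class BrokerRegistration:
    broker_id: int
    incarnation_id: str
    fenced: bool
    in_controlled_shutdown: bool
    directories: Tuple[str, ...]

def re_register(existing: Optional[BrokerRegistration],
                request: BrokerRegistration) -> BrokerRegistration:
    """Return the registration to record for a (re-)registering broker."""
    if existing is not None and existing.incarnation_id == request.incarnation_id:
        # Same broker re-registering (e.g. to report its directories after an
        # upgrade past 3.7-IV2): keep its fencing and shutdown state.
        return BrokerRegistration(request.broker_id, request.incarnation_id,
                                  existing.fenced, existing.in_controlled_shutdown,
                                  request.directories)
    # A genuinely new incarnation starts out fenced, as usual.
    return BrokerRegistration(request.broker_id, request.incarnation_id,
                              True, False, request.directories)
```

The key point is the IncarnationId comparison: it distinguishes "the same broker sending a fresh registration" from "a new incarnation of the broker", and only the latter should reset fencing.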
[jira] [Created] (KAFKA-16649) Fix potential deadlock in DynamicBrokerConfig
Colin McCabe created KAFKA-16649:

Summary: Fix potential deadlock in DynamicBrokerConfig
Key: KAFKA-16649
URL: https://issues.apache.org/jira/browse/KAFKA-16649
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
[jira] [Created] (KAFKA-16624) Don't generate useless PartitionChangeRecord on older MV
Colin McCabe created KAFKA-16624:

Summary: Don't generate useless PartitionChangeRecord on older MV
Key: KAFKA-16624
URL: https://issues.apache.org/jira/browse/KAFKA-16624
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe

Fix a case where we could generate useless PartitionChangeRecords on metadata versions older than 3.6-IV0. This could happen when we had an ISR with only one broker in it and were trying to go down to a fully empty ISR. In this case, PartitionChangeBuilder would block the transition to a fully empty ISR (since that is not valid in these pre-KIP-966 metadata versions), but it would still emit the record, even though it had no effect.
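The intended behavior can be sketched in standalone Python (illustrative only; PartitionChangeBuilder is a Java class in Kafka, and the field names here are assumptions):

```python
# Illustrative sketch: only emit a PartitionChangeRecord when the clamped
# target state actually differs from the current state.
def build_partition_change(current_isr, target_isr, mv_supports_empty_isr):
    if not target_isr and not mv_supports_empty_isr:
        # Pre-KIP-966 metadata versions cannot represent an empty ISR:
        # clamp the target back to the current ISR.
        target_isr = current_isr
    if target_isr == current_isr:
        return None                 # no effective change -> no record
    return {"isr": target_isr}      # otherwise emit the change
```

The bug described above corresponds to emitting the record even when clamping made it a no-op; the sketch returns `None` in that case instead.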
[jira] [Created] (KAFKA-16509) CurrentControllerId metric is unreliable in ZK mode
Colin McCabe created KAFKA-16509:

Summary: CurrentControllerId metric is unreliable in ZK mode
Key: KAFKA-16509
URL: https://issues.apache.org/jira/browse/KAFKA-16509
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe

The CurrentControllerId metric added by KIP-1001 is unreliable in ZK mode. Sometimes, when there is no active ZK-based controller, it still shows the previous controller ID. Instead, it should show -1 in that situation.
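The intended metric semantics are simple enough to state as a one-line sketch (illustrative Python, not Kafka's implementation; the function name is an assumption):

```python
# Illustrative sketch of the intended CurrentControllerId semantics:
# report -1 whenever there is no active controller, never a stale ID.
def current_controller_id(active_controller_id):
    """active_controller_id is an int, or None when no controller is active."""
    return active_controller_id if active_controller_id is not None else -1
```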
[jira] [Created] (KAFKA-16475) Create unit test for TopicImageNode
Colin McCabe created KAFKA-16475:

Summary: Create unit test for TopicImageNode
Key: KAFKA-16475
URL: https://issues.apache.org/jira/browse/KAFKA-16475
Project: Kafka
Issue Type: Improvement
Reporter: Colin McCabe
[jira] [Created] (KAFKA-16469) Metadata Schema Checker
Colin McCabe created KAFKA-16469:

Summary: Metadata Schema Checker
Key: KAFKA-16469
URL: https://issues.apache.org/jira/browse/KAFKA-16469
Project: Kafka
Issue Type: New Feature
Reporter: Colin McCabe
[jira] [Resolved] (KAFKA-16428) Fix bug where config change notification znode may not get created during migration
Colin McCabe resolved KAFKA-16428.

Resolution: Fixed

> Fix bug where config change notification znode may not get created during migration
>
> Key: KAFKA-16428
> URL: https://issues.apache.org/jira/browse/KAFKA-16428
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.7.0, 3.6.1
> Reporter: Colin McCabe
> Assignee: Colin McCabe
> Priority: Major
> Fix For: 3.6.2, 3.8.0, 3.7.1
[jira] [Resolved] (KAFKA-16411) Correctly migrate default client quota entities in KRaft migration
Colin McCabe resolved KAFKA-16411.

Resolution: Fixed

> Correctly migrate default client quota entities in KRaft migration
>
> Key: KAFKA-16411
> URL: https://issues.apache.org/jira/browse/KAFKA-16411
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.4.0
> Reporter: Colin McCabe
> Assignee: Colin McCabe
> Priority: Blocker
> Fix For: 3.6.2, 3.8.0, 3.7.1
[jira] [Created] (KAFKA-16435) Add test for KAFKA-16428
Colin McCabe created KAFKA-16435:

Summary: Add test for KAFKA-16428
Key: KAFKA-16435
URL: https://issues.apache.org/jira/browse/KAFKA-16435
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe

Add a test for KAFKA-16428: Fix bug where config change notification znode may not get created during migration #15608
[jira] [Created] (KAFKA-16428) Fix bug where config change notification znode may not get created during migration
Colin McCabe created KAFKA-16428:

Summary: Fix bug where config change notification znode may not get created during migration
Key: KAFKA-16428
URL: https://issues.apache.org/jira/browse/KAFKA-16428
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe
[jira] [Created] (KAFKA-16411) Correctly migrate default entities in KRaft migration
Colin McCabe created KAFKA-16411:

Summary: Correctly migrate default entities in KRaft migration
Key: KAFKA-16411
URL: https://issues.apache.org/jira/browse/KAFKA-16411
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe
[jira] [Created] (KAFKA-16321) Default directory ids to MIGRATING, not UNASSIGNED
Colin McCabe created KAFKA-16321:

Summary: Default directory ids to MIGRATING, not UNASSIGNED
Key: KAFKA-16321
URL: https://issues.apache.org/jira/browse/KAFKA-16321
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe

Directory ids should be defaulted to MIGRATING, not UNASSIGNED.
[jira] [Resolved] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration
Colin McCabe resolved KAFKA-16216.

Fix Version/s: 3.7.0
Reviewer: Colin McCabe
Assignee: David Arthur (was: Colin McCabe)
Resolution: Fixed

> Reduce batch size for initial metadata load during ZK migration
>
> Key: KAFKA-16216
> URL: https://issues.apache.org/jira/browse/KAFKA-16216
> Project: Kafka
> Issue Type: Bug
> Reporter: Colin McCabe
> Assignee: David Arthur
> Priority: Major
> Fix For: 3.7.0
[jira] [Created] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration
Colin McCabe created KAFKA-16216:

Summary: Reduce batch size for initial metadata load during ZK migration
Key: KAFKA-16216
URL: https://issues.apache.org/jira/browse/KAFKA-16216
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: David Arthur
[jira] [Created] (KAFKA-16180) Full metadata request sometimes fails during zk migration
Colin McCabe created KAFKA-16180:

Summary: Full metadata request sometimes fails during zk migration
Key: KAFKA-16180
URL: https://issues.apache.org/jira/browse/KAFKA-16180
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe

Example:

{code}
java.util.NoSuchElementException: lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
    at scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
    at scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
    at scala.collection.mutable.AnyRefMap.apply(AnyRefMap.scala:207)
    at kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2(ZkMetadataCache.scala:112)
    at kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2$adapted(ZkMetadataCache.scala:105)
    at scala.collection.immutable.HashSet.foreach(HashSet.scala:958)
    at kafka.server.metadata.ZkMetadataCache$.maybeInjectDeletedPartitionsFromFullMetadataRequest(ZkMetadataCache.scala:105)
    at kafka.server.metadata.ZkMetadataCache.$anonfun$updateMetadata$1(ZkMetadataCache.scala:506)
    at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:183)
    at kafka.server.metadata.ZkMetadataCache.updateMetadata(ZkMetadataCache.scala:496)
    at kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:2482)
    at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:733)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:349)
    at kafka.server.KafkaRequestHandler.$anonfun$poll$8(KafkaRequestHandler.scala:210)
    at kafka.server.KafkaRequestHandler.$anonfun$poll$8$adapted(KafkaRequestHandler.scala:210)
    at io.confluent.kafka.availability.ThreadCountersManager.wrapEngine(ThreadCountersManager.java:146)
    at kafka.server.KafkaRequestHandler.poll(KafkaRequestHandler.scala:210)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:151)
    at java.base/java.lang.Thread.run(Thread.java:1583)
    at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66)
{code}
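The trace above boils down to an unguarded map lookup: Scala's `AnyRefMap.apply` throws `NoSuchElementException` when the key (here, a topic name) is absent. A Python analog of the defensive pattern (illustrative only; this is not Kafka's actual fix, and the names are assumptions):

```python
# Illustrative sketch: skip topics missing from the cache instead of letting
# an unguarded lookup throw, as AnyRefMap.apply does in the trace above.
def partitions_to_delete(topic_states, full_request_topics):
    deleted = []
    for topic in full_request_topics:
        state = topic_states.get(topic)   # .get(), not [...]: absent keys are skipped
        if state is not None:
            deleted.extend(state)
    return deleted
```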
[jira] [Resolved] (KAFKA-16078) Be more consistent about getting the latest MetadataVersion
Colin McCabe resolved KAFKA-16078.

Fix Version/s: 3.7.0
Reviewer: Colin Patrick McCabe
Resolution: Fixed

> Be more consistent about getting the latest MetadataVersion
>
> Key: KAFKA-16078
> URL: https://issues.apache.org/jira/browse/KAFKA-16078
> Project: Kafka
> Issue Type: Bug
> Reporter: David Arthur
> Assignee: David Arthur
> Priority: Major
> Fix For: 3.7.0
>
> The InterBrokerProtocolVersion currently defaults to a non-production MetadataVersion. We should be more consistent about getting the latest MetadataVersion.
[jira] [Resolved] (KAFKA-16131) Repeated UnsupportedVersionException logged when running Kafka 3.7.0-RC2 KRaft cluster with metadata version 3.6
Colin McCabe resolved KAFKA-16131.

Resolution: Fixed

> Repeated UnsupportedVersionException logged when running Kafka 3.7.0-RC2 KRaft cluster with metadata version 3.6
>
> Key: KAFKA-16131
> URL: https://issues.apache.org/jira/browse/KAFKA-16131
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.7.0
> Reporter: Jakub Scholz
> Assignee: Proven Provenzano
> Priority: Blocker
> Fix For: 3.7.0
>
> When running Kafka 3.7.0-RC2 as a KRaft cluster with the metadata version set to 3.6-IV2, it throws repeated errors like this in the controller logs:
> {quote}
> 2024-01-13 16:58:01,197 INFO [QuorumController id=0] assignReplicasToDirs: event failed with UnsupportedVersionException in 15 microseconds. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler]
> 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) -- AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5, directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ, topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ, partitions=[PartitionData(partitionIndex=2), PartitionData(partitionIndex=1)]), TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ, partitions=[PartitionData(partitionIndex=0)])])]) with context RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, apiVersion=0, clientId=1000, correlationId=14, headerVersion=2), connectionId='172.16.14.219:9090-172.16.14.217:53590-7', clientAddress=/172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi, listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL, clientInformation=ClientInformation(softwareName=apache-kafka-java, softwareVersion=3.7.0), fromPrivilegedListener=false, principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2]) (kafka.server.ControllerApis) [quorum-controller-0-event-handler]
> java.util.concurrent.CompletionException: org.apache.kafka.common.errors.UnsupportedVersionException: Directory assignment is not supported yet.
>     at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
>     at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
>     at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
>     at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>     at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>     at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
>     at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
>     at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
>     at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>     at java.base/java.lang.Thread.run(Thread.java:840)
> Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: Directory assignment is not supported yet.
> {quote}
>
> With the metadata version set to 3.6-IV2, it makes sense that the request is not supported. But in that case the request should not be sent at all.
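The conclusion of the report above is that the broker, not the controller, should gate the request. A standalone sketch of that gate (illustrative Python; the version encoding and the 3.7-IV2 boundary for directory assignment are assumptions based on the tickets in this digest):

```python
# Illustrative sketch: only send AssignReplicasToDirs when the cluster's
# metadata version supports directory assignment.
# Metadata versions are modeled as (major, minor, iv) tuples, e.g.
# 3.6-IV2 -> (3, 6, 2), 3.7-IV2 -> (3, 7, 2).
DIRECTORY_ASSIGNMENT_MV = (3, 7, 2)

def should_send_assign_replicas_to_dirs(metadata_version):
    return metadata_version >= DIRECTORY_ASSIGNMENT_MV
```

With such a check in place, a 3.6-IV2 cluster would simply never see the request, rather than logging repeated UnsupportedVersionExceptions.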
[jira] [Resolved] (KAFKA-16121) Partition reassignments in ZK migration dual write mode stalled until leader epoch incremented
Colin McCabe resolved KAFKA-16121.

Fix Version/s: 3.7.0
Reviewer: Colin McCabe
Assignee: David Mao
Resolution: Duplicate

> Partition reassignments in ZK migration dual write mode stalled until leader epoch incremented
>
> Key: KAFKA-16121
> URL: https://issues.apache.org/jira/browse/KAFKA-16121
> Project: Kafka
> Issue Type: Bug
> Reporter: David Mao
> Assignee: David Mao
> Priority: Major
> Fix For: 3.7.0
>
> I noticed this in an integration test in https://github.com/apache/kafka/pull/15184
> In ZK mode, partition leaders rely on the LeaderAndIsr request to be notified of new replicas as part of a reassignment. In ZK mode, we ignore any LeaderAndIsr request where the partition leader epoch is less than or equal to the current partition leader epoch.
> In KRaft mode, we do not bump the leader epoch when starting a new reassignment; see `triggerLeaderEpochBumpIfNeeded`. This means that the leader will ignore the LeaderAndIsr request initiating the reassignment until a leader epoch bump is triggered through another means, for instance preferred leader election.
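The stall described above follows directly from the epoch guard. A minimal sketch of that guard (illustrative Python, not the broker's actual Scala code):

```python
# Illustrative sketch of the ZK-mode LeaderAndIsr epoch guard: a leader drops
# any request whose leader epoch is not strictly newer than its own, so a
# reassignment that does not bump the epoch is never acted upon.
def accepts_leader_and_isr(current_epoch, request_epoch):
    return request_epoch > current_epoch
```

A reassignment initiated without an epoch bump arrives with `request_epoch == current_epoch` and is ignored until something else (e.g. preferred leader election) increments the epoch.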
[jira] [Resolved] (KAFKA-16120) Partition reassignments in ZK migration dual write leaves stray partitions
Colin McCabe resolved KAFKA-16120.

Fix Version/s: 3.7.0
Reviewer: Colin McCabe
Assignee: David Mao
Resolution: Fixed

> Partition reassignments in ZK migration dual write leaves stray partitions
>
> Key: KAFKA-16120
> URL: https://issues.apache.org/jira/browse/KAFKA-16120
> Project: Kafka
> Issue Type: Bug
> Reporter: David Mao
> Assignee: David Mao
> Priority: Major
> Fix For: 3.7.0
>
> When a reassignment is completed in ZK migration dual-write mode, the `StopReplica` sent by the kraft quorum migration propagator is sent with `delete = false` for deleted replicas when processing the topic delta. This results in stray replicas.
[jira] [Created] (KAFKA-16126) Kcontroller dynamic configurations may fail to apply at startup
Colin McCabe created KAFKA-16126:

Summary: Kcontroller dynamic configurations may fail to apply at startup
Key: KAFKA-16126
URL: https://issues.apache.org/jira/browse/KAFKA-16126
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe

Some kcontroller dynamic configurations may fail to apply at startup. This happens because there is a race between registering the reconfigurables to the DynamicBrokerConfig class and receiving the first update from the metadata publisher. We can fix this by registering the reconfigurables first. This seems to have been introduced by the "MINOR: Install ControllerServer metadata publishers sooner" change.
[jira] [Resolved] (KAFKA-16094) BrokerRegistrationRequest.logDirs field must be ignorable
Colin McCabe resolved KAFKA-16094.

Fix Version/s: 3.7.0
Resolution: Fixed

> BrokerRegistrationRequest.logDirs field must be ignorable
>
> Key: KAFKA-16094
> URL: https://issues.apache.org/jira/browse/KAFKA-16094
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.7.0
> Reporter: Colin McCabe
> Assignee: Colin McCabe
> Priority: Blocker
> Fix For: 3.7.0
>
> 3.7 brokers must be able to register with 3.6 and earlier controllers. So this means that the logDirs field must be ignorable (aka, not sent) if the highest BrokerRegistrationRequest version we can negotiate is older than v2.
[jira] [Created] (KAFKA-16094) 3.7 brokers must be able to register with 3.6 and earlier controllers
Colin McCabe created KAFKA-16094:

Summary: 3.7 brokers must be able to register with 3.6 and earlier controllers
Key: KAFKA-16094
URL: https://issues.apache.org/jira/browse/KAFKA-16094
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe

3.7 brokers must be able to register with 3.6 and earlier controllers. So this means that the logDirs field must be ignorable (aka, not sent) if the highest BrokerRegistrationRequest version we can negotiate is older than v2.
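An "ignorable" field is one the sender simply omits when the negotiated request version is too old to carry it. A sketch of that idea (illustrative Python; Kafka's real requests are generated from JSON schemas, and the dict layout here is an assumption):

```python
# Illustrative sketch: omit logDirs from BrokerRegistrationRequest when the
# negotiated version predates v2, so old controllers never see the field.
def build_registration_request(negotiated_version, log_dirs):
    request = {"version": negotiated_version}
    if negotiated_version >= 2:          # logDirs exists from v2 onward
        request["logDirs"] = log_dirs
    return request
```

This is what lets a 3.7 broker register with a 3.6 controller: the controller only ever receives fields its version understands.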
[jira] [Resolved] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft
Colin McCabe resolved KAFKA-14127.

Resolution: Fixed

> KIP-858: Handle JBOD broker disk failure in KRaft
>
> Key: KAFKA-14127
> URL: https://issues.apache.org/jira/browse/KAFKA-14127
> Project: Kafka
> Issue Type: Improvement
> Components: jbod, kraft
> Reporter: Igor Soarez
> Assignee: Igor Soarez
> Priority: Major
> Labels: 4.0-blocker, kip-500, kraft
> Fix For: 3.7.0
>
> Supporting configurations with multiple storage directories in KRaft mode
[jira] [Resolved] (KAFKA-15650) Data-loss on leader shutdown right after partition creation?
Colin McCabe resolved KAFKA-15650.

Resolution: Not A Problem

> Data-loss on leader shutdown right after partition creation?
>
> Key: KAFKA-15650
> URL: https://issues.apache.org/jira/browse/KAFKA-15650
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Igor Soarez
> Priority: Major
>
> As per KIP-858, when a replica is created, the broker selects a log directory to host the replica and queues the propagation of the directory assignment to the controller. The replica becomes immediately active; it isn't blocked until the controller confirms the metadata change. If the replica is the leader replica, it can immediately start accepting writes. Consider the following scenario:
> 1. A partition is created in some selected log directory, and some produce traffic is accepted
> 2. Before the broker is able to notify the controller of the directory assignment, the broker shuts down
> 3. Upon coming back online, the broker has an offline directory, the same directory which was chosen to host the replica
> 4. The broker assumes leadership for the replica, but cannot find it in any available directory and has no way of knowing it was already created, because the directory assignment is still missing
> 5. The replica is created and the previously produced records are lost
>
> Step 4 may seem unlikely due to ISR membership gating leadership, but even assuming acks=all and replicas>1, if all other replicas are also offline the broker may still gain leadership. Perhaps KIP-966 is relevant here. We may need to delay new replica activation until the assignment is propagated successfully.
[jira] [Created] (KAFKA-16061) JBOD follow-ups
Colin McCabe created KAFKA-16061:

Summary: JBOD follow-ups
Key: KAFKA-16061
URL: https://issues.apache.org/jira/browse/KAFKA-16061
Project: Kafka
Issue Type: Improvement
Reporter: Colin McCabe
[jira] [Created] (KAFKA-15979) Add KIP-1001 CurrentControllerId metric
Colin McCabe created KAFKA-15979:

Summary: Add KIP-1001 CurrentControllerId metric
Key: KAFKA-15979
URL: https://issues.apache.org/jira/browse/KAFKA-15979
Project: Kafka
Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe
[jira] [Created] (KAFKA-15980) Add KIP-1001 CurrentControllerId metric
Colin McCabe created KAFKA-15980:

Summary: Add KIP-1001 CurrentControllerId metric
Key: KAFKA-15980
URL: https://issues.apache.org/jira/browse/KAFKA-15980
Project: Kafka
Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe
[jira] [Created] (KAFKA-15956) MetadataShell must take the directory lock when reading
Colin McCabe created KAFKA-15956:

Summary: MetadataShell must take the directory lock when reading
Key: KAFKA-15956
URL: https://issues.apache.org/jira/browse/KAFKA-15956
Project: Kafka
Issue Type: Improvement
Reporter: Colin McCabe

MetadataShell must take the directory lock when reading files, to avoid unpleasant surprises from concurrent reads and writes.
[jira] [Resolved] (KAFKA-15311) Fix docs about reverting to ZooKeeper mode during KRaft migration
Colin McCabe resolved KAFKA-15311.

Fix Version/s: 3.7.0
Resolution: Fixed

> Fix docs about reverting to ZooKeeper mode during KRaft migration
>
> Key: KAFKA-15311
> URL: https://issues.apache.org/jira/browse/KAFKA-15311
> Project: Kafka
> Issue Type: Bug
> Reporter: Colin McCabe
> Assignee: Colin McCabe
> Priority: Minor
> Fix For: 3.7.0
>
> The docs incorrectly state that reverting to ZooKeeper mode during KRaft migration is not possible
[jira] [Created] (KAFKA-15922) Add MetadataVersion for JBOD
Colin McCabe created KAFKA-15922:

Summary: Add MetadataVersion for JBOD
Key: KAFKA-15922
URL: https://issues.apache.org/jira/browse/KAFKA-15922
Project: Kafka
Issue Type: Improvement
Reporter: Colin McCabe
[jira] [Resolved] (KAFKA-15860) ControllerRegistration must be written out to the metadata image
Colin McCabe resolved KAFKA-15860.

Fix Version/s: 3.7.0
Resolution: Fixed

> ControllerRegistration must be written out to the metadata image
>
> Key: KAFKA-15860
> URL: https://issues.apache.org/jira/browse/KAFKA-15860
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.7.0
> Reporter: Colin McCabe
> Assignee: Colin McCabe
> Priority: Major
> Fix For: 3.7.0
[jira] [Created] (KAFKA-15860) ControllerRegistration must be written out to the metadata image
Colin McCabe created KAFKA-15860:

Summary: ControllerRegistration must be written out to the metadata image
Key: KAFKA-15860
URL: https://issues.apache.org/jira/browse/KAFKA-15860
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe
[jira] [Resolved] (KAFKA-15532) ZkWriteBehindLag should not be reported by inactive controllers
Colin McCabe resolved KAFKA-15532.

Resolution: Fixed

> ZkWriteBehindLag should not be reported by inactive controllers
>
> Key: KAFKA-15532
> URL: https://issues.apache.org/jira/browse/KAFKA-15532
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.6.0
> Reporter: David Arthur
> Assignee: David Arthur
> Priority: Minor
>
> Since only the active controller is performing the dual-write to ZK during a migration, it should be the only controller to report the ZkWriteBehindLag metric.
>
> Currently, if the controller fails over during a migration, the previous active controller will incorrectly report its last value for ZkWriteBehindLag forever. Instead, it should report zero.
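The intended behavior can be stated as a tiny sketch (illustrative Python, not Kafka's metrics code; the function name is an assumption):

```python
# Illustrative sketch of the intended ZkWriteBehindLag semantics: only the
# active controller reports its measured lag; inactive controllers report 0,
# never a stale last value.
def reported_zk_write_behind_lag(is_active_controller, measured_lag):
    return measured_lag if is_active_controller else 0
```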
[jira] [Resolved] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
Colin McCabe resolved KAFKA-15754.

Resolution: Invalid

The kafka-storage tool can not, in fact, generate UUIDs starting with '-'.

> The kafka-storage tool can generate UUID starting with "-"
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.6.0
> Reporter: Paolo Patierno
> Assignee: Paolo Patierno
> Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID starting with a dash "-", which then breaks how the argparse4j library works. With such a UUID (i.e. -rmdB0m4T4--Y4thlNXk4Q in my case) the tool exits with the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> That said, it seems that this problem was already addressed in the Uuid.randomUuid method, which keeps generating a new UUID until it doesn't start with "-". This is the commit addressing it:
> https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce
> The problem is that when toString is called on the Uuid instance, it does a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid());
> {code}
> Not sure why, but the code is using a URL (safe) encoder which, taking a look at the Base64 class in Java, is using a RFC4648_URLSAFE encoder using the following alphabet:
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'};
> {code}
> which, as you can see, includes the "-" character. So despite the current Uuid.randomUuid avoiding the generation of a UUID containing a dash, the Base64 encoding operation can return a final UUID starting with the dash instead.
> I was wondering if there is any good reason for using a Base64 URL encoder and not just RFC4648 (not URL safe), which uses the common Base64 alphabet not containing the "-".
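The claim about the URL-safe alphabet is easy to verify. A standalone Python demonstration (standing in for Java's `Base64.getUrlEncoder()`, which uses the same RFC 4648 URL-safe alphabet):

```python
import base64

# The URL-safe Base64 alphabet contains '-' at index 62, so an encoded
# 16-byte value starts with '-' whenever its first 6 bits are 111110,
# i.e. whenever the first byte is 0xF8..0xFB.
raw = bytes([0xF8]) + bytes(15)                       # 16 bytes, like a UUID
encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode()  # no padding
assert encoded.startswith("-")
assert len(encoded) == 22                             # 128 bits -> 22 chars
```

This illustrates why the check has to live in `Uuid.randomUuid` (regenerate until the encoded form doesn't start with "-") rather than in the raw UUID bytes.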
[jira] [Resolved] (KAFKA-15704) ControllerRegistrationRequest must set ZkMigrationReady field if appropriate
Colin McCabe resolved KAFKA-15704.

Resolution: Fixed

> ControllerRegistrationRequest must set ZkMigrationReady field if appropriate
>
> Key: KAFKA-15704
> URL: https://issues.apache.org/jira/browse/KAFKA-15704
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.7.0
> Reporter: Colin McCabe
> Assignee: David Arthur
> Priority: Major
> Fix For: 3.7.0
[jira] [Created] (KAFKA-15704) ControllerRegistrationRequest must set ZkMigrationReady field if appropriate
Colin McCabe created KAFKA-15704:

Summary: ControllerRegistrationRequest must set ZkMigrationReady field if appropriate
Key: KAFKA-15704
URL: https://issues.apache.org/jira/browse/KAFKA-15704
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Fix For: 3.7.0
[jira] [Resolved] (KAFKA-15230) ApiVersions data between controllers is not reliable
Colin McCabe resolved KAFKA-15230.

Fix Version/s: 3.7.0
Resolution: Fixed

> ApiVersions data between controllers is not reliable
>
> Key: KAFKA-15230
> URL: https://issues.apache.org/jira/browse/KAFKA-15230
> Project: Kafka
> Issue Type: Bug
> Reporter: David Arthur
> Assignee: Colin McCabe
> Priority: Critical
> Fix For: 3.7.0
>
> While testing ZK migrations, I noticed a case where the controller was not starting the migration due to the missing ApiVersions data from other controllers. This was unexpected because the quorum was running and the followers were replicating the metadata log as expected. After examining a heap dump of the leader, it was in fact the case that the ApiVersions map of NodeApiVersions was empty.
>
> After further investigation and offline discussion with [~jsancio], we realized that after the initial leader election, the connection from the Raft leader to the followers will become idle and eventually time out and close. This causes NetworkClient to purge the NodeApiVersions data for the closed connections.
>
> There are two main side effects of this behavior:
> 1) If migrations are not started within the idle timeout period (10 minutes, by default), then they will not be able to be started. After this timeout period, I was unable to restart the controllers in such a way that the leader had active connections with all followers.
> 2) Dynamically updating features, such as "metadata.version", is not guaranteed to be safe.
>
> There is a partial workaround for the migration issue. If we set "connections.max.idle.ms" to -1, the Raft leader will never disconnect from the followers. However, if a follower restarts, the leader will not re-establish a connection.
>
> The feature update issue has no safe workarounds.
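The failure mode described above can be sketched as a cache keyed by connection lifetime (illustrative Python, not the actual NetworkClient code):

```python
# Illustrative sketch of why the leader loses ApiVersions data: a
# NetworkClient-style cache drops a node's versions when that node's idle
# connection closes, leaving the leader without version data for it.
class ApiVersionsCache:
    def __init__(self):
        self._versions = {}

    def on_connect(self, node, versions):
        self._versions[node] = versions

    def on_disconnect(self, node):
        # Purged together with the connection -- this is the problematic part.
        self._versions.pop(node, None)

    def get(self, node):
        return self._versions.get(node)
```

Once the post-election connections idle out, every `get` returns nothing, so a migration gated on knowing all followers' ApiVersions can never start.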
[jira] [Created] (KAFKA-15466) Add KIP-919 support to kafka-features.sh, kafka-metadata-quorum.sh, kafka-cluster.sh
Colin McCabe created KAFKA-15466: Summary: Add KIP-919 support to kafka-features.sh, kafka-metadata-quorum.sh, kafka-cluster.sh Key: KAFKA-15466 URL: https://issues.apache.org/jira/browse/KAFKA-15466 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe
[jira] [Created] (KAFKA-15458) Fully resolve endpoint information before registering controllers
Colin McCabe created KAFKA-15458: Summary: Fully resolve endpoint information before registering controllers Key: KAFKA-15458 URL: https://issues.apache.org/jira/browse/KAFKA-15458 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe
[jira] [Resolved] (KAFKA-15183) Add more controller, loader, snapshot emitter metrics
[ https://issues.apache.org/jira/browse/KAFKA-15183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15183. Fix Version/s: 3.6.0 Assignee: Colin McCabe Resolution: Fixed Most of the KIP-938 metrics are now implemented for 3.6. The exception is the ForwardingManager metrics, which will have to wait until 3.7.
Key: KAFKA-15183 URL: https://issues.apache.org/jira/browse/KAFKA-15183 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Assignee: Colin McCabe Priority: Major Fix For: 3.6.0
Add the controller, loader, and snapshot emitter metrics from KIP-938.
[jira] [Created] (KAFKA-15406) Add the ForwardingManager metrics from KIP-938
Colin McCabe created KAFKA-15406: Summary: Add the ForwardingManager metrics from KIP-938 Key: KAFKA-15406 URL: https://issues.apache.org/jira/browse/KAFKA-15406 Project: Kafka Issue Type: Improvement Affects Versions: 3.7.0 Reporter: Colin McCabe
[jira] [Resolved] (KAFKA-14305) KRaft Metadata Transactions
[ https://issues.apache.org/jira/browse/KAFKA-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14305. Resolution: Fixed
Key: KAFKA-14305 URL: https://issues.apache.org/jira/browse/KAFKA-14305 Project: Kafka Issue Type: New Feature Reporter: David Arthur Assignee: Colin McCabe Priority: Major Fix For: 3.6.0
https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions
[jira] [Resolved] (KAFKA-15374) ZK migration fails on configs for default broker resource
[ https://issues.apache.org/jira/browse/KAFKA-15374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15374. Assignee: David Arthur Resolution: Fixed
Key: KAFKA-15374 URL: https://issues.apache.org/jira/browse/KAFKA-15374 Project: Kafka Issue Type: Bug Affects Versions: 3.5.1 Reporter: David Arthur Assignee: David Arthur Priority: Critical Fix For: 3.6.0, 3.5.2
This error was seen while performing a ZK to KRaft migration on a cluster with configs for the default broker resource:
{code:java}
java.lang.NumberFormatException: For input string: ""
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
    at java.base/java.lang.Integer.parseInt(Integer.java:678)
    at java.base/java.lang.Integer.valueOf(Integer.java:999)
    at kafka.zk.ZkMigrationClient.$anonfun$migrateBrokerConfigs$2(ZkMigrationClient.scala:371)
    at kafka.zk.migration.ZkConfigMigrationClient.$anonfun$iterateBrokerConfigs$1(ZkConfigMigrationClient.scala:174)
    at kafka.zk.migration.ZkConfigMigrationClient.$anonfun$iterateBrokerConfigs$1$adapted(ZkConfigMigrationClient.scala:156)
    at scala.collection.immutable.BitmapIndexedMapNode.foreach(HashMap.scala:1076)
    at scala.collection.immutable.HashMap.foreach(HashMap.scala:1083)
    at kafka.zk.migration.ZkConfigMigrationClient.iterateBrokerConfigs(ZkConfigMigrationClient.scala:156)
    at kafka.zk.ZkMigrationClient.migrateBrokerConfigs(ZkMigrationClient.scala:370)
    at kafka.zk.ZkMigrationClient.cleanAndMigrateAllMetadata(ZkMigrationClient.scala:530)
    at org.apache.kafka.metadata.migration.KRaftMigrationDriver$MigrateMetadataEvent.run(KRaftMigrationDriver.java:618)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:833)
    at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:64)
{code}
This is due to not considering the default resource type when we collect the broker IDs in ZkMigrationClient#migrateBrokerConfigs.
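The root cause above can be shown with a small sketch (illustrative Python, not the actual Scala code): the default broker resource is stored under the empty-string name alongside the per-broker config znodes, so parsing every key as an integer fails exactly the way `Integer.parseInt("")` does.

```python
def broker_ids_from_config_names(config_names):
    """Return numeric broker IDs from config resource names, skipping the
    default ("") resource that is not a real broker ID.

    config_names is a hypothetical list of znode names, e.g. ["", "0", "1"].
    """
    ids = []
    for name in config_names:
        if name == "":
            # The default broker resource; parsing it as an int is the bug.
            continue
        ids.append(int(name))
    return ids
```

The unfiltered version of this loop is what raised the NumberFormatException in the stack trace.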
[jira] [Resolved] (KAFKA-15389) MetadataLoader may publish an empty image on first start
[ https://issues.apache.org/jira/browse/KAFKA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15389. Fix Version/s: 3.6.0 Resolution: Fixed
Key: KAFKA-15389 URL: https://issues.apache.org/jira/browse/KAFKA-15389 Project: Kafka Issue Type: Bug Reporter: David Arthur Assignee: David Arthur Priority: Minor Fix For: 3.6.0
When first loading from an empty log, there is a case where MetadataLoader can publish an image before the bootstrap records are processed. This isn't exactly incorrect, since all components implicitly start from the empty image state, but it might be unexpected for some MetadataPublishers. For example, in KRaftMigrationDriver, if an old MetadataVersion is encountered, the driver transitions to the INACTIVE state.
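The race described above can be modeled with a toy loader (illustrative names only, not the MetadataLoader API): publishing is gated on the bootstrap records having been handled, so no publisher ever observes the implicit empty image.

```python
class ToyMetadataLoader:
    """Toy model of deferring publication until bootstrap completes."""

    def __init__(self):
        self.image = {}            # current metadata image (empty at start)
        self.bootstrapped = False  # have bootstrap records been processed?
        self.publishers = []       # callables receiving image snapshots

    def handle_bootstrap_records(self, records):
        self.image.update(records)
        self.bootstrapped = True
        self.maybe_publish()

    def maybe_publish(self):
        # Guard: never hand out an image before bootstrap has been seen.
        if not self.bootstrapped:
            return
        for publisher in self.publishers:
            publisher(dict(self.image))
```

With the guard, a publisher such as the migration driver only ever sees an image that includes the bootstrap MetadataVersion.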
[jira] [Resolved] (KAFKA-15213) Provide the exact offset to QuorumController.replay
[ https://issues.apache.org/jira/browse/KAFKA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15213. Fix Version/s: 3.6.0 Resolution: Fixed
Key: KAFKA-15213 URL: https://issues.apache.org/jira/browse/KAFKA-15213 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Assignee: Colin McCabe Priority: Major Fix For: 3.6.0
Provide the exact offset to QuorumController.replay so that we can implement metadata transactions. We need this so that we can know the offset where the records will be applied before we apply them in QuorumControllers.
[jira] [Resolved] (KAFKA-15220) KRaftMetadataCache returns fenced brokers from getAliveBrokerNode
[ https://issues.apache.org/jira/browse/KAFKA-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15220. Fix Version/s: 3.6.0 Resolution: Fixed
Key: KAFKA-15220 URL: https://issues.apache.org/jira/browse/KAFKA-15220 Project: Kafka Issue Type: Bug Reporter: David Mao Assignee: David Mao Priority: Major Fix For: 3.6.0
[jira] [Created] (KAFKA-15369) Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration
Colin McCabe created KAFKA-15369: Summary: Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration Key: KAFKA-15369 URL: https://issues.apache.org/jira/browse/KAFKA-15369 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe
[jira] [Created] (KAFKA-15318) Move Acl publishing outside the QuorumController
Colin McCabe created KAFKA-15318: Summary: Move Acl publishing outside the QuorumController Key: KAFKA-15318 URL: https://issues.apache.org/jira/browse/KAFKA-15318 Project: Kafka Issue Type: Bug Reporter: Colin McCabe On the controller, move Acl publishing into a dedicated MetadataPublisher, AclPublisher. This publisher listens for notifications from MetadataLoader, and receives only committed data. This brings the controller side in line with how the broker has always worked. It also avoids some ugly code related to publishing directly from the QuorumController. Most important of all, it clears the way to implement metadata transactions without worrying about Authorizer state (since it will be handled by the MetadataLoader, along with other metadata image state).
[jira] [Created] (KAFKA-15311) Docs incorrectly state that reverting to ZooKeeper mode during the migration is not possible
Colin McCabe created KAFKA-15311: Summary: Docs incorrectly state that reverting to ZooKeeper mode during the migration is not possible Key: KAFKA-15311 URL: https://issues.apache.org/jira/browse/KAFKA-15311 Project: Kafka Issue Type: Bug Reporter: Colin McCabe
[jira] [Created] (KAFKA-15213) Provide the exact offset to QuorumController.replay
Colin McCabe created KAFKA-15213: Summary: Provide the exact offset to QuorumController.replay Key: KAFKA-15213 URL: https://issues.apache.org/jira/browse/KAFKA-15213 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Provide the exact offset to QuorumController.replay so that we can implement metadata transactions. We need this so that we can know the offset where the records will be applied before we apply them in QuorumControllers.
[jira] [Created] (KAFKA-15183) Add more controller, loader, snapshot emitter metrics
Colin McCabe created KAFKA-15183: Summary: Add more controller, loader, snapshot emitter metrics Key: KAFKA-15183 URL: https://issues.apache.org/jira/browse/KAFKA-15183 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Add the controller, loader, and snapshot emitter metrics from KIP-938.
[jira] [Created] (KAFKA-15060) Fix Admin.describeFeatures
Colin McCabe created KAFKA-15060: Summary: Fix Admin.describeFeatures Key: KAFKA-15060 URL: https://issues.apache.org/jira/browse/KAFKA-15060 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Fix Admin.describeFeatures, which was accidentally broken by KAFKA-15007.
[jira] [Created] (KAFKA-15048) Improve handling of non-fatal quorum controller errors
Colin McCabe created KAFKA-15048: Summary: Improve handling of non-fatal quorum controller errors Key: KAFKA-15048 URL: https://issues.apache.org/jira/browse/KAFKA-15048 Project: Kafka Issue Type: Bug Reporter: Colin McCabe
[jira] [Created] (KAFKA-15043) Create a kcontroller metric for expired broker heartbeats
Colin McCabe created KAFKA-15043: Summary: Create a kcontroller metric for expired broker heartbeats Key: KAFKA-15043 URL: https://issues.apache.org/jira/browse/KAFKA-15043 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe
[jira] [Created] (KAFKA-15019) Improve handling of overload situations in the kcontroller
Colin McCabe created KAFKA-15019: Summary: Improve handling of overload situations in the kcontroller Key: KAFKA-15019 URL: https://issues.apache.org/jira/browse/KAFKA-15019 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe Improve handling of overload situations in the KRaft controller
[jira] [Resolved] (KAFKA-14658) When listening on fixed ports, defer port opening until we're ready
[ https://issues.apache.org/jira/browse/KAFKA-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14658. Resolution: Fixed
Key: KAFKA-14658 URL: https://issues.apache.org/jira/browse/KAFKA-14658 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe Priority: Major
When we are listening on fixed ports, we should defer opening ports until we're ready to accept traffic. If we open the broker port too early, it can confuse monitoring and deployment systems. This is a particular concern in KRaft mode, since in that mode we create the SocketServer object earlier in the startup process than in ZK mode.
The approach taken in this PR is to defer opening the acceptor port until Acceptor.start is called. Note that when we are listening on a random port, we continue to open the port "early," in the SocketServer constructor. The reason for doing this is that there is no other way to find the random port number the kernel has selected. Since random port assignment is not used in production deployments, this should be reasonable.
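The two-path approach described above can be sketched with Python's socket API rather than Kafka's SocketServer (the class and fields are illustrative): a fixed port is bound only when start() is called, while port 0 (random) must be bound eagerly, since asking the kernel for the chosen port is the only way to discover it.

```python
import socket

class Acceptor:
    """Toy acceptor: defers binding a fixed port until start()."""

    def __init__(self, port):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.bound = False
        self.port = port
        if port == 0:
            # Random port: bind early; it's the only way to learn the
            # port number the kernel selected.
            self._sock.bind(("127.0.0.1", 0))
            self.port = self._sock.getsockname()[1]
            self.bound = True

    def start(self):
        # Fixed port: bind only now, when we are ready to accept traffic.
        if not self.bound:
            self._sock.bind(("127.0.0.1", self.port))
            self.bound = True
        self._sock.listen()

    def close(self):
        self._sock.close()
```

A monitoring probe hitting the port before start() now sees a connection refused instead of a half-ready server.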
[jira] [Resolved] (KAFKA-14835) Create ControllerServerMetricsPublisher
[ https://issues.apache.org/jira/browse/KAFKA-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14835. Resolution: Fixed
Key: KAFKA-14835 URL: https://issues.apache.org/jira/browse/KAFKA-14835 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Assignee: Colin McCabe Priority: Major
[jira] [Resolved] (KAFKA-14857) Fix some MetadataLoader bugs
[ https://issues.apache.org/jira/browse/KAFKA-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14857. Resolution: Fixed
Key: KAFKA-14857 URL: https://issues.apache.org/jira/browse/KAFKA-14857 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe Priority: Major
[jira] [Resolved] (KAFKA-14943) Fix ClientQuotaControlManager validation
[ https://issues.apache.org/jira/browse/KAFKA-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14943. Fix Version/s: 3.5.0 Resolution: Fixed
Key: KAFKA-14943 URL: https://issues.apache.org/jira/browse/KAFKA-14943 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe Priority: Major Fix For: 3.5.0
[jira] [Resolved] (KAFKA-15009) New ACLs are not written to ZK during migration
[ https://issues.apache.org/jira/browse/KAFKA-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-15009. Fix Version/s: 3.5.0 Resolution: Fixed
Key: KAFKA-15009 URL: https://issues.apache.org/jira/browse/KAFKA-15009 Project: Kafka Issue Type: Bug Components: kraft Affects Versions: 3.5.0 Reporter: Akhilesh Chaganti Assignee: Akhilesh Chaganti Priority: Blocker Labels: kraft, migration Fix For: 3.5.0
While handling snapshots in dual-write mode, we are missing the logic to detect new ACLs created in KRaft. This means we will not write these new ACLs back to ZK, and they would be missing if a user rolled back their cluster to ZK mode.
[jira] [Resolved] (KAFKA-14918) KRaft controller sending ZK controller RPCs to KRaft brokers
[ https://issues.apache.org/jira/browse/KAFKA-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14918. Resolution: Fixed
Key: KAFKA-14918 URL: https://issues.apache.org/jira/browse/KAFKA-14918 Project: Kafka Issue Type: Sub-task Reporter: David Arthur Assignee: David Arthur Priority: Critical Fix For: 3.5.0
During the migration, when upgrading a ZK broker to KRaft, the controller is incorrectly sending UpdateMetadata requests to the KRaft brokers.
[jira] [Resolved] (KAFKA-14698) Received request api key LEADER_AND_ISR which is not enabled
[ https://issues.apache.org/jira/browse/KAFKA-14698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14698. Fix Version/s: (was: 3.4.1) Resolution: Duplicate
Key: KAFKA-14698 URL: https://issues.apache.org/jira/browse/KAFKA-14698 Project: Kafka Issue Type: Bug Components: kraft Affects Versions: 3.4.0 Reporter: Mickael Maison Assignee: Akhilesh Chaganti Priority: Major Fix For: 3.5.0 Attachments: broker0.log, controller.log, test_online_migration.tar.gz
I started from a Kafka cluster (with ZooKeeper) with 2 brokers. There's a single topic "test" with 2 partitions and 2 replicas, plus the internal __consumer_offsets topic. While following the ZooKeeper to KRaft migration steps from https://kafka.apache.org/documentation/#kraft_zk_migration, I'm hitting issues at the "Migrating brokers to KRaft" step.
When I restart a broker as KRaft, it repeatedly prints the following error:
{code:java}
org.apache.kafka.common.errors.InvalidRequestException: Received request api key LEADER_AND_ISR which is not enabled
[2023-02-09 16:14:30,334] ERROR Closing socket for 192.168.1.11:9092-192.168.1.11:63737-371 because of error (kafka.network.Processor)
{code}
The controller repeatedly prints the following error:
{code:java}
[2023-02-09 16:12:27,456] WARN [Controller id=1000, targetBrokerId=0] Connection to node 0 (mmaison-mac.home/192.168.1.11:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-02-09 16:12:27,456] INFO [Controller id=1000, targetBrokerId=0] Client requested connection close from node 0 (org.apache.kafka.clients.NetworkClient)
[2023-02-09 16:12:27,560] INFO [Controller id=1000, targetBrokerId=0] Node 0 disconnected. (org.apache.kafka.clients.NetworkClient)
{code}
The controller logs and the logs from broker-0 are attached.
[jira] [Created] (KAFKA-14943) Fix ClientQuotaControlManager validation
Colin McCabe created KAFKA-14943: Summary: Fix ClientQuotaControlManager validation Key: KAFKA-14943 URL: https://issues.apache.org/jira/browse/KAFKA-14943 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe
[jira] [Resolved] (KAFKA-14775) Support SCRAM for broker to controller authentication
[ https://issues.apache.org/jira/browse/KAFKA-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14775. Fix Version/s: 3.5.0 Assignee: Colin McCabe (was: Proven Provenzano) Resolution: Fixed
Key: KAFKA-14775 URL: https://issues.apache.org/jira/browse/KAFKA-14775 Project: Kafka Issue Type: Improvement Components: kraft Reporter: Proven Provenzano Assignee: Colin McCabe Priority: Major Fix For: 3.5.0
We need to apply SCRAM changes to controller nodes, and we need to handle DescribeUserScramCredentialsRequest in the controller nodes. As part of this update I will split out the code from {{BrokerMetadataPublisher.scala}} for applying SCRAM changes into a separate {{MetadataPublisher}}, as we did with {{DynamicConfigPublisher}}.
[jira] [Resolved] (KAFKA-14894) MetadataLoader must call finishSnapshot after loading a snapshot
[ https://issues.apache.org/jira/browse/KAFKA-14894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14894. Fix Version/s: 3.5.0 Reviewer: David Arthur Resolution: Fixed
Key: KAFKA-14894 URL: https://issues.apache.org/jira/browse/KAFKA-14894 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe Priority: Major Fix For: 3.5.0
[jira] [Created] (KAFKA-14894) MetadataLoader must call finishSnapshot after loading a snapshot
Colin McCabe created KAFKA-14894: Summary: MetadataLoader must call finishSnapshot after loading a snapshot Key: KAFKA-14894 URL: https://issues.apache.org/jira/browse/KAFKA-14894 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe
[jira] [Created] (KAFKA-14857) Fix some MetadataLoader bugs
Colin McCabe created KAFKA-14857: Summary: Fix some MetadataLoader bugs Key: KAFKA-14857 URL: https://issues.apache.org/jira/browse/KAFKA-14857 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe
[jira] [Resolved] (KAFKA-14436) Initialize KRaft with arbitrary epoch
[ https://issues.apache.org/jira/browse/KAFKA-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14436. Fix Version/s: 3.4.0 Resolution: Won't Fix
Key: KAFKA-14436 URL: https://issues.apache.org/jira/browse/KAFKA-14436 Project: Kafka Issue Type: Sub-task Reporter: David Arthur Assignee: Alyssa Huang Priority: Major Fix For: 3.4.0
For the ZK migration, we need to be able to initialize Raft with an arbitrarily high epoch (within the size limit). This is because during the migration, we want to write the Raft epoch as the controller epoch in ZK. We require that epochs in /controller_epoch are monotonic in order for brokers to behave normally.
[jira] [Created] (KAFKA-14846) Fix overly large record batches in ZkMigrationClient
Colin McCabe created KAFKA-14846: Summary: Fix overly large record batches in ZkMigrationClient Key: KAFKA-14846 URL: https://issues.apache.org/jira/browse/KAFKA-14846 Project: Kafka Issue Type: Sub-task Affects Versions: 3.4.0 Reporter: Colin McCabe ZkMigrationClient should not create overly large record batches
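A minimal sketch of the shape of such a fix, assuming a simple per-batch record cap (the constant and function names are illustrative, not the ZkMigrationClient API): instead of emitting one batch per migrated resource regardless of size, split the records at a fixed limit.

```python
MAX_RECORDS_PER_BATCH = 1000  # illustrative cap, not Kafka's actual limit

def split_into_batches(records, max_per_batch=MAX_RECORDS_PER_BATCH):
    """Split a flat list of records into batches of at most max_per_batch,
    preserving order."""
    batches = []
    for i in range(0, len(records), max_per_batch):
        batches.append(records[i:i + max_per_batch])
    return batches
```

Every record ends up in exactly one batch, and no batch exceeds the cap, so a single huge migrated resource can no longer produce an oversized batch.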
[jira] [Resolved] (KAFKA-14493) Zk to KRaft migration state machine in KRaft controller
[ https://issues.apache.org/jira/browse/KAFKA-14493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14493. Fix Version/s: 3.4.0 Resolution: Fixed
Key: KAFKA-14493 URL: https://issues.apache.org/jira/browse/KAFKA-14493 Project: Kafka Issue Type: Sub-task Reporter: Akhilesh Chaganti Assignee: Akhilesh Chaganti Priority: Major Fix For: 3.4.0
[jira] [Resolved] (KAFKA-14458) RPC Handler to ZkBrokers from KRaft Controller
[ https://issues.apache.org/jira/browse/KAFKA-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14458. Fix Version/s: 3.4.0 Resolution: Fixed
Key: KAFKA-14458 URL: https://issues.apache.org/jira/browse/KAFKA-14458 Project: Kafka Issue Type: Sub-task Reporter: Akhilesh Chaganti Assignee: Akhilesh Chaganti Priority: Major Fix For: 3.4.0
[jira] [Resolved] (KAFKA-14446) API forwarding support in ZkBrokers
[ https://issues.apache.org/jira/browse/KAFKA-14446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14446. Fix Version/s: 3.4.0 Resolution: Fixed
Key: KAFKA-14446 URL: https://issues.apache.org/jira/browse/KAFKA-14446 Project: Kafka Issue Type: Sub-task Reporter: Akhilesh Chaganti Assignee: Akhilesh Chaganti Priority: Major Fix For: 3.4.0
To support migration, zkBrokers should be able to forward API requests to the Controller, whether it is zkController or kraftController.
[jira] [Resolved] (KAFKA-14447) Controlled shutdown for ZK brokers during migration
[ https://issues.apache.org/jira/browse/KAFKA-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14447. Fix Version/s: 3.4.0 (was: 3.4.1) Resolution: Fixed
Key: KAFKA-14447 URL: https://issues.apache.org/jira/browse/KAFKA-14447 Project: Kafka Issue Type: Sub-task Reporter: David Arthur Assignee: Luke Chen Priority: Major Fix For: 3.4.0
[jira] [Created] (KAFKA-14835) Create ControllerServerMetricsPublisher
Colin McCabe created KAFKA-14835: Summary: Create ControllerServerMetricsPublisher Key: KAFKA-14835 URL: https://issues.apache.org/jira/browse/KAFKA-14835 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Assignee: Colin McCabe
[jira] [Created] (KAFKA-14658) Do not open broker ports until we are ready to accept traffic
Colin McCabe created KAFKA-14658: Summary: Do not open broker ports until we are ready to accept traffic Key: KAFKA-14658 URL: https://issues.apache.org/jira/browse/KAFKA-14658 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe We should not open the ports on the broker until we are ready to accept traffic. This is a particular concern when in KRaft mode, since in that mode, we create the SocketServer object earlier in the startup process than when in ZK mode.
[jira] [Created] (KAFKA-14622) Create a junit test which would have caught KAFKA-14618
Colin McCabe created KAFKA-14622: Summary: Create a junit test which would have caught KAFKA-14618 Key: KAFKA-14622 URL: https://issues.apache.org/jira/browse/KAFKA-14622 Project: Kafka Issue Type: Bug Reporter: Colin McCabe
[jira] [Resolved] (KAFKA-14618) Off by one error in generated snapshot IDs causes misaligned fetching
[ https://issues.apache.org/jira/browse/KAFKA-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14618. Resolution: Fixed
Key: KAFKA-14618 URL: https://issues.apache.org/jira/browse/KAFKA-14618 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Assignee: José Armando García Sancio Priority: Blocker Fix For: 3.4.0
We implemented new snapshot generation logic here: https://github.com/apache/kafka/pull/12983. A few days prior to this patch getting merged, we had changed the `RaftClient` API to pass the _exclusive_ offset when generating snapshots instead of the inclusive offset: https://github.com/apache/kafka/pull/12981. Unfortunately, the new snapshot generation logic was not updated accordingly. The consequence of this is that the state on replicas can get out of sync. In the best case, the followers fail replication because the offset after loading a snapshot is no longer aligned on a batch boundary.
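The off-by-one is easy to state in code. A hedged sketch (illustrative functions, not the RaftClient API): a snapshot ID doubles as the next offset a follower should fetch, so it equals the exclusive end offset directly, while the old inclusive convention required adding one. Applying the inclusive-style arithmetic to an exclusive argument lands one past the batch boundary.

```python
def snapshot_id_from_exclusive_offset(end_offset_exclusive):
    # Correct under the new convention: the snapshot covers offsets
    # [0, end_offset_exclusive), so the next fetch offset is the
    # argument itself.
    return end_offset_exclusive

def snapshot_id_from_inclusive_offset(last_contained_offset):
    # The old convention: the caller passed the last offset *inside* the
    # snapshot, so the next fetch offset is one past it.
    return last_contained_offset + 1
```

Feeding an exclusive offset through the inclusive-style function yields an ID one too high, which is the misalignment the followers hit after loading a snapshot.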
[jira] [Created] (KAFKA-14601) Improve exception handling in KafkaEventQueue
Colin McCabe created KAFKA-14601: Summary: Improve exception handling in KafkaEventQueue Key: KAFKA-14601 URL: https://issues.apache.org/jira/browse/KAFKA-14601 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe If KafkaEventQueue gets an InterruptedException while waiting for a condition variable, it currently exits immediately. Instead, it should complete the remaining events exceptionally and then execute the cleanup event. This will allow us to finish any necessary cleanup steps. Also, handle cases where Event#handleException itself throws an exception.
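The desired shutdown path can be sketched with Python stand-ins (not the KafkaEventQueue API): on interrupt, every pending event is completed exceptionally, an exception handler that itself throws does not abort the drain, and the cleanup event still runs at the end.

```python
def drain_on_interrupt(pending_events, cleanup_event, interrupt_exc):
    """Fail all queued events with interrupt_exc, then run the cleanup event.

    pending_events: objects with handle_exception(exc)
    cleanup_event: object with run()
    """
    for event in pending_events:
        try:
            # Complete the remaining events exceptionally rather than
            # dropping them on the floor.
            event.handle_exception(interrupt_exc)
        except Exception:
            # An event's own handle_exception may throw; keep draining,
            # mirroring the "handleException itself throws" case.
            pass
    cleanup_event.run()
```

The key property is that neither the interrupt nor a misbehaving handler can prevent the cleanup event from executing.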
[jira] [Created] (KAFKA-14538) Implement metadata transactions at arbitrary locations in the log
Colin McCabe created KAFKA-14538: Summary: Implement metadata transactions at arbitrary locations in the log Key: KAFKA-14538 URL: https://issues.apache.org/jira/browse/KAFKA-14538 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Assignee: Colin McCabe Implement metadata transactions at arbitrary locations in the log, not just at the beginning.
[jira] [Created] (KAFKA-14433) Clear all yammer metrics when test harnesses clean up
Colin McCabe created KAFKA-14433: Summary: Clear all yammer metrics when test harnesses clean up Key: KAFKA-14433 URL: https://issues.apache.org/jira/browse/KAFKA-14433 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe We should clear all yammer metrics from the yammer singleton when the integration test harnesses clean up. This would avoid memory leaks in tests that have a lot of test cases.
[jira] [Created] (KAFKA-14370) Properly close ImageWriter objects
Colin McCabe created KAFKA-14370: Summary: Properly close ImageWriter objects Key: KAFKA-14370 URL: https://issues.apache.org/jira/browse/KAFKA-14370 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe
[jira] [Created] (KAFKA-14351) Implement controller mutation quotas in KRaft
Colin McCabe created KAFKA-14351: Summary: Implement controller mutation quotas in KRaft Key: KAFKA-14351 URL: https://issues.apache.org/jira/browse/KAFKA-14351 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe
[jira] [Created] (KAFKA-14350) Support dynamically reconfiguring KRaft controller listeners
Colin McCabe created KAFKA-14350: Summary: Support dynamically reconfiguring KRaft controller listeners Key: KAFKA-14350 URL: https://issues.apache.org/jira/browse/KAFKA-14350 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Support dynamically reconfiguring KRaft controller listeners. The first step is probably to support modifying existing listeners (SSL settings, SASL settings, connection limit settings, etc.). We can create a follow-on JIRA for adding or removing listeners dynamically (if indeed we want to do that at all; the use cases seem rare).
[jira] [Created] (KAFKA-14349) Support dynamically resizing the KRaft controller's thread pools
Colin McCabe created KAFKA-14349: Summary: Support dynamically resizing the KRaft controller's thread pools Key: KAFKA-14349 URL: https://issues.apache.org/jira/browse/KAFKA-14349 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Support dynamically resizing the KRaft controller's request handler and network handler thread pools. See {{DynamicBrokerConfig.scala}}.
[jira] [Created] (KAFKA-14348) Consider renaming MetadataBatchProcessingTimeUs to MetadataDeltaProcessingTimeUs
Colin McCabe created KAFKA-14348: Summary: Consider renaming MetadataBatchProcessingTimeUs to MetadataDeltaProcessingTimeUs Key: KAFKA-14348 URL: https://issues.apache.org/jira/browse/KAFKA-14348 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe Assignee: Colin McCabe We should consider renaming kafka.server.MetadataBatchProcessingTimeUs to kafka.server.MetadataDeltaProcessingTimeUs. The reason is that this metric isn't the time to process a single batch, but the time to process a group of batches given to us by the raft layer.
[jira] [Created] (KAFKA-14327) Unify KRaft snapshot generation between broker and controller
Colin McCabe created KAFKA-14327: Summary: Unify KRaft snapshot generation between broker and controller Key: KAFKA-14327 URL: https://issues.apache.org/jira/browse/KAFKA-14327 Project: Kafka Issue Type: Improvement Reporter: Colin McCabe
[jira] [Created] (KAFKA-14290) Fix bugs that could block KRaft controlled shutdown indefinitely
Colin McCabe created KAFKA-14290: Summary: Fix bugs that could block KRaft controlled shutdown indefinitely Key: KAFKA-14290 URL: https://issues.apache.org/jira/browse/KAFKA-14290 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe
[jira] [Created] (KAFKA-14265) Prefix ACLs may shadow other prefix ACLs
Colin McCabe created KAFKA-14265: Summary: Prefix ACLs may shadow other prefix ACLs Key: KAFKA-14265 URL: https://issues.apache.org/jira/browse/KAFKA-14265 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe Prefix ACLs may shadow other prefix ACLs. Consider the case where we have prefix ACLs for foobar, fooa, and f. If we were matching a resource named "foobar", we'd start scanning at the foobar ACL, hit the fooa ACL, and stop -- missing the f ACL. To fix this, we should re-scan for ACLs at the first divergence point (in this case, f) whenever we hit a mismatch of this kind.
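The re-scan idea can be sketched over a plain sorted list of prefix names. This is an illustrative model of the matching loop under the assumption that ACLs are kept in ascending sorted order; it is not the StandardAuthorizer code itself.

```python
import bisect


def matching_prefixes(sorted_prefixes, resource):
    """Return every prefix in `sorted_prefixes` (ascending order) that
    matches `resource`, re-scanning at the first divergence point on a
    mismatch instead of stopping early."""
    matches = []
    # start at the last prefix <= the resource name and walk backwards
    i = bisect.bisect_right(sorted_prefixes, resource) - 1
    while i >= 0:
        prefix = sorted_prefixes[i]
        if resource.startswith(prefix):
            matches.append(prefix)
            i -= 1
        else:
            # mismatch (e.g. hitting "fooa" while matching "foobar"):
            # jump to the last prefix <= the longest common prefix
            # ("foo") and keep scanning, so shorter ACLs such as "f"
            # are not shadowed
            common = 0
            while (common < len(prefix) and common < len(resource)
                   and prefix[common] == resource[common]):
                common += 1
            i = bisect.bisect_right(sorted_prefixes, resource[:common]) - 1
    return matches
```

Each jump lands strictly earlier in the list (the common prefix is lexicographically smaller than the mismatched ACL), so the scan always terminates.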
[jira] [Created] (KAFKA-14259) BrokerRegistration#toString throws an exception, terminating metadata replay
Colin McCabe created KAFKA-14259: Summary: BrokerRegistration#toString throws an exception, terminating metadata replay Key: KAFKA-14259 URL: https://issues.apache.org/jira/browse/KAFKA-14259 Project: Kafka Issue Type: Bug Affects Versions: 3.3 Reporter: Colin McCabe Assignee: Colin McCabe Fix For: 3.3 BrokerRegistration#toString throws an exception, terminating metadata replay, because the sorted() method is used on an entry set rather than a key set.
{noformat}
Caused by: java.util.concurrent.ExecutionException: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
        at kafka.server.BrokerServer.startup(BrokerServer.scala:846)
        ... 147 more
Caused by: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
        at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
        at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
        at java.base/java.util.TimSort.sort(TimSort.java:220)
        at java.base/java.util.Arrays.sort(Arrays.java:1307)
        at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:353)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
        at org.apache.kafka.metadata.BrokerRegistration.toString(BrokerRegistration.java:228)
        at java.base/java.util.Formatter$FormatSpecifier.printString(Formatter.java:3056)
{noformat}
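In stream terms, the bug above is calling sorted() on `Map.Entry` objects, which do not implement `Comparable`, rather than on the map's keys. A Python rendering of the fixed shape, with a hypothetical `describe_directories` helper standing in for the toString logic:

```python
def describe_directories(dirs):
    """Format a map deterministically by sorting its *keys*, which are
    plain comparable strings. Sorting the raw map entries instead is the
    analogue of the ClassCastException in the Java stack trace above."""
    return ", ".join(f"{key}={dirs[key]}" for key in sorted(dirs))
```

Sorting the key set gives the same deterministic output without ever asking the runtime to compare entry objects.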
[jira] [Created] (KAFKA-14258) Add ducktape or junit test verifying that brokers can reload snapshots after startup
Colin McCabe created KAFKA-14258: Summary: Add ducktape or junit test verifying that brokers can reload snapshots after startup Key: KAFKA-14258 URL: https://issues.apache.org/jira/browse/KAFKA-14258 Project: Kafka Issue Type: Test Reporter: Colin McCabe We should add a ducktape or junit test that verifies that brokers can reload snapshots after startup. This code path is not exercised frequently but it is important.
[jira] [Created] (KAFKA-14243) Disable unsafe downgrade in 3.3
Colin McCabe created KAFKA-14243: Summary: Disable unsafe downgrade in 3.3 Key: KAFKA-14243 URL: https://issues.apache.org/jira/browse/KAFKA-14243 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe Disable unsafe downgrade in 3.3
[jira] [Resolved] (KAFKA-14216) Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback javadoc
[ https://issues.apache.org/jira/browse/KAFKA-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14216.
Fix Version/s: 3.3
Reviewer: Luke Chen
Resolution: Fixed
> Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback javadoc
> Key: KAFKA-14216
> URL: https://issues.apache.org/jira/browse/KAFKA-14216
> Project: Kafka
> Issue Type: Bug
> Components: docs, documentation
> Affects Versions: 3.3.0, 3.3
> Reporter: Colin McCabe
> Assignee: Colin McCabe
> Priority: Blocker
> Fix For: 3.3
[jira] [Resolved] (KAFKA-14217) app-reset-tool.html should remove reference to --zookeeper flag that no longer exists
[ https://issues.apache.org/jira/browse/KAFKA-14217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14217.
Fix Version/s: 3.3
Reviewer: Luke Chen
Resolution: Fixed
> app-reset-tool.html should remove reference to --zookeeper flag that no longer exists
> Key: KAFKA-14217
> URL: https://issues.apache.org/jira/browse/KAFKA-14217
> Project: Kafka
> Issue Type: Bug
> Components: docs, documentation
> Affects Versions: 3.3.0, 3.3
> Reporter: Colin McCabe
> Assignee: Colin McCabe
> Priority: Blocker
> Fix For: 3.3
>
> app-reset-tool.html should remove reference to --zookeeper flag that no longer exists
[jira] [Created] (KAFKA-14217) app-reset-tool.html should remove reference to --zookeeper flag that no longer exists
Colin McCabe created KAFKA-14217: Summary: app-reset-tool.html should remove reference to --zookeeper flag that no longer exists Key: KAFKA-14217 URL: https://issues.apache.org/jira/browse/KAFKA-14217 Project: Kafka Issue Type: Bug Components: docs, documentation Affects Versions: 3.3.0, 3.3 Reporter: Colin McCabe Assignee: Colin McCabe app-reset-tool.html should remove reference to the --zookeeper flag that no longer exists
[jira] [Created] (KAFKA-14216) Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback javadoc
Colin McCabe created KAFKA-14216: Summary: Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback javadoc Key: KAFKA-14216 URL: https://issues.apache.org/jira/browse/KAFKA-14216 Project: Kafka Issue Type: Bug Components: docs, documentation Affects Versions: 3.3.0, 3.3 Reporter: Colin McCabe Assignee: Colin McCabe
[jira] [Resolved] (KAFKA-14200) kafka-features.sh must exit with non-zero error code on error
[ https://issues.apache.org/jira/browse/KAFKA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14200.
Reviewer: David Arthur
Resolution: Fixed
> kafka-features.sh must exit with non-zero error code on error
> Key: KAFKA-14200
> URL: https://issues.apache.org/jira/browse/KAFKA-14200
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.3.0, 3.3
> Reporter: Colin McCabe
> Assignee: Colin McCabe
> Priority: Blocker
> Fix For: 3.3.0
>
> kafka-features.sh must exit with a non-zero error code on error. We must do this in order to catch regressions like KAFKA-13990.
[jira] [Resolved] (KAFKA-14197) Kraft broker fails to startup after topic creation failure
[ https://issues.apache.org/jira/browse/KAFKA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14197.
Resolution: Duplicate
> Kraft broker fails to startup after topic creation failure
> Key: KAFKA-14197
> URL: https://issues.apache.org/jira/browse/KAFKA-14197
> Project: Kafka
> Issue Type: Bug
> Components: kraft
> Reporter: Luke Chen
> Priority: Blocker
> Fix For: 3.3.0
>
> In the KRaft ControllerWriteEvent, we start by trying to apply the record to the controller's in-memory state, then send out the record via the raft client. But if there is an error while sending the records, there is no way to revert the change to the controller's in-memory state [1].
> The issue happened when creating topics: the controller state is updated with topic and partition metadata (e.g. the broker-to-ISR map), but the record is not sent out successfully (e.g. RecordBatchTooLargeException). Then, when shutting down the node, the controlled shutdown tries to remove the broker from the ISR via [2]:
> {code:java}
> generateLeaderAndIsrUpdates("enterControlledShutdown[" + brokerId + "]", brokerId, NO_LEADER, records,
>     brokersToIsrs.partitionsWithBrokerInIsr(brokerId));{code}
> After we append the partition change records and send them to the metadata topic successfully, the brokers fail to "replay" these partition changes, since those topics/partitions were never created successfully in the first place.
> Even worse, after restarting the node, all the metadata records are replayed again, the same error happens again, and the broker cannot start up successfully.
> The error and call stack look like this; basically, it complains that the topic image can't be found:
> {code:java}
> [2022-09-02 16:29:16,334] ERROR Encountered metadata loading fault: Error replaying metadata log record at offset 81 (org.apache.kafka.server.fault.LoggingFaultHandler)
> java.lang.NullPointerException
>         at org.apache.kafka.image.TopicDelta.replay(TopicDelta.java:69)
>         at org.apache.kafka.image.TopicsDelta.replay(TopicsDelta.java:91)
>         at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:248)
>         at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:186)
>         at kafka.server.metadata.BrokerMetadataListener.$anonfun$loadBatches$3(BrokerMetadataListener.scala:239)
>         at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>         at kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$loadBatches(BrokerMetadataListener.scala:232)
>         at kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.run(BrokerMetadataListener.scala:113)
>         at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>         at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>         at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
> [1] https://github.com/apache/kafka/blob/ef65b6e566ef69b2f9b58038c98a5993563d7a68/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L779-L804
> [2] https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L1270
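The ordering hazard described in KAFKA-14197 can be modeled in a few lines. `Controller` and `FailingLog` below are illustrative stand-ins, not the QuorumController API; the point is only that appending to the log before mutating in-memory state leaves nothing to revert when the append fails.

```python
class Controller:
    """Illustrative model of the safe write ordering: append to the
    (raft) log first, and mutate in-memory state only after the append
    succeeds, so a failed append cannot leave state the log never saw."""

    def __init__(self, log):
        self.log = log
        self.state = {}

    def write_event(self, records):
        self.log.append(records)      # may raise, e.g. batch too large
        for key, value in records:    # safe: the log accepted them
            self.state[key] = value


class FailingLog:
    """Simulates the raft client rejecting an oversized batch."""

    def append(self, records):
        raise ValueError("RecordBatchTooLargeException (simulated)")
```

With this ordering, a rejected batch leaves the in-memory state untouched, so a later controlled shutdown never references metadata that was never committed.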
[jira] [Created] (KAFKA-14204) QuorumController must correctly handle overly large batches
Colin McCabe created KAFKA-14204: Summary: QuorumController must correctly handle overly large batches Key: KAFKA-14204 URL: https://issues.apache.org/jira/browse/KAFKA-14204 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe
[jira] [Created] (KAFKA-14200) kafka-features.sh must exit with non-zero error code on error
Colin McCabe created KAFKA-14200: Summary: kafka-features.sh must exit with non-zero error code on error Key: KAFKA-14200 URL: https://issues.apache.org/jira/browse/KAFKA-14200 Project: Kafka Issue Type: Bug Reporter: Colin McCabe Assignee: Colin McCabe kafka-features.sh must exit with a non-zero error code on error. We must do this in order to catch regressions like KAFKA-13990.
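The shape of the fix is simple: every failure path in the tool must map to a non-zero process exit status. A hedged sketch, where `run_command` is a hypothetical stand-in for the real feature-upgrade logic:

```python
import sys


def run_command(argv):
    # hypothetical stand-in for the real tool logic
    if not argv:
        raise ValueError("no command given")


def main(argv):
    try:
        run_command(argv)
    except Exception as err:
        # print the error but, crucially, also report failure to the
        # caller -- exiting 0 after an error is the regression class
        # (like KAFKA-13990) this issue wants to catch
        print(f"error: {err}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

With the status propagated, CI scripts and system tests can detect a failed upgrade with a plain `$?` check instead of scraping output.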
[jira] [Resolved] (KAFKA-14187) kafka-features.sh: add support for --metadata
[ https://issues.apache.org/jira/browse/KAFKA-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin McCabe resolved KAFKA-14187.
Resolution: Fixed
> kafka-features.sh: add support for --metadata
> Key: KAFKA-14187
> URL: https://issues.apache.org/jira/browse/KAFKA-14187
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.3.0, 3.3
> Reporter: Colin McCabe
> Assignee: Colin McCabe
> Priority: Blocker
> Fix For: 3.3.0
>
> Fix the kafka-features.sh command so that we can upgrade to the new version as expected.