[jira] [Created] (KAFKA-16757) Fix broker re-registration issues around MV 3.7-IV2

2024-05-13 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16757:


 Summary: Fix broker re-registration issues around MV 3.7-IV2
 Key: KAFKA-16757
 URL: https://issues.apache.org/jira/browse/KAFKA-16757
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend the 
broker registration, so that the controller can record the storage directories. 
The current code for doing this has several problems, however. One is that it 
tends to trigger even in cases where we don't actually need it. Another is that 
when re-registering the broker, the broker is marked as fenced.

This PR moves the handling of the re-registration case out of 
BrokerMetadataPublisher and into BrokerRegistrationTracker. The re-registration 
code there will only trigger in the case where the broker sees an existing 
registration for itself with no directories set. This is much more targeted 
than the original code.

Additionally, in ClusterControlManager, when re-registering the same broker, we 
now preserve its fencing and shutdown state, rather than clearing those. (There 
isn't any good reason re-registering the same broker should clear these 
things... this was purely an oversight.) Note that we can tell the broker is 
"the same" because it has the same IncarnationId.

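The two rules above can be illustrated with a minimal sketch. It is not the actual Kafka code: `ReRegistrationSketch`, its `Registration` class, and both method names are hypothetical, and the choice that a new incarnation starts out fenced is an assumption made for the example.

```java
import java.util.List;
import java.util.UUID;

final class ReRegistrationSketch {
    // Hypothetical, simplified registration; the real one has many more fields.
    static final class Registration {
        final UUID incarnationId;
        final boolean fenced;
        final boolean inControlledShutdown;
        final List<UUID> directories;

        Registration(UUID incarnationId, boolean fenced,
                     boolean inControlledShutdown, List<UUID> directories) {
            this.incarnationId = incarnationId;
            this.fenced = fenced;
            this.inControlledShutdown = inControlledShutdown;
            this.directories = directories;
        }
    }

    // Broker side: re-register only when our own registration exists but
    // lists no directories (null means no registration was seen at all).
    static boolean shouldReRegister(Registration mine) {
        return mine != null && mine.directories.isEmpty();
    }

    // Controller side: the same incarnation ID means "the same broker", so
    // its fencing and shutdown state carry over. Assumption for this sketch:
    // a new incarnation starts fenced and not shutting down.
    static Registration register(Registration existing, UUID incarnationId,
                                 List<UUID> directories) {
        boolean same = existing != null
            && existing.incarnationId.equals(incarnationId);
        boolean fenced = same ? existing.fenced : true;
        boolean shuttingDown = same && existing.inControlledShutdown;
        return new Registration(incarnationId, fenced, shuttingDown, directories);
    }
}
```

Once the re-registration has supplied directories, `shouldReRegister` returns false for the updated registration, so the trigger does not fire again.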


--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16649) Fix potential deadlock in DynamicBrokerConfig

2024-04-30 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16649:


 Summary: Fix potential deadlock in DynamicBrokerConfig
 Key: KAFKA-16649
 URL: https://issues.apache.org/jira/browse/KAFKA-16649
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe








[jira] [Created] (KAFKA-16624) Don't generate useless PartitionChangeRecord on older MV

2024-04-25 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16624:


 Summary: Don't generate useless PartitionChangeRecord on older MV
 Key: KAFKA-16624
 URL: https://issues.apache.org/jira/browse/KAFKA-16624
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


Fix a case where we could generate useless PartitionChangeRecords on metadata 
versions older than 3.6-IV0. This could happen in the case where we had an ISR 
with only one broker in it, and we were trying to go down to a fully empty ISR. 
In this case, PartitionChangeBuilder would block the record from going down to a 
fully empty ISR (since that is not valid in these pre-KIP-966 metadata 
versions), but it would still emit the record, even though it had no effect.
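
A minimal sketch of the intended behavior, assuming a hypothetical `isrChange` helper and an `emptyIsrAllowed` flag that stands in for the metadata version check (neither name comes from PartitionChangeBuilder itself):

```java
import java.util.List;
import java.util.Optional;

final class IsrChangeSketch {
    // Returns the new ISR to record, or empty if the change would be a no-op.
    // emptyIsrAllowed is a hypothetical stand-in for the metadata version check.
    static Optional<List<Integer>> isrChange(List<Integer> currentIsr,
                                             List<Integer> targetIsr,
                                             boolean emptyIsrAllowed) {
        List<Integer> effective = targetIsr;
        if (targetIsr.isEmpty() && !emptyIsrAllowed) {
            // Older metadata versions cannot represent an empty ISR,
            // so the current ISR is kept.
            effective = currentIsr;
        }
        // Emit a record only if something actually changes.
        return effective.equals(currentIsr)
            ? Optional.empty()
            : Optional.of(effective);
    }
}
```

With a one-broker ISR and an empty target on an older version, the helper returns an empty Optional, so no record is generated.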





[jira] [Created] (KAFKA-16509) CurrentControllerId metric is unreliable in ZK mode

2024-04-10 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16509:


 Summary: CurrentControllerId metric is unreliable in ZK mode
 Key: KAFKA-16509
 URL: https://issues.apache.org/jira/browse/KAFKA-16509
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe


The CurrentControllerId metric added by KIP-1001 is unreliable in ZK mode. 
Sometimes when there is no active ZK-based controller, it still shows the 
previous controller ID. Instead, it should show -1 in that situation.
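
As an illustration only, a gauge that derives its value from the current controller state, instead of caching the last ID it saw, might look like the sketch below. `ControllerIdGaugeSketch` and its methods are hypothetical names, not the KIP-1001 implementation.

```java
import java.util.OptionalInt;

final class ControllerIdGaugeSketch {
    // Empty when no controller is currently active.
    private volatile OptionalInt activeController = OptionalInt.empty();

    // Called on every controller change, including "no controller".
    void onControllerChange(OptionalInt newController) {
        activeController = newController;
    }

    // Gauge value: the active controller's ID, or -1 when there is none.
    int currentControllerId() {
        return activeController.orElse(-1);
    }
}
```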





[jira] [Created] (KAFKA-16475) Create unit test for TopicImageNode

2024-04-04 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16475:


 Summary: Create unit test for TopicImageNode
 Key: KAFKA-16475
 URL: https://issues.apache.org/jira/browse/KAFKA-16475
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe








[jira] [Created] (KAFKA-16469) Metadata Schema Checker

2024-04-03 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16469:


 Summary: Metadata Schema Checker
 Key: KAFKA-16469
 URL: https://issues.apache.org/jira/browse/KAFKA-16469
 Project: Kafka
  Issue Type: New Feature
Reporter: Colin McCabe








[jira] [Resolved] (KAFKA-16428) Fix bug where config change notification znode may not get created during migration

2024-03-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16428.
--
Resolution: Fixed

> Fix bug where config change notification znode may not get created during 
> migration
> ---
>
> Key: KAFKA-16428
> URL: https://issues.apache.org/jira/browse/KAFKA-16428
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.6.1
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
>






[jira] [Resolved] (KAFKA-16411) Correctly migrate default client quota entities in KRaft migration

2024-03-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16411.
--
Resolution: Fixed

> Correctly migrate default client quota entities in KRaft migration
> --
>
> Key: KAFKA-16411
> URL: https://issues.apache.org/jira/browse/KAFKA-16411
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
>






[jira] [Created] (KAFKA-16435) Add test for KAFKA-16428

2024-03-27 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16435:


 Summary: Add test for KAFKA-16428
 Key: KAFKA-16435
 URL: https://issues.apache.org/jira/browse/KAFKA-16435
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe


Add a test for KAFKA-16428: Fix bug where config change notification znode may 
not get created during migration #15608





[jira] [Created] (KAFKA-16428) Fix bug where config change notification znode may not get created during migration

2024-03-26 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16428:


 Summary: Fix bug where config change notification znode may not 
get created during migration
 Key: KAFKA-16428
 URL: https://issues.apache.org/jira/browse/KAFKA-16428
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Created] (KAFKA-16411) Correctly migrate default entities in KRaft migration

2024-03-22 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16411:


 Summary: Correctly migrate default entities in KRaft migration
 Key: KAFKA-16411
 URL: https://issues.apache.org/jira/browse/KAFKA-16411
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Created] (KAFKA-16321) Default directory ids to MIGRATING, not UNASSIGNED

2024-03-01 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16321:


 Summary: Default directory ids to MIGRATING, not UNASSIGNED
 Key: KAFKA-16321
 URL: https://issues.apache.org/jira/browse/KAFKA-16321
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


Directory ids should be defaulted to MIGRATING, not UNASSIGNED.





[jira] [Resolved] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration

2024-02-01 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16216.
--
Fix Version/s: 3.7.0
 Reviewer: Colin McCabe
 Assignee: David Arthur  (was: Colin McCabe)
   Resolution: Fixed

> Reduce batch size for initial metadata load during ZK migration
> ---
>
> Key: KAFKA-16216
> URL: https://issues.apache.org/jira/browse/KAFKA-16216
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: David Arthur
>Priority: Major
> Fix For: 3.7.0
>
>






[jira] [Created] (KAFKA-16216) Reduce batch size for initial metadata load during ZK migration

2024-02-01 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16216:


 Summary: Reduce batch size for initial metadata load during ZK 
migration
 Key: KAFKA-16216
 URL: https://issues.apache.org/jira/browse/KAFKA-16216
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: David Arthur








[jira] [Created] (KAFKA-16180) Full metadata request sometimes fails during zk migration

2024-01-19 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16180:


 Summary: Full metadata request sometimes fails during zk migration
 Key: KAFKA-16180
 URL: https://issues.apache.org/jira/browse/KAFKA-16180
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe


Example:

{{java.util.NoSuchElementException: 
lkc-gnjo9m_dev_kafka.stream.detection.bucket3-KTABLE-SUPPRESS-STATE-STORE-08-changelog
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:508)
at 
scala.collection.mutable.AnyRefMap$ExceptionDefault.apply(AnyRefMap.scala:507)
at scala.collection.mutable.AnyRefMap.apply(AnyRefMap.scala:207)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2(ZkMetadataCache.scala:112)
at 
kafka.server.metadata.ZkMetadataCache$.$anonfun$maybeInjectDeletedPartitionsFromFullMetadataRequest$2$adapted(ZkMetadataCache.scala:105)
at scala.collection.immutable.HashSet.foreach(HashSet.scala:958)
at 
kafka.server.metadata.ZkMetadataCache$.maybeInjectDeletedPartitionsFromFullMetadataRequest(ZkMetadataCache.scala:105)
at 
kafka.server.metadata.ZkMetadataCache.$anonfun$updateMetadata$1(ZkMetadataCache.scala:506)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:183)
at 
kafka.server.metadata.ZkMetadataCache.updateMetadata(ZkMetadataCache.scala:496)
at 
kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:2482)
at 
kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:733)
at kafka.server.KafkaApis.handle(KafkaApis.scala:349)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8(KafkaRequestHandler.scala:210)
at 
kafka.server.KafkaRequestHandler.$anonfun$poll$8$adapted(KafkaRequestHandler.scala:210)
at 
io.confluent.kafka.availability.ThreadCountersManager.wrapEngine(ThreadCountersManager.java:146)
at kafka.server.KafkaRequestHandler.poll(KafkaRequestHandler.scala:210)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:151)
at java.base/java.lang.Thread.run(Thread.java:1583)
at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66)}}





[jira] [Resolved] (KAFKA-16078) Be more consistent about getting the latest MetadataVersion

2024-01-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16078.
--
Fix Version/s: 3.7.0
 Reviewer: Colin Patrick McCabe
   Resolution: Fixed

> Be more consistent about getting the latest MetadataVersion
> ---
>
> Key: KAFKA-16078
> URL: https://issues.apache.org/jira/browse/KAFKA-16078
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Major
> Fix For: 3.7.0
>
>
> The InterBrokerProtocolVersion currently defaults to a non-production 
> MetadataVersion. We should be more consistent about getting the latest 
> MetadataVersion.





[jira] [Resolved] (KAFKA-16131) Repeated UnsupportedVersionException logged when running Kafka 3.7.0-RC2 KRaft cluster with metadata version 3.6

2024-01-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16131.
--
Resolution: Fixed

> Repeated UnsupportedVersionException logged when running Kafka 3.7.0-RC2 
> KRaft cluster with metadata version 3.6
> 
>
> Key: KAFKA-16131
> URL: https://issues.apache.org/jira/browse/KAFKA-16131
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Jakub Scholz
>Assignee: Proven Provenzano
>Priority: Blocker
> Fix For: 3.7.0
>
>
> When running Kafka 3.7.0-RC2 as a KRaft cluster with the metadata version set 
> to 3.6-IV2, it throws repeated errors like this in the controller logs:
> {quote}2024-01-13 16:58:01,197 INFO [QuorumController id=0] 
> assignReplicasToDirs: event failed with UnsupportedVersionException in 15 
> microseconds. (org.apache.kafka.controller.QuorumController) 
> [quorum-controller-0-event-handler]
> 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error 
> handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, apiVersion=0, 
> clientId=1000, correlationId=14, headerVersion=2) – 
> AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5, 
> directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ, 
> topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ, 
> partitions=[PartitionData(partitionIndex=2), 
> PartitionData(partitionIndex=1)]), TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ, 
> partitions=[PartitionData(partitionIndex=0)])])]) with context 
> RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, 
> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2), 
> connectionId='172.16.14.219:9090-172.16.14.217:53590-7', 
> clientAddress=/172.16.14.217, 
> principal=User:CN=my-cluster-kafka,O=io.strimzi, 
> listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL, 
> clientInformation=ClientInformation(softwareName=apache-kafka-java, 
> softwareVersion=3.7.0), fromPrivilegedListener=false, 
> principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2])
>  (kafka.server.ControllerApis) [quorum-controller-0-event-handler]
> java.util.concurrent.CompletionException: 
> org.apache.kafka.common.errors.UnsupportedVersionException: Directory 
> assignment is not supported yet.
> at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
>  at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
>  at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
>  at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>  at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>  at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
>  at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
>  at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
>  at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
>  at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>  at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>  at java.base/java.lang.Thread.run(Thread.java:840)
> Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: 
> Directory assignment is not supported yet.
> {quote}
>  
> With the metadata version set to 3.6-IV2, it makes sense that the request is 
> not supported. But in that case the request should not be sent at all.

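One way to picture the requested behavior is a broker-side check made before the request is built. The sketch below is only an illustration: the `Mv` enum is a hypothetical, abbreviated stand-in for MetadataVersion, and the assumption that directory assignment needs 3.7-IV2 is taken from KAFKA-16757 in this archive.

```java
final class DirAssignmentGateSketch {
    // Hypothetical, abbreviated version list in ascending order.
    enum Mv { V3_6_IV2, V3_7_IV2 }

    // Assumption: directories are recorded starting with 3.7-IV2.
    static boolean supportsDirAssignment(Mv mv) {
        return mv.compareTo(Mv.V3_7_IV2) >= 0;
    }

    // Broker side: skip the request entirely when the cluster's metadata
    // version cannot handle it, or when there is nothing to assign.
    static boolean shouldSendAssignReplicasToDirs(Mv current, int pendingAssignments) {
        return pendingAssignments > 0 && supportsDirAssignment(current);
    }
}
```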




[jira] [Resolved] (KAFKA-16121) Partition reassignments in ZK migration dual write mode stalled until leader epoch incremented

2024-01-15 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16121.
--
Fix Version/s: 3.7.0
 Reviewer: Colin McCabe
 Assignee: David Mao
   Resolution: Duplicate

> Partition reassignments in ZK migration dual write mode stalled until leader 
> epoch incremented
> --
>
> Key: KAFKA-16121
> URL: https://issues.apache.org/jira/browse/KAFKA-16121
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Assignee: David Mao
>Priority: Major
> Fix For: 3.7.0
>
>
> I noticed this in an integration test in 
> https://github.com/apache/kafka/pull/15184
> In ZK mode, partition leaders rely on the LeaderAndIsr request to be notified 
> of new replicas as part of a reassignment. In ZK mode, we ignore any 
> LeaderAndIsr request where the partition leader epoch is less than or equal 
> to the current partition leader epoch.
> In KRaft mode, we do not bump the leader epoch when starting a new 
> reassignment, see: `triggerLeaderEpochBumpIfNeeded`. This means that the 
> leader will ignore the LISR request initiating the reassignment until a 
> leader epoch bump is triggered through another means, for instance preferred 
> leader election.

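The ZK-mode rule described above boils down to a strict comparison. A minimal sketch, using a hypothetical `shouldApply` helper rather than the broker's real code path:

```java
final class LeaderEpochCheckSketch {
    // ZK-mode rule from the report: a LeaderAndIsr request is applied only
    // if its leader epoch is strictly greater than the current one.
    static boolean shouldApply(int requestLeaderEpoch, int currentLeaderEpoch) {
        return requestLeaderEpoch > currentLeaderEpoch;
    }
}
```

With this rule, a request that starts a reassignment but carries the current epoch, as in `shouldApply(5, 5)`, is dropped, which is the stall described in the report.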




[jira] [Resolved] (KAFKA-16120) Partition reassignments in ZK migration dual write leaves stray partitions

2024-01-14 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16120.
--
Fix Version/s: 3.7.0
 Reviewer: Colin McCabe
 Assignee: David Mao
   Resolution: Fixed

> Partition reassignments in ZK migration dual write leaves stray partitions
> --
>
> Key: KAFKA-16120
> URL: https://issues.apache.org/jira/browse/KAFKA-16120
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Assignee: David Mao
>Priority: Major
> Fix For: 3.7.0
>
>
> When a reassignment is completed in ZK migration dual-write mode, the 
> `StopReplica` sent by the kraft quorum migration propagator is sent with 
> `delete = false` for deleted replicas when processing the topic delta. This 
> results in stray replicas.





[jira] [Created] (KAFKA-16126) Kcontroller dynamic configurations may fail to apply at startup

2024-01-14 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16126:


 Summary: Kcontroller dynamic configurations may fail to apply at 
startup
 Key: KAFKA-16126
 URL: https://issues.apache.org/jira/browse/KAFKA-16126
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe


Some kcontroller dynamic configurations may fail to apply at startup. This 
happens because there is a race between registering the reconfigurables to the 
DynamicBrokerConfig class, and receiving the first update from the metadata 
publisher. We can fix this by registering the reconfigurables first. This seems 
to have been introduced by the "MINOR: Install ControllerServer metadata 
publishers sooner" change.
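
The ordering problem can be shown with a small stand-in for the real classes. `ReconfigurableOrderSketch` below is hypothetical and much simpler than DynamicBrokerConfig; it only shows that an update published before a listener registers never reaches that listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ReconfigurableOrderSketch {
    private final List<Consumer<String>> reconfigurables = new ArrayList<>();

    // Adds a listener for future updates only.
    synchronized void register(Consumer<String> reconfigurable) {
        reconfigurables.add(reconfigurable);
    }

    // Delivers an update to whoever is registered right now; listeners
    // that register later never see it.
    synchronized void publish(String update) {
        for (Consumer<String> r : reconfigurables) {
            r.accept(update);
        }
    }
}
```

Calling `register` before the first `publish` is the ordering the fix proposes.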





[jira] [Resolved] (KAFKA-16094) BrokerRegistrationRequest.logDirs field must be ignorable

2024-01-09 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16094.
--
Fix Version/s: 3.7.0
   Resolution: Fixed

> BrokerRegistrationRequest.logDirs field must be ignorable
> -
>
> Key: KAFKA-16094
> URL: https://issues.apache.org/jira/browse/KAFKA-16094
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.7.0
>
>
> 3.7 brokers must be able to register with 3.6 and earlier controllers. So 
> this means that the logDirs field must be ignorable (aka, not sent) if the 
> highest BrokerRegistrationRequest version we can negotiate is older than v2.





[jira] [Created] (KAFKA-16094) 3.7 brokers must be able to register with 3.6 and earlier controllers

2024-01-08 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16094:


 Summary: 3.7 brokers must be able to register with 3.6 and earlier 
controllers
 Key: KAFKA-16094
 URL: https://issues.apache.org/jira/browse/KAFKA-16094
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe


3.7 brokers must be able to register with 3.6 and earlier controllers. So this 
means that the logDirs field must be ignorable (aka, not sent) if the highest 
BrokerRegistrationRequest version we can negotiate is older than v2.
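
A minimal sketch of the rule, assuming a hypothetical `logDirsToSend` helper; the real request class is generated from a schema and is not shown here:

```java
import java.util.List;
import java.util.UUID;

final class LogDirsFieldSketch {
    // "Older than v2" in the issue text means the field exists from v2 on.
    static final short LOG_DIRS_MIN_VERSION = 2;

    // Returns the log dirs to serialize for the negotiated request version:
    // the real list at v2 or newer, nothing at older versions.
    static List<UUID> logDirsToSend(short negotiatedVersion, List<UUID> logDirs) {
        return negotiatedVersion >= LOG_DIRS_MIN_VERSION ? logDirs : List.of();
    }
}
```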





[jira] [Resolved] (KAFKA-14127) KIP-858: Handle JBOD broker disk failure in KRaft

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14127.
--
Resolution: Fixed

> KIP-858: Handle JBOD broker disk failure in KRaft
> -
>
> Key: KAFKA-14127
> URL: https://issues.apache.org/jira/browse/KAFKA-14127
> Project: Kafka
>  Issue Type: Improvement
>  Components: jbod, kraft
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
>  Labels: 4.0-blocker, kip-500, kraft
> Fix For: 3.7.0
>
>
> Supporting configurations with multiple storage directories in KRaft mode





[jira] [Resolved] (KAFKA-15650) Data-loss on leader shutdown right after partition creation?

2023-12-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15650.
--
Resolution: Not A Problem

> Data-loss on leader shutdown right after partition creation?
> 
>
> Key: KAFKA-15650
> URL: https://issues.apache.org/jira/browse/KAFKA-15650
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Igor Soarez
>Priority: Major
>
> As per KIP-858, when a replica is created, the broker selects a log directory 
> to host the replica and queues the propagation of the directory assignment to 
> the controller. The replica becomes immediately active, it isn't blocked 
> until the controller confirms the metadata change. If the replica is the 
> leader replica it can immediately start accepting writes. 
> Consider the following scenario:
>  # A partition is created in some selected log directory, and some produce 
> traffic is accepted
>  # Before the broker is able to notify the controller of the directory 
> assignment, the broker shuts down
>  # Upon coming back online, the broker has an offline directory, the same 
> directory which was chosen to host the replica
>  # The broker assumes leadership for the replica, but cannot find it in any 
> available directory and has no way of knowing it was already created because 
> the directory assignment is still missing
>  # The replica is created and the previously produced records are lost
> Step 4. may seem unlikely due to ISR membership gating leadership, but even 
> assuming acks=all and replicas>1, if all other replicas are also offline the 
> broker may still gain leadership. Perhaps KIP-966 is relevant here.
> We may need to delay new replica activation until the assignment is 
> propagated successfully.





[jira] [Created] (KAFKA-16061) JBOD follow-ups

2023-12-28 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16061:


 Summary: JBOD follow-ups
 Key: KAFKA-16061
 URL: https://issues.apache.org/jira/browse/KAFKA-16061
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe








[jira] [Created] (KAFKA-15979) Add KIP-1001 CurrentControllerId metric

2023-12-06 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15979:


 Summary: Add KIP-1001 CurrentControllerId metric
 Key: KAFKA-15979
 URL: https://issues.apache.org/jira/browse/KAFKA-15979
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Created] (KAFKA-15980) Add KIP-1001 CurrentControllerId metric

2023-12-06 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15980:


 Summary: Add KIP-1001 CurrentControllerId metric
 Key: KAFKA-15980
 URL: https://issues.apache.org/jira/browse/KAFKA-15980
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Created] (KAFKA-15956) MetadataShell must take the directory lock when reading

2023-12-01 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15956:


 Summary: MetadataShell must take the directory lock when reading
 Key: KAFKA-15956
 URL: https://issues.apache.org/jira/browse/KAFKA-15956
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe


MetadataShell must take the directory lock when reading files, to avoid 
unpleasant surprises from concurrent reads and writes.
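
A minimal sketch of taking a directory lock before reading, using the standard java.nio file-locking API. The `.lock` file name and the `readWithLock` helper are assumptions for illustration, not MetadataShell's actual code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DirectoryLockSketch {
    // Tries to take an exclusive lock on a lock file inside dir and runs the
    // reader while holding it. Returns false if another process holds the lock.
    static boolean readWithLock(Path dir, Runnable reader) {
        Path lockFile = dir.resolve(".lock"); // assumed lock file name
        try (FileChannel channel = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.tryLock()) {
            if (lock == null) {
                return false; // lock is held elsewhere; do not read
            }
            reader.run();
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

File locks taken this way are held on behalf of the whole JVM, so the sketch guards against other processes, not against other threads in the same process.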





[jira] [Resolved] (KAFKA-15311) Fix docs about reverting to ZooKeeper mode during KRaft migration

2023-11-29 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15311.
--
Fix Version/s: 3.7.0
   Resolution: Fixed

> Fix docs about reverting to ZooKeeper mode during KRaft migration
> -
>
> Key: KAFKA-15311
> URL: https://issues.apache.org/jira/browse/KAFKA-15311
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Minor
> Fix For: 3.7.0
>
>
> The docs incorrectly state that reverting to ZooKeeper mode during KRaft 
> migration is not possible.





[jira] [Created] (KAFKA-15922) Add MetadataVersion for JBOD

2023-11-28 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15922:


 Summary: Add MetadataVersion for JBOD
 Key: KAFKA-15922
 URL: https://issues.apache.org/jira/browse/KAFKA-15922
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe








[jira] [Resolved] (KAFKA-15860) ControllerRegistration must be written out to the metadata image

2023-11-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15860.
--
Fix Version/s: 3.7.0
   Resolution: Fixed

> ControllerRegistration must be written out to the metadata image
> 
>
> Key: KAFKA-15860
> URL: https://issues.apache.org/jira/browse/KAFKA-15860
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.7.0
>
>






[jira] [Created] (KAFKA-15860) ControllerRegistration must be written out to the metadata image

2023-11-20 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15860:


 Summary: ControllerRegistration must be written out to the 
metadata image
 Key: KAFKA-15860
 URL: https://issues.apache.org/jira/browse/KAFKA-15860
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Resolved] (KAFKA-15532) ZkWriteBehindLag should not be reported by inactive controllers

2023-11-13 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15532.
--
Resolution: Fixed

> ZkWriteBehindLag should not be reported by inactive controllers
> ---
>
> Key: KAFKA-15532
> URL: https://issues.apache.org/jira/browse/KAFKA-15532
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Minor
>
> Since only the active controller is performing the dual-write to ZK during a 
> migration, it should be the only controller to report the ZkWriteBehindLag 
> metric. 
>  
> Currently, if the controller fails over during a migration, the previous 
> active controller will incorrectly report its last value for ZkWriteBehindLag 
> forever. Instead, it should report zero.





[jira] [Resolved] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15754.
--
Resolution: Invalid

The kafka-storage tool cannot, in fact, generate UUIDs starting with '-'.

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks how the argparse4j library works. 
> With such an UUID (i.e. -rmdB0m4T4–Y4thlNXk4Q in my case) the tool exits with 
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> That said, it seems that this problem was already addressed in the 
> Uuid.randomUuid method which keeps generating a new UUID until it doesn't 
> start with "-". This is the commit addressing it 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when the toString is called on the Uuid instance, it's 
> going to do a Base64 encoding on the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code is using a URL-safe encoder which, taking a 
> look at the Base64 class in Java, is an RFC4648_URLSAFE encoder using 
> the following alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which as you can see includes the "-" character.
> So although the current Uuid.randomUuid avoids generating a UUID 
> containing a dash, the Base64 encoding operation can still return a final 
> UUID starting with a dash.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> and not just the RFC4648 (not URL safe) which uses the common Base64 alphabet 
> not containing the "-".

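For illustration, the encoding and a retry loop can be sketched as follows. The assumption, labeled in the code, is that the retry check is applied to the Base64 string rather than to the raw UUID; under that assumption the result cannot start with "-", which would be consistent with the Invalid resolution above. `UuidEncodingSketch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

final class UuidEncodingSketch {
    // URL-safe Base64 (the alphabet includes '-' and '_'), without padding.
    static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Retries until the encoded form does not start with '-'.
    // Assumption: the check runs on the encoded string, which is what a
    // command-line parser would otherwise mistake for an option.
    static String randomIdNotStartingWithDash() {
        String encoded;
        do {
            encoded = encode(UUID.randomUUID());
        } while (encoded.startsWith("-"));
        return encoded;
    }
}
```

A raw UUID whose first six bits are 111110 encodes to a string beginning with "-", which is why a check on the encoded form matters.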




[jira] [Resolved] (KAFKA-15704) ControllerRegistrationRequest must set ZkMigrationReady field if appropriate

2023-10-27 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15704.
--
Resolution: Fixed

> ControllerRegistrationRequest must set ZkMigrationReady field if appropriate
> 
>
> Key: KAFKA-15704
> URL: https://issues.apache.org/jira/browse/KAFKA-15704
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Colin McCabe
>Assignee: David Arthur
>Priority: Major
> Fix For: 3.7.0
>
>






[jira] [Created] (KAFKA-15704) ControllerRegistrationRequest must set ZkMigrationReady field if appropriate

2023-10-27 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15704:


 Summary: ControllerRegistrationRequest must set ZkMigrationReady 
field if appropriate
 Key: KAFKA-15704
 URL: https://issues.apache.org/jira/browse/KAFKA-15704
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
 Fix For: 3.7.0








[jira] [Resolved] (KAFKA-15230) ApiVersions data between controllers is not reliable

2023-09-28 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15230.
--
Fix Version/s: 3.7.0
   Resolution: Fixed

> ApiVersions data between controllers is not reliable
> 
>
> Key: KAFKA-15230
> URL: https://issues.apache.org/jira/browse/KAFKA-15230
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Arthur
>Assignee: Colin McCabe
>Priority: Critical
> Fix For: 3.7.0
>
>
> While testing ZK migrations, I noticed a case where the controller was not 
> starting the migration due to the missing ApiVersions data from other 
> controllers. This was unexpected because the quorum was running and the 
> followers were replicating the metadata log as expected. After examining a 
> heap dump of the leader, it was in fact the case that the ApiVersions map of 
> NodeApiVersions was empty.
>  
> After further investigation and offline discussion with [~jsancio], we 
> realized that after the initial leader election, the connection from the Raft 
> leader to the followers will become idle and eventually time out and close. 
> This causes NetworkClient to purge the NodeApiVersions data for the closed 
> connections.
>  
> There are two main side effects of this behavior: 
> 1) If migrations are not started within the idle timeout period (10 minutes, 
> by default), then they cannot be started. After this timeout 
> period, I was unable to restart the controllers in such a way that the leader 
> had active connections with all followers.
> 2) Dynamically updating features, such as "metadata.version", is not 
> guaranteed to be safe.
>  
> There is a partial workaround for the migration issue. If we set 
> "connections.max.idle.ms" to -1, the Raft leader will never disconnect from 
> the followers. However, if a follower restarts, the leader will not 
> re-establish a connection.
>  
> The feature update issue has no safe workarounds.
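
The failure mode described above can be pictured with a toy model. This is only a sketch: the class ApiVersionsCacheDemo and its methods are hypothetical and are not Kafka's NetworkClient or NodeApiVersions code. It assumes that per-node version data is cached when a connection is established and dropped when the connection closes, as the description reports.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model only; names are hypothetical and this is not Kafka's NetworkClient.
final class ApiVersionsCacheDemo {
    // nodeId -> some supported version (a stand-in for NodeApiVersions)
    private final Map<Integer, Short> versionsByNode = new HashMap<>();

    void onConnected(int nodeId, short maxVersion) {
        versionsByNode.put(nodeId, maxVersion);
    }

    // Mirrors the reported behavior: closing a connection purges the cached data.
    void onDisconnected(int nodeId) {
        versionsByNode.remove(nodeId);
    }

    // A check that needs data from every peer fails once any peer is purged.
    boolean hasVersionsForAll(Set<Integer> peers) {
        return versionsByNode.keySet().containsAll(peers);
    }

    public static void main(String[] args) {
        ApiVersionsCacheDemo cache = new ApiVersionsCacheDemo();
        Set<Integer> followers = Set.of(2, 3);
        cache.onConnected(2, (short) 1);
        cache.onConnected(3, (short) 1);
        System.out.println(cache.hasVersionsForAll(followers));
        cache.onDisconnected(3); // for example, after an idle timeout
        System.out.println(cache.hasVersionsForAll(followers));
    }
}
```

In this model, any decision that depends on knowing every peer's versions becomes unreliable as soon as one idle connection is closed, even though the peer itself is still healthy.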





[jira] [Created] (KAFKA-15466) Add KIP-919 support to kafka-features.sh, kafka-metadata-quorum.sh, kafka-cluster.sh

2023-09-13 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15466:


 Summary: Add KIP-919 support to kafka-features.sh, 
kafka-metadata-quorum.sh, kafka-cluster.sh
 Key: KAFKA-15466
 URL: https://issues.apache.org/jira/browse/KAFKA-15466
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe








[jira] [Created] (KAFKA-15458) Fully resolve endpoint information before registering controllers

2023-09-12 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15458:


 Summary: Fully resolve endpoint information before registering 
controllers
 Key: KAFKA-15458
 URL: https://issues.apache.org/jira/browse/KAFKA-15458
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Resolved] (KAFKA-15183) Add more controller, loader, snapshot emitter metrics

2023-08-25 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15183.
--
Fix Version/s: 3.6.0
 Assignee: Colin McCabe
   Resolution: Fixed

Most of the KIP-938 metrics are now implemented for 3.6. The exception is the 
ForwardingManager metrics, which will have to wait until 3.7.

> Add more controller, loader, snapshot emitter metrics
> -
>
> Key: KAFKA-15183
> URL: https://issues.apache.org/jira/browse/KAFKA-15183
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.6.0
>
>
> Add the controller, loader, and snapshot emitter metrics from KIP-938.





[jira] [Created] (KAFKA-15406) Add the ForwardingManager metrics from KIP-938

2023-08-25 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15406:


 Summary: Add the ForwardingManager metrics from KIP-938
 Key: KAFKA-15406
 URL: https://issues.apache.org/jira/browse/KAFKA-15406
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 3.7.0
Reporter: Colin McCabe








[jira] [Resolved] (KAFKA-14305) KRaft Metadata Transactions

2023-08-25 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14305.
--
Resolution: Fixed

> KRaft Metadata Transactions
> ---
>
> Key: KAFKA-14305
> URL: https://issues.apache.org/jira/browse/KAFKA-14305
> Project: Kafka
>  Issue Type: New Feature
>Reporter: David Arthur
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.6.0
>
>
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions]





[jira] [Resolved] (KAFKA-15374) ZK migration fails on configs for default broker resource

2023-08-25 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15374.
--
  Assignee: David Arthur
Resolution: Fixed

> ZK migration fails on configs for default broker resource
> -
>
> Key: KAFKA-15374
> URL: https://issues.apache.org/jira/browse/KAFKA-15374
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.5.1
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Critical
> Fix For: 3.6.0, 3.5.2
>
>
> This error was seen while performing a ZK to KRaft migration on a cluster 
> with configs for the default broker resource
>  
> {code:java}
> java.lang.NumberFormatException: For input string: ""
>   at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
>   at java.base/java.lang.Integer.parseInt(Integer.java:678)
>   at java.base/java.lang.Integer.valueOf(Integer.java:999)
>   at kafka.zk.ZkMigrationClient.$anonfun$migrateBrokerConfigs$2(ZkMigrationClient.scala:371)
>   at kafka.zk.migration.ZkConfigMigrationClient.$anonfun$iterateBrokerConfigs$1(ZkConfigMigrationClient.scala:174)
>   at kafka.zk.migration.ZkConfigMigrationClient.$anonfun$iterateBrokerConfigs$1$adapted(ZkConfigMigrationClient.scala:156)
>   at scala.collection.immutable.BitmapIndexedMapNode.foreach(HashMap.scala:1076)
>   at scala.collection.immutable.HashMap.foreach(HashMap.scala:1083)
>   at kafka.zk.migration.ZkConfigMigrationClient.iterateBrokerConfigs(ZkConfigMigrationClient.scala:156)
>   at kafka.zk.ZkMigrationClient.migrateBrokerConfigs(ZkMigrationClient.scala:370)
>   at kafka.zk.ZkMigrationClient.cleanAndMigrateAllMetadata(ZkMigrationClient.scala:530)
>   at org.apache.kafka.metadata.migration.KRaftMigrationDriver$MigrateMetadataEvent.run(KRaftMigrationDriver.java:618)
>   at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
>   at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>   at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>   at java.base/java.lang.Thread.run(Thread.java:833)
>   at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:64)
> {code}
>  
> This is due to not considering the default resource type when we collect the 
> broker IDs in ZkMigrationClient#migrateBrokerConfigs.
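
The trace shows Integer.valueOf being handed an empty string. The sketch below reproduces that and shows one way such a value could be skipped. It is an illustration only: it assumes the default broker resource shows up as an empty name (which is what the `For input string: ""` message suggests), the helper parseBrokerIds is hypothetical, and it is written in Java rather than the Scala of ZkMigrationClient.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the actual ZkMigrationClient code.
final class BrokerIdParsingDemo {
    // Collect numeric broker IDs, skipping the empty name that is assumed
    // here to denote the default broker resource.
    static List<Integer> parseBrokerIds(List<String> resourceNames) {
        List<Integer> ids = new ArrayList<>();
        for (String name : resourceNames) {
            if (name.isEmpty()) {
                continue; // default resource: there is no broker ID to parse
            }
            ids.add(Integer.valueOf(name));
        }
        return ids;
    }

    public static void main(String[] args) {
        try {
            Integer.valueOf(""); // the call at the top of the stack trace
        } catch (NumberFormatException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println(parseBrokerIds(List.of("0", "", "1")));
    }
}
```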
>  
>  





[jira] [Resolved] (KAFKA-15389) MetadataLoader may publish an empty image on first start

2023-08-25 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15389.
--
Fix Version/s: 3.6.0
   Resolution: Fixed

> MetadataLoader may publish an empty image on first start
> 
>
> Key: KAFKA-15389
> URL: https://issues.apache.org/jira/browse/KAFKA-15389
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Minor
> Fix For: 3.6.0
>
>
> When first loading from an empty log, there is a case where MetadataLoader 
> can publish an image before the bootstrap records are processed. This isn't 
> exactly incorrect, since all components implicitly start from the empty image 
> state, but it might be unexpected for some MetadataPublishers. 
>  
> For example, in KRaftMigrationDriver, if an old MetadataVersion is 
> encountered, the driver transitions to the INACTIVE state.





[jira] [Resolved] (KAFKA-15213) Provide the exact offset to QuorumController.replay

2023-08-22 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15213.
--
Fix Version/s: 3.6.0
   Resolution: Fixed

> Provide the exact offset to QuorumController.replay
> ---
>
> Key: KAFKA-15213
> URL: https://issues.apache.org/jira/browse/KAFKA-15213
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.6.0
>
>
> Provide the exact offset to QuorumController.replay so that we can implement 
> metadata transactions. We need this so that we can know the offset where the 
> records will be applied before we apply them in QuorumControllers.





[jira] [Resolved] (KAFKA-15220) KRaftMetadataCache returns fenced brokers from getAliveBrokerNode

2023-08-16 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15220.
--
Fix Version/s: 3.6.0
   Resolution: Fixed

> KRaftMetadataCache returns fenced brokers from getAliveBrokerNode
> -
>
> Key: KAFKA-15220
> URL: https://issues.apache.org/jira/browse/KAFKA-15220
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Assignee: David Mao
>Priority: Major
> Fix For: 3.6.0
>
>






[jira] [Created] (KAFKA-15369) Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration

2023-08-16 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15369:


 Summary: Allow AdminClient to Talk Directly with the KRaft 
Controller Quorum and add Controller Registration
 Key: KAFKA-15369
 URL: https://issues.apache.org/jira/browse/KAFKA-15369
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe








[jira] [Created] (KAFKA-15318) Move Acl publishing outside the QuorumController

2023-08-08 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15318:


 Summary: Move Acl publishing outside the QuorumController
 Key: KAFKA-15318
 URL: https://issues.apache.org/jira/browse/KAFKA-15318
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe


On the controller, move Acl publishing into a dedicated MetadataPublisher, 
AclPublisher. This publisher listens for notifications from MetadataLoader, and 
receives only committed data. This brings the controller side in line with how 
the broker has always worked. It also avoids some ugly code related to 
publishing directly from the QuorumController. Most important of all, it clears 
the way to implement metadata transactions without worrying about Authorizer 
state (since it will be handled by the MetadataLoader, along with other 
metadata image state).





[jira] [Created] (KAFKA-15311) Docs incorrectly state that reverting to ZooKeeper mode during the migration is not possible

2023-08-07 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15311:


 Summary: Docs incorrectly state that reverting to ZooKeeper mode 
during the migration is not possible
 Key: KAFKA-15311
 URL: https://issues.apache.org/jira/browse/KAFKA-15311
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe








[jira] [Created] (KAFKA-15213) Provide the exact offset to QuorumController.replay

2023-07-18 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15213:


 Summary: Provide the exact offset to QuorumController.replay
 Key: KAFKA-15213
 URL: https://issues.apache.org/jira/browse/KAFKA-15213
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe


Provide the exact offset to QuorumController.replay so that we can implement 
metadata transactions. We need this so that we can know the offset where the 
records will be applied before we apply them in QuorumControllers.





[jira] [Created] (KAFKA-15183) Add more controller, loader, snapshot emitter metrics

2023-07-12 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15183:


 Summary: Add more controller, loader, snapshot emitter metrics
 Key: KAFKA-15183
 URL: https://issues.apache.org/jira/browse/KAFKA-15183
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe


Add the controller, loader, and snapshot emitter metrics from KIP-938.





[jira] [Created] (KAFKA-15060) Fix Admin.describeFeatures

2023-06-05 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15060:


 Summary: Fix Admin.describeFeatures
 Key: KAFKA-15060
 URL: https://issues.apache.org/jira/browse/KAFKA-15060
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe


Fix Admin.describeFeatures, which was accidentally broken by KAFKA-15007.





[jira] [Created] (KAFKA-15048) Improve handling of non-fatal quorum controller errors

2023-06-01 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15048:


 Summary: Improve handling of non-fatal quorum controller errors
 Key: KAFKA-15048
 URL: https://issues.apache.org/jira/browse/KAFKA-15048
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe








[jira] [Created] (KAFKA-15043) Create a kcontroller metric for expired broker heartbeats

2023-05-31 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15043:


 Summary: Create a kcontroller metric for expired broker heartbeats
 Key: KAFKA-15043
 URL: https://issues.apache.org/jira/browse/KAFKA-15043
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe








[jira] [Created] (KAFKA-15019) Improve handling of overload situations in the kcontroller

2023-05-24 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15019:


 Summary: Improve handling of overload situations in the kcontroller
 Key: KAFKA-15019
 URL: https://issues.apache.org/jira/browse/KAFKA-15019
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


Improve handling of overload situations in the KRaft controller





[jira] [Resolved] (KAFKA-14658) When listening on fixed ports, defer port opening until we're ready

2023-05-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14658.
--
Resolution: Fixed

> When listening on fixed ports, defer port opening until we're ready
> ---
>
> Key: KAFKA-14658
> URL: https://issues.apache.org/jira/browse/KAFKA-14658
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>
> When we are listening on fixed ports, we should defer opening ports until 
> we're ready to accept traffic. If we open the broker port too early, it can 
> confuse monitoring and deployment systems. This is a particular concern when 
> in KRaft mode, since in that mode, we create the SocketServer object earlier 
> in the startup process than when in ZK mode.
> The approach taken in this PR is to defer opening the acceptor port until 
> Acceptor.start is called. Note that when we are listening on a random port, 
> we continue to open the port "early," in the SocketServer constructor. The 
> reason for doing this is that there is no other way to find the random port 
> number the kernel has selected. Since random port assignment is not used in 
> production deployments, this should be reasonable.
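
The reason given for opening random ports early can be seen with plain JDK NIO. The following is a sketch only, not Kafka's SocketServer or Acceptor code, and the class EphemeralPortDemo is made up for illustration. When a channel is bound to port 0, the kernel picks the port, and the chosen number can only be read back after the bind has happened.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Illustration with plain JDK NIO; not Kafka's SocketServer or Acceptor code.
final class EphemeralPortDemo {
    // Bind to port 0 on the loopback address and return the port the kernel picked.
    static int bindAndGetPort(ServerSocketChannel channel) throws IOException {
        channel.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return ((InetSocketAddress) channel.getLocalAddress()).getPort();
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel channel = ServerSocketChannel.open()) {
            // Before bind() the channel has no local address, so no port is known.
            System.out.println("before bind: " + channel.getLocalAddress());
            System.out.println("after bind, port: " + bindAndGetPort(channel));
        }
    }
}
```

With a fixed port the number is known from configuration, so the bind can safely wait until the server is ready to accept traffic.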





[jira] [Resolved] (KAFKA-14835) Create ControllerServerMetricsPublisher

2023-05-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14835.
--
Resolution: Fixed

> Create ControllerServerMetricsPublisher
> ---
>
> Key: KAFKA-14835
> URL: https://issues.apache.org/jira/browse/KAFKA-14835
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>






[jira] [Resolved] (KAFKA-14857) Fix some MetadataLoader bugs

2023-05-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14857.
--
Resolution: Fixed

> Fix some MetadataLoader bugs
> 
>
> Key: KAFKA-14857
> URL: https://issues.apache.org/jira/browse/KAFKA-14857
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>






[jira] [Resolved] (KAFKA-14943) Fix ClientQuotaControlManager validation

2023-05-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14943.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

> Fix ClientQuotaControlManager validation
> 
>
> Key: KAFKA-14943
> URL: https://issues.apache.org/jira/browse/KAFKA-14943
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.5.0
>
>






[jira] [Resolved] (KAFKA-15009) New ACLs are not written to ZK during migration

2023-05-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15009.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

> New ACLs are not written to ZK during migration
> ---
>
> Key: KAFKA-15009
> URL: https://issues.apache.org/jira/browse/KAFKA-15009
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.5.0
>Reporter: Akhilesh Chaganti
>Assignee: Akhilesh Chaganti
>Priority: Blocker
>  Labels: kraft, migration
> Fix For: 3.5.0
>
>
> While handling snapshots in dual-write mode, we are missing the logic to 
> detect new ACLs created in KRaft. This means we will not write these new ACLs 
> back to ZK and they would be missing if a user rolled back their cluster to 
> ZK mode. 





[jira] [Resolved] (KAFKA-14918) KRaft controller sending ZK controller RPCs to KRaft brokers

2023-05-08 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14918.
--
Resolution: Fixed

> KRaft controller sending ZK controller RPCs to KRaft brokers
> 
>
> Key: KAFKA-14918
> URL: https://issues.apache.org/jira/browse/KAFKA-14918
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Critical
> Fix For: 3.5.0
>
>
> During the migration, when upgrading a ZK broker to KRaft, the controller is 
> incorrectly sending UpdateMetadata requests to the KRaft broker. 





[jira] [Resolved] (KAFKA-14698) Received request api key LEADER_AND_ISR which is not enabled

2023-05-08 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14698.
--
Fix Version/s: (was: 3.4.1)
   Resolution: Duplicate

> Received request api key LEADER_AND_ISR which is not enabled
> 
>
> Key: KAFKA-14698
> URL: https://issues.apache.org/jira/browse/KAFKA-14698
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.4.0
>Reporter: Mickael Maison
>Assignee: Akhilesh Chaganti
>Priority: Major
> Fix For: 3.5.0
>
> Attachments: broker0.log, controller.log, test_online_migration.tar.gz
>
>
> I started from a Kafka cluster (with ZooKeeper) with 2 brokers. There's a 
> single topic "test" with 2 partitions and 2 replicas and the internal 
> __consumer_offsets topic.
> While following the ZooKeeper to KRaft migration steps from 
> [https://kafka.apache.org/documentation/#kraft_zk_migration], I'm hitting 
> issues at the "Migrating brokers to KRaft" step.
> When I restart a broker as KRaft, it repeatedly prints the following error:
> {code:java}
> org.apache.kafka.common.errors.InvalidRequestException: Received request api 
> key LEADER_AND_ISR which is not enabled
> [2023-02-09 16:14:30,334] ERROR Closing socket for 
> 192.168.1.11:9092-192.168.1.11:63737-371 because of error 
> (kafka.network.Processor)
> {code}
> The controller repeatedly prints the following error:
> {code:java}
> [2023-02-09 16:12:27,456] WARN [Controller id=1000, targetBrokerId=0] 
> Connection to node 0 (mmaison-mac.home/192.168.1.11:9092) could not be 
> established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)
> [2023-02-09 16:12:27,456] INFO [Controller id=1000, targetBrokerId=0] Client 
> requested connection close from node 0 
> (org.apache.kafka.clients.NetworkClient)
> [2023-02-09 16:12:27,560] INFO [Controller id=1000, targetBrokerId=0] Node 0 
> disconnected. (org.apache.kafka.clients.NetworkClient)
> {code}
> Attached are the controller logs and the logs from broker-0.
>  





[jira] [Created] (KAFKA-14943) Fix ClientQuotaControlManager validation

2023-04-26 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14943:


 Summary: Fix ClientQuotaControlManager validation
 Key: KAFKA-14943
 URL: https://issues.apache.org/jira/browse/KAFKA-14943
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Resolved] (KAFKA-14775) Support SCRAM for broker to controller authentication

2023-04-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14775.
--
Fix Version/s: 3.5.0
 Assignee: Colin McCabe  (was: Proven Provenzano)
   Resolution: Fixed

> Support SCRAM for broker to controller authentication
> -
>
> Key: KAFKA-14775
> URL: https://issues.apache.org/jira/browse/KAFKA-14775
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Reporter: Proven Provenzano
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.5.0
>
>
> We need to apply SCRAM changes to controller nodes.
> We need to handle DescribeUserScramCredentialsRequest in the controller nodes.
> As part of this update I will split out the code from 
> {{BrokerMetadataPublisher.scala}} for applying the SCRAM changes into a 
> separate {{MetadataPublisher}}, as we did with {{DynamicConfigPublisher}}.





[jira] [Resolved] (KAFKA-14894) MetadataLoader must call finishSnapshot after loading a snapshot

2023-04-12 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14894.
--
Fix Version/s: 3.5.0
 Reviewer: David Arthur
   Resolution: Fixed

> MetadataLoader must call finishSnapshot after loading a snapshot
> 
>
> Key: KAFKA-14894
> URL: https://issues.apache.org/jira/browse/KAFKA-14894
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 3.5.0
>
>






[jira] [Created] (KAFKA-14894) MetadataLoader must call finishSnapshot after loading a snapshot

2023-04-11 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14894:


 Summary: MetadataLoader must call finishSnapshot after loading a 
snapshot
 Key: KAFKA-14894
 URL: https://issues.apache.org/jira/browse/KAFKA-14894
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Created] (KAFKA-14857) Fix some MetadataLoader bugs

2023-03-27 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14857:


 Summary: Fix some MetadataLoader bugs
 Key: KAFKA-14857
 URL: https://issues.apache.org/jira/browse/KAFKA-14857
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Resolved] (KAFKA-14436) Initialize KRaft with arbitrary epoch

2023-03-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14436.
--
Fix Version/s: 3.4.0
   Resolution: Won't Fix

> Initialize KRaft with arbitrary epoch
> -
>
> Key: KAFKA-14436
> URL: https://issues.apache.org/jira/browse/KAFKA-14436
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Arthur
>Assignee: Alyssa Huang
>Priority: Major
> Fix For: 3.4.0
>
>
> For the ZK migration, we need to be able to initialize Raft with an 
> arbitrarily high epoch (within the size limit). This is because during the 
> migration, we want to write the Raft epoch as the controller epoch in ZK. We 
> require that epochs in /controller_epoch are monotonic in order for brokers 
> to behave normally. 





[jira] [Created] (KAFKA-14846) Fix overly large record batches in ZkMigrationClient

2023-03-24 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14846:


 Summary: Fix overly large record batches in ZkMigrationClient
 Key: KAFKA-14846
 URL: https://issues.apache.org/jira/browse/KAFKA-14846
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 3.4.0
Reporter: Colin McCabe


ZkMigrationClient should not create overly large record batches





[jira] [Resolved] (KAFKA-14493) Zk to KRaft migration state machine in KRaft controller

2023-03-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14493.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

> Zk to KRaft migration state machine in KRaft controller
> ---
>
> Key: KAFKA-14493
> URL: https://issues.apache.org/jira/browse/KAFKA-14493
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Akhilesh Chaganti
>Assignee: Akhilesh Chaganti
>Priority: Major
> Fix For: 3.4.0
>
>






[jira] [Resolved] (KAFKA-14458) RPC Handler to ZkBrokers from KRaft Controller

2023-03-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14458.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

> RPC Handler to ZkBrokers from KRaft Controller
> --
>
> Key: KAFKA-14458
> URL: https://issues.apache.org/jira/browse/KAFKA-14458
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Akhilesh Chaganti
>Assignee: Akhilesh Chaganti
>Priority: Major
> Fix For: 3.4.0
>
>






[jira] [Resolved] (KAFKA-14446) API forwarding support in ZkBrokers

2023-03-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14446.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

> API forwarding support in ZkBrokers
> ---
>
> Key: KAFKA-14446
> URL: https://issues.apache.org/jira/browse/KAFKA-14446
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Akhilesh Chaganti
>Assignee: Akhilesh Chaganti
>Priority: Major
> Fix For: 3.4.0
>
>
> To support migration, zkBrokers should be able to forward API requests to the 
> Controller, whether it is zkController or kraftController. 





[jira] [Resolved] (KAFKA-14447) Controlled shutdown for ZK brokers during migration

2023-03-24 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14447.
--
Fix Version/s: 3.4.0
   (was: 3.4.1)
   Resolution: Fixed

> Controlled shutdown for ZK brokers during migration
> ---
>
> Key: KAFKA-14447
> URL: https://issues.apache.org/jira/browse/KAFKA-14447
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Arthur
>Assignee: Luke Chen
>Priority: Major
> Fix For: 3.4.0
>
>






[jira] [Created] (KAFKA-14835) Create ControllerServerMetricsPublisher

2023-03-22 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14835:


 Summary: Create ControllerServerMetricsPublisher
 Key: KAFKA-14835
 URL: https://issues.apache.org/jira/browse/KAFKA-14835
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Created] (KAFKA-14658) Do not open broker ports until we are ready to accept traffic

2023-01-27 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14658:


 Summary: Do not open broker ports until we are ready to accept 
traffic
 Key: KAFKA-14658
 URL: https://issues.apache.org/jira/browse/KAFKA-14658
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


We should not open the ports on the broker until we are ready to accept 
traffic. This is a particular concern when in KRaft mode, since in that mode, 
we create the SocketServer object earlier in the startup process than when in 
ZK mode.





[jira] [Created] (KAFKA-14622) Create a junit test which would have caught KAFKA-14618

2023-01-13 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14622:


 Summary: Create a junit test which would have caught KAFKA-14618
 Key: KAFKA-14622
 URL: https://issues.apache.org/jira/browse/KAFKA-14622
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe








[jira] [Resolved] (KAFKA-14618) Off by one error in generated snapshot IDs causes misaligned fetching

2023-01-13 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14618.
--
Resolution: Fixed

> Off by one error in generated snapshot IDs causes misaligned fetching
> -
>
> Key: KAFKA-14618
> URL: https://issues.apache.org/jira/browse/KAFKA-14618
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: José Armando García Sancio
>Priority: Blocker
> Fix For: 3.4.0
>
>
> We implemented new snapshot generation logic here: 
> [https://github.com/apache/kafka/pull/12983]. A few days prior to this patch 
> getting merged, we had changed the `RaftClient` API to pass the _exclusive_ 
> offset when generating snapshots instead of the inclusive offset: 
> [https://github.com/apache/kafka/pull/12981]. Unfortunately, the new snapshot 
> generation logic was not updated accordingly. The consequence of this is that 
> the state on replicas can get out of sync. In the best case, the followers 
> fail replication because the offset after loading a snapshot is no longer 
> aligned on a batch boundary.
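
To illustrate the inclusive/exclusive mismatch described above, here is a minimal sketch. It does not use the real RaftClient API; the class and method names (SnapshotOffsetSketch, toExclusiveEndOffset, isBatchAligned) and the batch layout are assumptions made up for the example. It shows that passing the last contained offset where an exclusive end offset is expected yields an offset that is not on a batch boundary.

```java
import java.util.List;

final class SnapshotOffsetSketch {
    // Hypothetical helper: converts the last offset contained in a snapshot
    // (inclusive) into the exclusive end offset.
    static long toExclusiveEndOffset(long lastContainedOffset) {
        return lastContainedOffset + 1;
    }

    // Returns true if `offset` is the base offset of a batch or the log end offset,
    // i.e. a position from which fetching can resume cleanly.
    static boolean isBatchAligned(List<Long> batchBaseOffsets, long logEndOffset, long offset) {
        return offset == logEndOffset || batchBaseOffsets.contains(offset);
    }
}
```

With batches covering offsets [0,2], [3,6] and [7,9] (base offsets 0, 3, 7 and log end offset 10), a snapshot that contains everything up to offset 6 has exclusive end offset 7, which is aligned. Using 6 itself as if it were exclusive lands in the middle of the second batch.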





[jira] [Created] (KAFKA-14601) Improve exception handling in KafkaEventQueue

2023-01-06 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14601:


 Summary: Improve exception handling in KafkaEventQueue
 Key: KAFKA-14601
 URL: https://issues.apache.org/jira/browse/KAFKA-14601
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


If KafkaEventQueue gets an InterruptedException while waiting for a condition 
variable, it currently exits immediately. Instead, it should complete the 
remaining events exceptionally and then execute the cleanup event. This will 
allow us to finish any necessary cleanup steps.

Also, handle cases where Event#handleException itself throws an exception.
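
A minimal sketch of the behavior proposed above, under stated assumptions: this is not the KafkaEventQueue code. The Event interface below is a simplified stand-in, and drainOnInterrupt is a hypothetical name. On interruption, every pending event is completed exceptionally, an exception thrown by handleException itself is caught so it cannot abort the drain, and the cleanup event still runs at the end.

```java
import java.util.Queue;

final class EventQueueInterruptSketch {
    // Simplified stand-in for an event; not the real KafkaEventQueue.Event.
    interface Event {
        void run() throws Exception;

        default void handleException(Throwable e) {
        }
    }

    // Fails every pending event with `cause`, then runs the cleanup event.
    static void drainOnInterrupt(Queue<Event> pending, Event cleanup, InterruptedException cause) {
        Event event;
        while ((event = pending.poll()) != null) {
            try {
                event.handleException(cause);
            } catch (Throwable t) {
                // handleException itself threw: report it and keep draining.
                System.err.println("handleException threw: " + t);
            }
        }
        try {
            cleanup.run();
        } catch (Throwable t) {
            System.err.println("cleanup event threw: " + t);
        }
    }
}
```

The design point is that neither the interruption nor a misbehaving handler prevents the remaining events from being completed or the cleanup step from running.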





[jira] [Created] (KAFKA-14538) Implement metadata transactions at arbitrary locations in the log

2022-12-20 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14538:


 Summary: Implement metadata transactions at arbitrary locations in 
the log
 Key: KAFKA-14538
 URL: https://issues.apache.org/jira/browse/KAFKA-14538
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe


Implement metadata transactions at arbitrary locations in the log, not just at 
the beginning.





[jira] [Created] (KAFKA-14433) Clear all yammer metrics when test harnesses clean up

2022-12-01 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14433:


 Summary: Clear all yammer metrics when test harnesses clean up
 Key: KAFKA-14433
 URL: https://issues.apache.org/jira/browse/KAFKA-14433
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe


We should clear all yammer metrics from the yammer singleton when the 
integration test harnesses clean up. This would avoid memory leaks in tests 
that have a lot of test cases.





[jira] [Created] (KAFKA-14370) Properly close ImageWriter objects

2022-11-08 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14370:


 Summary: Properly close ImageWriter objects
 Key: KAFKA-14370
 URL: https://issues.apache.org/jira/browse/KAFKA-14370
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Created] (KAFKA-14351) Implement controller mutation quotas in KRaft

2022-11-02 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14351:


 Summary: Implement controller mutation quotas in KRaft
 Key: KAFKA-14351
 URL: https://issues.apache.org/jira/browse/KAFKA-14351
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe








[jira] [Created] (KAFKA-14350) Support dynamically reconfiguring KRaft controller listeners

2022-11-02 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14350:


 Summary: Support dynamically reconfiguring KRaft controller 
listeners
 Key: KAFKA-14350
 URL: https://issues.apache.org/jira/browse/KAFKA-14350
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe


Support dynamically reconfiguring KRaft controller listeners. The first step is 
probably to support modifying existing listeners (SSL settings, SASL settings, 
connection limit settings, etc.). We can create a follow-on JIRA for adding or 
removing listeners dynamically, if indeed we want to do that at all; the use 
cases seem a bit rare.





[jira] [Created] (KAFKA-14349) Support dynamically resizing the KRaft controller's thread pools

2022-11-02 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14349:


 Summary: Support dynamically resizing the KRaft controller's 
thread pools
 Key: KAFKA-14349
 URL: https://issues.apache.org/jira/browse/KAFKA-14349
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe


Support dynamically resizing the KRaft controller's request handler and network 
handler thread pools. See {{DynamicBrokerConfig.scala}}.





[jira] [Created] (KAFKA-14348) Consider renaming MetadataBatchProcessingTimeUs to MetadataDeltaProcessingTimeUs

2022-11-02 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14348:


 Summary: Consider renaming MetadataBatchProcessingTimeUs to 
MetadataDeltaProcessingTimeUs
 Key: KAFKA-14348
 URL: https://issues.apache.org/jira/browse/KAFKA-14348
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe


We should consider renaming kafka.server.MetadataBatchProcessingTimeUs to 
kafka.server.MetadataDeltaProcessingTimeUs. The reason is that this metric 
isn't the time to process a single batch, but the time to process a group of 
batches given to us by the raft layer.





[jira] [Created] (KAFKA-14327) Unify KRaft snapshot generation between broker and controller

2022-10-20 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14327:


 Summary: Unify KRaft snapshot generation between broker and 
controller
 Key: KAFKA-14327
 URL: https://issues.apache.org/jira/browse/KAFKA-14327
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe








[jira] [Created] (KAFKA-14290) Fix bugs that could block KRaft controlled shutdown indefinitely

2022-10-11 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14290:


 Summary: Fix bugs that could block KRaft controlled shutdown 
indefinitely
 Key: KAFKA-14290
 URL: https://issues.apache.org/jira/browse/KAFKA-14290
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Created] (KAFKA-14265) Prefix ACLs may shadow other prefix ACLs

2022-09-28 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14265:


 Summary: Prefix ACLs may shadow other prefix ACLs
 Key: KAFKA-14265
 URL: https://issues.apache.org/jira/browse/KAFKA-14265
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


Prefix ACLs may shadow other prefix ACLs. Consider the case where we have 
prefix ACLs for foobar, fooa, and f. If we were matching a resource named 
"foobar", we'd start scanning at the foobar ACL, hit the fooa ACL, and stop -- 
missing the f ACL.

To fix this, we should re-scan for ACLs at the first divergence point (in this 
case, f) whenever we hit a mismatch of this kind.
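
The re-scan idea can be sketched as follows. This is an illustration only, not the authorizer's actual code: prefixes are modeled as plain strings in a NavigableSet, and the names PrefixAclScanSketch, matchingPrefixes and commonPrefix are assumptions. The scan walks downward from the resource name; on a mismatch it does not stop, but re-seeks at the longest common prefix of the resource name and the mismatching entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class PrefixAclScanSketch {
    // Longest common prefix of two strings.
    static String commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    // Returns every stored prefix that `resource` starts with, longest first.
    static List<String> matchingPrefixes(NavigableSet<String> prefixes, String resource) {
        List<String> matches = new ArrayList<>();
        String candidate = prefixes.floor(resource);
        while (candidate != null) {
            if (resource.startsWith(candidate)) {
                matches.add(candidate);
                candidate = prefixes.lower(candidate);
            } else {
                // Mismatch: re-seek at the divergence point instead of stopping.
                // The common prefix sorts strictly below the candidate, so the
                // scan always moves downward and terminates.
                candidate = prefixes.floor(commonPrefix(resource, candidate));
            }
        }
        return matches;
    }
}
```

For the prefixes foobar, fooa and f and the resource "foobar", the scan visits foobar (match), then fooa (mismatch), re-seeks at "foo", and lands on f, so both foobar and f are returned.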





[jira] [Created] (KAFKA-14259) BrokerRegistration#toString throws an exception, terminating metadata replay

2022-09-23 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14259:


 Summary: BrokerRegistration#toString throws an exception, 
terminating metadata replay
 Key: KAFKA-14259
 URL: https://issues.apache.org/jira/browse/KAFKA-14259
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.3
Reporter: Colin McCabe
Assignee: Colin McCabe
 Fix For: 3.3


BrokerRegistration#toString throws an exception, terminating metadata replay, 
because the sorted() method is used on an entry set rather than a key set.
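
The failure mode can be reproduced outside Kafka with a few lines. This is a minimal sketch, not the BrokerRegistration code; the class and method names are made up for the example. Map.Entry does not implement Comparable, so calling sorted() with no comparator on an entry-set stream throws a ClassCastException as soon as two entries are compared, while sorting the key set works when the keys are Comparable.

```java
import java.util.Map;
import java.util.stream.Collectors;

final class SortedEntrySetSketch {
    // Sorting the key set works because String keys are Comparable.
    static String describeByKey(Map<String, Integer> map) {
        return map.keySet().stream()
            .sorted()
            .map(key -> key + "=" + map.get(key))
            .collect(Collectors.joining(", "));
    }

    // Sorting the entry set without a comparator fails at runtime once the
    // sort has to compare two entries (that is, for maps with 2+ entries).
    static boolean sortingEntriesFails(Map<String, Integer> map) {
        try {
            map.entrySet().stream().sorted().collect(Collectors.toList());
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

An alternative that keeps the entry set is to pass an explicit comparator, for example sorted(Map.Entry.comparingByKey()).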


{noformat}
Caused by: java.util.concurrent.ExecutionException: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
        at kafka.server.BrokerServer.startup(BrokerServer.scala:846)
        ... 147 more
Caused by: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
        at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
        at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
        at java.base/java.util.TimSort.sort(TimSort.java:220)
        at java.base/java.util.Arrays.sort(Arrays.java:1307)
        at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:353)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
        at org.apache.kafka.metadata.BrokerRegistration.toString(BrokerRegistration.java:228)
        at java.base/java.util.Formatter$FormatSpecifier.printString(Formatter.java:3056)

[jira] [Created] (KAFKA-14258) Add ducktape or junit test verifying that brokers can reload snapshots after startup

2022-09-23 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14258:


 Summary: Add ducktape or junit test verifying that brokers can 
reload snapshots after startup
 Key: KAFKA-14258
 URL: https://issues.apache.org/jira/browse/KAFKA-14258
 Project: Kafka
  Issue Type: Test
Reporter: Colin McCabe


We should add a ducktape or junit test that verifies that brokers can reload 
snapshots after startup. This code path is not exercised frequently but it is 
important.





[jira] [Created] (KAFKA-14243) Disable unsafe downgrade in 3.3

2022-09-19 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14243:


 Summary: Disable unsafe downgrade in 3.3
 Key: KAFKA-14243
 URL: https://issues.apache.org/jira/browse/KAFKA-14243
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


Disable unsafe downgrade in 3.3





[jira] [Resolved] (KAFKA-14216) Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback javadoc

2022-09-12 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14216.
--
Fix Version/s: 3.3
 Reviewer: Luke Chen
   Resolution: Fixed

> Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback 
> javadoc
> --
>
> Key: KAFKA-14216
> URL: https://issues.apache.org/jira/browse/KAFKA-14216
> Project: Kafka
>  Issue Type: Bug
>  Components: docs, documentation
>Affects Versions: 3.3.0, 3.3
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3
>
>






[jira] [Resolved] (KAFKA-14217) app-reset-tool.html should remove reference to --zookeeper flag that no longer exists

2022-09-12 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14217.
--
Fix Version/s: 3.3
 Reviewer: Luke Chen
   Resolution: Fixed

> app-reset-tool.html should remove reference to --zookeeper flag that no 
> longer exists
> -
>
> Key: KAFKA-14217
> URL: https://issues.apache.org/jira/browse/KAFKA-14217
> Project: Kafka
>  Issue Type: Bug
>  Components: docs, documentation
>Affects Versions: 3.3.0, 3.3
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3
>
>
> app-reset-tool.html should remove reference to --zookeeper flag that no 
> longer exists





[jira] [Created] (KAFKA-14217) app-reset-tool.html should remove reference to --zookeeper flag that no longer exists

2022-09-09 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14217:


 Summary: app-reset-tool.html should remove reference to 
--zookeeper flag that no longer exists
 Key: KAFKA-14217
 URL: https://issues.apache.org/jira/browse/KAFKA-14217
 Project: Kafka
  Issue Type: Bug
  Components: docs, documentation
Affects Versions: 3.3.0, 3.3
Reporter: Colin McCabe
Assignee: Colin McCabe


app-reset-tool.html should remove reference to --zookeeper flag that no longer 
exists





[jira] [Created] (KAFKA-14216) Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback javadoc

2022-09-09 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14216:


 Summary: Remove ZK reference from 
org.apache.kafka.server.quota.ClientQuotaCallback javadoc
 Key: KAFKA-14216
 URL: https://issues.apache.org/jira/browse/KAFKA-14216
 Project: Kafka
  Issue Type: Bug
  Components: docs, documentation
Affects Versions: 3.3.0, 3.3
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Resolved] (KAFKA-14200) kafka-features.sh must exit with non-zero error code on error

2022-09-07 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14200.
--
  Reviewer: David Arthur
Resolution: Fixed

> kafka-features.sh must exit with non-zero error code on error
> -
>
> Key: KAFKA-14200
> URL: https://issues.apache.org/jira/browse/KAFKA-14200
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0, 3.3
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>
> kafka-features.sh must exit with a non-zero error code on error. We must do 
> this in order to catch regressions like KAFKA-13990.





[jira] [Resolved] (KAFKA-14197) Kraft broker fails to startup after topic creation failure

2022-09-06 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14197.
--
Resolution: Duplicate

> Kraft broker fails to startup after topic creation failure
> --
>
> Key: KAFKA-14197
> URL: https://issues.apache.org/jira/browse/KAFKA-14197
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Reporter: Luke Chen
>Priority: Blocker
> Fix For: 3.3.0
>
>
> In the KRaft ControllerWriteEvent, we start by applying the record to the 
> controller's in-memory state, and then send the record out via the raft 
> client. But if an error occurs while sending the records, there is no way to 
> revert the change to the controller's in-memory state [1].
> The issue occurs when creating topics: the controller state is updated with 
> the topic and partition metadata (e.g. the broker-to-ISR map), but the record 
> is not sent successfully (e.g. because of a RecordBatchTooLargeException). 
> Then, when shutting down the node, the controlled shutdown tries to remove 
> the broker from the ISR via [2]:
> {code:java}
> generateLeaderAndIsrUpdates("enterControlledShutdown[" + brokerId + "]", 
> brokerId, NO_LEADER, records, 
> brokersToIsrs.partitionsWithBrokerInIsr(brokerId));{code}
>  
> After we append the partitionChangeRecords and send them to the metadata 
> topic successfully, the brokers fail to "replay" these partition changes, 
> because the topics/partitions were never created successfully in the first 
> place.
> Even worse, after restarting the node, all the metadata records are replayed 
> again and the same error occurs again, so the broker cannot start up 
> successfully.
>  
> The error and call stack look like this; basically, it complains that the 
> topic image can't be found:
> {code:java}
> [2022-09-02 16:29:16,334] ERROR Encountered metadata loading fault: Error replaying metadata log record at offset 81 (org.apache.kafka.server.fault.LoggingFaultHandler)
> java.lang.NullPointerException
>     at org.apache.kafka.image.TopicDelta.replay(TopicDelta.java:69)
>     at org.apache.kafka.image.TopicsDelta.replay(TopicsDelta.java:91)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:248)
>     at org.apache.kafka.image.MetadataDelta.replay(MetadataDelta.java:186)
>     at kafka.server.metadata.BrokerMetadataListener.$anonfun$loadBatches$3(BrokerMetadataListener.scala:239)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>     at kafka.server.metadata.BrokerMetadataListener.kafka$server$metadata$BrokerMetadataListener$$loadBatches(BrokerMetadataListener.scala:232)
>     at kafka.server.metadata.BrokerMetadataListener$HandleCommitsEvent.run(BrokerMetadataListener.scala:113)
>     at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> [1] 
> [https://github.com/apache/kafka/blob/ef65b6e566ef69b2f9b58038c98a5993563d7a68/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L779-L804]
>  
> [2] 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L1270]
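
The apply-then-send hazard described in the report can be sketched in isolation. This is a toy model and not the QuorumController code: the class ApplyThenSendSketch, its topic map, and the sender callback are all assumptions for illustration, and copying the whole map is only one possible way to make the change revertible. It shows the in-memory state being restored when the send step throws, so that later operations never see a topic whose record was never written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class ApplyThenSendSketch {
    private Map<String, Integer> topics = new HashMap<>();

    // Applies the change in memory, then hands the record to `sender`.
    // If sending throws, the previous in-memory state is restored.
    boolean createTopic(String name, int partitions, Consumer<String> sender) {
        Map<String, Integer> before = new HashMap<>(topics); // copy taken before mutating
        topics.put(name, partitions);
        try {
            sender.accept(name);
            return true;
        } catch (RuntimeException e) {
            topics = before; // revert: the record was never sent
            return false;
        }
    }

    boolean hasTopic(String name) {
        return topics.containsKey(name);
    }
}
```

Without the revert step, a later operation such as controlled shutdown would generate records that refer to the unsent topic, which is the situation the stack trace above reflects.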





[jira] [Created] (KAFKA-14204) QuorumController must correctly handle overly large batches

2022-09-06 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14204:


 Summary: QuorumController must correctly handle overly large 
batches
 Key: KAFKA-14204
 URL: https://issues.apache.org/jira/browse/KAFKA-14204
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe








[jira] [Created] (KAFKA-14200) kafka-features.sh must exit with non-zero error code on error

2022-09-02 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14200:


 Summary: kafka-features.sh must exit with non-zero error code on 
error
 Key: KAFKA-14200
 URL: https://issues.apache.org/jira/browse/KAFKA-14200
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


kafka-features.sh must exit with a non-zero error code on error. We must do 
this in order to catch regressions like KAFKA-13990.
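
As a rough illustration of the requirement, a command-line tool can funnel its work through one method that maps any failure to a non-zero exit code. This is a generic sketch, not the kafka-features.sh implementation; ExitCodeSketch and runAndReturnExitCode are hypothetical names.

```java
import java.util.concurrent.Callable;

final class ExitCodeSketch {
    // Runs the command and maps any failure to a non-zero exit code.
    static int runAndReturnExitCode(Callable<Void> command) {
        try {
            command.call();
            return 0;
        } catch (Exception e) {
            System.err.println("Command failed: " + e.getMessage());
            return 1;
        }
    }
}
```

A main method would then end with System.exit(runAndReturnExitCode(...)), so that scripts and system tests can detect the failure from the exit status rather than by parsing output.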





[jira] [Resolved] (KAFKA-14187) kafka-features.sh: add support for --metadata

2022-08-30 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-14187.
--
Resolution: Fixed

> kafka-features.sh: add support for --metadata
> -
>
> Key: KAFKA-14187
> URL: https://issues.apache.org/jira/browse/KAFKA-14187
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0, 3.3
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>
> Fix the kafka-features.sh command so that we can upgrade to the new version 
> as expected.




