[jira] [Created] (KAFKA-16179) NPE handle ApiVersions during controller failover
Jason Gustafson created KAFKA-16179: --- Summary: NPE handle ApiVersions during controller failover Key: KAFKA-16179 URL: https://issues.apache.org/jira/browse/KAFKA-16179 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson {code:java} java.lang.NullPointerException: Cannot invoke "org.apache.kafka.metadata.publisher.FeaturesPublisher.features()" because the return value of "kafka.server.ControllerServer.featuresPublisher()" is null at kafka.server.ControllerServer.$anonfun$startup$12(ControllerServer.scala:242) at kafka.server.SimpleApiVersionManager.features(ApiVersionManager.scala:122) at kafka.server.SimpleApiVersionManager.apiVersionResponse(ApiVersionManager.scala:111) at kafka.server.ControllerApis.$anonfun$handleApiVersionsRequest$3(ControllerApis.scala:566) at kafka.server.ControllerApis.$anonfun$handleApiVersionsRequest$3$adapted(ControllerApis.scala:566) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16012) Incomplete range assignment in consumer
Jason Gustafson created KAFKA-16012: --- Summary: Incomplete range assignment in consumer Key: KAFKA-16012 URL: https://issues.apache.org/jira/browse/KAFKA-16012 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Fix For: 3.7.0 We were looking into test failures here: https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1702475525--jolshan--kafka-15784--7cad567675/2023-12-13--001./2023-12-13–001./report.html. Here is the first failure in the report: {code:java} test_id: kafkatest.tests.core.group_mode_transactions_test.GroupModeTransactionsTest.test_transactions.failure_mode=clean_bounce.bounce_target=brokers status: FAIL run time: 3 minutes 4.950 seconds TimeoutError('Consumer consumed only 88223 out of 100000 messages in 90s') {code} We traced the failure to an apparent bug during the last rebalance before the group became empty. The last remaining instance seems to receive an incomplete assignment which prevents it from completing expected consumption on some partitions. Here is the rebalance from the coordinator's perspective: {code:java} server.log.2023-12-13-04:[2023-12-13 04:58:56,987] INFO [GroupCoordinator 3]: Stabilized group grouped-transactions-test-consumer-group generation 5 (__consumer_offsets-2) with 1 members (kafka.coordinator.group.GroupCoordinator) server.log.2023-12-13-04:[2023-12-13 04:58:56,990] INFO [GroupCoordinator 3]: Assignment received from leader consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd for group grouped-transactions-test-consumer-group for generation 5. The group has 1 members, 0 of which are static. (kafka.coordinator.group.GroupCoordinator) {code} The group is down to one member in generation 5. 
In the previous generation, the consumer in question reported this assignment: {code:java} // Gen 4: we've got partitions 0-4 [2023-12-13 04:58:52,631] DEBUG [Consumer clientId=consumer-grouped-transactions-test-consumer-group-1, groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete with generation 4 and memberId consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2023-12-13 04:58:52,631] INFO [Consumer clientId=consumer-grouped-transactions-test-consumer-group-1, groupId=grouped-transactions-test-consumer-group] Notifying assignor about the new Assignment(partitions=[input-topic-0, input-topic-1, input-topic-2, input-topic-3, input-topic-4]) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) {code} However, in generation 5, we seem to be assigned only one partition: {code:java} // Gen 5: Now we have only partition 1? But aren't we the last member in the group? [2023-12-13 04:58:56,954] DEBUG [Consumer clientId=consumer-grouped-transactions-test-consumer-group-1, groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete with generation 5 and memberId consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [2023-12-13 04:58:56,955] INFO [Consumer clientId=consumer-grouped-transactions-test-consumer-group-1, groupId=grouped-transactions-test-consumer-group] Notifying assignor about the new Assignment(partitions=[input-topic-1]) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) {code} The assignment type is range from the JoinGroup for generation 5. 
The decoded metadata sent by the consumer is this: {code:java} Subscription(topics=[input-topic], ownedPartitions=[], groupInstanceId=null, generationId=4, rackId=null) {code} Here is the decoded assignment from the SyncGroup: {code:java} Assignment(partitions=[input-topic-1]) {code}
[jira] [Created] (KAFKA-15828) Protect clients from broker hostname reuse
Jason Gustafson created KAFKA-15828: --- Summary: Protect clients from broker hostname reuse Key: KAFKA-15828 URL: https://issues.apache.org/jira/browse/KAFKA-15828 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson In some environments such as k8s, brokers may be assigned to nodes dynamically from an available pool. When a cluster is rolling, it is possible for the client to see the same node advertised for different broker IDs in a short period of time. For example, kafka-1 might be initially assigned to node1. Before the client is able to establish a connection, it could be that kafka-3 is now on node1 instead. Currently there is no protection in the client or in the protocol for this scenario. If the connection succeeds, the client will assume it has a good connection to kafka-1. Until something disrupts the connection, it will continue under this assumption even if the hostname for kafka-1 changes. We have observed this scenario in practice. The client connected to the wrong broker through stale hostname information. It was unable to produce data because of persistent NOT_LEADER errors. The only way to recover in the end was by restarting the client to force a reconnection. We have discussed a couple of potential solutions to this problem:
1. Let the client be smarter managing the connection/hostname mapping. When it detects that a hostname has changed, it should force a disconnect to ensure it connects to the right node.
2. We can modify the protocol to verify that the client has connected to the intended broker. For example, we can add a field to ApiVersions to indicate the intended broker ID. The broker receiving the request can return an error if its ID does not match that in the request.
Are there alternatives?
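The first option, smarter client-side connection management, could be sketched roughly as follows. This is illustrative only, not the actual NetworkClient code; the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remember the last advertised host for each broker
// ID and report an existing connection as stale when the advertised host
// changes under the same ID, so the client can force a disconnect.
class BrokerEndpointTracker {
    private final Map<Integer, String> lastKnownHost = new HashMap<>();

    // Returns true if a connection to this broker ID should be dropped
    // because the broker is now advertised at a different host.
    boolean updateAndCheckStale(int brokerId, String advertisedHost) {
        String previous = lastKnownHost.put(brokerId, advertisedHost);
        return previous != null && !previous.equals(advertisedHost);
    }
}
```

On each metadata update the client would run every advertised endpoint through such a tracker and close any connection flagged as stale before reconnecting.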
[jira] [Resolved] (KAFKA-15221) Potential race condition between requests from rebooted followers
[ https://issues.apache.org/jira/browse/KAFKA-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-15221. - Fix Version/s: (was: 3.5.2) Resolution: Fixed > Potential race condition between requests from rebooted followers > - > > Key: KAFKA-15221 > URL: https://issues.apache.org/jira/browse/KAFKA-15221 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.5.0 >Reporter: Calvin Liu >Assignee: Calvin Liu >Priority: Blocker > Fix For: 3.7.0 > > > When the leader processes the fetch request, it does not acquire locks when > updating the replica fetch state. Then there can be a race between the fetch > requests from a rebooted follower. > T0, broker 1 sends a fetch to broker 0 (leader). At the moment, broker 1 is > not in the ISR. > T1, broker 1 crashes. > T2, broker 1 is back online and receives a new broker epoch. Also, it sends a > new Fetch request. > T3, broker 0 receives the old fetch request and decides to expand the ISR. > T4, right before broker 0 starts to fill the AlterPartition request, the new > fetch request comes in and overwrites the fetch state. Then broker 0 uses the > new broker epoch on the AlterPartition request. > In this way, the AlterPartition request can get around KIP-903 and wrongly > update the ISR.
[jira] [Resolved] (KAFKA-14694) RPCProducerIdManager should not wait for a new block
[ https://issues.apache.org/jira/browse/KAFKA-14694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14694. - Fix Version/s: 3.6.0 Resolution: Fixed > RPCProducerIdManager should not wait for a new block > > > Key: KAFKA-14694 > URL: https://issues.apache.org/jira/browse/KAFKA-14694 > Project: Kafka > Issue Type: Bug >Reporter: Jeff Kim >Assignee: Jeff Kim >Priority: Major > Fix For: 3.6.0 > > > RPCProducerIdManager initiates an async request to the controller to grab a > block of producer IDs and then blocks waiting for a response from the > controller. > This is done in the request handler threads while holding a global lock. This > means that if many producers are requesting producer IDs and the controller > is slow to respond, many threads can get stuck waiting for the lock. > This may also be a deadlock concern under the following scenario: > if the controller has 1 request handler thread (1 chosen for simplicity) and > receives an InitProducerId request, it may deadlock. > Basically, any time the controller has N InitProducerId requests, where N >= > the number of request handler threads, there is the potential to deadlock. > Consider this: > 1. The request handler thread tries to handle an InitProducerId request to > the controller by forwarding an AllocateProducerIds request. > 2. The request handler thread then waits on the controller response (timed > poll on nextProducerIdBlock). > 3. The controller's request handler threads need to pick this request up and > handle it, but the controller's request handler threads are blocked waiting > for the forwarded AllocateProducerIds response. > > We should not block while waiting for a new block and instead return > immediately to free the request handler threads.
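The non-blocking behavior proposed in the last sentence can be sketched as follows. This is a toy model with hypothetical names, not the real RPCProducerIdManager: the request handler path returns immediately, and exhaustion of the cached block triggers a single async fetch rather than a wait:

```java
// Hypothetical sketch, not Kafka's RPCProducerIdManager: the request
// handler path never blocks waiting for the controller. If the cached
// block is exhausted, it triggers one async fetch and returns NO_ID,
// which the caller would map to a retriable error for the client.
class NonBlockingProducerIdManager {
    static final long NO_ID = -1L;

    private long nextId = 0L;
    private long endId = 0L;          // exclusive end of the cached block
    private boolean fetchInFlight = false;

    // Invoked from the async controller-response callback.
    synchronized void receiveBlock(long start, long endExclusive) {
        nextId = start;
        endId = endExclusive;
        fetchInFlight = false;
    }

    // Request-handler path: returns immediately, never waits.
    synchronized long nextProducerId() {
        if (nextId < endId) {
            return nextId++;
        }
        if (!fetchInFlight) {
            fetchInFlight = true;
            // a fire-and-forget AllocateProducerIds request would go here
        }
        return NO_ID;
    }
}
```

The caller would translate NO_ID into a retriable error so the client backs off and retries, instead of parking a request handler thread on the lock.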
[jira] [Created] (KAFKA-14831) Illegal state errors should be fatal in transactional producer
Jason Gustafson created KAFKA-14831: --- Summary: Illegal state errors should be fatal in transactional producer Key: KAFKA-14831 URL: https://issues.apache.org/jira/browse/KAFKA-14831 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson In KAFKA-14830, the producer hit an illegal state error. The error was propagated to the `Sender` thread and logged, but the producer otherwise continued on. It would be better to make illegal state errors fatal since continuing to write to transactions when the internal state is inconsistent may cause incorrect and unpredictable behavior.
[jira] [Created] (KAFKA-14830) Illegal state error in transactional producer
Jason Gustafson created KAFKA-14830: --- Summary: Illegal state error in transactional producer Key: KAFKA-14830 URL: https://issues.apache.org/jira/browse/KAFKA-14830 Project: Kafka Issue Type: Bug Affects Versions: 3.1.2 Reporter: Jason Gustafson We have seen the following illegal state error in the producer: {code:java} [Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topic-0:120027 ms has passed since batch creation [Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topic-1:120026 ms has passed since batch creation [Producer clientId=client-id2, transactionalId=transactional-id] Aborting incomplete transaction [Producer clientId=client-id2, transactionalId=transactional-id] Invoking InitProducerId with current producer ID and epoch ProducerIdAndEpoch(producerId=191799, epoch=0) in order to bump the epoch [Producer clientId=client-id2, transactionalId=transactional-id] ProducerId set to 191799 with epoch 1 [Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 4 [Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.TimeoutException: The request timed out. 
[Producer clientId=client-id2, transactionalId=transactional-id] Uncaught error in request completion: java.lang.IllegalStateException: TransactionalId transactional-id: Invalid transition attempted from state READY to state ABORTABLE_ERROR at org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:1089) at org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:508) at org.apache.kafka.clients.producer.internals.TransactionManager.maybeTransitionToErrorState(TransactionManager.java:734) at org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:739) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:753) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:743) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:695) at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:634) at org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:575) at java.base/java.util.ArrayList.forEach(ArrayList.java:1541) at org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:562) at java.base/java.lang.Iterable.forEach(Iterable.java:75) at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:562) at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836) at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:583) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:575) at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) at java.base/java.lang.Thread.run(Thread.java:829) {code} The producer hits 
timeouts which cause it to abort an active transaction. After aborting, the producer bumps its epoch, which transitions it back to the `READY` state. Following this, there are two errors for inflight requests, which cause an illegal state transition to `ABORTABLE_ERROR`. But how could the transaction ABORT complete if there were still inflight requests?
[jira] [Resolved] (KAFKA-14664) Raft idle ratio is inaccurate
[ https://issues.apache.org/jira/browse/KAFKA-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14664. - Resolution: Fixed > Raft idle ratio is inaccurate > - > > Key: KAFKA-14664 > URL: https://issues.apache.org/jira/browse/KAFKA-14664 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Fix For: 3.5.0 > > > The `poll-idle-ratio-avg` metric is intended to track how idle the raft IO > thread is. When completely idle, it should measure 1. When saturated, it > should measure 0. The problem with the current measurements is that they are > treated equally with respect to time. For example, say we poll twice with the > following durations: > Poll 1: 2s > Poll 2: 0s > Assume that the busy time is negligible, so 2s passes overall. > In the first measurement, 2s is spent waiting, so we compute and record a > ratio of 1.0. In the second measurement, no time passes, and we record 0.0. > The idle ratio is then computed as the average of these two values ((1.0 + > 0.0) / 2 = 0.5), which suggests that the process was busy for 1s, which > overestimates the true busy time. > Instead, we should sum up the time waiting over the full interval. 2s passes > total here and 2s is idle, so we should compute 1.0.
[jira] [Resolved] (KAFKA-6793) Unnecessary warning log message
[ https://issues.apache.org/jira/browse/KAFKA-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-6793. Fix Version/s: 3.5.0 Resolution: Fixed We resolved this by changing the logging to info level. It may still be useful in some cases, but most of the time the "warning" is spurious. > Unnecessary warning log message > > > Key: KAFKA-6793 > URL: https://issues.apache.org/jira/browse/KAFKA-6793 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.1.0 >Reporter: Anna O >Assignee: Philip Nee >Priority: Minor > Fix For: 3.5.0 > > > After upgrading KafkaStreams from 0.11.0.2 to 1.1.0, the following warning log > started to appear: > level: WARN > logger: org.apache.kafka.clients.consumer.ConsumerConfig > message: The configuration 'admin.retries' was supplied but isn't a known > config. > The config is not explicitly supplied to the streams.
[jira] [Resolved] (KAFKA-13972) Reassignment cancellation causes stray replicas
[ https://issues.apache.org/jira/browse/KAFKA-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13972. - Resolution: Fixed > Reassignment cancellation causes stray replicas > --- > > Key: KAFKA-13972 > URL: https://issues.apache.org/jira/browse/KAFKA-13972 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Fix For: 3.4.1 > > > A stray replica is one that is left behind on a broker after the partition > has been reassigned to other brokers or the partition has been deleted. One > case where we found this can happen is after a cancelled reassignment. When > a reassignment is cancelled, the controller sends `StopReplica` requests to > any of the adding replicas, but it does not necessarily bump the leader > epoch. Following > [KIP-570|https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest], > brokers will ignore `StopReplica` requests if the leader epoch matches the > current partition leader epoch. So we need to bump the epoch whenever we need > to ensure that `StopReplica` will be received.
[jira] [Created] (KAFKA-14672) Producer queue time does not reflect batches expired in the accumulator
Jason Gustafson created KAFKA-14672: --- Summary: Producer queue time does not reflect batches expired in the accumulator Key: KAFKA-14672 URL: https://issues.apache.org/jira/browse/KAFKA-14672 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson The producer exposes two metrics for the time a record has spent in the accumulator waiting to be drained: * `record-queue-time-avg` * `record-queue-time-max` These metrics are only updated when a batch is drained for sending to the broker. It is also possible for a batch to be expired before it can be drained, but in this case, the metrics are not updated. This seems surprising and makes the queue time misleading. The only metric I could find that does reflect batch expirations in the accumulator is the generic `record-error-rate`. It would make sense to let the queue-time metrics record the time spent in the queue regardless of the outcome of the record send attempt.
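The suggested fix amounts to recording queue time on every batch outcome rather than only on drain. A minimal sketch, with hypothetical names rather than the producer's actual Sensor plumbing:

```java
// Sketch of the proposed metric behavior (hypothetical names): record the
// time a batch spent in the accumulator on every outcome, drained or
// expired, so record-queue-time-avg/max reflect expired batches too.
class QueueTimeRecorder {
    private double sumMs = 0.0;
    private long count = 0;
    private double maxMs = 0.0;

    // Call this both when a batch is drained and when it expires in the queue.
    void recordQueueTime(long createdMs, long completedMs) {
        double queuedMs = completedMs - createdMs;
        sumMs += queuedMs;
        count++;
        maxMs = Math.max(maxMs, queuedMs);
    }

    double avg() { return count == 0 ? Double.NaN : sumMs / count; }
    double max() { return maxMs; }
}
```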
[jira] [Created] (KAFKA-14664) Raft idle ratio is inaccurate
Jason Gustafson created KAFKA-14664: --- Summary: Raft idle ratio is inaccurate Key: KAFKA-14664 URL: https://issues.apache.org/jira/browse/KAFKA-14664 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Assignee: Jason Gustafson The `poll-idle-ratio-avg` metric is intended to track how idle the raft IO thread is. When completely idle, it should measure 1. When saturated, it should measure 0. The problem with the current measurements is that they are treated equally with respect to time. For example, say we poll twice with the following durations: Poll 1: 2s Poll 2: 0s Assume that the busy time is negligible, so 2s passes overall. In the first measurement, 2s is spent waiting, so we compute and record a ratio of 1.0. In the second measurement, no time passes, and we record 0.0. The idle ratio is then computed as the average of these two values ((1.0 + 0.0) / 2 = 0.5), which suggests that the process was busy for 1s. Instead, we should sum up the time waiting over the full interval. 2s passes total here and 2s is idle, so we should compute 1.0.
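The arithmetic in this report can be made concrete with a small illustration (not Kafka's metrics code) contrasting the equal-weight average with the time-weighted ratio:

```java
// Illustrative arithmetic only. Two polls: poll 1 waits 2s (of 2s
// elapsed), poll 2 waits 0s (of 0s elapsed).
class IdleRatioExample {

    // Buggy scheme: each poll records a ratio, and the ratios are
    // averaged with equal weight regardless of how long each poll took.
    static double naiveRatio(double[] idleSecs, double[] elapsedSecs) {
        double sum = 0.0;
        for (int i = 0; i < idleSecs.length; i++) {
            // A zero-length poll still records a ratio of 0.0.
            sum += elapsedSecs[i] > 0 ? idleSecs[i] / elapsedSecs[i] : 0.0;
        }
        return sum / idleSecs.length;
    }

    // Proposed scheme: total idle time divided by total elapsed time.
    static double timeWeightedRatio(double[] idleSecs, double[] elapsedSecs) {
        double idle = 0.0, elapsed = 0.0;
        for (int i = 0; i < idleSecs.length; i++) {
            idle += idleSecs[i];
            elapsed += elapsedSecs[i];
        }
        return elapsed == 0 ? 1.0 : idle / elapsed;
    }
}
```

With the report's numbers the naive average reports 0.5 while the time-weighted ratio reports 1.0, matching the analysis above.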
[jira] [Resolved] (KAFKA-14644) Process should stop after failure in raft IO thread
[ https://issues.apache.org/jira/browse/KAFKA-14644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14644. - Fix Version/s: 3.5.0 Resolution: Fixed > Process should stop after failure in raft IO thread > --- > > Key: KAFKA-14644 > URL: https://issues.apache.org/jira/browse/KAFKA-14644 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Major > Fix For: 3.5.0 > > > We have seen a few cases where an unexpected error in the Raft IO thread > causes the process to enter a zombie state where it is no longer > participating in the raft quorum. In this state, a controller can no longer > become leader or help in elections, and brokers can no longer update > metadata. It may be better to stop the process in this case since there is no > way to recover.
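One way to realize "stop the process" is an uncaught-exception handler on the raft IO thread. In this sketch the fatal action is injected so the behavior is testable; a real implementation would presumably halt the JVM (for example via Kafka's Exit utility), which is an assumption here, not the actual fix:

```java
// Sketch: attach an uncaught-exception handler to the raft IO thread so
// an unexpected error triggers a fatal action instead of silently leaving
// a zombie process. The fatal action is injected for testability.
class FaultHandlingThread {
    static Thread create(Runnable body, Runnable onFatalError) {
        Thread t = new Thread(body, "raft-io-thread");
        t.setUncaughtExceptionHandler((thread, error) -> onFatalError.run());
        return t;
    }
}
```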
[jira] [Created] (KAFKA-14648) Do not fail clients if bootstrap servers is not immediately resolvable
Jason Gustafson created KAFKA-14648: --- Summary: Do not fail clients if bootstrap servers is not immediately resolvable Key: KAFKA-14648 URL: https://issues.apache.org/jira/browse/KAFKA-14648 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson In dynamic environments, such as system tests, there is sometimes a delay between when a client is initialized and when the configured bootstrap servers become available in DNS. Currently clients will fail immediately if none of the bootstrap servers can resolve. It would be more convenient for these environments to provide a grace period to give more time for initialization.
[jira] [Created] (KAFKA-14644) Process should stop after failure in raft IO thread
Jason Gustafson created KAFKA-14644: --- Summary: Process should stop after failure in raft IO thread Key: KAFKA-14644 URL: https://issues.apache.org/jira/browse/KAFKA-14644 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson We have seen a few cases where an unexpected error in the Raft IO thread causes the process to enter a zombie state where it is no longer participating in the raft quorum. In this state, a controller can no longer become leader or help in elections, and brokers can no longer update metadata. It may be better to stop the process in this case since there is no way to recover.
[jira] [Resolved] (KAFKA-14612) Topic config records written to log even when topic creation fails
[ https://issues.apache.org/jira/browse/KAFKA-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14612. - Fix Version/s: 3.4.0 Resolution: Fixed > Topic config records written to log even when topic creation fails > -- > > Key: KAFKA-14612 > URL: https://issues.apache.org/jira/browse/KAFKA-14612 > Project: Kafka > Issue Type: Bug > Components: kraft >Reporter: Jason Gustafson >Assignee: Andrew Grant >Priority: Major > Fix For: 3.4.0 > > > Config records are added when handling a `CreateTopics` request here: > [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L549]. > If the subsequent validations fail and the topic is not created, these > records will still be written to the log.
[jira] [Created] (KAFKA-14618) Off by one error in generated snapshot IDs causes misaligned fetching
Jason Gustafson created KAFKA-14618: --- Summary: Off by one error in generated snapshot IDs causes misaligned fetching Key: KAFKA-14618 URL: https://issues.apache.org/jira/browse/KAFKA-14618 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Assignee: Jason Gustafson Fix For: 3.4.0 We implemented new snapshot generation logic here: [https://github.com/apache/kafka/pull/12983]. A few days prior to this patch getting merged, we had changed the `RaftClient` API to pass the _exclusive_ offset when generating snapshots instead of the inclusive offset: [https://github.com/apache/kafka/pull/12981]. Unfortunately, the new snapshot generation logic was not updated accordingly. The consequence of this is that the state on replicas can get out of sync. In the best case, the followers fail replication because the offset after loading a snapshot is no longer aligned on a batch boundary.
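The misalignment can be illustrated with simple batch-boundary arithmetic (illustrative numbers only): if batches hold 3 records starting at offset 0, the batch covering offsets 6 through 8 has exclusive end offset 9. A snapshot generated at the inclusive offset 8 leaves the follower's next fetch offset in the middle of a batch:

```java
// Illustrative only: with fixed-size batches of batchSize records
// starting at offset 0, a fetch offset is batch-aligned iff it is a
// multiple of the batch size. The exclusive end offset of a batch is
// aligned; using the inclusive end offset is off by one.
class SnapshotOffsetExample {
    static boolean alignedOnBatchBoundary(long nextFetchOffset, int batchSize) {
        return nextFetchOffset % batchSize == 0;
    }
}
```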
[jira] [Created] (KAFKA-14612) Topic config records written to log even when topic creation fails
Jason Gustafson created KAFKA-14612: --- Summary: Topic config records written to log even when topic creation fails Key: KAFKA-14612 URL: https://issues.apache.org/jira/browse/KAFKA-14612 Project: Kafka Issue Type: Bug Components: kraft Reporter: Jason Gustafson Config records are added when handling a `CreateTopics` request here: [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L549]. If the subsequent validations fail and the topic is not created, these records will still be written to the log.
[jira] [Resolved] (KAFKA-14417) Producer doesn't handle REQUEST_TIMED_OUT for InitProducerIdRequest, treats as fatal error
[ https://issues.apache.org/jira/browse/KAFKA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14417. - Fix Version/s: 4.0.0 3.3.2 Resolution: Fixed > Producer doesn't handle REQUEST_TIMED_OUT for InitProducerIdRequest, treats > as fatal error > -- > > Key: KAFKA-14417 > URL: https://issues.apache.org/jira/browse/KAFKA-14417 > Project: Kafka > Issue Type: Task >Affects Versions: 3.1.0, 3.0.0, 3.2.0, 3.3.0, 3.4.0 >Reporter: Justine Olshan >Assignee: Justine Olshan >Priority: Blocker > Fix For: 4.0.0, 3.3.2 > > > In TransactionManager we have a handler for InitProducerIdRequests > [https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#LL1276C14-L1276C14] > However, we have the potential to return a REQUEST_TIMED_OUT error in > RPCProducerIdManager when the BrokerToControllerChannel manager times out: > [https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L236] > > or when the poll returns null: > [https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L170] > Since REQUEST_TIMED_OUT is not handled by the producer, we treat it as a > fatal error and the producer fails. With the default of idempotent producers, > this can cause more issues. > See this stack trace from 3.0: > {code:java} > ERROR [Producer clientId=console-producer] Aborting producer batches due to > fatal error (org.apache.kafka.clients.producer.internals.Sender) > org.apache.kafka.common.KafkaException: Unexpected error in > InitProducerIdResponse; The request timed out. 
> at > org.apache.kafka.clients.producer.internals.TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager.java:1390) > at > org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1294) > at > org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) > at > org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:658) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:650) > at > org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:418) > at > org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:316) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:256) > at java.lang.Thread.run(Thread.java:748) > {code} > It seems like the commit that introduced the changes was this one: > [https://github.com/apache/kafka/commit/72d108274c98dca44514007254552481c731c958] > so we are vulnerable when the server code is IBP 3.0 and beyond.
[jira] [Created] (KAFKA-14397) Idempotent producer may bump epoch and reset sequence numbers prematurely
Jason Gustafson created KAFKA-14397: --- Summary: Idempotent producer may bump epoch and reset sequence numbers prematurely Key: KAFKA-14397 URL: https://issues.apache.org/jira/browse/KAFKA-14397 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Assignee: Jason Gustafson Suppose that idempotence is enabled in the producer and we send the following single-record batches to a partition leader: * A: epoch=0, seq=0 * B: epoch=0, seq=1 * C: epoch=0, seq=2 The partition leader receives all 3 of these batches and commits them to the log. However, the connection is lost before the `Produce` responses are received by the client. Subsequent retries by the producer all fail to be delivered. It is possible in this scenario for the first batch `A` to reach the delivery timeout before the subsequent batches. This triggers the following check: [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L642]. Depending on whether retries are exhausted, we may adjust sequence numbers. The intuition behind this check is that if retries have not been exhausted, then we saw a fatal error and the batch could not have been written to the log. Hence we should bump the epoch and adjust the sequence numbers of the pending batches since they are presumed to be doomed to failure. So in this case, batches B and C might get reset with the bumped epoch: * B: epoch=1, seq=0 * C: epoch=1, seq=1 This can result in duplicate records in the log. The root of the issue is that this logic does not account for expiration of the delivery timeout. When the delivery timeout is reached, the number of retries is still likely much lower than the max allowed number of retries (which is `Int.MaxValue` by default).
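The duplication mechanism can be modeled in a few lines. This is a toy model of broker-side idempotence, not the real log layer: deduplication is keyed on (epoch, sequence), so batches retried under a bumped epoch with reset sequence numbers are not recognized as duplicates:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: the "broker" commits a record unless it has already seen
// the same (epoch, sequence) pair. Bumping the epoch and resetting
// sequences makes retried records look new, duplicating their payloads.
class EpochBumpDuplication {
    private final Set<String> seen = new HashSet<>();
    final List<String> log = new ArrayList<>();

    void append(int epoch, int seq, String payload) {
        if (seen.add(epoch + ":" + seq)) {
            log.add(payload); // new (epoch, seq): committed
        }                     // otherwise dropped as a duplicate
    }
}
```

Replaying the scenario above: retries of B and C at epoch 0 are deduplicated, but after the premature epoch bump their retries at epoch 1 with seq 0 and 1 are committed a second time.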
[jira] [Resolved] (KAFKA-13964) kafka-configs.sh end with UnsupportedVersionException when describing TLS user with quotas
[ https://issues.apache.org/jira/browse/KAFKA-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13964. - Resolution: Duplicate Thanks for reporting the issue. This will be resolved by https://issues.apache.org/jira/browse/KAFKA-14084. > kafka-configs.sh end with UnsupportedVersionException when describing TLS > user with quotas > --- > > Key: KAFKA-13964 > URL: https://issues.apache.org/jira/browse/KAFKA-13964 > Project: Kafka > Issue Type: Bug > Components: admin, kraft >Affects Versions: 3.2.0 > Environment: Kafka 3.2.0 running on OpenShift 4.10 in KRaft mode > managed by Strimzi >Reporter: Jakub Stejskal >Priority: Minor > > Usage of kafka-configs.sh ends with > org.apache.kafka.common.errors.UnsupportedVersionException: > The broker does not support DESCRIBE_USER_SCRAM_CREDENTIALS when describing a > TLS user with quotas enabled. > > {code:java} > bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe --user > CN=encrypted-arnost got status code 1 and stderr: -- Error while > executing config command with args '--bootstrap-server localhost:9092 > --describe --user CN=encrypted-arnost' > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.UnsupportedVersionException: The broker does > not support DESCRIBE_USER_SCRAM_CREDENTIALS{code} > STDOUT contains all necessary data, but the script itself ends with return > code 1 and the error above. Scram-sha has not been configured anywhere in > that case (not supported by KRaft). This might be fixed by adding support for > scram-sha in the next version (not reproducible without KRaft enabled).
[jira] [Resolved] (KAFKA-14316) NoSuchElementException in feature control iterator
[ https://issues.apache.org/jira/browse/KAFKA-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14316. - Fix Version/s: 3.4.0 3.3.2 Resolution: Fixed > NoSuchElementException in feature control iterator > -- > > Key: KAFKA-14316 > URL: https://issues.apache.org/jira/browse/KAFKA-14316 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Fix For: 3.4.0, 3.3.2 > > > We noticed this exception during testing: > {code:java} > java.util.NoSuchElementException > at > org.apache.kafka.timeline.SnapshottableHashTable$HistoricalIterator.next(SnapshottableHashTable.java:276) > at > org.apache.kafka.timeline.SnapshottableHashTable$HistoricalIterator.next(SnapshottableHashTable.java:189) > at > org.apache.kafka.timeline.TimelineHashMap$EntryIterator.next(TimelineHashMap.java:360) > at > org.apache.kafka.timeline.TimelineHashMap$EntryIterator.next(TimelineHashMap.java:346) > at > org.apache.kafka.controller.FeatureControlManager$FeatureControlIterator.next(FeatureControlManager.java:375) > at > org.apache.kafka.controller.FeatureControlManager$FeatureControlIterator.next(FeatureControlManager.java:347) > at > org.apache.kafka.controller.SnapshotGenerator.generateBatch(SnapshotGenerator.java:109) > at > org.apache.kafka.controller.SnapshotGenerator.generateBatches(SnapshotGenerator.java:126) > at > org.apache.kafka.controller.QuorumController$SnapshotGeneratorManager.run(QuorumController.java:637) > at > org.apache.kafka.controller.QuorumController$ControlEvent.run(QuorumController.java:539) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173) > at java.base/java.lang.Thread.run(Thread.java:833) > 
at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:64) > {code} > The iterator `FeatureControlIterator.hasNext()` checks two conditions: 1) > whether we have already written the metadata version, and 2) whether the > underlying iterator has additional records. However, in `next()`, we also > check that the metadata version is at least high enough to include it in the > log. When this fails, we can see an unexpected `NoSuchElementException` > if the underlying iterator is empty. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14319) Storage tool format command does not work with old metadata versions
Jason Gustafson created KAFKA-14319: --- Summary: Storage tool format command does not work with old metadata versions Key: KAFKA-14319 URL: https://issues.apache.org/jira/browse/KAFKA-14319 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson When using the format tool with older metadata versions, we see the following error: {code:java} $ bin/kafka-storage.sh format --cluster-id vVlhxM7VT3C-3nz7yEkiCQ --config config/kraft/server.properties --release-version "3.1-IV0" Exception in thread "main" java.lang.RuntimeException: Bootstrap metadata versions before 3.3-IV0 are not supported. Can't load metadata from format command at org.apache.kafka.metadata.bootstrap.BootstrapMetadata.<init>(BootstrapMetadata.java:83) at org.apache.kafka.metadata.bootstrap.BootstrapMetadata.fromVersion(BootstrapMetadata.java:48) at kafka.tools.StorageTool$.$anonfun$formatCommand$2(StorageTool.scala:265) at kafka.tools.StorageTool$.$anonfun$formatCommand$2$adapted(StorageTool.scala:254) at scala.collection.immutable.List.foreach(List.scala:333) at kafka.tools.StorageTool$.formatCommand(StorageTool.scala:254) at kafka.tools.StorageTool$.main(StorageTool.scala:61) at kafka.tools.StorageTool.main(StorageTool.scala) {code} For versions prior to `3.3-IV0`, we should skip creation of the `bootstrap.checkpoint` file instead of failing. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14316) NoSuchElementException in feature control iterator
Jason Gustafson created KAFKA-14316: --- Summary: NoSuchElementException in feature control iterator Key: KAFKA-14316 URL: https://issues.apache.org/jira/browse/KAFKA-14316 Project: Kafka Issue Type: Bug Affects Versions: 3.3.0 Reporter: Jason Gustafson Assignee: Jason Gustafson We noticed this exception during testing: {code:java} java.util.NoSuchElementException at org.apache.kafka.timeline.SnapshottableHashTable$HistoricalIterator.next(SnapshottableHashTable.java:276) at org.apache.kafka.timeline.SnapshottableHashTable$HistoricalIterator.next(SnapshottableHashTable.java:189) at org.apache.kafka.timeline.TimelineHashMap$EntryIterator.next(TimelineHashMap.java:360) at org.apache.kafka.timeline.TimelineHashMap$EntryIterator.next(TimelineHashMap.java:346) at org.apache.kafka.controller.FeatureControlManager$FeatureControlIterator.next(FeatureControlManager.java:375) at org.apache.kafka.controller.FeatureControlManager$FeatureControlIterator.next(FeatureControlManager.java:347) at org.apache.kafka.controller.SnapshotGenerator.generateBatch(SnapshotGenerator.java:109) at org.apache.kafka.controller.SnapshotGenerator.generateBatches(SnapshotGenerator.java:126) at org.apache.kafka.controller.QuorumController$SnapshotGeneratorManager.run(QuorumController.java:637) at org.apache.kafka.controller.QuorumController$ControlEvent.run(QuorumController.java:539) at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121) at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200) at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173) at java.base/java.lang.Thread.run(Thread.java:833) at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:64) {code} The iterator `FeatureControlIterator.hasNext()` checks two conditions: 1) whether we have already written the metadata version, and 2) whether the underlying iterator has additional records. 
However, in `next()`, we also check that the metadata version is at least high enough to include it in the log. When this check fails and the underlying iterator is empty, the caller sees an unexpected `NoSuchElementException` even though `hasNext()` returned true. -- This message was sent by Atlassian Jira (v8.20.10#820010)
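The invariant violation can be reduced to a few lines. The sketch below is not the Kafka iterator; it only mirrors the shape of the bug with invented names: `hasNext()` promises an element based on a side condition that `next()` may refuse to emit.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Minimal illustration (not the Kafka FeatureControlIterator) of the
// hasNext()/next() mismatch described above. hasNext() claims a "header"
// element is still pending, but next() applies an extra suppression check
// that hasNext() never considered, so a well-behaved caller can still hit
// NoSuchElementException.
public class FilteredIteratorSketch implements Iterator<String> {
    private final Iterator<String> underlying;
    private final boolean headerEnabled; // stand-in for the metadata-version check
    private boolean wroteHeader = false;

    public FilteredIteratorSketch(List<String> records, boolean headerEnabled) {
        this.underlying = records.iterator();
        this.headerEnabled = headerEnabled;
    }

    @Override
    public boolean hasNext() {
        // BUG: reports a pending header element even when headerEnabled is
        // false, i.e. a condition next() checks but hasNext() does not.
        return !wroteHeader || underlying.hasNext();
    }

    @Override
    public String next() {
        if (!wroteHeader) {
            wroteHeader = true;
            if (headerEnabled) {
                return "header";
            }
            // Header suppressed: fall through to the underlying iterator.
        }
        if (!underlying.hasNext()) {
            // Reachable even though hasNext() just returned true.
            throw new NoSuchElementException();
        }
        return underlying.next();
    }

    public static void main(String[] args) {
        Iterator<String> it = new FilteredIteratorSketch(List.of(), false);
        System.out.println(it.hasNext()); // true, yet next() will throw
    }
}
```

The general fix is to make `hasNext()` evaluate exactly the same conditions as `next()` (for example by pre-computing whether the extra element will actually be emitted), so the two methods can never disagree.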
[jira] [Resolved] (KAFKA-14296) Partition leaders are not demoted during kraft controlled shutdown
[ https://issues.apache.org/jira/browse/KAFKA-14296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14296. - Fix Version/s: 3.4.0 3.3.2 Resolution: Fixed > Partition leaders are not demoted during kraft controlled shutdown > -- > > Key: KAFKA-14296 > URL: https://issues.apache.org/jira/browse/KAFKA-14296 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.3.1 >Reporter: David Jacot >Assignee: David Jacot >Priority: Major > Fix For: 3.4.0, 3.3.2 > > > When the BrokerServer starts its shutdown process, it transitions to > SHUTTING_DOWN and sets isShuttingDown to true. With this state change, the > follower state changes are short-circuited. This means that a broker which was > serving as leader would remain acting as a leader until controlled shutdown > completes. Instead, we want the leader and ISR state to be updated so that > requests will return NOT_LEADER and the client can find the new leader. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14292) KRaft broker controlled shutdown can be delayed indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14292. - Fix Version/s: 3.4.0 3.3.2 Resolution: Fixed > KRaft broker controlled shutdown can be delayed indefinitely > > > Key: KAFKA-14292 > URL: https://issues.apache.org/jira/browse/KAFKA-14292 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Alyssa Huang >Priority: Major > Fix For: 3.4.0, 3.3.2 > > > We noticed when rolling a kraft cluster that it took an unexpectedly long > time for one of the brokers to shut down. In the logs, we saw the following: > {code:java} > Oct 11, 2022 @ 17:53:38.277 [Controller 1] The request from broker 8 to > shut down can not yet be granted because the lowest active offset 2283357 is > not greater than the broker's shutdown offset 2283358. > org.apache.kafka.controller.BrokerHeartbeatManager DEBUG > Oct 11, 2022 @ 17:53:38.277 [Controller 1] Updated the controlled shutdown > offset for broker 8 to 2283362. > org.apache.kafka.controller.BrokerHeartbeatManager DEBUG > Oct 11, 2022 @ 17:53:40.278 [Controller 1] Updated the controlled shutdown > offset for broker 8 to 2283366. > org.apache.kafka.controller.BrokerHeartbeatManager DEBUG > Oct 11, 2022 @ 17:53:40.278 [Controller 1] The request from broker 8 to > shut down can not yet be granted because the lowest active offset 2283361 is > not greater than the broker's shutdown offset 2283362. > org.apache.kafka.controller.BrokerHeartbeatManager DEBUG > Oct 11, 2022 @ 17:53:42.279 [Controller 1] The request from broker 8 to > shut down can not yet be granted because the lowest active offset 2283365 is > not greater than the broker's shutdown offset 2283366. > org.apache.kafka.controller.BrokerHeartbeatManager DEBUG > Oct 11, 2022 @ 17:53:42.279 [Controller 1] Updated the controlled shutdown > offset for broker 8 to 2283370. 
> org.apache.kafka.controller.BrokerHeartbeatManager DEBUG > Oct 11, 2022 @ 17:53:44.280 [Controller 1] The request from broker 8 to > shut down can not yet be granted because the lowest active offset 2283369 is > not greater than the broker's shutdown offset 2283370. > org.apache.kafka.controller.BrokerHeartbeatManager DEBUG > Oct 11, 2022 @ 17:53:44.281 [Controller 1] Updated the controlled shutdown > offset for broker 8 to 2283374. > org.apache.kafka.controller.BrokerHeartbeatManager DEBUG{code} > From what I can tell, it looks like the controller waits until all brokers > have caught up to the {{controlledShutdownOffset}} of the broker that is > shutting down before allowing it to proceed. Probably the intent is to make > sure they have all the leader and ISR state. > The problem is that the {{controlledShutdownOffset}} seems to be updated > after every heartbeat that the controller receives: > https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L1996. > Unless all other brokers can catch up to that offset before the next > heartbeat from the shutting down broker is received, the broker remains > in the shutting down state indefinitely. > In this case, it took more than 40 minutes before the broker completed > shutdown: > {code:java} > Oct 11, 2022 @ 18:36:36.105 [Controller 1] The request from broker 8 to > shut down has been granted since the lowest active offset 2288510 is now > greater than the broker's controlled shutdown offset 2288510. > org.apache.kafka.controller.BrokerHeartbeatManager INFO > Oct 11, 2022 @ 18:40:35.197 [Controller 1] The request from broker 8 to > unfence has been granted because it has caught up with the offset of its > register broker record 2288906. > org.apache.kafka.controller.BrokerHeartbeatManager INFO{code} > It seems like the bug here is that we should not keep updating > {{controlledShutdownOffset}} if it has already been set. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14292) KRaft broker controlled shutdown can be delayed indefinitely
Jason Gustafson created KAFKA-14292: --- Summary: KRaft broker controlled shutdown can be delayed indefinitely Key: KAFKA-14292 URL: https://issues.apache.org/jira/browse/KAFKA-14292 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Assignee: Alyssa Huang We noticed when rolling a kraft cluster that it took an unexpectedly long time for one of the brokers to shut down. In the logs, we saw the following: {code:java} Oct 11, 2022 @ 17:53:38.277 [Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283357 is not greater than the broker's shutdown offset 2283358. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG Oct 11, 2022 @ 17:53:38.277 [Controller 1] Updated the controlled shutdown offset for broker 8 to 2283362. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG Oct 11, 2022 @ 17:53:40.278 [Controller 1] Updated the controlled shutdown offset for broker 8 to 2283366. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG Oct 11, 2022 @ 17:53:40.278 [Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283361 is not greater than the broker's shutdown offset 2283362. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG Oct 11, 2022 @ 17:53:42.279 [Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283365 is not greater than the broker's shutdown offset 2283366. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG Oct 11, 2022 @ 17:53:42.279 [Controller 1] Updated the controlled shutdown offset for broker 8 to 2283370. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG Oct 11, 2022 @ 17:53:44.280 [Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283369 is not greater than the broker's shutdown offset 2283370. 
org.apache.kafka.controller.BrokerHeartbeatManager DEBUG Oct 11, 2022 @ 17:53:44.281 [Controller 1] Updated the controlled shutdown offset for broker 8 to 2283374. org.apache.kafka.controller.BrokerHeartbeatManager DEBUG{code} From what I can tell, it looks like the controller waits until all brokers have caught up to the {{controlledShutdownOffset}} of the broker that is shutting down before allowing it to proceed. Probably the intent is to make sure they have all the leader and ISR state. The problem is that the {{controlledShutdownOffset}} seems to be updated after every heartbeat that the controller receives: https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L1996. Unless all other brokers can catch up to that offset before the next heartbeat from the shutting down broker is received, the broker remains in the shutting down state indefinitely. In this case, it took more than 40 minutes before the broker completed shutdown: {code:java} Oct 11, 2022 @ 18:36:36.105 [Controller 1] The request from broker 8 to shut down has been granted since the lowest active offset 2288510 is now greater than the broker's controlled shutdown offset 2288510. org.apache.kafka.controller.BrokerHeartbeatManager INFO Oct 11, 2022 @ 18:40:35.197 [Controller 1] The request from broker 8 to unfence has been granted because it has caught up with the offset of its register broker record 2288906. org.apache.kafka.controller.BrokerHeartbeatManager INFO{code} It seems like the bug here is that we should not keep updating {{controlledShutdownOffset}} if it has already been set. -- This message was sent by Atlassian Jira (v8.20.10#820010)
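The fix proposed above — stop moving the target once it is set — can be sketched as follows. The names are illustrative, not the actual `BrokerHeartbeatManager` API:

```java
import java.util.OptionalLong;

// Sketch of the suggested fix (illustrative names, not the Kafka code):
// record the controlled shutdown offset only once, on the first heartbeat
// from the shutting-down broker, so later heartbeats cannot keep pushing
// the target offset ahead of the other brokers.
public class ShutdownOffsetSketch {
    private OptionalLong controlledShutdownOffset = OptionalLong.empty();

    // Called on each heartbeat from a broker that is shutting down.
    public void maybeUpdateControlledShutdownOffset(long nextWriteOffset) {
        if (controlledShutdownOffset.isEmpty()) {
            controlledShutdownOffset = OptionalLong.of(nextWriteOffset);
        }
        // Otherwise keep the original offset: the shutdown can be granted
        // as soon as every other broker has replicated past it.
    }

    public OptionalLong controlledShutdownOffset() {
        return controlledShutdownOffset;
    }

    public static void main(String[] args) {
        ShutdownOffsetSketch m = new ShutdownOffsetSketch();
        m.maybeUpdateControlledShutdownOffset(2283358L);
        m.maybeUpdateControlledShutdownOffset(2283362L); // ignored: already set
        System.out.println(m.controlledShutdownOffset());
    }
}
```

With a fixed target, the race seen in the logs (the lowest active offset perpetually trailing a freshly bumped shutdown offset) cannot recur.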
[jira] [Resolved] (KAFKA-14247) Implement EventHandler interface and DefaultEventHandler
[ https://issues.apache.org/jira/browse/KAFKA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14247. - Resolution: Fixed > Implement EventHandler interface and DefaultEventHandler > > > Key: KAFKA-14247 > URL: https://issues.apache.org/jira/browse/KAFKA-14247 > Project: Kafka > Issue Type: Sub-task > Components: consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > > The polling thread uses events to communicate with the background thread. > The events sent to the background thread are the {_}Requests{_}, and the > events sent from the background thread to the polling thread are the > {_}Responses{_}. > > Here we have an EventHandler interface and DefaultEventHandler > implementation. The implementation uses two blocking queues to send events > both ways. The two methods, add and poll, allow the client, i.e., the > polling thread, to retrieve and add events to the handler. > > PR: https://github.com/apache/kafka/pull/12663 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14236) ListGroups request produces too much Denied logs in authorizer
[ https://issues.apache.org/jira/browse/KAFKA-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14236. - Fix Version/s: 3.4.0 Resolution: Fixed > ListGroups request produces too much Denied logs in authorizer > -- > > Key: KAFKA-14236 > URL: https://issues.apache.org/jira/browse/KAFKA-14236 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.0.0 >Reporter: Alexandre GRIFFAUT >Priority: Major > Labels: patch-available > Fix For: 3.4.0 > > > Context > On a multi-tenant secured cluster, with many consumers, a call to ListGroups > api will log an authorization error for each consumer group of other tenants. > Reason > The handleListGroupsRequest function first tries to authorize a DESCRIBE > CLUSTER, and if it fails it will then try to authorize a DESCRIBE GROUP on > each consumer group. > Fix > In that case neither the DESCRIBE CLUSTER, nor the DESCRIBE GROUP of other > tenants, was intended, and this should be specified in the Action using > logIfDenied: false -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14240) Ensure kraft metadata log dir is initialized with valid snapshot state
[ https://issues.apache.org/jira/browse/KAFKA-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14240. - Resolution: Fixed > Ensure kraft metadata log dir is initialized with valid snapshot state > -- > > Key: KAFKA-14240 > URL: https://issues.apache.org/jira/browse/KAFKA-14240 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Fix For: 3.3.0 > > > If the first segment under __cluster_metadata has a base offset greater than > 0, then there must exist at least one snapshot which has a larger offset than > whatever the first segment starts at. We should check for this at startup to > prevent the controller from initialization with invalid state. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot
[ https://issues.apache.org/jira/browse/KAFKA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14238. - Resolution: Fixed > KRaft replicas can delete segments not included in a snapshot > - > > Key: KAFKA-14238 > URL: https://issues.apache.org/jira/browse/KAFKA-14238 > Project: Kafka > Issue Type: Bug > Components: core, kraft >Reporter: Jose Armando Garcia Sancio >Assignee: Jose Armando Garcia Sancio >Priority: Blocker > Fix For: 3.3.0 > > > We see this in the log > {code:java} > Deleting segment LogSegment(baseOffset=243864, size=9269150, > lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) > due to retention time 60480ms breach based on the largest record > timestamp in the segment {code} > This then causes {{KafkaRaftClient}} to throw an exception when sending > batches to the listener: > {code:java} > java.lang.IllegalStateException: Snapshot expected since next offset of > org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 > is 0, log start offset is 369668 and high-watermark is 547379 > at > org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312) > at java.base/java.util.Optional.orElseThrow(Optional.java:403) > at > org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311) > at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165) > at > org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code} > The on-disk state for the cluster metadata partition confirms this: > {code:java} > ls __cluster_metadata-0/ > 00369668.index > 00369668.log > 00369668.timeindex > 00503411.index > 00503411.log > 00503411.snapshot > 00503411.timeindex > 00548746.snapshot > leader-epoch-checkpoint > partition.metadata > quorum-state{code} > Notice that there are no {{checkpoint}} files and the log doesn't have a > segment at base offset 0. 
> This is happening because the {{LogConfig}} used for KRaft sets the retention > policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete > old segments even if there is no snapshot for them. For KRaft, Kafka should > only delete segments that breach the log start offset. > Log configuration for KRaft: > {code:java} > val props = new Properties() > props.put(LogConfig.MaxMessageBytesProp, > config.maxBatchSizeInBytes.toString) > props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes)) > props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis)) > props.put(LogConfig.FileDeleteDelayMsProp, > Int.box(Defaults.FileDeleteDelayMs)) > LogConfig.validateValues(props) > val defaultLogConfig = LogConfig(props){code} > Segment deletion code: > {code:java} > def deleteOldSegments(): Int = { > if (config.delete) { > deleteLogStartOffsetBreachedSegments() + > deleteRetentionSizeBreachedSegments() + > deleteRetentionMsBreachedSegments() > } else { > deleteLogStartOffsetBreachedSegments() > } > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
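The deletion rule the issue argues for can be stated as a one-line predicate. This is an illustrative sketch with invented names, not the Kafka log layer: a metadata segment may only go away once the log start offset — which advances only when a snapshot covers those offsets — has moved past it, regardless of time- or size-based retention.

```java
// Sketch (illustrative names) of the KRaft metadata deletion rule
// described above: a segment spanning [baseOffset, endOffset] is safe to
// delete only when the log start offset has advanced past its end, which
// in turn only happens once a snapshot covers those offsets. Time and
// size retention must never apply to the metadata log.
public class MetadataRetentionSketch {
    public static boolean safeToDelete(long segmentEndOffset, long logStartOffset) {
        return segmentEndOffset < logStartOffset;
    }

    public static void main(String[] args) {
        // The segment ending just below 503411 must be kept while the log
        // start offset is 369668, even if it breaches time-based retention.
        System.out.println(safeToDelete(503410L, 369668L)); // false
    }
}
```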
[jira] [Created] (KAFKA-14240) Ensure kraft metadata log dir is initialized with valid snapshot state
Jason Gustafson created KAFKA-14240: --- Summary: Ensure kraft metadata log dir is initialized with valid snapshot state Key: KAFKA-14240 URL: https://issues.apache.org/jira/browse/KAFKA-14240 Project: Kafka Issue Type: Improvement Reporter: Jason Gustafson Assignee: Jason Gustafson If the first segment under __cluster_metadata has a base offset greater than 0, then there must exist at least one snapshot which has a larger offset than whatever the first segment starts at. We should check for this at startup to prevent the controller from initialization with invalid state. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14215) KRaft forwarded requests have no quota enforcement
[ https://issues.apache.org/jira/browse/KAFKA-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14215. - Resolution: Fixed > KRaft forwarded requests have no quota enforcement > -- > > Key: KAFKA-14215 > URL: https://issues.apache.org/jira/browse/KAFKA-14215 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.3.0, 3.3 >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Blocker > Fix For: 3.3.0, 3.3 > > > On the broker, the `BrokerMetadataPublisher` is responsible for propagating > quota changes from `ClientQuota` records to `ClientQuotaManager`. On the > controller, there is no similar logic, so no client quotas are enforced on > the controller. > On the broker side, there is no enforcement either, since the broker assumes > that the controller will be the one to do it. Basically it looks at the > throttle time returned in the response from the controller. If it is 0, then > the response is sent immediately without any throttling. > So the consequence of both of these issues is that controller-bound requests > have no throttling today. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14224) Consumer should auto-commit prior to cooperative partition revocation
Jason Gustafson created KAFKA-14224: --- Summary: Consumer should auto-commit prior to cooperative partition revocation Key: KAFKA-14224 URL: https://issues.apache.org/jira/browse/KAFKA-14224 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson With the old "eager" reassignment logic, we always revoked all partitions prior to each rebalance. When auto-commit is enabled, part of this process is committing the current position. Under the cooperative logic, we defer revocation until after the rebalance, which means we can continue fetching while the rebalance is in progress. However, when reviewing KAFKA-14196, we noticed that there is no similar logic to commit offsets prior to this deferred revocation. This means that cooperative consumption is more likely to lead to duplicate consumption even when there is no failure involved. -- This message was sent by Atlassian Jira (v8.20.10#820010)
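The missing step can be sketched as follows. This is an illustrative model, not the `ConsumerCoordinator` API; the names, the string partition keys, and the callback shape are all invented for the example. The idea is simply that, when auto-commit is on, the positions of the partitions being given up are committed before the revocation callback runs:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Sketch of the behavior the issue asks for (illustrative names, not the
// Kafka consumer internals): before revoking partitions under either the
// eager or the cooperative protocol, commit the current positions of the
// partitions being revoked so the next owner resumes without re-reading.
public class RevocationCommitSketch {
    public static void revokePartitions(Set<String> revoked,
                                        boolean autoCommitEnabled,
                                        Map<String, Long> positions,
                                        Consumer<Map<String, Long>> commitOffsets) {
        if (autoCommitEnabled && !revoked.isEmpty()) {
            // Commit only the positions of the partitions we are giving up.
            Map<String, Long> toCommit = new HashMap<>();
            for (String tp : revoked) {
                Long pos = positions.get(tp);
                if (pos != null) {
                    toCommit.put(tp, pos);
                }
            }
            commitOffsets.accept(toCommit);
        }
        // ... then invoke the rebalance listener's onPartitionsRevoked ...
    }
}
```

Under the cooperative protocol the `revoked` set is the incremental difference rather than the full assignment, but the commit-before-revoke ordering is the same.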
[jira] [Created] (KAFKA-14215) KRaft forwarded requests have no quota enforcement
Jason Gustafson created KAFKA-14215: --- Summary: KRaft forwarded requests have no quota enforcement Key: KAFKA-14215 URL: https://issues.apache.org/jira/browse/KAFKA-14215 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson On the broker, the `BrokerMetadataPublisher` is responsible for propagating quota changes from `ClientQuota` records to `ClientQuotaManager`. On the controller, there is no similar logic, so no client quotas are enforced on the controller. On the broker side, there is no enforcement either, since the broker assumes that the controller will be the one to do it. Basically it looks at the throttle time returned in the response from the controller. If it is 0, then the response is sent immediately without any throttling. So the consequence of both of these issues is that controller-bound requests have no throttling today. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14201) Consumer should not send group instance ID if committing with empty member ID
Jason Gustafson created KAFKA-14201: --- Summary: Consumer should not send group instance ID if committing with empty member ID Key: KAFKA-14201 URL: https://issues.apache.org/jira/browse/KAFKA-14201 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson The consumer group instance ID is used to support a notion of "static" consumer groups. The idea is to be able to identify the same group instance across restarts so that a rebalance is not needed. However, if the user sets `group.instance.id` in the consumer configuration, but uses "simple" assignment with `assign()`, then the instance ID nevertheless is sent in the OffsetCommit request to the coordinator. This may result in a surprising UNKNOWN_MEMBER_ID error. The consumer should probably be smart enough to only send the instance ID when committing as part of a consumer group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
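The guard the issue proposes is simple to express. The sketch below uses invented names, not the consumer's real request-building code: the instance ID is only attached when the consumer actually joined the group and therefore holds a non-empty member ID.

```java
import java.util.Optional;

// Sketch of the proposed guard (illustrative names): only attach
// group.instance.id to an OffsetCommit when the consumer has a real
// member id. A consumer using assign() never joins the group, so its
// member id stays empty; sending the instance id in that case triggers
// the surprising UNKNOWN_MEMBER_ID error on the coordinator.
public class OffsetCommitSketch {
    public static final String EMPTY_MEMBER_ID = "";

    public static Optional<String> instanceIdToSend(Optional<String> configuredInstanceId,
                                                    String memberId) {
        if (EMPTY_MEMBER_ID.equals(memberId)) {
            return Optional.empty(); // simple assignment: omit the instance id
        }
        return configuredInstanceId; // group membership: static id is meaningful
    }
}
```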
[jira] [Resolved] (KAFKA-14177) Correctly support older kraft versions without FeatureLevelRecord
[ https://issues.apache.org/jira/browse/KAFKA-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14177. - Resolution: Fixed > Correctly support older kraft versions without FeatureLevelRecord > - > > Key: KAFKA-14177 > URL: https://issues.apache.org/jira/browse/KAFKA-14177 > Project: Kafka > Issue Type: Bug >Reporter: Colin McCabe >Assignee: Colin McCabe >Priority: Blocker > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14183) Kraft bootstrap metadata file should use snapshot header/footer
Jason Gustafson created KAFKA-14183: --- Summary: Kraft bootstrap metadata file should use snapshot header/footer Key: KAFKA-14183 URL: https://issues.apache.org/jira/browse/KAFKA-14183 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Fix For: 3.3.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-13166) EOFException when Controller handles unknown API
[ https://issues.apache.org/jira/browse/KAFKA-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13166. - Resolution: Fixed > EOFException when Controller handles unknown API > > > Key: KAFKA-13166 > URL: https://issues.apache.org/jira/browse/KAFKA-13166 > Project: Kafka > Issue Type: Bug >Reporter: David Arthur >Assignee: David Arthur >Priority: Minor > Fix For: 3.3.0, 3.4.0 > > > When ControllerApis handles an unsupported RPC, it silently drops the request > due to an unhandled exception. > The following stack trace was manually printed since this exception was > suppressed on the controller. > {code} > java.util.NoSuchElementException: key not found: UpdateFeatures > at scala.collection.MapOps.default(Map.scala:274) > at scala.collection.MapOps.default$(Map.scala:273) > at scala.collection.AbstractMap.default(Map.scala:405) > at scala.collection.mutable.HashMap.apply(HashMap.scala:425) > at kafka.network.RequestChannel$Metrics.apply(RequestChannel.scala:74) > at > kafka.network.RequestChannel.$anonfun$updateErrorMetrics$1(RequestChannel.scala:458) > at > kafka.network.RequestChannel.$anonfun$updateErrorMetrics$1$adapted(RequestChannel.scala:457) > at > kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62) > at > scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry(JavaCollectionWrappers.scala:359) > at > scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry$(JavaCollectionWrappers.scala:355) > at > scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.foreachEntry(JavaCollectionWrappers.scala:309) > at > kafka.network.RequestChannel.updateErrorMetrics(RequestChannel.scala:457) > at kafka.network.RequestChannel.sendResponse(RequestChannel.scala:388) > at > kafka.server.RequestHandlerHelper.sendErrorOrCloseConnection(RequestHandlerHelper.scala:93) > at > 
kafka.server.RequestHandlerHelper.sendErrorResponseMaybeThrottle(RequestHandlerHelper.scala:121) > at > kafka.server.RequestHandlerHelper.handleError(RequestHandlerHelper.scala:78) > at kafka.server.ControllerApis.handle(ControllerApis.scala:116) > at > kafka.server.ControllerApis.$anonfun$handleEnvelopeRequest$1(ControllerApis.scala:125) > at > kafka.server.ControllerApis.$anonfun$handleEnvelopeRequest$1$adapted(ControllerApis.scala:125) > at > kafka.server.EnvelopeUtils$.handleEnvelopeRequest(EnvelopeUtils.scala:65) > at > kafka.server.ControllerApis.handleEnvelopeRequest(ControllerApis.scala:125) > at kafka.server.ControllerApis.handle(ControllerApis.scala:103) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) > at java.lang.Thread.run(Thread.java:748) > {code} > This is due to a bug in the metrics code in RequestChannel. > The result is that the request fails, but no indication is given that it was > due to an unsupported API on either the broker, controller, or client. -- This message was sent by Atlassian Jira (v8.20.10#820010)
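The failure pattern — an unconditional map lookup in the metrics path masking the real error — is easy to reproduce and to harden. The sketch below is Java with invented names, not the Scala `RequestChannel` code; it shows the defensive lookup that avoids throwing from the error-accounting path:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the metrics bug pattern described above (not the Scala
// RequestChannel code): an unconditional lookup of a per-API metric throws
// for an API name the map was never populated with, and that secondary
// exception masks the original failure. A defensive lookup keeps the
// error path from ever throwing itself.
public class ErrorMetricsSketch {
    private final Map<String, Integer> errorCounts = new HashMap<>();

    // Create the counter on first use instead of assuming every API name
    // (e.g. "UpdateFeatures") was pre-registered.
    public void recordError(String apiName) {
        errorCounts.merge(apiName, 1, Integer::sum);
    }

    public int count(String apiName) {
        return errorCounts.getOrDefault(apiName, 0);
    }
}
```

The broader lesson from the ticket applies to any error path: code that runs while reporting a failure should be at least as defensive as the code whose failure it reports.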
[jira] [Resolved] (KAFKA-13914) Implement kafka-metadata-quorum.sh
[ https://issues.apache.org/jira/browse/KAFKA-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13914. - Fix Version/s: 3.3.0 Resolution: Fixed > Implement kafka-metadata-quorum.sh > -- > > Key: KAFKA-13914 > URL: https://issues.apache.org/jira/browse/KAFKA-13914 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: dengziming >Priority: Major > Fix For: 3.3.0 > > > KIP-595 documents a tool for describing quorum status > `kafka-metadata-quorum.sh`: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-ToolingSupport.] > We need to implement this. > Note that this depends on the Admin API for `DescribeQuorum`, which is > proposed in KIP-836: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-836%3A+Addition+of+Information+in+DescribeQuorumResponse+about+Voter+Lag.] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14167) Unexpected UNKNOWN_SERVER_ERROR raised from kraft controller
[ https://issues.apache.org/jira/browse/KAFKA-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14167. - Resolution: Fixed > Unexpected UNKNOWN_SERVER_ERROR raised from kraft controller > > > Key: KAFKA-14167 > URL: https://issues.apache.org/jira/browse/KAFKA-14167 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 3.3.0 > > > In `ControllerApis`, we have callbacks such as the following after completion: > {code:java} > controller.allocateProducerIds(context, allocatedProducerIdsRequest.data) > .handle[Unit] { (results, exception) => > if (exception != null) { > requestHelper.handleError(request, exception) > } else { > requestHelper.sendResponseMaybeThrottle(request, requestThrottleMs > => { > results.setThrottleTimeMs(requestThrottleMs) > new AllocateProducerIdsResponse(results) > }) > } > } {code} > What I see locally is that the underlying exception that gets passed to > `handle` always gets wrapped in a `CompletionException`. When passed to > `getErrorResponse`, this error will get converted to `UNKNOWN_SERVER_ERROR`. > For example, in this case, a `NOT_CONTROLLER` error returned from the > controller would be returned as `UNKNOWN_SERVER_ERROR`. It looks like there > are a few APIs that are potentially affected by this bug, such as > `DeleteTopics` and `UpdateFeatures`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-13940) DescribeQuorum returns INVALID_REQUEST if not handled by leader
[ https://issues.apache.org/jira/browse/KAFKA-13940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13940. - Resolution: Fixed > DescribeQuorum returns INVALID_REQUEST if not handled by leader > --- > > Key: KAFKA-13940 > URL: https://issues.apache.org/jira/browse/KAFKA-13940 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Fix For: 3.3.0 > > > In `KafkaRaftClient.handleDescribeQuorum`, we currently return > INVALID_REQUEST if the node is not the current raft leader. This is > surprising and doesn't work with our general approach for retrying forwarded > APIs. In `BrokerToControllerChannelManager`, we only retry after > `NOT_CONTROLLER` errors. It would be more consistent with the other Raft APIs > if we returned NOT_LEADER_OR_FOLLOWER, but that also means we need additional > logic in `BrokerToControllerChannelManager` to handle that error and retry > correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14167) Unexpected UNKNOWN_SERVER_ERROR raised from kraft controller
Jason Gustafson created KAFKA-14167: --- Summary: Unexpected UNKNOWN_SERVER_ERROR raised from kraft controller Key: KAFKA-14167 URL: https://issues.apache.org/jira/browse/KAFKA-14167 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson In `ControllerApis`, we have callbacks such as the following after completion: {code:java} controller.allocateProducerIds(context, allocatedProducerIdsRequest.data) .handle[Unit] { (results, exception) => if (exception != null) { requestHelper.handleError(request, exception) } else { requestHelper.sendResponseMaybeThrottle(request, requestThrottleMs => { results.setThrottleTimeMs(requestThrottleMs) new AllocateProducerIdsResponse(results) }) } } {code} What I see locally is that the underlying exception that gets passed to `handle` always gets wrapped in a `CompletionException`. When passed to `getErrorResponse`, this error will get converted to `UNKNOWN_SERVER_ERROR`. For example, in this case, a `NOT_CONTROLLER` error returned from the controller would be returned as `UNKNOWN_SERVER_ERROR`. It looks like there are a few APIs that are potentially affected by this bug, such as `DeleteTopics` and `UpdateFeatures`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
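The wrapping behavior described above can be reproduced with plain `CompletableFuture`: when a dependent stage completes exceptionally because its source failed, the callback passed to `handle` sees a `CompletionException` wrapping the original error. A minimal standalone sketch (the `NOT_CONTROLLER`-style error is simulated here with a placeholder `IllegalStateException`):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class HandleWrappingDemo {
    public static void main(String[] args) {
        CompletableFuture<Integer> source = new CompletableFuture<>();
        // A dependent stage, as produced by a chain of thenApply/thenCompose calls.
        CompletableFuture<Integer> dependent = source.thenApply(x -> x + 1);

        // Simulate the controller failing the request with a specific error.
        source.completeExceptionally(new IllegalStateException("NOT_CONTROLLER"));

        dependent.handle((result, exception) -> {
            // The callback does not see IllegalStateException directly...
            System.out.println(exception.getClass().getSimpleName()); // CompletionException
            // ...so error mapping must unwrap the cause before matching on type.
            Throwable cause = exception instanceof CompletionException
                    ? exception.getCause() : exception;
            System.out.println(cause.getMessage()); // NOT_CONTROLLER
            return null;
        }).join();
    }
}
```

Error-mapping code that matches on the raw exception type, as described in the report, falls through to the generic UNKNOWN_SERVER_ERROR branch unless it unwraps the cause first.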
[jira] [Resolved] (KAFKA-14154) Persistent URP after controller soft failure
[ https://issues.apache.org/jira/browse/KAFKA-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14154. - Resolution: Fixed > Persistent URP after controller soft failure > > > Key: KAFKA-14154 > URL: https://issues.apache.org/jira/browse/KAFKA-14154 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Blocker > Fix For: 3.3.0 > > > We ran into a scenario where a partition leader was unable to expand the ISR > after a soft controller failover. Here is what happened: > Initial state: leader=1, isr=[1,2], leader epoch=10. Broker 1 is acting as > the current controller. > 1. Broker 1 loses its session in Zookeeper. > 2. Broker 2 becomes the new controller. > 3. During initialization, controller 2 removes 1 from the ISR. So state is > updated: leader=2, isr=[2], leader epoch=11. > 4. Broker 2 receives `LeaderAndIsr` from the new controller with leader > epoch=11. > 5. Broker 2 immediately tries to add replica 1 back to the ISR since it is > still fetching and is caught up. However, the > `BrokerToControllerChannelManager` is still pointed at controller 1, so that > is where the `AlterPartition` is sent. > 6. Controller 1 does not yet realize that it is not the controller, so it > processes the `AlterPartition` request. It sees the leader epoch of 11, which > is higher than what it has in its own context. Following changes to the > `AlterPartition` validation in > [https://github.com/apache/kafka/pull/12032/files], the controller returns > FENCED_LEADER_EPOCH. > 7. After receiving the FENCED_LEADER_EPOCH from the old controller, the > leader is stuck because it assumes that the error implies that another > LeaderAndIsr request should be sent. > Prior to > [https://github.com/apache/kafka/pull/12032/files], > the way we handled this case was a little different. 
We only verified that > the leader epoch in the request was at least as large as the current epoch in > the controller context. Anything higher was accepted. The controller would > have attempted to write the updated state to Zookeeper. This update would > have failed because of the controller epoch check, however, we would have > returned NOT_CONTROLLER in this case, which is handled in > `AlterPartitionManager`. > It is tempting to revert the logic, but the risk is in the idempotency check: > [https://github.com/apache/kafka/pull/12032/files#diff-3e042c962e80577a4cc9bbcccf0950651c6b312097a86164af50003c00c50d37L2369.] > If the AlterPartition request happened to match the state inside the old > controller, the controller would consider the update successful and return no > error. But if its state was already stale at that point, then that might > cause the leader to incorrectly assume that the state had been updated. > One way to fix this problem without weakening the validation is to rely on > the controller epoch in `AlterPartitionManager`. When we discover a new > controller, we also discover its epoch, so we can pass that through. The > `LeaderAndIsr` request already includes the controller epoch of the > controller that sent it and we already propagate this through to > `AlterPartition.submit`. Hence all we need to do is verify that the epoch of > the current controller target is at least as large as the one discovered > through the `LeaderAndIsr`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14166) Consistent toString implementations for byte arrays in generated messages
Jason Gustafson created KAFKA-14166: --- Summary: Consistent toString implementations for byte arrays in generated messages Key: KAFKA-14166 URL: https://issues.apache.org/jira/browse/KAFKA-14166 Project: Kafka Issue Type: Improvement Reporter: Jason Gustafson In the generated `toString()` implementations for message objects (such as protocol RPCs), we are a little inconsistent in how we display types with raw bytes. If the type is `Array[Byte]`, then we use Arrays.toString. If the type is `ByteBuffer` (i.e. when `zeroCopy` is set), then we use the corresponding `ByteBuffer.toString`, which is not often useful. We should try to be consistent. By default, it is probably not useful to print the full array contents, but we might print summary information (e.g. size, checksum). -- This message was sent by Atlassian Jira (v8.20.10#820010)
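The inconsistency is easy to see in a standalone example (this is illustrative, not Kafka's generated code): `Arrays.toString` prints the full contents, while `ByteBuffer.toString` prints only position/limit/capacity. A size-plus-checksum summary along the lines suggested above might look like this; the `summarize` format is a made-up example:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

public class BytesToStringDemo {
    // One possible summary format: size and checksum rather than full contents.
    static String summarize(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes);
        return "bytes(size=" + bytes.length + ", crc32=" + crc.getValue() + ")";
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        // What the generated toString produces today for Array[Byte] fields:
        System.out.println(Arrays.toString(data));            // [1, 2, 3]
        // What it produces for ByteBuffer (zeroCopy) fields:
        System.out.println(ByteBuffer.wrap(data).toString()); // java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]
        // A consistent summary for both representations:
        System.out.println(summarize(data));
    }
}
```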
[jira] [Resolved] (KAFKA-13986) DescribeQuorum does not return the observers (brokers) for the Metadata log
[ https://issues.apache.org/jira/browse/KAFKA-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13986. - Fix Version/s: 3.3.0 Resolution: Fixed > DescribeQuorum does not return the observers (brokers) for the Metadata log > --- > > Key: KAFKA-13986 > URL: https://issues.apache.org/jira/browse/KAFKA-13986 > Project: Kafka > Issue Type: Improvement > Components: kraft >Affects Versions: 3.0.0, 3.0.1 >Reporter: Niket Goel >Assignee: Niket Goel >Priority: Major > Fix For: 3.3.0 > > > h2. Background > While working on the [PR|https://github.com/apache/kafka/pull/12206] for > KIP-836, we realized that the `DescribeQuorum` API does not return the > brokers as observers for the metadata log. > As noted by [~dengziming]: > _We set nodeId=-1 if it's a broker, so observers.size==0_ > The related code is: > [https://github.com/apache/kafka/blob/4c9eeef5b2dff9a4f0977fbc5ac7eaaf930d0d0e/core/src/main/scala/kafka/raft/RaftManager.scala#L185-L189] > {code:java} > val nodeId = if (config.processRoles.contains(ControllerRole)) { OptionalInt.of(config.nodeId) } else { OptionalInt.empty() } > {code} > h2. ToDo > We should fix this and have the `DescribeQuorum` API return the brokers as > observers for the metadata log. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14163) Build failing in streams-scala:compileScala due to zinc compiler cache
[ https://issues.apache.org/jira/browse/KAFKA-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14163. - Resolution: Workaround > Build failing in streams-scala:compileScala due to zinc compiler cache > -- > > Key: KAFKA-14163 > URL: https://issues.apache.org/jira/browse/KAFKA-14163 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Major > > I have been seeing builds failing recently with the following error: > {code:java} > [2022-08-11T17:08:22.279Z] * What went wrong: > [2022-08-11T17:08:22.279Z] Execution failed for task > ':streams:streams-scala:compileScala'. > [2022-08-11T17:08:22.279Z] > Timeout waiting to lock zinc-1.6.1_2.13.8_17 > compiler cache (/home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17). It > is currently in use by another Gradle instance. > [2022-08-11T17:08:22.279Z] Owner PID: 25008 > [2022-08-11T17:08:22.279Z] Our PID: 25524 > [2022-08-11T17:08:22.279Z] Owner Operation: > [2022-08-11T17:08:22.279Z] Our operation: > [2022-08-11T17:08:22.279Z] Lock file: > /home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17/zinc-1.6.1_2.13.8_17.lock > {code} > And another: > {code:java} > [2022-08-10T21:30:41.779Z] * What went wrong: > [2022-08-10T21:30:41.779Z] Execution failed for task > ':streams:streams-scala:compileScala'. > [2022-08-10T21:30:41.779Z] > Timeout waiting to lock zinc-1.6.1_2.13.8_17 > compiler cache (/home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17). It > is currently in use by another Gradle instance. > [2022-08-10T21:30:41.779Z] Owner PID: 11022 > [2022-08-10T21:30:41.779Z] Our PID: 11766 > [2022-08-10T21:30:41.779Z] Owner Operation: > [2022-08-10T21:30:41.779Z] Our operation: > [2022-08-10T21:30:41.779Z] Lock file: > /home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17/zinc-1.6.1_2.13.8_17.lock > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14163) Build failing in scala:compileScala due to zinc compiler cache
Jason Gustafson created KAFKA-14163: --- Summary: Build failing in scala:compileScala due to zinc compiler cache Key: KAFKA-14163 URL: https://issues.apache.org/jira/browse/KAFKA-14163 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson I have been seeing builds failing recently with the following error: {code:java} [2022-08-10T21:30:41.779Z] * What went wrong: [2022-08-10T21:30:41.779Z] Execution failed for task ':streams:streams-scala:compileScala'. [2022-08-10T21:30:41.779Z] > Timeout waiting to lock zinc-1.6.1_2.13.8_17 compiler cache (/home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17). It is currently in use by another Gradle instance. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14154) Persistent URP after controller soft failure
Jason Gustafson created KAFKA-14154: --- Summary: Persistent URP after controller soft failure Key: KAFKA-14154 URL: https://issues.apache.org/jira/browse/KAFKA-14154 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Assignee: Jason Gustafson We ran into a scenario where a partition leader was unable to expand the ISR after a soft controller failover. Here is what happened: Initial state: leader=1, isr=[1,2], leader epoch=10. Broker 1 is acting as the current controller. 1. Broker 1 loses its session in Zookeeper. 2. Broker 2 becomes the new controller. 3. During initialization, controller 2 removes 1 from the ISR. So state is updated: leader=2, isr=[2], leader epoch=11. 4. Broker 2 receives `LeaderAndIsr` from the new controller with leader epoch=11. 5. Broker 2 immediately tries to add replica 1 back to the ISR since it is still fetching and is caught up. However, the `BrokerToControllerChannelManager` is still pointed at controller 1, so that is where the `AlterPartition` is sent. 6. Controller 1 does not yet realize that it is not the controller, so it processes the `AlterPartition` request. It sees the leader epoch of 11, which is higher than what it has in its own context. Following changes to the `AlterPartition` validation in [https://github.com/apache/kafka/pull/12032/files], the controller returns FENCED_LEADER_EPOCH. 7. After receiving the FENCED_LEADER_EPOCH from the old controller, the leader is stuck because it assumes that the error implies that another LeaderAndIsr request should be sent. Prior to [https://github.com/apache/kafka/pull/12032/files], the way we handled this case was a little different. We only verified that the leader epoch in the request was at least as large as the current epoch in the controller context. Anything higher was accepted. The controller would have attempted to write the updated state to Zookeeper. 
This update would have failed because of the controller epoch check, however, we would have returned NOT_CONTROLLER in this case, which is handled in `AlterPartitionManager`. It is tempting to revert the logic, but the risk is in the idempotency check: [https://github.com/apache/kafka/pull/12032/files#diff-3e042c962e80577a4cc9bbcccf0950651c6b312097a86164af50003c00c50d37L2369.] If the AlterPartition request happened to match the state inside the old controller, the controller would consider the update successful and return no error. But if its state was already stale at that point, then that might cause the leader to incorrectly assume that the state had been updated. One way to fix this problem without weakening the validation is to rely on the controller epoch in `AlterPartitionManager`. When we discover a new controller, we also discover its epoch, so we can pass that through. The `LeaderAndIsr` request already includes the controller epoch of the controller that sent it and we already propagate this through to `AlterPartition.submit`. Hence all we need to do is verify that the epoch of the current controller target is at least as large as the one discovered through the `LeaderAndIsr`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
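The epoch guard proposed above can be sketched as a standalone check. This is a hypothetical illustration with made-up names, not Kafka's actual `AlterPartitionManager` code: the manager tracks the epoch of the controller it currently points at, and only sends `AlterPartition` once that epoch is at least the controller epoch carried by the `LeaderAndIsr` request.

```java
// Hypothetical sketch of the proposed guard; class and method names are illustrative.
public class ControllerEpochGuard {
    private int discoveredControllerEpoch = -1; // epoch of the controller we currently point at

    void onControllerDiscovered(int epoch) {
        // Controller epochs only move forward.
        discoveredControllerEpoch = Math.max(discoveredControllerEpoch, epoch);
    }

    // epochFromLeaderAndIsr is the controller epoch propagated from the
    // LeaderAndIsr request through to AlterPartition.submit.
    boolean readyToSend(int epochFromLeaderAndIsr) {
        // Refuse to send to a controller older than the one that elected this leader.
        return discoveredControllerEpoch >= epochFromLeaderAndIsr;
    }

    public static void main(String[] args) {
        ControllerEpochGuard guard = new ControllerEpochGuard();
        guard.onControllerDiscovered(10);          // still pointed at the old controller
        System.out.println(guard.readyToSend(11)); // false: wait until rediscovery
        guard.onControllerDiscovered(11);          // discovered the new controller
        System.out.println(guard.readyToSend(11)); // true: safe to send AlterPartition
    }
}
```

In the scenario above, the leader would simply hold the request until `BrokerToControllerChannelManager` rediscovers controller 2, instead of getting a misleading FENCED_LEADER_EPOCH from controller 1.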
[jira] [Created] (KAFKA-14152) Add logic to fence kraft brokers which have fallen behind in replication
Jason Gustafson created KAFKA-14152: --- Summary: Add logic to fence kraft brokers which have fallen behind in replication Key: KAFKA-14152 URL: https://issues.apache.org/jira/browse/KAFKA-14152 Project: Kafka Issue Type: Improvement Reporter: Jason Gustafson When a kraft broker registers with the controller, it must catch up to the current metadata before it is unfenced. However, once it has been unfenced, it only needs to continue sending heartbeats to remain unfenced. It can fall arbitrarily behind in the replication of the metadata log and remain unfenced. We should consider whether there is an inverse condition that we can use to fence a broker that has fallen behind. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14144) AlterPartition is not idempotent when requests time out
[ https://issues.apache.org/jira/browse/KAFKA-14144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14144. - Resolution: Fixed > AlterPartition is not idempotent when requests time out > --- > > Key: KAFKA-14144 > URL: https://issues.apache.org/jira/browse/KAFKA-14144 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: David Mao >Assignee: David Mao >Priority: Blocker > Fix For: 3.3.0 > > > [https://github.com/apache/kafka/pull/12032] changed the validation order of > AlterPartition requests to fence requests with a stale partition epoch before > we compare the leader and ISR contents. > This results in a loss of idempotency if a leader does not receive an > AlterPartition response because retries will receive an > INVALID_UPDATE_VERSION error. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14104) Perform CRC validation on KRaft Batch Records and Snapshots
[ https://issues.apache.org/jira/browse/KAFKA-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14104. - Resolution: Fixed > Perform CRC validation on KRaft Batch Records and Snapshots > --- > > Key: KAFKA-14104 > URL: https://issues.apache.org/jira/browse/KAFKA-14104 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Niket Goel >Assignee: Niket Goel >Priority: Blocker > Fix For: 3.3.0 > > > Today we stamp the BatchRecord header with a CRC [1] and verify that CRC > before the log is written to disk [2]. The CRC checks should also be verified > when the records are read back from disk. The same procedure should be > followed for KRaft snapshots as well. > [1] > [https://github.com/apache/kafka/blob/6b76c01cf895db0651e2cdcc07c2c392f00a8ceb/clients/src/main/java/org/apache/kafka/common/record/DefaultRecordBatch.java#L501=] > > [2] > [https://github.com/apache/kafka/blob/679e9e0cee67e7d3d2ece204a421ea7da31d73e9/core/src/main/scala/kafka/log/UnifiedLog.scala#L1143] -- This message was sent by Atlassian Jira (v8.20.10#820010)
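The validate-on-read step can be illustrated with a small sketch using `java.util.zip.CRC32C` (Kafka's record format uses CRC-32C; the batch layout here, a 4-byte checksum followed by the payload, is a simplification, not the real on-disk format):

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

public class BatchCrcDemo {
    // Stamp the payload with its CRC-32C, as done when a batch is written.
    static ByteBuffer writeBatch(byte[] payload) {
        CRC32C crc = new CRC32C();
        crc.update(payload, 0, payload.length);
        ByteBuffer batch = ByteBuffer.allocate(4 + payload.length);
        batch.putInt((int) crc.getValue());
        batch.put(payload);
        batch.flip();
        return batch;
    }

    // Recompute and compare the CRC when reading back, rather than trusting
    // whatever bytes came off the disk.
    static boolean readAndValidate(ByteBuffer batch) {
        int stamped = batch.getInt();
        byte[] payload = new byte[batch.remaining()];
        batch.get(payload);
        CRC32C crc = new CRC32C();
        crc.update(payload, 0, payload.length);
        return stamped == (int) crc.getValue();
    }

    public static void main(String[] args) {
        ByteBuffer good = writeBatch(new byte[]{1, 2, 3});
        System.out.println(readAndValidate(good));  // true

        ByteBuffer bad = writeBatch(new byte[]{1, 2, 3});
        bad.put(4, (byte) 9);                       // simulate on-disk corruption
        System.out.println(readAndValidate(bad));   // false
    }
}
```

Applying the same check when KRaft batches and snapshots are read back is what the issue asks for.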
[jira] [Created] (KAFKA-14139) Replaced disk can lead to loss of committed data even with non-empty ISR
Jason Gustafson created KAFKA-14139: --- Summary: Replaced disk can lead to loss of committed data even with non-empty ISR Key: KAFKA-14139 URL: https://issues.apache.org/jira/browse/KAFKA-14139 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson We have been thinking about disk failure cases recently. Suppose that a disk has failed and the user needs to restart the disk from an empty state. The concern is whether this can lead to the unnecessary loss of committed data. For normal topic partitions, removal from the ISR during controlled shutdown buys us some protection. After the replica is restarted, it must prove its state to the leader before it can be added back to the ISR. And it cannot become a leader until it does so. An obvious exception to this is when the replica is the last member in the ISR. In this case, the disk failure itself has compromised the committed data, so some amount of loss must be expected. We have been considering other scenarios in which the loss of one disk can lead to data loss even when there are replicas remaining which have all of the committed entries. One such scenario is this: Suppose we have a partition with two replicas: A and B. Initially A is the leader and it is the only member of the ISR. # Broker B catches up to A, so A attempts to send an AlterPartition request to the controller to add B into the ISR. # Before the AlterPartition request is received, replica B has a hard failure. # The current controller successfully fences broker B. It takes no action on this partition since B is already out of the ISR. # Before the controller receives the AlterPartition request to add B, it also fails. # While the new controller is initializing, suppose that replica B finishes startup, but the disk has been replaced (all of the previous state has been lost). # The new controller sees the registration from broker B first. # Finally, the AlterPartition from A arrives which adds B back into the ISR. 
(Credit for coming up with this scenario goes to [~junrao] .) I tested this in KRaft and confirmed that this sequence is possible (even if perhaps unlikely). There are a few ways we could have potentially detected the issue. First, perhaps the leader should have bumped the leader epoch on all partitions when B was fenced. Then the inflight AlterPartition would be doomed no matter when it arrived. Alternatively, we could have relied on the broker epoch to distinguish the dead broker's state from that of the restarted broker. This could be done by including the broker epoch in both the `Fetch` request and in `AlterPartition`. Finally, perhaps even normal kafka replication should be using a unique identifier for each disk so that we can reliably detect when it has changed. For example, something like what was proposed for the metadata quorum here: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Voter+Changes.] -- This message was sent by Atlassian Jira (v8.20.10#820010)
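The broker-epoch alternative above can be sketched in isolation (a hypothetical model, not Kafka's controller code): the controller tracks the epoch of each broker's current registration, and an `AlterPartition` that tries to add a replica with a stale broker epoch is rejected, so the request from the dead incarnation of B cannot re-enter the ISR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the broker-epoch check; names are illustrative.
public class BrokerEpochCheck {
    private final Map<Integer, Long> registeredEpoch = new HashMap<>();

    void onBrokerRegistered(int brokerId, long epoch) {
        // A re-registration after restart comes with a new, larger epoch.
        registeredEpoch.put(brokerId, epoch);
    }

    // The leader would learn the follower's broker epoch from its Fetch
    // requests and include it in AlterPartition.
    boolean canAddToIsr(int brokerId, long epochInAlterPartition) {
        Long current = registeredEpoch.get(brokerId);
        return current != null && current == epochInAlterPartition;
    }

    public static void main(String[] args) {
        BrokerEpochCheck controller = new BrokerEpochCheck();
        controller.onBrokerRegistered(2, 100L);              // B's original registration
        controller.onBrokerRegistered(2, 200L);              // B re-registers after the disk swap
        System.out.println(controller.canAddToIsr(2, 100L)); // false: stale epoch, reject
        System.out.println(controller.canAddToIsr(2, 200L)); // true: current incarnation
    }
}
```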
[jira] [Resolved] (KAFKA-14078) Replica fetches to follower should return NOT_LEADER error
[ https://issues.apache.org/jira/browse/KAFKA-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14078. - Resolution: Fixed > Replica fetches to follower should return NOT_LEADER error > -- > > Key: KAFKA-14078 > URL: https://issues.apache.org/jira/browse/KAFKA-14078 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Fix For: 3.3.0 > > > After the fix for KAFKA-13837, if a follower receives a request from another > replica, it will return UNKNOWN_LEADER_EPOCH even if the leader epoch > matches. We need to do the leader epoch validation first before we check > whether we have a valid replica. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14078) Replica fetches to follower should return NOT_LEADER error
Jason Gustafson created KAFKA-14078: --- Summary: Replica fetches to follower should return NOT_LEADER error Key: KAFKA-14078 URL: https://issues.apache.org/jira/browse/KAFKA-14078 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Assignee: Jason Gustafson Fix For: 3.3.0 After the fix for KAFKA-13837, if a follower receives a request from another replica, it will return UNKNOWN_LEADER_EPOCH even if the leader epoch matches. We need to do the leader epoch validation first before we check whether we have a valid replica. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14077) KRaft should support recovery from failed disk
Jason Gustafson created KAFKA-14077: --- Summary: KRaft should support recovery from failed disk Key: KAFKA-14077 URL: https://issues.apache.org/jira/browse/KAFKA-14077 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Fix For: 3.3.0 If one of the nodes in the metadata quorum has a disk failure, there is no way currently to safely bring the node back into the quorum. When we lose disk state, we are at risk of losing committed data even if the failure only affects a minority of the cluster. Here is an example. Suppose that a metadata quorum has 3 members: v1, v2, and v3. Initially, v1 is the leader and writes a record at offset 1. After v2 acknowledges replication of the record, it becomes committed. Suppose that v1 fails before v3 has a chance to replicate this record. As long as v1 remains down, the raft protocol guarantees that only v2 can become leader, so the record cannot be lost. The raft protocol expects that when v1 returns, it will still have that record, but what if there is a disk failure, the state cannot be recovered, and v1 participates in leader election? Then we would have committed data on a minority of the voters. The main problem here concerns how we recover from this impaired state without risking the loss of this data. Consider a naive solution which brings v1 back with an empty disk. Since the node has lost its prior knowledge of the state of the quorum, it will vote for any candidate that comes along. If v3 becomes a candidate, then it will vote for itself and it just needs the vote from v1 to become leader. If that happens, then the committed data on v2 will become lost. This is just one scenario. In general, the invariants that the raft protocol is designed to preserve go out the window when disk state is lost. For example, it is also possible to contrive a scenario where the loss of disk state leads to multiple leaders. 
There is a good reason why raft requires that any vote cast by a voter is written to disk since otherwise the voter may vote for different candidates in the same epoch. Many systems solve this problem with a unique identifier which is generated automatically and stored on disk. This identifier is then committed to the raft log. If a disk changes, we would see a new identifier and we can prevent the node from breaking raft invariants. Then recovery from a failed disk requires a quorum reconfiguration. We need something like this in KRaft to make disk recovery possible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
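The unique-identifier idea can be sketched as follows (purely illustrative: the `disk.id` file name and layout are made up, not KRaft's actual format): mint an identifier on first start, persist it in the log directory, and treat a missing or different identifier as evidence that the disk was replaced, which would then trigger reconfiguration rather than silent rejoining.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical sketch of a per-disk identity; file name is illustrative.
public class DiskIdDemo {
    static UUID loadOrCreateDiskId(Path logDir) throws IOException {
        Path idFile = logDir.resolve("disk.id");
        if (Files.exists(idFile)) {
            return UUID.fromString(Files.readString(idFile).trim());
        }
        UUID fresh = UUID.randomUUID(); // empty disk: mint a new identity
        Files.writeString(idFile, fresh.toString());
        return fresh;
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Files.createTempDirectory("kraft-log");
        UUID first = loadOrCreateDiskId(logDir);
        UUID second = loadOrCreateDiskId(logDir);  // normal restart: same id
        System.out.println(first.equals(second));  // true

        Files.delete(logDir.resolve("disk.id"));   // simulate a replaced disk
        UUID third = loadOrCreateDiskId(logDir);
        System.out.println(first.equals(third));   // false: identity changed
    }
}
```

Because the identifier would be committed to the raft log, the rest of the quorum can detect the change and refuse votes from the new incarnation until a reconfiguration admits it.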
[jira] [Resolved] (KAFKA-14055) Transaction markers may be lost during cleaning if data keys conflict with marker keys
[ https://issues.apache.org/jira/browse/KAFKA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14055. - Resolution: Fixed > Transaction markers may be lost during cleaning if data keys conflict with > marker keys > -- > > Key: KAFKA-14055 > URL: https://issues.apache.org/jira/browse/KAFKA-14055 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Critical > Fix For: 3.3.0, 3.0.2, 3.1.2, 3.2.1 > > > We have been seeing recently hanging transactions occur on streams changelog > topics quite frequently. After investigation, we found that the keys used in > the changelog topic conflict with the keys used in the transaction markers > (the schema used in control records is 4 bytes, which happens to be the same > for the changelog topics that we investigated). When we build the offset map > prior to cleaning, we do properly exclude the transaction marker keys, but > the bug is the fact that we do not exclude them during the cleaning phase. > This can result in the marker being removed from the cleaned log before the > corresponding data is removed when there is a user record with a conflicting > key at a higher offset. A side effect of this is a so-called "hanging" > transaction, but the bigger problem is that we lose the atomicity of the > transaction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
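A toy model makes the bug above concrete (this is an illustrative sketch, not Kafka's `LogCleaner`): the offset map is built from data records only, so the retention rule during the cleaning pass must also special-case control records; otherwise a later data record with a conflicting key causes the transaction marker to be dropped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of log compaction; record layout and names are illustrative.
public class CleanerSketch {
    record Rec(String key, long offset, boolean isControl) {}

    // The buggy rule: retain only the latest offset seen for the key.
    static boolean buggyRetain(Rec r, Map<String, Long> latestByKey) {
        return r.offset() >= latestByKey.get(r.key());
    }

    // The fixed rule: control records are exempt from key-based deletion.
    static boolean fixedRetain(Rec r, Map<String, Long> latestByKey) {
        return r.isControl() || buggyRetain(r, latestByKey);
    }

    public static void main(String[] args) {
        // A COMMIT marker at offset 5 whose key collides with a user record at offset 9.
        List<Rec> log = List.of(new Rec("k1", 5, true), new Rec("k1", 9, false));

        // Building the offset map correctly skips control records.
        Map<String, Long> latestByKey = new HashMap<>();
        for (Rec r : log) {
            if (!r.isControl()) latestByKey.merge(r.key(), r.offset(), Math::max);
        }

        for (Rec r : log) {
            System.out.println("offset=" + r.offset()
                    + " buggy=" + buggyRetain(r, latestByKey)
                    + " fixed=" + fixedRetain(r, latestByKey));
        }
        // The buggy rule drops the marker (5 < 9) while keeping the user record,
        // losing the transaction's atomicity; the fixed rule keeps the marker.
    }
}
```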
[jira] [Resolved] (KAFKA-14050) Older clients cannot deserialize ApiVersions response with finalized feature epoch
[ https://issues.apache.org/jira/browse/KAFKA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14050. - Resolution: Not A Problem I'm going to close this since the incompatible schema change did not affect any released versions. > Older clients cannot deserialize ApiVersions response with finalized feature > epoch > -- > > Key: KAFKA-14050 > URL: https://issues.apache.org/jira/browse/KAFKA-14050 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 3.3.0 > > > When testing kraft locally, we encountered this exception from an older > client: > {code:java} > [ERROR] 2022-07-05 16:45:01,165 [kafka-admin-client-thread | > adminclient-1394] org.apache.kafka.common.utils.KafkaThread > lambda$configureThread$0 - Uncaught exception in thread > 'kafka-admin-client-thread | adminclient-1394': > org.apache.kafka.common.protocol.types.SchemaException: Error reading field > 'api_keys': Error reading array of size 1207959552, only 579 bytes available > at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:118) > at > org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:378) > at > org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:187) > at > org.apache.kafka.clients.NetworkClient$DefaultClientInterceptor.parseResponse(NetworkClient.java:1333) > at > org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:752) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:888) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:577) > at > org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1329) > at > org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1260) > at java.base/java.lang.Thread.run(Thread.java:832) {code} > The cause appears to be from a change to 
the type of the > `FinalizedFeaturesEpoch` field in the `ApiVersions` response from int32 to > int64: > [https://github.com/apache/kafka/pull/9001/files#diff-32006e8becae918416debdb9ac76bf8a1ad12b83aaaf5f8819b6ecc00c1fb56bR58.] > Fortunately, `FinalizedFeaturesEpoch` is a tagged field, so we can fix this > by creating a new field. We will have to leave the existing tag in the > protocol spec and consider it dead. > Credit for this find goes to [~dajac] . -- This message was sent by Atlassian Jira (v8.20.10#820010)
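Why widening a field from int32 to int64 breaks old readers can be shown with a raw buffer (a simplified stand-in for the real ApiVersions schema and its tagged-field encoding): a reader that still expects int32 falls out of alignment, and subsequent bytes get interpreted as an enormous array length, matching the `Error reading array of size ...` symptom above.

```java
import java.nio.ByteBuffer;

public class WireWidthDemo {
    public static void main(String[] args) {
        // New writer: epoch as int64, followed by an array length of 1.
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putLong(5L); // FinalizedFeaturesEpoch widened to int64
        buf.putInt(1);   // next field: array size
        buf.flip();

        // Old reader still assumes the epoch is int32.
        int epoch = buf.getInt();     // reads the high half of the long: 0
        int arraySize = buf.getInt(); // reads the low half of the long: 5
        System.out.println("epoch=" + epoch + " arraySize=" + arraySize);
        // The real array-size bytes are now misread as array contents, which is
        // how nonsense lengths like 1207959552 show up in practice.
    }
}
```

Because the field is tagged, the fix of introducing a new tag (and leaving the old one dead) keeps old readers from ever hitting the widened encoding.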
[jira] [Created] (KAFKA-14055) Transaction markers may be lost during cleaning if data keys conflict with marker keys
Jason Gustafson created KAFKA-14055: --- Summary: Transaction markers may be lost during cleaning if data keys conflict with marker keys Key: KAFKA-14055 URL: https://issues.apache.org/jira/browse/KAFKA-14055 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Fix For: 3.3.0, 3.0.2, 3.1.2, 3.2.1 We have been seeing recently hanging transactions occur on streams changelog topics quite frequently. After investigation, we found that the keys used in the changelog topic conflict with the keys used in the transaction markers (the schema used in control records is 4 bytes, which happens to be the same for the changelog topics that we investigated). When we build the offset map prior to cleaning, we do properly exclude the transaction marker keys, but the bug is the fact that we do not exclude them during the cleaning phase. This can result in the marker being removed from the cleaned log before the corresponding data is removed when there is a user record with a conflicting key at a higher offset. A side effect of this is a so-called "hanging" transaction, but the bigger problem is that we lose the atomicity of the transaction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14032) Dequeue time for forwarded requests is ignored to set
[ https://issues.apache.org/jira/browse/KAFKA-14032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14032. - Fix Version/s: 3.3.0 Resolution: Fixed > Dequeue time for forwarded requests is ignored to set > - > > Key: KAFKA-14032 > URL: https://issues.apache.org/jira/browse/KAFKA-14032 > Project: Kafka > Issue Type: Bug >Reporter: Feiyan Yu >Priority: Minor > Fix For: 3.3.0 > > > It seems like `requestDequeueTimeNanos` is never set for forwarded requests. > As a property of a `Request` object, `requestDequeueTimeNanos` is set only > when handlers poll and handle the request from `requestQueue`. However, > handlers poll the enclosing envelope request only once but call the handle > method twice, so `requestDequeueTimeNanos` is never set for the parsed > forwarded requests. > The parsed envelope requests are left with `requestDequeueTimeNanos` = -1, > which affects the correctness of the `LocalTimeMs` statistics and metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
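The metric skew can be illustrated with a small sketch (this is a simplified model, not Kafka's `RequestChannel` code): local time is derived from the dequeue timestamp, so a `-1` sentinel left in place for a parsed forwarded request yields an absurd elapsed time.

```java
// Illustrative model of the LocalTimeMs computation; numbers are made up
// but chosen at System.nanoTime() scale.
public class DequeueMetricDemo {
    static double localTimeMs(long dequeueTimeNanos, long apiLocalCompleteTimeNanos) {
        return (apiLocalCompleteTimeNanos - dequeueTimeNanos) / 1_000_000.0;
    }

    public static void main(String[] args) {
        long dequeued = 123_456_789_000_000L;        // set when polled from requestQueue
        long completed = dequeued + 2_000_000L;      // handled 2 ms later
        System.out.println(localTimeMs(dequeued, completed)); // 2.0

        long neverSet = -1L;                         // forwarded request parsed from the envelope
        // The sentinel produces a value on the order of the absolute timestamp
        // itself (~1.2e8 ms here), making the metric meaningless.
        System.out.println(localTimeMs(neverSet, completed));
    }
}
```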
[jira] [Created] (KAFKA-14050) Older clients cannot deserialize ApiVersions response with finalized feature epoch
Jason Gustafson created KAFKA-14050: --- Summary: Older clients cannot deserialize ApiVersions response with finalized feature epoch Key: KAFKA-14050 URL: https://issues.apache.org/jira/browse/KAFKA-14050 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Fix For: 3.3.0 When testing kraft locally, we encountered this exception from an older client: {code:java} [ERROR] 2022-07-05 16:45:01,165 [kafka-admin-client-thread | adminclient-1394] org.apache.kafka.common.utils.KafkaThread lambda$configureThread$0 - Uncaught exception in thread 'kafka-admin-client-thread | admi nclient-1394': org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'api_keys': Error reading array of size 1207959552, only 579 bytes available at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:118) at org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:378) at org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:187) at org.apache.kafka.clients.NetworkClient$DefaultClientInterceptor.parseResponse(NetworkClient.java:1333) at org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:752) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:888) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:577) at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1329) at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1260) at java.base/java.lang.Thread.run(Thread.java:832) {code} The cause appears to be from a change to the type of the `FinalizedFeaturesEpoch` field in the `ApiVersions` response from int32 to int64: [https://github.com/confluentinc/ce-kafka/commit/fb4f297207ef62f71e4a6d2d0dac75752933043d#diff-32006e8becae918416debdb9ac76bf8a1ad12b83aaaf5f8819b6ecc00c1fb56bL58.] 
Fortunately, `FinalizedFeaturesEpoch` is a tagged field, so we can fix this by creating a new field. We will have to leave the existing tag in the protocol spec and consider it dead. Credit for this find goes to [~dajac] . -- This message was sent by Atlassian Jira (v8.20.10#820010)
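The failure mode above — an old reader choking on a widened field — can be shown with a simplified sketch. This is not the real Kafka wire encoding (tagged fields actually use varint framing); it only illustrates why changing a fixed-width field from int32 to int64 desynchronizes an old reader, which then interprets leftover epoch bytes as the next field (here, an array size), producing a nonsense value like the 1207959552 in the stack trace.

```java
import java.nio.ByteBuffer;

// Simplified illustration (not the actual Kafka protocol framing) of the
// incompatibility in KAFKA-14050: the writer emits an 8-byte epoch, the old
// reader consumes only 4 bytes of it, and the remaining epoch bytes are
// misread as the size of the following array.
class WidenedFieldSketch {
    static int misreadArraySize(long epoch, int actualArraySize) {
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putLong(epoch);          // new writer: int64 epoch
        buf.putInt(actualArraySize); // followed by a 4-byte array size
        buf.flip();
        buf.getInt();                // old reader: consumes only 4 epoch bytes
        return buf.getInt();         // ...then reads epoch bytes as the size
    }
}
```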
[jira] [Resolved] (KAFKA-13943) Fix flaky test QuorumControllerTest.testMissingInMemorySnapshot()
[ https://issues.apache.org/jira/browse/KAFKA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13943. - Resolution: Fixed > Fix flaky test QuorumControllerTest.testMissingInMemorySnapshot() > - > > Key: KAFKA-13943 > URL: https://issues.apache.org/jira/browse/KAFKA-13943 > Project: Kafka > Issue Type: Bug > Components: unit tests >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Labels: flaky-test > Fix For: 3.3.0, 3.3 > > > Test failed at > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-12197/3/tests] > > {noformat} > [2022-05-27 09:34:42,382] INFO [Controller 0] Creating new QuorumController > with clusterId wj9LhgPJTV-KYEItgqvtQA, authorizer Optional.empty. > (org.apache.kafka.controller.QuorumController:1484) > [2022-05-27 09:34:42,393] DEBUG [LocalLogManager 0] Node 0: running log > check. (org.apache.kafka.metalog.LocalLogManager:479) > [2022-05-27 09:34:42,394] DEBUG [LocalLogManager 0] initialized local log > manager for node 0 (org.apache.kafka.metalog.LocalLogManager:622) > [2022-05-27 09:34:42,396] INFO [LocalLogManager 0] Node 0: registered > MetaLogListener 1774961169 (org.apache.kafka.metalog.LocalLogManager:640) > [2022-05-27 09:34:42,397] DEBUG [LocalLogManager 0] Node 0: running log > check. (org.apache.kafka.metalog.LocalLogManager:479) > [2022-05-27 09:34:42,397] DEBUG [LocalLogManager 0] Node 0: Executing > handleLeaderChange LeaderAndEpoch(leaderId=OptionalInt[0], epoch=1) > (org.apache.kafka.metalog.LocalLogManager:520) > [2022-05-27 09:34:42,398] DEBUG [Controller 0] Executing > handleLeaderChange[1]. 
(org.apache.kafka.controller.QuorumController:438) > [2022-05-27 09:34:42,398] INFO [Controller 0] Becoming the active controller > at epoch 1, committed offset -1, committed epoch -1, and metadata.version 5 > (org.apache.kafka.controller.QuorumController:950) > [2022-05-27 09:34:42,398] DEBUG [Controller 0] Creating snapshot -1 > (org.apache.kafka.timeline.SnapshotRegistry:197) > [2022-05-27 09:34:42,399] DEBUG [Controller 0] Processed > handleLeaderChange[1] in 951 us > (org.apache.kafka.controller.QuorumController:385) > [2022-05-27 09:34:42,399] INFO [Controller 0] Initializing metadata.version > to 5 (org.apache.kafka.controller.QuorumController:926) > [2022-05-27 09:34:42,399] INFO [Controller 0] Setting metadata.version to 5 > (org.apache.kafka.controller.FeatureControlManager:273) > [2022-05-27 09:34:42,400] DEBUG [Controller 0] Creating snapshot > 9223372036854775807 (org.apache.kafka.timeline.SnapshotRegistry:197) > [2022-05-27 09:34:42,400] DEBUG [Controller 0] Read-write operation > bootstrapMetadata(1863535402) will be completed when the log reaches offset > 9223372036854775807. 
(org.apache.kafka.controller.QuorumController:725) > [2022-05-27 09:34:42,402] DEBUG append(batch=LocalRecordBatch(leaderEpoch=1, > appendTimestamp=10, > records=[ApiMessageAndVersion(RegisterBrokerRecord(brokerId=0, > incarnationId=kxAT73dKQsitIedpiPtwBw, brokerEpoch=-9223372036854775808, > endPoints=[BrokerEndpoint(name='PLAINTEXT', host='localhost', port=9092, > securityProtocol=0)], features=[], rack=null, fenced=true) at version 0)]), > prevOffset=1) (org.apache.kafka.metalog.LocalLogManager$SharedLogData:247) > [2022-05-27 09:34:42,402] INFO [Controller 0] Registered new broker: > RegisterBrokerRecord(brokerId=0, incarnationId=kxAT73dKQsitIedpiPtwBw, > brokerEpoch=-9223372036854775808, endPoints=[BrokerEndpoint(name='PLAINTEXT', > host='localhost', port=9092, securityProtocol=0)], features=[], rack=null, > fenced=true) (org.apache.kafka.controller.ClusterControlManager:368) > [2022-05-27 09:34:42,403] WARN [Controller 0] registerBroker: failed with > unknown server exception RuntimeException at epoch 1 in 2449 us. Reverting > to last committed offset -1. > (org.apache.kafka.controller.QuorumController:410)java.lang.RuntimeException: > Can't create a new snapshot at epoch 1 because there is already a snapshot > with epoch 9223372036854775807at > org.apache.kafka.timeline.SnapshotRegistry.getOrCreateSnapshot(SnapshotRegistry.java:190) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:723) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173) > at java.base/java.lang.Thread.run(Thread.java:833){noformat} > {noformat} > Full stack trace > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.UnknownServe
[jira] [Resolved] (KAFKA-14035) QuorumController handleRenounce throws NPE
[ https://issues.apache.org/jira/browse/KAFKA-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14035. - Fix Version/s: 3.1.2 3.2.1 Resolution: Fixed > QuorumController handleRenounce throws NPE > -- > > Key: KAFKA-14035 > URL: https://issues.apache.org/jira/browse/KAFKA-14035 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Niket Goel >Assignee: Niket Goel >Priority: Major > Fix For: 3.3.0, 3.1.2, 3.2.1 > > > Sometimes when the controller is rolled you can encounter the following > exception, after which the controller in-memory state seems to become > inconsistent with the Metadata Log. > > [Controller 1] handleRenounce[23]: failed with unknown server exception > NullPointerException at epoch -1 in us. Reverting to last committed > offset . -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14036) Kraft controller local time not computed correctly.
[ https://issues.apache.org/jira/browse/KAFKA-14036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-14036. - Fix Version/s: 3.3.0 Resolution: Fixed > Kraft controller local time not computed correctly. > --- > > Key: KAFKA-14036 > URL: https://issues.apache.org/jira/browse/KAFKA-14036 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Major > Fix For: 3.3.0 > > > In `ControllerApis`, we are missing the logic to set the local processing end > time after `handle` returns. As a consequence of this, the remote time ends > up reported as the local time in the request level metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14036) Kraft controller local time not computed correctly.
Jason Gustafson created KAFKA-14036: --- Summary: Kraft controller local time not computed correctly. Key: KAFKA-14036 URL: https://issues.apache.org/jira/browse/KAFKA-14036 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson In `ControllerApis`, we are missing the logic to set the local processing end time after `handle` returns. As a consequence of this, the remote time ends up reported as the local time in the request level metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
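The metric skew described above can be sketched as follows. The back-fill convention here (falling back to the dequeue time when the local-complete timestamp was never recorded) is an assumption for illustration, and the method names are hypothetical, not the actual `RequestChannel` fields.

```java
// Hypothetical sketch of request time accounting for KAFKA-14036: if the
// handler never records when local processing ended, local time collapses
// to zero and the whole interval is attributed to remote time instead.
class RequestTimingSketch {
    // localCompleteNanos < 0 models "never set after handle() returned".
    static long localTimeNanos(long dequeueNanos, long localCompleteNanos) {
        long effective = localCompleteNanos < 0 ? dequeueNanos : localCompleteNanos;
        return effective - dequeueNanos;
    }

    static long remoteTimeNanos(long dequeueNanos, long localCompleteNanos,
                                long responseCompleteNanos) {
        long effective = localCompleteNanos < 0 ? dequeueNanos : localCompleteNanos;
        // Everything after "local complete" is counted as remote time, so a
        // missing timestamp silently inflates this value.
        return responseCompleteNanos - effective;
    }
}
```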
[jira] [Reopened] (KAFKA-13888) KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag
[ https://issues.apache.org/jira/browse/KAFKA-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson reopened KAFKA-13888: - > KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag > -- > > Key: KAFKA-13888 > URL: https://issues.apache.org/jira/browse/KAFKA-13888 > Project: Kafka > Issue Type: Improvement > Components: kraft >Reporter: Niket Goel >Assignee: lqjacklee >Priority: Major > Fix For: 3.3.0 > > > Tracking issue for the implementation of KIP:836 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13888) KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag
[ https://issues.apache.org/jira/browse/KAFKA-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13888. - Fix Version/s: 3.3.0 Resolution: Fixed > KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag > -- > > Key: KAFKA-13888 > URL: https://issues.apache.org/jira/browse/KAFKA-13888 > Project: Kafka > Issue Type: Improvement > Components: kraft >Reporter: Niket Goel >Assignee: lqjacklee >Priority: Major > Fix For: 3.3.0 > > > Tracking issue for the implementation of KIP:836 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13967) Guarantees for producer callbacks on transaction commit should be documented
[ https://issues.apache.org/jira/browse/KAFKA-13967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13967. - Resolution: Fixed > Guarantees for producer callbacks on transaction commit should be documented > > > Key: KAFKA-13967 > URL: https://issues.apache.org/jira/browse/KAFKA-13967 > Project: Kafka > Issue Type: Improvement > Components: clients >Reporter: Chris Egerton >Assignee: Chris Egerton >Priority: Major > > As discussed in > https://github.com/apache/kafka/pull/11780#discussion_r891835221, part of the > contract for a transactional producer is that all callbacks given to the > producer will have been invoked and completed (either successfully or by > throwing an exception) by the time that {{KafkaProducer::commitTransaction}} > returns. This should be documented so that users of the clients library can > have a guarantee that they're not on the hook to do that kind of bookkeeping > themselves. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13972) Reassignment cancellation causes stray replicas
Jason Gustafson created KAFKA-13972: --- Summary: Reassignment cancellation causes stray replicas Key: KAFKA-13972 URL: https://issues.apache.org/jira/browse/KAFKA-13972 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson A stray replica is one that is left behind on a broker after the partition has been reassigned to other brokers or the partition has been deleted. We found that one case where this can happen is after a cancelled reassignment. When a reassignment is cancelled, the controller sends `StopReplica` requests to any of the adding replicas, but it does not necessarily bump the leader epoch. Following [KIP-570|https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest], brokers will ignore `StopReplica` requests if the leader epoch matches the current partition leader epoch. So we need to bump the epoch whenever we need to ensure that `StopReplica` will be received. -- This message was sent by Atlassian Jira (v8.20.7#820007)
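The epoch check described above can be sketched like this. It is a simplified stand-in (not the actual `ReplicaManager` logic, which also handles deletion sentinels) for the KIP-570 rule that makes the epoch bump necessary:

```java
// Sketch of the KIP-570-style check behind KAFKA-13972: a StopReplica
// request is only acted on when it carries a newer leader epoch than the
// partition's current one; an equal epoch is ignored as a duplicate.
// That is why cancelling a reassignment must bump the epoch for the
// StopReplica to take effect.
class StopReplicaSketch {
    static boolean shouldProcessStopReplica(int requestLeaderEpoch,
                                            int currentLeaderEpoch) {
        return requestLeaderEpoch > currentLeaderEpoch;
    }
}
```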
[jira] [Created] (KAFKA-13966) Flaky test `QuorumControllerTest.testUnregisterBroker`
Jason Gustafson created KAFKA-13966: --- Summary: Flaky test `QuorumControllerTest.testUnregisterBroker` Key: KAFKA-13966 URL: https://issues.apache.org/jira/browse/KAFKA-13966 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson We have seen the following assertion failure in `QuorumControllerTest.testUnregisterBroker`: {code:java} org.opentest4j.AssertionFailedError: expected: <2> but was: <0> at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62) at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166) at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161) at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:628) at org.apache.kafka.controller.QuorumControllerTest.testUnregisterBroker(QuorumControllerTest.java:494) {code} I reproduced it by running the test in a loop. It looks like what happens is that the BrokerRegistration request is able to get interleaved between the leader change event and the write of the bootstrap metadata. Something like this: # handleLeaderChange() start # appendWriteEvent(registerBroker) # appendWriteEvent(bootstrapMetadata) # handleLeaderChange() finish # registerBroker() -> writes broker registration to log # bootstrapMetadata() -> writes bootstrap metadata to log -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13942) LogOffsetTest occasionally hangs during Jenkins build
[ https://issues.apache.org/jira/browse/KAFKA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13942. - Resolution: Fixed > LogOffsetTest occasionally hangs during Jenkins build > - > > Key: KAFKA-13942 > URL: https://issues.apache.org/jira/browse/KAFKA-13942 > Project: Kafka > Issue Type: Bug > Components: unit tests >Reporter: David Arthur >Priority: Minor > > [~hachikuji] parsed the log output of one of the recent stalled Jenkins > builds and singled out LogOffsetTest as a likely culprit for not completing. > I looked closely at the following build which appeared to be stuck and found > this test case had STARTED but not PASSED or FAILED. > 15:19:58 LogOffsetTest > > testFetchOffsetByTimestampForMaxTimestampWithUnorderedTimestamps(String) > > kafka.server.LogOffsetTest.testFetchOffsetByTimestampForMaxTimestampWithUnorderedTimestamps(String)[2] > STARTED -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13592) Fix flaky test ControllerIntegrationTest.testTopicIdUpgradeAfterReassigningPartitions
[ https://issues.apache.org/jira/browse/KAFKA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13592. - Resolution: Fixed > Fix flaky test > ControllerIntegrationTest.testTopicIdUpgradeAfterReassigningPartitions > - > > Key: KAFKA-13592 > URL: https://issues.apache.org/jira/browse/KAFKA-13592 > Project: Kafka > Issue Type: Bug >Reporter: David Jacot >Assignee: Kvicii.Yu >Priority: Minor > > {noformat} > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:529) > at scala.None$.get(Option.scala:527) > at > kafka.controller.ControllerIntegrationTest.testTopicIdUpgradeAfterReassigningPartitions(ControllerIntegrationTest.scala:1239){noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13946) setMetadataDirectory() method in builder for ControllerNode has no parameters
[ https://issues.apache.org/jira/browse/KAFKA-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13946. - Resolution: Fixed > setMetadataDirectory() method in builder for ControllerNode has no parameters > - > > Key: KAFKA-13946 > URL: https://issues.apache.org/jira/browse/KAFKA-13946 > Project: Kafka > Issue Type: Bug >Reporter: Clara Fang >Priority: Minor > > In core/src/test/java/kafka/testkit/ControllerNode.java, the method > setMetadataDirectory for the builder has no parameters and is assigning the > variable metadataDirectory to itself. > {code:java} > public Builder setMetadataDirectory() { > this.metadataDirectory = metadataDirectory; > return this; > } > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13941) Re-enable ARM builds following INFRA-23305
[ https://issues.apache.org/jira/browse/KAFKA-13941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13941. - Resolution: Fixed > Re-enable ARM builds following INFRA-23305 > -- > > Key: KAFKA-13941 > URL: https://issues.apache.org/jira/browse/KAFKA-13941 > Project: Kafka > Issue Type: Task > Components: build >Reporter: David Arthur >Priority: Major > > Once https://issues.apache.org/jira/browse/INFRA-23305 is resolved, we should > re-enable ARM builds in the Jenkinsfile. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13944) Shutting down broker can be elected as partition leader in KRaft
Jason Gustafson created KAFKA-13944: --- Summary: Shutting down broker can be elected as partition leader in KRaft Key: KAFKA-13944 URL: https://issues.apache.org/jira/browse/KAFKA-13944 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson When a broker requests shutdown, it transitions to the CONTROLLED_SHUTDOWN state in the controller. It is possible for the broker to remain unfenced in this state until the controlled shutdown completes. When doing an election, the only thing we generally check is that the broker is unfenced, so this means we can elect a broker that is in controlled shutdown. Here are a few snippets from a recent system test in which this occurred: {code:java} // broker 2 starts controlled shutdown [2022-05-26 21:17:26,451] INFO [Controller 3001] Unfenced broker 2 has requested and been granted a controlled shutdown. (org.apache.kafka.controller.BrokerHeartbeatManager) // there is only one replica, so we set leader to -1 [2022-05-26 21:17:26,452] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: 2 -> -1, leaderEpoch: 0 -> 1, partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager) // controlled shutdown cannot complete immediately [2022-05-26 21:17:26,529] DEBUG [Controller 3001] The request from broker 2 to shut down can not yet be granted because the lowest active offset 177 is not greater than the broker's shutdown offset 244. (org.apache.kafka.controller.BrokerHeartbeatManager) [2022-05-26 21:17:26,530] DEBUG [Controller 3001] Updated the controlled shutdown offset for broker 2 to 244. 
(org.apache.kafka.controller.BrokerHeartbeatManager) // later on we elect leader 2 again [2022-05-26 21:17:27,703] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: -1 -> 2, leaderEpoch: 1 -> 2, partitionEpoch: 1 -> 2 (org.apache.kafka.controller.ReplicationControlManager) // now controlled shutdown is stuck because of the newly elected leader [2022-05-26 21:17:28,531] DEBUG [Controller 3001] Broker 2 is in controlled shutdown state, but can not shut down because more leaders still need to be moved. (org.apache.kafka.controller.BrokerHeartbeatManager) {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13858) Kraft should not shutdown metadata listener until controller shutdown is finished
[ https://issues.apache.org/jira/browse/KAFKA-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13858. - Fix Version/s: 3.3 Resolution: Fixed > Kraft should not shutdown metadata listener until controller shutdown is > finished > - > > Key: KAFKA-13858 > URL: https://issues.apache.org/jira/browse/KAFKA-13858 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: David Jacot >Priority: Major > Labels: kip-500 > Fix For: 3.3 > > > When the kraft broker begins controlled shutdown, it immediately disables the > metadata listener. This means that metadata changes as part of the controlled > shutdown do not get sent to the respective components. For partitions that > the broker is follower of, that is what we want. It prevents the follower > from being able to rejoin the ISR while still shutting down. But for > partitions that the broker is leading, it means the leader will remain active > until controlled shutdown finishes and the socket server is stopped. That > delay can be as much as 5 seconds and probably even worse. > In the zk world, we have an explicit request `StopReplica` which serves the > purpose of shutting down both follower and leader, but we don't have > something similar in kraft. For KRaft, we may not necessarily need an > explicit signal like this. We know that the broker is shutting down, so we > can treat partition changes as implicit `StopReplica` requests rather than > going through the normal `LeaderAndIsr` flow. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13940) DescribeQuorum returns INVALID_REQUEST if not handled by leader
Jason Gustafson created KAFKA-13940: --- Summary: DescribeQuorum returns INVALID_REQUEST if not handled by leader Key: KAFKA-13940 URL: https://issues.apache.org/jira/browse/KAFKA-13940 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson In `KafkaRaftClient.handleDescribeQuorum`, we currently return INVALID_REQUEST if the node is not the current raft leader. This is surprising and doesn't work with our general approach for retrying forwarded APIs. In `BrokerToControllerChannelManager`, we only retry after `NOT_CONTROLLER` errors. It would be more consistent with the other Raft APIs if we returned NOT_LEADER_OR_FOLLOWER, but that also means we need additional logic in `BrokerToControllerChannelManager` to handle that error and retry correctly. -- This message was sent by Atlassian Jira (v8.20.7#820007)
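The retry mismatch described above can be sketched as follows. This is a simplified stand-in for the behavior of a `BrokerToControllerChannelManager`-style client, not its actual code: only `NOT_CONTROLLER` triggers a retry, so any other error from a non-leader (such as `INVALID_REQUEST`) surfaces directly to the caller.

```java
// Sketch of the forwarding retry policy relevant to KAFKA-13940: returning
// INVALID_REQUEST from a non-leader is terminal for the client, whereas
// NOT_CONTROLLER would cause a re-discovery and retry. Handling
// NOT_LEADER_OR_FOLLOWER here as well is the extra logic the issue asks for.
class ForwardingRetrySketch {
    enum ApiError { NONE, NOT_CONTROLLER, INVALID_REQUEST, NOT_LEADER_OR_FOLLOWER }

    static boolean shouldRetry(ApiError error) {
        return error == ApiError.NOT_CONTROLLER;
    }
}
```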
[jira] [Resolved] (KAFKA-13923) ZooKeeperAuthorizerTest should use standard authorizer for kraft
[ https://issues.apache.org/jira/browse/KAFKA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13923. - Resolution: Fixed > ZooKeeperAuthorizerTest should use standard authorizer for kraft > > > Key: KAFKA-13923 > URL: https://issues.apache.org/jira/browse/KAFKA-13923 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > > Our system test `ZooKeeperAuthorizerTest` relies on the zk-based > `AclAuthorizer` even when running KRaft. We should update this test to use > `StandardAuthorizer` (and probably change the name while we're at it). -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13889) Fix AclsDelta to handle ACCESS_CONTROL_ENTRY_RECORD quickly followed by REMOVE_ACCESS_CONTROL_ENTRY_RECORD for same ACL
[ https://issues.apache.org/jira/browse/KAFKA-13889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13889. - Fix Version/s: 3.3.0 Resolution: Fixed > Fix AclsDelta to handle ACCESS_CONTROL_ENTRY_RECORD quickly followed by > REMOVE_ACCESS_CONTROL_ENTRY_RECORD for same ACL > --- > > Key: KAFKA-13889 > URL: https://issues.apache.org/jira/browse/KAFKA-13889 > Project: Kafka > Issue Type: Bug >Reporter: Andrew Grant >Priority: Major > Fix For: 3.3.0 > > > In > [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/image/AclsDelta.java#L64] > we store the pending deletion in the changes map. This could override a > creation that might have just happened. This is an issue because in > BrokerMetadataPublisher this results in us making a removeAcl call, which > finally results in > [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L203] > being executed, and this code throws an exception if the ACL isn't in the Map > yet. If the ACCESS_CONTROL_ENTRY_RECORD event never got processed by > BrokerMetadataPublisher, then the ACL won't be in the Map yet. > My feeling is we might want to make removeAcl idempotent, in that it returns > success if the ACL doesn't exist: no matter how many times removeAcl is > called, it returns success once the ACL is deleted. Maybe we'd just log a > warning or something? > Note, I don't think the AclControlManager has this issue because it doesn't > batch the events like AclsDelta does. However, we still do throw a > RuntimeException here > [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/AclControlManager.java#L197] > - maybe we should still follow the same logic (if we make the fix suggested > above) and just log a warning if the ACL doesn't exist in the Map? -- This message was sent by Atlassian Jira (v8.20.7#820007)
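The idempotent-delete suggestion above can be sketched as follows. This is a hypothetical simplification (an in-memory map keyed by ACL id, not the actual `StandardAuthorizerData` structures): removing an ACL that is absent logs a warning and is treated as success rather than throwing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of the idempotent removeAcl proposed in KAFKA-13889: a delete for
// an ACL that never reached the map (e.g. because create+delete collapsed
// inside one AclsDelta batch) succeeds with a warning instead of throwing.
class AclStoreSketch {
    private final Map<UUID, String> aclsById = new HashMap<>();

    void addAcl(UUID id, String acl) {
        aclsById.put(id, acl);
    }

    // Returns true if the ACL was actually present; either way the delete
    // is considered successful, so replays and collapsed batches are safe.
    boolean removeAcl(UUID id) {
        if (aclsById.remove(id) == null) {
            System.err.println("removeAcl: ACL " + id
                + " not found; treating as already deleted");
            return false;
        }
        return true;
    }
}
```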
[jira] [Created] (KAFKA-13923) ZooKeeperAuthorizerTest should use standard authorizer for kraft
Jason Gustafson created KAFKA-13923: --- Summary: ZooKeeperAuthorizerTest should use standard authorizer for kraft Key: KAFKA-13923 URL: https://issues.apache.org/jira/browse/KAFKA-13923 Project: Kafka Issue Type: Improvement Reporter: Jason Gustafson Assignee: Jason Gustafson Our system test `ZooKeeperAuthorizerTest` relies on the zk-based `AclAuthorizer` even when running KRaft. We should update this test to use `StandardAuthorizer` (and probably change the name while we're at it). -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13863) Prevent null config value when create topic in KRaft mode
[ https://issues.apache.org/jira/browse/KAFKA-13863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13863. - Fix Version/s: 3.3.0 Resolution: Fixed > Prevent null config value when create topic in KRaft mode > - > > Key: KAFKA-13863 > URL: https://issues.apache.org/jira/browse/KAFKA-13863 > Project: Kafka > Issue Type: Bug >Reporter: dengziming >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13837) Return error for Fetch requests from unrecognized followers
[ https://issues.apache.org/jira/browse/KAFKA-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13837. - Fix Version/s: 3.3.0 Resolution: Fixed > Return error for Fetch requests from unrecognized followers > --- > > Key: KAFKA-13837 > URL: https://issues.apache.org/jira/browse/KAFKA-13837 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Fix For: 3.3.0 > > > If the leader of a partition receives a request from a replica which is not > in the current replica set, we currently return an empty fetch response with > no error. I think the rationale for this is that the leader may not have > received the latest `LeaderAndIsr` update which adds the replica, so we just > want the follower to retry. The problem with this is that if the > `LeaderAndIsr` request never arrives, then there might not be an external > indication of a problem. This probably was the only reasonable option before > we added the leader epoch to the `Fetch` request API. Now that we have it, it > would be clearer to return an `UNKNOWN_LEADER_EPOCH` error to indicate that > the (replicaId, leaderEpoch) tuple is not recognized. -- This message was sent by Atlassian Jira (v8.20.7#820007)
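The proposed validation can be sketched as follows. This is a minimal stand-in, not the actual partition-leader code: a fetch from a replica id outside the current replica set gets an explicit `UNKNOWN_LEADER_EPOCH` error instead of a silent empty response.

```java
import java.util.Set;

// Sketch of the check proposed in KAFKA-13837: with the leader epoch now in
// the Fetch request, an unrecognized (replicaId, leaderEpoch) tuple can be
// rejected explicitly rather than answered with an empty, error-free
// response that hides a missing LeaderAndIsr update.
class FetchValidationSketch {
    enum Errors { NONE, UNKNOWN_LEADER_EPOCH }

    static Errors validateFollowerFetch(Set<Integer> replicaSet, int replicaId) {
        return replicaSet.contains(replicaId)
            ? Errors.NONE
            : Errors.UNKNOWN_LEADER_EPOCH;
    }
}
```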
[jira] [Created] (KAFKA-13914) Implement kafka-metadata-quorum.sh
Jason Gustafson created KAFKA-13914: --- Summary: Implement kafka-metadata-quorum.sh Key: KAFKA-13914 URL: https://issues.apache.org/jira/browse/KAFKA-13914 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson KIP-595 documents a tool for describing quorum status `kafka-metadata-quorum.sh`: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-ToolingSupport.] We need to implement this. Note that this depends on the Admin API for `DescribeQuorum`, which is proposed in KIP-836: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-836%3A+Addition+of+Information+in+DescribeQuorumResponse+about+Voter+Lag.] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (KAFKA-13899) Inconsistent error codes returned from AlterConfig APIs
[ https://issues.apache.org/jira/browse/KAFKA-13899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13899. - Fix Version/s: 3.2.1 Resolution: Fixed > Inconsistent error codes returned from AlterConfig APIs > --- > > Key: KAFKA-13899 > URL: https://issues.apache.org/jira/browse/KAFKA-13899 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Fix For: 3.2.1 > > > In the AlterConfigs/IncrementalAlterConfigs zk handler, we return > INVALID_REQUEST and INVALID_CONFIG inconsistently. The problem is in > `LogConfig.validate`. We may either return `ConfigException` or > `InvalidConfigException`. When the first of these is thrown, we catch it and > convert to INVALID_REQUEST. It seems more consistent to convert to > INVALID_CONFIG. > Note that the kraft implementation returns INVALID_CONFIG consistently. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13899) Inconsistent error codes returned from AlterConfig APIs
Jason Gustafson created KAFKA-13899: --- Summary: Inconsistent error codes returned from AlterConfig APIs Key: KAFKA-13899 URL: https://issues.apache.org/jira/browse/KAFKA-13899 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson Assignee: Jason Gustafson In the AlterConfigs/IncrementalAlterConfigs zk handler, we return INVALID_REQUEST and INVALID_CONFIG inconsistently. The problem is in `LogConfig.validate`. We may either return `ConfigException` or `InvalidConfigException`. When the first of these is thrown, we catch it and convert to INVALID_REQUEST. It seems more consistent to convert to INVALID_CONFIG. Note that the kraft implementation returns INVALID_CONFIG consistently. -- This message was sent by Atlassian Jira (v8.20.7#820007)
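The consistent mapping suggested above can be sketched like this. The exception classes here are local stand-ins for `org.apache.kafka.common.config.ConfigException` and the invalid-configuration error type, and the enum is a simplification of the real `Errors` codes:

```java
// Stand-ins for the two validation exception types mentioned in KAFKA-13899.
class ConfigException extends RuntimeException {}
class InvalidConfigurationException extends RuntimeException {}

// Sketch of the consistent mapping: both validation failures translate to
// INVALID_CONFIG, matching the KRaft handler, instead of ConfigException
// being caught and converted to INVALID_REQUEST as in the zk handler.
class ConfigErrorSketch {
    enum ApiError { INVALID_REQUEST, INVALID_CONFIG, UNKNOWN_SERVER_ERROR }

    static ApiError errorFor(RuntimeException e) {
        if (e instanceof ConfigException || e instanceof InvalidConfigurationException)
            return ApiError.INVALID_CONFIG;
        return ApiError.UNKNOWN_SERVER_ERROR;
    }
}
```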
[jira] [Resolved] (KAFKA-13862) Add And Subtract multiple config values is not supported in KRaft mode
[ https://issues.apache.org/jira/browse/KAFKA-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13862. - Fix Version/s: 3.3.0 Resolution: Fixed > Add And Subtract multiple config values is not supported in KRaft mode > -- > > Key: KAFKA-13862 > URL: https://issues.apache.org/jira/browse/KAFKA-13862 > Project: Kafka > Issue Type: Bug >Reporter: dengziming >Assignee: dengziming >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-13869) Update quota callback metadata in KRaft
Jason Gustafson created KAFKA-13869:
Summary: Update quota callback metadata in KRaft
Key: KAFKA-13869
URL: https://issues.apache.org/jira/browse/KAFKA-13869
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson

The `ClientQuotaCallback` interface defines a method `updateClusterMetadata`, which allows the callback to take partition assignments into account when assigning quotas. For zk, this method is called after receiving `UpdateMetadata` requests from the controller. We do not yet have this implemented in KRaft for updates from the metadata log.
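To illustrate why the callback needs metadata updates, here is a simplified, dependency-free analog of a partition-aware quota callback. The real interface is `org.apache.kafka.server.quota.ClientQuotaCallback`, whose `updateClusterMetadata` receives a `Cluster` object; the shapes and names below are assumptions for the sketch, not Kafka's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy partition-aware quota callback: the quota a broker enforces scales
// with the number of partitions it leads, so it must be told whenever
// partition metadata changes (UpdateMetadata in zk mode; metadata-log
// updates would drive this in KRaft once implemented).
public class PartitionAwareQuotaCallback {
    private final double quotaPerPartition;
    private Map<Integer, Integer> leaderCountsByBroker = new HashMap<>();

    public PartitionAwareQuotaCallback(double quotaPerPartition) {
        this.quotaPerPartition = quotaPerPartition;
    }

    // Analog of updateClusterMetadata: refresh the callback's view of
    // partition leadership.
    public void updateClusterMetadata(Map<Integer, Integer> leaderCountsByBroker) {
        this.leaderCountsByBroker = new HashMap<>(leaderCountsByBroker);
    }

    // Quota limit for a broker, derived from the latest metadata snapshot.
    public double quotaLimit(int brokerId) {
        return quotaPerPartition * leaderCountsByBroker.getOrDefault(brokerId, 0);
    }
}
```

Without the KRaft metadata-log hook, `updateClusterMetadata` is never invoked, so a callback like this would keep serving quotas from a stale (or empty) snapshot.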
[jira] [Created] (KAFKA-13858) Kraft should not shutdown metadata listener until controller shutdown is finished
Jason Gustafson created KAFKA-13858:
Summary: Kraft should not shutdown metadata listener until controller shutdown is finished
Key: KAFKA-13858
URL: https://issues.apache.org/jira/browse/KAFKA-13858
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson

When the kraft broker begins controlled shutdown, it immediately disables the metadata listener. This means that metadata changes made as part of the controlled shutdown do not get sent to the respective components. For partitions that the broker is a follower of, that is what we want: it prevents the follower from rejoining the ISR while still shutting down. But for partitions that the broker is leading, it means the leader will remain active until controlled shutdown is complete. In the zk world, we have an explicit request, `StopReplica`, which serves the purpose of shutting down both follower and leader, but we don't have something similar in kraft. For KRaft, we may not necessarily need an explicit signal like this. We know that the broker is shutting down, so we can treat partition changes as implicit `StopReplica` requests rather than going through the normal `LeaderAndIsr` flow.
[jira] [Created] (KAFKA-13837) Return error for Fetch requests from unrecognized followers
Jason Gustafson created KAFKA-13837:
Summary: Return error for Fetch requests from unrecognized followers
Key: KAFKA-13837
URL: https://issues.apache.org/jira/browse/KAFKA-13837
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson

If the leader of a partition receives a request from a replica which is not in the current replica set, we currently return an empty fetch response with no error. I think the rationale for this is that the leader may not have received the latest `LeaderAndIsr` update which adds the replica, so we just want the follower to retry. The problem with this is that if the `LeaderAndIsr` request never arrives, then there might not be an external indication of a problem. This was probably the only reasonable option before we added the leader epoch to the `Fetch` request API. Now that we have it, it would be clearer to return an `UNKNOWN_LEADER_EPOCH` error to indicate that the (replicaId, leaderEpoch) tuple is not recognized.
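The proposed check can be sketched as follows. This is an illustrative sketch, not Kafka's actual replica code: the method and enum names are assumptions, and only the unrecognized-replica case reflects the behavior proposed in this issue.

```java
import java.util.Set;

public class FetchValidation {
    enum FetchError { NONE, FENCED_LEADER_EPOCH, UNKNOWN_LEADER_EPOCH }

    // Validate the (replicaId, leaderEpoch) tuple carried by a follower
    // Fetch request against the leader's current view.
    static FetchError validateFollowerFetch(Set<Integer> replicaSet,
                                            int currentLeaderEpoch,
                                            int replicaId,
                                            int requestLeaderEpoch) {
        if (requestLeaderEpoch < currentLeaderEpoch) {
            return FetchError.FENCED_LEADER_EPOCH;  // stale follower epoch
        }
        if (!replicaSet.contains(replicaId)) {
            // Proposed behavior: surface the unrecognized replica explicitly
            // instead of returning an empty response with no error.
            return FetchError.UNKNOWN_LEADER_EPOCH;
        }
        return FetchError.NONE;
    }
}
```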
[jira] [Resolved] (KAFKA-13782) Producer may fail to add the correct partition to transaction
[ https://issues.apache.org/jira/browse/KAFKA-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13782.
Resolution: Fixed

> Producer may fail to add the correct partition to transaction
> Key: KAFKA-13782
> URL: https://issues.apache.org/jira/browse/KAFKA-13782
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> In KAFKA-13412, we changed the logic to add partitions to transactions in the producer. The intention was to ensure that the partition is added in `TransactionManager` before the record is appended to the `RecordAccumulator`. However, this does not take into account the possibility that the originally selected partition may be changed if `abortForNewBatch` is set in `RecordAppendResult` in the call to `RecordAccumulator.append`. When this happens, the partitioner can choose a different partition, which means that the `TransactionManager` would be tracking the wrong partition.
> I think the consequence of this is that the batches sent to this partition would get stuck in the `RecordAccumulator` until they timed out because we validate before sending that the partition has been added correctly to the transaction.
> Note that KAFKA-13412 has not been included in any release, so there are no affected versions.
> Thanks to [~alivshits] for identifying the bug.
[jira] [Resolved] (KAFKA-13794) Producer batch lost silently in TransactionManager
[ https://issues.apache.org/jira/browse/KAFKA-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13794.
Fix Version/s: 3.1.1, 3.0.2
Resolution: Fixed

> Producer batch lost silently in TransactionManager
> Key: KAFKA-13794
> URL: https://issues.apache.org/jira/browse/KAFKA-13794
> Project: Kafka
> Issue Type: Bug
> Reporter: xuexiaoyue
> Priority: Major
> Fix For: 3.1.1, 3.0.2
>
> When idempotence is enabled and a batch reaches its request.timeout.ms but has not yet reached delivery.timeout.ms, it is retried and waits for another request.timeout.ms. During this interval, delivery.timeout.ms may be reached, and the Sender will remove this in-flight batch and bump the producer epoch because of the unresolved sequence; the sequence of this partition will then be reset to 0.
> At this time, if a new batch is sent to the same partition and the former batch reaches request.timeout.ms again, we will see an exception thrown by NetworkClient:
> {code:java}
> [ERROR] [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Uncaught error in request completion:
> java.lang.IllegalStateException: We are re-enqueueing a batch which is not tracked as part of the in flight requests. batch.topicPartition: txn_test_1648891362900-2; batch.baseSequence: 0
> at org.apache.kafka.clients.producer.internals.RecordAccumulator.insertInSequenceOrder(RecordAccumulator.java:388) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.producer.internals.RecordAccumulator.reenqueue(RecordAccumulator.java:334) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.producer.internals.Sender.reenqueueBatch(Sender.java:668) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:622) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:548) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:583) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:575) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
> at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_102]
> {code}
> The cause of this is that inflightBatchesBySequence in TransactionManager is not being removed correctly: one batch may be removed by another batch with the same sequence number.
> The potential consequence I can think of is that sending will be blocked until the latter batch is expired by delivery.timeout.ms.
[jira] [Created] (KAFKA-13790) ReplicaManager should be robust to all partition updates from kraft metadata log
Jason Gustafson created KAFKA-13790:
Summary: ReplicaManager should be robust to all partition updates from kraft metadata log
Key: KAFKA-13790
URL: https://issues.apache.org/jira/browse/KAFKA-13790
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson

There are two ways that partition state can be updated in the zk world: one is through `LeaderAndIsr` requests and one is through `AlterPartition` responses. All changes made to partition state result in new LeaderAndIsr requests, but replicas will ignore them if the leader epoch is less than or equal to the current known leader epoch. Basically it works like this:
* Changes made by the leader are done through AlterPartition requests. These changes bump the partition epoch (or zk version), but leave the leader epoch unchanged. LeaderAndIsr requests are sent by the controller, but replicas ignore them. Partition state is instead only updated when the AlterPartition response is received.
* Changes made by the controller are made directly by the controller and always result in a leader epoch bump. These changes are sent to replicas through LeaderAndIsr requests and are applied by replicas.
The code in `kafka.server.ReplicaManager` and `kafka.cluster.Partition` is built on top of these assumptions. The logic in `makeLeader`, for example, assumes that the leader epoch has indeed been bumped. Specifically, follower state gets reset and a new entry is written to the leader epoch cache. In KRaft, we also have two paths to update partition state. One is AlterPartition, just like in the zk world. The second is updates received from the metadata log. These follow the same path as LeaderAndIsr requests for the most part, but a big difference is that all changes are sent down to `kafka.cluster.Partition`, even those which do not have a bumped leader epoch. This breaks the assumptions mentioned above in `makeLeader`, which could result in leader epoch cache inconsistency.
Another side effect of this on the follower side is that replica fetchers for updated partitions get unnecessarily restarted. There may be others as well. We need to either replicate the same gating logic that the zookeeper side has, or make the logic robust to all updates, including those without a leader epoch bump.
[jira] [Created] (KAFKA-13789) Fragile tag ordering in metric mbean names
Jason Gustafson created KAFKA-13789:
Summary: Fragile tag ordering in metric mbean names
Key: KAFKA-13789
URL: https://issues.apache.org/jira/browse/KAFKA-13789
Project: Kafka
Issue Type: Improvement
Reporter: Jason Gustafson

We noticed that mbean name creation logic is a bit fragile in `KafkaMetricsGroup`, which many server components rely on. We rely on the ordering of tags in the underlying map collection. Any change to the map implementation could result in a different tag ordering, which would result in a different mbean name. In https://github.com/apache/kafka/pull/11970, we reimplemented the metric naming function to rely on LinkedHashMap so that the ordering is explicit. We should try to upgrade current metrics to rely on the new method, which probably will involve ensuring that we have good test coverage for registered metrics. At a minimum, we should ensure new metrics use the explicit ordering. Perhaps we can consider deprecating the old methods or creating a new `SafeKafkaMetricsGroup` implementation.
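A minimal sketch of the idea (not Kafka's actual `KafkaMetricsGroup` code): building the mbean name from a `LinkedHashMap` makes the tag order part of the caller's contract, whereas a plain `HashMap`'s iteration order is an implementation detail that could silently change the name.

```java
import java.util.LinkedHashMap;
import java.util.stream.Collectors;

public class MbeanNames {
    // Build an mbean name from explicitly ordered tags. LinkedHashMap
    // iterates in insertion order, so the resulting name is deterministic.
    static String mbeanName(String group, String type, String name,
                            LinkedHashMap<String, String> tags) {
        String tagPart = tags.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(","));
        String base = group + ":type=" + type + ",name=" + name;
        return tagPart.isEmpty() ? base : base + "," + tagPart;
    }
}
```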
[jira] [Created] (KAFKA-13782) Producer may fail to add the correct partition to transaction
Jason Gustafson created KAFKA-13782:
Summary: Producer may fail to add the correct partition to transaction
Key: KAFKA-13782
URL: https://issues.apache.org/jira/browse/KAFKA-13782
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
Fix For: 3.2.0, 3.1.1

In KAFKA-13412, we changed the logic to add partitions to transactions in the producer. The intention was to ensure that the partition is added in `TransactionManager` before the record is appended to the `RecordAccumulator`. However, this does not take into account the possibility that the originally selected partition may be changed if `abortForNewBatch` is set in `RecordAppendResult` in the call to `RecordAccumulator.append`. When this happens, the partitioner can choose a different partition, which means that the `TransactionManager` would be tracking the wrong partition. I think the consequence of this is that the batches sent to this partition would get stuck in the `RecordAccumulator` until they timed out because we validate before sending that the partition has been added correctly to the transaction. Note that KAFKA-13412 has not been included in any release, so there are no affected versions. Thanks to [~alivshits] for identifying the bug.
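The hazard can be shown with a toy simulation. This is not the producer's real code; `appendBuggy`/`appendFixed` are hypothetical names that model the ordering of "register partition with the transaction" versus "append to the accumulator" when `abortForNewBatch` forces the partitioner to pick again.

```java
import java.util.HashSet;
import java.util.Set;

public class TxnPartitionTracking {
    final Set<Integer> partitionsInTxn = new HashSet<>();

    // Buggy ordering: register the first pick, then append. If the append
    // aborts for a new batch and the partitioner re-picks, the transaction
    // manager is left tracking the wrong partition.
    int appendBuggy(int firstPick, boolean abortForNewBatch, int secondPick) {
        partitionsInTxn.add(firstPick);
        return abortForNewBatch ? secondPick : firstPick;
    }

    // Safer ordering for this toy model: only register the partition the
    // append actually used.
    int appendFixed(int firstPick, boolean abortForNewBatch, int secondPick) {
        int finalPartition = abortForNewBatch ? secondPick : firstPick;
        partitionsInTxn.add(finalPartition);
        return finalPartition;
    }
}
```

In the buggy ordering, the batch lands in a partition that was never added to the transaction, which is exactly why the pre-send validation leaves it stuck in the accumulator.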
[jira] [Created] (KAFKA-13781) Ensure consistent inter-broker listeners in KRaft
Jason Gustafson created KAFKA-13781:
Summary: Ensure consistent inter-broker listeners in KRaft
Key: KAFKA-13781
URL: https://issues.apache.org/jira/browse/KAFKA-13781
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson

A common cause of cluster misconfiguration is having mismatched inter-broker listeners. This can cause replication to fail, or in the case of zk clusters, it can prevent the controller from sending LeaderAndIsr state to the brokers. In KRaft, we can detect the misconfiguration at registration time on the controller and let the broker fail startup. This would allow users to detect the problem much more quickly.
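The registration-time check could look roughly like this. The class and method names are hypothetical, not Kafka's actual controller code: the controller rejects a registration whose advertised listeners do not include the expected inter-broker listener, so the broker fails fast at startup.

```java
import java.util.Set;

public class ListenerValidation {
    // Fail the broker registration when the inter-broker listener name is
    // not among the listeners the broker advertised.
    static void validateRegistration(Set<String> advertisedListeners,
                                     String interBrokerListenerName) {
        if (!advertisedListeners.contains(interBrokerListenerName)) {
            throw new IllegalStateException(
                "Broker registration is missing inter-broker listener: "
                    + interBrokerListenerName);
        }
    }
}
```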
[jira] [Created] (KAFKA-13753) Log cleaner should retain transaction metadata in index until corresponding marker is removed
Jason Gustafson created KAFKA-13753:
Summary: Log cleaner should retain transaction metadata in index until corresponding marker is removed
Key: KAFKA-13753
URL: https://issues.apache.org/jira/browse/KAFKA-13753
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson

Currently the log cleaner will remove aborted transactions from the index as soon as it detects that the data from the transaction is gone. It does not wait until the corresponding marker has also been removed. Although it is extremely unlikely, it seems possible today that a Fetch might fail to return the aborted transaction metadata correctly if a log cleaning occurs concurrently. This is because the collection of aborted transactions is only done after reading the data from the log. It would be safer to preserve the aborted transaction metadata in the index until the marker is also removed.
[jira] [Resolved] (KAFKA-13727) Edge case in cleaner can result in premature removal of ABORT marker
[ https://issues.apache.org/jira/browse/KAFKA-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13727.
Fix Version/s: 2.8.2, 3.1.1, 3.0.2
Resolution: Fixed

> Edge case in cleaner can result in premature removal of ABORT marker
> Key: KAFKA-13727
> URL: https://issues.apache.org/jira/browse/KAFKA-13727
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Priority: Major
> Fix For: 2.8.2, 3.1.1, 3.0.2
>
> The log cleaner works by first building a map of the active keys beginning from the dirty offset, and then scanning forward from the beginning of the log to decide which records should be retained based on whether they are included in the map. The map of keys has a limited size. As soon as it fills up, we stop building it. The offset corresponding to the last record that was included in the map becomes the next dirty offset. Then when we are cleaning, we stop scanning forward at the dirty offset. Or to be more precise, we continue scanning until the end of the segment which includes the dirty offset, but all records above that offset are copied as is without checking the map of active keys.
> Compaction is complicated by the presence of transactions. The cleaner must keep track of which transactions have data remaining so that it can tell when it is safe to remove the respective markers. It works a bit like the consumer. Before scanning a segment, the cleaner consults the aborted transaction index to figure out which transactions have been aborted. All other transactions are considered committed.
> The problem we have found is that the cleaner does not take into account the range of offsets between the dirty offset and the end offset of the segment containing it when querying ahead for aborted transactions. This means that when the cleaner is scanning forward from the dirty offset, it does not have the complete set of aborted transactions. The main consequence of this is that abort markers associated with transactions which start within this range of offsets become eligible for deletion even before the corresponding data has been removed from the log.
> Here is an example. Suppose that the log contains the following entries:
> offset=0, key=a
> offset=1, key=b
> offset=2, COMMIT
> offset=3, key=c
> offset=4, key=d
> offset=5, COMMIT
> offset=6, key=b
> offset=7, ABORT
> Suppose we have an offset map which can only contain 2 keys and the dirty offset starts at 0. The first time we scan forward, we will build a map with keys a and b, which will allow us to move the dirty offset up to 3. Due to the issue documented here, we will not detect the aborted transaction starting at offset 6. But it will not be eligible for deletion on this round of cleaning because it is bound by `delete.retention.ms`. Instead, our new logic will set the deletion horizon for this batch based on the current time plus the configured `delete.retention.ms`:
> offset=0, key=a
> offset=1, key=b
> offset=2, COMMIT
> offset=3, key=c
> offset=4, key=d
> offset=5, COMMIT
> offset=6, key=b
> offset=7, ABORT (deleteHorizon: N)
> Suppose that the time reaches N+1 before the next cleaning. We will begin from the dirty offset of 3 and collect keys c and d before stopping at offset 6. Again, we will not detect the aborted transaction beginning at offset 6 since it is out of the range. This time when we scan, the marker at offset 7 will be deleted because the transaction will be seen as empty and now the deletion horizon has passed. So we end up with this state:
> offset=0, key=a
> offset=1, key=b
> offset=2, COMMIT
> offset=3, key=c
> offset=4, key=d
> offset=5, COMMIT
> offset=6, key=b
> Effectively it becomes a hanging transaction. The interesting thing is that we might not even detect it. As far as the leader is concerned, it had already completed that transaction, so it is not expecting any additional markers. The transaction index would have been rewritten without the aborted transaction when the log was cleaned, so any consumer fetching the data would see the transaction as committed. On the other hand, if we did a reassignment to a new replica, or if we had to rebuild the full log state during recovery, then we would suddenly detect it.
> I am not sure how likely this scenario is in practice. I think it's fair to say it is an extremely rare case. The cleaner has to fail to clean a full segment at least two times, and you still need enough time to pass for the marker's deletion horizon to be reached. Perhaps it is possible if the cardinality of keys is very high and the configured memory limit for the cleaner is low.
[jira] [Created] (KAFKA-13727) Edge case in cleaner can result in premature removal of ABORT marker
Jason Gustafson created KAFKA-13727:
Summary: Edge case in cleaner can result in premature removal of ABORT marker
Key: KAFKA-13727
URL: https://issues.apache.org/jira/browse/KAFKA-13727
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson

The log cleaner works by first building a map of the active keys beginning from the dirty offset, and then scanning forward from the beginning of the log to decide which records should be retained based on whether they are included in the map. The map of keys has a limited size. As soon as it fills up, we stop building it. The offset corresponding to the last record that was included in the map becomes the next dirty offset. Then when we are cleaning, we stop scanning forward at the dirty offset. Or to be more precise, we continue scanning until the end of the segment which includes the dirty offset, but all records above that offset are copied as is without checking the map of active keys.
Compaction is complicated by the presence of transactions. The cleaner must keep track of which transactions have data remaining so that it can tell when it is safe to remove the respective markers. It works a bit like the consumer. Before scanning a segment, the cleaner consults the aborted transaction index to figure out which transactions have been aborted. All other transactions are considered committed.
The problem we have found is that the cleaner does not take into account the range of offsets between the dirty offset and the end offset of the segment containing it when querying ahead for aborted transactions. This means that when the cleaner is scanning forward from the dirty offset, it does not have the complete set of aborted transactions. The main consequence of this is that abort markers associated with transactions which start within this range of offsets become eligible for deletion even before the corresponding data has been removed from the log.
Here is an example. Suppose that the log contains the following entries:
offset=0, key=a
offset=1, key=b
offset=2, COMMIT
offset=3, key=c
offset=4, key=d
offset=5, COMMIT
offset=6, key=b
offset=7, ABORT
Suppose we have an offset map which can only contain 2 keys and the dirty offset starts at 0. The first time we scan forward, we will build a map with keys a and b, which will allow us to move the dirty offset up to 3. Due to the issue documented here, we will not detect the aborted transaction starting at offset 6. But it will not be eligible for deletion on this round of cleaning because it is bound by `delete.retention.ms`. Instead, our new logic will set the deletion horizon for this batch based on the current time plus the configured `delete.retention.ms`:
offset=0, key=a
offset=1, key=b
offset=2, COMMIT
offset=3, key=c
offset=4, key=d
offset=5, COMMIT
offset=6, key=b
offset=7, ABORT (deleteHorizon: N)
Suppose that the time reaches N+1 before the next cleaning. We will begin from the dirty offset of 3 and collect keys c and d before stopping at offset 6. Again, we will not detect the aborted transaction beginning at offset 6 since it is out of the range. This time when we scan, the marker at offset 7 will be deleted because the transaction will be seen as empty and now the deletion horizon has passed. So we end up with this state:
offset=0, key=a
offset=1, key=b
offset=2, COMMIT
offset=3, key=c
offset=4, key=d
offset=5, COMMIT
offset=6, key=b
Effectively it becomes a hanging transaction. The interesting thing is that we might not even detect it. As far as the leader is concerned, it had already completed that transaction, so it is not expecting any additional markers. The transaction index would have been rewritten without the aborted transaction when the log was cleaned, so any consumer fetching the data would see the transaction as committed. On the other hand, if we did a reassignment to a new replica, or if we had to rebuild the full log state during recovery, then we would suddenly detect it.
I am not sure how likely this scenario is in practice. I think it's fair to say it is an extremely rare case. The cleaner has to fail to clean a full segment at least two times, and you still need enough time to pass for the marker's deletion horizon to be reached. Perhaps it is possible if the cardinality of keys is very high and the configured memory limit for the cleaner is low.
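The dirty-offset mechanics in the example can be modeled with a toy offset map. This is a simplified illustration, not the real LogCleaner: an offset map capped at 2 distinct keys is built from the dirty offset, and the next dirty offset is the offset just past the last record that fit in the map.

```java
import java.util.List;
import java.util.Map;

public class OffsetMapSketch {
    // A log entry: control records (COMMIT/ABORT) carry a null key.
    record Entry(long offset, String key) {}

    // Build the key map starting at dirtyOffset; stop once a new key would
    // exceed maxKeys. Returns the next dirty offset.
    static long buildMap(List<Entry> log, long dirtyOffset, int maxKeys,
                         Map<String, Long> offsetMap) {
        long nextDirty = dirtyOffset;
        for (Entry e : log) {
            if (e.offset() < dirtyOffset) continue;
            if (e.key() != null) {
                if (!offsetMap.containsKey(e.key()) && offsetMap.size() >= maxKeys) {
                    break;  // map is full: stop building here
                }
                offsetMap.put(e.key(), e.offset());
            }
            nextDirty = e.offset() + 1;
        }
        return nextDirty;
    }
}
```

Running this over the example log reproduces the two passes described above: the first pass (dirty offset 0) stops with a next dirty offset of 3, and the second pass (dirty offset 3) stops at offset 6, just before the aborted transaction.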
[jira] [Resolved] (KAFKA-13704) Include TopicId in kafka-topics describe output
[ https://issues.apache.org/jira/browse/KAFKA-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13704.
Resolution: Duplicate

> Include TopicId in kafka-topics describe output
> Key: KAFKA-13704
> URL: https://issues.apache.org/jira/browse/KAFKA-13704
> Project: Kafka
> Issue Type: Improvement
> Reporter: Jason Gustafson
> Priority: Major
>
> It would be helpful if `kafka-topics --describe` displayed the TopicId when we have it available.
[jira] [Created] (KAFKA-13704) Include TopicId in kafka-topics describe output
Jason Gustafson created KAFKA-13704:
Summary: Include TopicId in kafka-topics describe output
Key: KAFKA-13704
URL: https://issues.apache.org/jira/browse/KAFKA-13704
Project: Kafka
Issue Type: Improvement
Reporter: Jason Gustafson

It would be helpful if `kafka-topics --describe` displayed the TopicId when we have it available.