[jira] [Created] (KAFKA-16863) Consider removing `default.` prefix for exception handler config
Matthias J. Sax created KAFKA-16863: --- Summary: Consider removing `default.` prefix for exception handler config Key: KAFKA-16863 URL: https://issues.apache.org/jira/browse/KAFKA-16863 Project: Kafka Issue Type: Improvement Components: streams Reporter: Matthias J. Sax Kafka Streams has a set of configs with a `default.` prefix. The intent of the default-prefix is to make a distinction between, well, the default, and in-place overwrites in the code. E.g., users can specify ts-extractors on a per-topic basis. However, for the deserialization- and production-exception handlers, no such overwrites are possible, and thus `default.` does not really make sense, because there is just one handler overall. Via KIP-1033 we added a new processing-exception handler w/o a default-prefix, too. Thus, we should consider deprecating the two existing config names and adding them back w/o the `default.` prefix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
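A rename-with-deprecation like the one proposed could resolve configs with a fallback along these lines; the un-prefixed key name and the helper are illustrative assumptions, not the actual Kafka Streams implementation:

```java
import java.util.Map;

// Illustrative sketch of the proposed rename: prefer the new un-prefixed
// config name and fall back to the deprecated `default.`-prefixed alias.
// The key names and helper are assumptions, not Kafka Streams' actual API.
public class ConfigAlias {
    static final String NEW_KEY = "deserialization.exception.handler";
    static final String DEPRECATED_KEY = "default." + NEW_KEY;

    // Returns the configured handler class name, honoring the old alias;
    // null when neither key is set.
    public static String resolve(Map<String, String> props) {
        if (props.containsKey(NEW_KEY)) {
            return props.get(NEW_KEY);
        }
        return props.get(DEPRECATED_KEY);
    }
}
```

During a deprecation window both names would be accepted, with the new name winning when both are set.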
[jira] [Created] (KAFKA-16862) Refactor ConsumerTaskTest to be deterministic and avoid tight loops
Greg Harris created KAFKA-16862: --- Summary: Refactor ConsumerTaskTest to be deterministic and avoid tight loops Key: KAFKA-16862 URL: https://issues.apache.org/jira/browse/KAFKA-16862 Project: Kafka Issue Type: Task Components: Tiered-Storage Affects Versions: 3.8.0 Reporter: Greg Harris The ConsumerTaskTest instantiates a MockConsumer and uses this MockConsumer instance in the ConsumerTask, which is run in a background thread. * This causes the background thread to tight-loop on MockConsumer#poll, which has no sleep or other delay mechanism. This wastes CPU cycles and makes it impossible to use MockConsumer#schedulePollTask to meaningfully mock out the poll behavior. * The test thread then needs to use TestUtils.waitForCondition, which repeatedly polls for a result until it is satisfactory, wasting CPU cycles and introducing opportunities for timeout errors. (The test is not currently flaky in CI, so this is less of a concern.) Instead, the ConsumerTaskTest can be rewritten to not use a background thread, and to make all calls to the MockConsumer on the same thread. This is the model that the DistributedHerderTest uses with DistributedHerder#tick, and WorkerSinkTaskTest uses with WorkerSinkTask#iteration. AbstractWorkerSourceTaskTest uses a similar model with multiple methods, the most notable being AbstractWorkerSourceTask#sendRecords. -- This message was sent by Atlassian Jira (v8.20.10#820010)
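The single-threaded model described above can be sketched generically: the test enqueues work on a mock source and drives one loop iteration at a time, so no background thread and no waitForCondition polling are needed. These classes are illustrative stand-ins, not the real ConsumerTask or MockConsumer API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Stand-in types sketching the single-threaded test model; not the actual
// ConsumerTask/MockConsumer classes.
public class SingleThreadedModel {
    static class MockSource {
        private final Queue<String> records = new ArrayDeque<>();
        void enqueueRecord(String record) { records.add(record); }
        String poll() { return records.poll(); } // null when nothing is queued
    }

    static class Task {
        private final MockSource source;
        final List<String> processed = new ArrayList<>();
        Task(MockSource source) { this.source = source; }

        // One deterministic loop iteration, invoked from the test thread,
        // analogous to DistributedHerder#tick or WorkerSinkTask#iteration.
        void iteration() {
            String record = source.poll();
            if (record != null) processed.add(record);
        }
    }

    public static List<String> run() {
        MockSource source = new MockSource();
        Task task = new Task(source);
        source.enqueueRecord("r1");
        source.enqueueRecord("r2");
        task.iteration(); // processes r1
        task.iteration(); // processes r2
        task.iteration(); // no-op: queue empty, and no busy-waiting occurred
        return task.processed;
    }
}
```

Because the test drives each iteration itself, assertions can run immediately after the iteration that should have produced the result, with no timeouts.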
[jira] [Resolved] (KAFKA-16771) First log directory printed twice when formatting storage
[ https://issues.apache.org/jira/browse/KAFKA-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16771. Fix Version/s: 3.8.0 Resolution: Fixed > First log directory printed twice when formatting storage > - > > Key: KAFKA-16771 > URL: https://issues.apache.org/jira/browse/KAFKA-16771 > Project: Kafka > Issue Type: Task > Components: tools >Affects Versions: 3.7.0 >Reporter: Mickael Maison >Assignee: xuanzhang gong >Priority: Major > Fix For: 3.8.0 > > > If multiple log directories are set, when running bin/kafka-storage.sh > format, the first directory is printed twice. For example: > {noformat} > bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c > config/kraft/server.properties --release-version 3.6 > metaPropertiesEnsemble=MetaPropertiesEnsemble(metadataLogDir=Optional.empty, > dirs={/tmp/kraft-combined-logs: EMPTY, /tmp/kraft-combined-logs2: EMPTY}) > Formatting /tmp/kraft-combined-logs with metadata.version 3.6-IV2. > Formatting /tmp/kraft-combined-logs with metadata.version 3.6-IV2. > Formatting /tmp/kraft-combined-logs2 with metadata.version 3.6-IV2. > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16861) Don't convert the group to classic if the size is larger than the group max size
Chia-Ping Tsai created KAFKA-16861: -- Summary: Don't convert the group to classic if the size is larger than the group max size Key: KAFKA-16861 URL: https://issues.apache.org/jira/browse/KAFKA-16861 Project: Kafka Issue Type: Bug Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai It should be a one-line fix [0] [0] https://github.com/apache/kafka/blob/2d9994e0de915037525f041ff9a9b9325f838938/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java#L810 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16722) Add ConsumerGroupPartitionAssignor interface
[ https://issues.apache.org/jira/browse/KAFKA-16722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot resolved KAFKA-16722. - Reviewer: David Jacot Resolution: Fixed > Add ConsumerGroupPartitionAssignor interface > > > Key: KAFKA-16722 > URL: https://issues.apache.org/jira/browse/KAFKA-16722 > Project: Kafka > Issue Type: Sub-task >Reporter: Andrew Schofield >Assignee: Andrew Schofield >Priority: Major > Fix For: 3.8.0 > > > Adds the interface > `org.apache.kafka.coordinator.group.assignor.ConsumerGroupPartitionAssignor` > as described in KIP-932. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16705) the flag "started" of RaftClusterInstance is false even though the cluster is started
[ https://issues.apache.org/jira/browse/KAFKA-16705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16705. Fix Version/s: 3.8.0 Resolution: Fixed > the flag "started" of RaftClusterInstance is false even though the cluster is > started > - > > Key: KAFKA-16705 > URL: https://issues.apache.org/jira/browse/KAFKA-16705 > Project: Kafka > Issue Type: Bug >Reporter: Chia-Ping Tsai >Assignee: xuanzhang gong >Priority: Minor > Fix For: 3.8.0 > > > we should set `started` to true after > https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/test/java/kafka/test/junit/RaftClusterInvocationContext.java#L113 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16569) Target Assignment Format Change
[ https://issues.apache.org/jira/browse/KAFKA-16569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot resolved KAFKA-16569. - Resolution: Won't Do > Target Assignment Format Change > --- > > Key: KAFKA-16569 > URL: https://issues.apache.org/jira/browse/KAFKA-16569 > Project: Kafka > Issue Type: Sub-task >Reporter: Ritika Reddy >Assignee: Ritika Reddy >Priority: Major > > Currently, the assignment is stored as Map>; we > want to change it to a list > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16832) LeaveGroup API for upgrading ConsumerGroup
[ https://issues.apache.org/jira/browse/KAFKA-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot resolved KAFKA-16832. - Fix Version/s: 3.8.0 Resolution: Fixed > LeaveGroup API for upgrading ConsumerGroup > -- > > Key: KAFKA-16832 > URL: https://issues.apache.org/jira/browse/KAFKA-16832 > Project: Kafka > Issue Type: Sub-task >Reporter: Dongnuo Lyu >Assignee: Dongnuo Lyu >Priority: Major > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16399) Add JBOD support in tiered storage
[ https://issues.apache.org/jira/browse/KAFKA-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Chen resolved KAFKA-16399. --- Fix Version/s: 3.8.0 Resolution: Fixed > Add JBOD support in tiered storage > -- > > Key: KAFKA-16399 > URL: https://issues.apache.org/jira/browse/KAFKA-16399 > Project: Kafka > Issue Type: Improvement >Reporter: Luke Chen >Assignee: Luke Chen >Priority: Major > Fix For: 3.8.0 > > > Add JBOD support in tiered storage > Currently, when JBOD is configured, the Tiered Storage feature is forced to > be disabled. This Jira is to fix the gap. And why is that important? Because > it doesn't make sense that users cannot use JBOD storage just because they > want to use the Tiered Storage feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16860) Introduce `group.version` feature flag
David Jacot created KAFKA-16860: --- Summary: Introduce `group.version` feature flag Key: KAFKA-16860 URL: https://issues.apache.org/jira/browse/KAFKA-16860 Project: Kafka Issue Type: Sub-task Reporter: David Jacot Assignee: David Jacot Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16847) Revise the README for recent CI changes
[ https://issues.apache.org/jira/browse/KAFKA-16847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16847. Resolution: Invalid > Revise the README for recent CI changes > > > Key: KAFKA-16847 > URL: https://issues.apache.org/jira/browse/KAFKA-16847 > Project: Kafka > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Cheng-Kai, Zhang >Priority: Minor > > The recent change [0] removes the tests for Java 11 and 17, which is good for our > CI resources. However, in the root README [1] we still claim "We build and test > Apache Kafka with Java 8, 11, 17 and 21" > [0] > https://github.com/apache/kafka/commit/adab48df6830259d33bd9705b91885c4f384f267 > [1] https://github.com/apache/kafka/blob/trunk/README.md?plain=1#L7 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16796) Introduce new org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder
[ https://issues.apache.org/jira/browse/KAFKA-16796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16796. Fix Version/s: 3.8.0 Resolution: Fixed > Introduce new org.apache.kafka.tools.api.Decoder to replace > kafka.serializer.Decoder > > > Key: KAFKA-16796 > URL: https://issues.apache.org/jira/browse/KAFKA-16796 > Project: Kafka > Issue Type: Sub-task >Reporter: Chia-Ping Tsai >Assignee: PoAn Yang >Priority: Minor > Labels: need-kip > Fix For: 3.8.0 > > > We need a replacement in order to complete > https://issues.apache.org/jira/browse/KAFKA-14579 in Kafka 4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16859) Cleanup check if tiered storage is enabled
Mickael Maison created KAFKA-16859: -- Summary: Cleanup check if tiered storage is enabled Key: KAFKA-16859 URL: https://issues.apache.org/jira/browse/KAFKA-16859 Project: Kafka Issue Type: Task Reporter: Mickael Maison We have 2 ways to detect whether tiered storage is enabled: - KafkaConfig.isRemoteLogStorageSystemEnabled - KafkaConfig.remoteLogManagerConfig().enableRemoteStorageSystem() We use both in various files. We should stick with one way to do it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16516) Fix the controller node provider for broker to controller channel
[ https://issues.apache.org/jira/browse/KAFKA-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] José Armando García Sancio resolved KAFKA-16516. Resolution: Fixed > Fix the controller node provider for broker to controller channel > -- > > Key: KAFKA-16516 > URL: https://issues.apache.org/jira/browse/KAFKA-16516 > Project: Kafka > Issue Type: Sub-task > Components: core >Reporter: José Armando García Sancio >Assignee: Colin McCabe >Priority: Major > Fix For: 3.8.0 > > > The broker to controller channel gets the set of voters directly from the > static configuration. This needs to change so that the leader node comes > from the kraft client/manager. > The code is in KafkaServer where it constructs the RaftControllerNodeProvider. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16515) Fix the ZK Metadata cache use of voter static configuration
[ https://issues.apache.org/jira/browse/KAFKA-16515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] José Armando García Sancio resolved KAFKA-16515. Resolution: Fixed > Fix the ZK Metadata cache use of voter static configuration > --- > > Key: KAFKA-16515 > URL: https://issues.apache.org/jira/browse/KAFKA-16515 > Project: Kafka > Issue Type: Sub-task > Components: core >Reporter: José Armando García Sancio >Assignee: Colin McCabe >Priority: Major > Fix For: 3.8.0 > > > It looks like, because of the ZK migration to KRaft, the ZK Metadata cache was changed > to read the static voter configuration. This needs to change to use the voter > nodes reported by the raft manager or the kraft client. > The injection code is in KafkaServer where it constructs > MetadataCache.zkMetadata. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16759) Invalid client telemetry transition on consumer close
[ https://issues.apache.org/jira/browse/KAFKA-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Schofield resolved KAFKA-16759. -- Resolution: Fixed > Invalid client telemetry transition on consumer close > - > > Key: KAFKA-16759 > URL: https://issues.apache.org/jira/browse/KAFKA-16759 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Andrew Schofield >Assignee: Andrew Schofield >Priority: Minor > Fix For: 3.8.0 > > > Using the console consumer with client telemetry enabled, I hit an invalid > state transition when closing the consumer with CTRL-C. The consumer sends a > final "terminating" telemetry push which puts the client telemetry reporter > into TERMINATING_PUSH_IN_PROGRESS state. When it receives a response in this > state, it attempts an invalid state transition. > > {noformat} > [2024-05-13 19:19:35,804] WARN Error updating client telemetry state, > disabled telemetry > (org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter) > java.lang.IllegalStateException: Invalid telemetry state transition from > TERMINATING_PUSH_IN_PROGRESS to PUSH_NEEDED; the valid telemetry state > transitions from TERMINATING_PUSH_IN_PROGRESS are: TERMINATED > at > org.apache.kafka.common.telemetry.ClientTelemetryState.validateTransition(ClientTelemetryState.java:163) > at > org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter$DefaultClientTelemetrySender.maybeSetState(ClientTelemetryReporter.java:827) > at > org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter$DefaultClientTelemetrySender.handleResponse(ClientTelemetryReporter.java:520) > at > org.apache.kafka.clients.NetworkClient$TelemetrySender.handleResponse(NetworkClient.java:1321) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:948) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:594) > at > 
org.apache.kafka.clients.consumer.internals.NetworkClientDelegate.poll(NetworkClientDelegate.java:130) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.sendUnsentRequests(ConsumerNetworkThread.java:262) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.cleanup(ConsumerNetworkThread.java:275) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.run(ConsumerNetworkThread.java:95) > [2024-05-13 19:19:35,805] WARN Unable to transition state after successful > push telemetry from state TERMINATING_PUSH_IN_PROGRESS > (org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
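The valid-transition rule quoted in the log (from TERMINATING_PUSH_IN_PROGRESS only TERMINATED is allowed) can be modeled as a lookup table. This is a simplified illustration, not Kafka's actual ClientTelemetryState class, and the PUSH_NEEDED edge shown is an assumption:

```java
import java.util.Map;
import java.util.Set;

// Simplified model of telemetry-state transition validation. State names
// mirror the log message; the transition table itself is illustrative.
public class TelemetryStates {
    enum State { PUSH_NEEDED, TERMINATING_PUSH_IN_PROGRESS, TERMINATED }

    static final Map<State, Set<State>> VALID = Map.of(
        State.PUSH_NEEDED, Set.of(State.TERMINATING_PUSH_IN_PROGRESS),
        // Per the error message, the only valid successor is TERMINATED.
        State.TERMINATING_PUSH_IN_PROGRESS, Set.of(State.TERMINATED),
        State.TERMINATED, Set.of()
    );

    public static boolean canTransition(State from, State to) {
        return VALID.getOrDefault(from, Set.of()).contains(to);
    }
}
```

The bug is a response handler attempting TERMINATING_PUSH_IN_PROGRESS -> PUSH_NEEDED, which a table like this correctly rejects.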
[jira] [Resolved] (KAFKA-5451) Kafka Connect should scan classpath asynchronously
[ https://issues.apache.org/jira/browse/KAFKA-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Harris resolved KAFKA-5451. Resolution: Won't Do Kafka Connect needs the results of plugin scanning to answer basic REST queries, and to be assigned workloads via joining the group. Without finalized scan results, neither of these operations can meaningfully complete. Rather than make the scanning asynchronous, we have elected to make it faster via KAFKA-14627/KIP-898. It no longer makes sense to async-process something that takes <1s. > Kafka Connect should scan classpath asynchronously > -- > > Key: KAFKA-5451 > URL: https://issues.apache.org/jira/browse/KAFKA-5451 > Project: Kafka > Issue Type: Improvement > Components: connect >Affects Versions: 0.11.0.0 >Reporter: Randall Hauch >Assignee: Randall Hauch >Priority: Major > Labels: performance > Original Estimate: 24h > Remaining Estimate: 24h > > When Kafka Connect workers start up, they scan the classpath and module paths > for connectors, transformations, and converters. This takes anywhere from > 15-30sec or longer depending upon how many JARs are included. Currently, this > scanning is done synchronously during startup of the Kafka Connect workers, > even though the workers may not need the result of the scan. > The scanning logic should be asynchronous and should only block any > components that require the result of the scan. This will improve startup > time of the workers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-6208) Reduce startup time for Kafka Connect workers
[ https://issues.apache.org/jira/browse/KAFKA-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Harris resolved KAFKA-6208. Fix Version/s: 3.6.0 Resolution: Fixed This is fixed by setting plugin.discovery=service_load on 3.6+, see KAFKA-14627/KIP-898 for more details. > Reduce startup time for Kafka Connect workers > - > > Key: KAFKA-6208 > URL: https://issues.apache.org/jira/browse/KAFKA-6208 > Project: Kafka > Issue Type: Improvement > Components: connect >Affects Versions: 1.0.0 >Reporter: Randall Hauch >Priority: Major > Labels: needs-kip > Fix For: 3.6.0 > > > Kafka Connect startup times are excessive with a handful of connectors on the > plugin path or classpath. We should not be scanning three times (once for > connectors, once for SMTs, and once for converters), and hopefully we can > avoid scanning directories that are clearly not plugin directories. > We should also consider using Java's Service Loader to quickly identify > connectors. The latter would require a KIP and would require time to for > connectors to migrate, but we could be smarter about only scanning plugin > directories that need to be scanned. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16858) Flatten SMT throws NPE
Adam Strickland created KAFKA-16858: --- Summary: Flatten SMT throws NPE Key: KAFKA-16858 URL: https://issues.apache.org/jira/browse/KAFKA-16858 Project: Kafka Issue Type: Bug Components: connect Affects Versions: 3.6.0 Environment: Kafka 3.6 by way of CP 7.6.0 Reporter: Adam Strickland Attachments: FlattenTest.java {{ConnectSchema.expectedClassesFor}} sometimes will throw an NPE as part of a call to an SMT chain. Stack trace snippet: {{at com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.apply(MomentFlatten.java:84)}} {{at com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.applyWithSchema(MomentFlatten.java:173)}} {{at com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.buildWithSchema(MomentFlatten.java:280)}} {{at com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.buildWithSchema(MomentFlatten.java:280)}} {{at com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.buildWithSchema(MomentFlatten.java:280)}} {{at com.github.momenttechnology.kafka.connect.transforms.MomentFlatten.buildWithSchema(MomentFlatten.java:274)}} {{at org.apache.kafka.connect.data.Struct.put(Struct.java:203)}} {{at org.apache.kafka.connect.data.Struct.put(Struct.java:216)}} {{at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:255)}} {{at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:213)}} {{at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:224)}} {{at org.apache.kafka.connect.data.ConnectSchema.expectedClassesFor(ConnectSchema.java:268)}} (the above transform is a sub-class of {{{}o.a.k.connect.transforms.Flatten{}}}; have confirmed that the error occurs regardless). The field being transformed is an array of structs. If the call to {{Schema#valueSchema()}} (o.a.k.connect.data.ConnectSchema.java:255) returns {{{}null{}}}, the subsequent call to {{Schema#name()}} at o.a.k.connect.data.ConnectSchema:268 throws an NPE. 
The strange thing that we have observed is that this doesn't always happen; *sometimes* the struct's schema is found and sometimes it is not. We have been unable to determine the root cause, but have constructed a test that replicates the problem as observed (see attachment). In our case we have worked around the issue with the aforementioned sub-class of {{{}Flatten{}}}, catching and logging the NPE on that specific use-case. -- This message was sent by Atlassian Jira (v8.20.10#820010)
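A null guard along the lines of the reporter's workaround might look like the following; the Schema interface here is a minimal stand-in for org.apache.kafka.connect.data.Schema, and the helper name is hypothetical:

```java
// Stand-in for the relevant slice of org.apache.kafka.connect.data.Schema.
// The guard shows the idea of the workaround: check valueSchema() for null
// before calling name() on it, instead of letting an NPE propagate.
public class NullSafeFlatten {
    interface Schema {
        Schema valueSchema(); // element schema for array types; may be null (the bug)
        String name();
    }

    record FixedSchema(Schema value, String schemaName) implements Schema {
        public Schema valueSchema() { return value; }
        public String name() { return schemaName; }
    }

    // Hypothetical helper: the array element schema's name, or a fallback
    // when the element schema is unexpectedly null.
    public static String elementSchemaName(Schema arraySchema, String fallback) {
        Schema element = arraySchema.valueSchema();
        return element == null ? fallback : element.name();
    }
}
```

This only papers over the symptom (as the reporter notes, their sub-class catches and logs the NPE); the root cause of the intermittently missing struct schema is still open.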
[jira] [Created] (KAFKA-16857) Zookeeper - Add new ZNodes
Christo Lolov created KAFKA-16857: - Summary: Zookeeper - Add new ZNodes Key: KAFKA-16857 URL: https://issues.apache.org/jira/browse/KAFKA-16857 Project: Kafka Issue Type: Sub-task Reporter: Christo Lolov *Summary* Additional information needs to be stored in new ZNodes as part of disablement. Ensure that said information makes it into Zookeeper. {code:java} /brokers/topics/{topic-name}/partitions /tieredstorage/ /tiered_epoch /state {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16856) Zookeeper - Add new exception
Christo Lolov created KAFKA-16856: - Summary: Zookeeper - Add new exception Key: KAFKA-16856 URL: https://issues.apache.org/jira/browse/KAFKA-16856 Project: Kafka Issue Type: Sub-task Reporter: Christo Lolov *Summary* Add TIERED_STORAGE_DISABLEMENT_IN_PROGRESS exception -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16855) KRaft - Wire replaying a TopicRecord
Christo Lolov created KAFKA-16855: - Summary: KRaft - Wire replaying a TopicRecord Key: KAFKA-16855 URL: https://issues.apache.org/jira/browse/KAFKA-16855 Project: Kafka Issue Type: Sub-task Reporter: Christo Lolov *Summary* Replaying a TopicRecord containing a new TieredEpoch and TieredState needs to interact with the two thread pools in the RemoteLogManager to add/remove the correct tasks from each -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16854) Zookeeper - Add v5 of StopReplica
Christo Lolov created KAFKA-16854: - Summary: Zookeeper - Add v5 of StopReplica Key: KAFKA-16854 URL: https://issues.apache.org/jira/browse/KAFKA-16854 Project: Kafka Issue Type: Sub-task Reporter: Christo Lolov -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16853) Split RemoteLogManagerScheduledThreadPool
Christo Lolov created KAFKA-16853: - Summary: Split RemoteLogManagerScheduledThreadPool Key: KAFKA-16853 URL: https://issues.apache.org/jira/browse/KAFKA-16853 Project: Kafka Issue Type: Sub-task Reporter: Christo Lolov *Summary* To begin with, create just the RemoteDataExpirationThreadPool and move expiration to it. Keep all settings as if the only thread pool were the RemoteLogManagerScheduledThreadPool. Ensure that the new thread pool is wired correctly to the RemoteLogManager. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16851) Add remote.log.disable.policy
Christo Lolov created KAFKA-16851: - Summary: Add remote.log.disable.policy Key: KAFKA-16851 URL: https://issues.apache.org/jira/browse/KAFKA-16851 Project: Kafka Issue Type: Sub-task Reporter: Christo Lolov *Summary* Add the configuration as internal-only to begin with. Do not wire it to anything yet; just ensure that it is settable dynamically. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16852) Add *.thread.pool.size
Christo Lolov created KAFKA-16852: - Summary: Add *.thread.pool.size Key: KAFKA-16852 URL: https://issues.apache.org/jira/browse/KAFKA-16852 Project: Kafka Issue Type: Sub-task Reporter: Christo Lolov *Summary* Add the remote.log.manager.copier.thread.pool.size and remote.log.manager.expiration.thread.pool.size configurations as internal-only -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16850) KRaft - Add v2 of TopicRecord
Christo Lolov created KAFKA-16850: - Summary: KRaft - Add v2 of TopicRecord Key: KAFKA-16850 URL: https://issues.apache.org/jira/browse/KAFKA-16850 Project: Kafka Issue Type: Sub-task Reporter: Christo Lolov -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16805) Stop using a ClosureBackedAction to configure Spotbugs reports
[ https://issues.apache.org/jira/browse/KAFKA-16805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16805. Fix Version/s: 3.8.0 Resolution: Fixed > Stop using a ClosureBackedAction to configure Spotbugs reports > -- > > Key: KAFKA-16805 > URL: https://issues.apache.org/jira/browse/KAFKA-16805 > Project: Kafka > Issue Type: Sub-task >Reporter: Greg Harris >Assignee: 黃竣陽 >Priority: Major > Labels: newbie > Fix For: 3.8.0 > > > The org.gradle.util.ClosureBackedAction type has been deprecated. > This is scheduled to be removed in Gradle 9.0. > [Documentation|https://docs.gradle.org/8.7/userguide/upgrading_version_7.html#org_gradle_util_reports_deprecations] > > 1 usage > [https://github.com/apache/kafka/blob/95adb7bfbfc69b3e9f3538cc5d6f7c6a577d30ee/build.gradle#L745-L749] > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16849) ERROR Failed to read /opt/kafka/data/meta.properties
Agostino Sarubbo created KAFKA-16849: Summary: ERROR Failed to read /opt/kafka/data/meta.properties Key: KAFKA-16849 URL: https://issues.apache.org/jira/browse/KAFKA-16849 Project: Kafka Issue Type: Bug Affects Versions: 2.8.1 Environment: RockyLinux-9 openjdk version "1.8.0_412" Reporter: Agostino Sarubbo I'm running a kafka-2.8.1 cluster with 5 machines. Because of how it is configured (I have set a replication factor of 3) I can completely destroy a machine and re-create it from scratch. After re-creating it, it rejoins the cluster and everything works as before. So, right now I'm trying to migrate the hosts from CentOS-7 to RockyLinux-9. The idea was to destroy and re-create the machines one by one. I have increased the loglevel to DEBUG and this is what I get: ``` [2024-05-28 07:35:09,287] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$) [2024-05-28 07:35:09,676] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler) [2024-05-28 07:35:09,680] INFO starting (kafka.server.KafkaServer) [2024-05-28 07:35:09,680] INFO Connecting to zookeeper on zookeeper1.my.domain:2281,zookeeper2.my.domain:2281,zookeeper3.my.domain:2281,zookeeper4.my.domain:2281,zookeeper5.my.domain:2281 (kafka.server.KafkaServer) [2024-05-28 07:35:09,681] DEBUG Checking login config for Zookeeper JAAS context [java.security.auth.login.config=null, zookeeper.sasl.client=default:true, zookeeper.sasl.clientconfig=default:Client] (org.apache.kafka.common.security.JaasUtils) [2024-05-28 07:35:09,695] INFO [ZooKeeperClient Kafka server] Initializing a new session to zookeeper1.my.domain:2281,zookeeper2.my.domain:2281,zookeeper3.my.domain:2281,zookeeper4.my.domain:2281,zookeeper5.my.domain:2281. (kafka.zookeeper.ZooKeeperClient) [2024-05-28 07:35:09,758] DEBUG Initializing task scheduler. (kafka.utils.KafkaScheduler) [2024-05-28 07:35:09,759] INFO [ZooKeeperClient Kafka server] Waiting until connected. 
(kafka.zookeeper.ZooKeeperClient) [2024-05-28 07:35:10,051] DEBUG [ZooKeeperClient Kafka server] Received event: WatchedEvent state:SyncConnected type:None path:null (kafka.zookeeper.ZooKeeperClient) [2024-05-28 07:35:10,053] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient) [2024-05-28 07:35:10,139] INFO [feature-zk-node-event-process-thread]: Starting (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread) [2024-05-28 07:35:10,144] DEBUG Reading feature ZK node at path: /feature (kafka.server.FinalizedFeatureChangeListener) [2024-05-28 07:35:10,251] INFO Updated cache from existing to latest FinalizedFeaturesAndEpoch(features=Features{}, epoch=1). (kafka.server.FinalizedFeatureCache) [2024-05-28 07:35:10,255] INFO Cluster ID = 3G4teZlrS-uT6-Sk5MkbPQ (kafka.server.KafkaServer) [2024-05-28 07:35:10,258] ERROR Failed to read /opt/kafka/data/meta.properties (kafka.server.BrokerMetadataCheckpoint$) java.nio.file.AccessDeniedException: /opt/kafka/data/meta.properties.tmp at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244) at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108) at java.nio.file.Files.deleteIfExists(Files.java:1165) at kafka.server.BrokerMetadataCheckpoint.read(BrokerMetadataCheckpoint.scala:224) at kafka.server.BrokerMetadataCheckpoint$.$anonfun$getBrokerMetadataAndOfflineDirs$2(BrokerMetadataCheckpoint.scala:158) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at 
kafka.server.BrokerMetadataCheckpoint$.getBrokerMetadataAndOfflineDirs(BrokerMetadataCheckpoint.scala:153) at kafka.server.KafkaServer.startup(KafkaServer.scala:206) at kafka.Kafka$.main(Kafka.scala:109) at kafka.Kafka.main(Kafka.scala) [2024-05-28 07:35:10,326] INFO KafkaConfig values: advertised.host.name = null advertised.listeners = null advertised.port = null alter.config.policy.class.name = null alter.log.dirs.replication.quota.window.num = 11 alter.log.dirs.replication.quota.window.size.seconds = 1 authorizer.class.name = auto.create.topics.enable = true auto.leader.rebalance.enable = true background.threads = 10 broker.heartbeat.interval.ms = 2000 bro
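The stack trace shows the failure comes from BrokerMetadataCheckpoint.read deleting a stale meta.properties.tmp before reading meta.properties, and an AccessDeniedException there usually means the rebuilt host's data directory is no longer writable by the broker user (ownership or permissions lost in the rebuild). A minimal Java sketch of that read path, with stand-in code rather than the actual Scala implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the read path implied by the stack trace: delete any stale
// .tmp file left by an interrupted write, then read meta.properties.
// If the directory is not writable by the broker user, deleteIfExists
// throws AccessDeniedException, matching the reported error.
public class MetaPropertiesRead {
    public static String read(Path dataDir) throws IOException {
        Files.deleteIfExists(dataDir.resolve("meta.properties.tmp"));
        Path meta = dataDir.resolve("meta.properties");
        return Files.exists(meta) ? Files.readString(meta) : null;
    }
}
```

Verifying that /opt/kafka/data on the rebuilt RockyLinux-9 host is owned and writable by the user running the broker would be the first thing to check.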
[jira] [Created] (KAFKA-16848) Reverting KRaft migration for "Migrating brokers to KRaft" state is wrong
Luke Chen created KAFKA-16848: - Summary: Reverting KRaft migration for "Migrating brokers to KRaft" state is wrong Key: KAFKA-16848 URL: https://issues.apache.org/jira/browse/KAFKA-16848 Project: Kafka Issue Type: Bug Affects Versions: 3.7.0 Reporter: Luke Chen Hello, I would like to report a mistake in the {_}Kafka 3.7 Documentation -> 6.10 KRaft -> ZooKeeper to KRaft Migration -> Reverting to ZooKeeper mode During the Migration{_}. While migrating my Kafka + Zookeeper cluster to KRaft and testing rollbacks at different migration stages, I have noticed that the "{_}Directions for reverting{_}" provided for "{_}Migrating brokers to KRaft{_}" are wrong. Following the first step provided in the documentation, you are supposed to: _On each broker, remove the process.roles configuration, and restore the zookeeper.connect configuration to its previous value. If your cluster requires other ZooKeeper configurations for brokers, such as zookeeper.ssl.protocol, re-add those configurations as well. Then perform a rolling restart._ In that case, if you remove the _process.roles_ configuration and restore _zookeeper.connect_ as well as other _ZooKeeper_ configurations (if your cluster requires them), you will receive an error that looks like this: [2024-05-28 08:09:49,396] lvl=ERROR Exiting Kafka due to fatal exception logger=kafka.Kafka$ java.lang.IllegalArgumentException: requirement failed: controller.listener.names must be empty when not running in KRaft mode: [CONTROLLER] at scala.Predef$.require(Predef.scala:337) at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:2441) at kafka.server.KafkaConfig.(KafkaConfig.scala:2290) at kafka.server.KafkaConfig.(KafkaConfig.scala:1639) at kafka.Kafka$.buildServer(Kafka.scala:71) at kafka.Kafka$.main(Kafka.scala:90) at kafka.Kafka.main(Kafka.scala) However, I was able to perform the rollback successfully by performing additional steps: * Restore the _zookeeper.metadata.migration.enable=true_ line in the broker configuration; * We are using 
{_}authorizer.class.name{_}, so it also had to be reverted: _org.apache.kafka.metadata.authorizer.StandardAuthorizer_ -> {_}kafka.security.authorizer.AclAuthorizer{_}. I believe these steps should be mentioned in the documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
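Putting the documented step and the extra steps above together, the broker config changes for this rollback stage look roughly like the following sketch (hostnames and the authorizer line are illustrative; your cluster may differ):

```properties
# Remove the KRaft-only settings:
# process.roles=broker
# controller.listener.names=CONTROLLER   <- must also be removed, or the broker
#                                           fails with the IllegalArgumentException above
# controller.quorum.voters=...

# Restore the ZooKeeper settings, including the migration flag:
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.metadata.migration.enable=true
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
```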
[jira] [Created] (KAFKA-16847) Revise the README for recent CI changes
Chia-Ping Tsai created KAFKA-16847: -- Summary: Revise the README for recent CI changes Key: KAFKA-16847 URL: https://issues.apache.org/jira/browse/KAFKA-16847 Project: Kafka Issue Type: Improvement Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai The recent change [0] removed testing with JDK 11 and 17, which is good for our CI resources. However, the root README still claims "We build and test Apache Kafka with Java 8, 11, 17 and 21" [1]. [0] https://github.com/apache/kafka/commit/adab48df6830259d33bd9705b91885c4f384f267 [1] https://github.com/apache/kafka/blob/trunk/README.md?plain=1#L7 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16709) alter logDir within broker might cause log cleanup hanging
[ https://issues.apache.org/jira/browse/KAFKA-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Chen resolved KAFKA-16709. --- Resolution: Fixed > alter logDir within broker might cause log cleanup hanging > -- > > Key: KAFKA-16709 > URL: https://issues.apache.org/jira/browse/KAFKA-16709 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Luke Chen >Assignee: Luke Chen >Priority: Major > Fix For: 3.8.0 > > > When doing alter replica logDirs, we'll create a future log and pause log > cleaning for the partition > ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L1200]). > This log cleaning pause is resumed after alter replica logDirs > completes > ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogManager.scala#L1254]). > When resuming log cleaning, we decrement the > LogCleaningPaused count by 1; only once the count reaches 0 does cleaning > actually resume > ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L310]). > For more explanation of the logCleaningPaused state, see > [here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/log/LogCleanerManager.scala#L55]. > > But there's still one factor that can increase the LogCleaningPaused > count: leadership change > ([here|https://github.com/apache/kafka/blob/643db430a707479c9e87eec1ad67e1d4f43c9268/core/src/main/scala/kafka/server/ReplicaManager.scala#L2126]). > When there's a leadership change, we check whether the partition has a future log; > if so, we create the future log and pause cleaning > (LogCleaningPaused count + 1). 
So, if during the alter replica logDirs: > # alter replica logDirs for tp0 is triggered (LogCleaningPaused count = 1) > # tp0 leadership changes (LogCleaningPaused count = 2) > # alter replica logDirs completes, resuming log cleaning (LogCleaningPaused > count = 1) > # log cleaning stays paused because the count is always > 0 > > Log cleaning is not just related to compacting logs; it also affects > normal log retention processing, which means log retention for these > paused partitions will be pending. The issue is only cleared when the broker is > restarted. -- This message was sent by Atlassian Jira (v8.20.10#820010)
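The pause/resume imbalance in the scenario above can be sketched with a toy counter. This is a hypothetical illustration, not Kafka's actual LogCleanerManager code:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a per-partition "cleaning paused" counter. Cleaning only
// resumes when every pause has been matched by a resume; an unmatched
// pause (here, the one taken on leadership change) leaves the partition
// paused forever.
public class PauseCounter {
    private final AtomicInteger pauseCount = new AtomicInteger(0);

    public void pauseCleaning()  { pauseCount.incrementAndGet(); }
    public void resumeCleaning() { pauseCount.decrementAndGet(); }
    public boolean cleaningPaused() { return pauseCount.get() > 0; }

    public static void main(String[] args) {
        PauseCounter tp0 = new PauseCounter();
        tp0.pauseCleaning();   // 1. alterReplicaLogDirs starts        -> count = 1
        tp0.pauseCleaning();   // 2. leadership change sees future log -> count = 2
        tp0.resumeCleaning();  // 3. alterReplicaLogDirs completes     -> count = 1
        // 4. nothing ever resumes the leadership-change pause:
        System.out.println(tp0.cleaningPaused()); // prints true: still paused
    }
}
```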
[jira] [Resolved] (KAFKA-16418) Review/split long-running admin client integration tests
[ https://issues.apache.org/jira/browse/KAFKA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lianet Magrans resolved KAFKA-16418. Resolution: Not A Problem > Review/split long-running admin client integration tests > > > Key: KAFKA-16418 > URL: https://issues.apache.org/jira/browse/KAFKA-16418 > Project: Kafka > Issue Type: Task > Components: clients >Reporter: Lianet Magrans >Assignee: Lianet Magrans >Priority: Major > > Review PlaintextAdminIntegrationTest and attempt to split it to allow for > parallelization and improve build times. This test is the longest-running > integration test in kafka.api, so a similar approach to what was done > with the consumer tests in PlaintextConsumerTest should be a good > improvement. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16841) ZKMigrationIntegrationTests broken
[ https://issues.apache.org/jira/browse/KAFKA-16841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justine Olshan resolved KAFKA-16841. Resolution: Fixed fixed by https://github.com/apache/kafka/commit/bac8df56ffdf8a64ecfb78ec0779bcbc8e9f7c10 > ZKMigrationIntegrationTests broken > -- > > Key: KAFKA-16841 > URL: https://issues.apache.org/jira/browse/KAFKA-16841 > Project: Kafka > Issue Type: Task >Reporter: Justine Olshan >Priority: Blocker > > A recent merge to trunk seems to have broken tests so that I see 78 failures > in the CI. > I see lots of timeout errors and `Alter Topic Configs had an error` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16371) Unstable committed offsets after triggering commits where metadata for some partitions are over the limit
[ https://issues.apache.org/jira/browse/KAFKA-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot resolved KAFKA-16371. - Fix Version/s: 3.8.0 3.7.1 Assignee: David Jacot Resolution: Fixed > Unstable committed offsets after triggering commits where metadata for some > partitions are over the limit > - > > Key: KAFKA-16371 > URL: https://issues.apache.org/jira/browse/KAFKA-16371 > Project: Kafka > Issue Type: Bug > Components: offset manager >Affects Versions: 3.7.0 >Reporter: mlowicki >Assignee: David Jacot >Priority: Major > Fix For: 3.8.0, 3.7.1 > > > The issue is reproducible with a simple CLI tool - > [https://gist.github.com/mlowicki/c3b942f5545faced93dc414e01a2da70] > {code:java} > #!/usr/bin/env bash > for i in {1..100} > do > kafka-committer --bootstrap "ADDR:9092" --topic "TOPIC" --group foo > --metadata-min 6000 --metadata-max 1 --partitions 72 --fetch > done{code} > What it does is that it initially fetches committed offsets and then tries to > commit for multiple partitions. If some of the commits have metadata over the > allowed limit, then: > 1. I see errors about too-large commits (expected) > 2. On another run, the tool fails at the stage of fetching commits with (this is > the problem): > {code:java} > config: ClientConfig { conf_map: { "group.id": "bar", "bootstrap.servers": > "ADDR:9092", }, log_level: Error, } > fetching committed offsets.. 
> Error: Meta data fetch error: OperationTimedOut (Local: Timed out) Caused by: > OperationTimedOut (Local: Timed out){code} > On the Kafka side I see _unstable_offset_commits_ errors reported by our > internal metric, which is derived from: > {noformat} > > kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=X,error=Y{noformat} > Increasing the timeout doesn't help, and the only solution I've found is to > trigger commits for all partitions with metadata below the limit, or to use: > {code:java} > isolation.level=read_uncommitted{code} > > I don't know that code very well but > [https://github.com/apache/kafka/blob/3.7/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L492-L496] > seems fishy: > {code:java} > if (isTxnOffsetCommit) { > addProducerGroup(producerId, group.groupId) > group.prepareTxnOffsetCommit(producerId, offsetMetadata) > } else { > group.prepareOffsetCommit(offsetMetadata) > }{code} > as it's using _offsetMetadata_ and not _filteredOffsetMetadata_, and I see > that while removing those pending commits we use the filtered offset metadata > around > [https://github.com/apache/kafka/blob/3.7/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L397-L422] > > {code:java} > val responseError = group.inLock { > if (status.error == Errors.NONE) { > if (!group.is(Dead)) { > filteredOffsetMetadata.forKeyValue { (topicIdPartition, > offsetAndMetadata) => > if (isTxnOffsetCommit) > group.onTxnOffsetCommitAppend(producerId, topicIdPartition, > CommitRecordMetadataAndOffset(Some(status.baseOffset), offsetAndMetadata)) > else > group.onOffsetCommitAppend(topicIdPartition, > CommitRecordMetadataAndOffset(Some(status.baseOffset), offsetAndMetadata)) > } > } > // Record the number of offsets committed to the log > offsetCommitsSensor.record(records.size) > Errors.NONE > } else { > if (!group.is(Dead)) { > if (!group.hasPendingOffsetCommitsFromProducer(producerId)) > removeProducerGroup(producerId, group.groupId) > 
filteredOffsetMetadata.forKeyValue { (topicIdPartition, > offsetAndMetadata) => > if (isTxnOffsetCommit) > group.failPendingTxnOffsetCommit(producerId, topicIdPartition) > else > group.failPendingOffsetWrite(topicIdPartition, > offsetAndMetadata) > } > } > {code} > so the problem might be related to not cleaning up the data structure with > pending commits properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16846) Should TxnOffsetCommit API fail all the offsets if any fails the validation?
David Jacot created KAFKA-16846: --- Summary: Should TxnOffsetCommit API fail all the offsets if any fails the validation? Key: KAFKA-16846 URL: https://issues.apache.org/jira/browse/KAFKA-16846 Project: Kafka Issue Type: Improvement Reporter: David Jacot While working on KAFKA-16371, we realized that the handling of INVALID_COMMIT_OFFSET_SIZE errors while committing transaction offsets is a bit inconsistent between the server and the client. On the server, the offsets are validated independently from each other. Hence, if two offsets A and B are committed and A fails the validation, B is still written to the log as part of the transaction. On the client, when INVALID_COMMIT_OFFSET_SIZE is received, the transaction transitions to the fatal state, so the transaction will eventually be aborted. The client-side API is quite limiting here because it does not return an error per committed offset; it is all or nothing. From this point of view, the current behaviour is correct. It seems that we could either change the API and let the user decide what to do, or we could strengthen the validation on the server to fail all the offsets if any of them fails (all or nothing). We could also leave it as it is. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16845) Migrate ReplicationQuotasTestRig to new test infra
Chia-Ping Tsai created KAFKA-16845: -- Summary: Migrate ReplicationQuotasTestRig to new test infra Key: KAFKA-16845 URL: https://issues.apache.org/jira/browse/KAFKA-16845 Project: Kafka Issue Type: Improvement Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai as title -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16844) ByteArrayConverter can't convert ByteBuffer
Fan Yang created KAFKA-16844: Summary: ByteArrayConverter can't convert ByteBuffer Key: KAFKA-16844 URL: https://issues.apache.org/jira/browse/KAFKA-16844 Project: Kafka Issue Type: Improvement Components: connect Reporter: Fan Yang In the current Schema design, the schema type Bytes corresponds to two kinds of classes: byte[] and ByteBuffer. But the current ByteArrayConverter can only convert byte[]. My suggestion is to add ByteBuffer support to the ByteArrayConverter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
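A sketch of the suggested extension: a helper that accepts either representation of the Bytes schema type. The class and method names here are hypothetical, not Kafka Connect's actual ByteArrayConverter code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: normalize a Bytes-typed value (byte[] or ByteBuffer)
// to byte[], as the issue proposes for ByteArrayConverter.
public class BytesLikeConverter {
    public static byte[] toBytes(Object value) {
        if (value == null) return null;
        if (value instanceof byte[]) return (byte[]) value;
        if (value instanceof ByteBuffer) {
            // slice() so the caller's position/limit are left untouched
            ByteBuffer buf = ((ByteBuffer) value).slice();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
        throw new IllegalArgumentException(
            "Expected byte[] or ByteBuffer, got " + value.getClass().getName());
    }
}
```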
[jira] [Resolved] (KAFKA-16804) Replace gradle archivesBaseName with archivesName
[ https://issues.apache.org/jira/browse/KAFKA-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Harris resolved KAFKA-16804. - Fix Version/s: 3.8.0 Resolution: Fixed > Replace gradle archivesBaseName with archivesName > - > > Key: KAFKA-16804 > URL: https://issues.apache.org/jira/browse/KAFKA-16804 > Project: Kafka > Issue Type: Sub-task >Reporter: Greg Harris >Assignee: TengYao Chi >Priority: Major > Labels: newbie > Fix For: 3.8.0 > > > The BasePluginExtension.archivesBaseName property has been deprecated. > This is scheduled to be removed in Gradle 9.0. > Please use the archivesName property instead. > [Documentation|https://docs.gradle.org/8.7/dsl/org.gradle.api.plugins.BasePluginExtension.html#org.gradle.api.plugins.BasePluginExtension:archivesName] > 1 usage > Script:build.gradle > > The org.gradle.api.plugins.BasePluginConvention type has been deprecated. > This is scheduled to be removed in Gradle 9.0. > [Documentation|https://docs.gradle.org/8.7/userguide/upgrading_version_8.html#base_convention_deprecation] > 1 usage > Script:build.gradle -- This message was sent by Atlassian Jira (v8.20.10#820010)
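The replacement described above can be sketched in build.gradle as follows (the artifact name is illustrative):

```groovy
// Deprecated, removed in Gradle 9.0:
// archivesBaseName = 'kafka-clients'

// Replacement via the BasePluginExtension:
base {
    archivesName = 'kafka-clients'
}
```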
[jira] [Created] (KAFKA-16843) Remove preAppendErrors from createPutCacheCallback
Chia-Ping Tsai created KAFKA-16843: -- Summary: Remove preAppendErrors from createPutCacheCallback Key: KAFKA-16843 URL: https://issues.apache.org/jira/browse/KAFKA-16843 Project: Kafka Issue Type: Improvement Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai origin discussion: [https://github.com/apache/kafka/pull/16072#pullrequestreview-2077368462] The method `createPutCacheCallback` has an input argument `preAppendErrors` [0]. It is used to keep errors that happen before appending. However, pre-append errors are already handled earlier by calling `responseCallback` [1]. Hence, we can remove `preAppendErrors`. [0] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L387 [1] https://github.com/apache/kafka/blob/4f55786a8a86fe228a0b10a2f28529f5128e5d6f/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L927C15-L927C84 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16842) Make sure to validate controller.quorum.bootstrap.server
José Armando García Sancio created KAFKA-16842: -- Summary: Make sure to validate controller.quorum.bootstrap.server Key: KAFKA-16842 URL: https://issues.apache.org/jira/browse/KAFKA-16842 Project: Kafka Issue Type: Sub-task Reporter: José Armando García Sancio Assignee: José Armando García Sancio controller.quorum.bootstrap.server is only allowed to be empty when controller.quorum.voter is set or it is a standalone voter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16625) Reverse Lookup Partition to Member in Assignors
[ https://issues.apache.org/jira/browse/KAFKA-16625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot resolved KAFKA-16625. - Fix Version/s: 3.8.0 Reviewer: David Jacot Resolution: Fixed > Reverse Lookup Partition to Member in Assignors > --- > > Key: KAFKA-16625 > URL: https://issues.apache.org/jira/browse/KAFKA-16625 > Project: Kafka > Issue Type: Sub-task >Reporter: Ritika Reddy >Assignee: Ritika Reddy >Priority: Major > Fix For: 3.8.0 > > > Calculating unassigned partitions within the Uniform assignor is a costly > process, this can be improved by using a reverse lookup map between > topicIdPartition and the member -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16841) ZKIntegrationTests broken
Justine Olshan created KAFKA-16841: -- Summary: ZKIntegrationTests broken Key: KAFKA-16841 URL: https://issues.apache.org/jira/browse/KAFKA-16841 Project: Kafka Issue Type: Task Reporter: Justine Olshan A recent merge to trunk seems to have broken tests so that I see 78 failures in the CI. I see lots of timeout errors and `Alter Topic Configs had an error` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16687) Native memory leak by Unsafe_allocatememory in Kafka Clients 3.7.0
[ https://issues.apache.org/jira/browse/KAFKA-16687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] FTR resolved KAFKA-16687. - Resolution: Invalid > Native memory leak by Unsafe_allocatememory in Kafka Clients 3.7.0 > > > Key: KAFKA-16687 > URL: https://issues.apache.org/jira/browse/KAFKA-16687 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: FTR >Assignee: Philip Nee >Priority: Major > Attachments: diff > > > I am building a Java project which uses the kafka-clients 3.7.0 Maven > dependency. > My Java application's logic is to use a Kafka consumer to poll a Kafka broker topic > continuously. > I have configured my Java application with the JVM options -Xms8G -Xmx8G > -XX:MaxMetaspaceSize=4G, and then run it. > Also, there are 16G of physical memory on my virtual machine. > After my Java application had been running for a long time, I found that the resident > memory of the Java process had grown to more than 14G. > In the end, the Java process ate into swap space. > I checked it with jmap -heap pid and found that heap memory usage is OK. > Also, with Native Memory Tracking [jcmd pid Native.memory detail.diff], I > found that it's caused by [NMT Internal] memory, created by > Unsafe_allocatememory xxx. > In my Java application, I don't use any NIO DirectByteBuffer to allocate > memory. > I checked the kafka-clients source code; it has code that uses > "sun.misc.unsafe" to allocate memory, and MaxMetaspaceSize does not apply to it. > > Could you help to check it? How can I stop this growing native memory to > avoid my system hanging? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16840) Add a --timeout option to ConfigCommand
Vishal Palla created KAFKA-16840: Summary: Add a --timeout option to ConfigCommand Key: KAFKA-16840 URL: https://issues.apache.org/jira/browse/KAFKA-16840 Project: Kafka Issue Type: Improvement Components: admin Reporter: Vishal Palla -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16839) Replace KafkaRaftClient's voterNode with leadeNode
José Armando García Sancio created KAFKA-16839: -- Summary: Replace KafkaRaftClient's voterNode with leadeNode Key: KAFKA-16839 URL: https://issues.apache.org/jira/browse/KAFKA-16839 Project: Kafka Issue Type: Sub-task Reporter: José Armando García Sancio Assignee: José Armando García Sancio The id passed to KafkaRaftClient.voterNode is always the leader id. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16831) CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size write limit
[ https://issues.apache.org/jira/browse/KAFKA-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot resolved KAFKA-16831. - Fix Version/s: 3.8.0 Reviewer: David Jacot Resolution: Fixed > CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size > write limit > - > > Key: KAFKA-16831 > URL: https://issues.apache.org/jira/browse/KAFKA-16831 > Project: Kafka > Issue Type: Sub-task >Reporter: Jeff Kim >Assignee: Jeff Kim >Priority: Major > Fix For: 3.8.0 > > > Otherwise, we default to the min buffer size of 16384 for the write limit. > This causes the coordinator to throw RecordTooLargeException even when it's > under the 1MB max batch size limit. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-10234) The key/value deserializer used by ConsoleConsumer is not closed
[ https://issues.apache.org/jira/browse/KAFKA-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-10234. Resolution: Invalid > The key/value deserializer used by ConsoleConsumer is not closed > > > Key: KAFKA-10234 > URL: https://issues.apache.org/jira/browse/KAFKA-10234 > Project: Kafka > Issue Type: Bug >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Minor > > We instantiate, configure, and use them, but they are never closed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16838) Kafka Connect loads old tasks from removed connectors
Sergey Ivanov created KAFKA-16838: - Summary: Kafka Connect loads old tasks from removed connectors Key: KAFKA-16838 URL: https://issues.apache.org/jira/browse/KAFKA-16838 Project: Kafka Issue Type: Bug Components: connect Affects Versions: 3.6.1, 3.5.1, 3.8.0 Reporter: Sergey Ivanov Hello, When creating a connector, we faced an error from one of our ConfigProviders about a non-existing resource, although we didn't set that resource as a config value: {code:java} [2024-05-24T12:08:24.362][ERROR][request_id= ][tenant_id= ][thread=DistributedHerder-connect-1-1][class=org.apache.kafka.connect.runtime.distributed.DistributedHerder][method=lambda$reconfigureConnectorTasksWithExponentialBackoffRetries$44] [Worker clientId=connect-1, groupId=streaming-service_streaming_service] Failed to reconfigure connector's tasks (local-file-sink), retrying after backoff. org.apache.kafka.common.config.ConfigException: Could not read properties from file /opt/kafka/provider.properties at org.apache.kafka.common.config.provider.FileConfigProvider.get(FileConfigProvider.java:98) at org.apache.kafka.common.config.ConfigTransformer.transform(ConfigTransformer.java:103) at org.apache.kafka.connect.runtime.WorkerConfigTransformer.transform(WorkerConfigTransformer.java:58) at org.apache.kafka.connect.storage.ClusterConfigState.taskConfig(ClusterConfigState.java:181) at org.apache.kafka.connect.runtime.AbstractHerder.taskConfigsChanged(AbstractHerder.java:804) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.publishConnectorTaskConfigs(DistributedHerder.java:2089) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.reconfigureConnector(DistributedHerder.java:2082) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.reconfigureConnectorTasksWithExponentialBackoffRetries(DistributedHerder.java:2025) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.lambda$null$42(DistributedHerder.java:2038) at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder.runRequest(DistributedHerder.java:2232) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:470) at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:371) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:840) {code} After investigation, we found out that a few months ago on that cloud there was a connector with the same name and another value for the config provider. It was then removed, but for some reason, when we tried to create a connector with the same name months later, AbstractHerder tried to update tasks from our previous connector. As an example I use FileConfigProvider, but actually any ConfigProvider that could raise an exception if something is wrong with the config (like a result that doesn't exist) is affected. We continued our investigation and found the issue https://issues.apache.org/jira/browse/KAFKA-7745, which says Connect doesn't tombstone commit and task messages in the config topic of Kafka. As we understand it, the config topic is `compact`, *which means commit and task messages are stored indefinitely (months, even years, after connector removal)* and impact further connector creations with the same name. We didn't investigate the reasons in ConfigClusterStore or how to avoid that issue, because we would {+}like to ask{+}: would it be better to fix KAFKA-7745 and send tombstones for commit and task messages, as Connect does for connector and target messages? I have a synthetic TC to reproduce this error if needed. This is linked with https://issues.apache.org/jira/browse/KAFKA-16837 but it's not the same issue. 
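To illustrate why the tombstones proposed above matter: a compacted topic keeps the latest record per key, so without a null-valued tombstone, a removed connector's task and commit records survive indefinitely. A minimal model of this behavior (the key names are hypothetical, not Connect's exact key format):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a compacted topic: compaction retains only the latest value
// per key, and a null value (tombstone) eventually removes the key entirely.
public class CompactedTopicModel {
    private final Map<String, String> latestByKey = new HashMap<>();

    public void append(String key, String value) {
        if (value == null) {
            latestByKey.remove(key); // tombstone: compaction drops the key
        } else {
            latestByKey.put(key, value);
        }
    }

    public boolean contains(String key) {
        return latestByKey.containsKey(key);
    }
}
```

With this model, deleting a connector without tombstoning its task keys leaves the stale task record behind, which is exactly what a later connector with the same name then picks up.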
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16837) Kafka Connect fails on update connector for incorrect previous Config Provider tasks
Sergey Ivanov created KAFKA-16837: - Summary: Kafka Connect fails on update connector for incorrect previous Config Provider tasks Key: KAFKA-16837 URL: https://issues.apache.org/jira/browse/KAFKA-16837 Project: Kafka Issue Type: Bug Components: connect Affects Versions: 3.6.1, 3.5.1, 3.8.0 Reporter: Sergey Ivanov Hello, We faced an issue where it is not possible to update a connector's config if the *previous* task config contains a ConfigProvider value that leads to a ConfigException. I can provide a simple Test Case to reproduce it with FileConfigProvider, but actually any ConfigProvider that could raise an exception if something is wrong with the config (like a resource that doesn't exist) is affected. *Prerequisites:* Kafka Connect instance with config providers: {code:java} config.providers=file config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider{code} 1. Create Kafka topic "test" 2. On the Kafka Connect instance, create the file "/opt/kafka/provider.properties" with content {code:java} topics=test {code} 3. Create a simple FileSink connector: {code:java} PUT /connectors/local-file-sink/config { "connector.class": "FileStreamSink", "tasks.max": "1", "file": "/opt/kafka/test.sink.txt", "topics": "${file:/opt/kafka/provider.properties:topics}" } {code} 4. Check that everything works fine: {code:java} GET /connectors?expand=info&expand=status ... "status": { "name": "local-file-sink", "connector": { "state": "RUNNING", "worker_id": "10.10.10.10:8083" }, "tasks": [ { "id": 0, "state": "RUNNING", "worker_id": "10.10.10.10:8083" } ], "type": "sink" } } } {code} Looks fine. 5. Rename the file to "/opt/kafka/provider2.properties". 6. Update the connector with the new correct file name: {code:java} PUT /connectors/local-file-sink/config { "connector.class": "FileStreamSink", "tasks.max": "1", "file": "/opt/kafka/test.sink.txt", "topics": "${file:/opt/kafka/provider2.properties:topics}" } {code} The update {*}succeeds{*} with 200. 7. 
Check that everything works fine: {code:java} { "local-file-sink": { "info": { "name": "local-file-sink", "config": { "connector.class": "FileStreamSink", "file": "/opt/kafka/test.sink.txt", "tasks.max": "1", "topics": "${file:/opt/kafka/provider2.properties:topics}", "name": "local-file-sink" }, "tasks": [ { "connector": "local-file-sink", "task": 0 } ], "type": "sink" }, "status": { "name": "local-file-sink", "connector": { "state": "RUNNING", "worker_id": "10.10.10.10:8083" }, "tasks": [ { "id": 0, "state": "FAILED", "worker_id": "10.10.10.10:8083", "trace": "org.apache.kafka.common.errors.InvalidTopicException: Invalid topics: [${file:/opt/kafka/provider.properties:topics}]" } ], "type": "sink" } } } {code} The config has been updated, but a new task has not been created, and as a result the connector doesn't work. It failed on: {code:java} [2024-05-24T12:08:24.362][ERROR][request_id= ][tenant_id= ][thread=DistributedHerder-connect-1-1][class=org.apache.kafka.connect.runtime.distributed.DistributedHerder][method=lambda$reconfigureConnectorTasksWithExponentialBackoffRetries$44] [Worker clientId=connect-1, groupId=streaming-service_streaming_service] Failed to reconfigure connector's tasks (local-file-sink), retrying after backoff. org.apache.kafka.common.config.ConfigException: Could not read properties from file /opt/kafka/provider.properties at org.apache.kafka.common.config.provider.FileConfigProvider.get(FileConfigProvider.java:98) at org.apache.kafka.common.config.ConfigTransformer.transform(ConfigTransformer.java:103) at org.apache.kafka.connect.runtime.WorkerConfigTransformer.transform(WorkerConfigTransformer.java:58) at org.apache.kafka.connect.storage.ClusterConfigState.taskConfig
[jira] [Resolved] (KAFKA-16815) Handle FencedInstanceId on heartbeat for new consumer
[ https://issues.apache.org/jira/browse/KAFKA-16815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot resolved KAFKA-16815. - Fix Version/s: 3.8.0 Reviewer: David Jacot Resolution: Fixed > Handle FencedInstanceId on heartbeat for new consumer > - > > Key: KAFKA-16815 > URL: https://issues.apache.org/jira/browse/KAFKA-16815 > Project: Kafka > Issue Type: Task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Lianet Magrans >Priority: Major > Labels: kip-848-client-support > Fix For: 3.8.0 > > > With the new consumer group protocol, a member could receive a > FencedInstanceIdError in the heartbeat response. This could be the case when > an active member using a group instance id is removed from the group by an > admin client. If a second member joins with the same instance id, the first > member will receive a FencedInstanceId on the next heartbeat response. This > should be treated as a fatal error (the consumer should not attempt to rejoin). > Currently, the FencedInstanceId is not explicitly handled by the client in > the HeartbeatRequestManager. It ends up being treated as a fatal error, see > [here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java#L417] > (just because it lands in the "unexpected" error category). We should handle > it explicitly, just to make sure that we express that it is an expected > error: log a proper message for it and fail (handleFatalFailure). We should > also verify that the error is included in the tests that cover the HB request error > handling > ([here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/clients/src/test/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManagerTest.java#L798]) > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16626) Uuid to String for subscribed topic names in assignment spec
[ https://issues.apache.org/jira/browse/KAFKA-16626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot resolved KAFKA-16626. - Fix Version/s: 3.8.0 Resolution: Fixed > Uuid to String for subscribed topic names in assignment spec > > > Key: KAFKA-16626 > URL: https://issues.apache.org/jira/browse/KAFKA-16626 > Project: Kafka > Issue Type: Sub-task >Reporter: Ritika Reddy >Assignee: Jeff Kim >Priority: Major > Fix For: 3.8.0 > > > In creating the assignment spec from the existing consumer subscription > metadata, quite some time is spent in converting the String to a Uuid > Change from Uuid to String for the subscribed topics in assignment spec and > convert on the fly -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16798) Mirrormaker2 dedicated mode - sync.group.offsets.interval not working
[ https://issues.apache.org/jira/browse/KAFKA-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thanos Athanasopoulos resolved KAFKA-16798. --- Resolution: Not A Bug This issue was resolved after input stated in the previous comments. > Mirrormaker2 dedicated mode - sync.group.offsets.interval not working > - > > Key: KAFKA-16798 > URL: https://issues.apache.org/jira/browse/KAFKA-16798 > Project: Kafka > Issue Type: Bug > Components: mirrormaker >Affects Versions: 3.7.0 >Reporter: Thanos Athanasopoulos >Priority: Major > > Single instance MirrorMaker2 in dedicated mode, active passive replication > logic. > sync.group.offsets.interval.seconds=2 configuration is enabled and active > {noformat} > [root@x-x ~]# docker logs cc-mm 2>&1 -f | grep -i > "auto.commit.interval\|checkpoint.interval\|consumer.commit.interval\|sync.topics.interval\|sync.group.offsets.interval\|offset-syncs.interval.seconds > " > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > sync.group.offsets.interval.seconds = 2 > sync.group.offsets.interval.seconds = 2 > auto.commit.interval.ms = 5000 > sync.group.offsets.interval.seconds = 2 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > sync.group.offsets.interval.seconds = 2 > auto.commit.interval.ms = 5000 > auto.commit.interval.ms = 5000 > sync.group.offsets.interval.seconds = 2 > sync.group.offsets.interval.seconds = 2 > auto.commit.interval.ms = 5000 > {noformat} > but is not working, the commit of offsets happens *always every 60 seconds* > as you can see in the logs > {noformat} > [2024-05-20 09:32:44,847] INFO [MirrorSourceConnector|task-0|offsets] > 
WorkerSourceTask{id=MirrorSourceConnector-0} Committing offsets for 20 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 09:32:44,852] INFO [MirrorSourceConnector|task-1|offsets] > WorkerSourceTask{id=MirrorSourceConnector-1} Committing offsets for 12 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 09:32:44,875] INFO [MirrorHeartbeatConnector|task-0|offsets] > WorkerSourceTask{id=MirrorHeartbeatConnector-0} Committing offsets for 12 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 09:32:44,881] INFO [MirrorCheckpointConnector|task-0|offsets] > WorkerSourceTask{id=MirrorCheckpointConnector-0} Committing offsets for 1 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 09:32:44,890] INFO [MirrorHeartbeatConnector|task-0|offsets] > WorkerSourceTask{id=MirrorHeartbeatConnector-0} Committing offsets for 12 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 09:33:44,850] INFO [MirrorSourceConnector|task-0|offsets] > WorkerSourceTask{id=MirrorSourceConnector-0} Committing offsets for 21 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 09:33:44,854] INFO [MirrorSourceConnector|task-1|offsets] > WorkerSourceTask{id=MirrorSourceConnector-1} Committing offsets for 12 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 09:33:44,878] INFO [MirrorHeartbeatConnector|task-0|offsets] > WorkerSourceTask{id=MirrorHeartbeatConnector-0} Committing offsets for 12 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 09:33:44,883] INFO [MirrorCheckpointConnector|task-0|offsets] > WorkerSourceTask{id=MirrorCheckpointConnector-0} Committing offsets for 2 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 
09:33:44,895] INFO [MirrorHeartbeatConnector|task-0|offsets] > WorkerSourceTask{id=MirrorHeartbeatConnector-0} Committing offsets for 12 > acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask:233) > [2024-05-20 09:34:44,852] INFO [MirrorSourceConnector|task-0|offsets
[jira] [Resolved] (KAFKA-16826) Integrate Native Kafka Docker Image with github Actions
[ https://issues.apache.org/jira/browse/KAFKA-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar resolved KAFKA-16826. --- Fix Version/s: 3.8.0 Resolution: Fixed > Integrate Native Kafka Docker Image with github Actions > --- > > Key: KAFKA-16826 > URL: https://issues.apache.org/jira/browse/KAFKA-16826 > Project: Kafka > Issue Type: Task >Reporter: Krishna Agarwal >Assignee: Krishna Agarwal >Priority: Major > Labels: KIP-974 > Fix For: 3.8.0 > > > Integrate the Native Apache Kafka Docker Image with existing github actions > # Build and test > # Rc release > # Promote -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16835) Add Support for consumer to read in commit order.
Manjunath created KAFKA-16835: - Summary: Add Support for consumer to read in commit order. Key: KAFKA-16835 URL: https://issues.apache.org/jira/browse/KAFKA-16835 Project: Kafka Issue Type: New Feature Components: consumer, offset manager Reporter: Manjunath Currently the consumer receives messages in offset order. There are some cases where commit order is very important. For example, consider a case where PostgreSQL 14 streams multiple in-progress large transactions to an intermediate client, which starts a transactional producer instance per in-progress transaction and uses those producers to push data to Kafka. The consumer should then be able to read messages strictly in commit order. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16834) add PartitionRegistration#toRecord loss info
Jianbin Chen created KAFKA-16834: Summary: add PartitionRegistration#toRecord loss info Key: KAFKA-16834 URL: https://issues.apache.org/jira/browse/KAFKA-16834 Project: Kafka Issue Type: Wish Affects Versions: 3.7.0 Reporter: Jianbin Chen Transforming the loss-handling message into the following output would make it easier for users to understand and to identify the cause of the problem: {code:java} options.handleLoss("the directory " + directory + " state of one or more replicas");{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16833) PartitionInfo missing equals and hashCode methods
Alyssa Huang created KAFKA-16833: Summary: PartitionInfo missing equals and hashCode methods Key: KAFKA-16833 URL: https://issues.apache.org/jira/browse/KAFKA-16833 Project: Kafka Issue Type: Bug Reporter: Alyssa Huang -- This message was sent by Atlassian Jira (v8.20.10#820010)
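PartitionInfo is a plain value class, so the standard fix is an equals/hashCode pair over all fields. A minimal sketch using a simplified stand-in class (the fields here are illustrative; the real PartitionInfo also carries Node references for leader, replicas, and ISR):

```java
import java.util.Arrays;
import java.util.Objects;

// Simplified stand-in for PartitionInfo: topic, partition, and replica ids.
class PartitionInfoSketch {
    final String topic;
    final int partition;
    final int[] replicas;

    PartitionInfoSketch(String topic, int partition, int[] replicas) {
        this.topic = topic;
        this.partition = partition;
        this.replicas = replicas;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PartitionInfoSketch)) return false;
        PartitionInfoSketch that = (PartitionInfoSketch) o;
        return partition == that.partition
                && Objects.equals(topic, that.topic)
                && Arrays.equals(replicas, that.replicas);
    }

    @Override
    public int hashCode() {
        // Arrays.hashCode, not Objects.hash, so array contents (not identity) are hashed.
        return Objects.hash(topic, partition) * 31 + Arrays.hashCode(replicas);
    }
}
```

Note the use of Arrays.equals/Arrays.hashCode so that array contents, rather than array identity, participate in equality.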
[jira] [Resolved] (KAFKA-16828) RackAwareTaskAssignorTest failed
[ https://issues.apache.org/jira/browse/KAFKA-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16828. Fix Version/s: 3.8.0 Resolution: Fixed > RackAwareTaskAssignorTest failed > > > Key: KAFKA-16828 > URL: https://issues.apache.org/jira/browse/KAFKA-16828 > Project: Kafka > Issue Type: Test >Reporter: Luke Chen >Assignee: Kuan Po Tseng >Priority: Major > Fix For: 3.8.0 > > > Found in the latest trunk build. > It fails many tests in `RackAwareTaskAssignorTest` suite. > > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15951/7/#showFailuresLink -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16832) LeaveGroup API for upgrading ConsumerGroup
Dongnuo Lyu created KAFKA-16832: --- Summary: LeaveGroup API for upgrading ConsumerGroup Key: KAFKA-16832 URL: https://issues.apache.org/jira/browse/KAFKA-16832 Project: Kafka Issue Type: Sub-task Reporter: Dongnuo Lyu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16831) CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size write limit
Jeff Kim created KAFKA-16831: Summary: CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size write limit Key: KAFKA-16831 URL: https://issues.apache.org/jira/browse/KAFKA-16831 Project: Kafka Issue Type: Sub-task Reporter: Jeff Kim Assignee: Jeff Kim Otherwise, we default to the min buffer size of 16384 for the write limit. This causes the coordinator to throw RecordTooLargeException even when the batch is under the 1MB max batch size limit. -- This message was sent by Atlassian Jira (v8.20.10#820010)
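The effect can be illustrated with a toy model (the 16384-byte minimum and 1 MB maximum come from the ticket; class and method names are illustrative, not the actual CoordinatorRuntime API):

```java
// Toy model of the bug: if the records builder's write limit is tied to the
// minimum buffer size, records far below the real max batch size are rejected.
class WriteLimitSketch {
    static final int MIN_BUFFER_SIZE = 16384;      // 16 KiB default buffer
    static final int MAX_BATCH_SIZE = 1024 * 1024; // 1 MiB max batch size

    // Buggy behavior: write limit follows the (small) initial buffer size.
    static int buggyWriteLimit(int bufferSize) {
        return Math.max(bufferSize, MIN_BUFFER_SIZE);
    }

    // Intended behavior: write limit is the max batch size; the underlying
    // buffer can start small and grow up to it.
    static int fixedWriteLimit() {
        return MAX_BATCH_SIZE;
    }

    static boolean fits(int recordBytes, int writeLimit) {
        return recordBytes <= writeLimit;
    }
}
```

Under the buggy limit a 100 KB write is rejected even though it is well below the 1 MB batch cap; under the intended limit it fits.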
[jira] [Created] (KAFKA-16830) Remove the scala version formatters support
Chia-Ping Tsai created KAFKA-16830: -- Summary: Remove the scala version formatters support Key: KAFKA-16830 URL: https://issues.apache.org/jira/browse/KAFKA-16830 Project: Kafka Issue Type: Improvement Reporter: Chia-Ping Tsai Assignee: Kuan Po Tseng Fix For: 4.0.0 [https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/tools/consumer/ConsoleConsumerOptions.java#L72] Those deprecated formatter "strings" should be removed in 4.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16795) Fix broken compatibility in kafka.tools.NoOpMessageFormatter, kafka.tools.DefaultMessageFormatter, and kafka.tools.LoggingMessageFormatter
[ https://issues.apache.org/jira/browse/KAFKA-16795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16795. Resolution: Fixed > Fix broken compatibility in kafka.tools.NoOpMessageFormatter, > kafka.tools.DefaultMessageFormatter, and kafka.tools.LoggingMessageFormatter > -- > > Key: KAFKA-16795 > URL: https://issues.apache.org/jira/browse/KAFKA-16795 > Project: Kafka > Issue Type: Sub-task >Reporter: Kuan Po Tseng >Assignee: Kuan Po Tseng >Priority: Major > Fix For: 3.8.0 > > > [{{0bf830f}}|https://github.com/apache/kafka/commit/0bf830fc9c3915bc99b6e487e6083dabd593c5d3] > moved NoOpMessageFormatter, DefaultMessageFormatter and > LoggingMessageFormatter from the {{kafka.tools}} package to > {{org.apache.kafka.tools.consumer}}. > These classes could be used via the kafka-console-consumer.sh command. We should have > a deprecation cycle before 3.8.0 comes out. > > {code:java} > bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \ > --topic streams-wordcount-output \ > --from-beginning \ > --formatter kafka.tools.DefaultMessageFormatter \ > --property print.key=true \ > --property print.value=true \ > --property > key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \ > --property > value.deserializer=org.apache.kafka.common.serialization.LongDeserializer{code} > The goal in this Jira is to allow users to keep using > {{kafka.tools.NoOpMessageFormatter}}, > {{kafka.tools.DefaultMessageFormatter}}, and > {{kafka.tools.LoggingMessageFormatter}}, but also to display warning > messages saying that those "strings" will be removed in 4.0. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
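A hedged sketch of the compatibility shim this ticket describes: map the deprecated kafka.tools names to their relocated equivalents and emit a warning. Only the class names come from the ticket; the resolve helper and warning text are illustrative, not the actual ConsoleConsumerOptions implementation.

```java
import java.util.Map;

// Sketch: resolve a formatter name supplied on the command line, warning when
// a deprecated kafka.tools alias is used (helper name and warning text are
// illustrative assumptions).
class FormatterAliasSketch {
    static final Map<String, String> DEPRECATED_ALIASES = Map.of(
        "kafka.tools.DefaultMessageFormatter", "org.apache.kafka.tools.consumer.DefaultMessageFormatter",
        "kafka.tools.NoOpMessageFormatter", "org.apache.kafka.tools.consumer.NoOpMessageFormatter",
        "kafka.tools.LoggingMessageFormatter", "org.apache.kafka.tools.consumer.LoggingMessageFormatter");

    static String resolve(String className) {
        String replacement = DEPRECATED_ALIASES.get(className);
        if (replacement == null) return className;
        System.err.println("WARNING: " + className
            + " is deprecated and will be removed in 4.0; use " + replacement + " instead.");
        return replacement;
    }
}
```

Non-aliased formatter names pass through unchanged, so custom formatters keep working.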
[jira] [Reopened] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation
[ https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edoardo Comar reopened KAFKA-15905: --- Reopening for backporting to 3.7.1, to be confirmed. > Restarts of MirrorCheckpointTask should not permanently interrupt offset > translation > > > Key: KAFKA-15905 > URL: https://issues.apache.org/jira/browse/KAFKA-15905 > Project: Kafka > Issue Type: Improvement > Components: mirrormaker >Affects Versions: 3.6.0 >Reporter: Greg Harris >Assignee: Edoardo Comar >Priority: Major > Fix For: 3.7.1, 3.8 > > > Executive summary: When the MirrorCheckpointTask restarts, it loses the state > of checkpointsPerConsumerGroup, which limits offset translation to records > mirrored after the latest restart. > For example, if 1000 records are mirrored and the OffsetSyncs are read by > MirrorCheckpointTask, the emitted checkpoints are cached, and translation can > happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more > records are mirrored, translation can happen at the ~1500th record, but no > longer at the ~500th record. > Context: > Before KAFKA-13659, MM2 made translation decisions based on the > incompletely-initialized OffsetSyncStore, and the checkpoint could appear to > go backwards temporarily during restarts. To fix this, we forced the > OffsetSyncStore to initialize completely before translation could take place, > ensuring that the latest OffsetSync had been read, and thus providing the > most accurate translation. > Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. > Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to > allow for translation of earlier offsets. This came with the caveat that the > cache's sparseness allowed translations to go backwards permanently. 
To > prevent this behavior, a cache of the latest Checkpoints was kept in the > MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset > translation remained restricted to the fully-initialized OffsetSyncStore. > Effectively, the MirrorCheckpointTask ensures that it translates based on an > OffsetSync emitted during its lifetime, to ensure that no previous > MirrorCheckpointTask emitted a later sync. If we can read the checkpoints > emitted by previous generations of MirrorCheckpointTask, we can still ensure > that checkpoints are monotonic, while allowing translation further back in > history. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16825) CVE vulnerabilities in Jetty and netty
[ https://issues.apache.org/jira/browse/KAFKA-16825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mickael Maison resolved KAFKA-16825. Fix Version/s: 3.8.0 Resolution: Fixed > CVE vulnerabilities in Jetty and netty > -- > > Key: KAFKA-16825 > URL: https://issues.apache.org/jira/browse/KAFKA-16825 > Project: Kafka > Issue Type: Task >Affects Versions: 3.7.0 >Reporter: mooner >Assignee: Mickael Maison >Priority: Major > Fix For: 3.8.0 > > > There is a vulnerability (CVE-2024-29025) in the transitive dependency > Netty used by Kafka, which has been fixed in version 4.1.108.Final. > There is also a vulnerability (CVE-2024-22201) in the transitive dependency > Jetty, which has been fixed in version 9.4.54.v20240208. > When will Kafka upgrade the versions of Netty and Jetty to fix these two > vulnerabilities? > Reference website: > https://nvd.nist.gov/vuln/detail/CVE-2024-29025 > https://nvd.nist.gov/vuln/detail/CVE-2024-22201 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16829) Consider removing delegation.token.master.key
[ https://issues.apache.org/jira/browse/KAFKA-16829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16829. Resolution: Duplicate see KAFKA-12601 > Consider removing delegation.token.master.key > - > > Key: KAFKA-16829 > URL: https://issues.apache.org/jira/browse/KAFKA-16829 > Project: Kafka > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Minor > > It has been marked as deprecated since 2020 [0], and maybe we should remove it now. > [0] > https://cwiki.apache.org/confluence/display/KAFKA/KIP-681:+Rename+master+key+in+delegation+token+feature -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16829) Consider removing delegation.token.master.key
Chia-Ping Tsai created KAFKA-16829: -- Summary: Consider removing delegation.token.master.key Key: KAFKA-16829 URL: https://issues.apache.org/jira/browse/KAFKA-16829 Project: Kafka Issue Type: Improvement Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai It has been marked as deprecated since 2020 [0], and maybe we should remove it now. [0] https://cwiki.apache.org/confluence/display/KAFKA/KIP-681:+Rename+master+key+in+delegation+token+feature -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16828) RackAwareTaskAssignorTest failed
Luke Chen created KAFKA-16828: - Summary: RackAwareTaskAssignorTest failed Key: KAFKA-16828 URL: https://issues.apache.org/jira/browse/KAFKA-16828 Project: Kafka Issue Type: Test Reporter: Luke Chen Found in the latest trunk build. It fails many tests in `RackAwareTaskAssignorTest` suite. https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15951/7/#showFailuresLink -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16827) Integrate Native Apache Kafka with vagrant System tests
Krishna Agarwal created KAFKA-16827: --- Summary: Integrate Native Apache Kafka with vagrant System tests Key: KAFKA-16827 URL: https://issues.apache.org/jira/browse/KAFKA-16827 Project: Kafka Issue Type: Task Reporter: Krishna Agarwal Assignee: Krishna Agarwal Integrate the Native Apache Kafka Docker Image with the existing Vagrant-based system tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16826) Integrate Native Kafka Docker Image with github Actions
Krishna Agarwal created KAFKA-16826: --- Summary: Integrate Native Kafka Docker Image with github Actions Key: KAFKA-16826 URL: https://issues.apache.org/jira/browse/KAFKA-16826 Project: Kafka Issue Type: Task Reporter: Krishna Agarwal Assignee: Krishna Agarwal Integrate the Native Apache Kafka Docker Image with existing github actions # Build and test # Rc release # Promote -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16793) Heartbeat API for upgrading ConsumerGroup
[ https://issues.apache.org/jira/browse/KAFKA-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot resolved KAFKA-16793. - Fix Version/s: 3.8.0 Reviewer: David Jacot Resolution: Fixed > Heartbeat API for upgrading ConsumerGroup > - > > Key: KAFKA-16793 > URL: https://issues.apache.org/jira/browse/KAFKA-16793 > Project: Kafka > Issue Type: Sub-task >Reporter: Dongnuo Lyu >Assignee: Dongnuo Lyu >Priority: Major > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16825) CVE vulnerabilities in Jetty and netty
mooner created KAFKA-16825: -- Summary: CVE vulnerabilities in Jetty and netty Key: KAFKA-16825 URL: https://issues.apache.org/jira/browse/KAFKA-16825 Project: Kafka Issue Type: Task Affects Versions: 3.7.0 Reporter: mooner There is a vulnerability (CVE-2024-29025) in the transitive dependency Netty used by Kafka, which has been fixed in version 4.1.108.Final. There is also a vulnerability (CVE-2024-22201) in the transitive dependency Jetty, which has been fixed in version 9.4.54.v20240208. When will Kafka upgrade the versions of Netty and Jetty to fix these two vulnerabilities? Reference website: https://nvd.nist.gov/vuln/detail/CVE-2024-29025 https://nvd.nist.gov/vuln/detail/CVE-2024-22201 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16160) AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop
[ https://issues.apache.org/jira/browse/KAFKA-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Nee resolved KAFKA-16160. Resolution: Cannot Reproduce > AsyncKafkaConsumer is trying to connect to a disconnected node in a tight loop > -- > > Key: KAFKA-16160 > URL: https://issues.apache.org/jira/browse/KAFKA-16160 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Reporter: Philip Nee >Assignee: Phuc Hong Tran >Priority: Major > Labels: consumer-threading-refactor > Fix For: 3.8.0 > > > Observing excessive logging when running AsyncKafkaConsumer: > {code:java} > 1271 [2024-01-15 09:43:36,627] DEBUG [Consumer clientId=console-consumer, > groupId=concurrent_consumer] Node is not ready, handle the request in the > next event loop: node=worker4:9092 (id: 2147483644 rack: null), > request=UnsentRequest{requestBuil > der=ConsumerGroupHeartbeatRequestData(groupId='concurrent_consumer', > memberId='laIqS789StuhXFpTwjh6hA', memberEpoch=1, instanceId=null, > rackId=null, rebalanceTimeoutMs=30, subscribedTopicNames=[output-topic], > serverAssignor=null, topicP > artitions=[TopicPartitions(topicId=I5P5lIXvR1Cjc8hfoJg5bg, partitions=[0])]), > handler=org.apache.kafka.clients.consumer.internals.NetworkClientDelegate$FutureCompletionHandler@918925b, > node=Optional[worker4:9092 (id: 2147483644 rack: null)] , > timer=org.apache.kafka.common.utils.Timer@55ed4733} > (org.apache.kafka.clients.consumer.internals.NetworkClientDelegate) {code} > This seems to be triggered by a tight poll loop of the network thread. The > right thing to do is to back off a bit for that given node and retry later. > This should be a blocker for the 3.8 release. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16824) Utils.getHost and Utils.getPort do not catch a lot of invalid host and ports
José Armando García Sancio created KAFKA-16824: -- Summary: Utils.getHost and Utils.getPort do not catch a lot of invalid host and ports Key: KAFKA-16824 URL: https://issues.apache.org/jira/browse/KAFKA-16824 Project: Kafka Issue Type: Bug Reporter: José Armando García Sancio For example, it is not able to detect at least these malformed hosts and ports: # ho(st:9092 # host:-92 -- This message was sent by Atlassian Jira (v8.20.10#820010)
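A sketch of stricter validation that rejects both malformed examples from the ticket. The regex and helper names are illustrative and deliberately simplified (bracketed IPv6 literals are only approximated); this is not the actual Utils.getHost/Utils.getPort implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stricter "host:port" parsing sketch: the hostname is restricted to
// alphanumeric labels with dots and hyphens (or a rough bracketed IPv6
// literal), and the port to 1..65535. Returns null on any malformed input.
class HostPortSketch {
    private static final Pattern HOST_PORT = Pattern.compile(
        "^(?<host>[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?|\\[[0-9A-Fa-f:]+\\]):(?<port>[0-9]{1,5})$");

    static String getHost(String address) {
        Matcher m = HOST_PORT.matcher(address);
        return m.matches() ? m.group("host") : null;
    }

    static Integer getPort(String address) {
        Matcher m = HOST_PORT.matcher(address);
        if (!m.matches()) return null;
        int port = Integer.parseInt(m.group("port"));
        return (port >= 1 && port <= 65535) ? port : null;
    }
}
```

With this sketch, "ho(st:9092" fails the hostname pattern and "host:-92" fails the port pattern, so both return null instead of a bogus host or port.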
[jira] [Created] (KAFKA-16823) Extract LegacyConsumer-specific unit tests from generic KafkaConsumerTest
Lianet Magrans created KAFKA-16823: -- Summary: Extract LegacyConsumer-specific unit tests from generic KafkaConsumerTest Key: KAFKA-16823 URL: https://issues.apache.org/jira/browse/KAFKA-16823 Project: Kafka Issue Type: Task Components: clients, consumer Reporter: Lianet Magrans Currently the KafkaConsumerTest file contains unit tests that apply to both consumer implementations, but also tests that apply to the legacy consumer only. We should consider splitting the tests that apply only to the legacy consumer into their own LegacyKafkaConsumerTest file (aligning with the existing AsyncKafkaConsumerTest). The end result would be: KafkaConsumerTest -> unit tests that apply to both consumers. LegacyKafkaConsumerTest -> unit tests that apply only to the LegacyKafkaConsumer, either because of the logic they test or the way they are written (file to be created with this task). AsyncKafkaConsumerTest -> unit tests that apply only to the AsyncKafkaConsumer (this file already exists). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16822) Abstract consumer group in coordinator to share functionality with share group
Apoorv Mittal created KAFKA-16822: - Summary: Abstract consumer group in coordinator to share functionality with share group Key: KAFKA-16822 URL: https://issues.apache.org/jira/browse/KAFKA-16822 Project: Kafka Issue Type: Sub-task Reporter: Apoorv Mittal Assignee: Apoorv Mittal -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16821) Create a new interface to store member metadata
Ritika Reddy created KAFKA-16821: Summary: Create a new interface to store member metadata Key: KAFKA-16821 URL: https://issues.apache.org/jira/browse/KAFKA-16821 Project: Kafka Issue Type: Sub-task Reporter: Ritika Reddy Assignee: Ritika Reddy Attachments: Screenshot 2024-05-14 at 11.03.10 AM.png !Screenshot 2024-05-14 at 11.03.10 AM.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16820) Kafka Broker fails to connect to Kraft Controller with no DNS matching
Arushi Helms created KAFKA-16820: Summary: Kafka Broker fails to connect to Kraft Controller with no DNS matching Key: KAFKA-16820 URL: https://issues.apache.org/jira/browse/KAFKA-16820 Project: Kafka Issue Type: Bug Components: kraft Affects Versions: 3.6.1, 3.7.0, 3.8.0 Reporter: Arushi Helms Attachments: Screenshot 2024-05-22 at 1.09.11 PM-1.png We are migrating our Kafka cluster from ZooKeeper to Kraft mode. We are running individual brokers and controllers with TLS enabled, and IPs are given for communication. The TLS-enabled setup works fine among the brokers, and the certificate looks something like: {noformat} Common Name: *.kafka.service.consul Subject Alternative Names: *.kafka.service.consul, IP Address:10.87.171.84{noformat} Note: The DNS name for the node does not match the CN, but since we are using IPs for communication, we have provided IPs as SANs. Same with the controllers: IPs are given as SANs in the certificate. In the current setup I am running 3 brokers and 3 controllers. 
Relevant controller configurations from one of the controllers: {noformat} KAFKA_CFG_PROCESS_ROLES=controller KAFKA_KRAFT_CLUSTER_ID=5kztjhJ4SxSu-kdiEYDUow KAFKA_CFG_NODE_ID=6 KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=4@10.87.170.83:9097,5@10.87.170.9:9097,6@10.87.170.6:9097 KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INSIDE_SSL KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:SSL,INSIDE_SSL:SSL KAFKA_CFG_LISTENERS=CONTROLLER://10.87.170.6:9097{noformat} Relevant broker configuration from one of the brokers: {noformat} KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INSIDE_SSL KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=4@10.87.170.83:9097,5@10.87.170.9:9097,6@10.87.170.6:9097 KAFKA_CFG_PROCESS_ROLES=broker KAFKA_CFG_NODE_ID=3 KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INSIDE_SSL:SSL,OUTSIDE_SSL:SSL,CONTROLLER:SSL KAFKA_CFG_LISTENERS=INSIDE_SSL://10.87.170.78:9093,OUTSIDE_SSL://10.87.170.78:9096 KAFKA_CFG_ADVERTISED_LISTENERS=INSIDE_SSL://10.87.170.78:9093,OUTSIDE_SSL://10.87.170.78:9096{noformat} ISSUE 1: With this setup the Kafka broker is failing to connect to the controller; see the following error: {noformat} [2024-05-22 17:53:46,413] ERROR [broker-2-to-controller-heartbeat-channel-manager]: Request BrokerRegistrationRequestData(brokerId=2, clusterId='5kztjhJ4SxSu-kdiEYDUow', incarnationId=7741fgH6T4SQqGsho8E6mw, listeners=[Listener(name='INSIDE_SSL', host='10.87.170.81', port=9093, securityProtocol=1), Listener(name='INSIDE', host='10.87.170.81', port=9094, securityProtocol=0), Listener(name='OUTSIDE', host='10.87.170.81', port=9092, securityProtocol=0), Listener(name='OUTSIDE_SSL', host='10.87.170.81', port=9096, securityProtocol=1)], features=[Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=19)], rack=null, isMigratingZkBroker=false, logDirs=[TJssfKDD-iBFYfIYCKOcew], previousBrokerEpoch=-1) failed due to authentication error with controller 
(kafka.server.NodeToControllerRequestThread)org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failedCaused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching cp-internal-onecloud-kfkc1.node.cp-internal-onecloud.consul found. at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131) at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378) at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321) at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316) at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1351) at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1226) at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1169) at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396) at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480) at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277) at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264) at java.base/java.security.AccessController.doPrivileged(AccessController.java:712) at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209) at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:435) at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:523) at org.apache.kafka.common.network.SslTransportLayer.doHandshake
[jira] [Created] (KAFKA-16819) CoordinatorRequestManager seems to return 0ms during the coordinator discovery
Philip Nee created KAFKA-16819: -- Summary: CoordinatorRequestManager seems to return 0ms during the coordinator discovery Key: KAFKA-16819 URL: https://issues.apache.org/jira/browse/KAFKA-16819 Project: Kafka Issue Type: Bug Components: consumer Reporter: Philip Nee Assignee: Philip Nee In KAFKA-15250 we discovered that the ConsumerNetworkThread is looping without much backoff. The in-flight check PR fixed a lot of it; however, during the coordinator discovery phase, CoordinatorRequestManager would keep returning 0 before the coordinator node was found. The impact is minor, but we should expect the coordinator manager to back off until the exponential backoff expires (so it should return around 100ms). -- This message was sent by Atlassian Jira (v8.20.10#820010)
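The expected behavior described above — returning roughly the retry backoff instead of 0 while the coordinator is unknown — can be sketched with a small exponential-backoff calculator. The constants are illustrative (100 ms matches the "around 100ms" in the ticket, but this is not Kafka's actual ExponentialBackoff class):

```java
// Exponential backoff sketch: each failed coordinator lookup waits longer,
// capped at a maximum, instead of returning 0 and tight-looping.
class BackoffSketch {
    private final long initialMs;
    private final long maxMs;
    private final double multiplier;

    BackoffSketch(long initialMs, long maxMs, double multiplier) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.multiplier = multiplier;
    }

    long backoffMs(int attempts) {
        // initialMs * multiplier^attempts, capped at maxMs.
        double backoff = initialMs * Math.pow(multiplier, attempts);
        return Math.min((long) backoff, maxMs);
    }
}
```

The first attempt waits the initial 100 ms, and repeated failures grow the wait until the cap, which is exactly the "return around 100ms instead of 0" behavior the ticket expects on the first retry.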
[jira] [Resolved] (KAFKA-12399) Deprecate Log4J Appender KIP-719
[ https://issues.apache.org/jira/browse/KAFKA-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mickael Maison resolved KAFKA-12399. Fix Version/s: 3.8.0 Resolution: Fixed > Deprecate Log4J Appender KIP-719 > > > Key: KAFKA-12399 > URL: https://issues.apache.org/jira/browse/KAFKA-12399 > Project: Kafka > Issue Type: Improvement > Components: logging >Reporter: Dongjin Lee >Assignee: Mickael Maison >Priority: Major > Labels: needs-kip > Fix For: 3.8.0 > > > As a following job of KAFKA-9366, we have to entirely remove the log4j 1.2.7 > dependency from the classpath by removing dependencies on log4j-appender. > KIP-719: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-719%3A+Deprecate+Log4J+Appender -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16818) Move event-processing tests from ConsumerNetworkThreadTest to ApplicationEventProcessorTest
Kirk True created KAFKA-16818: - Summary: Move event-processing tests from ConsumerNetworkThreadTest to ApplicationEventProcessorTest Key: KAFKA-16818 URL: https://issues.apache.org/jira/browse/KAFKA-16818 Project: Kafka Issue Type: Improvement Components: clients, consumer, unit tests Affects Versions: 3.8.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The {{ConsumerNetworkThreadTest}} currently has a number of tests which do the following: # Add event of type _T_ to the event queue # Call {{ConsumerNetworkThread.runOnce()}} to dequeue the events and call {{ApplicationEventProcessor.process()}} # Verify that the appropriate {{ApplicationEventProcessor}} process method was invoked for the event Those types of tests should be moved to {{{}ApplicationEventProcessorTest{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15041) Source Connector auto topic creation fails when topic is deleted and brokers don't support auto topic creation
[ https://issues.apache.org/jira/browse/KAFKA-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Rao resolved KAFKA-15041. --- Resolution: Won't Fix For now, setting the config `producer.override.max.block.ms` at the connector config level or `producer.max.block.ms` at the worker config level to a lower value should fix this issue. The problem is that the default value for the above config is [set to Long.MAX_VALUE|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L820] in the configs, and when topics are deleted manually, there's really no signal received to indicate it. We could add heuristics like periodically checking whether a topic is present and refreshing the cache, or check the source topic metrics to see if the records are just being buffered and not being sent, but that's outside the scope of the runtime. > Source Connector auto topic creation fails when topic is deleted and brokers > don't support auto topic creation > -- > > Key: KAFKA-15041 > URL: https://issues.apache.org/jira/browse/KAFKA-15041 > Project: Kafka > Issue Type: Bug > Components: connect >Reporter: Sagar Rao >Assignee: Sagar Rao >Priority: Major > > [KIP-158|https://cwiki.apache.org/confluence/display/KAFKA/KIP-158%3A+Kafka+Connect+should+allow+source+connectors+to+set+topic-specific+settings+for+new+topics] > allows source connectors to create topics even when the broker doesn't > support automatic topic creation. It does so by checking for every record whether a topic needs to > be created > [https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L500.] > To avoid always checking for topic presence via the admin client, it also > maintains a cache of the topics that it has created and doesn't create those > again. This helps to create topics when brokers don't support automatic > topic creation. 
> However, let's say the topic gets created initially and later on gets deleted > while the connector is still running and the brokers don't support automatic > topic creation. In such cases, the connector has cached the topic it has > already created and wouldn't recreate it because the cache never updates, and > since the broker doesn't support topic creation, the logs would just be full > of messages like > > {code:java} > Error while fetching metadata with correlation id 3260 : > {connect-test=UNKNOWN_TOPIC_OR_PARTITION}{code} > > This can become a problem in environments where brokers don't allow topic > creation. We need a way to refresh the topics cache for such cases. -- This message was sent by Atlassian Jira (v8.20.10#820010)
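The refresh the ticket asks for can be approximated with a time-bounded cache: entries expire after a TTL, forcing the task to occasionally re-verify (and if needed re-create) the topic. Class and method names here are illustrative, not Connect's actual topic cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// TTL-based "topics I already created" cache sketch: entries expire after
// ttlMs, so a deleted topic is eventually noticed on the next check.
class TopicCacheSketch {
    private final Map<String, Long> createdAt = new HashMap<>();
    private final long ttlMs;
    private final LongSupplier clock; // injectable clock for testing

    TopicCacheSketch(long ttlMs, LongSupplier clock) {
        this.ttlMs = ttlMs;
        this.clock = clock;
    }

    void markCreated(String topic) {
        createdAt.put(topic, clock.getAsLong());
    }

    // Returns true only while the entry is fresh; after the TTL the caller
    // should re-verify (and possibly re-create) the topic.
    boolean isKnownCreated(String topic) {
        Long ts = createdAt.get(topic);
        if (ts == null) return false;
        if (clock.getAsLong() - ts >= ttlMs) {
            createdAt.remove(topic);
            return false;
        }
        return true;
    }
}
```

The tradeoff is extra admin-client lookups once per TTL window per topic, which is why the resolution above treats this as a heuristic outside the runtime's scope.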
[jira] [Created] (KAFKA-16816) Remove unneeded FencedInstanceId support on commit path for new consumer
Lianet Magrans created KAFKA-16816: -- Summary: Remove unneeded FencedInstanceId support on commit path for new consumer Key: KAFKA-16816 URL: https://issues.apache.org/jira/browse/KAFKA-16816 Project: Kafka Issue Type: Task Components: clients, consumer Reporter: Lianet Magrans The new consumer contains logic for handling the FencedInstanceId exception received in response to an OffsetCommit request (in the [consumer|https://github.com/apache/kafka/blob/028e7a06dcdca7d4dbeae83f2fce0a4120cc2753/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L776] and [commit manager|https://github.com/apache/kafka/blob/028e7a06dcdca7d4dbeae83f2fce0a4120cc2753/clients/src/main/java/org/apache/kafka/clients/consumer/internals/CommitRequestManager.java#L715]), but with the new group protocol, we will never get that error on a commit response. We should remove the code that expects FencedInstanceId on the commit response, and also clean up the other related usages we added to propagate the FencedInstanceId exception in the poll, commitSync and commitAsync APIs. Note that throwing that exception is part of the contract of the poll, commitSync and commitAsync APIs of the KafkaConsumer, but it changes with the new protocol, so we should update the javadoc of the new AsyncKafkaConsumer to reflect the change. With the new protocol, if a consumer tries to commit offsets, there are 2 cases: # empty group -> the commit succeeds; fencing an instance id can never happen because the group is empty # non-empty group -> the commit fails with UnknownMemberId, indicating that the member is not known to the group. The consumer needs to join the non-empty group in order to commit offsets to it. To complete the story: the moment the consumer attempts to join, it will receive an UnreleasedInstanceId error in the HB response, indicating it is using a groupInstanceId that is already in use. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16815) Handle FencedInstanceId on heartbeat for new consumer
Lianet Magrans created KAFKA-16815: -- Summary: Handle FencedInstanceId on heartbeat for new consumer Key: KAFKA-16815 URL: https://issues.apache.org/jira/browse/KAFKA-16815 Project: Kafka Issue Type: Task Components: clients, consumer Reporter: Lianet Magrans With the new consumer group protocol, a member could receive a FencedInstanceIdError in the heartbeat response. This could be the case when an active member using a group instance id is removed from the group by an admin client. If a second member joins with the same instance id, the first member will receive a FencedInstanceId in the next heartbeat response. This should be treated as a fatal error (the consumer should not attempt to rejoin). Currently, the FencedInstanceId is not explicitly handled by the client in the HeartbeatRequestManager. It ends up being treated as a fatal error, see [here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java#L417] (only because it lands in the "unexpected" error category). We should handle it explicitly, to make clear that it is an expected error: log a proper message for it and fail (handleFatalFailure). We should also ensure that the error is included in the tests that cover the HB request error handling ([here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/clients/src/test/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManagerTest.java#L798]) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16814) KRaft broker cannot startup when `partition.metadata` is missing
Luke Chen created KAFKA-16814: - Summary: KRaft broker cannot startup when `partition.metadata` is missing Key: KAFKA-16814 URL: https://issues.apache.org/jira/browse/KAFKA-16814 Project: Kafka Issue Type: Bug Affects Versions: 3.7.0 Reporter: Luke Chen When starting up the kafka LogManager, we check for stray replicas to avoid some corner cases. But this check might prevent the broker from starting up if `partition.metadata` is missing, because on startup we load each log from file, and the topicId of the log comes from the `partition.metadata` file. So, if `partition.metadata` is missing, the topicId will be None, and `LogManager#isStrayKraftReplica` will fail with a "does not have a topic ID" error. `partition.metadata` could be missing due to a storage failure, or via another possible path: an unclean shutdown after the topic is created on the replica, but before data is flushed into the `partition.metadata` file. This is possible because the flush is done asynchronously [here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/core/src/main/scala/kafka/log/UnifiedLog.scala#L229]. {code:java} ERROR Encountered fatal fault: Error starting LogManager (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler) java.lang.RuntimeException: The log dir Log(dir=/tmp/kraft-broker-logs/quickstart-events-0, topic=quickstart-events, partition=0, highWatermark=0, lastStableOffset=0, logStartOffset=0, logEndOffset=0) does not have a topic ID, which is not allowed when running in KRaft mode. 
at kafka.log.LogManager$.$anonfun$isStrayKraftReplica$1(LogManager.scala:1609) at scala.Option.getOrElse(Option.scala:201) at kafka.log.LogManager$.isStrayKraftReplica(LogManager.scala:1608) at kafka.server.metadata.BrokerMetadataPublisher.$anonfun$initializeManagers$1(BrokerMetadataPublisher.scala:294) at kafka.server.metadata.BrokerMetadataPublisher.$anonfun$initializeManagers$1$adapted(BrokerMetadataPublisher.scala:294) at kafka.log.LogManager.loadLog(LogManager.scala:359) at kafka.log.LogManager.$anonfun$loadLogs$15(LogManager.scala:493) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1623) {code} If we skip the isStrayKraftReplica check, the topicId and the `partition.metadata` file will be recovered after the broker gets the topic partition update and becomes leader or follower later. So I'm proposing we skip the `isStrayKraftReplica` check when the topicId is None, instead of throwing an exception that terminates kafka. The `isStrayKraftReplica` check only covers a corner case, so this should be fine IMO. -- This message was sent by Atlassian Jira (v8.20.10#820010)
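The proposed guard could be sketched like this (a hypothetical Java shape using String topic IDs; the real check is `LogManager.isStrayKraftReplica` in Scala and operates on `UnifiedLog` and `Uuid`): when the log has no topic ID, report "not stray" and let the later leader/follower transition recover `partition.metadata`, instead of throwing.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the proposed fix: skip the stray-replica check
// when the log's topic ID is missing, rather than failing startup.
class StrayReplicaCheck {
    static boolean isStrayReplica(Optional<String> logTopicId, Set<String> topicIdsInMetadata) {
        if (logTopicId.isEmpty()) {
            // partition.metadata was missing or not yet flushed: we cannot decide,
            // so treat the log as not stray; it is recovered when the broker
            // later becomes leader or follower for the partition.
            return false;
        }
        // A replica is stray if its topic ID is no longer in the cluster metadata.
        return !topicIdsInMetadata.contains(logTopicId.get());
    }
}
```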
[jira] [Created] (KAFKA-16813) Add global timeout for "@Test" and "@TestTemplate"
Chia-Ping Tsai created KAFKA-16813: -- Summary: Add global timeout for "@Test" and "@TestTemplate" Key: KAFKA-16813 URL: https://issues.apache.org/jira/browse/KAFKA-16813 Project: Kafka Issue Type: Improvement Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai In the code base, `@Test` is used by unit tests and `@TestTemplate` is used by integration tests. The latter includes `ParameterizedTest`, `ClusterTest`, `ClusterTests`, and `ClusterTemplate`. Hence, we can add two different timeouts, one for `@Test` and one for `@TestTemplate`. For example: junit.jupiter.execution.timeout.default = 30s junit.jupiter.execution.timeout.testtemplate.method.default = 120s The exact timeout values may need more discussion, but we can try this in small JUnit 5 modules first, for example the tools and storage modules. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16812) The tools-related tests are slow
Chia-Ping Tsai created KAFKA-16812: -- Summary: The tools-related tests are slow Key: KAFKA-16812 URL: https://issues.apache.org/jira/browse/KAFKA-16812 Project: Kafka Issue Type: Improvement Reporter: Chia-Ping Tsai see https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2923/testReport/org.apache.kafka.tools/ Maybe we run too many cluster types (5); we could remove some unrelated types for those tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16811) Punctuate Ratio metric almost impossible to track
Sebastien Viale created KAFKA-16811: --- Summary: Punctuate Ratio metric almost impossible to track Key: KAFKA-16811 URL: https://issues.apache.org/jira/browse/KAFKA-16811 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 3.7.0 Reporter: Sebastien Viale The punctuate ratio metric is returned after the last record of the poll loop and is recomputed in every poll loop. After a punctuate, the value is close to 1, but there is little chance that the metric is sampled at that exact time, so its value is almost always 0. A solution could be to apply a kind of "sliding window" to it and report the value for the last x seconds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
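The suggested "sliding window" could be sketched as follows (an illustrative sketch, not the Kafka Streams metrics API): keep timestamped samples and report the average over the last window, so a post-punctuate spike remains visible to a sampler that arrives a few seconds late.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical sliding-window metric: averages the ratio samples recorded
// within the last windowMs, a sketch of the reporting fix suggested above.
class SlidingWindowRatio {
    private record Sample(long timeMs, double value) {}

    private final Deque<Sample> samples = new ArrayDeque<>();
    private final long windowMs;
    private final LongSupplier clock; // injectable clock for testing

    SlidingWindowRatio(long windowMs, LongSupplier clock) {
        this.windowMs = windowMs;
        this.clock = clock;
    }

    /** Record one per-poll-loop ratio sample. */
    void record(double ratio) {
        samples.addLast(new Sample(clock.getAsLong(), ratio));
    }

    /** Average of samples recorded within the last windowMs; 0 if none remain. */
    double value() {
        long cutoff = clock.getAsLong() - windowMs;
        while (!samples.isEmpty() && samples.peekFirst().timeMs() < cutoff) {
            samples.removeFirst(); // drop samples that fell out of the window
        }
        return samples.stream().mapToDouble(Sample::value).average().orElse(0.0);
    }
}
```

With this shape, a reading shortly after a punctuate reflects the recent spike instead of only the instantaneous per-loop value.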
[jira] [Created] (KAFKA-16810) Improve kafka-consumer-perf-test to benchmark single partition
Harsh Panchal created KAFKA-16810: - Summary: Improve kafka-consumer-perf-test to benchmark single partition Key: KAFKA-16810 URL: https://issues.apache.org/jira/browse/KAFKA-16810 Project: Kafka Issue Type: New Feature Components: tools Reporter: Harsh Panchal kafka-consumer-perf-test is a great tool to quickly check raw consumer performance. Currently, it subscribes to all partitions and gives overall cluster performance; if we want to test the performance of a single broker/partition, the existing tool does not support that. We could introduce two optional flags, --partitions and --offsets, which give the flexibility to benchmark only specific partitions, optionally starting from specified offsets. -- This message was sent by Atlassian Jira (v8.20.10#820010)
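Under the hood such flags would presumably map to `KafkaConsumer#assign` and `seek` rather than `subscribe`. As a sketch, here is parsing of a hypothetical `--partitions` value supporting single partitions and ranges; the flag and its "0,2,5-7" format are assumptions for illustration, not a decided design:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a proposed --partitions flag value, e.g. "0,2,5-7".
class PartitionArg {
    static List<Integer> parse(String spec) {
        List<Integer> result = new ArrayList<>();
        for (String part : spec.split(",")) {
            if (part.contains("-")) {
                // Range form "lo-hi", inclusive on both ends.
                String[] bounds = part.split("-", 2);
                int lo = Integer.parseInt(bounds[0].trim());
                int hi = Integer.parseInt(bounds[1].trim());
                for (int p = lo; p <= hi; p++) result.add(p);
            } else {
                result.add(Integer.parseInt(part.trim()));
            }
        }
        return result;
    }
}
```

The tool would then build TopicPartition objects from this list, assign them, and optionally seek to the offsets given by --offsets before starting the benchmark loop.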
[jira] [Resolved] (KAFKA-16783) Migrate RemoteLogMetadataManagerTest to new test infra
[ https://issues.apache.org/jira/browse/KAFKA-16783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Chen resolved KAFKA-16783. --- Fix Version/s: 3.8.0 Resolution: Fixed > Migrate RemoteLogMetadataManagerTest to new test infra > -- > > Key: KAFKA-16783 > URL: https://issues.apache.org/jira/browse/KAFKA-16783 > Project: Kafka > Issue Type: Test >Reporter: Chia-Ping Tsai >Assignee: PoAn Yang >Priority: Minor > Labels: storage_test > Fix For: 3.8.0 > > > as title > `TopicBasedRemoteLogMetadataManagerWrapperWithHarness` could be replaced by > `RemoteLogMetadataManagerTestUtils#builder` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16809) Run javadoc build in CI
Greg Harris created KAFKA-16809: --- Summary: Run javadoc build in CI Key: KAFKA-16809 URL: https://issues.apache.org/jira/browse/KAFKA-16809 Project: Kafka Issue Type: Task Components: build, docs Reporter: Greg Harris Assignee: Greg Harris The `javadoc` target isn't run during CI builds, allowing errors in the javadocs to leak in. Instead, we can run javadoc, like checkstyle, spotbugs, and import control, as a pre-test step, to ensure that PRs don't cause javadoc build regressions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15242) FixedKeyProcessor testing is unusable
[ https://issues.apache.org/jira/browse/KAFKA-15242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax resolved KAFKA-15242. - Assignee: (was: Alexander Aghili) Resolution: Duplicate > FixedKeyProcessor testing is unusable > - > > Key: KAFKA-15242 > URL: https://issues.apache.org/jira/browse/KAFKA-15242 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Zlstibor Veljkovic >Priority: Major > > Using a mock processor context to get the forwarded message doesn't work. > Also, there is no well-documented way of testing FixedKeyProcessors. > Please see the repo at [https://github.com/zveljkovic/kafka-repro] > but the most important piece is the test file with runtime and compile time errors: > [https://github.com/zveljkovic/kafka-repro/blob/master/src/test/java/com/example/demo/MyFixedKeyProcessorTest.java] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16578) Revert changes to connect_distributed_test.py for the new async Consumer
[ https://issues.apache.org/jira/browse/KAFKA-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16578. --- Resolution: Won't Fix Most of the {{connect_distributed_test.py}} system tests were fixed, and {{test_exactly_once_source}} was reverted in a separate Jira/PR. > Revert changes to connect_distributed_test.py for the new async Consumer > > > Key: KAFKA-16578 > URL: https://issues.apache.org/jira/browse/KAFKA-16578 > Project: Kafka > Issue Type: Task > Components: clients, consumer, system tests >Affects Versions: 3.8.0 >Reporter: Kirk True >Assignee: Kirk True >Priority: Critical > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated > a slew of system tests to run both the "old" and "new" implementations. > KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it > could test the new consumer with Connect. However, we are not supporting > Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the > Connect system tests with the new {{AsyncKafkaConsumer}}, we get errors like > the following: > {code} > test_id: > kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer > status: FAIL > run time: 6 minutes 3.899 seconds > InsufficientResourcesError('Not enough nodes available to allocate. linux > nodes requested: 1. 
linux nodes available: 0') > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 184, in _do_run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 262, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", > line 433, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py", > line 919, in test_exactly_once_source > consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, > self.source.topic, consumer_timeout_ms=1000, print_key=True) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py", > line 97, in __init__ > BackgroundThreadService.__init__(self, context, num_nodes) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py", > line 26, in __init__ > super(BackgroundThreadService, self).__init__(context, num_nodes, > cluster_spec, *args, **kwargs) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py", > line 107, in __init__ > self.allocate_nodes() > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py", > line 217, in allocate_nodes > self.nodes = self.cluster.alloc(self.cluster_spec) > File > 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py", > line 54, in alloc > allocated = self.do_alloc(cluster_spec) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py", > line 31, in do_alloc > allocated = self._available_nodes.remove_spec(cluster_spec) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py", > line 117, in remove_spec > raise InsufficientResourcesError("Not enough nodes available to allocate. > " + msg) > ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes > available to allocate. linux nodes requested: 1. linux nodes available: 0 > {code} > The task here is to revert the changes made in KAFKA-16272 [PR > 15576|https://github.com/apache/kafka/pull/15576]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-7632) Support Compression Level
[ https://issues.apache.org/jira/browse/KAFKA-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mickael Maison resolved KAFKA-7632. --- Fix Version/s: 3.8.0 Assignee: Mickael Maison (was: Dongjin Lee) Resolution: Fixed > Support Compression Level > - > > Key: KAFKA-7632 > URL: https://issues.apache.org/jira/browse/KAFKA-7632 > Project: Kafka > Issue Type: Improvement > Components: clients >Affects Versions: 2.1.0 > Environment: all >Reporter: Dave Waters >Assignee: Mickael Maison >Priority: Major > Labels: needs-kip > Fix For: 3.8.0 > > > The compression level for ZSTD is currently set to use the default level (3), > which is a conservative setting that in some use cases eliminates the value > that ZSTD provides with improved compression. Each use case will vary, so > exposing the level as a producer, broker, and topic configuration setting > will allow the user to adjust the level. > Since this applies to the other compression codecs as well, we should add the same > functionality to them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16784) Migrate TopicBasedRemoteLogMetadataManagerMultipleSubscriptionsTest to new test infra
[ https://issues.apache.org/jira/browse/KAFKA-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16784. Fix Version/s: 3.8.0 Resolution: Fixed > Migrate TopicBasedRemoteLogMetadataManagerMultipleSubscriptionsTest to new > test infra > - > > Key: KAFKA-16784 > URL: https://issues.apache.org/jira/browse/KAFKA-16784 > Project: Kafka > Issue Type: Test >Reporter: Chia-Ping Tsai >Assignee: PoAn Yang >Priority: Minor > Labels: storage_test > Fix For: 3.8.0 > > > as title -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16654) Refactor kafka.test.annotation.Type and ClusterTestExtensions
[ https://issues.apache.org/jira/browse/KAFKA-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16654. Fix Version/s: 3.8.0 Resolution: Fixed > Refactor kafka.test.annotation.Type and ClusterTestExtensions > - > > Key: KAFKA-16654 > URL: https://issues.apache.org/jira/browse/KAFKA-16654 > Project: Kafka > Issue Type: Improvement >Reporter: Chia-Ping Tsai >Assignee: TaiJuWu >Priority: Minor > Fix For: 3.8.0 > > > It seems to me the refactor could include the following tasks. > 1. Change `invocationContexts`, the method invoked by `ClusterTemplate`, and > the generate-related methods in `ClusterTestExtensions` to return a > java.util.Collection instead of accepting a `java.util.function.Consumer`. > That brings two benefits: 1) simpler production code: we don't need to > create a List and then pass a function to collect stuff; 2) unit tests are > easier to write. > 2. Separate `provideTestTemplateInvocationContexts` into multiple methods, one > per annotation. That helps us write tests and makes the code more > readable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16808) Consumer join Group requests response contains 2 different members
Badhusha Muhammed created KAFKA-16808: - Summary: Consumer join Group requests response contains 2 different members Key: KAFKA-16808 URL: https://issues.apache.org/jira/browse/KAFKA-16808 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 2.8.0 Reporter: Badhusha Muhammed Fix For: 2.8.0 Even though there is only one consumer running for a group.id, on group (re-)join we get 2 different members in the response, which causes the assignor to assign partitions to two different members, so only half of the partitions are processed. Log for group join and partition assignment: {code:java} 24/05/13 10:26:28 WARN ProcessingTimeExecutor: Current batch is falling behind. The trigger interval is 155000 milliseconds, but spent 391883 milliseconds 24/05/13 10:26:28 INFO ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3, groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group 24/05/13 10:26:28 INFO ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3, groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0] Lost previously assigned partitions topic-0 topic-1 topic-2 topic-3 topic-4 topic-5 topic-6 topic-7 24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3, groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0] (Re-)joining group 24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3, groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0] Group coordinator 
va2kafka014.va2.pubmatic.local:6667 (id: 2147482646 rack: null) is unavailable or invalid due to cause: null.isDisconnected: true. Rediscovery will be attempted. 24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3, groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0] Rebalance failed. org.apache.kafka.common.errors.DisconnectException 24/05/13 10:26:28 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 436704. 24/05/13 10:26:28 INFO DAGScheduler: Executor lost: 436704 (epoch 0) 24/05/13 10:26:28 INFO BlockManagerMaster: Removed 436704 successfully in removeExecutor 24/05/13 10:26:28 INFO YarnClusterScheduler: Executor 436704 on va2aggr2503.va2.pubmatic.local killed by driver. 24/05/13 10:26:28 INFO ExecutorMonitor: Executor 436704 is removed. Remove reason statistics: (gracefully decommissioned: 0, decommision unfinished: 0, driver killed: 436456, unexpectedly exited: 399). 
24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3, groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0] Discovered group coordinator va2kafka014.va2.pubmatic.local:6667 (id: 2147482646 rack: null) 24/05/13 10:26:28 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3, groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0] (Re-)joining group 24/05/13 10:26:31 INFO ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3, groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0] Finished assignment for group at generation 6: {consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3-d4448c3e-8f23-490b-b800-be15a14efd32=Assignment(partitions=[topic-4, topic-5, topic-6, topic-7]), consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3-0daf3497-feac-4eee-a5ca-596e2b2e1649=Assignment(partitions=[topic-0, topic-1, topic-2, topic-3])} 24/05/13 10:26:31 INFO ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0-3, groupId=spark-kafka-source-c48ddb22-762b-4aa3-bf60-7541efb19b2a-1436000591-driver-0] Adding newly assigned partitions: topic-0 topic-1 topic-2 topic-3{code} Can this be due to the generation reset we do in the rebalancing code on 2.8.0, which eventually got fixed in version 2.8.1? https://issues.apache.org/jira/browse/KAFKA-13214 {code:java} else { final RuntimeException
[jira] [Created] (KAFKA-16807) DescribeLogDirsResponseData#results#topics have unexpected topics having empty partitions
Chia-Ping Tsai created KAFKA-16807: -- Summary: DescribeLogDirsResponseData#results#topics have unexpected topics having empty partitions Key: KAFKA-16807 URL: https://issues.apache.org/jira/browse/KAFKA-16807 Project: Kafka Issue Type: Bug Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai ReplicaManager [0] can generate a response containing unexpected topics which have empty partitions. The root cause is that it always generates the topic collection even when a topic has no matched partitions. That is not an issue for the Kafka clients, since we loop over the "partitions" to fill all future responses [1], so those unexpected topics won't be present in the final results. However, it could be an issue for users who implement a Kafka client based on the Kafka protocol [2] [0] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1252 [1] https://github.com/apache/kafka/blob/b5a013e4564ad93026b9c61431e4573a39bec766/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L3145 [2] https://lists.apache.org/thread/lp7ktmm17pbg7nqk7p4s904lcn3mrvhy -- This message was sent by Atlassian Jira (v8.20.10#820010)
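For client implementers who decode the raw protocol, a defensive workaround is to drop topic entries whose partition list is empty when processing the response. A sketch over plain collections (stand-ins for illustration, not the real generated `DescribeLogDirsResponseData` classes):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: tolerate DescribeLogDirs responses that contain topic entries with
// empty partition lists by filtering them out before further processing.
// Uses plain Map/List stand-ins, not the generated protocol classes.
class DescribeLogDirsFilter {
    static Map<String, List<Integer>> dropEmptyTopics(Map<String, List<Integer>> topicToPartitions) {
        return topicToPartitions.entrySet().stream()
                .filter(e -> !e.getValue().isEmpty()) // keep only topics with partitions
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

This mirrors what the Java admin client effectively does by iterating partitions rather than topics.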
[jira] [Resolved] (KAFKA-16794) Can't open videos in streams documentation
[ https://issues.apache.org/jira/browse/KAFKA-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-16794. Fix Version/s: 3.8.0 Resolution: Fixed > Can't open videos in streams documentation > -- > > Key: KAFKA-16794 > URL: https://issues.apache.org/jira/browse/KAFKA-16794 > Project: Kafka > Issue Type: Bug > Components: docs, streams >Reporter: Kuan Po Tseng >Assignee: 黃竣陽 >Priority: Minor > Fix For: 3.8.0 > > Attachments: IMG_4445.png, image.png > > > Can't open videos in page [https://kafka.apache.org/documentation/streams/] > Open console in chrome browser and it shows error message: > {{Refused to frame 'https://www.youtube.com/' because it violates the > following Content Security Policy directive: "frame-src 'self'".}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16760) alterReplicaLogDirs failed even if responded with none error
[ https://issues.apache.org/jira/browse/KAFKA-16760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Chen resolved KAFKA-16760. --- Resolution: Not A Problem > alterReplicaLogDirs failed even if responded with none error > > > Key: KAFKA-16760 > URL: https://issues.apache.org/jira/browse/KAFKA-16760 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Luke Chen >Priority: Major > > When firing an alterLogDir request, it gets a NONE error result. But actually, the > alterLogDir never happened, with these errors: > {code:java} > [2024-05-14 16:48:50,796] INFO [ReplicaAlterLogDirsThread-1]: Partition > topicB-0 has an older epoch (0) than the current leader. Will await the new > LeaderAndIsr state before resuming fetching. > (kafka.server.ReplicaAlterLogDirsThread:66) > [2024-05-14 16:48:50,796] WARN [ReplicaAlterLogDirsThread-1]: Partition > topicB-0 marked as failed (kafka.server.ReplicaAlterLogDirsThread:70) > {code} > Note: it's under KRaft mode, so the log mentioning LeaderAndIsr is wrong. > This can be reproduced in this > [branch|https://github.com/showuon/kafka/tree/alterLogDirTest] by running > this test: > {code:java} > ./gradlew cleanTest storage:test --tests > org.apache.kafka.tiered.storage.integration.AlterLogDirTest > {code} > The complete logs can be found here: > https://gist.github.com/showuon/b16cdb05a125a7c445cc6e377a2b7923 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16197) Connect Worker poll timeout prints Consumer poll timeout specific warnings.
[ https://issues.apache.org/jira/browse/KAFKA-16197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Harris resolved KAFKA-16197. - Fix Version/s: 3.8.0 Resolution: Fixed > Connect Worker poll timeout prints Consumer poll timeout specific warnings. > --- > > Key: KAFKA-16197 > URL: https://issues.apache.org/jira/browse/KAFKA-16197 > Project: Kafka > Issue Type: Bug > Components: connect >Reporter: Sagar Rao >Assignee: Sagar Rao >Priority: Major > Fix For: 3.8.0 > > > When a Connect worker's poll timeout expires in Connect, the log lines that > we see are: > {noformat} > consumer poll timeout has expired. This means the time between subsequent > calls to poll() was longer than the configured max.poll.interval.ms, which > typically implies that the poll loop is spending too much time processing > messages. You can address this either by increasing max.poll.interval.ms or > by reducing the maximum size of batches returned in poll() with > max.poll.records. > {noformat} > and the reason for leaving the group is > {noformat} > Member XX sending LeaveGroup request to coordinator XX due to consumer poll > timeout has expired. > {noformat} > which is specific to Consumers and not to Connect workers. The log line above > is especially misleading because the config `max.poll.interval.ms` is not > configurable for a Connect worker, and could make someone believe that the > logs are being written for Sink Connectors and not for the Connect worker. > Ideally, we should print something specific to Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010)