[jira] [Resolved] (KAFKA-16103) Review client logic for triggering offset commit callbacks
[ https://issues.apache.org/jira/browse/KAFKA-16103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lucas Brutschy resolved KAFKA-16103.
    Resolution: Fixed

> Review client logic for triggering offset commit callbacks
> ----------------------------------------------------------
>
> Key: KAFKA-16103
> URL: https://issues.apache.org/jira/browse/KAFKA-16103
> Project: Kafka
> Issue Type: Sub-task
> Components: clients, consumer
> Reporter: Lianet Magrans
> Assignee: Lucas Brutschy
> Priority: Critical
> Labels: kip-848-client-support, offset
> Fix For: 3.8.0
>
>
> Review logic for triggering commit callbacks, ensuring that all callbacks are
> triggered before returning from commitSync

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16599) Always await async commit callbacks in commitSync and close
Lucas Brutschy created KAFKA-16599:
-----------------------------------

Summary: Always await async commit callbacks in commitSync and close
Key: KAFKA-16599
URL: https://issues.apache.org/jira/browse/KAFKA-16599
Project: Kafka
Issue Type: Task
Reporter: Lucas Brutschy
[jira] [Resolved] (KAFKA-16008) Fix PlaintextConsumerTest.testMaxPollIntervalMs
[ https://issues.apache.org/jira/browse/KAFKA-16008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-16008. Resolution: Duplicate > Fix PlaintextConsumerTest.testMaxPollIntervalMs > --- > > Key: KAFKA-16008 > URL: https://issues.apache.org/jira/browse/KAFKA-16008 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: Kirk True >Assignee: Lucas Brutschy >Priority: Critical > Labels: consumer-threading-refactor, integration-tests, timeout > Fix For: 3.8.0 > > > The integration test {{PlaintextConsumerTest.testMaxPollIntervalMs}} is > failing when using the {{AsyncKafkaConsumer}}. > The error is: > {code} > org.opentest4j.AssertionFailedError: Timed out before expected rebalance > completed > at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) > at org.junit.jupiter.api.Assertions.fail(Assertions.java:134) > at > kafka.api.AbstractConsumerTest.awaitRebalance(AbstractConsumerTest.scala:317) > at > kafka.api.PlaintextConsumerTest.testMaxPollIntervalMs(PlaintextConsumerTest.scala:194) > {code} > The logs include this line: > > {code} > [2023-12-13 15:11:16,134] WARN [Consumer clientId=ConsumerTestConsumer, > groupId=my-test] consumer poll timeout has expired. This means the time > between subsequent calls to poll() was longer than the configured > max.poll.interval.ms, which typically implies that the poll loop is spending > too much time processing messages. You can address this either by increasing > max.poll.interval.ms or by reducing the maximum size of batches returned in > poll() with max.poll.records. > (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188) > {code} > I don't know if that's related or not. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16010) Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling
[ https://issues.apache.org/jira/browse/KAFKA-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-16010. Resolution: Duplicate > Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling > -- > > Key: KAFKA-16010 > URL: https://issues.apache.org/jira/browse/KAFKA-16010 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: Kirk True >Assignee: Lucas Brutschy >Priority: Critical > Labels: consumer-threading-refactor, integration-tests, timeout > Fix For: 3.8.0 > > > The integration test > {{PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling}} is > failing when using the {{AsyncKafkaConsumer}}. > The error is: > {code} > org.opentest4j.AssertionFailedError: Did not get valid assignment for > partitions [topic1-2, topic1-4, topic-1, topic-0, topic1-5, topic1-1, > topic1-0, topic1-3] after one consumer left > at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) > at org.junit.jupiter.api.Assertions.fail(Assertions.java:134) > at > kafka.api.AbstractConsumerTest.validateGroupAssignment(AbstractConsumerTest.scala:286) > at > kafka.api.PlaintextConsumerTest.runMultiConsumerSessionTimeoutTest(PlaintextConsumerTest.scala:1883) > at > kafka.api.PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling(PlaintextConsumerTest.scala:1281) > {code} > The logs include these lines: > > {code} > [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, > groupId=my-test] consumer poll timeout has expired. This means the time > between subsequent calls to poll() was longer than the configured > max.poll.interval.ms, which typically implies that the poll loop is spending > too much time processing messages. You can address this either by increasing > max.poll.interval.ms or by reducing the maximum size of batches returned in > poll() with max.poll.records. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188) > [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, > groupId=my-test] consumer poll timeout has expired. This means the time > between subsequent calls to poll() was longer than the configured > max.poll.interval.ms, which typically implies that the poll loop is spending > too much time processing messages. You can address this either by increasing > max.poll.interval.ms or by reducing the maximum size of batches returned in > poll() with max.poll.records. > (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188) > [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, > groupId=my-test] consumer poll timeout has expired. This means the time > between subsequent calls to poll() was longer than the configured > max.poll.interval.ms, which typically implies that the poll loop is spending > too much time processing messages. You can address this either by increasing > max.poll.interval.ms or by reducing the maximum size of batches returned in > poll() with max.poll.records. > (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188) > {code} > I don't know if that's related or not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16290) Investigate propagating subscription state updates via queues
Lucas Brutschy created KAFKA-16290:
-----------------------------------

Summary: Investigate propagating subscription state updates via queues
Key: KAFKA-16290
URL: https://issues.apache.org/jira/browse/KAFKA-16290
Project: Kafka
Issue Type: Task
Components: clients, consumer
Reporter: Lucas Brutschy

We mostly use the queues for interaction between the application thread and the background thread, but the subscription object is shared between the threads and is updated directly without going through the queues.

Allowing updates to the subscription state from both threads is not right and will cause trouble. assign() is probably the most obvious example: we send an event to the background thread to commit, but then update the subscription in the foreground right away.

It seems sensible to aim for having all updates to the subscription state happen in the background, triggered from the app thread via events (I think we already have related events for all updates; only the subscription state update itself was left in the app thread).
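The direction proposed in KAFKA-16290 can be illustrated with a minimal, self-contained sketch. Every name in it (`QueuedSubscriptionState`, `Event`, `requestAssign`, `drain`) is hypothetical and is not the consumer's actual API; partitions are plain strings for brevity. The only point is that the application thread enqueues an update request while a single thread, the one calling `drain()`, mutates the state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: only the thread calling drain() mutates the subscription state. */
final class QueuedSubscriptionState {
    /** An update requested by the application thread (hypothetical event type). */
    interface Event {
        void applyTo(Set<String> assigned);
    }

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    // Read and written only by the thread that calls drain() and snapshot().
    private final Set<String> assigned = new TreeSet<>();

    /** Application thread: request a new assignment instead of mutating state directly. */
    void requestAssign(Set<String> partitions) {
        Set<String> copy = new TreeSet<>(partitions);
        queue.add(state -> {
            state.clear();
            state.addAll(copy);
        });
    }

    /** Background thread: apply all pending updates in the order they were requested. */
    int drain() {
        List<Event> pending = new ArrayList<>();
        queue.drainTo(pending);
        for (Event event : pending) {
            event.applyTo(assigned);
        }
        return pending.size();
    }

    /** Background thread: copy of the current assignment. */
    Set<String> snapshot() {
        return new TreeSet<>(assigned);
    }
}
```

With this shape, a commit event and an assignment change that are enqueued in that order are also applied in that order, which is the ordering guarantee the direct foreground update lacks.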
[jira] [Created] (KAFKA-16284) Performance regression in RocksDB
Lucas Brutschy created KAFKA-16284:
-----------------------------------

Summary: Performance regression in RocksDB
Key: KAFKA-16284
URL: https://issues.apache.org/jira/browse/KAFKA-16284
Project: Kafka
Issue Type: Task
Components: streams
Reporter: Lucas Brutschy

In benchmarks, we are noticing a regression in the performance of `RocksDBStore`. The regression happens between these two commits:

{code:java}
trunk - 70c8b8d0af - regressed - 2024-01-06T14:00:20Z
trunk - d5aa341a18 - not regressed - 2023-12-31T11:47:14Z
{code}

The regression can be reproduced by the following test:

{code:java}
package org.apache.kafka.streams.state.internals;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.StateStoreContext;
import org.apache.kafka.test.InternalMockProcessorContext;
import org.apache.kafka.test.MockRocksDbConfigSetter;
import org.apache.kafka.test.StreamsTestUtils;
import org.apache.kafka.test.TestUtils;
import org.junit.Before;
import org.junit.Test;

import java.io.File;
import java.nio.ByteBuffer;
import java.util.Properties;

public class RocksDBStorePerfTest {

    InternalMockProcessorContext context;
    RocksDBStore rocksDBStore;
    final static String DB_NAME = "db-name";
    final static String METRICS_SCOPE = "metrics-scope";

    RocksDBStore getRocksDBStore() {
        return new RocksDBStore(DB_NAME, METRICS_SCOPE);
    }

    @Before
    public void setUp() {
        final Properties props = StreamsTestUtils.getStreamsConfig();
        props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, MockRocksDbConfigSetter.class);
        File dir = TestUtils.tempDirectory();
        context = new InternalMockProcessorContext<>(
            dir,
            Serdes.String(),
            Serdes.String(),
            new StreamsConfig(props)
        );
    }

    @Test
    public void testPerf() {
        long start = System.currentTimeMillis();
        // Open, fill, and close a fresh store ten times.
        for (int i = 0; i < 10; i++) {
            System.out.println("Iteration: " + i + " Time: " + (System.currentTimeMillis() - start));
            RocksDBStore rocksDBStore = getRocksDBStore();
            rocksDBStore.init((StateStoreContext) context, rocksDBStore);
            for (int j = 0; j < 100; j++) {
                rocksDBStore.put(new Bytes(ByteBuffer.allocate(4).putInt(j).array()), "perf".getBytes());
            }
            rocksDBStore.close();
        }
        long end = System.currentTimeMillis();
        System.out.println("Time: " + (end - start));
    }
}
{code}

I have isolated the regression to commit [5bc3aa4|https://github.com/apache/kafka/commit/5bc3aa428067dff1f2b9075ff5d1351fb05d4b10]. On my machine, the test takes ~8 seconds before [5bc3aa4|https://github.com/apache/kafka/commit/5bc3aa428067dff1f2b9075ff5d1351fb05d4b10] and ~30 seconds after [5bc3aa4|https://github.com/apache/kafka/commit/5bc3aa428067dff1f2b9075ff5d1351fb05d4b10].
[jira] [Reopened] (KAFKA-16167) Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup
[ https://issues.apache.org/jira/browse/KAFKA-16167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lucas Brutschy reopened KAFKA-16167:
-----------------------------------

> Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup
> ----------------------------------------------------------
>
> Key: KAFKA-16167
> URL: https://issues.apache.org/jira/browse/KAFKA-16167
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Affects Versions: 3.7.0
> Reporter: Kirk True
> Assignee: Kirk True
> Priority: Critical
> Labels: consumer-threading-refactor, integration-tests, kip-848-client-support
> Fix For: 3.8.0
[jira] [Created] (KAFKA-16248) Kafka consumer should cache leader offset ranges
Lucas Brutschy created KAFKA-16248:
-----------------------------------

Summary: Kafka consumer should cache leader offset ranges
Key: KAFKA-16248
URL: https://issues.apache.org/jira/browse/KAFKA-16248
Project: Kafka
Issue Type: Bug
Reporter: Lucas Brutschy

We noticed a streams application received an OFFSET_OUT_OF_RANGE error following a network partition and streams task rebalance and subsequently reset its offsets to the beginning.

Inspecting the logs, we saw multiple consumer log messages like:

{code:java}
Setting offset for partition tp to the committed offset FetchPosition{offset=1234, offsetEpoch=Optional.empty...)
{code}

Inspecting the streams code, it looks like kafka streams calls `commitSync` passing through an explicit OffsetAndMetadata object but does not populate the offset leader epoch.

The offset leader epoch is required in the offset commit to ensure that all consumers in the consumer group have coherent metadata before fetching. Otherwise, after a consumer group rebalance, a consumer may fetch with a stale leader epoch with respect to the committed offset and get an offset out of range error from a zombie partition leader.

An example of where this can cause issues:

1. We have a consumer group with consumer 1 and consumer 2. Partition P is assigned to consumer 1, which has up-to-date metadata for P. Consumer 2 has stale metadata for P.
2. Consumer 1 fetches partition P with offset 50, epoch 8, and commits offset 50 without an epoch.
3. The consumer group rebalances and P is now assigned to consumer 2. Consumer 2 has a stale leader epoch for P (let's say leader epoch 7). Consumer 2 will now try to fetch with leader epoch 7, offset 50. If we have a zombie leader due to a network partition, the zombie leader may accept consumer 2's fetch leader epoch and return an OFFSET_OUT_OF_RANGE to consumer 2.

If in step 2 consumer 1 had committed the leader epoch along with the offset, then consumer 2, on being assigned P, would force a metadata refresh to discover a sufficiently new leader epoch for the committed offset.

Kafka Streams cannot fully determine the leader epoch of the offsets it wants to commit: in EOS mode, streams commits the offset after the last control records (to avoid always having a lag of >0), but the leader epoch of the control record is not known to streams (since only non-control records are returned from Consumer.poll).

A fix discussed with [~hachikuji] is to have the consumer cache leader epoch ranges, similar to how the broker maintains a leader epoch cache.
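To make the idea of "caching leader epoch ranges" concrete, here is a minimal sketch of one possible per-partition cache. It is not Kafka's implementation and not the design agreed in the ticket: the class and method names (`LeaderEpochRanges`, `observe`, `epochFor`) are hypothetical, and it assumes records are observed in increasing offset order. It stores the first offset seen for each epoch and answers "which epoch covers this offset?" with a floor lookup.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.TreeMap;

/** Hypothetical sketch of a per-partition leader epoch cache (not Kafka's actual code). */
final class LeaderEpochRanges {
    // Key: first offset observed for an epoch. Value: that epoch.
    private final TreeMap<Long, Integer> startOffsetToEpoch = new TreeMap<>();

    /** Record that the record at 'offset' was written under 'epoch' (offsets assumed increasing). */
    void observe(long offset, int epoch) {
        Map.Entry<Long, Integer> floor = startOffsetToEpoch.floorEntry(offset);
        if (floor == null || floor.getValue() < epoch) {
            startOffsetToEpoch.put(offset, epoch);
        }
    }

    /** Latest known epoch whose range starts at or before 'offset', if any. */
    OptionalInt epochFor(long offset) {
        Map.Entry<Long, Integer> floor = startOffsetToEpoch.floorEntry(offset);
        return floor == null ? OptionalInt.empty() : OptionalInt.of(floor.getValue());
    }
}
```

For an offset past the last observed record, such as the position after a trailing control record, this lookup returns the most recent epoch the consumer has seen. That is only a lower bound on the true epoch, but it is more information than committing with an empty epoch.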
[jira] [Resolved] (KAFKA-16220) KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions is flaky
[ https://issues.apache.org/jira/browse/KAFKA-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lucas Brutschy resolved KAFKA-16220.
    Resolution: Fixed

> KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions is flaky
>
> Key: KAFKA-16220
> URL: https://issues.apache.org/jira/browse/KAFKA-16220
> Project: Kafka
> Issue Type: Bug
> Components: streams, unit tests
> Reporter: Lucas Brutschy
> Assignee: Lucas Brutschy
> Priority: Major
> Labels: flaky, flaky-test
>
> This test has seen significant flakiness.
>
> https://ge.apache.org/s/fac7lploprvuu/tests/task/:streams:test/details/org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest/shouldThrowIllegalArgumentExceptionWhenCustomPartitionerReturnsMultiplePartitions()?top-execution=1
[jira] [Resolved] (KAFKA-14384) Flaky Test SelfJoinUpgradeIntegrationTest.shouldUpgradeWithTopologyOptimizationOff
[ https://issues.apache.org/jira/browse/KAFKA-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-14384. Resolution: Fixed > Flaky Test > SelfJoinUpgradeIntegrationTest.shouldUpgradeWithTopologyOptimizationOff > -- > > Key: KAFKA-14384 > URL: https://issues.apache.org/jira/browse/KAFKA-14384 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: A. Sophie Blee-Goldman >Priority: Critical > Labels: flaky-test > > h3. Stacktrace > java.lang.AssertionError: Did not receive all 5 records from topic > selfjoin-outputSelfJoinUpgradeIntegrationTestshouldUpgradeWithTopologyOptimizationOff > within 6 ms Expected: is a value equal to or greater than <5> but: <0> > was less than <5> at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at > org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueWithTimestampRecordsReceived$2(IntegrationTestUtils.java:763) > at > org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:382) > at > org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:350) > at > org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueWithTimestampRecordsReceived(IntegrationTestUtils.java:759) > at > org.apache.kafka.streams.integration.SelfJoinUpgradeIntegrationTest.processKeyValueAndVerifyCount(SelfJoinUpgradeIntegrationTest.java:244) > at > org.apache.kafka.streams.integration.SelfJoinUpgradeIntegrationTest.shouldUpgradeWithTopologyOptimizationOff(SelfJoinUpgradeIntegrationTest.java:155) > > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12835/4/testReport/org.apache.kafka.streams.integration/SelfJoinUpgradeIntegrationTest/Build___JDK_11_and_Scala_2_13___shouldUpgradeWithTopologyOptimizationOff/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14385) Flaky Test QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable
[ https://issues.apache.org/jira/browse/KAFKA-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-14385. Resolution: Fixed > Flaky Test > QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable > --- > > Key: KAFKA-14385 > URL: https://issues.apache.org/jira/browse/KAFKA-14385 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: A. Sophie Blee-Goldman >Priority: Critical > Labels: flaky-test > > Failed twice on the same build (Java 8 & 11) > h3. Stacktrace > java.lang.AssertionError: KafkaStreams did not transit to RUNNING state > within 15000 milli seconds. Expected: but: was at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at > org.apache.kafka.test.StreamsTestUtils.startKafkaStreamsAndWaitForRunningState(StreamsTestUtils.java:134) > at > org.apache.kafka.test.StreamsTestUtils.startKafkaStreamsAndWaitForRunningState(StreamsTestUtils.java:121) > at > org.apache.kafka.streams.integration.QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable(QueryableStateIntegrationTest.java:1038) > > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12836/3/testReport/org.apache.kafka.streams.integration/QueryableStateIntegrationTest/Build___JDK_11_and_Scala_2_13___shouldNotMakeStoreAvailableUntilAllStoresAvailable/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-8691) Flakey test ProcessorContextTest#shouldNotAllowToScheduleZeroMillisecondPunctuation
[ https://issues.apache.org/jira/browse/KAFKA-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-8691. --- Resolution: Fixed > Flakey test > ProcessorContextTest#shouldNotAllowToScheduleZeroMillisecondPunctuation > > > Key: KAFKA-8691 > URL: https://issues.apache.org/jira/browse/KAFKA-8691 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Boyang Chen >Priority: Critical > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6384/consoleFull] > org.apache.kafka.streams.processor.internals.ProcessorContextTest > > shouldNotAllowToScheduleZeroMillisecondPunctuation PASSED*23:37:09* ERROR: > Failed to write output for test null.Gradle Test Executor 5*23:37:09* > java.lang.NullPointerException: Cannot invoke method write() on null > object*23:37:09*at > org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)*23:37:09* > at > org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:47)*23:37:09* > at > org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)*23:37:09* > at > org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:34)*23:37:09* >at > org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)*23:37:09* > at java_io_FileOutputStream$write.call(Unknown Source)*23:37:09* > at > build_5nv3fyjgqff9aim9wbxfnad9z$_run_closure5$_closure75$_closure108.doCall(/home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/build.gradle:244) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-9897) Flaky Test StoreQueryIntegrationTest#shouldQuerySpecificActivePartitionStores
[ https://issues.apache.org/jira/browse/KAFKA-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-9897. --- Resolution: Fixed > Flaky Test StoreQueryIntegrationTest#shouldQuerySpecificActivePartitionStores > - > > Key: KAFKA-9897 > URL: https://issues.apache.org/jira/browse/KAFKA-9897 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Affects Versions: 2.6.0 >Reporter: Matthias J. Sax >Priority: Critical > Labels: flaky-test > > [https://builds.apache.org/job/kafka-pr-jdk14-scala2.13/22/testReport/junit/org.apache.kafka.streams.integration/StoreQueryIntegrationTest/shouldQuerySpecificActivePartitionStores/] > {quote}org.apache.kafka.streams.errors.InvalidStateStoreException: Cannot get > state store source-table because the stream thread is PARTITIONS_ASSIGNED, > not RUNNING at > org.apache.kafka.streams.state.internals.StreamThreadStateStoreProvider.stores(StreamThreadStateStoreProvider.java:85) > at > org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:61) > at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1183) at > org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificActivePartitionStores(StoreQueryIntegrationTest.java:178){quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16220) KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions flaky
Lucas Brutschy created KAFKA-16220:
-----------------------------------

Summary: KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions flaky
Key: KAFKA-16220
URL: https://issues.apache.org/jira/browse/KAFKA-16220
Project: Kafka
Issue Type: Bug
Components: streams, unit tests
Reporter: Lucas Brutschy
Assignee: Lucas Brutschy

This test has seen significant flakiness.

https://ge.apache.org/s/fac7lploprvuu/tests/task/:streams:test/details/org.apache.kafka.streams.integration.KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest/shouldThrowIllegalArgumentExceptionWhenCustomPartitionerReturnsMultiplePartitions()?top-execution=1
[jira] [Created] (KAFKA-16198) Reconciliation may lose partitions when topic metadata is delayed
Lucas Brutschy created KAFKA-16198:
-----------------------------------

Summary: Reconciliation may lose partitions when topic metadata is delayed
Key: KAFKA-16198
URL: https://issues.apache.org/jira/browse/KAFKA-16198
Project: Kafka
Issue Type: Bug
Components: clients, consumer
Reporter: Lucas Brutschy
Assignee: Lucas Brutschy
Fix For: 3.8.0

The current reconciliation code in the `MembershipManager` of `AsyncKafkaConsumer` may lose part of the server-provided assignment when metadata is delayed. The reason is incorrect handling of partially resolved topic names, as in this example:

* We get assigned {{T1-1}} and {{T2-1}}
* We reconcile {{T1-1}}; {{T2-1}} remains in {{assignmentUnresolved}} since the topic id {{T2}} is not known yet
* We get new cluster metadata, which includes {{T2}}, so {{T2-1}} is moved to {{assignmentReadyToReconcile}}
* We call {{reconcile}} -- {{T2-1}} is now treated as the full assignment, so {{T1-1}} is being revoked
* We end up with assignment {{T2-1}}, which is inconsistent with the broker-side target assignment.

Generally, this seems to be a problem with the semantics of the internal collections `assignmentUnresolved` and `assignmentReadyToReconcile`. Absence of a topic in `assignmentReadyToReconcile` may mean either revocation of the topic partition(s) or unavailability of a topic name for the topic.

Internal state with a simpler and correct invariant could be achieved by using a single collection `currentTargetAssignment`, which is based on topic IDs and always corresponds to the latest assignment received from the broker. During every attempted reconciliation, all topic IDs will be resolved from the local cache, which should not introduce a lot of overhead. `assignmentUnresolved` and `assignmentReadyToReconcile` are removed.
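The proposed invariant can be sketched as follows. This is an illustration only, not the `MembershipManager` code: the class name `TargetAssignmentResolver` and method `resolve` are hypothetical, and topic IDs are plain strings rather than Kafka's `Uuid` type. The key property is that every call starts from the full target assignment, so a topic whose name resolves late can never cause an already reconciled partition to be dropped.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: resolve the whole target assignment on every reconciliation attempt. */
final class TargetAssignmentResolver {
    /**
     * @param targetByTopicId latest assignment from the broker: topic ID -> partition numbers
     * @param topicNamesById  local metadata cache: topic ID -> topic name
     * @return the resolvable part of the target, as "topicName-partition" strings
     */
    static Set<String> resolve(Map<String, Set<Integer>> targetByTopicId,
                               Map<String, String> topicNamesById) {
        Set<String> resolved = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> entry : targetByTopicId.entrySet()) {
            String name = topicNamesById.get(entry.getKey());
            if (name == null) {
                continue; // name not known yet; picked up on a later attempt
            }
            for (int partition : entry.getValue()) {
                resolved.add(name + "-" + partition);
            }
        }
        return resolved;
    }
}
```

Replaying the example above: with only `T1` in the metadata cache the result is `{T1-1}`; once `T2` appears, the result is `{T1-1, T2-1}` rather than `{T2-1}`.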
[jira] [Created] (KAFKA-16169) FencedException in commitAsync not propagated without callback
Lucas Brutschy created KAFKA-16169:
-----------------------------------

Summary: FencedException in commitAsync not propagated without callback
Key: KAFKA-16169
URL: https://issues.apache.org/jira/browse/KAFKA-16169
Project: Kafka
Issue Type: Bug
Components: clients, consumer
Reporter: Lucas Brutschy

The javadocs for {{commitAsync()}} (w/o callback) say:

@throws org.apache.kafka.common.errors.FencedInstanceIdException if this consumer instance gets fenced by broker.

If no callback is passed into {{commitAsync()}}, no offset commit callback invocation is submitted. However, we only check for a {{FencedInstanceIdException}} when we execute a callback. It seems to me that with {{commitAsync()}} we would not throw at all when the consumer gets fenced.

In any case, we need a unit test that verifies that the {{FencedInstanceIdException}} is thrown for each version of {{commitAsync()}}.
[jira] [Created] (KAFKA-16155) Investigate testAutoCommitIntercept
Lucas Brutschy created KAFKA-16155:
-----------------------------------

Summary: Investigate testAutoCommitIntercept
Key: KAFKA-16155
URL: https://issues.apache.org/jira/browse/KAFKA-16155
Project: Kafka
Issue Type: Bug
Components: clients, consumer
Reporter: Lucas Brutschy

Even with KAFKA-15942, the test PlaintextConsumerTest.testAutoCommitIntercept flakes on the initial setup (before using interceptors, so interceptors are unrelated here, except for being used later in the test).

The problem is that we are seeking two topic partitions to offset 10 and 20, respectively, but when we commit, we seem to have lost one of the offsets, likely due to a race condition.

When I output `subscriptionState.allConsumed` repeatedly, I get this output:

{code:java}
allConsumed: {topic-0=OffsetAndMetadata{offset=100, leaderEpoch=0, metadata=''}, topic-1=OffsetAndMetadata{offset=0, leaderEpoch=null, metadata=''}}
seeking topic-0 to FetchPosition{offset=10, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:58298 (id: 0 rack: null)], epoch=0}}
seeking topic-1 to FetchPosition{offset=20, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:58301 (id: 1 rack: null)], epoch=0}}
allConsumed: {topic-0=OffsetAndMetadata{offset=10, leaderEpoch=null, metadata=''}, topic-1=OffsetAndMetadata{offset=20, leaderEpoch=null, metadata=''}}
allConsumed: {topic-0=OffsetAndMetadata{offset=10, leaderEpoch=null, metadata=''}}
allConsumed: {topic-0=OffsetAndMetadata{offset=10, leaderEpoch=null, metadata=''}, topic-1=OffsetAndMetadata{offset=0, leaderEpoch=null, metadata=''}}
autocommit start {topic-0=OffsetAndMetadata{offset=10, leaderEpoch=null, metadata=''}, topic-1=OffsetAndMetadata{offset=0, leaderEpoch=null, metadata=''}}
{code}

So after we seek to 10 / 20, we lose one of the offsets, maybe because we haven't reconciled the assignment yet. Later, we get the second topic partition assigned, but the offset is initialized to 0.

We should investigate whether this can be made more like the behavior in the original consumer.
[jira] [Created] (KAFKA-16133) Commits during reconciliation always time out
Lucas Brutschy created KAFKA-16133:
-----------------------------------

Summary: Commits during reconciliation always time out
Key: KAFKA-16133
URL: https://issues.apache.org/jira/browse/KAFKA-16133
Project: Kafka
Issue Type: Bug
Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Lucas Brutschy

This only affects the AsyncKafkaConsumer, which is in Preview in 3.7.

In MembershipManagerImpl there is a confusion between timeouts and deadlines:

[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/MembershipManagerImpl.java#L836C38-L836C38]

This causes all autocommits during reconciliation to immediately time out.
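As a generic illustration of why mixing up the two leads to immediate expiry (this is not the code at the linked line, and the `Deadlines` helper below is hypothetical): a timeout is a relative duration, while a deadline is an absolute point in time. Passing a duration of a few seconds where an absolute epoch-millisecond deadline is expected produces a "deadline" far in the past.

```java
/** Hypothetical helper contrasting relative timeouts with absolute deadlines. */
final class Deadlines {
    /** Convert a relative timeout into an absolute deadline, saturating on overflow. */
    static long deadlineMs(long nowMs, long timeoutMs) {
        long deadline = nowMs + timeoutMs;
        // For non-negative inputs, a negative sum means the addition overflowed.
        return deadline < 0 ? Long.MAX_VALUE : deadline;
    }

    /** A request is expired once the current time has reached its deadline. */
    static boolean isExpired(long nowMs, long deadlineMs) {
        return nowMs >= deadlineMs;
    }
}
```

With a current time around 1.7 trillion milliseconds since the epoch, `isExpired(now, deadlineMs(now, 5000))` is false, whereas `isExpired(now, 5000)`, which treats the timeout as if it were a deadline, is true right away.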
[jira] [Resolved] (KAFKA-16097) State updater removes task without pending action in EOSv2
[ https://issues.apache.org/jira/browse/KAFKA-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-16097. Resolution: Fixed > State updater removes task without pending action in EOSv2 > -- > > Key: KAFKA-16097 > URL: https://issues.apache.org/jira/browse/KAFKA-16097 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 3.8.0 >Reporter: Lucas Brutschy >Priority: Major > > A long-running soak encountered the following exception: > > {code:java} > [2024-01-08 03:06:00,586] ERROR [i-081c089d2ed054443-StreamThread-3] Thread > encountered an error processing soak test > (org.apache.kafka.streams.StreamsSoakTest) > java.lang.IllegalStateException: Got a removed task 1_0 from the state > updater that is not for recycle, closing, or updating input partitions; this > should not happen > at > org.apache.kafka.streams.processor.internals.TaskManager.handleRemovedTasksFromStateUpdater(TaskManager.java:939) > at > org.apache.kafka.streams.processor.internals.TaskManager.checkStateUpdater(TaskManager.java:788) > at > org.apache.kafka.streams.processor.internals.StreamThread.checkStateUpdater(StreamThread.java:1141) > at > org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithoutProcessingThreads(StreamThread.java:949) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:686) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:645) > [2024-01-08 03:06:00,587] ERROR [i-081c089d2ed054443-StreamThread-3] > stream-client [i-081c089d2ed054443] Encountered the following exception > during processing and sent shutdown request for the entire application. 
> (org.apache.kafka.streams.KafkaStreams) > org.apache.kafka.streams.errors.StreamsException: > java.lang.IllegalStateException: Got a removed task 1_0 from the state > updater that is not for recycle, closing, or updating input partitions; this > should not happen > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:729) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:645) > Caused by: java.lang.IllegalStateException: Got a removed task 1_0 from the > state updater that is not for recycle, closing, or updating input partitions; > this should not happen > at > org.apache.kafka.streams.processor.internals.TaskManager.handleRemovedTasksFromStateUpdater(TaskManager.java:939) > at > org.apache.kafka.streams.processor.internals.TaskManager.checkStateUpdater(TaskManager.java:788) > at > org.apache.kafka.streams.processor.internals.StreamThread.checkStateUpdater(StreamThread.java:1141) > at > org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithoutProcessingThreads(StreamThread.java:949) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:686) > ... 1 more{code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15941) Flaky test: shouldRestoreNullRecord() – org.apache.kafka.streams.integration.RestoreIntegrationTest
[ https://issues.apache.org/jira/browse/KAFKA-15941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-15941. Resolution: Cannot Reproduce > Flaky test: shouldRestoreNullRecord() – > org.apache.kafka.streams.integration.RestoreIntegrationTest > --- > > Key: KAFKA-15941 > URL: https://issues.apache.org/jira/browse/KAFKA-15941 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Reporter: Apoorv Mittal >Priority: Major > Labels: flaky-test > > https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14699/24/tests/ > {code:java} > org.opentest4j.AssertionFailedError: Condition not met within timeout 6. > Did not receive all [KeyValue(2, \x00\x00\x00)] records from topic output > (got []) ==> expected: but was: > Stacktraceorg.opentest4j.AssertionFailedError: Condition not met > within timeout 6. Did not receive all [KeyValue(2, \x00\x00\x00)] records > from topic output (got []) ==> expected: but was: at > org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) > at > org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) > at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) > at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) at > org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210) at > org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331) >at > org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312) at > org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302) at > org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilFinalKeyValueRecordsReceived(IntegrationTestUtils.java:878) > at > 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilFinalKeyValueRecordsReceived(IntegrationTestUtils.java:827) > at > org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilFinalKeyValueRecordsReceived(IntegrationTestUtils.java:790) > at > org.apache.kafka.streams.integration.RestoreIntegrationTest.shouldRestoreNullRecord(RestoreIntegrationTest.java:244) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16098) State updater may attempt to resume a task that is not assigned anymore
Lucas Brutschy created KAFKA-16098: -- Summary: State updater may attempt to resume a task that is not assigned anymore Key: KAFKA-16098 URL: https://issues.apache.org/jira/browse/KAFKA-16098 Project: Kafka Issue Type: Bug Components: streams Reporter: Lucas Brutschy Attachments: streams.log.gz A long-running soak test brought to light this `IllegalStateException`: {code:java} [2024-01-07 08:54:13,688] ERROR [i-0637ca8609f50425f-StreamThread-1] Thread encountered an error processing soak test (org.apache.kafka.streams.StreamsSoakTest) java.lang.IllegalStateException: No current assignment for partition network-id-repartition-1 at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) at org.apache.kafka.clients.consumer.internals.SubscriptionState.resume(SubscriptionState.java:753) at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.resume(LegacyKafkaConsumer.java:963) at org.apache.kafka.clients.consumer.KafkaConsumer.resume(KafkaConsumer.java:1524) at org.apache.kafka.streams.processor.internals.TaskManager.transitRestoredTaskToRunning(TaskManager.java:857) at org.apache.kafka.streams.processor.internals.TaskManager.handleRestoredTasksFromStateUpdater(TaskManager.java:979) at org.apache.kafka.streams.processor.internals.TaskManager.checkStateUpdater(TaskManager.java:791) at org.apache.kafka.streams.processor.internals.StreamThread.checkStateUpdater(StreamThread.java:1141) at org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithoutProcessingThreads(StreamThread.java:949) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:686) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:645) [2024-01-07 08:54:13,688] ERROR [i-0637ca8609f50425f-StreamThread-1] stream-client [i-0637ca8609f50425f] Encountered the following exception during processing and sent shutdown request for the entire application. 
(org.apache.kafka.streams.KafkaStreams) org.apache.kafka.streams.errors.StreamsException: java.lang.IllegalStateException: No current assignment for partition network-id-repartition-1 at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:729) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:645) Caused by: java.lang.IllegalStateException: No current assignment for partition network-id-repartition-1 at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) at org.apache.kafka.clients.consumer.internals.SubscriptionState.resume(SubscriptionState.java:753) at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.resume(LegacyKafkaConsumer.java:963) at org.apache.kafka.clients.consumer.KafkaConsumer.resume(KafkaConsumer.java:1524) at org.apache.kafka.streams.processor.internals.TaskManager.transitRestoredTaskToRunning(TaskManager.java:857) at org.apache.kafka.streams.processor.internals.TaskManager.handleRestoredTasksFromStateUpdater(TaskManager.java:979) at org.apache.kafka.streams.processor.internals.TaskManager.checkStateUpdater(TaskManager.java:791) at org.apache.kafka.streams.processor.internals.StreamThread.checkStateUpdater(StreamThread.java:1141) at org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithoutProcessingThreads(StreamThread.java:949) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:686) ... 1 more {code} Log (with some common messages filtered) attached. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16097) State updater removes task without pending action in EOSv2
Lucas Brutschy created KAFKA-16097: -- Summary: State updater removes task without pending action in EOSv2 Key: KAFKA-16097 URL: https://issues.apache.org/jira/browse/KAFKA-16097 Project: Kafka Issue Type: Bug Components: streams Reporter: Lucas Brutschy A long-running soak encountered the following exception: {code:java} [2024-01-08 03:06:00,586] ERROR [i-081c089d2ed054443-StreamThread-3] Thread encountered an error processing soak test (org.apache.kafka.streams.StreamsSoakTest) java.lang.IllegalStateException: Got a removed task 1_0 from the state updater that is not for recycle, closing, or updating input partitions; this should not happen at org.apache.kafka.streams.processor.internals.TaskManager.handleRemovedTasksFromStateUpdater(TaskManager.java:939) at org.apache.kafka.streams.processor.internals.TaskManager.checkStateUpdater(TaskManager.java:788) at org.apache.kafka.streams.processor.internals.StreamThread.checkStateUpdater(StreamThread.java:1141) at org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithoutProcessingThreads(StreamThread.java:949) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:686) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:645) [2024-01-08 03:06:00,587] ERROR [i-081c089d2ed054443-StreamThread-3] stream-client [i-081c089d2ed054443] Encountered the following exception during processing and sent shutdown request for the entire application. 
(org.apache.kafka.streams.KafkaStreams) org.apache.kafka.streams.errors.StreamsException: java.lang.IllegalStateException: Got a removed task 1_0 from the state updater that is not for recycle, closing, or updating input partitions; this should not happen at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:729) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:645) Caused by: java.lang.IllegalStateException: Got a removed task 1_0 from the state updater that is not for recycle, closing, or updating input partitions; this should not happen at org.apache.kafka.streams.processor.internals.TaskManager.handleRemovedTasksFromStateUpdater(TaskManager.java:939) at org.apache.kafka.streams.processor.internals.TaskManager.checkStateUpdater(TaskManager.java:788) at org.apache.kafka.streams.processor.internals.StreamThread.checkStateUpdater(StreamThread.java:1141) at org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithoutProcessingThreads(StreamThread.java:949) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:686) ... 1 more{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16089) Kafka Streams still leaking memory in 3.7
Lucas Brutschy created KAFKA-16089: -- Summary: Kafka Streams still leaking memory in 3.7 Key: KAFKA-16089 URL: https://issues.apache.org/jira/browse/KAFKA-16089 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 3.7.0 Reporter: Lucas Brutschy Fix For: 3.7.0 Attachments: graphviz (1).svg A leak was fixed in the release candidate for 3.7 in [https://github.com/apache/kafka/commit/58d6d2e5922df60cdc5c5c1bcd1c2d82dd2b71e2]. However, Kafka Streams still seems to be leaking memory after the fix, just more slowly. Attached is the `jeprof` output taken right before a crash after ~11 hours. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16086) Kafka Streams has RocksDB native memory leak
Lucas Brutschy created KAFKA-16086: -- Summary: Kafka Streams has RocksDB native memory leak Key: KAFKA-16086 URL: https://issues.apache.org/jira/browse/KAFKA-16086 Project: Kafka Issue Type: Bug Components: streams Reporter: Lucas Brutschy Attachments: image.png
The current 3.7 and trunk versions leak native memory when running Kafka Streams over several hours. This will likely kill any real workload over time, so it should be treated as a blocker bug for 3.7. This was discovered in a long-running soak test. Attached is the memory consumption, which steadily approaches 100% until the JVM is killed. Rerunning the same test with jemalloc native memory profiling, we see these allocated objects after a few hours:
{noformat}
(jeprof) top
Total: 13283138973 B
10296829713  77.5%  77.5% 10296829713  77.5% rocksdb::port::cacheline_aligned_alloc
 2487325671  18.7%  96.2%  2487325671  18.7% rocksdb::BlockFetcher::ReadBlockContents
  150937547   1.1%  97.4%   150937547   1.1% rocksdb::lru_cache::LRUHandleTable::LRUHandleTable
  119591613   0.9%  98.3%   119591613   0.9% prof_backtrace_impl
   47331433   0.4%  98.6%   105040933   0.8% rocksdb::BlockBasedTable::PutDataBlockToCache
   32516797   0.2%  98.9%    32516797   0.2% rocksdb::Arena::AllocateNewBlock
   29796095   0.2%  99.1%    30451535   0.2% Java_org_rocksdb_Options_newOptions
   18172716   0.1%  99.2%    20008397   0.2% rocksdb::InternalStats::InternalStats
   16032145   0.1%  99.4%    16032145   0.1% rocksdb::ColumnFamilyDescriptorJni::construct
   12454120   0.1%  99.5%    12454120   0.1% std::_Rb_tree::_M_insert_unique
{noformat}
The first hypothesis is that this is caused by a leaked `Options` object created in this line: [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L312|https://github.com/apache/kafka/pull/14852], which was introduced in this PR: [https://github.com/apache/kafka/pull/14852|https://github.com/apache/kafka/pull/14852] -- This message was sent by Atlassian Jira (v8.20.10#820010)
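The hypothesis in KAFKA-16086 is the usual failure mode for JNI-backed objects: off-heap memory is released only when `close()` is called, so an object that is created and then dropped keeps its native allocation. The sketch below illustrates the pattern only. `NativeHandle` is a hypothetical stand-in for a native-backed object such as `org.rocksdb.Options`, and the counter stands in for native memory; none of this is taken from `RocksDBStore`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a JNI-backed object (for example org.rocksdb.Options):
// its "native" allocation is only released when close() is called.
final class NativeHandle implements AutoCloseable {
    private final AtomicInteger live;

    NativeHandle(AtomicInteger live) {
        this.live = live;
        live.incrementAndGet();   // simulates the native allocation
    }

    @Override
    public void close() {
        live.decrementAndGet();   // simulates freeing the native allocation
    }
}

final class OptionsLeakSketch {
    // Creates `iterations` handles and returns how many are still allocated afterwards.
    static int liveAfter(int iterations, boolean closeHandles) {
        AtomicInteger live = new AtomicInteger();
        for (int i = 0; i < iterations; i++) {
            if (closeHandles) {
                // try-with-resources guarantees close() even if the body throws
                try (NativeHandle handle = new NativeHandle(live)) {
                    // ... use the handle ...
                }
            } else {
                // Leaky pattern: the Java object becomes garbage, but nothing
                // ever calls close(), so the native side is never freed.
                new NativeHandle(live);
            }
        }
        return live.get();
    }
}
```

With `closeHandles` set to false the count grows with every iteration, which matches the shape of a slow, steady leak; with try-with-resources it returns to zero.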
[jira] [Created] (KAFKA-16077) State updater fails to close task when input partitions are updated
Lucas Brutschy created KAFKA-16077: -- Summary: State updater fails to close task when input partitions are updated Key: KAFKA-16077 URL: https://issues.apache.org/jira/browse/KAFKA-16077 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 3.7.0 Reporter: Lucas Brutschy
There is a race condition in the state updater that can cause the following:
# We have an active task in the state updater.
# We get fenced. We recreate the producer, so transactions are now uninitialized. We ask the state updater to hand back the task and add a pending action to close the task clean once it is handed back.
# We get a new assignment with updated input partitions. The task is still owned by the state updater, so we ask the state updater again to hand it back and add a pending action to update its input partitions.
# The task is handed back by the state updater. We update its input partitions but forget to close it clean (the pending action was overwritten).
# Now the task is in an initialized state, but the underlying producer does not have transactions initialized.
This can lead to an exception like this:
streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log-org.apache.kafka.streams.errors.StreamsException: Exception caught in process.
taskId=1_0, processor=KSTREAM-SOURCE-05, topic=node-name-repartition, partition=0, offset=618798, stacktrace=java.lang.IllegalStateException: TransactionalId stream-soak-test-d647640a-12e5-4e74-a0af-e105d0d0cb67-2: Invalid transition attempted from state UNINITIALIZED to state IN_TRANSACTION streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:999) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:985) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.clients.producer.internals.TransactionManager.beginTransaction(TransactionManager.java:311) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.clients.producer.KafkaProducer.beginTransaction(KafkaProducer.java:660) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.StreamsProducer.maybeBeginTransaction(StreamsProducer.java:240) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.StreamsProducer.send(StreamsProducer.java:258) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:253) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:175) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:85) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:291) 
streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:270) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:229) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.kstream.internals.KStreamKTableJoinProcessor.doJoin(KStreamKTableJoinProcessor.java:130) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.kstream.internals.KStreamKTableJoinProcessor.process(KStreamKTableJoinProcessor.java:99) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:152) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:291) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:270) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:229) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:84) streams-soak-3-7-eos-v2_20240103131321/18.237.32.3/streams.log- at
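The overwritten pending action described in KAFKA-16077 can be pictured with a plain map that holds one action per task. This is a minimal sketch under assumptions: the class, enum, and method names are hypothetical and do not come from `TaskManager`, and the second method only shows one way the earlier request could survive, not the fix that was actually applied.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class PendingActionsSketch {
    // Hypothetical action names, chosen to mirror the steps in the report.
    enum Action { CLOSE_CLEAN, UPDATE_INPUT_PARTITIONS }

    // One slot per task: the second request silently replaces the first,
    // so the close-clean request is lost when the task is handed back.
    static Action singleSlot(String taskId) {
        Map<String, Action> pending = new HashMap<>();
        pending.put(taskId, Action.CLOSE_CLEAN);             // after being fenced
        pending.put(taskId, Action.UPDATE_INPUT_PARTITIONS); // after the new assignment
        return pending.get(taskId);
    }

    // Illustrative alternative: keep every requested action for the task.
    static Set<Action> accumulated(String taskId) {
        Map<String, Set<Action>> pending = new HashMap<>();
        pending.computeIfAbsent(taskId, id -> EnumSet.noneOf(Action.class))
               .add(Action.CLOSE_CLEAN);
        pending.computeIfAbsent(taskId, id -> EnumSet.noneOf(Action.class))
               .add(Action.UPDATE_INPUT_PARTITIONS);
        return pending.get(taskId);
    }
}
```

In the single-slot variant only the input-partition update remains for task `1_0`, which is the state the report describes: the task is updated but never closed clean.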
[jira] [Resolved] (KAFKA-15696) Revoke partitions on Consumer.close()
[ https://issues.apache.org/jira/browse/KAFKA-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-15696. Resolution: Fixed > Revoke partitions on Consumer.close() > - > > Key: KAFKA-15696 > URL: https://issues.apache.org/jira/browse/KAFKA-15696 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-e2e, kip-848-preview > > Upon closing the {{Consumer}} we need to revoke its assignment. This involves > stopping fetching, committing offsets if auto-commit is enabled, and invoking the > onPartitionsRevoked callback. A mechanism introduced in PR > [14406|https://github.com/apache/kafka/pull/14406] allows network I/O to be > performed on shutdown. The new method > {{ConsumerNetworkThread.runAtClose()}} will be executed when > {{Consumer.close()}} is invoked. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15548) Send GroupConsumerHeartbeatRequest on Consumer.close()
[ https://issues.apache.org/jira/browse/KAFKA-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-15548. Resolution: Fixed > Send GroupConsumerHeartbeatRequest on Consumer.close() > -- > > Key: KAFKA-15548 > URL: https://issues.apache.org/jira/browse/KAFKA-15548 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-e2e, kip-848-preview > > Upon closing the {{Consumer}} we need to send a final > GroupConsumerHeartbeatRequest with epoch = -1 to leave the group (or -2 if > the consumer is a static member). A mechanism introduced in PR > [14406|https://github.com/apache/kafka/pull/14406] allows network I/O to be > performed on shutdown. The new method > {{ConsumerNetworkThread.runAtClose()}} will be executed when > {{Consumer.close()}} is invoked. -- This message was sent by Atlassian Jira (v8.20.10#820010)
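The epoch choice described in KAFKA-15548 can be sketched as a small helper. The class, constant, and method names below are hypothetical; only the values -1 and -2 come from the issue text. The sketch assumes that a static member is one configured with a group instance id, modeled here as an `Optional<String>`.

```java
import java.util.Optional;

final class LeaveGroupEpochSketch {
    // Values from the issue description; the constant names are illustrative.
    static final int LEAVE_GROUP_EPOCH = -1;
    static final int LEAVE_GROUP_STATIC_MEMBER_EPOCH = -2;

    // Picks the epoch to put in the final heartbeat sent on close.
    // Assumption: a member is "static" when a group instance id is present.
    static int leaveEpoch(Optional<String> groupInstanceId) {
        return groupInstanceId.isPresent()
                ? LEAVE_GROUP_STATIC_MEMBER_EPOCH
                : LEAVE_GROUP_EPOCH;
    }
}
```

A dynamic member would call `leaveEpoch(Optional.empty())`, and a static member would pass its instance id, for example `leaveEpoch(Optional.of("instance-1"))`.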
[jira] [Created] (KAFKA-16001) Migrate ConsumerNetworkThreadTestBuilder away from ConsumerTestBuilder
Lucas Brutschy created KAFKA-16001: -- Summary: Migrate ConsumerNetworkThreadTestBuilder away from ConsumerTestBuilder Key: KAFKA-16001 URL: https://issues.apache.org/jira/browse/KAFKA-16001 Project: Kafka Issue Type: Sub-task Reporter: Lucas Brutschy -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16000) Migrate MembershipManagerImpl away from ConsumerTestBuilder
Lucas Brutschy created KAFKA-16000: -- Summary: Migrate MembershipManagerImpl away from ConsumerTestBuilder Key: KAFKA-16000 URL: https://issues.apache.org/jira/browse/KAFKA-16000 Project: Kafka Issue Type: Sub-task Reporter: Lucas Brutschy -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15999) Migrate HeartbeatRequestManagerTest away from ConsumerTestBuilder
Lucas Brutschy created KAFKA-15999: -- Summary: Migrate HeartbeatRequestManagerTest away from ConsumerTestBuilder Key: KAFKA-15999 URL: https://issues.apache.org/jira/browse/KAFKA-15999 Project: Kafka Issue Type: Sub-task Reporter: Lucas Brutschy -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15977) DelegationTokenEndToEndAuthorizationWithOwnerTest leaks threads
Lucas Brutschy created KAFKA-15977: -- Summary: DelegationTokenEndToEndAuthorizationWithOwnerTest leaks threads Key: KAFKA-15977 URL: https://issues.apache.org/jira/browse/KAFKA-15977 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy [https://ci-builds.apache.org/blue/rest/organizations/jenkins/pipelines/Kafka/pipelines/kafka-pr/branches/PR-14878/runs/8/nodes/11/steps/90/log/?start=0] I had an unrelated PR fail with the following thread leak: ``` Gradle Test Run :core:test > Gradle Test Executor 95 > DelegationTokenEndToEndAuthorizationWithOwnerTest > executionError STARTED kafka.api.DelegationTokenEndToEndAuthorizationWithOwnerTest.executionError failed, log available in /home/jenkins/workspace/Kafka_kafka-pr_PR-14878/core/build/reports/testOutput/kafka.api.DelegationTokenEndToEndAuthorizationWithOwnerTest.executionError.test.stdout Gradle Test Run :core:test > Gradle Test Executor 95 > DelegationTokenEndToEndAuthorizationWithOwnerTest > executionError FAILED org.opentest4j.AssertionFailedError: Found 1 unexpected threads during @AfterAll: `kafka-admin-client-thread | adminclient-483` ==> expected: but was: ``` All the following tests on that error fail with initialization errors, because the admin client thread is never closed. 
This is preceded by the following test failure: ``` Gradle Test Run :core:test > Gradle Test Executor 95 > DelegationTokenEndToEndAuthorizationWithOwnerTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(String, boolean) > [1] quorum=kraft, isIdempotenceEnabled=true FAILED org.opentest4j.AssertionFailedError: expected acls: (principal=User:scram-user2, host=*, operation=CREATE_TOKENS, permissionType=ALLOW) (principal=User:scram-user2, host=*, operation=DESCRIBE_TOKENS, permissionType=ALLOW) but got: at app//org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) at app//org.junit.jupiter.api.Assertions.fail(Assertions.java:134) at app//kafka.utils.TestUtils$.waitAndVerifyAcls(TestUtils.scala:1142) at app//kafka.api.DelegationTokenEndToEndAuthorizationWithOwnerTest.$anonfun$configureSecurityAfterServersStart$1(DelegationTokenEndToEndAuthorizationWithOwnerTest.scala:71) at app//kafka.api.DelegationTokenEndToEndAuthorizationWithOwnerTest.$anonfun$configureSecurityAfterServersStart$1$adapted(DelegationTokenEndToEndAuthorizationWithOwnerTest.scala:70) at app//scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576) at app//scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574) at app//scala.collection.AbstractIterable.foreach(Iterable.scala:933) at app//kafka.api.DelegationTokenEndToEndAuthorizationWithOwnerTest.configureSecurityAfterServersStart(DelegationTokenEndToEndAuthorizationWithOwnerTest.scala:70) ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15967) Fix revocation in reconcilation logic
Lucas Brutschy created KAFKA-15967: -- Summary: Fix revocation in reconcilation logic Key: KAFKA-15967 URL: https://issues.apache.org/jira/browse/KAFKA-15967 Project: Kafka Issue Type: Sub-task Reporter: Lucas Brutschy There appears to be a problem in the reconciliation logic. We get 6 partitions from a heartbeat (HB) and add them to {{assignmentReadyToReconcile}}. On the next HB we get only 4 partitions (2 are revoked) and also add them to {{assignmentReadyToReconcile}}, but the 2 partitions that were supposed to be removed from the assignment are never removed because they are still in {{assignmentReadyToReconcile}}. This was discovered during integration testing of [https://github.com/apache/kafka/pull/14878] - part of the test testRemoteAssignorRange was disabled and should be re-enabled once this is fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
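The behavior reported in KAFKA-15967 comes down to accumulating heartbeat assignments instead of replacing them. The sketch below models partitions as plain integers and uses hypothetical method names; it is not the `MembershipManagerImpl` code, and the `replace` variant is only one illustrative way to let revoked partitions drop out.

```java
import java.util.Set;
import java.util.TreeSet;

final class ReconcileSketch {
    // Reported behavior: each heartbeat's partitions are added to the same set,
    // so partitions missing from the later heartbeat are never removed.
    static Set<Integer> accumulate(Set<Integer> firstHeartbeat, Set<Integer> secondHeartbeat) {
        Set<Integer> readyToReconcile = new TreeSet<>(firstHeartbeat);
        readyToReconcile.addAll(secondHeartbeat);
        return readyToReconcile;
    }

    // Illustrative alternative: the latest heartbeat replaces the target set,
    // so revoked partitions no longer appear in it.
    static Set<Integer> replace(Set<Integer> firstHeartbeat, Set<Integer> secondHeartbeat) {
        Set<Integer> readyToReconcile = new TreeSet<>(firstHeartbeat);
        readyToReconcile.clear();
        readyToReconcile.addAll(secondHeartbeat);
        return readyToReconcile;
    }
}
```

With a first heartbeat of partitions 0 to 5 and a second of partitions 0 to 3, `accumulate` still holds all six partitions, while `replace` holds only the four that remain assigned.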
[jira] [Resolved] (KAFKA-15690) EosIntegrationTest is flaky.
[ https://issues.apache.org/jira/browse/KAFKA-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-15690. Resolution: Fixed > EosIntegrationTest is flaky. > > > Key: KAFKA-15690 > URL: https://issues.apache.org/jira/browse/KAFKA-15690 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Reporter: Calvin Liu >Assignee: Lucas Brutschy >Priority: Major > Labels: flaky-test > > EosIntegrationTest > shouldNotViolateEosIfOneTaskGetsFencedUsingIsolatedAppInstances[exactly_once_v2, > processing threads = false] > {code:java} > org.junit.runners.model.TestTimedOutException: test timed out after 600 > seconds at > org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.handleServerDisconnect(NetworkClient.java:) > at > org.apache.kafka.clients.NetworkClient.processDisconnection(NetworkClient.java:821) > at > org.apache.kafka.clients.NetworkClient.processTimeoutDisconnection(NetworkClient.java:779) >at > org.apache.kafka.clients.NetworkClient.handleTimedOutRequests(NetworkClient.java:837) > org.apache.kafka.streams.errors.StreamsException: Exception caught in > process. taskId=0_1, processor=KSTREAM-SOURCE-00, > topic=multiPartitionInputTopic, partition=1, offset=15, > stacktrace=java.lang.RuntimeException: Detected we've been interrupted. 
> at > org.apache.kafka.streams.integration.EosIntegrationTest$1$1.transform(EosIntegrationTest.java:892) >at > org.apache.kafka.streams.integration.EosIntegrationTest$1$1.transform(EosIntegrationTest.java:867) >at > org.apache.kafka.streams.kstream.internals.TransformerSupplierAdapter$1.transform(TransformerSupplierAdapter.java:49) > at > org.apache.kafka.streams.kstream.internals.TransformerSupplierAdapter$1.transform(TransformerSupplierAdapter.java:38) > at > org.apache.kafka.streams.kstream.internals.KStreamFlatTransform$KStreamFlatTransformProcessor.process(KStreamFlatTransform.java:66) > {code} > shouldWriteLatestOffsetsToCheckpointOnShutdown[exactly_once_v2, processing > threads = false] > {code:java} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.UnknownServerException: The server experienced > an unexpected error when processing the request. at > org.apache.kafka.streams.integration.utils.KafkaEmbedded.deleteTopic(KafkaEmbedded.java:204) > at > org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster.deleteTopicsAndWait(EmbeddedKafkaCluster.java:286) >at > org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster.deleteTopicsAndWait(EmbeddedKafkaCluster.java:274) >at > org.apache.kafka.streams.integration.EosIntegrationTest.createTopics(EosIntegrationTest.java:174) > at > java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > org.apache.kafka.streams.errors.StreamsException: Exception caught in > process. taskId=0_1, processor=KSTREAM-SOURCE-00, > topic=multiPartitionInputTopic, partition=1, offset=15, > stacktrace=java.lang.RuntimeException: Detected we've been interrupted. 
> at > org.apache.kafka.streams.integration.EosIntegrationTest$1$1.transform(EosIntegrationTest.java:892) >at > org.apache.kafka.streams.integration.EosIntegrationTest$1$1.transform(EosIntegrationTest.java:867) >at > org.apache.kafka.streams.kstream.internals.TransformerSupplierAdapter$1.transform(TransformerSupplierAdapter.java:49) > at > org.apache.kafka.streams.kstream.internals.TransformerSupplierAdapter$1.transform(TransformerSupplierAdapter.java:38) > at > org.apache.kafka.streams.kstream.internals.KStreamFlatTransform$KStreamFlatTransformProcessor.process(KStreamFlatTransform.java:66) > {code} > shouldWriteLatestOffsetsToCheckpointOnShutdown[exactly_once, processing > threads = false] > {code:java} > org.opentest4j.AssertionFailedError: Condition not met within timeout 6. > StreamsTasks did not request commit. ==> expected: but was: >at > app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) >at > app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) >at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) > at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) > at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210) > java.lang.IllegalStateException: Replica > [Topic=__transaction_state,Partition=2,Replica=1] should be in the > OfflineReplica,ReplicaDeletionStarted states before moving to > ReplicaDeletionIneligible state. Instead it is in OnlineReplica state > at >
[jira] [Resolved] (KAFKA-15887) Autocommit during close consistently fails with exception in background thread
[ https://issues.apache.org/jira/browse/KAFKA-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-15887. Resolution: Fixed > Autocommit during close consistently fails with exception in background thread > -- > > Key: KAFKA-15887 > URL: https://issues.apache.org/jira/browse/KAFKA-15887 > Project: Kafka > Issue Type: Sub-task >Reporter: Lucas Brutschy >Assignee: Philip Nee >Priority: Blocker > > when I run {{AsyncKafkaConsumerTest}} I get this every time I call close: > {code:java} > java.lang.IndexOutOfBoundsException: Index: 0 > at java.base/java.util.Collections$EmptyList.get(Collections.java:4483) > at > java.base/java.util.Collections$UnmodifiableList.get(Collections.java:1310) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.findCoordinatorSync(ConsumerNetworkThread.java:302) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.ensureCoordinatorReady(ConsumerNetworkThread.java:288) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.maybeAutoCommitAndLeaveGroup(ConsumerNetworkThread.java:276) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.cleanup(ConsumerNetworkThread.java:257) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.run(ConsumerNetworkThread.java:101) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15930) Flaky test - testWithGroupId - TransactionsBounceTest
Lucas Brutschy created KAFKA-15930: -- Summary: Flaky test - testWithGroupId - TransactionsBounceTest Key: KAFKA-15930 URL: https://issues.apache.org/jira/browse/KAFKA-15930 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy Test is intermittently failing. [https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=Europe%2FVienna=trunk=kafka.api.TransactionsBounceTest=testWithGroupId()] {code:java} org.apache.kafka.common.KafkaException: Unexpected error in TxnOffsetCommitResponse: The server disconnected before a response was received. at app//org.apache.kafka.clients.producer.internals.TransactionManager$TxnOffsetCommitHandler.handleResponse(TransactionManager.java:1702) at app//org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1236) at app//org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154) at app//org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:594) at app//org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:586) at app//org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:460) at app//org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:337) at app//org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:252) at java.base@17.0.7/java.lang.Thread.run(Thread.java:833) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15929) Flaky test - testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress - TopicCommandIntegrationTest
Lucas Brutschy created KAFKA-15929: -- Summary: Flaky test - testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress - TopicCommandIntegrationTest Key: KAFKA-15929 URL: https://issues.apache.org/jira/browse/KAFKA-15929 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy Test is intermittently failing. https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=Europe%2FVienna=trunk=org.apache.kafka.tools.TopicCommandIntegrationTest=testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress(String)%5B1%5D {code:java} java.lang.NullPointerException: Cannot invoke "org.apache.kafka.clients.admin.PartitionReassignment.addingReplicas()" because "reassignments" is null at org.apache.kafka.tools.TopicCommandIntegrationTest.testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress(TopicCommandIntegrationTest.java:800) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15928) Flaky test - testBalancePartitionLeaders - QuorumControllerTest
Lucas Brutschy created KAFKA-15928: -- Summary: Flaky test - testBalancePartitionLeaders - QuorumControllerTest Key: KAFKA-15928 URL: https://issues.apache.org/jira/browse/KAFKA-15928 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy Test is intermittently failing. [https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=Europe%2FVienna=trunk=org.apache.kafka.controller.QuorumControllerTest=testBalancePartitionLeaders()] ``` org.opentest4j.AssertionFailedError: Condition not met within timeout 1. Leaders were not balanced after unfencing all of the brokers ==> expected: but was: at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210) at org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331) at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379) at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328) at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312) at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302) at org.apache.kafka.controller.QuorumControllerTest.testBalancePartitionLeaders(QuorumControllerTest.java:490) ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15927) Flaky test - testReplicateSourceDefault - MirrorConnectorsIntegrationExactlyOnceTest
Lucas Brutschy created KAFKA-15927: -- Summary: Flaky test - testReplicateSourceDefault - MirrorConnectorsIntegrationExactlyOnceTest Key: KAFKA-15927 URL: https://issues.apache.org/jira/browse/KAFKA-15927 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy Test is intermittently failing [https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=Europe%2FVienna=trunk=*MirrorConnectorsIntegrationExactlyOnceTest] {code:java} org.opentest4j.AssertionFailedError: `delete.retention.ms` should be different, because it's in exclude filter! ==> expected: not equal but was: <8640> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertNotEquals.failEqual(AssertNotEquals.java:277) at app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:263) at app//org.junit.jupiter.api.Assertions.assertNotEquals(Assertions.java:2828) at app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.lambda$testReplicateSourceDefault$9(MirrorConnectorsIntegrationBaseTest.java:826) at app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331) at app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379) at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328) at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312) at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302) at app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.testReplicateSourceDefault(MirrorConnectorsIntegrationBaseTest.java:813) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15926) Flaky test - testReplicateSourceDefault - MirrorConnectorsIntegrationSSLTest
Lucas Brutschy created KAFKA-15926: -- Summary: Flaky test - testReplicateSourceDefault - MirrorConnectorsIntegrationSSLTest Key: KAFKA-15926 URL: https://issues.apache.org/jira/browse/KAFKA-15926 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy Test is intermittently failing. See [https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=Europe%2FVienna=trunk=*MirrorConnectorsIntegrationSSLTest] for failed runs {code:java} org.opentest4j.AssertionFailedError: `delete.retention.ms` should be different, because it's in exclude filter! ==> expected: not equal but was: <8640> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertNotEquals.failEqual(AssertNotEquals.java:277) at app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:263) at app//org.junit.jupiter.api.Assertions.assertNotEquals(Assertions.java:2828) at app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.lambda$testReplicateSourceDefault$9(MirrorConnectorsIntegrationBaseTest.java:826) at app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331) at app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379) at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328) at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312) at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302) at app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.testReplicateSourceDefault(MirrorConnectorsIntegrationBaseTest.java:813) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15925) Flaky test testReplicateSourceDefault - MirrorConnectorsIntegrationTransactionsTest
Lucas Brutschy created KAFKA-15925: -- Summary: Flaky test testReplicateSourceDefault - MirrorConnectorsIntegrationTransactionsTest Key: KAFKA-15925 URL: https://issues.apache.org/jira/browse/KAFKA-15925 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy Test is intermittently failing. [https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=Europe%2FVienna=trunk=*MirrorConnectorsIntegrationTransactionsTest] [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14842/10/tests/] {code:java} org.opentest4j.AssertionFailedError: `delete.retention.ms` should be different, because it's in exclude filter! ==> expected: not equal but was: <8640> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertNotEquals.failEqual(AssertNotEquals.java:277) at app//org.junit.jupiter.api.AssertNotEquals.assertNotEquals(AssertNotEquals.java:263) at app//org.junit.jupiter.api.Assertions.assertNotEquals(Assertions.java:2828) at app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.lambda$testReplicateSourceDefault$9(MirrorConnectorsIntegrationBaseTest.java:826) at app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331) at app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379) at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328) at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312) at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302) at app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.testReplicateSourceDefault(MirrorConnectorsIntegrationBaseTest.java:813){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15544) Enable existing client integration tests for new protocol
[ https://issues.apache.org/jira/browse/KAFKA-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-15544. Resolution: Fixed > Enable existing client integration tests for new protocol > -- > > Key: KAFKA-15544 > URL: https://issues.apache.org/jira/browse/KAFKA-15544 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Andrew Schofield >Priority: Blocker > Labels: kip-848, kip-848-client-support, kip-848-e2e, > kip-848-preview > > Enable the integration tests defined in `PlaintextConsumerTest` that > relate to consumer group functionality, and validate that they work with the > new consumer group protocol. The majority of these tests should work with both > consumer group protocols, so parameterization of the tests seems like the most > appropriate way forward. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15916) Flaky test - testClientSideTimeoutAfterFailureToReceiveResponse - KafkaAdminClientTest
Lucas Brutschy created KAFKA-15916: -- Summary: Flaky test - testClientSideTimeoutAfterFailureToReceiveResponse - KafkaAdminClientTest Key: KAFKA-15916 URL: https://issues.apache.org/jira/browse/KAFKA-15916 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy Test intermittently gives the following result: {code} java.lang.IllegalStateException: No requests pending for inbound response MetadataResponseData(throttleTimeMs=0, brokers=[MetadataResponseBroker(nodeId=1, host='localhost', port=8122, rack=null), MetadataResponseBroker(nodeId=0, host='localhost', port=8121, rack=null), MetadataResponseBroker(nodeId=2, host='localhost', port=8123, rack=null)], clusterId='mockClusterId', controllerId=0, topics=[], clusterAuthorizedOperations=-2147483648) at org.apache.kafka.clients.MockClient.respond(MockClient.java:393) at org.apache.kafka.clients.MockClient.respond(MockClient.java:367) at org.apache.kafka.clients.admin.KafkaAdminClientTest.testClientSideTimeoutAfterFailureToReceiveResponse(KafkaAdminClientTest.java:7017) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15915) Flaky test - testUnrecoverableError - ProducerIdManagerTest
Lucas Brutschy created KAFKA-15915: -- Summary: Flaky test - testUnrecoverableError - ProducerIdManagerTest Key: KAFKA-15915 URL: https://issues.apache.org/jira/browse/KAFKA-15915 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy Test intermittently gives the following result: {code} java.lang.UnsupportedOperationException: Success.failed at scala.util.Success.failed(Try.scala:277) at kafka.coordinator.transaction.ProducerIdManagerTest.verifyFailure(ProducerIdManagerTest.scala:234) at kafka.coordinator.transaction.ProducerIdManagerTest.testUnrecoverableErrors(ProducerIdManagerTest.scala:199) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15914) Flaky test - testAlterSinkConnectorOffsetsOverriddenConsumerGroupId - OffsetsApiIntegrationTest
Lucas Brutschy created KAFKA-15914: -- Summary: Flaky test - testAlterSinkConnectorOffsetsOverriddenConsumerGroupId - OffsetsApiIntegrationTest Key: KAFKA-15914 URL: https://issues.apache.org/jira/browse/KAFKA-15914 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy ``` org.opentest4j.AssertionFailedError: Condition not met within timeout 3. Sink connector consumer group offsets should catch up to the topic end offsets ==> expected: <true> but was: <false> at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210) at org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331) at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379) at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328) at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312) at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302) at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917) at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.alterAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:396) at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsOverriddenConsumerGroupId(OffsetsApiIntegrationTest.java:297) ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14624) State restoration is broken with standby tasks and cache-enabled stores in processor API
[ https://issues.apache.org/jira/browse/KAFKA-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-14624. Resolution: Fixed > State restoration is broken with standby tasks and cache-enabled stores in > processor API > > > Key: KAFKA-14624 > URL: https://issues.apache.org/jira/browse/KAFKA-14624 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 3.3.1 >Reporter: Balaji Rao >Assignee: Lucas Brutschy >Priority: Major > > I found that cache-enabled state stores in PAPI with standby tasks sometimes > return stale data when a partition moves from one app instance to another > and back. [Here's|https://github.com/balajirrao/kafka-streams-multi-runner] a > small project that I used to reproduce the issue. > I dug around a bit and it seems like it's a bug in standby task state > restoration when caching is enabled. If a partition moves from instance 1 to > 2 and then back to instance 1, since the `CachingKeyValueStore` doesn't > register a restore callback, it can return potentially stale data for > non-dirty keys. > I could fix the issue by modifying the `CachingKeyValueStore` to register a > restore callback in which the restored keys are added to the cache. Is > this fix in the right direction? > {code:java} > // register the store > context.register( > root, > (RecordBatchingStateRestoreCallback) records -> { > for (final ConsumerRecord<byte[], byte[]> record : > records) { > put(Bytes.wrap(record.key()), record.value()); > } > } > ); > {code} > > I would like to contribute a fix, if I can get some help! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15798) Flaky Test NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology()
[ https://issues.apache.org/jira/browse/KAFKA-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-15798. Resolution: Fixed Test disabled in https://github.com/apache/kafka/pull/14830 since feature will be removed. > Flaky Test > NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology() > - > > Key: KAFKA-15798 > URL: https://issues.apache.org/jira/browse/KAFKA-15798 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Reporter: Justine Olshan >Assignee: Lucas Brutschy >Priority: Major > Labels: flaky-test > > I saw a few examples recently. 2 have the same error, but the third is > different > [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14629/22/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_8_and_Scala_2_12___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology___2/] > [https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2365/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_21_and_Scala_2_13___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology__/] > > The failure is like > {code:java} > java.lang.AssertionError: Did not receive all 5 records from topic > output-stream-1 within 6 ms, currently accumulated data is [] Expected: > is a value equal to or greater than <5> but: <0> was less than <5>{code} > The other failure was > [https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2365/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_8_and_Scala_2_12___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology__/] > {code:java} > java.lang.AssertionError: Expected: <[0, 1]> but: was <[0]>{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14014) Flaky test NamedTopologyIntegrationTest.shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets()
[ https://issues.apache.org/jira/browse/KAFKA-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-14014. Resolution: Fixed Test disabled since feature will be removed. > Flaky test > NamedTopologyIntegrationTest.shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets() > > > Key: KAFKA-14014 > URL: https://issues.apache.org/jira/browse/KAFKA-14014 > Project: Kafka > Issue Type: Test > Components: streams >Reporter: Bruno Cadonna >Assignee: Matthew de Detrich >Priority: Critical > Labels: flaky-test > > {code:java} > java.lang.AssertionError: > Expected: <[KeyValue(B, 1), KeyValue(A, 2), KeyValue(C, 2)]> > but: was <[KeyValue(B, 1), KeyValue(A, 2), KeyValue(C, 1)]> > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6) > at > org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets(NamedTopologyIntegrationTest.java:540) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 
at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:833) > {code} > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12310/2/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_11_and_Scala_2_13___shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets/ > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12310/2/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_17_and_Scala_2_13___shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-13531) Flaky test NamedTopologyIntegrationTest
[ https://issues.apache.org/jira/browse/KAFKA-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-13531. Resolution: Fixed Test disabled since feature will be removed. > Flaky test NamedTopologyIntegrationTest > --- > > Key: KAFKA-13531 > URL: https://issues.apache.org/jira/browse/KAFKA-13531 > Project: Kafka > Issue Type: Test > Components: streams, unit tests >Reporter: Matthias J. Sax >Assignee: Matthew de Detrich >Priority: Critical > Labels: flaky-test > Attachments: > org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldRemoveOneNamedTopologyWhileAnotherContinuesProcessing().test.stdout > > > org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldRemoveNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets > {quote}java.lang.AssertionError: Did not receive all 3 records from topic > output-stream-2 within 6 ms, currently accumulated data is [] Expected: > is a value equal to or greater than <3> but: <0> was less than <3> at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at > org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$1(IntegrationTestUtils.java:648) > at > org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:368) > at > org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:336) > at > org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:644) > at > org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:617) > at > org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldRemoveNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets(NamedTopologyIntegrationTest.java:439){quote} > STDERR > {quote}java.util.concurrent.ExecutionException: > 
org.apache.kafka.common.errors.GroupSubscribedToTopicException: Deleting > offsets of a topic is forbidden while the consumer group is actively > subscribed to it. at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165) > at > org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper.lambda$removeNamedTopology$3(KafkaStreamsNamedTopologyWrapper.java:213) > at > org.apache.kafka.common.internals.KafkaFutureImpl.lambda$whenComplete$2(KafkaFutureImpl.java:107) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.kafka.common.internals.KafkaCompletableFuture.kafkaComplete(KafkaCompletableFuture.java:39) > at > org.apache.kafka.common.internals.KafkaFutureImpl.complete(KafkaFutureImpl.java:122) > at > org.apache.kafka.streams.processor.internals.TopologyMetadata.maybeNotifyTopologyVersionWaiters(TopologyMetadata.java:154) > at > org.apache.kafka.streams.processor.internals.StreamThread.checkForTopologyUpdates(StreamThread.java:916) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:598) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:575) > Caused by: org.apache.kafka.common.errors.GroupSubscribedToTopicException: > Deleting offsets of a topic is forbidden while the consumer group is actively > subscribed to it. 
java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.GroupSubscribedToTopicException: Deleting > offsets of a topic is forbidden while the consumer group is actively > subscribed to it. at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165) > at > org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper.lambda$removeNamedTopology$3(KafkaStreamsNamedTopologyWrapper.java:213) > at > org.apache.kafka.common.internals.KafkaFutureImpl.lambda$whenComplete$2(KafkaFutureImpl.java:107) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at >
[jira] [Created] (KAFKA-15887) Autocommit during close consistently fails with exception in background thread
Lucas Brutschy created KAFKA-15887: -- Summary: Autocommit during close consistently fails with exception in background thread Key: KAFKA-15887 URL: https://issues.apache.org/jira/browse/KAFKA-15887 Project: Kafka Issue Type: Sub-task Reporter: Lucas Brutschy Assignee: Philip Nee When I run {{AsyncKafkaConsumerTest}}, I get this exception every time I call close: {code:java} java.lang.IndexOutOfBoundsException: Index: 0 at java.base/java.util.Collections$EmptyList.get(Collections.java:4483) at java.base/java.util.Collections$UnmodifiableList.get(Collections.java:1310) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.findCoordinatorSync(ConsumerNetworkThread.java:302) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.ensureCoordinatorReady(ConsumerNetworkThread.java:288) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.maybeAutoCommitAndLeaveGroup(ConsumerNetworkThread.java:276) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.cleanup(ConsumerNetworkThread.java:257) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.run(ConsumerNetworkThread.java:101) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15833) Restrict Consumer API to be used from one thread
Lucas Brutschy created KAFKA-15833: -- Summary: Restrict Consumer API to be used from one thread Key: KAFKA-15833 URL: https://issues.apache.org/jira/browse/KAFKA-15833 Project: Kafka Issue Type: Sub-task Reporter: Lucas Brutschy The legacy consumer restricts the API to be used from one thread only. This is not enforced in the new consumer. To avoid inconsistencies in the behavior, we should enforce the same restriction in the new consumer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
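One way the single-thread restriction described in KAFKA-15833 could be enforced is with an acquire/release guard around every public API call. The sketch below is only an illustration under stated assumptions: the class name `SingleThreadGuard` and the helper `rejectedOnOtherThread` are hypothetical and are not taken from the Kafka code base, and the actual consumer implementation may differ.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical guard that lets only one thread at a time use an API. */
final class SingleThreadGuard {
    private static final long NO_CURRENT_THREAD = -1L;

    // Id of the thread currently inside the API, or NO_CURRENT_THREAD.
    private final AtomicLong currentThread = new AtomicLong(NO_CURRENT_THREAD);
    // Counts nested acquire() calls so that re-entrant use by the owner works.
    private final AtomicInteger refCount = new AtomicInteger(0);

    /** Call at the start of every public API method. */
    void acquire() {
        long threadId = Thread.currentThread().getId();
        if (threadId != currentThread.get()
                && !currentThread.compareAndSet(NO_CURRENT_THREAD, threadId)) {
            throw new ConcurrentModificationException(
                    "API is not safe for multi-threaded access");
        }
        refCount.incrementAndGet();
    }

    /** Call in a finally block at the end of every public API method. */
    void release() {
        if (refCount.decrementAndGet() == 0) {
            currentThread.set(NO_CURRENT_THREAD);
        }
    }

    /** Illustration helper: true if another thread is refused by the guard. */
    static boolean rejectedOnOtherThread(SingleThreadGuard guard) {
        AtomicBoolean rejected = new AtomicBoolean(false);
        Thread other = new Thread(() -> {
            try {
                guard.acquire();
                guard.release();
            } catch (ConcurrentModificationException e) {
                rejected.set(true);
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected.get();
    }
}
```

With this shape, a second thread that calls into the API while the first one is still inside it fails fast with an exception instead of silently corrupting state, and access becomes possible again once the owning thread has released the guard.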
[jira] [Resolved] (KAFKA-15326) Decouple Processing Thread from Polling Thread
[ https://issues.apache.org/jira/browse/KAFKA-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-15326. Resolution: Implemented > Decouple Processing Thread from Polling Thread > -- > > Key: KAFKA-15326 > URL: https://issues.apache.org/jira/browse/KAFKA-15326 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Lucas Brutschy >Assignee: Lucas Brutschy >Priority: Critical > > As part of an ongoing effort to implement a better threading architecture in > Kafka Streams, we decouple N stream threads into N polling threads and N > processing threads. The effort to consolidate the N polling threads into a single > thread is a follow-up to this ticket. -- This message was sent by Atlassian Jira (v8.20.10#820010)
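The split between a polling thread and a processing thread can be pictured as a producer/consumer handoff over a bounded queue. The following is a rough sketch of that general pattern only, not of the Kafka Streams implementation: the class `DecoupledThreadsSketch`, the queue size, and the upper-casing "processing" step are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical illustration of one polling thread feeding one processing thread. */
final class DecoupledThreadsSketch {

    /** Hands every input record to a separate processing thread and returns the results. */
    static List<String> run(List<String> input) {
        // Bounded queue: the poller blocks when the processor falls behind.
        BlockingQueue<Optional<String>> handoff = new ArrayBlockingQueue<>(16);
        List<String> processed = new ArrayList<>();

        // Stands in for the thread that would fetch records from the brokers.
        Thread poller = new Thread(() -> {
            try {
                for (String record : input) {
                    handoff.put(Optional.of(record));
                }
                handoff.put(Optional.empty()); // end-of-input marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "polling-thread");

        // Stands in for the thread that would run the topology on each record.
        Thread processor = new Thread(() -> {
            try {
                Optional<String> next = handoff.take();
                while (next.isPresent()) {
                    processed.add(next.get().toUpperCase());
                    next = handoff.take();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "processing-thread");

        poller.start();
        processor.start();
        try {
            poller.join();
            processor.join(); // join makes the processor's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }
}
```

The bounded queue is the design choice worth noting: slow processing applies back-pressure to the poller instead of letting fetched records pile up without limit.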
[jira] [Resolved] (KAFKA-15463) StreamsException: Accessing from an unknown node
[ https://issues.apache.org/jira/browse/KAFKA-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-15463. Resolution: Not A Problem > StreamsException: Accessing from an unknown node > - > > Key: KAFKA-15463 > URL: https://issues.apache.org/jira/browse/KAFKA-15463 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 3.2.1 >Reporter: Yevgeny >Priority: Major > > After some time application was working fine, starting to get: > > This is springboot application runs in kubernetes as stateful pod. > > > > {code:java} > Exception in thread > "-ddf9819f-d6c7-46ce-930e-cd923e1b3c2c-StreamThread-1" > org.apache.kafka.streams.errors.StreamsException: Accessing from an unknown > node at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.getStateStore(ProcessorContextImpl.java:162) > at myclass1.java:28) at myclass2.java:48) at > java.base/java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90) at > java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1602) > at > java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:129) > at > java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:527) > at > java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513) > at > java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) > at > java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230) > at > java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196) > at > java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.base/java.util.stream.ReferencePipeline.allMatch(ReferencePipeline.java:637) > at myclass3.java:48) at > org.apache.kafka.streams.kstream.internals.TransformerSupplierAdapter$1.transform(TransformerSupplierAdapter.java:49) > at > 
org.apache.kafka.streams.kstream.internals.TransformerSupplierAdapter$1.transform(TransformerSupplierAdapter.java:38) > at > org.apache.kafka.streams.kstream.internals.KStreamFlatTransform$KStreamFlatTransformProcessor.process(KStreamFlatTransform.java:66) > at > org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:146) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:275) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:254) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:213) > at > org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:84) > at > org.apache.kafka.streams.processor.internals.StreamTask.lambda$doProcess$1(StreamTask.java:780) > at > org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:809) > at > org.apache.kafka.streams.processor.internals.StreamTask.doProcess(StreamTask.java:780) > at > org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:711) > at > org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100) > at > org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81) > at > org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1177) > at > org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:769) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:589) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:551) > {code} > > stream-thread > [-ddf9819f-d6c7-46ce-930e-cd923e1b3c2c-StreamThread-1] State > transition from PENDING_SHUTDOWN to DEAD > > > Transformer is Prototype bean, the supplier supplys new instance of the > 
Transformer: > > > {code:java} > @Override public Transformer> get() > { return ctx.getBean(MyTransformer.class); }{code} > > > The only way to recover is to delete all topics used by Kafka Streams; even if > the application is restarted, the same exception is thrown. > *If messages in the internal 'store-changelog' topics are deleted or their offsets > are manipulated, can that cause the issue? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15378) Rolling upgrade system tests are failing
Lucas Brutschy created KAFKA-15378: -- Summary: Rolling upgrade system tests are failing Key: KAFKA-15378 URL: https://issues.apache.org/jira/browse/KAFKA-15378 Project: Kafka Issue Type: Task Components: streams Affects Versions: 3.5.1 Reporter: Lucas Brutschy The following system tests are failing: ``` kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.1.2.to_version=3.6.0-SNAPSHOT kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.2.3.to_version=3.6.0-SNAPSHOT kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.3.1.to_version=3.6.0-SNAPSHOT ``` See [https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5801/console] for logs and other test data. Note that the system tests currently only run with [this fix](https://github.com/apache/kafka/commit/24d1780061a645bb2fbeefd8b8f50123c28ca94e); I think a CVE-related Python library update broke the system tests... -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15326) Decouple Processing Thread from Polling Thread
Lucas Brutschy created KAFKA-15326: -- Summary: Decouple Processing Thread from Polling Thread Key: KAFKA-15326 URL: https://issues.apache.org/jira/browse/KAFKA-15326 Project: Kafka Issue Type: Task Components: streams Reporter: Lucas Brutschy Assignee: Lucas Brutschy As part of an ongoing effort to implement a better threading architecture in Kafka Streams, we decouple N stream threads into N polling threads and N processing threads. The effort to consolidate the N polling threads into a single thread is a follow-up to this ticket. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14278) Convert INVALID_PRODUCER_EPOCH into PRODUCER_FENCED TxnOffsetCommit
[ https://issues.apache.org/jira/browse/KAFKA-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-14278. Resolution: Fixed > Convert INVALID_PRODUCER_EPOCH into PRODUCER_FENCED TxnOffsetCommit > --- > > Key: KAFKA-14278 > URL: https://issues.apache.org/jira/browse/KAFKA-14278 > Project: Kafka > Issue Type: Sub-task > Components: producer , streams >Reporter: Lucas Brutschy >Assignee: Lucas Brutschy >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14172) bug: State stores lose state when tasks are reassigned under EOS wit…
[ https://issues.apache.org/jira/browse/KAFKA-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-14172. Resolution: Fixed > bug: State stores lose state when tasks are reassigned under EOS wit… > - > > Key: KAFKA-14172 > URL: https://issues.apache.org/jira/browse/KAFKA-14172 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 3.1.1 >Reporter: Martin Hørslev >Assignee: Guozhang Wang >Priority: Critical > Fix For: 3.5.0 > > > h1. State stores lose state when tasks are reassigned under EOS with standby > replicas and default acceptable lag. > I have observed that state stores used in a transform step under exactly-once > semantics end up losing state after a rebalancing event that includes > reassignment of tasks to a previous standby task within the acceptable standby > lag. > > The problem is reproducible and an integration test has been created to > showcase the [issue|https://github.com/apache/kafka/pull/12540]. > A detailed description of the observed issue is provided > [here|https://github.com/apache/kafka/pull/12540/files?short_path=3ca480e#diff-3ca480ef093a1faa18912e1ebc679be492b341147b96d7a85bda59911228ef45] > Similar issues have been observed and reported on Stack Overflow, for example > [here|https://stackoverflow.com/questions/69038181/kafka-streams-aggregation-data-loss-between-instance-restarts-and-rebalances]. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14343) Write upgrade/downgrade tests for enabling the state updater
[ https://issues.apache.org/jira/browse/KAFKA-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-14343. Resolution: Fixed > Write upgrade/downgrade tests for enabling the state updater > - > > Key: KAFKA-14343 > URL: https://issues.apache.org/jira/browse/KAFKA-14343 > Project: Kafka > Issue Type: Task > Components: streams >Reporter: Lucas Brutschy >Assignee: Lucas Brutschy >Priority: Major > > Write a test that verifies the upgrade from a version of Streams with state > updater disabled to a version with state updater enabled and vice versa, so > that we can offer a safe upgrade path. > * upgrade test from a version of Streams with state updater disabled to a > version with state updater enabled (probably a system test since the old code > path will be removed from the code base) > * downgrade test from a version of Streams with state updater enabled to a > version with state updater disabled (probably a system test since the old > code path will be removed from the code base) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14532) IllegalStateException when fetch failure happens after assignment changed
Lucas Brutschy created KAFKA-14532: -- Summary: IllegalStateException when fetch failure happens after assignment changed Key: KAFKA-14532 URL: https://issues.apache.org/jira/browse/KAFKA-14532 Project: Kafka Issue Type: Bug Components: clients Reporter: Lucas Brutschy Assignee: Lucas Brutschy On master, all our long-running test jobs are running into this exception:
```
java.lang.IllegalStateException: No current assignment for partition stream-soak-test-KSTREAM-OUTERTHIS-86-store-changelog-1
    at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:370)
    at org.apache.kafka.clients.consumer.internals.SubscriptionState.clearPreferredReadReplica(SubscriptionState.java:623)
    at java.util.LinkedHashMap$LinkedKeySet.forEach(LinkedHashMap.java:559)
    at org.apache.kafka.clients.consumer.internals.Fetcher$1.onFailure(Fetcher.java:349)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:179)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:149)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:613)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:427)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:251)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1307)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1243)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:450)
    at org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:910)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:773)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:613)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:575)
[2022-12-13 04:01:59,024] ERROR [i-016cf5d2c1889c316-StreamThread-1] stream-client [i-016cf5d2c1889c316] Encountered the following exception during processing and sent shutdown request for the entire application. (org.apache.kafka.streams.KafkaStreams)
org.apache.kafka.streams.errors.StreamsException: java.lang.IllegalStateException: No current assignment for partition stream-soak-test-KSTREAM-OUTERTHIS-86-store-changelog-1
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:653)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:575)
Caused by: java.lang.IllegalStateException: No current assignment for partition stream-soak-test-KSTREAM-OUTERTHIS-86-store-changelog-1
    at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:370)
    at org.apache.kafka.clients.consumer.internals.SubscriptionState.clearPreferredReadReplica(SubscriptionState.java:623)
    at java.util.LinkedHashMap$LinkedKeySet.forEach(LinkedHashMap.java:559)
    at org.apache.kafka.clients.consumer.internals.Fetcher$1.onFailure(Fetcher.java:349)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:179)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:149)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:613)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:427)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:251)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1307)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1243)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:450)
    at org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:910)
    at
```
[jira] [Created] (KAFKA-14530) Check state updater more than once in process loops
Lucas Brutschy created KAFKA-14530: -- Summary: Check state updater more than once in process loops Key: KAFKA-14530 URL: https://issues.apache.org/jira/browse/KAFKA-14530 Project: Kafka Issue Type: Task Components: streams Reporter: Lucas Brutschy In the new state restoration code, the state updater needs to be checked regularly by the main thread to transfer ownership of tasks back to the main thread once the state of the task is restored. The more often we check this, the faster we can start processing the tasks. Currently, we only check the state updater once in every loop iteration of the state updater. While we have not observed this to be too infrequent in practice, we can easily increase the number of checks by moving the check inside the inner processing loop. This would mean that once we have iterated over `numIterations` records, we can already start processing tasks that have finished restoration in the meantime. -- This message was sent by Atlassian Jira (v8.20.10#820010)
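The proposed change can be sketched roughly as follows. This is a simplified model, not the actual `StreamThread` code: `RestoredTaskSource`, `ProcessLoopSketch`, `drainRestored` and the string task IDs are hypothetical names, and record processing is reduced to a counter. The point is only that the check for restored tasks runs after every batch of `numIterations` records rather than once per outer loop iteration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the state updater: hands back tasks whose restoration is done.
final class RestoredTaskSource {
    private final Queue<String> restored = new ArrayDeque<>();
    private int checks = 0;

    synchronized void markRestored(String taskId) {
        restored.add(taskId);
    }

    // Returns and clears all tasks restored since the previous check.
    synchronized List<String> drainRestored() {
        checks++;
        List<String> out = new ArrayList<>(restored);
        restored.clear();
        return out;
    }

    synchronized int checks() {
        return checks;
    }
}

final class ProcessLoopSketch {
    // One outer-loop iteration: records are processed in batches of numIterations,
    // and the updater is checked after each batch instead of once at the end.
    static List<String> runOnce(RestoredTaskSource updater, int totalRecords, int numIterations) {
        List<String> activeTasks = new ArrayList<>();
        int processed = 0;
        while (processed < totalRecords) {
            int batch = Math.min(numIterations, totalRecords - processed);
            processed += batch; // stand-in for processing `batch` records
            activeTasks.addAll(updater.drainRestored()); // check moved inside the inner loop
        }
        return activeTasks;
    }

    public static void main(String[] args) {
        RestoredTaskSource updater = new RestoredTaskSource();
        updater.markRestored("0_1");
        System.out.println(runOnce(updater, 250, 100) + " after " + updater.checks() + " checks");
    }
}
```

With 250 records and `numIterations` set to 100, the updater is consulted three times within a single outer iteration, so a task that finishes restoring mid-iteration is picked up at the next batch boundary.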
[jira] [Created] (KAFKA-14432) RocksDBStore relies on finalizers to not leak memory
Lucas Brutschy created KAFKA-14432: -- Summary: RocksDBStore relies on finalizers to not leak memory Key: KAFKA-14432 URL: https://issues.apache.org/jira/browse/KAFKA-14432 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy Assignee: Lucas Brutschy Relying on finalizers in RocksDB has been deprecated for a long time, and starting with RocksDB 7, finalizers are removed completely (see [https://github.com/facebook/rocksdb/pull/9523]). Kafka Streams currently relies in part on finalizers to avoid leaking memory. This needs to be resolved before we can upgrade to RocksDB 7. See [https://github.com/apache/kafka/pull/12809]. This is a native heap profile after running Kafka Streams without finalizers for a few hours:
Total: 13547.5 MB
 12936.3  95.5%  95.5%  12936.3  95.5% rocksdb::port::cacheline_aligned_alloc
   438.5   3.2%  98.7%    438.5   3.2% rocksdb::BlockFetcher::ReadBlockContents
    84.0   0.6%  99.3%     84.2   0.6% rocksdb::Arena::AllocateNewBlock
    45.9   0.3%  99.7%     45.9   0.3% prof_backtrace_impl
     8.1   0.1%  99.7%     14.6   0.1% rocksdb::BlockBasedTable::PutDataBlockToCache
     6.4   0.0%  99.8%  12941.4  95.5% Java_org_rocksdb_Statistics_newStatistics___3BJ
     6.1   0.0%  99.8%      6.9   0.1% rocksdb::LRUCacheShard::Insert@2d8b20
     5.1   0.0%  99.9%      6.5   0.0% rocksdb::VersionSet::ProcessManifestWrites
     3.9   0.0%  99.9%      3.9   0.0% rocksdb::WritableFileWriter::WritableFileWriter
     3.2   0.0%  99.9%      3.2   0.0% std::string::_Rep::_S_create
-- This message was sent by Atlassian Jira (v8.20.10#820010)
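As a rough illustration of why explicit cleanup matters once finalizers are gone, the sketch below models a native-backed object with a plain Java class. `NativeHandle`, `ExplicitCloseSketch` and the byte counter are hypothetical and do not use the RocksDB API; the sketch only assumes that the leaked objects can be closed explicitly.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an object backed by native (off-heap) memory.
final class NativeHandle implements AutoCloseable {
    // Bytes currently "allocated" natively and not yet released.
    static final AtomicLong OUTSTANDING_BYTES = new AtomicLong();

    private final long bytes;
    private boolean closed = false;

    NativeHandle(long bytes) {
        this.bytes = bytes;
        OUTSTANDING_BYTES.addAndGet(bytes);
    }

    // Releases the native memory exactly once; there is no finalizer as a safety net.
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            OUTSTANDING_BYTES.addAndGet(-bytes);
        }
    }
}

final class ExplicitCloseSketch {
    // Returns how many bytes remain allocated after using a handle.
    static long leakedAfterUse(boolean closeExplicitly) {
        long before = NativeHandle.OUTSTANDING_BYTES.get();
        if (closeExplicitly) {
            try (NativeHandle handle = new NativeHandle(1024)) {
                // use the handle; close() runs when the block exits
            }
        } else {
            new NativeHandle(1024); // dropped without close(): the bytes are never released
        }
        return NativeHandle.OUTSTANDING_BYTES.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("closed explicitly: " + leakedAfterUse(true) + " bytes leaked");
        System.out.println("not closed:        " + leakedAfterUse(false) + " bytes leaked");
    }
}
```

The try-with-resources form ties the release to scope exit, which is the behavior that has to replace any reliance on the garbage collector running a finalizer.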
[jira] [Resolved] (KAFKA-14299) Benchmark and stabilize state updater
[ https://issues.apache.org/jira/browse/KAFKA-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-14299. Assignee: Lucas Brutschy Resolution: Fixed > Benchmark and stabilize state updater > - > > Key: KAFKA-14299 > URL: https://issues.apache.org/jira/browse/KAFKA-14299 > Project: Kafka > Issue Type: Task >Reporter: Lucas Brutschy >Assignee: Lucas Brutschy >Priority: Major > > We need to benchmark and stabilize the separate state restoration code path. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14415) `ThreadCache` is getting slower with every additional state store
Lucas Brutschy created KAFKA-14415: -- Summary: `ThreadCache` is getting slower with every additional state store Key: KAFKA-14415 URL: https://issues.apache.org/jira/browse/KAFKA-14415 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy There are a few lines in `ThreadCache` that I think should be optimized. `sizeBytes` is called at least once, and potentially many times, in every `put`, and is linear in the number of caches (= number of state stores, so typically proportional to the number of tasks). That means, with every additional task, every put gets a little slower. The throughput is 30% higher if we replace it with a constant-time update… Compare the throughput of TIME_ROCKS on trunk (green graph): [http://kstreams-benchmark-results.s3-website-us-west-2.amazonaws.com/experiments/stateheavy-3-5-3-4-0-51b7eb7937-jenkins-20221113214104-streamsbench/] This is the throughput of TIME_ROCKS when a constant-time `sizeBytes` implementation is used: [http://kstreams-benchmark-results.s3-website-us-west-2.amazonaws.com/experiments/stateheavy-3-5-LUCASCOMPARE-lucas-20221122140846-streamsbench/] So the throughput is ~20% higher. The same seems to apply to the MEM backend (initial throughput >8000 instead of 6000); however, I cannot run the same benchmark here because the memory is filled too quickly. [http://kstreams-benchmark-results.s3-website-us-west-2.amazonaws.com/experiments/stateheavy-3-5-LUCASSTATE-lucas-20221121231632-streamsbench/] -- This message was sent by Atlassian Jira (v8.20.10#820010)
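A constant-time `sizeBytes` could look roughly like the sketch below, which keeps a running total that is adjusted on every write instead of summing over all caches on every read. This is a simplified model and not the actual `ThreadCache` implementation: the class `SizeTrackingCacheSketch`, its maps and the `sizeBytesLinear` comparison method are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a thread-level cache made up of one named cache per state store.
final class SizeTrackingCacheSketch {
    private final Map<String, Map<String, byte[]>> entries = new HashMap<>();
    private final Map<String, Long> bytesPerCache = new HashMap<>();
    private long totalBytes = 0; // running total, updated on every put/remove

    void put(String cacheName, String key, byte[] value) {
        Map<String, byte[]> cache = entries.computeIfAbsent(cacheName, n -> new HashMap<>());
        byte[] old = cache.put(key, value);
        long delta = value.length - (old == null ? 0 : old.length);
        applyDelta(cacheName, delta);
    }

    void remove(String cacheName, String key) {
        Map<String, byte[]> cache = entries.get(cacheName);
        if (cache == null) {
            return;
        }
        byte[] old = cache.remove(key);
        if (old != null) {
            applyDelta(cacheName, -old.length);
        }
    }

    private void applyDelta(String cacheName, long delta) {
        bytesPerCache.merge(cacheName, delta, Long::sum);
        totalBytes += delta;
    }

    // Linear in the number of caches: sums the per-cache sizes on every call.
    long sizeBytesLinear() {
        long sum = 0;
        for (long bytes : bytesPerCache.values()) {
            sum += bytes;
        }
        return sum;
    }

    // Constant time: returns the running total.
    long sizeBytes() {
        return totalBytes;
    }

    public static void main(String[] args) {
        SizeTrackingCacheSketch cache = new SizeTrackingCacheSketch();
        cache.put("store-a", "k1", new byte[10]);
        cache.put("store-b", "k1", new byte[5]);
        System.out.println(cache.sizeBytes() + " == " + cache.sizeBytesLinear());
    }
}
```

The trade-off is that every mutation path has to keep the running total in sync, so any code that changes a cache's size must go through a single bookkeeping method such as `applyDelta` here.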
[jira] [Resolved] (KAFKA-14309) Kafka Streams upgrade tests do not cover for FK-joins
[ https://issues.apache.org/jira/browse/KAFKA-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy resolved KAFKA-14309. Resolution: Fixed > Kafka Streams upgrade tests do not cover for FK-joins > - > > Key: KAFKA-14309 > URL: https://issues.apache.org/jira/browse/KAFKA-14309 > Project: Kafka > Issue Type: Bug >Reporter: Lucas Brutschy >Assignee: Lucas Brutschy >Priority: Major > > The current streams upgrade system test for FK joins inserts the production > of foreign key data and an actual foreign key join in every version of > SmokeTestDriver except for the latest. The effect was that FK join upgrades > are not tested at all, since no FK join code is executed after the bounce in > the system test. > We should enable the FK-join code in the system test. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14343) Write upgrade/downgrade tests for enabling the state updater
Lucas Brutschy created KAFKA-14343: -- Summary: Write upgrade/downgrade tests for enabling the state updater Key: KAFKA-14343 URL: https://issues.apache.org/jira/browse/KAFKA-14343 Project: Kafka Issue Type: Task Components: streams Reporter: Lucas Brutschy Assignee: Lucas Brutschy Write a test that verifies the upgrade from a version of Streams with state updater disabled to a version with state updater enabled and vice versa, so that we can offer a safe upgrade path. * upgrade test from a version of Streams with state updater disabled to a version with state updater enabled (probably a system test since the old code path will be removed from the code base) * downgrade test from a version of Streams with state updater enabled to a version with state updater disabled (probably a system test since the old code path will be removed from the code base) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14309) Kafka Streams upgrade tests do not cover for FK-joins
Lucas Brutschy created KAFKA-14309: -- Summary: Kafka Streams upgrade tests do not cover for FK-joins Key: KAFKA-14309 URL: https://issues.apache.org/jira/browse/KAFKA-14309 Project: Kafka Issue Type: Bug Reporter: Lucas Brutschy The current streams upgrade system test for FK joins inserts the production of foreign key data and an actual foreign key join in every version of SmokeTestDriver except for the latest. The effect was that FK join upgrades are not tested at all, since no FK join code is executed after the bounce in the system test. We should enable the FK-join code in the system test. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14299) Benchmark and stabilize state updater
Lucas Brutschy created KAFKA-14299: -- Summary: Benchmark and stabilize state updater Key: KAFKA-14299 URL: https://issues.apache.org/jira/browse/KAFKA-14299 Project: Kafka Issue Type: Task Reporter: Lucas Brutschy We need to benchmark and stabilize the separate state restoration code path. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14278) Convert INVALID_PRODUCER_EPOCH into PRODUCER_FENCED TxnOffsetCommit
Lucas Brutschy created KAFKA-14278: -- Summary: Convert INVALID_PRODUCER_EPOCH into PRODUCER_FENCED TxnOffsetCommit Key: KAFKA-14278 URL: https://issues.apache.org/jira/browse/KAFKA-14278 Project: Kafka Issue Type: Sub-task Reporter: Lucas Brutschy -- This message was sent by Atlassian Jira (v8.20.10#820010)