[jira] [Commented] (KAFKA-9888) REST extensions can mutate connector configs in worker config state snapshot
[ https://issues.apache.org/jira/browse/KAFKA-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086310#comment-17086310 ]

ASF GitHub Bot commented on KAFKA-9888:
---------------------------------------

C0urante commented on pull request #8511: KAFKA-9888: Copy connector configs before passing to REST extensions
URL: https://github.com/apache/kafka/pull/8511

[Jira](https://issues.apache.org/jira/browse/KAFKA-9888)

The changes made in [KIP-454](https://cwiki.apache.org/confluence/display/KAFKA/KIP-454%3A+Expansion+of+the+ConnectClusterState+interface) involved adding a `connectorConfig` method to the [ConnectClusterState](https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/api/src/main/java/org/apache/kafka/connect/health/ConnectClusterState.java) interface that REST extensions can use to query the worker for the configuration of a given connector. The implementation of this method returns the Java `Map` stored in the worker's view of the config topic (when running in distributed mode). No copying is performed, so mutations of that `Map` object persist across invocations of `connectorConfig` and, even worse, propagate to the worker when, e.g., starting a connector.

The changes here cause the framework to copy that map before passing it to REST extensions, and expand a comment in `KafkaConfigBackingStore` on the mutability of the snapshots it provides, warning against changes that could lead to bugs like this one. An existing unit test is modified to ensure that REST extensions receive a copy of the connector config, not the original.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> REST extensions can mutate connector configs in worker config state snapshot
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-9888
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9888
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 2.3.0, 2.4.0, 2.3.1, 2.5.0, 2.4.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>
> The changes made in [KIP-454|https://cwiki.apache.org/confluence/display/KAFKA/KIP-454%3A+Expansion+of+the+ConnectClusterState+interface] involved adding a {{connectorConfig}} method to the [ConnectClusterState|https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/api/src/main/java/org/apache/kafka/connect/health/ConnectClusterState.java] interface that REST extensions could use to query the worker for the configuration of a given connector. The [implementation for this method|https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/health/ConnectClusterStateImpl.java#L86-L89] returns the Java {{Map}} that's stored in the worker's view of the config topic (when running in distributed mode). No copying is performed, which causes mutations of that {{Map}} object to persist across invocations of {{connectorConfig}} and, even worse, propagate to the worker when, e.g., starting a connector.
> We should not give REST extensions that original map, but instead a copy of it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
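The fix described above is a defensive copy. As a minimal sketch (with hypothetical class and method names, not the actual Kafka Connect code), the difference between returning the live map and returning a copy looks like this:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the defensive-copy fix; names are illustrative,
// not the real Kafka Connect classes.
class ConfigSnapshot {
    private final Map<String, String> connectorConfig = new HashMap<>();

    ConfigSnapshot() {
        connectorConfig.put("connector.class", "FileStreamSource");
    }

    // Pre-fix behavior: the live map is returned, so a REST extension's
    // mutations leak back into the worker's config state snapshot.
    Map<String, String> connectorConfigUnsafe() {
        return connectorConfig;
    }

    // Post-fix behavior: each caller gets an independent copy.
    Map<String, String> connectorConfigCopy() {
        return new HashMap<>(connectorConfig);
    }
}
```

A REST extension that mutates the result of `connectorConfigCopy()` leaves the worker's snapshot untouched, which is the property the PR's modified unit test asserts.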
[jira] [Created] (KAFKA-9888) REST extensions can mutate connector configs in worker config state snapshot
Chris Egerton created KAFKA-9888:
------------------------------------

             Summary: REST extensions can mutate connector configs in worker config state snapshot
                 Key: KAFKA-9888
                 URL: https://issues.apache.org/jira/browse/KAFKA-9888
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
    Affects Versions: 2.4.1, 2.5.0, 2.3.1, 2.4.0, 2.3.0
            Reporter: Chris Egerton
            Assignee: Chris Egerton

The changes made in [KIP-454|https://cwiki.apache.org/confluence/display/KAFKA/KIP-454%3A+Expansion+of+the+ConnectClusterState+interface] involved adding a {{connectorConfig}} method to the [ConnectClusterState|https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/api/src/main/java/org/apache/kafka/connect/health/ConnectClusterState.java] interface that REST extensions could use to query the worker for the configuration of a given connector. The [implementation for this method|https://github.com/apache/kafka/blob/ecde596180975f8546c0e8e10f77f7eee5f1c4d8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/health/ConnectClusterStateImpl.java#L86-L89] returns the Java {{Map}} that's stored in the worker's view of the config topic (when running in distributed mode). No copying is performed, which causes mutations of that {{Map}} object to persist across invocations of {{connectorConfig}} and, even worse, propagate to the worker when, e.g., starting a connector.

We should not give REST extensions that original map, but instead a copy of it.
[jira] [Commented] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
[ https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086227#comment-17086227 ]

ASF GitHub Bot commented on KAFKA-9839:
---------------------------------------

apovzner commented on pull request #8509: KAFKA-9839: Broker should accept control requests with newer broker epoch
URL: https://github.com/apache/kafka/pull/8509

A broker throws IllegalStateException if the broker epoch in a LeaderAndIsrRequest/UpdateMetadataRequest/StopReplicaRequest is larger than its current broker epoch. However, there is no guarantee that the broker receives its latest epoch before the controller does: when the broker registers with ZK, a few more instructions must run before the broker "knows" its epoch, while the controller may already have been notified and sent an UPDATE_METADATA request (for example) with the new epoch. This results in clients getting stale metadata from this broker.

With this PR, a broker accepts a LeaderAndIsrRequest/UpdateMetadataRequest/StopReplicaRequest if the broker epoch is newer than its current epoch.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
> IllegalStateException on metadata update when broker learns about its new epoch after the controller
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9839
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9839
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 2.3.1
>            Reporter: Anna Povzner
>            Assignee: Anna Povzner
>            Priority: Critical
>
> The broker throws "java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY" on UPDATE_METADATA when the controller learns about the broker epoch and sends UPDATE_METADATA before KafkaZkClient.registerBroker completes (i.e., before the broker learns about its new epoch).
> Here is the scenario we observed, in more detail:
> 1. ZK session expires on broker 1
> 2. Broker 1 establishes a new session to ZK and creates its znode
> 3. Controller learns about broker 1 and assigns an epoch
> 4. Broker 1 receives UPDATE_METADATA from the controller, but does not know about its new epoch yet, so we get an exception:
> {code}
> ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, api=UPDATE_METADATA, body={
> .
> java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY
>     at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725)
>     at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:139)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> 5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the created znode at /brokers/ids/1"
> The result is that the broker has stale metadata for some time.
> Possible solutions:
> 1. Broker returns a more specific error and the controller retries UPDATE_METADATA
> 2. Broker accepts UPDATE_METADATA with a larger broker epoch.
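The chosen fix (solution 2) reduces to the direction of one comparison. A minimal sketch, with hypothetical method names rather than the actual KafkaApis code: a control request is rejected only when it carries an epoch older than the broker's current one, so a newer epoch that the controller learned first is accepted instead of raising IllegalStateException:

```java
// Hypothetical sketch of the broker-epoch staleness check, before and after
// the fix described above. Not the actual Kafka broker implementation.
class BrokerEpochCheck {
    // Old behavior (simplified): a request epoch *larger* than the current
    // epoch was treated as an illegal state and caused an exception.
    static boolean rejectsNewerEpoch(long currentEpoch, long requestEpoch) {
        return requestEpoch > currentEpoch;
    }

    // New behavior: only a request carrying an *older* epoch is stale;
    // a newer epoch (the controller heard about it first) is accepted.
    static boolean isBrokerEpochStale(long currentEpoch, long requestEpoch) {
        return requestEpoch < currentEpoch;
    }
}
```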
[jira] [Commented] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
[ https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086185#comment-17086185 ]

Anna Povzner commented on KAFKA-9839:
-------------------------------------

[~junrao] Thanks, yes, I agree that accepting controller requests (UpdateMetadataRequest, LeaderAndIsrRequest, StopReplicaRequest) with a newer broker epoch is the best approach. I don't see any drawbacks to accepting newer metadata. I am opening a PR with this solution shortly.

> IllegalStateException on metadata update when broker learns about its new epoch after the controller
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9839
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9839
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 2.3.1
>            Reporter: Anna Povzner
>            Assignee: Anna Povzner
>            Priority: Critical
>
> The broker throws "java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY" on UPDATE_METADATA when the controller learns about the broker epoch and sends UPDATE_METADATA before KafkaZkClient.registerBroker completes (i.e., before the broker learns about its new epoch).
> Here is the scenario we observed, in more detail:
> 1. ZK session expires on broker 1
> 2. Broker 1 establishes a new session to ZK and creates its znode
> 3. Controller learns about broker 1 and assigns an epoch
> 4. Broker 1 receives UPDATE_METADATA from the controller, but does not know about its new epoch yet, so we get an exception:
> {code}
> ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, api=UPDATE_METADATA, body={
> .
> java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY
>     at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725)
>     at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:139)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> 5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the created znode at /brokers/ids/1"
> The result is that the broker has stale metadata for some time.
> Possible solutions:
> 1. Broker returns a more specific error and the controller retries UPDATE_METADATA
> 2. Broker accepts UPDATE_METADATA with a larger broker epoch.
[jira] [Commented] (KAFKA-7689) Add Commit/List Offsets Operations to AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086181#comment-17086181 ]

zhangzhisheng commented on KAFKA-7689:
--------------------------------------

good

> Add Commit/List Offsets Operations to AdminClient
> -------------------------------------------------
>
>                 Key: KAFKA-7689
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7689
>             Project: Kafka
>          Issue Type: Improvement
>          Components: admin
>            Reporter: Mickael Maison
>            Assignee: Mickael Maison
>            Priority: Major
>             Fix For: 2.5.0
>
> Jira for KIP-396:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484
[jira] [Commented] (KAFKA-9224) State store should not see uncommitted transaction result
[ https://issues.apache.org/jira/browse/KAFKA-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086127#comment-17086127 ]

Boyang Chen commented on KAFKA-9224:
------------------------------------

This basically goes back to the question of what consistency guarantee we provide for IQ queries. Consider a billing pipeline that doesn't turn on EOS and relies only on IQ: our at-least-once commitment doesn't guarantee ordering at all times, so it's possible to see a non-monotonically increasing cost value over time. These guarantees are protected by EOS, which aims not just for once-only processing but also for a strong ordering guarantee. Of course we could disagree here, but as long as unclean leader election is still possible, I don't think a failed-over task can provide any linearizable guarantee with respect to the previous owner. It would be good for us to clarify the SLA first if we don't think this is expected for non-EOS.

> State store should not see uncommitted transaction result
> ----------------------------------------------------------
>
>                 Key: KAFKA-9224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9224
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Major
>
> Currently under EOS, an uncommitted write can be reflected in the state store before the ongoing transaction finishes. This means an interactive query can see uncommitted data in the state store, which is not ideal for users relying on state stores for strong consistency. Ideally, we should have an option to include the state store commit as part of the ongoing transaction; however, an immediate step towards a better-reasoned system is to "write after transaction commit", which means we always buffer data within the stream cache for EOS until the ongoing transaction is committed.
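The "write after transaction commit" idea from the issue description can be sketched as a two-map buffer. This is a simplified illustration, not the Streams cache implementation: writes land in a pending buffer and become visible to readers (interactive queries) only when the transaction commits:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (not the Kafka Streams implementation) of buffering writes
// until transaction commit, so queries never observe uncommitted data.
class TransactionalBuffer {
    private final Map<String, String> store = new HashMap<>();   // queryable state
    private final Map<String, String> pending = new HashMap<>(); // uncommitted writes

    // Writes go to the pending buffer, invisible to readers.
    void put(String key, String value) { pending.put(key, value); }

    // Interactive queries only ever see committed state.
    String get(String key) { return store.get(key); }

    // On commit, pending writes become visible atomically (from the
    // perspective of this single-threaded sketch).
    void commit() { store.putAll(pending); pending.clear(); }

    // On abort, uncommitted writes are simply discarded.
    void abort() { pending.clear(); }
}
```

The trade-off, as discussed in the comments, is read latency versus consistency: readers never see values that a later abort would retract, but they also lag the head of the stream until the next commit.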
[jira] [Resolved] (KAFKA-9818) Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
[ https://issues.apache.org/jira/browse/KAFKA-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax resolved KAFKA-9818.
------------------------------------
    Fix Version/s: 2.6.0
       Resolution: Fixed

> Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9818
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9818
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Assignee: Matthias J. Sax
>            Priority: Major
>              Labels: flaky-test
>             Fix For: 2.6.0
>
> h3. Error Message
> java.lang.AssertionError
> h3. Stacktrace
> {code}
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:87)
>     at org.junit.Assert.assertTrue(Assert.java:42)
>     at org.junit.Assert.assertTrue(Assert.java:53)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler(RecordCollectorTest.java:521)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>     at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>     at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>     at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>     at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>     at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>     at com.sun.proxy.$Proxy5.processTestClass(Unknown Source)
>     at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
>     at jdk.internal.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> {code}
[jira] [Commented] (KAFKA-9818) Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
[ https://issues.apache.org/jira/browse/KAFKA-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086090#comment-17086090 ]

ASF GitHub Bot commented on KAFKA-9818:
---------------------------------------

mjsax commented on pull request #8507: KAFKA-9818: Fix flaky test in RecordCollectorTest
URL: https://github.com/apache/kafka/pull/8507

> Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9818
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9818
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Assignee: Matthias J. Sax
>            Priority: Major
>              Labels: flaky-test
>
> h3. Error Message
> java.lang.AssertionError
> h3. Stacktrace
> {code}
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:87)
>     at org.junit.Assert.assertTrue(Assert.java:42)
>     at org.junit.Assert.assertTrue(Assert.java:53)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler(RecordCollectorTest.java:521)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>     at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>     at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>     at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>     at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>     at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>     at com.sun.proxy.$Proxy5.processTestClass(Unknown Source)
>     at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
> {code}
[jira] [Commented] (KAFKA-9224) State store should not see uncommitted transaction result
[ https://issues.apache.org/jira/browse/KAFKA-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086088#comment-17086088 ]

John Roesler commented on KAFKA-9224:
-------------------------------------

Thanks [~guozhang], just for clarity: yes, that's the same situation I was talking about. So far, it seems like the problem as stated is actually orthogonal to EOS. My question was: why solve this with an approach that only applies to active queries under EOS, rather than seeking a general solution to the same problem in all circumstances?

> State store should not see uncommitted transaction result
> ----------------------------------------------------------
>
>                 Key: KAFKA-9224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9224
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Major
>
> Currently under EOS, an uncommitted write can be reflected in the state store before the ongoing transaction finishes. This means an interactive query can see uncommitted data in the state store, which is not ideal for users relying on state stores for strong consistency. Ideally, we should have an option to include the state store commit as part of the ongoing transaction; however, an immediate step towards a better-reasoned system is to "write after transaction commit", which means we always buffer data within the stream cache for EOS until the ongoing transaction is committed.
[jira] [Commented] (KAFKA-6145) Warm up new KS instances before migrating tasks - potentially a two phase rebalance
[ https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086067#comment-17086067 ]

ASF GitHub Bot commented on KAFKA-6145:
---------------------------------------

vvcephei commented on pull request #8475: KAFKA-6145: KIP-441: Add test scenarios to ensure rebalance convergence
URL: https://github.com/apache/kafka/pull/8475

> Warm up new KS instances before migrating tasks - potentially a two phase rebalance
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-6145
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6145
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Antony Stubbs
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>              Labels: needs-kip
>
> Currently, when expanding the KS cluster, the new node's partitions will be unavailable during the rebalance, which for large states can take a very long time; for small state stores, even more than a few ms can be a deal breaker for microservice use cases.
> One workaround would be to execute the rebalance in two phases:
> 1) start running state store building on the new node
> 2) once the state store is fully populated on the new node, only then rebalance the tasks - there will still be a rebalance pause, but it would be greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during rebalance
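The two-phase idea above boils down to a placement rule. A hypothetical sketch (illustrative names and thresholds; KIP-441's actual assignor is considerably more involved): a task stays on its current owner while the candidate instance restores state, and migrates only once the candidate's changelog lag falls below an acceptable bound:

```java
// Hypothetical sketch of the warm-up placement rule, not the KIP-441 assignor.
class WarmupAssignor {
    // Phase 1: while the candidate is still restoring (lag too high), the
    // task remains active on its current owner and the candidate warms up.
    // Phase 2: once the candidate has caught up, the task migrates, so the
    // unavailability window shrinks to a single quick hand-off.
    static String chooseOwner(String current, String candidate,
                              long candidateLagRecords, long acceptableLagRecords) {
        return candidateLagRecords <= acceptableLagRecords ? candidate : current;
    }
}
```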
[jira] [Commented] (KAFKA-4090) JVM runs into OOM if (Java) client uses a SSL port without setting the security protocol
[ https://issues.apache.org/jira/browse/KAFKA-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086060#comment-17086060 ]

David Mollitor commented on KAFKA-4090:
---------------------------------------

Hey, any updates or thoughts on moving this forward?

> JVM runs into OOM if (Java) client uses a SSL port without setting the security protocol
> -----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4090
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4090
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.9.0.1, 0.10.0.1, 2.1.0
>            Reporter: Jaikiran Pai
>            Assignee: Alexandre Dupriez
>            Priority: Major
>
> Quoting from the mail thread that was sent to the Kafka mailing list:
> {quote}
> We have been using Kafka 0.9.0.1 (server and Java client libraries). So far we had been using it with plaintext transport, but recently have been considering upgrading to SSL. It mostly works, except that a misconfigured producer (and even consumer) causes a hard-to-relate OutOfMemory exception, putting the JVM in which the client runs into a bad state. We can consistently reproduce that OOM very easily. We decided to check whether this is fixed in 0.10.0.1, so we upgraded one of our test systems to that version (both server and client libraries) but still see the same issue. Here's how it can be easily reproduced:
> 1. Enable the SSL listener on the broker via server.properties, as per the Kafka documentation:
> {code}
> listeners=PLAINTEXT://:9092,SSL://:9093
> ssl.keystore.location=
> ssl.keystore.password=pass
> ssl.key.password=pass
> ssl.truststore.location=
> ssl.truststore.password=pass
> {code}
> 2. Start ZooKeeper and the Kafka server
> 3. Create an "oom-test" topic (which will be used for these tests):
> {code}
> kafka-topics.sh --zookeeper localhost:2181 --create --topic oom-test --partitions 1 --replication-factor 1
> {code}
> 4. Create a simple producer which sends a single message to the topic via the Java (new producer) APIs:
> {code}
> public class OOMTest {
>     public static void main(final String[] args) throws Exception {
>         final Properties kafkaProducerConfigs = new Properties();
>         // NOTE: Intentionally use a SSL port without specifying security.protocol as SSL
>         kafkaProducerConfigs.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
>         kafkaProducerConfigs.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
>         kafkaProducerConfigs.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProducerConfigs)) {
>             System.out.println("Created Kafka producer");
>             final String topicName = "oom-test";
>             final String message = "Hello OOM!";
>             // send a message to the topic
>             final Future<RecordMetadata> recordMetadataFuture = producer.send(new ProducerRecord<>(topicName, message));
>             final RecordMetadata sentRecordMetadata = recordMetadataFuture.get();
>             System.out.println("Sent message '" + message + "' to topic '" + topicName + "'");
>         }
>         System.out.println("Tests complete");
>     }
> }
> {code}
> Notice that the server URL uses the SSL endpoint localhost:9093 but doesn't specify any of the other necessary SSL configs, like security.protocol.
> 5. For the sake of easily reproducing this issue, run this class with a max heap size of 256MB (-Xmx256M). Running this code throws the following OutOfMemoryError in one of the Sender threads:
> {code}
> 18:33:25,770 ERROR [KafkaThread] - Uncaught exception in kafka-producer-network-thread | producer-1:
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>     at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>     at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>     at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
>     at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Note that I set the heap size to 256MB to easily reproduce it, but this isn't
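The report is truncated here, but the misconfiguration it demonstrates is the missing security.protocol setting. A client connecting to the SSL listener should set it explicitly, along with the truststore configuration; the sketch below shows the corrected producer properties (paths and passwords are placeholders, and the keys are the standard Kafka client config names):

```java
import java.util.Properties;

// Sketch of the client configuration the reproduction above intentionally
// omits: when bootstrapping against an SSL listener (port 9093 here),
// security.protocol and the truststore settings must be set explicitly.
class SslClientConfig {
    static Properties producerConfig() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9093");
        // Match the listener's security protocol, avoiding the OOM scenario:
        props.setProperty("security.protocol", "SSL");
        // Placeholder path/password; real values come from the deployment.
        props.setProperty("ssl.truststore.location", "/path/to/client.truststore.jks");
        props.setProperty("ssl.truststore.password", "pass");
        return props;
    }
}
```

Without the `security.protocol` line, the plaintext client misinterprets the first bytes of the TLS handshake as a Kafka response size, which is what leads to the huge buffer allocation in the stack trace above.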
[jira] [Comment Edited] (KAFKA-9543) Consumer offset reset after new segment rolling
[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085938#comment-17085938 ]

Jason Gustafson edited comment on KAFKA-9543 at 4/17/20, 6:19 PM:
------------------------------------------------------------------

[~junrao] Good suggestion. I opened https://issues.apache.org/jira/browse/KAFKA-9886 to fix this separately. Note that I'll leave the issue here open until we can confirm whether it is fixed by KAFKA-9838.

was (Author: hachikuji):
[~junrao] Good suggestion. I opened https://issues.apache.org/jira/browse/KAFKA-9886. I'll leave this issue open until we can confirm whether it is fixed by KAFKA-9838.

> Consumer offset reset after new segment rolling
> ------------------------------------------------
>
>                 Key: KAFKA-9543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9543
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Rafał Boniecki
>            Priority: Major
>         Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png
>
> After upgrading from Kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer offset resets.
> Consumer:
> {code:java}
> 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 [2020-02-12T11:12:58,402][INFO][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of range for partition stats-5, resetting offset
> {code}
> Broker:
> {code:java}
> 2020-02-12 11:12:58:400 CET INFO [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.
> {code}
> All resets are perfectly correlated with rolling new segments at the broker: the segment is rolled first, then, a couple of ms later, the reset occurs on the consumer. Attached is a Grafana graph with consumer lag per partition. All sudden spikes in lag are offset resets due to this bug.
[jira] [Created] (KAFKA-9887) failed-task-count JMX metric not updated if task fails during startup
Chris Egerton created KAFKA-9887: Summary: failed-task-count JMX metric not updated if task fails during startup Key: KAFKA-9887 URL: https://issues.apache.org/jira/browse/KAFKA-9887 Project: Kafka Issue Type: Bug Components: KafkaConnect Affects Versions: 2.4.1, 2.5.0, 2.4.0 Reporter: Chris Egerton If a task fails on startup (specifically, during [this code section|https://github.com/apache/kafka/blob/00a59b392d92b0d6d3a321ef9a53dae4b3a9d030/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L427-L468]), the {{failed-task-count}} JMX metric is not updated to reflect the task failure, even though the status endpoints in the REST API do report the task as failed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9881) Verify whether RocksDBMetrics Get Measurements from RocksDB in a Unit Test instead of an Integration Test
[ https://issues.apache.org/jira/browse/KAFKA-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085956#comment-17085956 ] ASF GitHub Bot commented on KAFKA-9881: --- guozhangwang commented on pull request #8501: KAFKA-9881: Convert integration test to verify measurements from RocksDB to unit test URL: https://github.com/apache/kafka/pull/8501 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Verify whether RocksDBMetrics Get Measurements from RocksDB in a Unit Test > instead of an Intergration Test > -- > > Key: KAFKA-9881 > URL: https://issues.apache.org/jira/browse/KAFKA-9881 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 2.5.0 >Reporter: Bruno Cadonna >Priority: Major > > The integration test {{RocksDBMetricsIntegrationTest}} takes pretty long to > complete. The main part of the runtime is spent in the two tests that verify > whether the rocksDB metrics get actual measurements from RocksDB. Those tests > need to wait for the thread that collects the measurements of the RocksDB > metrics to trigger the first recordings of the metrics. These tests do not > need to run as integration tests and thus they shall be converted into unit > tests to save runtime. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions
[ https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085948#comment-17085948 ] Evan Williams commented on KAFKA-4084: -- [~sql_consulting] Many thanks for that! I've just started trying to implement the patch and wanted to confirm a few things: # Do I *replace* the entire contents of /usr/share/java/kafka (where my Kafka JARs are) with the 96 JARs from the tar file? Or do I just *copy them in addition* to the existing JARs? I tried replacing them all, but Kafka just crashed on start. # Regarding the patched JARs, I see that they have "2.4.1" in the filenames, i.e. kafka-clients-2.4.1-SNAPSHOT.jar, whereas my existing JARs have 5.4.1. Will that affect things when I do the rolling restarts, i.e. mismatched versions? > automated leader rebalance causes replication downtime for clusters with too > many partitions > > > Key: KAFKA-4084 > URL: https://issues.apache.org/jira/browse/KAFKA-4084 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1 >Reporter: Tom Crayford >Priority: Major > Labels: reliability > Fix For: 1.1.0 > > > If you enable {{auto.leader.rebalance.enable}} (which is on by default), and > you have a cluster with many partitions, there is a severe amount of > replication downtime following a restart. This causes > `UnderReplicatedPartitions` to fire, and replication is paused. > This is because the current automated leader rebalance mechanism changes > leaders for *all* imbalanced partitions at once, instead of doing it > gradually. This effectively stops all replica fetchers in the cluster > (assuming there are enough imbalanced partitions), and restarts them. This > can take minutes on busy clusters, during which no replication is happening > and user data is at risk. Clients with {{acks=-1}} also see issues at this > time, because replication is effectively stalled. 
> To quote Todd Palino from the mailing list: > bq. There is an admin CLI command to trigger the preferred replica election > manually. There is also a broker configuration “auto.leader.rebalance.enable” > which you can set to have the broker automatically perform the PLE when > needed. DO NOT USE THIS OPTION. There are serious performance issues when > doing so, especially on larger clusters. It needs some development work that > has not been fully identified yet. > This setting is extremely useful for smaller clusters, but with high > partition counts causes the huge issues stated above. > One potential fix could be adding a new configuration for the number of > partitions to do automated leader rebalancing for at once, and *stop* once > that number of leader rebalances are in flight, until they're done. There may > be better mechanisms, and I'd love to hear if anybody has any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005)
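The fix proposed in the last paragraph, limiting how many leader rebalances are in flight at once, can be sketched as simple batch planning: split the imbalanced partitions into bounded groups and submit one group at a time, waiting for each to finish. This is an illustrative sketch only; the names (`planBatches`, `MAX_IN_FLIGHT`) are hypothetical and not Kafka controller code:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedLeaderElection {
    // Hypothetical cap on concurrent preferred-leader elections.
    static final int MAX_IN_FLIGHT = 2;

    // Split the imbalanced partitions into batches of at most MAX_IN_FLIGHT;
    // each batch would only be submitted once the previous one completes, so
    // replica fetchers are never all stopped at the same time.
    static List<List<String>> planBatches(List<String> imbalanced) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < imbalanced.size(); i += MAX_IN_FLIGHT) {
            int end = Math.min(i + MAX_IN_FLIGHT, imbalanced.size());
            batches.add(new ArrayList<>(imbalanced.subList(i, end)));
        }
        return batches;
    }
}
```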
[jira] [Commented] (KAFKA-9543) Consumer offset reset after new segment rolling
[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085938#comment-17085938 ] Jason Gustafson commented on KAFKA-9543: [~junrao] Good suggestion. I opened https://issues.apache.org/jira/browse/KAFKA-9886. I'll leave this issue open until we can confirm whether it is fixed by KAFKA-9838. > Consumer offset reset after new segment rolling > --- > > Key: KAFKA-9543 > URL: https://issues.apache.org/jira/browse/KAFKA-9543 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Rafał Boniecki >Priority: Major > Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png > > > After upgrade from kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer > offset resets. > Consumer: > {code:java} > 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 > [2020-02-12T11:12:58,402][INFO > ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer > clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of > range for partition stats-5, resetting offset > {code} > Broker: > {code:java} > 2020-02-12 11:12:58:400 CET INFO > [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, > dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code} > All resets are perfectly correlated to rolling new segments at the broker - > segment is rolled first, then, couple of ms later, reset on the consumer > occurs. Attached is grafana graph with consumer lag per partition. All sudden > spikes in lag are offset resets due to this bug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9886) Validate segment range before reading in `Log.read`
Jason Gustafson created KAFKA-9886: -- Summary: Validate segment range before reading in `Log.read` Key: KAFKA-9886 URL: https://issues.apache.org/jira/browse/KAFKA-9886 Project: Kafka Issue Type: Improvement Reporter: Jason Gustafson Assignee: Jason Gustafson Log.read uses the following logic to set the upper limit on a segment read. {code} val maxPosition = { // Use the max offset position if it is on this segment; otherwise, the segment size is the limit. if (maxOffsetMetadata.segmentBaseOffset == segment.baseOffset) { maxOffsetMetadata.relativePositionInSegment } else { segment.size } } {code} In the else branch, the expectation is that `maxOffsetMetadata.segmentBaseOffset > segment.baseOffset`. In KAFKA-9838, we found a bug where this assumption failed which led to reads above the high watermark. We should validate the expectation explicitly so that we don't leave the door open for similar bugs in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005)
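The validation proposed in this ticket can be sketched in plain Java (Kafka's `Log` is Scala, and all names below are illustrative): make the else-branch assumption explicit, so a violation fails fast instead of silently reading up to the full segment size.

```java
public class MaxPositionCheck {
    // Computes the upper limit for a segment read, as in Log.read, but with
    // the else-branch assumption asserted explicitly. Parameter names are
    // illustrative, not Kafka's actual signatures.
    static int maxPosition(long maxOffsetSegmentBase, int relativePosition,
                           long segmentBase, int segmentSize) {
        if (maxOffsetSegmentBase == segmentBase) {
            // Max offset lies in this segment: cap the read at its position.
            return relativePosition;
        }
        // Otherwise the max offset must lie in a LATER segment; if it lies in
        // an earlier one, falling back to segment.size would read too far
        // (e.g. above the high watermark, as in KAFKA-9838).
        if (maxOffsetSegmentBase < segmentBase) {
            throw new IllegalStateException(
                "Max offset metadata is in an earlier segment than the one being read");
        }
        return segmentSize;
    }
}
```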
[jira] [Commented] (KAFKA-9543) Consumer offset reset after new segment rolling
[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085916#comment-17085916 ] Jun Rao commented on KAFKA-9543: [~hachikuji]: Thanks for the analysis. It does seem this issue could be caused by KAFKA-9838. Also, in Log.read(), we have code like the following. If we get to the _else_ part, the assumption is that maxOffsetMetadata.segmentBaseOffset > segment.baseOffset. Perhaps it's useful to assert that. That may help uncover issues that we may not know yet. {code:java} val maxPosition = { // Use the max offset position if it is on this segment; otherwise, the segment size is the limit. if (maxOffsetMetadata.segmentBaseOffset == segment.baseOffset) { maxOffsetMetadata.relativePositionInSegment } else { segment.size } }{code} > Consumer offset reset after new segment rolling > --- > > Key: KAFKA-9543 > URL: https://issues.apache.org/jira/browse/KAFKA-9543 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Rafał Boniecki >Priority: Major > Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png > > > After upgrade from kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer > offset resets. > Consumer: > {code:java} > 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 > [2020-02-12T11:12:58,402][INFO > ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer > clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of > range for partition stats-5, resetting offset > {code} > Broker: > {code:java} > 2020-02-12 11:12:58:400 CET INFO > [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, > dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code} > All resets are perfectly correlated to rolling new segments at the broker - > segment is rolled first, then, couple of ms later, reset on the consumer > occurs. Attached is grafana graph with consumer lag per partition. 
All sudden > spikes in lag are offset resets due to this bug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9818) Flaky Test RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler
[ https://issues.apache.org/jira/browse/KAFKA-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085908#comment-17085908 ] ASF GitHub Bot commented on KAFKA-9818: --- mjsax commented on pull request #8507: KAFKA-9818: Fix flaky test in RecordCollectorTest URL: https://github.com/apache/kafka/pull/8507 Call for review @vvcephei. Follow up of #8488 (own ticket number) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky Test > RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler > - > > Key: KAFKA-9818 > URL: https://issues.apache.org/jira/browse/KAFKA-9818 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Sophie Blee-Goldman >Assignee: Matthias J. Sax >Priority: Major > Labels: flaky-test > > h3. Error Message > java.lang.AssertionError > h3. 
Stacktrace > java.lang.AssertionError at org.junit.Assert.fail(Assert.java:87) at > org.junit.Assert.assertTrue(Assert.java:42) at > org.junit.Assert.assertTrue(Assert.java:53) at > org.apache.kafka.streams.processor.internals.RecordCollectorTest.shouldNotThrowStreamsExceptionOnSubsequentCallIfASendFailsWithContinueExceptionHandler(RecordCollectorTest.java:521) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) > at com.sun.proxy.$Proxy5.processTestClass(Unknown Source) at >
[jira] [Resolved] (KAFKA-9819) Flaky Test StoreChangelogReaderTest#shouldNotThrowOnUnknownRevokedPartition[0]
[ https://issues.apache.org/jira/browse/KAFKA-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax resolved KAFKA-9819. Fix Version/s: 2.6.0 Resolution: Fixed > Flaky Test StoreChangelogReaderTest#shouldNotThrowOnUnknownRevokedPartition[0] > -- > > Key: KAFKA-9819 > URL: https://issues.apache.org/jira/browse/KAFKA-9819 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Sophie Blee-Goldman >Assignee: Matthias J. Sax >Priority: Major > Labels: flaky-test > Fix For: 2.6.0 > > > h3. Error Message > java.lang.AssertionError: expected:<[test-reader Changelog partition > unknown-0 could not be found, it could be already cleaned up during the > handlingof task corruption and never restore again]> but was:<[[AdminClient > clientId=adminclient-91] Connection to node -1 (localhost/127.0.0.1:8080) > could not be established. Broker may not be available., test-reader Changelog > partition unknown-0 could not be found, it could be already cleaned up during > the handlingof task corruption and never restore again]> > h3. Stacktrace > java.lang.AssertionError: expected:<[test-reader Changelog partition > unknown-0 could not be found, it could be already cleaned up during the > handlingof task corruption and never restore again]> but was:<[[AdminClient > clientId=adminclient-91] Connection to node -1 (localhost/127.0.0.1:8080) > could not be established. Broker may not be available., test-reader Changelog > partition unknown-0 could not be found, it could be already cleaned up during > the handlingof task corruption and never restore again]> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9819) Flaky Test StoreChangelogReaderTest#shouldNotThrowOnUnknownRevokedPartition[0]
[ https://issues.apache.org/jira/browse/KAFKA-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085832#comment-17085832 ] ASF GitHub Bot commented on KAFKA-9819: --- mjsax commented on pull request #8488: KAFKA-9819: Fix flaky test in StoreChangelogReaderTest URL: https://github.com/apache/kafka/pull/8488 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky Test StoreChangelogReaderTest#shouldNotThrowOnUnknownRevokedPartition[0] > -- > > Key: KAFKA-9819 > URL: https://issues.apache.org/jira/browse/KAFKA-9819 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Sophie Blee-Goldman >Assignee: Matthias J. Sax >Priority: Major > Labels: flaky-test > > h3. Error Message > java.lang.AssertionError: expected:<[test-reader Changelog partition > unknown-0 could not be found, it could be already cleaned up during the > handlingof task corruption and never restore again]> but was:<[[AdminClient > clientId=adminclient-91] Connection to node -1 (localhost/127.0.0.1:8080) > could not be established. Broker may not be available., test-reader Changelog > partition unknown-0 could not be found, it could be already cleaned up during > the handlingof task corruption and never restore again]> > h3. Stacktrace > java.lang.AssertionError: expected:<[test-reader Changelog partition > unknown-0 could not be found, it could be already cleaned up during the > handlingof task corruption and never restore again]> but was:<[[AdminClient > clientId=adminclient-91] Connection to node -1 (localhost/127.0.0.1:8080) > could not be established. 
Broker may not be available., test-reader Changelog > partition unknown-0 could not be found, it could be already cleaned up during > the handlingof task corruption and never restore again]> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
[ https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085744#comment-17085744 ] Bill Bejeck commented on KAFKA-7965: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/5826/testReport/junit/kafka.api/ConsumerBounceTest/testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup/] > Flaky Test > ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup > > > Key: KAFKA-7965 > URL: https://issues.apache.org/jira/browse/KAFKA-7965 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, unit tests >Affects Versions: 1.1.1, 2.2.0, 2.3.0 >Reporter: Matthias J. Sax >Assignee: David Jacot >Priority: Critical > Labels: flaky-test > Fix For: 2.3.0 > > > To get stable nightly builds for `2.2` release, I create tickets for all > observed test failures. > [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/] > {quote}java.lang.AssertionError: Received 0, expected at least 68 at > org.junit.Assert.fail(Assert.java:88) at > org.junit.Assert.assertTrue(Assert.java:41) at > kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320) > at > kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at > kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
[ https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085702#comment-17085702 ] Sigurd Sandve edited comment on KAFKA-9839 at 4/17/20, 12:37 PM: - We experienced the same issue, adding some logs if it could be helpful. After this happened the server got stuck in a loop for an hour where ISR was not in sync, and it wasn't fixed until a hard restart was done on the server. {code:java} Apr 16 10:01:11 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:11,527] WARN Client session timed out, have not heard from server in 6981ms for sessionid 0x5484a370021 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:11 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:11,892] INFO Client session timed out, have not heard from server in 6981ms for sessionid 0x5484a370021, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:12 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,930] INFO Opening socket connection to server Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,930] INFO Socket connection established to Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] WARN Unable to reconnect to ZooKeeper service, session 0x5484a370021 has expired (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO Unable to reconnect to ZooKeeper service, session 0x5484a370021 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO EventThread shut down for session: 0x5484a370021 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO [ZooKeeperClient Kafka server] Session expired. 
(kafka.zookeeper.ZooKeeperClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO [ZooKeeperClient Kafka server] Initializing a new session to <1><2><3><4><5>. (kafka.zookeepe Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO Initiating client connection, connectString=<1><2><3><4><5> sessionTimeout=6000 watcher=kafka Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,042] INFO Opening socket connection to server <1> Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,042] INFO Socket connection established to <1>, initiating session (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,832] INFO Session establishment complete on server <1>, sessionid = 0x1008d93001b, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,874] ERROR [KafkaApi-0] Error when handling request: clientId=22, correlationId=0, api=UPDATE_METADATA, body={controller_id=22,controller_epoch=83,broker_epoch=25770069286,topic_states=[{topic Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: =47,replicas=[0],offline_replicas=[]},{partition=54,controller_epoch=83,leader=-1,leader_epoch=47,isr=[0],zk_version=47,replicas=[0],offline_replicas=[]},{partition=55,controller_epoch=83,leader=-1,leader_epoch=47 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: java.lang.IllegalStateException: Epoch 25770069286 larger than current broker epoch 25770067254 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at 
kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2612) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:242) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaApis.handle(KafkaApis.scala:119) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at java.lang.Thread.run(Thread.java:748) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,885] INFO Stat of the created znode at /brokers/ids/0 is: 25770069286,25770069286,1587024073851,1587024073851,1,0,0,72057596413149211,200,0,25770069286 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: (kafka.zk.KafkaZkClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,885] INFO Registered
[jira] [Commented] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
[ https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085702#comment-17085702 ] Sigurd Sandve commented on KAFKA-9839: -- We experienced the same issue, adding some logs if it could be helpful. After this happened this server got stuck in a loop for an hour where ISR was not in sync, and it wasn't fixed until a hard restart was done on the server. {code:java} Apr 16 10:01:11 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:11,527] WARN Client session timed out, have not heard from server in 6981ms for sessionid 0x5484a370021 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:11 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:11,892] INFO Client session timed out, have not heard from server in 6981ms for sessionid 0x5484a370021, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:12 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,930] INFO Opening socket connection to server Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,930] INFO Socket connection established to Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] WARN Unable to reconnect to ZooKeeper service, session 0x5484a370021 has expired (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO Unable to reconnect to ZooKeeper service, session 0x5484a370021 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO EventThread shut down for session: 0x5484a370021 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,932] INFO [ZooKeeperClient Kafka server] Session expired. 
(kafka.zookeeper.ZooKeeperClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO [ZooKeeperClient Kafka server] Initializing a new session to <1><2><3><4><5>. (kafka.zookeepe Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO Initiating client connection, connectString=<1><2><3><4><5> sessionTimeout=6000 watcher=kafka Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:12,934] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,042] INFO Opening socket connection to server <1> Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,042] INFO Socket connection established to <1>, initiating session (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,832] INFO Session establishment complete on server <1>, sessionid = 0x1008d93001b, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,874] ERROR [KafkaApi-0] Error when handling request: clientId=22, correlationId=0, api=UPDATE_METADATA, body={controller_id=22,controller_epoch=83,broker_epoch=25770069286,topic_states=[{topic Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: =47,replicas=[0],offline_replicas=[]},{partition=54,controller_epoch=83,leader=-1,leader_epoch=47,isr=[0],zk_version=47,replicas=[0],offline_replicas=[]},{partition=55,controller_epoch=83,leader=-1,leader_epoch=47 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: java.lang.IllegalStateException: Epoch 25770069286 larger than current broker epoch 25770067254 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at 
kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2612) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:242) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaApis.handle(KafkaApis.scala:119) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: at java.lang.Thread.run(Thread.java:748) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,885] INFO Stat of the created znode at /brokers/ids/0 is: 25770069286,25770069286,1587024073851,1587024073851,1,0,0,72057596413149211,200,0,25770069286 Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: (kafka.zk.KafkaZkClient) Apr 16 10:01:13 VM-IWM-APP18 kafka-server-start.sh[9200]: [2020-04-16 10:01:13,885] INFO Registered broker 0 at path /brokers/ids/0 with addresses:
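The IllegalStateException in the logs above comes from a staleness check that treats a request epoch *newer* than the broker's own epoch as an error, which is exactly what happens when the controller learns the re-registered broker's new epoch before the broker itself does. An illustrative sketch of that check (not the actual `KafkaApis` code):

```java
public class BrokerEpochCheck {
    // Returns true if the request was sent before the broker's current
    // registration (stale). An epoch larger than the broker's own is treated
    // as an invariant violation, matching the exception seen in the logs.
    static boolean isStale(long requestBrokerEpoch, long currentBrokerEpoch) {
        if (requestBrokerEpoch > currentBrokerEpoch) {
            throw new IllegalStateException("Epoch " + requestBrokerEpoch
                + " larger than current broker epoch " + currentBrokerEpoch);
        }
        return requestBrokerEpoch < currentBrokerEpoch;
    }
}
```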
[jira] [Created] (KAFKA-9885) Evict last members of a group when the maximum allowed is reached
David Jacot created KAFKA-9885: -- Summary: Evict last members of a group when the maximum allowed is reached Key: KAFKA-9885 URL: https://issues.apache.org/jira/browse/KAFKA-9885 Project: Kafka Issue Type: Bug Reporter: David Jacot Assignee: David Jacot While analysing https://issues.apache.org/jira/browse/KAFKA-7965, we found that multiple members of a group can be evicted if the leader of the consumer offsets partition changes before the group is persisted. This happens because the current eviction logic always evicts the first members that rejoin the group. We would like to change the eviction logic so that the last members to rejoin the group are kicked out instead. Here is an example of what happens when the leader changes: {noformat} // Group is loaded in GroupCoordinator 0 // A rebalance is triggered because the group is over capacity [2020-04-02 11:14:33,393] INFO [GroupMetadataManager brokerId=0] Scheduling loading of offsets and group metadata from __consumer_offsets-0 (kafka.coordinator.group.GroupMetadataManager:66) [2020-04-02 11:14:33,406] INFO [Consumer clientId=ConsumerTestConsumer, groupId=group-max-size-test] Discovered group coordinator localhost:40071 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:794) [2020-04-02 11:14:33,409] INFO Static member MemberMetadata(memberId=ConsumerTestConsumer-ceaab707-69af-4a65-8275-cb7db7fb66b3, groupInstanceId=Some(null), clientId=ConsumerTestConsumer, clientHost=/127.0.0.1, sessionTimeoutMs=1, rebalanceTimeoutMs=6, supportedProtocols=List(range), ).groupInstanceId of group group-max-size-test loaded with member id ConsumerTestConsumer-ceaab707-69af-4a65-8275-cb7db7fb66b3 at generation 1. 
(kafka.coordinator.group.GroupMetadata$:126)
[2020-04-02 11:14:33,410] INFO Static member MemberMetadata(memberId=ConsumerTestConsumer-07077ca2-30e9-45cd-b363-30672281bacb, groupInstanceId=Some(null), clientId=ConsumerTestConsumer, clientHost=/127.0.0.1, sessionTimeoutMs=1, rebalanceTimeoutMs=6, supportedProtocols=List(range), ).groupInstanceId of group group-max-size-test loaded with member id ConsumerTestConsumer-07077ca2-30e9-45cd-b363-30672281bacb at generation 1. (kafka.coordinator.group.GroupMetadata$:126)
[2020-04-02 11:14:33,412] INFO Static member MemberMetadata(memberId=ConsumerTestConsumer-5d359e65-1f11-43ce-874e-fddf55c0b49d, groupInstanceId=Some(null), clientId=ConsumerTestConsumer, clientHost=/127.0.0.1, sessionTimeoutMs=1, rebalanceTimeoutMs=6, supportedProtocols=List(range), ).groupInstanceId of group group-max-size-test loaded with member id ConsumerTestConsumer-5d359e65-1f11-43ce-874e-fddf55c0b49d at generation 1. (kafka.coordinator.group.GroupMetadata$:126)
[2020-04-02 11:14:33,413] INFO [GroupCoordinator 0]: Loading group metadata for group-max-size-test with generation 1 (kafka.coordinator.group.GroupCoordinator:66)
[2020-04-02 11:14:33,413] INFO [GroupCoordinator 0]: Preparing to rebalance group group-max-size-test in state PreparingRebalance with old generation 1 (__consumer_offsets-0) (reason: Freshly-loaded group is over capacity (GroupConfig(10,180,2,0).groupMaxSize). Rebalacing in order to give a chance for consumers to commit offsets) (kafka.coordinator.group.GroupCoordinator:66)
[2020-04-02 11:14:33,431] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-0 in 28 milliseconds, of which 0 milliseconds was spent in the scheduler. (kafka.coordinator.group.GroupMetadataManager:66)
// A first consumer is kicked out of the group while trying to re-join
[2020-04-02 11:14:33,449] ERROR [Consumer clientId=ConsumerTestConsumer, groupId=group-max-size-test] Attempt to join group failed due to fatal error: The consumer group has reached its max size. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:627)
[2020-04-02 11:14:33,451] ERROR [daemon-consumer-assignment-2]: Error due to (kafka.api.AbstractConsumerTest$ConsumerAssignmentPoller:76) org.apache.kafka.common.errors.GroupMaxSizeReachedException: Consumer group group-max-size-test already has the configured maximum number of members.
[2020-04-02 11:14:33,451] INFO [daemon-consumer-assignment-2]: Stopped (kafka.api.AbstractConsumerTest$ConsumerAssignmentPoller:66)
// Before the rebalance is completed, a preferred replica leader election kicks in and moves the leader from 0 to 1
[2020-04-02 11:14:34,155] INFO [Controller id=0] Processing automatic preferred replica leader election (kafka.controller.KafkaController:66)
[2020-04-02 11:14:34,169] INFO [Controller id=0] Starting replica leader election (PREFERRED) for partitions group-max-size-test-0,group-max-size-test-3,__consumer_offsets-0 triggered by AutoTriggered (kafka.controller.KafkaController:66)
// The group
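The proposed change flips which end of the rejoin order gets evicted once a freshly loaded group is over capacity. A minimal, self-contained sketch of the intended selection (the helper name is hypothetical; the real logic lives in the group coordinator):

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Given members in the order they rejoined, keep the first maxSize and
    // evict everyone who rejoined after the group was already full.
    static List<String> selectEvictions(List<String> rejoinOrder, int maxSize) {
        if (rejoinOrder.size() <= maxSize) return List.of();
        return new ArrayList<>(rejoinOrder.subList(maxSize, rejoinOrder.size()));
    }

    public static void main(String[] args) {
        List<String> members = List.of("m1", "m2", "m3", "m4");
        // group.max.size = 2: the first two to rejoin survive, the rest are evicted
        System.out.println(selectEvictions(members, 2)); // [m3, m4]
    }
}
```

Under today's behaviour the equivalent helper would return the *first* members instead, which is what lets a leader change evict more members than necessary.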
[jira] [Commented] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is inval
[ https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085625#comment-17085625 ] William Reynolds commented on KAFKA-6266: - If anyone runs into this and needs a workaround before picking up 2.4.1/trunk, the following worked for us. It may need some tweaking based on where/how you log, will obviously differ if you aren't using systemd, and the user/dirs will need adjusting for it to be a general workaround {code:bash}
# Collect the partition and valid start offset from the reset warnings in the logs
journalctl -u kafka --since 'yesterday' | grep 'Resetting first dirty offset of' | awk '{print $14 ,$19}' | sort -u > /home/user/checkpoints
cp /kafka-topic-data/cleaner-offset-checkpoint /home/user
# Split the trailing "-<partition>" off the topic name to match the checkpoint file format
cat /home/user/checkpoints | sed -r 's/(.*)-/\1 /g' > /home/user/checkpoints-processed
# Merge the corrected entries with the existing checkpoint, keeping one entry per partition
cat /home/user/checkpoints-processed cleaner-offset-checkpoint | sort --key=1,2 -u > /home/user/cleaner-offset-checkpoint.clean
# Swap the corrected file in while the broker is down
sudo systemctl stop kafka; sudo mv /kafka-topic-data/cleaner-offset-checkpoint /home/user/; sudo mv /home/user/cleaner-offset-checkpoint.clean /kafka-topic-data/cleaner-offset-checkpoint; sudo chown -R kafka:kafka /kafka-topic-data/cleaner-offset-checkpoint; sleep 5; sudo systemctl start kafka
{code} > Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of > __consumer_offsets-xx to log start offset 203569 since the checkpointed > offset 120955 is invalid. (kafka.log.LogCleanerManager$) > -- > > Key: KAFKA-6266 > URL: https://issues.apache.org/jira/browse/KAFKA-6266 > Project: Kafka > Issue Type: Bug > Components: offset manager >Affects Versions: 1.0.0, 1.0.1 > Environment: CentOS 7, Apache kafka_2.12-1.0.0 >Reporter: VinayKumar >Assignee: David Mao >Priority: Major > Fix For: 2.5.0, 2.4.1 > > > I upgraded Kafka from 0.10.2.1 to 1.0.0 version. From then, I see the below > warnings in the log. > I'm seeing these continuously in the log, and want these to be fixed- so > that they wont repeat. Can someone please help me in fixing the below > warnings. 
> {code} > WARN Resetting first dirty offset of __consumer_offsets-17 to log start > offset 3346 since the checkpointed offset 3332 is invalid. > (kafka.log.LogCleanerManager$) > WARN Resetting first dirty offset of __consumer_offsets-23 to log start > offset 4 since the checkpointed offset 1 is invalid. > (kafka.log.LogCleanerManager$) > WARN Resetting first dirty offset of __consumer_offsets-19 to log start > offset 203569 since the checkpointed offset 120955 is invalid. > (kafka.log.LogCleanerManager$) > WARN Resetting first dirty offset of __consumer_offsets-35 to log start > offset 16957 since the checkpointed offset 7 is invalid. > (kafka.log.LogCleanerManager$) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9874) broker can not work when use dns fault
[ https://issues.apache.org/jira/browse/KAFKA-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085608#comment-17085608 ] Sönke Liebau commented on KAFKA-9874: - Hi [~zchzx] I am afraid I don't fully understand your ticket, could you please elaborate a little bit more? > broker can not work when use dns fault > -- > > Key: KAFKA-9874 > URL: https://issues.apache.org/jira/browse/KAFKA-9874 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.3.1, 2.4.1 >Reporter: para >Priority: Critical > Labels: acl, dns > Attachments: kast.log > > > In 2.3.1, SASL authentication blocks when the DNS service is faulty, caused by the Java native function getHostByAddr. > But the hostname is never used; the default name could be used instead. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
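For context, the blocking call the report points at is the JVM's reverse DNS lookup. A small stdlib illustration of the difference between that lookup and the no-DNS fallback the reporter suggests (the helper name is ours, not Kafka's):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class Main {
    // Pure formatting of the raw address -- no DNS round-trip involved, so it
    // cannot hang even when the DNS service is down.
    static String safeIdentity(byte[] rawAddress) {
        try {
            return InetAddress.getByAddress(rawAddress).getHostAddress();
        } catch (UnknownHostException e) {
            return null; // only thrown for an illegal address length
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress peer = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
        // getCanonicalHostName() triggers a reverse lookup (ultimately the
        // native getHostByAddr) and can block for the full resolver timeout.
        String resolved = peer.getCanonicalHostName();
        System.out.println(resolved + " / " + safeIdentity(new byte[] {127, 0, 0, 1}));
    }
}
```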
[jira] [Commented] (KAFKA-1614) Partition log directory name and segments information exposed via JMX
[ https://issues.apache.org/jira/browse/KAFKA-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085575#comment-17085575 ] Raffaele Saggino commented on KAFKA-1614: - After 5 years, do we need a KIP to get this merged? > Partition log directory name and segments information exposed via JMX > - > > Key: KAFKA-1614 > URL: https://issues.apache.org/jira/browse/KAFKA-1614 > Project: Kafka > Issue Type: Improvement > Components: log >Affects Versions: 0.8.1.1 >Reporter: Alexander Demidko >Assignee: Alexander Demidko >Priority: Major > Attachments: log_segments_dir_jmx_info.patch > > > Makes partition log directory and single segments information exposed via > JMX. This is useful to: > - monitor disks usage in a cluster and on single broker > - calculate disk space taken by different topics > - estimate space to be freed when segments are expired > Patch attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
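For reference, exposing per-partition log directory information over JMX follows the standard MBean pattern below. The attribute names here are hypothetical placeholders, not the ones from the attached log_segments_dir_jmx_info.patch:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class Main {
    // Hypothetical shape of the information the patch would expose.
    public interface LogDirInfoMBean {
        String getLogDirectory();
        long getTotalSegmentBytes();
    }

    public static class LogDirInfo implements LogDirInfoMBean {
        private final String dir;
        private final long bytes;
        public LogDirInfo(String dir, long bytes) { this.dir = dir; this.bytes = bytes; }
        public String getLogDirectory() { return dir; }
        public long getTotalSegmentBytes() { return bytes; }
    }

    // Register once on the platform MBean server; any JMX client
    // (jconsole, jmxterm, a metrics agent) can then read the attributes.
    static ObjectName register() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("kafka.log:type=LogDirInfo,partition=demo-0");
            if (!server.isRegistered(name)) {
                server.registerMBean(new LogDirInfo("/kafka-logs/demo-0", 1_048_576L), name);
            }
            return name;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static Object readAttribute(String attr) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(register(), attr);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAttribute("LogDirectory"));
    }
}
```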
[jira] [Assigned] (KAFKA-9704) z/OS won't let us resize file when mmap
[ https://issues.apache.org/jira/browse/KAFKA-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mickael Maison reassigned KAFKA-9704: - Assignee: Shuo Zhang > z/OS won't let us resize file when mmap > --- > > Key: KAFKA-9704 > URL: https://issues.apache.org/jira/browse/KAFKA-9704 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.4.0 >Reporter: Shuo Zhang >Assignee: Shuo Zhang >Priority: Major > Fix For: 2.6.0, 2.4.2, 2.5.1 > > > z/OS won't let us resize a file while it is mmapped, so we need to force an > unmap, as on Windows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
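The fix mirrors what Kafka already does on Windows: explicitly release the mapping before resizing, because some platforms refuse to resize a file that is still mapped. A sketch of that unmap-then-resize pattern using the JDK's unsupported `Unsafe.invokeCleaner` (JDK 9+); this is an illustration of the technique, not Kafka's actual utility code:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class Main {
    // Force-unmap a MappedByteBuffer. Without this, the mapping is released
    // only when the buffer is garbage-collected, and platforms such as
    // Windows and z/OS refuse to resize a file that is still mapped.
    static void unmap(MappedByteBuffer buffer) throws Exception {
        Field f = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Object unsafe = f.get(null);
        unsafe.getClass()
              .getMethod("invokeCleaner", java.nio.ByteBuffer.class)
              .invoke(unsafe, buffer);
    }

    static long demo() {
        try {
            File file = File.createTempFile("index", ".tmp");
            file.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                raf.setLength(4096);
                MappedByteBuffer map = raf.getChannel()
                        .map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                map.putInt(0, 42);
                unmap(map);          // release the mapping first...
                raf.setLength(1024); // ...then the resize succeeds everywhere
            }
            return file.length();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("resized to " + demo());
    }
}
```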
[jira] [Comment Edited] (KAFKA-9603) Number of open files keeps increasing in Streams application
[ https://issues.apache.org/jira/browse/KAFKA-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085556#comment-17085556 ] Abhishek Kumar edited comment on KAFKA-9603 at 4/17/20, 8:44 AM: - [~lpandzic] Potentially related issue with using Confluent ksql. ksql is built using kafka streams. - [https://github.com/confluentinc/ksql/issues/5057] was (Author: a6kme): [~lpandzic] Something related while using Confluent ksql. ksql is built using kafka streams. - [https://github.com/confluentinc/ksql/issues/5057] > Number of open files keeps increasing in Streams application > > > Key: KAFKA-9603 > URL: https://issues.apache.org/jira/browse/KAFKA-9603 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0, 2.3.1 > Environment: Spring Boot 2.2.4, OpenJDK 13, Centos image >Reporter: Bruno Iljazovic >Priority: Major > > Problem appeared when upgrading from *2.0.1* to *2.3.1*. > Relevant Kafka Streams code: > {code:java} > KStream events1 = > builder.stream(FIRST_TOPIC_NAME, Consumed.with(stringSerde, event1Serde, > event1TimestampExtractor(), null)) >.mapValues(...); > KStream events2 = > builder.stream(SECOND_TOPIC_NAME, Consumed.with(stringSerde, event2Serde, > event2TimestampExtractor(), null)) >.mapValues(...); > var joinWindows = JoinWindows.of(Duration.of(1, MINUTES).toMillis()) > .until(Duration.of(1, HOURS).toMillis()); > events2.join(events1, this::join, joinWindows, Joined.with(stringSerde, > event2Serde, event1Serde)) >.foreach(...); > {code} > Number of open *.sst files keeps increasing until eventually it hits the os > limit (65536) and causes this exception: > {code:java} > Caused by: org.rocksdb.RocksDBException: While open a file for appending: > /.../0_8/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.157943520/001354.sst: > Too many open files > at org.rocksdb.RocksDB.flush(Native Method) > at org.rocksdb.RocksDB.flush(RocksDB.java:2394) > {code} > Here are example files that are opened and never 
closed: > {code:java} > /.../0_27/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000114.sst > /.../0_27/KSTREAM-JOINOTHER-10-store/KSTREAM-JOINOTHER-10-store.158245920/65.sst > /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158215680/000115.sst > /.../0_29/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158245920/000112.sst > /.../0_31/KSTREAM-JOINTHIS-09-store/KSTREAM-JOINTHIS-09-store.158185440/51.sst > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
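Until the leak itself is addressed, one mitigation sometimes used in Streams apps is to cap how many files RocksDB may hold open per store via the `rocksdb.config.setter` extension point: implement `RocksDBConfigSetter` and call `options.setMaxOpenFiles(...)` in `setConfig`. The class name below is a placeholder you would supply yourself, not a shipped class:

```
# Streams configuration (illustrative)
rocksdb.config.setter=com.example.BoundedOpenFilesConfigSetter
```

This bounds the symptom (open *.sst handles) rather than fixing the join-store growth described above.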
[jira] [Commented] (KAFKA-9543) Consumer offset reset after new segment rolling
[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085539#comment-17085539 ] Brian Jones commented on KAFKA-9543: Thanks Jason - that sounds promising. I'll keep an eye out for the 2.4.2 / 2.5.1 / 2.6.0 releases. We only ever hit this in production, which makes testing tricky, but the fact that you have such a solid explanation of why it might've happened and why it only affected 2.4 is very reassuring. > Consumer offset reset after new segment rolling > --- > > Key: KAFKA-9543 > URL: https://issues.apache.org/jira/browse/KAFKA-9543 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Rafał Boniecki >Priority: Major > Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png > > > After upgrade from kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer > offset resets. > Consumer: > {code:java} > 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 > [2020-02-12T11:12:58,402][INFO > ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer > clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of > range for partition stats-5, resetting offset > {code} > Broker: > {code:java} > 2020-02-12 11:12:58:400 CET INFO > [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, > dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code} > All resets are perfectly correlated to rolling new segments at the broker - > segment is rolled first, then, couple of ms later, reset on the consumer > occurs. Attached is grafana graph with consumer lag per partition. All sudden > spikes in lag are offset resets due to this bug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9884) Unable to override some client properties in Mirror maker 2.0
Mithun Kumar created KAFKA-9884: --- Summary: Unable to override some client properties in Mirror maker 2.0 Key: KAFKA-9884 URL: https://issues.apache.org/jira/browse/KAFKA-9884 Project: Kafka Issue Type: Bug Components: mirrormaker Affects Versions: 2.4.1, 2.5.0, 2.4.0 Reporter: Mithun Kumar Attachments: mm2.log I have two 3-node Kafka clusters. MirrorMaker 2.0 is being run as a cluster with bin/connect-mirror-maker.sh mm2.properties I am trying to disable message duplication on replication by enabling idempotence. I understand that EOS is marked as future work in [KIP-382|https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0], however it should be possible by setting enable.idempotence = true and retries > 0. The enable.idempotence = true overrides take effect, however overriding retries fails. I tried all 3 versions that provide MM2: 2.4.0, 2.4.1 and 2.5.0. My mm2.properties config: {noformat}
name = pri_to_bkp
connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
topics = test-mm-topic-3
groups = .*
clusters = pri, bkp
source.cluster.alias = pri
target.cluster.alias = bkp
sasl.mechanism = GSSAPI
sasl.kerberos.service.name = kafka
security.protocol = SASL_PLAINTEXT
sasl.jaas.config = com.sun.security.auth.module.Krb5LoginModule required \
  useKeyTab=true \
  keyTab="/etc/security/keytabs/user.keytab" \
  principal="u...@xx.xx.com";
pri.enable.idempotence = true
bkp.enable.idempotence = true
pri.retries = 2147483647
bkp.retries = 2147483647
pri.bootstrap.servers = SASL_PLAINTEXT://kafka1:9092, SASL_PLAINTEXT://kafka2:9092, SASL_PLAINTEXT://kafka3:9092
bkp.bootstrap.servers = SASL_PLAINTEXT://bkp-kafka1:9092, SASL_PLAINTEXT://bkp-kafka2:9092, SASL_PLAINTEXT://bkp-kafka3:9092
pri->bkp.enabled = true
pri->bkp.topics = "test-mm-topic-3"
{noformat} The error leading to failure is: {noformat} [2020-04-17 15:46:26,525] ERROR [Worker clientId=connect-1, groupId=pri-mm2] Uncaught exception in herder work thread, exiting: 
(org.apache.kafka.connect.runtime.distributed.DistributedHerder:297)
org.apache.kafka.common.config.ConfigException: Must set retries to non-zero when using the idempotent producer.
	at org.apache.kafka.clients.producer.ProducerConfig.maybeOverrideAcksAndRetries(ProducerConfig.java:432)
	at org.apache.kafka.clients.producer.ProducerConfig.postProcessParsedConfig(ProducerConfig.java:400)
	at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:110)
	at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:129)
	at org.apache.kafka.clients.producer.ProducerConfig.<init>(ProducerConfig.java:481)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:326)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:270)
	at org.apache.kafka.connect.util.KafkaBasedLog.createProducer(KafkaBasedLog.java:248)
	at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:129)
	at org.apache.kafka.connect.storage.KafkaStatusBackingStore.start(KafkaStatusBackingStore.java:199)
	at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:124)
	at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:284)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[2020-04-17 15:46:29,089] INFO [Worker clientId=connect-1, groupId=pri-mm2] Herder stopped (org.apache.kafka.connect.runtime.distributed.DistributedHerder:636)
[2020-04-17 15:46:29,089] INFO [Worker clientId=connect-2, groupId=bkp-mm2] Herder stopping (org.apache.kafka.connect.runtime.distributed.DistributedHerder:616)
[2020-04-17 15:46:34,090] INFO [Worker clientId=connect-2, groupId=bkp-mm2] Herder stopped 
(org.apache.kafka.connect.runtime.distributed.DistributedHerder:636) [2020-04-17 15:46:34,090] INFO Kafka MirrorMaker stopped. (org.apache.kafka.connect.mirror.MirrorMaker:191) {noformat} The complete log file is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
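The stack trace shows why the worker dies: `ProducerConfig.maybeOverrideAcksAndRetries` rejects an idempotent producer whose effective retries is 0, which is what the internal producer ends up with when the retries override is not applied. A toy reproduction of that validation rule (this is a sketch, not the actual client code):

```java
import java.util.Map;

public class Main {
    // Mirrors the rule enforced during producer config post-processing: an
    // idempotent producer with retries == 0 cannot guarantee de-duplication,
    // so the configuration is rejected up front.
    static void validate(Map<String, String> config) {
        boolean idempotent = Boolean.parseBoolean(
                config.getOrDefault("enable.idempotence", "false"));
        int retries = Integer.parseInt(
                config.getOrDefault("retries", String.valueOf(Integer.MAX_VALUE)));
        if (idempotent && retries == 0) {
            throw new IllegalArgumentException(
                    "Must set retries to non-zero when using the idempotent producer.");
        }
    }

    public static void main(String[] args) {
        validate(Map.of("enable.idempotence", "true", "retries", "2147483647")); // ok
        try {
            validate(Map.of("enable.idempotence", "true", "retries", "0"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```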
[jira] [Commented] (KAFKA-8812) Rebalance Producers
[ https://issues.apache.org/jira/browse/KAFKA-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085506#comment-17085506 ] Werner Daehn commented on KAFKA-8812: - I am still of the opinion this feature would help all and Kafka Connect in particular. Can we get a decision on that? > Rebalance Producers > --- > > Key: KAFKA-8812 > URL: https://issues.apache.org/jira/browse/KAFKA-8812 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 2.3.0 >Reporter: Werner Daehn >Assignee: Werner Daehn >Priority: Major > Labels: kip > > [KIP-509: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-509%3A+Rebalance+and+restart+Producers|https://cwiki.apache.org/confluence/display/KAFKA/KIP-509%3A+Rebalance+and+restart+Producers] > Please bear with me. Initially this thought sounds stupid but it has its > merits. > > How do you build a distributed producer at the moment? You use Kafka Connect > which in turn requires a cluster that tells which instance is producing what > partitions. > On the consumer side it is different. There Kafka itself does the data > distribution. If you have 10 Kafka partitions and 10 consumers, each will get > data for one partition. With 5 consumers, each will get data from two > partitions. And if there is only a single consumer active, it gets all data. > All is managed by Kafka, all you have to do is start as many consumers as you > want. > > I'd like to suggest something similar for the producers. A producer would > tell Kafka that its source has 10 partitions. The Kafka server then responds > with a list of partitions this instance shall be responsible for. If it is > the only producer, the response would be all 10 partitions. If it is the > second instance starting up, the first instance would get the information it > should produce data for partition 1-5 and the new one for partition 6-10. 
If > the producer fails to respond with an alive packet, a rebalance does happen, > informing the active producer to take more load and the dead producer will > get an error when sending data again. > For restart, the producer rebalance has to send the starting point where to > start producing the data onwards from as well, of course. Would be best if > this is a user generated pointer and not the topic offset. Then it can be > e.g. the database system change number, a database transaction id or > something similar. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
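The consumer-side behaviour the ticket wants to mirror can be sketched as a range-style assignment of source partitions across alive producer instances. This is purely illustrative; no such producer-side mechanism exists in Kafka today:

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Range-style assignment: divide numPartitions among producerCount
    // instances; the first (numPartitions % producerCount) instances get one
    // extra partition, analogous to the consumer-side range assignor.
    static List<List<Integer>> assign(int numPartitions, int producerCount) {
        List<List<Integer>> out = new ArrayList<>();
        int base = numPartitions / producerCount;
        int extra = numPartitions % producerCount;
        int next = 0;
        for (int i = 0; i < producerCount; i++) {
            int size = base + (i < extra ? 1 : 0);
            List<Integer> slice = new ArrayList<>();
            for (int p = 0; p < size; p++) slice.add(next++);
            out.add(slice);
        }
        return out;
    }

    public static void main(String[] args) {
        // 10 source partitions, 2 producers -> "partitions 1-5 and 6-10"
        // in the ticket's terms (0-indexed here)
        System.out.println(assign(10, 2)); // [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
    }
}
```

When an instance misses its liveness deadline, the server would simply re-run the same assignment over the surviving instances, which is the rebalance the ticket describes.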
[jira] [Commented] (KAFKA-9863) update the deprecated --zookeeper option in the documentation into --bootstrap-server
[ https://issues.apache.org/jira/browse/KAFKA-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085487#comment-17085487 ] Luke Chen commented on KAFKA-9863: -- PR: https://github.com/apache/kafka/pull/8482 > update the deprecated --zookeeper option in the documentation into > --bootstrap-server > - > > Key: KAFKA-9863 > URL: https://issues.apache.org/jira/browse/KAFKA-9863 > Project: Kafka > Issue Type: Bug > Components: docs, documentation >Affects Versions: 2.4.1 >Reporter: Luke Chen >Assignee: Luke Chen >Priority: Major > > Since v2.2.0, the --zookeeper option has been deprecated because Kafka can > connect directly to brokers with --bootstrap-server (KIP-377). But in the > official documentation, many example commands still use --zookeeper > instead of --bootstrap-server. Following the commands in the documentation, > you'll get this warning, which is not good. > {code:java} > Warning: --zookeeper is deprecated and will be removed in a future version of > Kafka. > Use --bootstrap-server instead to specify a broker to connect to.{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
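As a concrete example of the documentation change, the topic-listing command moves from the deprecated form to its replacement (host/port values illustrative):

```
# deprecated -- prints the warning above
bin/kafka-topics.sh --zookeeper localhost:2181 --list

# replacement (KIP-377)
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
```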