[jira] [Reopened] (KAFKA-9208) Flaky Test SslAdminClientIntegrationTest.testCreatePartitions
[ https://issues.apache.org/jira/browse/KAFKA-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sophie Blee-Goldman reopened KAFKA-9208:
----------------------------------------

Note that this test is distinct from the similar flaky test AdminClientIntegrationTest.testCreatePartitions, and does not duplicate KAFKA-9069.

> Flaky Test SslAdminClientIntegrationTest.testCreatePartitions
> -------------------------------------------------------------
>
>                 Key: KAFKA-9208
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9208
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin
>    Affects Versions: 2.4.0
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>              Labels: flaky-test
>
> The Java 8 build failed on a 2.4-targeted PR.
>
> h3. Stacktrace
> java.lang.AssertionError: validateOnly expected:<3> but was:<1>
> 	at org.junit.Assert.fail(Assert.java:89)
> 	at org.junit.Assert.failNotEquals(Assert.java:835)
> 	at org.junit.Assert.assertEquals(Assert.java:647)
> 	at kafka.api.AdminClientIntegrationTest$$anonfun$testCreatePartitions$6.apply(AdminClientIntegrationTest.scala:625)
> 	at kafka.api.AdminClientIntegrationTest$$anonfun$testCreatePartitions$6.apply(AdminClientIntegrationTest.scala:599)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at kafka.api.AdminClientIntegrationTest.testCreatePartitions(AdminClientIntegrationTest.scala:599)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.lang.Thread.run(Thread.java:748)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
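The failed assertion above checks that a createPartitions call issued with validateOnly does not actually change the partition count. A minimal in-memory sketch of those expected semantics (the topic map and method here are hypothetical stand-ins, not Kafka's implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class ValidateOnlyDemo {
    // Hypothetical in-memory stand-in for per-topic partition counts.
    static final Map<String, Integer> partitions = new HashMap<>();

    /** Returns the resulting partition count; mutates state only when validateOnly is false. */
    static int createPartitions(String topic, int newCount, boolean validateOnly) {
        int current = partitions.getOrDefault(topic, 1);
        if (newCount <= current) {
            throw new IllegalArgumentException("partition count can only grow: " + current + " -> " + newCount);
        }
        if (!validateOnly) {
            partitions.put(topic, newCount);
        }
        return validateOnly ? current : newCount;
    }

    public static void main(String[] args) {
        partitions.put("create-partitions-topic-1", 1);
        // validateOnly: the request is checked for feasibility, but no partitions are added.
        createPartitions("create-partitions-topic-1", 3, true);
        System.out.println(partitions.get("create-partitions-topic-1")); // still 1
        // A real call actually changes the count.
        createPartitions("create-partitions-topic-1", 3, false);
        System.out.println(partitions.get("create-partitions-topic-1")); // now 3
    }
}
```

The flaky failure ("expected:<3> but was:<1>") suggests the test observed a stale count, not that validateOnly mutated state.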
[jira] [Resolved] (KAFKA-9208) Flaky Test SslAdminClientIntegrationTest.testCreatePartitions
[ https://issues.apache.org/jira/browse/KAFKA-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huxihx resolved KAFKA-9208.
---------------------------
    Resolution: Duplicate

> Flaky Test SslAdminClientIntegrationTest.testCreatePartitions
> -------------------------------------------------------------
>
>                 Key: KAFKA-9208
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9208
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin
>    Affects Versions: 2.4.0
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>              Labels: flaky-test
>
> The Java 8 build failed on a 2.4-targeted PR.
[jira] [Updated] (KAFKA-9051) Source task source offset reads can block graceful shutdown
[ https://issues.apache.org/jira/browse/KAFKA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch updated KAFKA-9051:
---------------------------------
    Fix Version/s: 2.3.2
                   2.5.0
                   2.2.3
                   2.1.2
                   2.0.2

Merged to the following branches:
* `trunk` for inclusion in 2.5.0
* `2.3` for inclusion in 2.3.2
* `2.2` for inclusion in 2.2.3
* `2.1` for inclusion in 2.1.2
* `2.0` for inclusion in 2.0.2

These versions are the next to be released from each of these branches. We are currently in code freeze on the `2.4` branch for the upcoming AK 2.4.0 release. *As this issue is not a blocker for the AK 2.4.0 release, this commit (da4337271) will be merged to the `2.4` branch once that code freeze has been lifted* and AK 2.4.0 has been released, and it will be included in the subsequent 2.4.1 release.

> Source task source offset reads can block graceful shutdown
> -----------------------------------------------------------
>
>                 Key: KAFKA-9051
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9051
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.2, 1.1.1, 2.0.1, 2.1.1, 2.3.0, 2.2.1, 2.4.0, 2.5.0
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>             Fix For: 2.0.2, 2.1.2, 2.2.3, 2.5.0, 2.3.2
>
> When source tasks request source offsets from the framework, this results in a call to [Future.get()|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/storage/OffsetStorageReaderImpl.java#L79] with no timeout.
> In distributed workers, the future is blocked on a successful [read to the end|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaOffsetBackingStore.java#L136] of the source offsets topic, which in turn will [poll that topic indefinitely|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L287] until the latest messages for every partition of that topic have been consumed.
> This normally completes in a reasonable amount of time. However, if the connectivity between the Connect worker and the Kafka cluster is degraded or dropped in the middle of one of these reads, it will block until connectivity is restored and the request completes successfully.
> If a task is stopped (due to a manual restart via the REST API, a rebalance, worker shutdown, etc.) while blocked on a read of source offsets during its {{start}} method, not only will it fail to stop gracefully, but the framework [will not even invoke its stop method|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L183] until its {{start}} method (and, as a result, the source offset read request) [has completed|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L202-L206]. This prevents the task from cleaning up any resources it has allocated and can lead to OOM errors, excessive thread creation, and other problems.
>
> I've confirmed that this affects every release of Connect back through at least 1.0; I've tagged the most recent bug-fix release of every major/minor version from then on in the {{Affects Version/s}} field to avoid putting every version in that field.
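The blocking read described above boils down to an unbounded Future.get(). A self-contained sketch of why a bounded wait lets the caller regain control and shut down (the CompletableFuture here is an illustrative stand-in for the offset read future, not Connect's actual classes):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OffsetReadTimeoutDemo {
    /** Waits on the offset read for at most timeoutMs; returns true if it timed out. */
    static boolean timedOut(CompletableFuture<?> offsetRead, long timeoutMs) {
        try {
            offsetRead.get(timeoutMs, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // the caller can now proceed with shutdown instead of hanging
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for an offset read that never completes while the broker is unreachable.
        CompletableFuture<Void> offsetRead = new CompletableFuture<>();
        // With no timeout, offsetRead.get() would block this thread indefinitely
        // (the behavior described in the issue). With a bounded wait:
        System.out.println(timedOut(offsetRead, 100)); // prints "true"
    }
}
```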
[jira] [Commented] (KAFKA-9051) Source task source offset reads can block graceful shutdown
[ https://issues.apache.org/jira/browse/KAFKA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978074#comment-16978074 ]

ASF GitHub Bot commented on KAFKA-9051:
---------------------------------------

rhauch commented on pull request #7532: KAFKA-9051: Prematurely complete source offset read requests for stopped tasks
URL: https://github.com/apache/kafka/pull/7532

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-9214) test suite generates different errors each time
AK97 created KAFKA-9214:
------------------------
             Summary: test suite generates different errors each time
                 Key: KAFKA-9214
                 URL: https://issues.apache.org/jira/browse/KAFKA-9214
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.3.0
         Environment: os: rhel:7.6
architecture: ppc64le
            Reporter: AK97

I have been running the apache/kafka test suite roughly six or seven times, and each execution throws a different set of errors. Some of the errors are listed below; note that the same set is not seen on each run. I would like some help understanding the cause. I am running the suite on a high-end VM with good connectivity.

Errors:

1) org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest > testMultipleServerMechanisms FAILED
    java.lang.AssertionError: Metric not updated successful-reauthentication-total expected:<0.0> but was:<1.0>

2) kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsExistingTopic FAILED
    java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available
[jira] [Updated] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
[ https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma updated KAFKA-9203:
-------------------------------
    Priority: Critical  (was: Major)

> kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-9203
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9203
>             Project: Kafka
>          Issue Type: Bug
>          Components: compression, consumer
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: David Watzke
>            Priority: Critical
>
> I run kafka cluster 2.1.1.
> When I upgraded a client app to use kafka-client 2.3.0 (or 2.3.1) instead of 2.2.0, I immediately started getting the following exceptions in a loop when consuming a topic with LZ4-compressed messages:
> {noformat}
> 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred while polling and processing messages: org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption.
> org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption.
> 	at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
> 	at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
> 	at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
> 	at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
> 	at com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
> 	at com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149)
> 	at com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28)
> 	at com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
> 	at resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
> 	at scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252)
> 	at scala.util.control.Exception$Catch.apply(Exception.scala:228)
> 	at scala.util.control.Exception$Catch.either(Exception.scala:252)
> 	at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88)
> 	at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26)
> 	at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26)
> 	at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
> 	at resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
> 	at resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
> 	at resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
> 	at resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
> 	at resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53)
> 	at resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50)
> 	at com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19)
> 	at com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
> 	at resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
> 	at scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252)
> 	at scala.util.control.Exception$Catch.apply(Exception.scala:228)
> 	at scala.util.control.Exception$Catch.either(Exception.scala:252)
> 	at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88)
> 	at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26)
> 	at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26)
> 	at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
> 	at
[jira] [Updated] (KAFKA-9212) Keep receiving FENCED_LEADER_EPOCH while sending ListOffsetRequest
[ https://issues.apache.org/jira/browse/KAFKA-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma updated KAFKA-9212:
-------------------------------
    Priority: Critical  (was: Major)

> Keep receiving FENCED_LEADER_EPOCH while sending ListOffsetRequest
> ------------------------------------------------------------------
>
>                 Key: KAFKA-9212
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9212
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, offset manager
>    Affects Versions: 2.3.0
>         Environment: Linux
>            Reporter: Yannick
>            Priority: Critical
>
> When running the Kafka Connect S3 sink connector (Confluent 5.3.0), after one broker got restarted (the leaderEpoch was updated at this point), the Connect worker crashed with the following error:
> [2019-11-19 16:20:30,097] ERROR [Worker clientId=connect-1, groupId=connect-ls] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:253)
> org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 30003ms
>
> After investigation, it seems the worker got fenced while sending ListOffsetRequest in a loop and then timed out, as follows:
> [2019-11-19 16:20:30,020] DEBUG [Consumer clientId=consumer-3, groupId=connect-ls] Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps={connect_ls_config-0={timestamp: -1, maxNumOffsets: 1, currentLeaderEpoch: Optional[1]}}, isolationLevel=READ_UNCOMMITTED) to broker kafka6.fra2.internal:9092 (id: 4 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:905)
> [2019-11-19 16:20:30,044] DEBUG [Consumer clientId=consumer-3, groupId=connect-ls] Attempt to fetch offsets for partition connect_ls_config-0 failed due to FENCED_LEADER_EPOCH, retrying. (org.apache.kafka.clients.consumer.internals.Fetcher:985)
>
> The above happens multiple times until the timeout.
>
> According to the debug logs, the consumer always gets a leaderEpoch of 1 for this topic when starting up:
> [2019-11-19 13:27:30,802] DEBUG [Consumer clientId=consumer-3, groupId=connect-ls] Updating last seen epoch from null to 1 for partition connect_ls_config-0 (org.apache.kafka.clients.Metadata:178)
>
> But according to our broker logs, the leaderEpoch should be 2:
> [2019-11-18 14:19:28,988] INFO [Partition connect_ls_config-0 broker=4] connect_ls_config-0 starts at Leader Epoch 2 from offset 22. Previous Leader Epoch was: 1 (kafka.cluster.Partition)
>
> This makes it impossible to restart the worker, as it will always get fenced and then finally time out.
>
> It is also impossible to consume with a 2.3 kafka-console-consumer, as follows:
> kafka-console-consumer --bootstrap-server BOOTSTRAPSERVER:9092 --topic connect_ls_config --from-beginning
> The above just hangs forever (which is not expected, because there is data) and we can see these debug messages:
> [2019-11-19 22:17:59,124] DEBUG [Consumer clientId=consumer-1, groupId=console-consumer-3844] Attempt to fetch offsets for partition connect_ls_config-0 failed due to FENCED_LEADER_EPOCH, retrying. (org.apache.kafka.clients.consumer.internals.Fetcher)
>
> Interestingly, if we subscribe the same way with kafkacat (1.5.0), we can consume without problems (the way kafkacat consumes must differ somehow):
> kafkacat -b BOOTSTRAPSERVER:9092 -t connect_ls_config -o beginning
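The retry loop above follows from the broker's leader-epoch fencing: a request carrying an epoch older than the broker's current one is rejected with FENCED_LEADER_EPOCH until the client's cached metadata catches up. A simplified model of that check (this is an illustration of the protocol behavior, not the actual broker code):

```java
import java.util.Optional;

public class LeaderEpochFencingDemo {
    /** Simplified broker-side check: a request carrying a stale epoch is fenced. */
    static String checkEpoch(Optional<Integer> requestEpoch, int brokerEpoch) {
        if (requestEpoch.isPresent() && requestEpoch.get() < brokerEpoch) {
            return "FENCED_LEADER_EPOCH";   // client must refresh metadata before retrying
        }
        if (requestEpoch.isPresent() && requestEpoch.get() > brokerEpoch) {
            return "UNKNOWN_LEADER_EPOCH";  // broker is behind the client
        }
        return "NONE";
    }

    public static void main(String[] args) {
        // The consumer's metadata still says currentLeaderEpoch=1, but the broker
        // is at epoch 2, so every ListOffsetRequest is rejected:
        System.out.println(checkEpoch(Optional.of(1), 2)); // FENCED_LEADER_EPOCH
        // Once the cached epoch catches up, the request succeeds:
        System.out.println(checkEpoch(Optional.of(2), 2)); // NONE
    }
}
```

The bug described here is that the consumer's cached epoch never caught up, so the loop never terminated.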
[jira] [Commented] (KAFKA-6962) DescribeConfigsRequest Schema documentation is wrong/missing detail
[ https://issues.apache.org/jira/browse/KAFKA-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977948#comment-16977948 ]

ASF GitHub Bot commented on KAFKA-6962:
---------------------------------------

kdrakon commented on pull request #5091: KAFKA-6962 added better docs to DESCRIBE_CONFIGS_REQUEST_RESOURCE_V0
URL: https://github.com/apache/kafka/pull/5091

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> DescribeConfigsRequest Schema documentation is wrong/missing detail
> -------------------------------------------------------------------
>
>                 Key: KAFKA-6962
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6962
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.1.0
>            Reporter: Sean Policarpio
>            Priority: Minor
>
> The documentation for the following Resource fields of DescribeConfigsRequest is all {{null}}:
> * resource_type
> * resource_name
> * config_names
> -Additionally, after using the API, I've also noted that {{resource_name}} should probably be listed as a nullable String, since it's optional.-
> The PR attached would output something like the following:
> *Requests:*
> {{DescribeConfigs Request (Version: 0) => [resources]}}
> {{  resources => resource_type resource_name [config_names]}}
> {{    resource_type => INT8}}
> {{    resource_name => STRING}}
> {{    config_names => STRING}}
>
> ||Field||Description||
> |resources|An array of config resources to be returned.|
> |resource_type|The resource type, which is one of 0 (UNKNOWN), 1 (ANY), 2 (TOPIC), 3 (GROUP), 4 (BROKER)|
> |resource_name|The resource name to query.|
> |config_names|An array of config names to retrieve. If set to null, then all configs are returned for resource_name.|
>
> {{DescribeConfigs Request (Version: 1) => [resources] include_synonyms}}
> {{  resources => resource_type resource_name [config_names]}}
> {{    resource_type => INT8}}
> {{    resource_name => STRING}}
> {{    config_names => STRING}}
> {{  include_synonyms => BOOLEAN}}
>
> ||Field||Description||
> |resources|An array of config resources to be returned.|
> |resource_type|The resource type, which is one of 0 (UNKNOWN), 1 (ANY), 2 (TOPIC), 3 (GROUP), 4 (BROKER)|
> |resource_name|The resource name to query.|
> |config_names|An array of config names to retrieve. If set to null, then all configs are returned for resource_name.|
> |include_synonyms|null|
[jira] [Created] (KAFKA-9213) BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1
Daniyar created KAFKA-9213:
---------------------------
             Summary: BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1
                 Key: KAFKA-9213
                 URL: https://issues.apache.org/jira/browse/KAFKA-9213
             Project: Kafka
          Issue Type: Bug
          Components: log
    Affects Versions: 2.3.1
         Environment: Ubuntu 16.04, AWS instance d2.8xlarge.
Java options: -Xms16G -Xmx16G -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=96m -XX:MinMetaspaceFreeRatio=50
            Reporter: Daniyar

We updated our Kafka cluster from version 1.1.0 to 2.3.1. We followed the [upgrade instructions|https://kafka.apache.org/documentation/#upgrade] up to step 2. The message format and inter-broker protocol versions were left the same:

inter.broker.protocol.version=1.1
log.message.format.version=1.1

After upgrading, we started to get some occasional exceptions:

{code:java}
2019/11/19 05:30:53 INFO [ProducerStateManager partition=matchmaker_retry_clicks_15m-2] Writing producer snapshot at offset 788532 (kafka.log.ProducerStateManager)
2019/11/19 05:30:53 INFO [Log partition=matchmaker_retry_clicks_15m-2, dir=/mnt/kafka] Rolled new log segment at offset 788532 in 1 ms. (kafka.log.Log)
2019/11/19 05:31:01 ERROR [ReplicaManager broker=0] Error processing append operation on partition matchmaker_retry_clicks_15m-2 (kafka.server.ReplicaManager)
2019/11/19 05:31:01 java.nio.BufferOverflowException
2019/11/19 05:31:01 	at java.nio.Buffer.nextPutIndex(Buffer.java:527)
2019/11/19 05:31:01 	at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:797)
2019/11/19 05:31:01 	at kafka.log.TimeIndex.$anonfun$maybeAppend$1(TimeIndex.scala:134)
2019/11/19 05:31:01 	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
2019/11/19 05:31:01 	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
2019/11/19 05:31:01 	at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:114)
2019/11/19 05:31:01 	at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:520)
2019/11/19 05:31:01 	at kafka.log.Log.$anonfun$roll$8(Log.scala:1690)
2019/11/19 05:31:01 	at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1690)
2019/11/19 05:31:01 	at scala.Option.foreach(Option.scala:407)
2019/11/19 05:31:01 	at kafka.log.Log.$anonfun$roll$2(Log.scala:1690)
2019/11/19 05:31:01 	at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
2019/11/19 05:31:01 	at kafka.log.Log.roll(Log.scala:1654)
2019/11/19 05:31:01 	at kafka.log.Log.maybeRoll(Log.scala:1639)
2019/11/19 05:31:01 	at kafka.log.Log.$anonfun$append$2(Log.scala:966)
2019/11/19 05:31:01 	at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
2019/11/19 05:31:01 	at kafka.log.Log.append(Log.scala:850)
2019/11/19 05:31:01 	at kafka.log.Log.appendAsLeader(Log.scala:819)
2019/11/19 05:31:01 	at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
2019/11/19 05:31:01 	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
2019/11/19 05:31:01 	at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
2019/11/19 05:31:01 	at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
2019/11/19 05:31:01 	at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
2019/11/19 05:31:01 	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
2019/11/19 05:31:01 	at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
2019/11/19 05:31:01 	at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
2019/11/19 05:31:01 	at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
2019/11/19 05:31:01 	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
2019/11/19 05:31:01 	at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
2019/11/19 05:31:01 	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
2019/11/19 05:31:01 	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
2019/11/19 05:31:01 	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
2019/11/19 05:31:01 	at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:751)
2019/11/19 05:31:01 	at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:492)
2019/11/19 05:31:01 	at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:544)
2019/11/19 05:31:01 	at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
2019/11/19 05:31:01 	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
2019/11/19 05:31:01 	at java.lang.Thread.run(Thread.java:748)
{code}

The error persists until the broker gets restarted (or leadership is moved to another broker).

Brokers config:

{code:java}
advertised.host.name={{ hostname }}
port=9092
# Default number of partitions if a value isn't set when the topic is created.
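The stacktrace above fails inside TimeIndex.maybeAppend when it calls putLong on a time-index buffer that has no room left for another entry. The underlying JDK behavior can be reproduced in isolation; the 12-byte entry size used here matches a Kafka time-index entry (an 8-byte timestamp plus a 4-byte relative offset), though the buffer sizing below is just for the demonstration:

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class TimeIndexOverflowDemo {
    public static void main(String[] args) {
        // A direct buffer with room for exactly one 12-byte entry.
        ByteBuffer index = ByteBuffer.allocateDirect(12);
        index.putLong(1574141461000L); // 8-byte timestamp
        index.putInt(788532);          // 4-byte relative offset -> buffer is now full
        try {
            // One entry too many: putLong on a full buffer throws, just like the
            // DirectByteBuffer.putLong frame in the stacktrace above.
            index.putLong(1574141462000L);
        } catch (BufferOverflowException e) {
            System.out.println("BufferOverflowException, position=" + index.position());
        }
    }
}
```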
[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment
[ https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977902#comment-16977902 ]

sats commented on KAFKA-9205:
-----------------------------

Cool, let me dig into it, thanks. [~vahid]

> Add an option to enforce rack-aware partition reassignment
> ----------------------------------------------------------
>
>                 Key: KAFKA-9205
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9205
>             Project: Kafka
>          Issue Type: Improvement
>          Components: admin, tools
>            Reporter: Vahid Hashemian
>            Priority: Minor
>
> One regularly used healing operation on Kafka clusters is replica reassignment for topic partitions. For example, when there is a skew in the inbound/outbound traffic of a broker, replica reassignment can be used to move some leaders/followers off the broker; or if there is a skew in the disk usage of brokers, replica reassignment can move some partitions to other brokers that have more disk space available.
> In Kafka clusters that span multiple data centers (or availability zones), high availability is a priority, in the sense that when a data center goes offline the cluster should be able to resume normal operation by guaranteeing partition replicas in all data centers.
> This guarantee is currently the responsibility of the on-call engineer who performs the reassignment, or of the tool that automatically generates the reassignment plan for improving cluster health (e.g. by considering the rack configuration value of each broker in the cluster). The former is quite error-prone, and the latter leads to duplicate code in all such admin tools (which are not error-free either). Not all use cases can make use of the default assignment strategy that is used by the --generate option, and the current rack-aware enforcement applies to that option only.
> It would be great for the built-in replica assignment API and tool provided by Kafka to support a rack-aware verification option for the --execute scenario that would simply return an error when [some] brokers in any replica set share a common rack.
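The proposed verification amounts to checking that no replica set places two brokers on the same rack. A minimal sketch of such a check (class and method names here are hypothetical, not part of the Kafka tools):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RackAwareVerifier {
    /** Returns the partitions whose replica set has two or more brokers on the same rack. */
    static List<Integer> violations(Map<Integer, List<Integer>> replicasByPartition,
                                    Map<Integer, String> rackByBroker) {
        List<Integer> bad = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : replicasByPartition.entrySet()) {
            Set<String> racks = new HashSet<>();
            for (int broker : e.getValue()) {
                // HashSet.add returns false on a duplicate, i.e. a shared rack.
                if (!racks.add(rackByBroker.get(broker))) {
                    bad.add(e.getKey());
                    break;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<Integer, String> racks = Map.of(0, "rack-a", 1, "rack-b", 2, "rack-a");
        Map<Integer, List<Integer>> plan = Map.of(
                0, List.of(0, 1),  // spans two racks: fine
                1, List.of(0, 2)); // both replicas on rack-a: should be rejected
        System.out.println(violations(plan, racks)); // [1]
    }
}
```

An --execute verification would run a check like this against the proposed plan and refuse to proceed when the returned list is non-empty.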
[jira] [Updated] (KAFKA-9212) Keep receiving FENCED_LEADER_EPOCH while sending ListOffsetRequest
[ https://issues.apache.org/jira/browse/KAFKA-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yannick updated KAFKA-9212: --- Description:

When running the Kafka Connect S3 sink connector (Confluent 5.3.0), after one broker was restarted (the leaderEpoch was updated at this point), the Connect worker crashed with the following error:

[2019-11-19 16:20:30,097] ERROR [Worker clientId=connect-1, groupId=connect-ls] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:253) org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 30003ms

After investigation, it appears the worker got fenced while sending ListOffsetRequest in a loop and then timed out, as follows:

[2019-11-19 16:20:30,020] DEBUG [Consumer clientId=consumer-3, groupId=connect-ls] Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps={connect_ls_config-0={timestamp: -1, maxNumOffsets: 1, currentLeaderEpoch: Optional[1]}}, isolationLevel=READ_UNCOMMITTED) to broker kafka6.fra2.internal:9092 (id: 4 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:905)

[2019-11-19 16:20:30,044] DEBUG [Consumer clientId=consumer-3, groupId=connect-ls] Attempt to fetch offsets for partition connect_ls_config-0 failed due to FENCED_LEADER_EPOCH, retrying. (org.apache.kafka.clients.consumer.internals.Fetcher:985)

The above happens multiple times until the timeout. According to the debug logs, the consumer always gets a leaderEpoch of 1 for this topic when starting up:

[2019-11-19 13:27:30,802] DEBUG [Consumer clientId=consumer-3, groupId=connect-ls] Updating last seen epoch from null to 1 for partition connect_ls_config-0 (org.apache.kafka.clients.Metadata:178)

But according to our broker logs, the leaderEpoch should be 2:

[2019-11-18 14:19:28,988] INFO [Partition connect_ls_config-0 broker=4] connect_ls_config-0 starts at Leader Epoch 2 from offset 22. Previous Leader Epoch was: 1 (kafka.cluster.Partition)

This makes it impossible to restart the worker, as it will always get fenced and eventually time out. It is also impossible to consume with a 2.3 kafka-console-consumer:

kafka-console-consumer --bootstrap-server BOOTSTRAPSERVER:9092 --topic connect_ls_config --from-beginning

The above just hangs forever (which is unexpected, since there is data), and we can see these debug messages:

[2019-11-19 22:17:59,124] DEBUG [Consumer clientId=consumer-1, groupId=console-consumer-3844] Attempt to fetch offsets for partition connect_ls_config-0 failed due to FENCED_LEADER_EPOCH, retrying. (org.apache.kafka.clients.consumer.internals.Fetcher)

Interestingly, if we subscribe the same way with kafkacat (1.5.0), we can consume without a problem (kafkacat must be consuming in some different way):

kafkacat -b BOOTSTRAPSERVER:9092 -t connect_ls_config -o beginning
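The failure mode above can be illustrated with a simplified model (plain Python, not Kafka client code; the epoch values and per-attempt cost are assumptions loosely based on the logs): the consumer's metadata cache holds a stale leaderEpoch of 1, the broker is already at epoch 2 and fences every ListOffset attempt, and the retry loop only ends when the overall timeout expires.

```python
# Simplified model of the stuck retry loop: the cached epoch is never
# refreshed, so every attempt is fenced until the timeout elapses.

BROKER_LEADER_EPOCH = 2   # epoch after the broker restart
CACHED_LEADER_EPOCH = 1   # stale epoch the consumer keeps sending

def list_offsets(cached_epoch, broker_epoch):
    """Broker-side check: fence requests carrying an older leader epoch."""
    if cached_epoch < broker_epoch:
        return "FENCED_LEADER_EPOCH"
    return "OK"

def fetch_offsets(timeout_ms, attempt_cost_ms=25):
    """Retry until success or until the cumulative time exceeds timeout_ms."""
    elapsed, attempts = 0, 0
    while elapsed < timeout_ms:
        attempts += 1
        if list_offsets(CACHED_LEADER_EPOCH, BROKER_LEADER_EPOCH) == "OK":
            return "OK", attempts
        elapsed += attempt_cost_ms  # each fenced round trip costs some time
    return "TimeoutException: Failed to get offsets by times", attempts

outcome, attempts = fetch_offsets(30_000)
print(outcome)  # the stale epoch never refreshes, so this always times out
```

Because nothing in the loop updates the cached epoch, the only exit is the timeout, which matches the "Failed to get offsets by times in 30003ms" error above.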
[jira] [Created] (KAFKA-9212) Keep receiving FENCED_LEADER_EPOCH while sending ListOffsetRequest
Yannick created KAFKA-9212: -- Summary: Keep receiving FENCED_LEADER_EPOCH while sending ListOffsetRequest Key: KAFKA-9212 URL: https://issues.apache.org/jira/browse/KAFKA-9212 Project: Kafka Issue Type: Bug Components: consumer, offset manager Affects Versions: 2.3.0 Environment: Linux Reporter: Yannick

When running the Kafka Connect S3 sink connector (Confluent 5.3.0), after one broker was restarted, the Connect worker crashed with the following error:

[2019-11-19 16:20:30,097] ERROR [Worker clientId=connect-1, groupId=connect-ls] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:253) org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 30003ms

After investigation, it appears the worker got fenced while sending ListOffsetRequest in a loop and then timed out, as follows:

[2019-11-19 16:20:30,020] DEBUG [Consumer clientId=consumer-3, groupId=connect-ls] Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps={connect_ls_config-0={timestamp: -1, maxNumOffsets: 1, currentLeaderEpoch: Optional[1]}}, isolationLevel=READ_UNCOMMITTED) to broker kafka6.fra2.internal:9092 (id: 4 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:905)

[2019-11-19 16:20:30,044] DEBUG [Consumer clientId=consumer-3, groupId=connect-ls] Attempt to fetch offsets for partition connect_ls_config-0 failed due to FENCED_LEADER_EPOCH, retrying. (org.apache.kafka.clients.consumer.internals.Fetcher:985)

This happens multiple times until the timeout. According to the debug logs, the consumer always gets a leaderEpoch of 1 for this topic when starting up:

[2019-11-19 13:27:30,802] DEBUG [Consumer clientId=consumer-3, groupId=connect-ls] Updating last seen epoch from null to 1 for partition connect_ls_config-0 (org.apache.kafka.clients.Metadata:178)

But according to our broker logs, the leaderEpoch should be 2:

[2019-11-18 14:19:28,988] INFO [Partition connect_ls_config-0 broker=4] connect_ls_config-0 starts at Leader Epoch 2 from offset 22. Previous Leader Epoch was: 1 (kafka.cluster.Partition)

This makes it impossible to restart the worker, as it will always get fenced and eventually time out. It is also impossible to consume with a 2.3 kafka-console-consumer:

kafka-console-consumer --bootstrap-server BOOTSTRAPSERVER:9092 --topic connect_ls_config --from-beginning

The above just hangs forever (which is unexpected, since there is data). Interestingly, if we subscribe the same way with kafkacat (1.5.0), we can consume without a problem (kafkacat must be consuming in some different way):

kafkacat -b BOOTSTRAPSERVER:9092 -t connect_ls_config -o beginning

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9178) restoredPartitions is not cleared until the last restoring task completes
[ https://issues.apache.org/jira/browse/KAFKA-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977854#comment-16977854 ] ASF GitHub Bot commented on KAFKA-9178: --- ableegoldman commented on pull request #7686: KAFKA-9178: restoredPartitions is not cleared until the last restoring task completes URL: https://github.com/apache/kafka/pull/7686 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > restoredPartitions is not cleared until the last restoring task completes > - > > Key: KAFKA-9178 > URL: https://issues.apache.org/jira/browse/KAFKA-9178 > Project: Kafka > Issue Type: Bug > Affects Versions: 2.4.0 > Reporter: Boyang Chen > Assignee: Boyang Chen > Priority: Blocker > Labels: streams > Fix For: 2.4.0 > > > We check the `active` set is empty during closeLostTasks(). However we don't > currently properly clear the {{restoredPartitions}} set in some edge cases: > We only remove partitions from {{restoredPartitions}} when a) all tasks are > done restoring, at which point we clear it entirely (in > {{AssignedStreamTasks#updateRestored}}), or b) one task at a time, when that > task is restoring and is closed. > Say some partitions were still restoring while others had completed and > transitioned to running when a rebalance occurs. The still-restoring tasks > are all revoked, and closed immediately, and their partitions removed from > {{restoredPartitions}}. We also suspend & revoke some running tasks that have > finished restoring, and remove them from {{running}}/{{runningByPartition}}. > Now we have only running tasks left, so in > {{TaskManager#updateNewAndRestoringTasks}} we don't ever even call > {{AssignedStreamTasks#updateRestored}} and therefore we never get to clear > {{restoredPartitions}}.
We then close each of the currently running tasks and > remove their partitions from everything, BUT we never got to remove or clear > the partitions of the running tasks that we revoked previously. > It turns out we can't just rely on removing from {{restoredPartitions}} upon > completion since the partitions will just be added back to it during the next > loop (blocked by KAFKA-9177). For now, we should just remove partitions from > {{restoredPartitions}} when closing or suspending running tasks as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
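The bookkeeping issue above can be sketched with a deliberately simplified model (plain Python, not the actual Streams internals; the class and method names are invented for illustration): partitions enter `restored_partitions` when a task finishes restoring, but if they are only cleared when *all* restoring completes, a running task that gets revoked leaves stale entries behind unless close/suspend also removes them, which is the fix the ticket proposes.

```python
# Simplified model of the restoredPartitions bookkeeping described above.
# "Buggy" close forgets the tracking set; "fixed" close also removes the
# closed task's partitions, as suggested in the ticket.

class AssignedTasks:
    def __init__(self):
        self.restoring = {}            # task_id -> set of partitions
        self.running = {}              # task_id -> set of partitions
        self.restored_partitions = set()

    def complete_restore(self, task_id):
        parts = self.restoring.pop(task_id)
        self.restored_partitions |= parts  # tracked until a full clear
        self.running[task_id] = parts

    def close_running_buggy(self, task_id):
        # Forgets restored_partitions: entries linger after the task is gone.
        self.running.pop(task_id)

    def close_running_fixed(self, task_id):
        # Also drop the task's partitions from restored_partitions.
        parts = self.running.pop(task_id)
        self.restored_partitions -= parts
```

With the buggy variant, revoking a task that had already transitioned to running leaves its partitions in `restored_partitions` forever, because the full clear only happens once every restoring task is done.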
[jira] [Commented] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)
[ https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1690#comment-1690 ] Jan Arve Sundt commented on KAFKA-7500: --- I'm testing replication of all topics from one Kafka cluster to a replica Kafka cluster (active/passive), with the same topic names, including topic data, consumer offsets, and topic configuration settings. I can see the topic data and consumer offsets, but I am not able to see the topic configuration settings. I also need the topics in the replica to keep the same names. Can anyone explain what I am doing wrong?

mm2.properties settings:
connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
clusters = A1,B2
A1.bootstrap.servers = host:port
B2.bootstrap.servers = host:port
A1->B2.enabled = true
A1->B2.topics = test-topic
rename.topics = true
sync.topic.configs = true
sync.topic.acls = false

> MirrorMaker 2.0 (KIP-382) > - > > Key: KAFKA-7500 > URL: https://issues.apache.org/jira/browse/KAFKA-7500 > Project: Kafka > Issue Type: New Feature > Components: KafkaConnect, mirrormaker > Affects Versions: 2.4.0 > Reporter: Ryanne Dolan > Assignee: Ryanne Dolan > Priority: Major > Labels: pull-request-available, ready-to-commit > Fix For: 2.4.0 > > Attachments: Active-Active XDCR setup.png > > > Implement a drop-in replacement for MirrorMaker leveraging the Connect > framework. > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0] > [https://github.com/apache/kafka/pull/6295] -- This message was sent by Atlassian Jira (v8.3.4#803005)
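For reference, a hedged sketch of what the mm2.properties above might look like when cleaned up, with property names as I understand them from the MM2 (KIP-382) documentation of the time; verify against your version. As far as I can tell, `rename.topics` is not a recognized MM2 property, and `connector.class` is not needed when using the dedicated MirrorMaker driver; topic naming on the target is governed by the ReplicationPolicy, which by default prefixes the source cluster alias (so `test-topic` becomes `A1.test-topic`):

```properties
clusters = A1, B2
A1.bootstrap.servers = host:port
B2.bootstrap.servers = host:port

A1->B2.enabled = true
A1->B2.topics = test-topic

# Note the ".enabled" suffix used by the released implementation:
sync.topic.configs.enabled = true
sync.topic.acls.enabled = false

# With the default replication policy the target topic is "A1.test-topic".
# Keeping the identical name on the target would require a custom
# ReplicationPolicy implementation; there is no "rename.topics" property.
```

Also note that topic config sync in MM2 is periodic rather than immediate, which may explain configs appearing on the target only after a delay.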
[jira] [Resolved] (KAFKA-9086) Refactor Processor Node Streams Metrics
[ https://issues.apache.org/jira/browse/KAFKA-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bejeck resolved KAFKA-9086. Resolution: Fixed > Refactor Processor Node Streams Metrics > --- > > Key: KAFKA-9086 > URL: https://issues.apache.org/jira/browse/KAFKA-9086 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Bruno Cadonna >Assignee: Bruno Cadonna >Priority: Major > Fix For: 2.5.0 > > > Refactor processor node metrics as described in KIP-444. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9086) Refactor Processor Node Streams Metrics
[ https://issues.apache.org/jira/browse/KAFKA-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977668#comment-16977668 ] ASF GitHub Bot commented on KAFKA-9086: --- bbejeck commented on pull request #7615: KAFKA-9086: Refactor processor-node-level metrics URL: https://github.com/apache/kafka/pull/7615 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Processor Node Streams Metrics > --- > > Key: KAFKA-9086 > URL: https://issues.apache.org/jira/browse/KAFKA-9086 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Bruno Cadonna >Assignee: Bruno Cadonna >Priority: Major > Fix For: 2.5.0 > > > Refactor processor node metrics as described in KIP-444. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment
[ https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977592#comment-16977592 ] Vahid Hashemian commented on KAFKA-9205: [~sbellapu] The KIP process is not that difficult. If you have access to the wiki you can easily create one and start a discussion about it on the mailing list (and after enough discussion/time you call a vote). The KIP page has all the necessary info: [https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals]. You can also take some of the recent KIPs as examples. Since there is an existing option for disabling rack-aware mode, this change should be designed in a way that either makes use of that option or works well alongside it (without causing confusion), while at the same time preserving backward compatibility (i.e. the existing default behavior should ideally not change). > Add an option to enforce rack-aware partition reassignment > -- > > Key: KAFKA-9205 > URL: https://issues.apache.org/jira/browse/KAFKA-9205 > Project: Kafka > Issue Type: Improvement > Components: admin, tools > Reporter: Vahid Hashemian > Priority: Minor > > One regularly used healing operation on Kafka clusters is replica > reassignment for topic partitions. For example, when there is a skew in the > inbound/outbound traffic of a broker, replica reassignment can be used to move > some leaders/followers off the broker; or if there is a skew in the disk usage > of brokers, replica reassignment can move some partitions to other brokers > that have more disk space available. > In Kafka clusters that span multiple data centers (or availability > zones), high availability is a priority, in the sense that when a data center > goes offline the cluster should be able to resume normal operation by > guaranteeing partition replicas in all data centers.
> This guarantee is currently the responsibility of the on-call engineer who > performs the reassignment, or of the tool that automatically generates the > reassignment plan for improving cluster health (e.g. by considering the > rack configuration value of each broker in the cluster). The former is quite > error-prone, and the latter would lead to duplicate code in all such admin > tools (which are not error-free either). Not all use cases can make use of the > default assignment strategy used by the --generate option, and the current > rack-aware enforcement applies to that option only. > It would be great for the built-in replica assignment API and tool provided > by Kafka to support a rack-aware verification option for the --execute scenario > that would simply return an error when [some] brokers in any replica set > share a common rack. -- This message was sent by Atlassian Jira (v8.3.4#803005)
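The --execute-time check proposed above amounts to a simple validation over the candidate assignment. A minimal sketch (a hypothetical helper in plain Python, not part of the actual Kafka tooling; the broker/rack data is invented): reject a reassignment plan whenever a partition's replica set places two replicas in the same rack.

```python
# Hypothetical validation of a reassignment plan against broker rack info:
# flag every partition whose replica set repeats a rack.

def find_rack_violations(plan, broker_racks):
    """plan: {partition: [broker_id, ...]};
    broker_racks: {broker_id: rack}.
    Returns the partitions whose replica set repeats a rack."""
    violations = []
    for partition, replicas in plan.items():
        racks = [broker_racks[b] for b in replicas]
        if len(set(racks)) < len(racks):  # duplicate rack in the replica set
            violations.append(partition)
    return violations

broker_racks = {1: "dc-a", 2: "dc-b", 3: "dc-a", 4: "dc-c"}
plan = {
    "topic-0": [1, 2, 4],  # three distinct racks: OK
    "topic-1": [1, 3, 2],  # brokers 1 and 3 share rack "dc-a": violation
}
print(find_rack_violations(plan, broker_racks))  # ['topic-1']
```

A tool supporting the requested option would run such a check against the plan passed to --execute and return an error instead of proceeding when the list is non-empty.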
[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
[ https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977577#comment-16977577 ] Raman Gupta commented on KAFKA-8803: We had this problem again today, and checked on each node restart whether the problem was fixed. It went away after restarting the third of four nodes. > Stream will not start due to TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > > > Key: KAFKA-8803 > URL: https://issues.apache.org/jira/browse/KAFKA-8803 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Raman Gupta >Priority: Major > Attachments: logs.txt.gz, screenshot-1.png > > > One streams app is consistently failing at startup with the following > exception: > {code} > 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] > org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout > exception caught when initializing transactions for task 0_36. This might > happen if the broker is slow to respond, if the network connection to the > broker was interrupted, or if similar circumstances arise. You can increase > producer parameter `max.block.ms` to increase this timeout. > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > {code} > These same brokers are used by many other streams without any issue, > including some in the very same processes for the stream which consistently > throws this exception. > *UPDATE 08/16:* > The very first instance of this error is August 13th 2019, 17:03:36.754 and > it happened for 4 different streams. For 3 of these streams, the error only > happened once, and then the stream recovered. For the 4th stream, the error > has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, > 16:47:43, two of four brokers started reporting messages like this, for > multiple partitions: > [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, > fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader > reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread) > The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, > here is a view of the count of these messages over time: > !screenshot-1.png! > However, as noted, the stream task timeout error continues to happen. > I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 > broker. The broker has a patch for KAFKA-8773. -- This message was sent by Atlassian Jira (v8.3.4#803005)
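The exception text quoted above suggests raising the producer's `max.block.ms`. In a Kafka Streams application, producer-level settings are passed with the `producer.` prefix (via `StreamsConfig.producerPrefix`); a hedged sketch follows, where the value is purely illustrative, and note that since the report points at broker-side UNKNOWN_LEADER_EPOCH issues, raising the timeout is at best a mitigation, not a fix:

```properties
# Streams app config: raise the embedded producer's max.block.ms
# above the 60s default (value is illustrative).
producer.max.block.ms = 120000
```

This only buys the InitProducerId request more time to succeed; if the broker never recovers its leader-epoch state, the stream will still time out, just later.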
[jira] [Commented] (KAFKA-9211) kafka upgrade 2.3.0 cause produce speed decrease
[ https://issues.apache.org/jira/browse/KAFKA-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977539#comment-16977539 ] Ismael Juma commented on KAFKA-9211: cc [~junrao]
> kafka upgrade 2.3.0 cause produce speed decrease
>
> Key: KAFKA-9211
> URL: https://issues.apache.org/jira/browse/KAFKA-9211
> Project: Kafka
> Issue Type: Bug
> Components: controller, producer
> Affects Versions: 2.3.0
> Reporter: li xiangyuan
> Priority: Critical
> Attachments: broker-jstack.txt, producer-jstack.txt
>
> Recently we tried to upgrade Kafka from 0.10.0.1 to 2.3.0. We have 15 clusters in production, each with 3~6 brokers. The Kafka upgrade procedure is:
> 1. Replace the code with the 2.3.0 jars and restart all brokers one by one.
> 2. Unset inter.broker.protocol.version=0.10.0.1 and restart all brokers one by one.
> 3. Unset log.message.format.version=0.10.0.1 and restart all brokers one by one.
>
> So far we have already done steps 1 & 2 in 12 clusters, but when we tried to upgrade the remaining clusters (step 1 already done) at step 2, we found that some topics' produce speed dropped badly. We have researched this issue for a long time; since we couldn't test it in the production environment and couldn't reproduce it in the test environment, we couldn't find the root cause. We can only describe the situation in as much detail as we know, and hope someone can help us.
>
> 1. Because of bug KAFKA-8653, I added the code below to the handleJoinGroupRequest function in KafkaApis.scala:
> {code:java}
> if (rebalanceTimeoutMs <= 0) {
>   rebalanceTimeoutMs = joinGroupRequest.data.sessionTimeoutMs
> }{code}
> 2. One cluster whose upgrade failed has 6 8C16G brokers and about 200 topics with 2 replicas; every broker keeps 3000+ partitions and 1500+ leader partitions, but most of them have a very low produce rate, less than 50 messages/sec. Only one topic, with 300 partitions, does more than 2,500 messages/sec, with more than 20 consumer groups consuming from it. So this whole cluster produces 4K messages/sec, 11 MB in/sec and 240 MB out/sec, and more than 90% of the traffic comes from the topic doing 2,500 messages/sec. When we unset inter.broker.protocol.version=0.10.0.1 on 5 or 6 servers and restart, this topic's produce rate drops to about 200 messages/sec; I don't know whether the way we use Kafka could trigger any problem.
> 3. We use Kafka wrapped by spring-kafka with the KafkaTemplate's autoFlush=true, so each producer.send execution also executes producer.flush immediately. I know the flush method decreases produce performance dramatically, but at least nothing seemed wrong before upgrade step 2; I doubt whether it's a problem now after the upgrade.
> 4. I noticed that when produce speed decreases, consumer groups with a large message lag still consume messages without any change or decrease in consume speed, so I guess only ProduceRequest throughput drops, but FetchRequest throughput does not.
> 5. We haven't set any throttle configuration, and all producers use acks=1 (so it's not slow broker replica fetching); when this problem is triggered, both server & producer CPU usage goes down and the servers' ioutil stays below 30%, so it shouldn't be a hardware problem.
> 6. This event is triggered almost 100% of the time once most brokers have done upgrade step 2; after an automatic leader replica election executes, we observe produce speed dropping, and we have to downgrade the brokers (set inter.broker.protocol.version=0.10.0.1) and restart them one by one before things return to normal. Some clusters have to downgrade all brokers, but some can leave 1 or 2 brokers without downgrading; I noticed that the broker that doesn't need downgrading is the controller.
> 7. I have captured jstack output for producers & servers. Although I did this on a different cluster, we can see that their threads seem genuinely idle.
> 8. Both the 0.10.0.1 & 2.3.0 kafka-clients trigger this problem too.
> 9. While the largest topic will certainly drop produce speed, other topics drop produce speed randomly: maybe topicA drops speed in the first upgrade attempt but not the next, and topicB doesn't drop speed in the first attempt but does in another attempt.
> 10. In fact, the largest cluster has the same topic & group usage scenario mentioned above, but its largest topic does about 12,000 messages/sec, and the upgrade already fails at step 1 (just using the 2.3.0 jars).
> Any help would be appreciated, thanks.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9211) kafka upgrade 2.3.0 cause produce speed decrease
li xiangyuan created KAFKA-9211: --- Summary: kafka upgrade 2.3.0 cause produce speed decrease Key: KAFKA-9211 URL: https://issues.apache.org/jira/browse/KAFKA-9211 Project: Kafka Issue Type: Bug Components: controller, producer Affects Versions: 2.3.0 Reporter: li xiangyuan Attachments: broker-jstack.txt, producer-jstack.txt Recently we try upgrade kafka from 0.10.0.1 to 2.3.0. we have 15 clusters in production env, each one has 3~6 brokers. we know kafka upgrade should: 1.replcae code to 2.3.0.jar and restart all brokers one by one 2.unset inter.broker.protocol.version=0.10.0.1 and restart all brokers one by one 3.unset log.message.format.version=0.10.0.1 and restart all brokers one by one for now we have already done step 1 & 2 in 12 clusters.but when we try to upgrade left clusters (already done step 1) in step 2, we found some topics drop produce speed badly. we have research this issue for long time, since we couldn't test it in production environment and we couldn't reproduce in test environment, we couldn't find the root cause. now we only could describe the situation in detail as i know, hope anyone could help us. 1.because bug KAFKA-8653, i add code below in KafkaApis.scala handleJoinGroupRequest function: {code:java} if (rebalanceTimeoutMs <= 0) { rebalanceTimeoutMs = joinGroupRequest.data.sessionTimeoutMs }{code} 2.one cluster upgrade failed has 6 8C16G brokers, about 200 topics with 2 replicas,every broker keep 3000+ partitions and 1500+ leader partition, but most of them has very low produce message speed,about less than 50messages/sec, only one topic with 300 partitions has more than 2500 message/sec with more than 20 consumer groups consume message from it. so this whole cluster produce 4K messages/sec , 11m Bytes in /sec,240m Bytes out /sec.and more than 90% traffic made by that topic has 2500messages/sec. 
When we unset inter.broker.protocol.version=0.10.0.1 on 5 or 6 servers and restarted, this topic's produce rate dropped to about 200 messages/sec. I don't know whether the way we use Kafka could trigger the problem. 3. We use Kafka wrapped by spring-kafka with the KafkaTemplate's autoFlush=true, so each producer.send execution immediately executes producer.flush as well. I know the flush method decreases produce performance dramatically, but at least nothing seemed wrong before upgrade step 2; now I doubt whether it is part of the problem. 4. I noticed that when produce speed decreases, consumer groups with a large message lag still consume messages without any change or decrease in consume speed, so I guess only ProduceRequest throughput drops, not FetchRequest. 5. We haven't set any throttle configuration, and all producers use acks=1 (so it is not slow broker replica fetching). When the problem triggers, both server and producer CPU usage go down and the servers' ioutil stays under 30%, so it shouldn't be a hardware problem. 6. This event triggers almost 100% of the time once most brokers have done upgrade step 2: after an automatic leader election executes, we observe the produce speed drop, and we have to downgrade the brokers (set inter.broker.protocol.version=0.10.0.1) and restart them one by one before things return to normal. Some clusters needed all brokers downgraded, but some could leave 1 or 2 brokers not downgraded; I noticed that the broker that did not need a downgrade was the controller. 7. I have captured jstack output for a producer and for servers. Although these were not from the same cluster, their threads appear to be essentially idle. 8. Both the 0.10.0.1 and the 2.3.0 kafka-client trigger this problem. Also, while the largest topic always drops produce speed, other topics drop randomly: topicA may drop in the first upgrade attempt but not the next, and topicB may not drop in the first attempt but drop in another attempt.
9. In fact, the largest cluster, which has the same topic & group usage pattern described above but whose largest topic handles about 12,000 messages/sec, already failed at upgrade step 1 (just deploying the 2.3.0 jar). Any help would be greatly appreciated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
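The three-step procedure in the report above hinges on two broker settings. As a reference, here is a sketch of what a broker's server.properties looks like mid-upgrade (values taken from the report; consult the official upgrade notes for the exact sequence for your versions):

```properties
# During upgrade step 1: brokers run the 2.3.0 jars but still speak the old
# inter-broker protocol and write the old message format, so upgraded and
# non-upgraded brokers can coexist in one cluster.
inter.broker.protocol.version=0.10.0.1
log.message.format.version=0.10.0.1

# Step 2: remove inter.broker.protocol.version (or set it to 2.3) and do a
# rolling restart. Step 3: remove log.message.format.version and roll again.
```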
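Point 3 of the report (KafkaTemplate autoFlush=true, i.e. producer.flush() after every send) is worth quantifying: flushing per send defeats the producer's batching, so every record pays a full broker round trip. A toy model of the round-trip count — an illustration, not Kafka code:

```java
// Toy model: per-send flush forces one broker round trip per record, while
// batching amortizes one round trip over up to batchSize records.
class FlushModel {
    static long roundTrips(long numRecords, long batchSize, boolean flushEverySend) {
        if (flushEverySend) {
            return numRecords;  // flush() after each send() -> every batch has 1 record
        }
        return (numRecords + batchSize - 1) / batchSize;  // ceil(numRecords / batchSize)
    }
}
```

Under this model, the hot topic's 2500 messages/sec means 2500 round trips/sec with per-send flush, versus a handful with normal batching, which is consistent with the reporter's suspicion that flush-per-send amplifies any added per-request latency.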
[jira] [Commented] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
[ https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977528#comment-16977528 ] Ismael Juma commented on KAFKA-9203: Are you able to swap the lz4 jar when running your app with kafka 2.3 to verify this? > kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1 > --- > > Key: KAFKA-9203 > URL: https://issues.apache.org/jira/browse/KAFKA-9203 > Project: Kafka > Issue Type: Bug > Components: compression, consumer >Affects Versions: 2.3.0, 2.3.1 >Reporter: David Watzke >Priority: Major > > I run kafka cluster 2.1.1 > when I upgraded a client app to use kafka-client 2.3.0 (or 2.3.1) instead of > 2.2.0, I immediately started getting the following exceptions in a loop when > consuming a topic with LZ4-compressed messages: > {noformat} > 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] > com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred > while polling and processing messages: org.apache.kafka.common.KafkaExce > ption: Received exception when fetching the next record from > FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue > consumption. > org.apache.kafka.common.KafkaException: Received exception when fetching the > next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the > record to continue consumption. 
> at > org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473) > > at > org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332) > > at > org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645) > > at > org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606) > > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263) > > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) > at > com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180) > > at > com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) > at > com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) > at > com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19) > > at > resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88) > > at > scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) > at scala.util.control.Exception$Catch.apply(Exception.scala:228) > at scala.util.control.Exception$Catch.either(Exception.scala:252) > at > resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) > at > resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) > at > resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) > at > resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) > at > resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25) > > at > resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25) > > at > resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50) > > at > 
resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53) > > at > resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53) > > at > resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) > at > com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) > at > com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18) > > at > resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88) > > at > scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) > at scala.util.control.Exception$Catch.apply(Exception.scala:228) > at scala.util.control.Exception$Catch.either(Exception.scala:252) > at > resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) > at > resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) > at > resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) > at >
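One way to act on Ismael's suggestion above is to override the transitive lz4-java dependency in the consumer app's build rather than swapping jars by hand. A hypothetical Maven fragment — the version numbers are assumptions, so check each kafka-clients release's actual dependency tree (e.g. with mvn dependency:tree):

```xml
<!-- Hypothetical pom.xml fragment: run kafka-clients 2.3.1 but pin the
     lz4-java version the 2.2.0 client pulled in, to test whether the
     updated lz4-java lib is what broke consumption. -->
<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.3.1</version>
    <exclusions>
      <exclusion>
        <groupId>org.lz4</groupId>
        <artifactId>lz4-java</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <!-- Assumed older version; verify against the 2.2.0 client's POM. -->
    <groupId>org.lz4</groupId>
    <artifactId>lz4-java</artifactId>
    <version>1.5.0</version>
  </dependency>
</dependencies>
```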
[jira] [Comment Edited] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
[ https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977373#comment-16977373 ] David Watzke edited comment on KAFKA-9203 at 11/19/19 10:55 AM: producer is using kafka client 0.10.2.1 updated lz4-java lib does indeed sound like a very probable cause was (Author: dwatzke): producer is using kafka client 0.10.2.1 > kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1 > --- > > Key: KAFKA-9203 > URL: https://issues.apache.org/jira/browse/KAFKA-9203 > Project: Kafka > Issue Type: Bug > Components: compression, consumer >Affects Versions: 2.3.0, 2.3.1 >Reporter: David Watzke >Priority: Major > > I run kafka cluster 2.1.1 > when I upgraded a client app to use kafka-client 2.3.0 (or 2.3.1) instead of > 2.2.0, I immediately started getting the following exceptions in a loop when > consuming a topic with LZ4-compressed messages: > {noformat} > 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] > com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred > while polling and processing messages: org.apache.kafka.common.KafkaExce > ption: Received exception when fetching the next record from > FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue > consumption. > org.apache.kafka.common.KafkaException: Received exception when fetching the > next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the > record to continue consumption. 
[jira] [Commented] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
[ https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977373#comment-16977373 ] David Watzke commented on KAFKA-9203: - producer is using kafka client 0.10.2.1 > kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1 > --- > > Key: KAFKA-9203 > URL: https://issues.apache.org/jira/browse/KAFKA-9203 > Project: Kafka > Issue Type: Bug > Components: compression, consumer >Affects Versions: 2.3.0, 2.3.1 >Reporter: David Watzke >Priority: Major > > I run kafka cluster 2.1.1 > when I upgraded a client app to use kafka-client 2.3.0 (or 2.3.1) instead of > 2.2.0, I immediately started getting the following exceptions in a loop when > consuming a topic with LZ4-compressed messages: > {noformat} > 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] > com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred > while polling and processing messages: org.apache.kafka.common.KafkaExce > ption: Received exception when fetching the next record from > FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue > consumption. > org.apache.kafka.common.KafkaException: Received exception when fetching the > next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the > record to continue consumption. 
[jira] [Commented] (KAFKA-9157) logcleaner could generate empty segment files after cleaning
[ https://issues.apache.org/jira/browse/KAFKA-9157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977306#comment-16977306 ] ASF GitHub Bot commented on KAFKA-9157: --- huxihx commented on pull request #7711: KAFKA-9157: Avoid generating empty segments if all records are deleted after cleaning URL: https://github.com/apache/kafka/pull/7711 https://issues.apache.org/jira/browse/KAFKA-9157 If all records are deleted after cleaning, we should avoid generating empty log segments. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > logcleaner could generate empty segment files after cleaning > > > Key: KAFKA-9157 > URL: https://issues.apache.org/jira/browse/KAFKA-9157 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 2.3.0 >Reporter: Jun Rao >Assignee: huxihx >Priority: Major > > Currently, the log cleaner could only combine segments within a 2-billion > offset range. If all records in that range are deleted, an empty segment > could be generated. It would be useful to avoid generating such empty > segments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
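The 2-billion limit mentioned in the issue comes from record offsets being stored relative to the segment base offset as a signed 32-bit value. An illustrative sketch of such a grouping constraint — not Kafka's actual LogCleaner code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: group consecutive segments (each a long[]{baseOffset,
// lastOffset}) so that every group's total offset span still fits in a signed
// 32-bit relative offset. A cleaner that then writes one output segment per
// group can produce an empty segment if all records in a group were deleted.
class SegmentGrouper {
    static List<List<long[]>> group(List<long[]> segments) {
        List<List<long[]>> groups = new ArrayList<>();
        List<long[]> current = new ArrayList<>();
        for (long[] seg : segments) {
            // Start a new group when adding this segment would push the span
            // past what a 32-bit relative offset can address.
            if (!current.isEmpty() && seg[1] - current.get(0)[0] > Integer.MAX_VALUE) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(seg);
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
    }
}
```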
[jira] [Updated] (KAFKA-9210) kafka stream loss data
[ https://issues.apache.org/jira/browse/KAFKA-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-9210: --- Component/s: streams > kafka stream loss data > -- > > Key: KAFKA-9210 > URL: https://issues.apache.org/jira/browse/KAFKA-9210 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.0.1 >Reporter: panpan.liu >Priority: Major > Attachments: app.log, screenshot-1.png > > > kafka broker: 2.0.1 > kafka stream client: 2.1.0 > # two applications run at the same time > # after some days, I stopped one application (in k8s) > # The following log occurred, and when I checked the data I found the value > was less than what I expected. > {quote}Partitions > [flash-app-xmc-worker-share-store-minute-repartition-1]Partitions > [flash-app-xmc-worker-share-store-minute-repartition-1] for changelog > flash-app-xmc-worker-share-store-minute-changelog-1 2019-11-19 > 05:50:49.816|WARN > |flash-client-xmc-StreamThread-3|o.a.k.s.p.i.StoreChangelogReader|101|stream-thread > [flash-client-xmc-StreamThread-3] Restoring StreamTasks failed. 
Deleting > StreamTasks stores to recreate from > scratch.org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets > out of range with no configured reset policy for partitions: > \{flash-app-xmc-worker-share-store-minute-changelog-1=6128684} at > org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:987)2019-11-19 > 05:50:49.817|INFO > |flash-client-xmc-StreamThread-3|o.a.k.s.p.i.StoreChangelogReader|105|stream-thread > [flash-client-xmc-StreamThread-3] Reinitializing StreamTask TaskId: 10_1 > ProcessorTopology: KSTREAM-SOURCE-70: topics: > [flash-app-xmc-worker-share-store-minute-repartition] children: > [KSTREAM-AGGREGATE-67] KSTREAM-AGGREGATE-67: states: > [worker-share-store-minute] children: [KTABLE-TOSTREAM-71] > KTABLE-TOSTREAM-71: children: [KSTREAM-SINK-72] > KSTREAM-SINK-72: topic: > StaticTopicNameExtractor(xmc-worker-share-minute)Partitions > [flash-app-xmc-worker-share-store-minute-repartition-1] for changelog > flash-app-xmc-worker-share-store-minute-changelog-12019-11-19 > 05:50:49.842|WARN > |flash-client-xmc-StreamThread-3|o.a.k.s.p.i.StoreChangelogReader|101|stream-thread > [flash-client-xmc-StreamThread-3] Restoring StreamTasks failed. 
Deleting > StreamTasks stores to recreate from > scratch.org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets > out of range with no configured reset policy for partitions: > \{flash-app-xmc-worker-share-store-minute-changelog-1=6128684} at > org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:987)2019-11-19 > 05:50:49.842|INFO > |flash-client-xmc-StreamThread-3|o.a.k.s.p.i.StoreChangelogReader|105|stream-thread > [flash-client-xmc-StreamThread-3] Reinitializing StreamTask TaskId: 10_1 > ProcessorTopology: KSTREAM-SOURCE-70: topics: > [flash-app-xmc-worker-share-store-minute-repartition] children: > [KSTREAM-AGGREGATE-67] KSTREAM-AGGREGATE-67: states: > [worker-share-store-minute] children: [KTABLE-TOSTREAM-71] > KTABLE-TOSTREAM-71: children: [KSTREAM-SINK-72] > KSTREAM-SINK-72: topic: > StaticTopicNameExtractor(xmc-worker-share-minute)Partitions > [flash-app-xmc-worker-share-store-minute-repartition-1] for changelog > flash-app-xmc-worker-share-store-minute-changelog-12019-11-19 > 05:50:49.905|WARN > |flash-client-xmc-StreamThread-3|o.a.k.s.p.i.StoreChangelogReader|101|stream-thread > [flash-client-xmc-StreamThread-3] Restoring StreamTasks failed. 
Deleting > StreamTasks stores to recreate from > scratch.org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets > out of range with no configured reset policy for partitions: > \{flash-app-xmc-worker-share-store-minute-changelog-1=6128684} at > org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:987)2019-11-19 > 05:50:49.906|INFO > |flash-client-xmc-StreamThread-3|o.a.k.s.p.i.StoreChangelogReader|105|stream-thread > [flash-client-xmc-StreamThread-3] Reinitializing StreamTask TaskId: 10_1 > ProcessorTopology: KSTREAM-SOURCE-70: topics: > [flash-app-xmc-worker-share-store-minute-repartition] children: > [KSTREAM-AGGREGATE-67] KSTREAM-AGGREGATE-67: states: > [worker-share-store-minute] > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)