[jira] [Commented] (KAFKA-9543) Consumer offset reset after new segment rolling
[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178099#comment-17178099 ] Andrey Klochkov commented on KAFKA-9543: This issue is marked as fixed in 2.4.2 and 2.5.1, with the last comment saying that it's fixed by KAFKA-9838, while KAFKA-9838 has a fix version of 2.6.0. Is this defect fixed in 2.5.1? > Consumer offset reset after new segment rolling > --- > > Key: KAFKA-9543 > URL: https://issues.apache.org/jira/browse/KAFKA-9543 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0, 2.5.0, 2.4.1 >Reporter: Rafał Boniecki >Priority: Major > Fix For: 2.4.2, 2.5.1 > > Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png > > > After upgrading from Kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer > offset resets. > Consumer: > {code:java} > 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 > [2020-02-12T11:12:58,402][INFO > ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer > clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of > range for partition stats-5, resetting offset > {code} > Broker: > {code:java} > 2020-02-12 11:12:58:400 CET INFO > [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, > dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code} > All resets are perfectly correlated with new segments rolling at the broker - > the segment is rolled first, then, a couple of ms later, the reset occurs on the consumer. Attached is a Grafana graph of consumer lag per partition. All sudden > spikes in lag are offset resets due to this bug.
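For context on the log lines above: which way the consumer jumps on an out-of-range fetch is governed by its auto.offset.reset policy. A minimal sketch, assuming a plain Java client and a placeholder broker address (this illustrates the policy, it is not the fix for this bug): with the policy set to "none", an out-of-range fetch surfaces as an exception instead of a silent reset, so an incident like the one reported here fails loudly instead of appearing only as a lag spike.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class ResetPolicyExample {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "logstash");
        // "none" makes an out-of-range fetch throw OffsetOutOfRangeException
        // instead of silently resetting to the earliest/latest offset.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");
        return props;
    }
}
{code}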
[GitHub] [kafka] ijuma commented on pull request #9180: MINOR: corrected unit tests
ijuma commented on pull request #9180: URL: https://github.com/apache/kafka/pull/9180#issuecomment-674304221 ok to test
[jira] [Commented] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.PlaintextAdminIntegrationTest.testLogStartOffsetCheckpoint and kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDi
[ https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178021#comment-17178021 ] Bill Bejeck commented on KAFKA-9263: Test failure [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/7843/testReport/junit/kafka.api/PlaintextAdminIntegrationTest/testAlterReplicaLogDirs/] {noformat} 2020-08-14 17:34:06,420] ERROR [ReplicaManager broker=0] Error while changing replica dir for partition topic-0 (kafka.server.ReplicaManager:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while fetching partition state for topic-0 [2020-08-14 17:34:06,420] ERROR [ReplicaManager broker=1] Error while changing replica dir for partition topic-0 (kafka.server.ReplicaManager:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while fetching partition state for topic-0 [2020-08-14 17:34:06,420] ERROR [ReplicaManager broker=2] Error while changing replica dir for partition topic-0 (kafka.server.ReplicaManager:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while fetching partition state for topic-0 [2020-08-14 17:36:24,822] ERROR [Consumer instanceId=test_instance_id_1, clientId=test_client_id, groupId=test_group_id] Offset commit failed on partition test_topic-1 at offset 0: The coordinator is not aware of this member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1191) [2020-08-14 17:36:24,823] ERROR Thread Thread[Thread-4,5,FailOnTimeoutGroup] died (org.apache.zookeeper.server.NIOServerCnxnFactory:92) org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records. 
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:1257) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:1164) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1132) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1107) at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206) at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169) at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1006) at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1394) at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1348) at kafka.api.PlaintextAdminIntegrationTest$$anon$1.run(PlaintextAdminIntegrationTest.scala:1071) [2020-08-14 17:36:24,848] ERROR [Consumer instanceId=test_instance_id_2, clientId=test_client_id, groupId=test_group_id] Offset commit failed on partition test_topic1-0 at offset 0: Specified group generation id is not valid. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1191) [2020-08-14 17:36:24,852] ERROR [Consumer clientId=test_client_id, groupId=test_group_id] Offset commit failed on partition test_topic2-0 at offset 0: Specified group generation id is not valid. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1191) [2020-08-14 18:29:44,855] ERROR [ReplicaManager broker=1] Error while changing replica dir for partition topic-0 (kafka.server.ReplicaManager:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while fetching partition state for topic-0 [2020-08-14 18:29:44,855] ERROR [ReplicaManager broker=0]
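The CommitFailedException above names the two standard mitigations itself. A hedged sketch of what they look like in consumer configuration (the values are placeholders, not recommendations from this thread):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

Properties props = new Properties();
// Either give each poll-loop iteration more headroom...
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000"); // default is 300000 (5 minutes)
// ...or hand back smaller batches so each iteration finishes sooner.
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");        // default is 500
{code}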
[GitHub] [kafka] bbejeck commented on pull request #9108: KAFKA-9273: Extract testShouldAutoShutdownOnIncompleteMetadata from S…
bbejeck commented on pull request #9108: URL: https://github.com/apache/kafka/pull/9108#issuecomment-674229113 Java 8 passed. Java 14 failed with `org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest.shouldRestoreState`. Java 11 failed with `kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDirs`. retest this please
[jira] [Created] (KAFKA-10405) Flaky Test org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest.shouldRestoreState
Bill Bejeck created KAFKA-10405: --- Summary: Flaky Test org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest.shouldRestoreState Key: KAFKA-10405 URL: https://issues.apache.org/jira/browse/KAFKA-10405 Project: Kafka Issue Type: Test Components: streams Reporter: Bill Bejeck From build [https://builds.apache.org/job/kafka-pr-jdk14-scala2.13/1979/] {noformat} org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest > shouldRestoreState FAILED 14:25:19 java.lang.AssertionError: Condition not met within timeout 6. Repartition topic restore-test-KSTREAM-AGGREGATE-STATE-STORE-02-repartition not purged data after 6 ms. 14:25:19 at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:26) 14:25:19 at org.apache.kafka.test.TestUtils.lambda$waitForCondition$6(TestUtils.java:401) 14:25:19 at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:449) 14:25:19 at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:417) 14:25:19 at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:398) 14:25:19 at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:388) 14:25:19 at org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest.shouldRestoreState(PurgeRepartitionTopicIntegrationTest.java:206){noformat}
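The assertion comes from Kafka's own TestUtils.waitForCondition helper, visible in the stack trace. A sketch of the pattern, with a hypothetical predicate standing in for the test's real check; raising the wait bound is the usual first mitigation for this kind of flake:

{code:java}
import org.apache.kafka.test.TestUtils;

// topicIsPurged() and repartitionTopic are hypothetical stand-ins for the
// test's real check; waitForCondition retries the predicate until it holds
// or maxWaitMs elapses, then fails with the given message.
TestUtils.waitForCondition(
    () -> topicIsPurged(repartitionTopic),
    60_000L,
    "Repartition topic not purged within timeout");
{code}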
[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
[ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177995#comment-17177995 ] Rajini Sivaram commented on KAFKA-10158: [~bbyrne] We can close this now, right? > Fix flaky > kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress > --- > > Key: KAFKA-10158 > URL: https://issues.apache.org/jira/browse/KAFKA-10158 > Project: Kafka > Issue Type: Bug > Components: unit tests >Reporter: Chia-Ping Tsai >Assignee: Brian Byrne >Priority: Minor > Fix For: 2.7.0 > > > Altering the assignments is an async request, so it is possible that the > reassignment is still in progress when we start to verify the > "under-replicated-partitions". To make the test stable, it needs to wait > for the reassignment to complete before verifying the topic command with > "under-replicated-partitions".
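A sketch of the wait the description calls for, using the Admin API (wiring assumed; the timeout is illustrative): poll listPartitionReassignments() until nothing is in flight before asserting on --under-replicated-partitions.

{code:java}
import org.apache.kafka.clients.admin.Admin;

// Inside a test method declared `throws Exception`; adminProps is assumed.
try (Admin admin = Admin.create(adminProps)) {
    long deadline = System.currentTimeMillis() + 30_000L;
    // reassignments() is empty once the async reassignment has completed
    while (!admin.listPartitionReassignments().reassignments().get().isEmpty()) {
        if (System.currentTimeMillis() > deadline)
            throw new AssertionError("Reassignment still in progress after 30s");
        Thread.sleep(100);
    }
}
{code}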
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177994#comment-17177994 ] Vagesh Mathapati commented on KAFKA-10396: -- In my microservice I am using a GlobalKTable for 5 different topics. So the RocksDB configuration I am setting is per KTable/KStream, right? > Overall memory of container keep on growing due to kafka stream / rocksdb and > OOM killed once limit reached > --- > > Key: KAFKA-10396 > URL: https://issues.apache.org/jira/browse/KAFKA-10396 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.3.1, 2.5.0 >Reporter: Vagesh Mathapati >Priority: Critical > Attachments: CustomRocksDBConfig.java, MyStreamProcessor.java, > kafkaStreamConfig.java > > > We are observing that the overall memory of our container keeps growing and > never comes down. > After analysis we found that rocksdbjni.so keeps allocating 64M chunks > of memory off-heap and never releases them. This causes an OOM kill after memory > reaches the configured limit. > We use Kafka Streams and GlobalKTables for many of our Kafka topics. > Below is our environment > * Kubernetes cluster > * openjdk 11.0.7 2020-04-14 LTS > * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS) > * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode) > * Springboot 2.3 > * spring-kafka-2.5.0 > * kafka-streams-2.5.0 > * kafka-streams-avro-serde-5.4.0 > * rocksdbjni-5.18.3 > We observed the same result with Kafka 2.3. > Below is a snippet of our analysis: > from the pmap output we took the addresses of these 64M allocations (RSS) > Address Kbytes RSS Dirty Mode Mapping > 7f3ce800 65536 65532 65532 rw--- [ anon ] > 7f3cf400 65536 65536 65536 rw--- [ anon ] > 7f3d6400 65536 65536 65536 rw--- [ anon ] > We tried to match these against memory allocation logs, enabled with the help of the Azul > Systems team.
> @ /tmp/librocksdbjni6564497922441568920.so: > _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] > - 0x7f3ce8ff7ca0 > @ /tmp/librocksdbjni6564497922441568920.so: > _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4] > - 0x7f3ce8ff9780 > @ /tmp/librocksdbjni6564497922441568920.so: > _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] > - 0x7f3ce8ff9750 > @ /tmp/librocksdbjni6564497922441568920.so: > _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] > - 0x7f3ce8ff97c0 > @ > /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] > - 0x7f3ce8ffccf0 > @ /tmp/librocksdbjni6564497922441568920.so: > _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] > - 0x7f3ce8ffcd10 > @ /tmp/librocksdbjni6564497922441568920.so: > _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] > - 0x7f3ce8ffccf0 > @ /tmp/librocksdbjni6564497922441568920.so: > _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] > - 0x7f3ce8ffcd10 > We also identified that the content of these 64M chunks is just zeros, with no data > present in them. > I tried to tune the RocksDB configuration as mentioned, but it did not help. > [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config] > > Please let me know if you need any more details
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177982#comment-17177982 ] Vagesh Mathapati commented on KAFKA-10396: -- [~desai.p.rohan] Thanks a lot. It looks like if I use the WriteBufferManager config, memory does not grow beyond some limit. Attaching [^CustomRocksDBConfig.java] for reference. I can see some documentation and an example related to it at [https://kafka.apache.org/23/documentation/streams/developer-guide/memory-mgmt.html]; the Confluent documentation, I think, needs to be updated. I still need to test more to confirm, but the initial results look promising. I am not able to come up with optimized numbers. Generally, what numbers should we keep?
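On the two questions in the comments above: Streams calls RocksDBConfigSetter#setConfig once per RocksDB instance, and a store has one instance per task/partition (and per segment for windowed stores), so options set there apply per instance, not once per application. The way the memory-management page linked above gets a single application-wide cap is to share static Cache and WriteBufferManager objects across all instances. A minimal sketch under those assumptions; the class name and sizes are illustrative, and there is no universal "right number", since the budget is split across however many store instances the application hosts:

{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
    // static, so every RocksDB instance in this JVM shares one budget:
    // total off-heap use is capped no matter how many stores/partitions exist
    private static final Cache CACHE = new LRUCache(256 * 1024 * 1024L, -1, false, 0.1);
    private static final WriteBufferManager WBM = new WriteBufferManager(64 * 1024 * 1024L, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(CACHE);
        tableConfig.setCacheIndexAndFilterBlocks(true); // index/filter blocks count against the cache
        options.setWriteBufferManager(WBM);             // memtables draw from the same budget
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Intentionally do not close CACHE/WBM here: they are shared across instances.
    }
}
{code}

The setter is registered via props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class).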
[GitHub] [kafka] rajinisivaram opened a new pull request #9184: KAFKA-8033; Wait for NoOffsetForPartitionException in testFetchInvalidOffset
rajinisivaram opened a new pull request #9184: URL: https://github.com/apache/kafka/pull/9184 We wait only 50ms in consumer.poll() and expect NoOffsetForPartitionException, but the exception is thrown only when initializing partition offsets after the coordinator is known. Increased the poll timeout to make the test reliable. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes)
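A sketch of the behavior the fix relies on (assuming auto.offset.reset=none, as the expected NoOffsetForPartitionException implies; names and the timeout are illustrative): the exception can only be raised after the consumer has found the coordinator and tried to initialize its position, so the poll timeout has to cover that handshake.

{code:java}
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.NoOffsetForPartitionException;
import static org.junit.Assert.fail;

// Assumes a consumer configured with auto.offset.reset=none and a
// TopicPartition tp that has no committed offset.
consumer.assign(Collections.singleton(tp));
try {
    // 50 ms can expire before coordinator discovery and offset initialization
    // complete; a longer poll gives the exception a chance to surface.
    consumer.poll(Duration.ofSeconds(15));
    fail("Expected NoOffsetForPartitionException");
} catch (NoOffsetForPartitionException expected) {
    // no committed offset and no reset policy for this partition
}
{code}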
[jira] [Updated] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vagesh Mathapati updated KAFKA-10396: - Attachment: CustomRocksDBConfig.java
[jira] [Commented] (KAFKA-5896) Kafka Connect task threads never interrupted
[ https://issues.apache.org/jira/browse/KAFKA-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177973#comment-17177973 ] Ewen Cheslack-Postava commented on KAFKA-5896: -- [~yazgoo] It looks like this was marked "Abandoned". Reiterating my previous comment: > Really my complaint is that Java's interrupt semantics are terrible and try > to give the feeling of preemption when it can't and therefore leaves a ton of > bugs and overpromised underdelivered guarantees in its wake. It's fine to reopen if you think it needs more discussion, I just don't see a way to actually fix the issue – Thread.interrupt doesn't do what we'd want and afaik the jvm doesn't provide anything that does. So I think given those constraints, it's probably better to identify the connector that is behaving badly and work with upstream to address it. > Kafka Connect task threads never interrupted > > > Key: KAFKA-5896 > URL: https://issues.apache.org/jira/browse/KAFKA-5896 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Nick Pillitteri >Assignee: Nick Pillitteri >Priority: Minor > > h2. Problem > Kafka Connect tasks associated with connectors are run in their own threads. > When tasks are stopped or restarted, a flag is set - {{stopping}} - to > indicate the task should stop processing records. However, if the thread the > task is running in is blocked (waiting for a lock or performing I/O) it's > possible the task will never stop. > I've created a connector specifically to demonstrate this issue (along with > some more detailed instructions for reproducing the issue): > https://github.com/smarter-travel-media/hang-connector > I believe this is an issue because it means that a single badly behaved > connector (any connector that does I/O without timeouts) can cause the Kafka > Connect worker to get into a state where the only solution is to restart the > JVM. > I think, but couldn't reproduce, that this is the cause of this problem on > Stack Overflow: > https://stackoverflow.com/questions/43802156/inconsistent-connector-state-connectexception-task-already-exists-in-this-work > h2. Expected Result > I would expect the Worker to eventually interrupt the thread that the task is > running in. In the past across various other libraries, this is what I've > seen done when a thread needs to be forcibly stopped. > h2. Actual Result > In actuality, the Worker sets a {{stopping}} flag and lets the thread run > indefinitely. It uses a timeout while waiting for the task to stop but after > this timeout has expired it simply sets a {{cancelled}} flag. This means that > every time a task is restarted, a new thread running the task will be > created. Thus a task may end up with multiple instances all running in their > own threads when there's only supposed to be a single thread. > h2. Steps to Reproduce > The problem can be replicated by using the connector available here: > https://github.com/smarter-travel-media/hang-connector > Apologies for how involved the steps are. > I've created a patch that forcibly interrupts threads after they fail to > gracefully shutdown here: > https://github.com/smarter-travel-media/kafka/commit/295c747a9fd82ee8b30556c89c31e0bfcce5a2c5 > I've confirmed that this fixes the issue. I can add some unit tests and > submit a PR if people agree that this is a bug and interrupting threads is > the right fix. > Thanks!
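For readers following the thread, this is the shape of the fix the linked patch proposes, paraphrased as a sketch (not Kafka's actual Worker code; names and timeouts are illustrative):

{code:java}
// Cooperative stop first; forcible interrupt only as a last resort.
task.stop();                          // sets the task's 'stopping' flag
taskThread.join(gracefulTimeoutMs);   // wait for a clean exit
if (taskThread.isAlive()) {
    // Makes blocking calls throw InterruptedException -- when they honor it,
    // which is exactly the limitation discussed in the comment above.
    taskThread.interrupt();
    taskThread.join(forcedTimeoutMs);
}
{code}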
[jira] [Assigned] (KAFKA-8033) Flaky Test PlaintextConsumerTest#testFetchInvalidOffset
[ https://issues.apache.org/jira/browse/KAFKA-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajini Sivaram reassigned KAFKA-8033: - Assignee: Rajini Sivaram > Flaky Test PlaintextConsumerTest#testFetchInvalidOffset > --- > > Key: KAFKA-8033 > URL: https://issues.apache.org/jira/browse/KAFKA-8033 > Project: Kafka > Issue Type: Bug > Components: core, unit tests >Affects Versions: 2.3.0 >Reporter: Matthias J. Sax >Assignee: Rajini Sivaram >Priority: Critical > Labels: flaky-test > Fix For: 2.7.0, 2.6.1 > > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/2829/testReport/junit/kafka.api/PlaintextConsumerTest/testFetchInvalidOffset/] > {quote}org.scalatest.junit.JUnitTestFailedError: Expected exception > org.apache.kafka.clients.consumer.NoOffsetForPartitionException to be thrown, > but no exception was thrown{quote} > STDOUT prints this over and over again: > {quote}[2019-03-02 04:01:25,576] ERROR [ReplicaFetcher replicaId=0, > leaderId=1, fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76){quote}
[GitHub] [kafka] ijuma commented on pull request #9067: MINOR: Streams integration tests should not call exit
ijuma commented on pull request #9067: URL: https://github.com/apache/kafka/pull/9067#issuecomment-674193106 @mjsax did you run checkstyle and spotBugs before pushing the 2.6 change? It seems like it broke the build: https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/3/console
[GitHub] [kafka] rajinisivaram opened a new pull request #9183: KAFKA-10404; Use higher poll timeout to avoid rebalance in testCoordinatorFailover
rajinisivaram opened a new pull request #9183: URL: https://github.com/apache/kafka/pull/9183 Tests use a 6s poll timeout, which isn't sufficient to ensure that clients don't leave the group due to poll timeout. The PR increases the poll timeout to 15s. The session timeout of 5s is also low, but it is left as-is for now, with a lower heartbeat timeout to make sure we catch unexpected rebalances during the failover. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes)
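For reference, how the three timeouts discussed in this PR relate, as a hedged sketch (the values mirror those mentioned above; they are test settings, not production advice):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

Properties props = new Properties();
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "15000"); // must exceed the longest gap between poll() calls
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "5000");    // member is evicted if heartbeats stop for this long
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "1000"); // kept well below the session timeout
{code}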
[jira] [Assigned] (KAFKA-10404) Flaky Test kafka.api.SaslSslConsumerTest.testCoordinatorFailover
[ https://issues.apache.org/jira/browse/KAFKA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajini Sivaram reassigned KAFKA-10404: -- Assignee: Rajini Sivaram > Flaky Test kafka.api.SaslSslConsumerTest.testCoordinatorFailover > > > Key: KAFKA-10404 > URL: https://issues.apache.org/jira/browse/KAFKA-10404 > Project: Kafka > Issue Type: Test > Components: core, unit tests >Reporter: Bill Bejeck >Assignee: Rajini Sivaram >Priority: Major > > From build [https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3829/] > > {noformat} > kafka.api.SaslSslConsumerTest > testCoordinatorFailover FAILED > 11:27:15 java.lang.AssertionError: expected: but > was: commit cannot be completed since the consumer is not part of an active group > for auto partition assignment; it is likely that the consumer was kicked out > of the group.)> > 11:27:15 at org.junit.Assert.fail(Assert.java:89) > 11:27:15 at org.junit.Assert.failNotEquals(Assert.java:835) > 11:27:15 at org.junit.Assert.assertEquals(Assert.java:120) > 11:27:15 at org.junit.Assert.assertEquals(Assert.java:146) > 11:27:15 at > kafka.api.AbstractConsumerTest.sendAndAwaitAsyncCommit(AbstractConsumerTest.scala:195) > 11:27:15 at > kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:302) > 11:27:15 at > kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:76) > 11:27:15 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > 11:27:15 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 11:27:15 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 11:27:15 at java.lang.reflect.Method.invoke(Method.java:498) > 11:27:15 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > 11:27:15 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 11:27:15 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > 11:27:15 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 11:27:15 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 11:27:15 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 11:27:15 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > 11:27:15 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > 11:27:15 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > 11:27:15 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > 11:27:15 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > 11:27:15 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > 11:27:15 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > 11:27:15 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > 11:27:15 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > 11:27:15 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > 11:27:15 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 11:27:15 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 11:27:15 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > 11:27:15 at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > 11:27:15 at > 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) > 11:27:15 at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > 11:27:15 at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > 11:27:15 at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) > 11:27:15 at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > 11:27:15 at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown > Source) > 11:27:15 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 11:27:15 at java.lang.reflect.Method.invoke(Method.java:498) > 11:27:15 at >
[GitHub] [kafka] bbejeck commented on pull request #9108: KAFKA-9273: Extract testShouldAutoShutdownOnIncompleteMetadata from S…
bbejeck commented on pull request #9108: URL: https://github.com/apache/kafka/pull/9108#issuecomment-674162578 Java 8 failed on an unrelated test, `kafka.api.SaslSslConsumerTest.testCoordinatorFailover`; created a Jira ticket. retest this please
[jira] [Created] (KAFKA-10404) Flaky Test kafka.api.SaslSslConsumerTest.testCoordinatorFailover
Bill Bejeck created KAFKA-10404: --- Summary: Flaky Test kafka.api.SaslSslConsumerTest.testCoordinatorFailover Key: KAFKA-10404 URL: https://issues.apache.org/jira/browse/KAFKA-10404 Project: Kafka Issue Type: Test Components: core, unit tests Reporter: Bill Bejeck From build [https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3829/] {noformat} kafka.api.SaslSslConsumerTest > testCoordinatorFailover FAILED 11:27:15 java.lang.AssertionError: expected: but was: 11:27:15 at org.junit.Assert.fail(Assert.java:89) 11:27:15 at org.junit.Assert.failNotEquals(Assert.java:835) 11:27:15 at org.junit.Assert.assertEquals(Assert.java:120) 11:27:15 at org.junit.Assert.assertEquals(Assert.java:146) 11:27:15 at kafka.api.AbstractConsumerTest.sendAndAwaitAsyncCommit(AbstractConsumerTest.scala:195) 11:27:15 at kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:302) 11:27:15 at kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:76) 11:27:15 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 11:27:15 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 11:27:15 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 11:27:15 at java.lang.reflect.Method.invoke(Method.java:498) 11:27:15 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) 11:27:15 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) 11:27:15 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) 11:27:15 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) 11:27:15 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 11:27:15 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 11:27:15 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 11:27:15 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) 11:27:15 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 11:27:15 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) 11:27:15 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) 11:27:15 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 11:27:15 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 11:27:15 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 11:27:15 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 11:27:15 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 11:27:15 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 11:27:15 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 11:27:15 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 11:27:15 at org.junit.runners.ParentRunner.run(ParentRunner.java:413) 11:27:15 at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) 11:27:15 at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) 11:27:15 at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) 11:27:15 at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) 11:27:15 at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) 11:27:15 at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) 11:27:15 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 11:27:15 at java.lang.reflect.Method.invoke(Method.java:498) 11:27:15 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) 11:27:15 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) 11:27:15 at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) 11:27:15 at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) 11:27:15 at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) 11:27:15 at
[GitHub] [kafka] ijuma commented on pull request #9182: KAFKA-10403 Replace scala collection by java collection in generating…
ijuma commented on pull request #9182: URL: https://github.com/apache/kafka/pull/9182#issuecomment-674160788 Can you please include more detail on how it fails currently? I assume it's some kind of deserialization error?
[GitHub] [kafka] chia7712 opened a new pull request #9182: KAFKA-10403 Replace scala collection by java collection in generating…
chia7712 opened a new pull request #9182: URL: https://github.com/apache/kafka/pull/9182 issue: https://issues.apache.org/jira/browse/KAFKA-10403 It seems to me that metrics are a kind of public interface, so users should be able to access the metrics of a Kafka server without the Scala library. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes)
[jira] [Created] (KAFKA-10403) Replace scala collection by java collection in generating MBeans attributes
Chia-Ping Tsai created KAFKA-10403: -- Summary: Replace scala collection by java collection in generating MBeans attributes Key: KAFKA-10403 URL: https://issues.apache.org/jira/browse/KAFKA-10403 Project: Kafka Issue Type: Improvement Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai It seems to me that metrics are a kind of public interface, so users should be able to access the metrics of a Kafka server without the Scala library.
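The failure mode implied here (and asked about on the PR above): a plain JMX client without scala-library on its classpath cannot work with attribute values backed by Scala collection types. A hedged sketch of such a client reading a broker metric; the endpoint is a placeholder, the MBean name is a standard broker gauge:

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxMetricReader {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://broker:9999/jmxrmi"); // placeholder host:port
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName("kafka.server:type=ReplicaManager,name=PartitionCount");
            // Succeeds only if the attribute value is a type this client can
            // load; a Scala collection would require scala-library client-side.
            System.out.println(mbsc.getAttribute(name, "Value"));
        }
    }
}
{code}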
[jira] [Comment Edited] (KAFKA-5896) Kafka Connect task threads never interrupted
[ https://issues.apache.org/jira/browse/KAFKA-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177821#comment-17177821 ] yazgoo edited comment on KAFKA-5896 at 8/14/20, 3:11 PM: - I still encountered this exception in CP 5.5, shouldn't this be re-opened? was (Author: yazgoo): I still encountered this exception in CP 2.5, shouldn't this be re-opened?
[jira] [Commented] (KAFKA-5896) Kafka Connect task threads never interrupted
[ https://issues.apache.org/jira/browse/KAFKA-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177821#comment-17177821 ] yazgoo commented on KAFKA-5896: --- I still encountered this exception in CP 2.5, shouldn't this be re-opened?
[GitHub] [kafka] rajinisivaram opened a new pull request #9181: KAFKA-9516; Increase timeout in testNonBlockingProducer to make it more reliable
rajinisivaram opened a new pull request #9181: URL: https://github.com/apache/kafka/pull/9181 The test has been timing out occasionally, and the timeout is on the first send of a producer, so this increases the timeout to 30s, similar to some of the other timeouts in BaseProducerSendTest. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes)
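The pattern being tuned, sketched with illustrative names: the first send() on a producer also pays for the initial metadata fetch, so blocking on its future warrants a more generous bound than later sends.

{code:java}
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Assumes an existing KafkaProducer<String, String> named producer.
Future<RecordMetadata> future = producer.send(new ProducerRecord<>("topic", "key", "value"));
// The first send also blocks on metadata discovery, hence the generous 30s bound.
RecordMetadata metadata = future.get(30, TimeUnit.SECONDS);
{code}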
[jira] [Assigned] (KAFKA-9516) Flaky Test PlaintextProducerSendTest#testNonBlockingProducer
[ https://issues.apache.org/jira/browse/KAFKA-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajini Sivaram reassigned KAFKA-9516: - Assignee: Rajini Sivaram > Flaky Test PlaintextProducerSendTest#testNonBlockingProducer > > > Key: KAFKA-9516 > URL: https://issues.apache.org/jira/browse/KAFKA-9516 > Project: Kafka > Issue Type: Bug > Components: core, producer , tools, unit tests >Reporter: Matthias J. Sax >Assignee: Rajini Sivaram >Priority: Critical > Labels: flaky-test > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/4521/testReport/junit/kafka.api/PlaintextProducerSendTest/testNonBlockingProducer/] > {quote}java.util.concurrent.TimeoutException: Timeout after waiting for 1 > ms. at > org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:78) > at > org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30) > at > kafka.api.PlaintextProducerSendTest.verifySendSuccess$1(PlaintextProducerSendTest.scala:148) > at > kafka.api.PlaintextProducerSendTest.testNonBlockingProducer(PlaintextProducerSendTest.scala:172){quote} > {quote} > h3. Standard Output > [2020-02-06 03:35:27,912] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2020-02-06 03:35:50,812] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition > topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2020-02-06 03:35:51,015] ERROR > [ReplicaManager broker=0] Error processing append operation on partition > topic-0 (kafka.server.ReplicaManager:76) > org.apache.kafka.common.errors.InvalidTimestampException: One or more records > have been rejected due to invalid timestamp [2020-02-06 03:35:51,027] ERROR > [ReplicaManager broker=0] Error processing append operation on partition > topic-0 (kafka.server.ReplicaManager:76) > org.apache.kafka.common.errors.InvalidTimestampException: One or more records > have been rejected due to invalid timestamp [2020-02-06 03:35:53,127] ERROR > [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition > topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2020-02-06 03:35:58,617] ERROR > [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition > topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2020-02-06 03:36:01,843] ERROR > [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition > topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2020-02-06 03:36:05,111] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition > topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. 
[2020-02-06 03:36:08,383] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition > topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2020-02-06 03:36:08,383] ERROR > [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition > topic-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2020-02-06 03:36:12,582] ERROR > [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition > topic-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2020-02-06 03:36:12,582] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition > topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2020-02-06 03:36:15,902] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition >
[GitHub] [kafka] mumrah commented on a change in pull request #9100: Add AlterISR RPC and use it for ISR modifications
mumrah commented on a change in pull request #9100: URL: https://github.com/apache/kafka/pull/9100#discussion_r470678123

## File path: core/src/main/scala/kafka/cluster/Partition.scala
## @@ -255,6 +255,10 @@ class Partition(val topicPartition: TopicPartition,

 def isAddingReplica(replicaId: Int): Boolean = assignmentState.isAddingReplica(replicaId)

+  // For advancing the HW we assume the largest ISR even if the controller hasn't made the change yet
+  // This set includes the latest ISR (as we learned from LeaderAndIsr) and any replicas from a pending ISR expansion
+  def effectiveInSyncReplicaIds: Set[Int] = inSyncReplicaIds | pendingInSyncReplicaIds

Review comment: Since we are now only allowing one in-flight AlterIsr, I changed the semantics of pendingInSyncReplicaIds to be the maximal "effective" ISR. This way we don't need to compute it each time.
[GitHub] [kafka] mumrah commented on a change in pull request #9100: Add AlterISR RPC and use it for ISR modifications
mumrah commented on a change in pull request #9100: URL: https://github.com/apache/kafka/pull/9100#discussion_r470675300

## File path: core/src/main/scala/kafka/server/AlterIsrChannelManager.scala
## @@ -0,0 +1,132 @@
+package kafka.server
+
+import java.util
+import java.util.concurrent.{ScheduledFuture, TimeUnit}
+import java.util.concurrent.atomic.AtomicLong
+
+import kafka.api.LeaderAndIsr
+import kafka.metrics.KafkaMetricsGroup
+import kafka.utils.{Logging, Scheduler}
+import kafka.zk.KafkaZkClient
+import org.apache.kafka.clients.ClientResponse
+import org.apache.kafka.common.TopicPartition
+import org.apache.kafka.common.message.AlterIsrRequestData.{AlterIsrRequestPartitions, AlterIsrRequestTopics}
+import org.apache.kafka.common.message.{AlterIsrRequestData, AlterIsrResponseData}
+import org.apache.kafka.common.protocol.Errors
+import org.apache.kafka.common.requests.{AlterIsrRequest, AlterIsrResponse}
+
+import scala.collection.mutable
+import scala.jdk.CollectionConverters._
+
+/**
+ * Handles the sending of AlterIsr requests to the controller. Updating the ISR is an asynchronous operation,
+ * so partitions will learn about updates through LeaderAndIsr messages sent from the controller
+ */
+trait AlterIsrChannelManager {
+  val IsrChangePropagationBlackOut = 5000L
+  val IsrChangePropagationInterval = 6L
+
+  def enqueueIsrUpdate(alterIsrItem: AlterIsrItem): Unit
+
+  def clearPending(topicPartition: TopicPartition): Unit
+
+  def startup(): Unit
+
+  def shutdown(): Unit
+}
+
+case class AlterIsrItem(topicPartition: TopicPartition, leaderAndIsr: LeaderAndIsr)
+
+class AlterIsrChannelManagerImpl(val controllerChannelManager: BrokerToControllerChannelManager,
+                                 val zkClient: KafkaZkClient,
+                                 val scheduler: Scheduler,
+                                 val brokerId: Int,
+                                 val brokerEpoch: Long) extends AlterIsrChannelManager with Logging with KafkaMetricsGroup {
+
+  private val pendingIsrUpdates: mutable.Map[TopicPartition, AlterIsrItem] = new mutable.HashMap[TopicPartition, AlterIsrItem]()
+  private val lastIsrChangeMs = new AtomicLong(0)
+  private val lastIsrPropagationMs = new AtomicLong(0)
+
+  @volatile private var scheduledRequest: Option[ScheduledFuture[_]] = None
+
+  override def enqueueIsrUpdate(alterIsrItem: AlterIsrItem): Unit = {
+    pendingIsrUpdates synchronized {
+      pendingIsrUpdates(alterIsrItem.topicPartition) = alterIsrItem
+      lastIsrChangeMs.set(System.currentTimeMillis())
+      // Rather than sending right away, we'll delay at most 50ms to allow for batching of ISR changes happening
+      // in fast succession
+      if (scheduledRequest.isEmpty) {
+        scheduledRequest = Some(scheduler.schedule("propagate-alter-isr", propagateIsrChanges, 50, -1, TimeUnit.MILLISECONDS))
+      }
+    }
+  }
+
+  override def clearPending(topicPartition: TopicPartition): Unit = {
+    pendingIsrUpdates synchronized {
+      // when we get a new LeaderAndIsr, we clear out any pending requests
+      pendingIsrUpdates.remove(topicPartition)

Review comment: With the latest changes to prevent multiple in-flight requests, I don't think this should happen for a given partition. Even if it did, the retried in-flight request from BrokerToControllerRequestThread would fail on the controller with an old version. I'm wondering if we even need this clearPending behavior. Since I changed the AlterIsr request to fire at most after 50ms, it's a narrow window between enqueueing an ISR update and receiving a LeaderAndIsr.
[GitHub] [kafka] mumrah commented on a change in pull request #9100: Add AlterISR RPC and use it for ISR modifications
mumrah commented on a change in pull request #9100: URL: https://github.com/apache/kafka/pull/9100#discussion_r470673412
## File path: core/src/main/scala/kafka/server/AlterIsrChannelManager.scala ##
+class AlterIsrChannelManagerImpl(val controllerChannelManager: BrokerToControllerChannelManager,
+                                 val zkClient: KafkaZkClient,
+                                 val scheduler: Scheduler,
+                                 val brokerId: Int,
+                                 val brokerEpoch: Long) extends AlterIsrChannelManager with Logging with KafkaMetricsGroup {
Review comment: Fixed this by adding getBrokerEpoch to KafkaZkClient This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-10402) Upgrade python version in system tests
Nikolay Izhikov created KAFKA-10402: --- Summary: Upgrade python version in system tests Key: KAFKA-10402 URL: https://issues.apache.org/jira/browse/KAFKA-10402 Project: Kafka Issue Type: Improvement Reporter: Nikolay Izhikov Assignee: Nikolay Izhikov Currently, the system tests use Python 2, which is outdated and no longer supported. Since all of the system tests' dependencies, including ducktape, support Python 3, we can migrate the system tests to Python 3. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [kafka] bbejeck commented on pull request #9108: KAFKA-9273: Extract testShouldAutoShutdownOnIncompleteMetadata from S…
bbejeck commented on pull request #9108: URL: https://github.com/apache/kafka/pull/9108#issuecomment-674091814 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] sanketfajage opened a new pull request #9180: MINOR: corrected unit tests
sanketfajage opened a new pull request #9180: URL: https://github.com/apache/kafka/pull/9180 Corrected unit tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] jeqo commented on a change in pull request #9137: KAFKA-9929: Support reverse iterator on KeyValueStore
jeqo commented on a change in pull request #9137: URL: https://github.com/apache/kafka/pull/9137#discussion_r470536034
## File path: streams/src/test/java/org/apache/kafka/streams/state/internals/AbstractKeyValueStoreTest.java ##
@@ -188,7 +188,55 @@ public void testPutGetRange() {
 }
 @Test
-public void testPutGetRangeWithDefaultSerdes() {
+public void testPutGetReverseRange() {
Review comment: Just realized that; I also thought that path was tested. Good catch! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
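For context, the path in question is the reverse range iteration this PR adds for KeyValueStore (KIP-617). A minimal sketch of how it might be exercised, assuming the KIP-617 method shape and a store wired up elsewhere; the key and value types here are illustrative, not taken from the PR:

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public final class ReverseRangeSketch {
    // Iterates keys in [0, 3] in descending order via KIP-617's reverseRange.
    // try-with-resources closes the iterator and releases its store resources.
    public static void printReverse(final KeyValueStore<Integer, String> store) {
        try (final KeyValueIterator<Integer, String> iter = store.reverseRange(0, 3)) {
            while (iter.hasNext()) {
                final KeyValue<Integer, String> entry = iter.next();
                System.out.println(entry.key + " -> " + entry.value);
            }
        }
    }
}
```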
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177610#comment-17177610 ] Rohan Desai commented on KAFKA-10396: - In addition to configuring the write buffer manager, make sure you close the iterator you open in {{getDefaultDataFromStore}}
> Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.3.1, 2.5.0
> Reporter: Vagesh Mathapati
> Priority: Critical
> Attachments: MyStreamProcessor.java, kafkaStreamConfig.java
>
> We are observing that the overall memory of our container keeps growing and never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of memory off-heap and never releases them. This causes an OOM kill after memory reaches the configured limit.
> We use Kafka Streams and GlobalKTable for many of our kafka topics.
> Below is our environment:
> * Kubernetes cluster
> * openjdk 11.0.7 2020-04-14 LTS
> * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
> * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
> * Springboot 2.3
> * spring-kafka-2.5.0
> * kafka-streams-2.5.0
> * kafka-streams-avro-serde-5.4.0
> * rocksdbjni-5.18.3
> We observed the same result with kafka 2.3.
> Below is a snippet of our analysis. From the pmap output we took addresses from these 64M allocations (RSS):
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match these with memory allocation logs enabled with the help of the Azul Systems team:
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ff7ca0
> @ /tmp/librocksdbjni6564497922441568920.so: _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4] - 0x7f3ce8ff9780
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ff9750
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ff97c0
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ffccf0
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ffcd10
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ffccf0
> @ /tmp/librocksdbjni6564497922441568920.so: _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ffcd10
> We also identified that the content of this 64M is just 0s, with no data present in it.
> I tried to tune the RocksDB configuration as mentioned, but it did not help:
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>
> Please let me know if you need any more details
-- This message was sent by Atlassian Jira (v8.3.4#803005)
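To make the iterator advice above concrete: {{getDefaultDataFromStore}} lives in the attached sources and is not shown here, so the following is only an illustrative sketch of the leak-free pattern. A Streams {{KeyValueIterator}} pins native RocksDB resources until it is closed, and try-with-resources guarantees the close even if iteration throws.
{code:java}
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public final class StoreScanSketch {
    // Counts entries while guaranteeing the underlying RocksDB iterator is
    // released; without close(), each call leaks native (off-heap) memory.
    public static long countEntries(final ReadOnlyKeyValueStore<String, String> store) {
        long count = 0;
        try (final KeyValueIterator<String, String> iter = store.all()) {
            while (iter.hasNext()) {
                iter.next();
                count++;
            }
        } // iterator (and its RocksDB snapshot) closed here
        return count;
    }
}
{code}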
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177579#comment-17177579 ] Vagesh Mathapati commented on KAFKA-10396: -- I can see the below output for pmap -X:
{noformat}
7f806e181000 r-xp 08:90 13 540 204 110 204 0 0 0 libjemalloc.so
7f806e208000 ---p 00087000 08:90 13 2044 0 0 0 0 0 0 libjemalloc.so
7f806e407000 r--p 00086000 08:90 13 24 24 20 24 16 0 0 libjemalloc.so
7f806e40d000 rw-p 0008c000 08:90 13 4 4 4 4 4 0 0 libjemalloc.so
{noformat}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [kafka] chia7712 commented on pull request #9162: MINOR: refactor Log to get rid of "return" in nested anonymous function
chia7712 commented on pull request #9162: URL: https://github.com/apache/kafka/pull/9162#issuecomment-673942147 only ```shouldUpgradeFromEosAlphaToEosBeta``` fails. retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177577#comment-17177577 ] Rohan Desai commented on KAFKA-10396: - It may also be a different problem. Something else to try would be to configure the write buffer manager from your rocksdb config setter (I don't think the example in the streams doc does this). You could take a look at the ksqlDB config-setter as an example: [https://github.com/confluentinc/ksql/blob/master/ksqldb-rocksdb-config-setter/src/main/java/io/confluent/ksql/rocksdb/KsqlBoundedMemoryRocksDBConfigSetter.java]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
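For reference, a minimal sketch of what such a config setter can look like, modeled on the bounded-memory pattern in the streams docs and the linked ksqlDB class. The class name and byte budgets are illustrative assumptions, not tuning advice:
{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryConfigSetter implements RocksDBConfigSetter {

    // One shared budget for ALL stores in the instance; static so every
    // store's setConfig() call reuses the same objects.
    private static final Cache CACHE = new LRUCache(64 * 1024 * 1024L, -1, false, 0.1);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
        new WriteBufferManager(32 * 1024 * 1024L, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        // Charge memtable usage against the shared cache budget
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);
        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(CACHE);
        // Count index and filter blocks against the cache instead of untracked memory
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // CACHE and WRITE_BUFFER_MANAGER are shared across stores; do not close them here
    }
}
{code}
The setter would then be registered through the {{rocksdb.config.setter}} streams config so that every RocksDB store in the application draws from the same bounded pool.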
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177574#comment-17177574 ] Rohan Desai commented on KAFKA-10396: - [~vmathapati] it should show up on your pmap. For example:
{code:java}
sudo pmap -X 13799
13799: ./a.out
Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked ProtectionKey Mapping
...
7fa5c36da000 r--p 103:02 274900 28 28 28 28 0 0 0 00 0 0 0 libjemalloc.so.2
7fa5c36e1000 r-xp 7000 103:02 274900 608 396 396 396 0 0 0 00 0 0 0 libjemalloc.so.2
7fa5c3779000 r--p 0009f000 103:02 274900 64 64 64 64 0 0 0 00 0 0 0 libjemalloc.so.2
7fa5c3789000 ---p 000af000 103:02 2749004 0 0 0 0 0 0 00 0 0 0 libjemalloc.so.2
7fa5c378a000 r--p 000af000 103:02 274900 20 20 20 2016 0 0 00 0 0 0 libjemalloc.so.2
7fa5c378f000 rw-p 000b4000 103:02 2749004 4 4 4 4 0 0 00 0 0 0 libjemalloc.so.2
...{code}
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177573#comment-17177573 ] Vagesh Mathapati commented on KAFKA-10396: -- Is there anything I can check to see whether jemalloc is really coming into the picture?
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177566#comment-17177566 ] Vagesh Mathapati commented on KAFKA-10396: -- I tried with [https://centos.pkgs.org/8/epel-x86_64/jemalloc-5.2.1-2.el8.x86_64.rpm.html] and then used the rpm command, as I do not have the apt installer on my kubernetes cluster. I can see "/usr/lib64/libjemalloc.so.2", so I copied only this file into my pod, renamed it to libjemalloc.so, and set the environment as follows:
{noformat}
- name: LD_PRELOAD
  value: /cmservice/data/libjemalloc.so
{noformat}
After the test run I still observed the same issue where memory keeps on growing.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-6585) Consolidate duplicated logic on reset tools
[ https://issues.apache.org/jira/browse/KAFKA-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177563#comment-17177563 ] Francesco Nobilia commented on KAFKA-6585: -- [~Samuel Hawker] are you still working on it? Otherwise, I am happy to take over. =)
> Consolidate duplicated logic on reset tools
> ---
>
> Key: KAFKA-6585
> URL: https://issues.apache.org/jira/browse/KAFKA-6585
> Project: Kafka
> Issue Type: Improvement
> Reporter: Guozhang Wang
> Assignee: Samuel Hawker
> Priority: Minor
> Labels: newbie
>
> The consumer reset tool and the streams reset tool today share a lot of common logic, such as resetting to a datetime. We can consolidate it into a common class that depends directly on the admin client and simply let these tools use the class.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-10401) GroupMetadataManager ignores current_state_timestamp field for GROUP_METADATA_VALUE_SCHEMA_V3
[ https://issues.apache.org/jira/browse/KAFKA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek updated KAFKA-10401: -- Description: While reading group metadata information from ByteBuffer GroupMetadataManager reads current_state_timestamp only for group schema version 2. For all other versions this value is set to "None". Piece of code responsible for the bug: [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] Restarting kafka forces GroupMetadataManager manager to reload group metadata from file basically causing [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]] to be only applicable for schema version 2. was: While reading group metadata information from ByteBuffer GroupMetadataManager reads current_state_timestamp only for group schema version 2. For all other versions this value is set to "None". Piece of code responsible for the bug: [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] Restarting kafka forces GroupMetadataManager manager to reload group metadata from file basically causing [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]] to be only applicable for schema version 2. > GroupMetadataManager ignores current_state_timestamp field for > GROUP_METADATA_VALUE_SCHEMA_V3 > - > > Key: KAFKA-10401 > URL: https://issues.apache.org/jira/browse/KAFKA-10401 > Project: Kafka > Issue Type: Bug > Components: offset manager >Affects Versions: 2.1.1, 2.2.2, 2.4.1, 2.6.0, 2.5.1 >Reporter: Marek >Priority: Major > > While reading group metadata information from ByteBuffer GroupMetadataManager > reads current_state_timestamp only for group schema version 2. For all other > versions this value is set to "None". > Piece of code responsible for the bug: > [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] > Restarting kafka forces GroupMetadataManager manager to reload group metadata > from file basically causing > [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]] > to be only applicable for schema version 2. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-10401) GroupMetadataManager ignores current_state_timestamp field for GROUP_METADATA_VALUE_SCHEMA_V3
[ https://issues.apache.org/jira/browse/KAFKA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek updated KAFKA-10401: -- Description: While reading group metadata information from ByteBuffer GroupMetadataManager reads current_state_timestamp only for group schema version 2. For all other versions this value is set to "None". Piece of code responsible for the bug: [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] Restarting kafka forces GroupMetadataManager manager to reload group metadata from file basically causing [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets]] to be only applicable for schema version 2. was: While reading group metadata information from ByteBuffer GroupMetadataManager reads current_state_timestamp only for group schema version 2. For all other versions this value is set to "None". Piece of code responsible for the bug: [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] Restarting kafka forces GroupMetadataManager manager to reload group metadata from file basically causing [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]] to be only applicable for schema version 2. > GroupMetadataManager ignores current_state_timestamp field for > GROUP_METADATA_VALUE_SCHEMA_V3 > - > > Key: KAFKA-10401 > URL: https://issues.apache.org/jira/browse/KAFKA-10401 > Project: Kafka > Issue Type: Bug > Components: offset manager >Affects Versions: 2.1.1, 2.2.2, 2.4.1, 2.6.0, 2.5.1 >Reporter: Marek >Priority: Major > > While reading group metadata information from ByteBuffer GroupMetadataManager > reads current_state_timestamp only for group schema version 2. For all other > versions this value is set to "None". > Piece of code responsible for the bug: > [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] > Restarting kafka forces GroupMetadataManager manager to reload group metadata > from file basically causing > [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets]] > to be only applicable for schema version 2. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-10401) GroupMetadataManager ignores current_state_timestamp field for GROUP_METADATA_VALUE_SCHEMA_V3
[ https://issues.apache.org/jira/browse/KAFKA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek updated KAFKA-10401: -- Description: While reading group metadata information from ByteBuffer GroupMetadataManager reads current_state_timestamp only for group schema version 2. For all other versions this value is set to "None". Piece of code responsible for the bug: [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] Restarting kafka forces GroupMetadataManager manager to reload group metadata from file basically causing [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]] to be only applicable for schema version 2. was: While reading group metadata information from ByteBuffer GroupMetadataManager reads current_state_timestamp only for group schema version 2. For all other versions this value is set to "None". Piece of code responsible for bug: [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] Restarting kafka forces GroupMetadataManager manager to reload group metadata from file basically causing [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]] to be only applicable for schema version 2. > GroupMetadataManager ignores current_state_timestamp field for > GROUP_METADATA_VALUE_SCHEMA_V3 > - > > Key: KAFKA-10401 > URL: https://issues.apache.org/jira/browse/KAFKA-10401 > Project: Kafka > Issue Type: Bug > Components: offset manager >Affects Versions: 2.1.1, 2.2.2, 2.4.1, 2.6.0, 2.5.1 >Reporter: Marek >Priority: Major > > While reading group metadata information from ByteBuffer GroupMetadataManager > reads current_state_timestamp only for group schema version 2. For all other > versions this value is set to "None". > Piece of code responsible for the bug: > [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] > Restarting kafka forces GroupMetadataManager manager to reload group metadata > from file basically causing > [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]] > to be only applicable for schema version 2. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-10401) GroupMetadataManager ignores current_state_timestamp field for GROUP_METADATA_VALUE_SCHEMA_V3
Marek created KAFKA-10401: - Summary: GroupMetadataManager ignores current_state_timestamp field for GROUP_METADATA_VALUE_SCHEMA_V3 Key: KAFKA-10401 URL: https://issues.apache.org/jira/browse/KAFKA-10401 Project: Kafka Issue Type: Bug Components: offset manager Affects Versions: 2.5.1, 2.6.0, 2.4.1, 2.2.2, 2.1.1 Reporter: Marek While reading group metadata information from ByteBuffer GroupMetadataManager reads current_state_timestamp only for group schema version 2. For all other versions this value is set to "None". Piece of code responsible for bug: [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412] Restarting kafka forces GroupMetadataManager manager to reload group metadata from file basically causing [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]] to be only applicable for schema version 2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
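Illustratively, the fix described here amounts to widening the version check applied when the group metadata value is deserialized: schema v3 also carries current_state_timestamp, so the guard needs to accept any version >= 2 rather than exactly 2. A sketch in Java; the real code is Scala inside GroupMetadataManager, and the names and the -1 "not set" sentinel below are assumptions for illustration only:
{code:java}
import java.util.Optional;

public final class GroupMetadataTimestampSketch {
    // v0/v1 predate the field; v2 and v3 both include it. Assume -1 encodes
    // "not set" and map it to empty, mirroring the described intent.
    public static Optional<Long> currentStateTimestamp(final short version, final long rawTimestamp) {
        if (version >= 2) {
            return rawTimestamp == -1L ? Optional.empty() : Optional.of(rawTimestamp);
        }
        return Optional.empty();
    }
}
{code}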
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177530#comment-17177530 ] Rohan Desai commented on KAFKA-10396: - [~vmathapati] you'd have to install it, and then look for it in the lib directories. The exact steps depend on the system. For example, on debian/ubuntu you can run: {{apt install libjemalloc-dev}} Then, run {{sudo find /usr/lib/ -name libjemalloc.so}} to get the so path.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached
[ https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177527#comment-17177527 ] Vagesh Mathapati commented on KAFKA-10396: -- [~desai.p.rohan] Thanks for responding. For debugging the same issue I preloaded a few of the .so files, so I can try out the same steps. I am not able to find the jemalloc .so file at the above link. Would it be possible for you to point me to the location of the .so file?
-- This message was sent by Atlassian Jira (v8.3.4#803005)