[jira] [Commented] (KAFKA-9543) Consumer offset reset after new segment rolling

2020-08-14 Thread Andrey Klochkov (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178099#comment-17178099
 ] 

Andrey Klochkov commented on KAFKA-9543:


This issue is marked as fixed in 2.4.2 and 2.5.1, with the last comment saying 
that it's fixed by KAFKA-9838, while KAFKA-9838 has a fix version of 2.6.0. Is 
this defect fixed in 2.5.1?

> Consumer offset reset after new segment rolling
> ---
>
> Key: KAFKA-9543
> URL: https://issues.apache.org/jira/browse/KAFKA-9543
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
>Reporter: Rafał Boniecki
>Priority: Major
> Fix For: 2.4.2, 2.5.1
>
> Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png
>
>
> After upgrade from kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer 
> offset resets.
> Consumer:
> {code:java}
> 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 
> [2020-02-12T11:12:58,402][INFO 
> ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer 
> clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of 
> range for partition stats-5, resetting offset
> {code}
> Broker:
> {code:java}
> 2020-02-12 11:12:58:400 CET INFO  
> [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, 
> dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code}
> All resets are perfectly correlated with rolling new segments at the broker: the 
> segment is rolled first, then, a couple of milliseconds later, the reset occurs on 
> the consumer. Attached is a Grafana graph with consumer lag per partition. All 
> sudden spikes in lag are offset resets due to this bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] ijuma commented on pull request #9180: MINOR: corrected unit tests

2020-08-14 Thread GitBox


ijuma commented on pull request #9180:
URL: https://github.com/apache/kafka/pull/9180#issuecomment-674304221


   ok to test



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.PlaintextAdminIntegrationTest.testLogStartOffsetCheckpoint and kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDi

2020-08-14 Thread Bill Bejeck (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178021#comment-17178021
 ] 

Bill Bejeck commented on KAFKA-9263:


 

Test failure 
[https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/7843/testReport/junit/kafka.api/PlaintextAdminIntegrationTest/testAlterReplicaLogDirs/]

 
{noformat}
[2020-08-14 17:34:06,420] ERROR [ReplicaManager broker=0] Error while changing 
replica dir for partition topic-0 (kafka.server.ReplicaManager:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while 
fetching partition state for topic-0
[2020-08-14 17:34:06,420] ERROR [ReplicaManager broker=1] Error while changing 
replica dir for partition topic-0 (kafka.server.ReplicaManager:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while 
fetching partition state for topic-0
[2020-08-14 17:34:06,420] ERROR [ReplicaManager broker=2] Error while changing 
replica dir for partition topic-0 (kafka.server.ReplicaManager:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while 
fetching partition state for topic-0
[2020-08-14 17:36:24,822] ERROR [Consumer instanceId=test_instance_id_1, 
clientId=test_client_id, groupId=test_group_id] Offset commit failed on 
partition test_topic-1 at offset 0: The coordinator is not aware of this 
member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1191)
[2020-08-14 17:36:24,823] ERROR Thread Thread[Thread-4,5,FailOnTimeoutGroup] 
died (org.apache.zookeeper.server.NIOServerCnxnFactory:92)
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed since the group has already rebalanced and assigned the partitions to 
another member. This means that the time between subsequent calls to poll() was 
longer than the configured max.poll.interval.ms, which typically implies that 
the poll loop is spending too much time message processing. You can address 
this either by increasing max.poll.interval.ms or by reducing the maximum size 
of batches returned in poll() with max.poll.records.
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:1257)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:1164)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1132)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1107)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1006)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1394)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1348)
at 
kafka.api.PlaintextAdminIntegrationTest$$anon$1.run(PlaintextAdminIntegrationTest.scala:1071)
[2020-08-14 17:36:24,848] ERROR [Consumer instanceId=test_instance_id_2, 
clientId=test_client_id, groupId=test_group_id] Offset commit failed on 
partition test_topic1-0 at offset 0: Specified group generation id is not 
valid. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1191)
[2020-08-14 17:36:24,852] ERROR [Consumer clientId=test_client_id, 
groupId=test_group_id] Offset commit failed on partition test_topic2-0 at 
offset 0: Specified group generation id is not valid. 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1191)
[2020-08-14 18:29:44,855] ERROR [ReplicaManager broker=1] Error while changing 
replica dir for partition topic-0 (kafka.server.ReplicaManager:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while 
fetching partition state for topic-0
[2020-08-14 18:29:44,855] ERROR [ReplicaManager broker=0] 

[GitHub] [kafka] bbejeck commented on pull request #9108: KAFKA-9273: Extract testShouldAutoShutdownOnIncompleteMetadata from S…

2020-08-14 Thread GitBox


bbejeck commented on pull request #9108:
URL: https://github.com/apache/kafka/pull/9108#issuecomment-674229113


   Java 8 passed
   Java 14 failed with 
`org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest.shouldRestoreState`
   Java 11 failed with 
`kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDirs`
   
   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (KAFKA-10405) Flaky Test org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest.shouldRestoreState

2020-08-14 Thread Bill Bejeck (Jira)
Bill Bejeck created KAFKA-10405:
---

 Summary: Flaky Test 
org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest.shouldRestoreState
 Key: KAFKA-10405
 URL: https://issues.apache.org/jira/browse/KAFKA-10405
 Project: Kafka
  Issue Type: Test
  Components: streams
Reporter: Bill Bejeck


From build [https://builds.apache.org/job/kafka-pr-jdk14-scala2.13/1979/]

 
{noformat}
org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest > 
shouldRestoreState FAILED
14:25:19 java.lang.AssertionError: Condition not met within timeout 6. 
Repartition topic 
restore-test-KSTREAM-AGGREGATE-STATE-STORE-02-repartition not purged 
data after 6 ms.
14:25:19 at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:26)
14:25:19 at 
org.apache.kafka.test.TestUtils.lambda$waitForCondition$6(TestUtils.java:401)
14:25:19 at 
org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:449)
14:25:19 at 
org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:417)
14:25:19 at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:398)
14:25:19 at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:388)
14:25:19 at 
org.apache.kafka.streams.integration.PurgeRepartitionTopicIntegrationTest.shouldRestoreState(PurgeRepartitionTopicIntegrationTest.java:206){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-08-14 Thread Rajini Sivaram (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177995#comment-17177995
 ] 

Rajini Sivaram commented on KAFKA-10158:


[~bbyrne] We can close this now, right?

> Fix flaky 
> kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
> ---
>
> Key: KAFKA-10158
> URL: https://issues.apache.org/jira/browse/KAFKA-10158
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Chia-Ping Tsai
>Assignee: Brian Byrne
>Priority: Minor
> Fix For: 2.7.0
>
>
> Altering the assignments is an async request, so it is possible that the 
> reassignment is still in progress when we start to verify 
> "under-replicated-partitions". In order to make the test stable, it needs to wait 
> for the reassignment to complete before verifying the topic command with 
> "under-replicated-partitions".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Vagesh Mathapati (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177994#comment-17177994
 ] 

Vagesh Mathapati commented on KAFKA-10396:
--

In my microservice I am using a GlobalKTable for 5 different topics. So the 
RocksDB configuration that I am setting is per KTable/KStream, right?

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: CustomRocksDBConfig.java, MyStreamProcessor.java, 
> kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps on growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks 
> of memory off-heap and never releases them back. This causes an OOM kill after 
> memory reaches the configured limit.
> We use Kafka stream and globalktable for our many kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is just zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as mentioned, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Vagesh Mathapati (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177982#comment-17177982
 ] 

Vagesh Mathapati commented on KAFKA-10396:
--

[~desai.p.rohan] Thanks a lot. It looks like if I use the WriteBufferManager to 
configure memory, then memory does not grow beyond some limit.

Attaching [^CustomRocksDBConfig.java] for reference.

I can see some documentation and an example related to it at 
[https://kafka.apache.org/23/documentation/streams/developer-guide/memory-mgmt.html]
 

The Confluent documentation, I think, needs to be updated.

I still need to test more to confirm, but the initial results look promising.

 

I am not able to come up with an optimized number. Generally, what numbers 
should we keep?
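For reference, a minimal sketch along the lines of the Streams memory-management guide linked above (the class name and byte sizes are illustrative, not taken from the attached CustomRocksDBConfig.java):

{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    // Shared by every RocksDB instance in the application (all KTables/GlobalKTables),
    // so the totals below bound off-heap usage across all stores, not per store.
    private static final long TOTAL_OFF_HEAP_MEMORY = 256 * 1024 * 1024L;  // illustrative
    private static final long TOTAL_MEMTABLE_MEMORY = 64 * 1024 * 1024L;   // illustrative
    private static final Cache CACHE = new LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, 0.1);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
        new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(CACHE);
        tableConfig.setCacheIndexAndFilterBlocks(true);        // count index/filter blocks against the cache
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);   // memtables are charged to the same cache
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Do not close CACHE or WRITE_BUFFER_MANAGER here; they are shared across stores.
    }
}
{code}

A setter like this is registered via the rocksdb.config.setter property (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG); because the cache and write buffer manager are static, the bound applies across all RocksDB state stores in the application rather than per KTable.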

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: CustomRocksDBConfig.java, MyStreamProcessor.java, 
> kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps on growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks 
> of memory off-heap and never releases them back. This causes an OOM kill after 
> memory reaches the configured limit.
> We use Kafka stream and globalktable for our many kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is just zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as mentioned, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] rajinisivaram opened a new pull request #9184: KAFKA-8033; Wait for NoOffsetForPartitionException in testFetchInvalidOffset

2020-08-14 Thread GitBox


rajinisivaram opened a new pull request #9184:
URL: https://github.com/apache/kafka/pull/9184


   We wait only 50ms in consumer.poll() and expect 
NoOffsetForPartitionException, but the exception is thrown only when 
initializing partition offsets, after the coordinator is known. Increased the 
poll timeout to make the test reliable.
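   For context, a minimal sketch of the scenario being exercised (topic, group id, bootstrap address and timeouts are illustrative, not the test's actual values):
   
   ```java
   import java.time.Duration;
   import java.util.Collections;
   import java.util.Properties;
   import org.apache.kafka.clients.consumer.ConsumerConfig;
   import org.apache.kafka.clients.consumer.KafkaConsumer;
   import org.apache.kafka.common.TopicPartition;
   import org.apache.kafka.common.serialization.ByteArrayDeserializer;
   
   // With auto.offset.reset=none and no committed offset, poll() throws
   // NoOffsetForPartitionException, but only once offset initialization runs after
   // the coordinator is found, so a 50ms poll can return before that happens.
   public class FetchInvalidOffsetSketch {
       public static void main(String[] args) {
           Properties props = new Properties();
           props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
           props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");
           props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");
           props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
           props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
   
           try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
               consumer.assign(Collections.singletonList(new TopicPartition("topic", 0)));
               consumer.poll(Duration.ofSeconds(15));  // long enough for the exception to surface
           }
       }
   }
   ```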
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Vagesh Mathapati (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vagesh Mathapati updated KAFKA-10396:
-
Attachment: CustomRocksDBConfig.java

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: CustomRocksDBConfig.java, MyStreamProcessor.java, 
> kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps on growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks 
> of memory off-heap and never releases them back. This causes an OOM kill after 
> memory reaches the configured limit.
> We use Kafka stream and globalktable for our many kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is just zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as mentioned, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5896) Kafka Connect task threads never interrupted

2020-08-14 Thread Ewen Cheslack-Postava (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177973#comment-17177973
 ] 

Ewen Cheslack-Postava commented on KAFKA-5896:
--

[~yazgoo] It looks like this was marked "Abandoned". Reiterating my previous 
comment:

> Really my complaint is that Java's interrupt semantics are terrible and try 
>to give the feeling of preemption when it can't and therefore leaves a ton of 
>bugs and overpromised underdelivered guarantees in its wake.

It's fine to reopen if you think it needs more discussion; I just don't see a 
way to actually fix the issue – Thread.interrupt doesn't do what we'd want, and 
AFAIK the JVM doesn't provide anything that does. So I think, given those 
constraints, it's probably better to identify the connector that is behaving 
badly and work with upstream to address it.
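
A toy sketch of that limitation (illustrative code, not anything from Connect): a thread blocked acquiring an intrinsic lock, or stuck in plain blocking I/O, simply never observes the interrupt; interrupt() only sets a flag and wakes interruptible blocking calls such as sleep() and wait().

{code:java}
import java.util.concurrent.TimeUnit;

public class InterruptLimitation {
    public static void main(String[] args) throws Exception {
        final Object lock = new Object();
        Thread task;
        synchronized (lock) {                       // main holds the lock the task wants
            task = new Thread(() -> {
                synchronized (lock) { /* blocked here; interrupt() does not unblock this */ }
            }, "stuck-task");
            task.start();
            TimeUnit.MILLISECONDS.sleep(200);       // let the task reach the blocked state
            task.interrupt();                       // sets the interrupt flag, nothing else happens
            task.join(1_000);
            System.out.println("alive after interrupt: " + task.isAlive());  // prints true
        }
        // Only once main releases the lock can the task proceed and terminate.
    }
}
{code}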

> Kafka Connect task threads never interrupted
> 
>
> Key: KAFKA-5896
> URL: https://issues.apache.org/jira/browse/KAFKA-5896
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Nick Pillitteri
>Assignee: Nick Pillitteri
>Priority: Minor
>
> h2. Problem
> Kafka Connect tasks associated with connectors are run in their own threads. 
> When tasks are stopped or restarted, a flag is set - {{stopping}} - to 
> indicate the task should stop processing records. However, if the thread the 
> task is running in is blocked (waiting for a lock or performing I/O) it's 
> possible the task will never stop.
> I've created a connector specifically to demonstrate this issue (along with 
> some more detailed instructions for reproducing the issue): 
> https://github.com/smarter-travel-media/hang-connector
> I believe this is an issue because it means that a single badly behaved 
> connector (any connector that does I/O without timeouts) can cause the Kafka 
> Connect worker to get into a state where the only solution is to restart the 
> JVM.
> I think, but couldn't reproduce, that this is the cause of this problem on 
> Stack Overflow: 
> https://stackoverflow.com/questions/43802156/inconsistent-connector-state-connectexception-task-already-exists-in-this-work
> h2. Expected Result
> I would expect the Worker to eventually interrupt the thread that the task is 
> running in. In the past across various other libraries, this is what I've 
> seen done when a thread needs to be forcibly stopped.
> h2. Actual Result
> In actuality, the Worker sets a {{stopping}} flag and lets the thread run 
> indefinitely. It uses a timeout while waiting for the task to stop but after 
> this timeout has expired it simply sets a {{cancelled}} flag. This means that 
> every time a task is restarted, a new thread running the task will be 
> created. Thus a task may end up with multiple instances all running in their 
> own threads when there's only supposed to be a single thread.
> h2. Steps to Reproduce
> The problem can be replicated by using the connector available here: 
> https://github.com/smarter-travel-media/hang-connector
> Apologies for how involved the steps are.
> I've created a patch that forcibly interrupts threads after they fail to 
> gracefully shutdown here: 
> https://github.com/smarter-travel-media/kafka/commit/295c747a9fd82ee8b30556c89c31e0bfcce5a2c5
> I've confirmed that this fixes the issue. I can add some unit tests and 
> submit a PR if people agree that this is a bug and interrupting threads is 
> the right fix.
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-8033) Flaky Test PlaintextConsumerTest#testFetchInvalidOffset

2020-08-14 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-8033:
-

Assignee: Rajini Sivaram

> Flaky Test PlaintextConsumerTest#testFetchInvalidOffset
> ---
>
> Key: KAFKA-8033
> URL: https://issues.apache.org/jira/browse/KAFKA-8033
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.7.0, 2.6.1
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/2829/testReport/junit/kafka.api/PlaintextConsumerTest/testFetchInvalidOffset/]
> {quote}org.scalatest.junit.JUnitTestFailedError: Expected exception 
> org.apache.kafka.clients.consumer.NoOffsetForPartitionException to be thrown, 
> but no exception was thrown{quote}
> STDOUT prints this over and over again:
> {quote}[2019-03-02 04:01:25,576] ERROR [ReplicaFetcher replicaId=0, 
> leaderId=1, fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76){quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] ijuma commented on pull request #9067: MINOR: Streams integration tests should not call exit

2020-08-14 Thread GitBox


ijuma commented on pull request #9067:
URL: https://github.com/apache/kafka/pull/9067#issuecomment-674193106


   @mjsax did you run checkstyle and spotBugs before pushing the 2.6 change? It 
seems like it broke the build:
   
   https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/3/console



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] rajinisivaram opened a new pull request #9183: KAFKA-10404; Use higher poll timeout to avoid rebalance in testCoordinatorFailover

2020-08-14 Thread GitBox


rajinisivaram opened a new pull request #9183:
URL: https://github.com/apache/kafka/pull/9183


   Tests use a 6s poll timeout, which isn't sufficient to ensure that clients 
don't leave the group due to the poll timeout. The PR increases the poll timeout 
to 15s. The session timeout of 5s is also low, but it is left as-is for now, 
with a lower heartbeat timeout, to make sure we catch unexpected rebalances 
during the failover.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (KAFKA-10404) Flaky Test kafka.api.SaslSslConsumerTest.testCoordinatorFailover

2020-08-14 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-10404:
--

Assignee: Rajini Sivaram

> Flaky Test kafka.api.SaslSslConsumerTest.testCoordinatorFailover
> 
>
> Key: KAFKA-10404
> URL: https://issues.apache.org/jira/browse/KAFKA-10404
> Project: Kafka
>  Issue Type: Test
>  Components: core, unit tests
>Reporter: Bill Bejeck
>Assignee: Rajini Sivaram
>Priority: Major
>
> From build [https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3829/]
>  
> {noformat}
> kafka.api.SaslSslConsumerTest > testCoordinatorFailover FAILED
> 11:27:15 java.lang.AssertionError: expected: but 
> was: commit cannot be completed since the consumer is not part of an active group 
> for auto partition assignment; it is likely that the consumer was kicked out 
> of the group.)>
> 11:27:15 at org.junit.Assert.fail(Assert.java:89)
> 11:27:15 at org.junit.Assert.failNotEquals(Assert.java:835)
> 11:27:15 at org.junit.Assert.assertEquals(Assert.java:120)
> 11:27:15 at org.junit.Assert.assertEquals(Assert.java:146)
> 11:27:15 at 
> kafka.api.AbstractConsumerTest.sendAndAwaitAsyncCommit(AbstractConsumerTest.scala:195)
> 11:27:15 at 
> kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:302)
> 11:27:15 at 
> kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:76)
> 11:27:15 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> 11:27:15 at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 11:27:15 at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 11:27:15 at java.lang.reflect.Method.invoke(Method.java:498)
> 11:27:15 at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 11:27:15 at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 11:27:15 at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 11:27:15 at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 11:27:15 at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 11:27:15 at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 11:27:15 at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 11:27:15 at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> 11:27:15 at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> 11:27:15 at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> 11:27:15 at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> 11:27:15 at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 11:27:15 at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 11:27:15 at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> 11:27:15 at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> 11:27:15 at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> 11:27:15 at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 11:27:15 at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 11:27:15 at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 11:27:15 at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> 11:27:15 at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
> 11:27:15 at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
> 11:27:15 at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
> 11:27:15 at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
> 11:27:15 at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
> 11:27:15 at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown 
> Source)
> 11:27:15 at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 11:27:15 at java.lang.reflect.Method.invoke(Method.java:498)
> 11:27:15 at 
> 

[GitHub] [kafka] bbejeck commented on pull request #9108: KAFKA-9273: Extract testShouldAutoShutdownOnIncompleteMetadata from S…

2020-08-14 Thread GitBox


bbejeck commented on pull request #9108:
URL: https://github.com/apache/kafka/pull/9108#issuecomment-674162578


   Java 8 failed on an unrelated test, 
`kafka.api.SaslSslConsumerTest.testCoordinatorFailover`; created a Jira ticket.
   
   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (KAFKA-10404) Flaky Test kafka.api.SaslSslConsumerTest.testCoordinatorFailover

2020-08-14 Thread Bill Bejeck (Jira)
Bill Bejeck created KAFKA-10404:
---

 Summary: Flaky Test 
kafka.api.SaslSslConsumerTest.testCoordinatorFailover
 Key: KAFKA-10404
 URL: https://issues.apache.org/jira/browse/KAFKA-10404
 Project: Kafka
  Issue Type: Test
  Components: core, unit tests
Reporter: Bill Bejeck


From build [https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3829/]

 
{noformat}
kafka.api.SaslSslConsumerTest > testCoordinatorFailover FAILED
11:27:15 java.lang.AssertionError: expected: but 
was:
11:27:15 at org.junit.Assert.fail(Assert.java:89)
11:27:15 at org.junit.Assert.failNotEquals(Assert.java:835)
11:27:15 at org.junit.Assert.assertEquals(Assert.java:120)
11:27:15 at org.junit.Assert.assertEquals(Assert.java:146)
11:27:15 at 
kafka.api.AbstractConsumerTest.sendAndAwaitAsyncCommit(AbstractConsumerTest.scala:195)
11:27:15 at 
kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:302)
11:27:15 at 
kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:76)
11:27:15 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
11:27:15 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
11:27:15 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
11:27:15 at java.lang.reflect.Method.invoke(Method.java:498)
11:27:15 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
11:27:15 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
11:27:15 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
11:27:15 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
11:27:15 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
11:27:15 at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
11:27:15 at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
11:27:15 at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
11:27:15 at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
11:27:15 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
11:27:15 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
11:27:15 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
11:27:15 at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
11:27:15 at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
11:27:15 at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
11:27:15 at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
11:27:15 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
11:27:15 at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
11:27:15 at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
11:27:15 at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
11:27:15 at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
11:27:15 at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
11:27:15 at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
11:27:15 at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
11:27:15 at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
11:27:15 at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
11:27:15 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
11:27:15 at java.lang.reflect.Method.invoke(Method.java:498)
11:27:15 at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
11:27:15 at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
11:27:15 at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
11:27:15 at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
11:27:15 at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
11:27:15 at 

[GitHub] [kafka] ijuma commented on pull request #9182: KAFKA-10403 Replace scala collection by java collection in generating…

2020-08-14 Thread GitBox


ijuma commented on pull request #9182:
URL: https://github.com/apache/kafka/pull/9182#issuecomment-674160788


   Can you please include more detail on how it fails currently? I assume it's 
some kind of deserialization error?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] chia7712 opened a new pull request #9182: KAFKA-10403 Replace scala collection by java collection in generating…

2020-08-14 Thread GitBox


chia7712 opened a new pull request #9182:
URL: https://github.com/apache/kafka/pull/9182


   issue: https://issues.apache.org/jira/browse/KAFKA-10403
   
   It seems to me the metrics are a "kind" of public interface, so users should 
be able to access the metrics of a Kafka server without the Scala library.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (KAFKA-10403) Replace scala collection by java collection in generating MBeans attributes

2020-08-14 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-10403:
--

 Summary: Replace scala collection by java collection in generating 
MBeans attributes
 Key: KAFKA-10403
 URL: https://issues.apache.org/jira/browse/KAFKA-10403
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


It seems to me the metrics are a "kind" of public interface, so users should be 
able to access the metrics of a Kafka server without the Scala library.
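
A minimal sketch of the kind of external access in question, i.e. a plain JMX client reading a broker MBean (the port is illustrative; the MBean shown is just an example):

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// If an attribute value is an instance of a Scala collection class, unmarshalling it
// here fails unless scala-library is on this client's classpath; values built from
// plain java.util types or JMX open types avoid that dependency.
public class ReadKafkaMBean {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
            Object value = conn.getAttribute(name, "Count");
            System.out.println("Count = " + value);
        }
    }
}
{code}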



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-5896) Kafka Connect task threads never interrupted

2020-08-14 Thread yazgoo (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177821#comment-17177821
 ] 

yazgoo edited comment on KAFKA-5896 at 8/14/20, 3:11 PM:
-

I still encountered this exception in CP 5.5; shouldn't this be re-opened?


was (Author: yazgoo):
I still encountered this exception in CP 2.5, shouldn't this be re-opened ? 

> Kafka Connect task threads never interrupted
> 
>
> Key: KAFKA-5896
> URL: https://issues.apache.org/jira/browse/KAFKA-5896
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Nick Pillitteri
>Assignee: Nick Pillitteri
>Priority: Minor
>
> h2. Problem
> Kafka Connect tasks associated with connectors are run in their own threads. 
> When tasks are stopped or restarted, a flag is set - {{stopping}} - to 
> indicate the task should stop processing records. However, if the thread the 
> task is running in is blocked (waiting for a lock or performing I/O) it's 
> possible the task will never stop.
> I've created a connector specifically to demonstrate this issue (along with 
> some more detailed instructions for reproducing the issue): 
> https://github.com/smarter-travel-media/hang-connector
> I believe this is an issue because it means that a single badly behaved 
> connector (any connector that does I/O without timeouts) can cause the Kafka 
> Connect worker to get into a state where the only solution is to restart the 
> JVM.
> I think, but couldn't reproduce, that this is the cause of this problem on 
> Stack Overflow: 
> https://stackoverflow.com/questions/43802156/inconsistent-connector-state-connectexception-task-already-exists-in-this-work
> h2. Expected Result
> I would expect the Worker to eventually interrupt the thread that the task is 
> running in. In the past across various other libraries, this is what I've 
> seen done when a thread needs to be forcibly stopped.
> h2. Actual Result
> In actuality, the Worker sets a {{stopping}} flag and lets the thread run 
> indefinitely. It uses a timeout while waiting for the task to stop but after 
> this timeout has expired it simply sets a {{cancelled}} flag. This means that 
> every time a task is restarted, a new thread running the task will be 
> created. Thus a task may end up with multiple instances all running in their 
> own threads when there's only supposed to be a single thread.
> h2. Steps to Reproduce
> The problem can be replicated by using the connector available here: 
> https://github.com/smarter-travel-media/hang-connector
> Apologies for how involved the steps are.
> I've created a patch that forcibly interrupts threads after they fail to 
> gracefully shutdown here: 
> https://github.com/smarter-travel-media/kafka/commit/295c747a9fd82ee8b30556c89c31e0bfcce5a2c5
> I've confirmed that this fixes the issue. I can add some unit tests and 
> submit a PR if people agree that this is a bug and interrupting threads is 
> the right fix.
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5896) Kafka Connect task threads never interrupted

2020-08-14 Thread yazgoo (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177821#comment-17177821
 ] 

yazgoo commented on KAFKA-5896:
---

I still encountered this exception in CP 2.5; shouldn't this be re-opened?

> Kafka Connect task threads never interrupted
> 
>
> Key: KAFKA-5896
> URL: https://issues.apache.org/jira/browse/KAFKA-5896
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Nick Pillitteri
>Assignee: Nick Pillitteri
>Priority: Minor
>
> h2. Problem
> Kafka Connect tasks associated with connectors are run in their own threads. 
> When tasks are stopped or restarted, a flag is set - {{stopping}} - to 
> indicate the task should stop processing records. However, if the thread the 
> task is running in is blocked (waiting for a lock or performing I/O) it's 
> possible the task will never stop.
> I've created a connector specifically to demonstrate this issue (along with 
> some more detailed instructions for reproducing the issue): 
> https://github.com/smarter-travel-media/hang-connector
> I believe this is an issue because it means that a single badly behaved 
> connector (any connector that does I/O without timeouts) can cause the Kafka 
> Connect worker to get into a state where the only solution is to restart the 
> JVM.
> I think, but couldn't reproduce, that this is the cause of this problem on 
> Stack Overflow: 
> https://stackoverflow.com/questions/43802156/inconsistent-connector-state-connectexception-task-already-exists-in-this-work
> h2. Expected Result
> I would expect the Worker to eventually interrupt the thread that the task is 
> running in. In the past across various other libraries, this is what I've 
> seen done when a thread needs to be forcibly stopped.
> h2. Actual Result
> In actuality, the Worker sets a {{stopping}} flag and lets the thread run 
> indefinitely. It uses a timeout while waiting for the task to stop but after 
> this timeout has expired it simply sets a {{cancelled}} flag. This means that 
> every time a task is restarted, a new thread running the task will be 
> created. Thus a task may end up with multiple instances all running in their 
> own threads when there's only supposed to be a single thread.
> h2. Steps to Reproduce
> The problem can be replicated by using the connector available here: 
> https://github.com/smarter-travel-media/hang-connector
> Apologies for how involved the steps are.
> I've created a patch that forcibly interrupts threads after they fail to 
> gracefully shutdown here: 
> https://github.com/smarter-travel-media/kafka/commit/295c747a9fd82ee8b30556c89c31e0bfcce5a2c5
> I've confirmed that this fixes the issue. I can add some unit tests and 
> submit a PR if people agree that this is a bug and interrupting threads is 
> the right fix.
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] rajinisivaram opened a new pull request #9181: KAFKA-9516; Increase timeout in testNonBlockingProducer to make it more reliable

2020-08-14 Thread GitBox


rajinisivaram opened a new pull request #9181:
URL: https://github.com/apache/kafka/pull/9181


   The test has been timing out occasionally on the first send from a producer, 
so increase the timeout to 30s, similar to some of the other timeouts in 
BaseProducerSendTest.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (KAFKA-9516) Flaky Test PlaintextProducerSendTest#testNonBlockingProducer

2020-08-14 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-9516:
-

Assignee: Rajini Sivaram

> Flaky Test PlaintextProducerSendTest#testNonBlockingProducer
> 
>
> Key: KAFKA-9516
> URL: https://issues.apache.org/jira/browse/KAFKA-9516
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer , tools, unit tests
>Reporter: Matthias J. Sax
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/4521/testReport/junit/kafka.api/PlaintextProducerSendTest/testNonBlockingProducer/]
> {quote}java.util.concurrent.TimeoutException: Timeout after waiting for 1 
> ms. at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:78)
>  at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)
>  at 
> kafka.api.PlaintextProducerSendTest.verifySendSuccess$1(PlaintextProducerSendTest.scala:148)
>  at 
> kafka.api.PlaintextProducerSendTest.testNonBlockingProducer(PlaintextProducerSendTest.scala:172){quote}
> {quote}
> h3. Standard Output
> [2020-02-06 03:35:27,912] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:35:50,812] ERROR 
> [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition 
> topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:35:51,015] ERROR 
> [ReplicaManager broker=0] Error processing append operation on partition 
> topic-0 (kafka.server.ReplicaManager:76) 
> org.apache.kafka.common.errors.InvalidTimestampException: One or more records 
> have been rejected due to invalid timestamp [2020-02-06 03:35:51,027] ERROR 
> [ReplicaManager broker=0] Error processing append operation on partition 
> topic-0 (kafka.server.ReplicaManager:76) 
> org.apache.kafka.common.errors.InvalidTimestampException: One or more records 
> have been rejected due to invalid timestamp [2020-02-06 03:35:53,127] ERROR 
> [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition 
> topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:35:58,617] ERROR 
> [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition 
> topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:36:01,843] ERROR 
> [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition 
> topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:36:05,111] ERROR 
> [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition 
> topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:36:08,383] ERROR 
> [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition 
> topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:36:08,383] ERROR 
> [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition 
> topic-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:36:12,582] ERROR 
> [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition 
> topic-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:36:12,582] ERROR 
> [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition 
> topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition. [2020-02-06 03:36:15,902] ERROR 
> [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition 
> 

[GitHub] [kafka] mumrah commented on a change in pull request #9100: Add AlterISR RPC and use it for ISR modifications

2020-08-14 Thread GitBox


mumrah commented on a change in pull request #9100:
URL: https://github.com/apache/kafka/pull/9100#discussion_r470678123



##
File path: core/src/main/scala/kafka/cluster/Partition.scala
##
@@ -255,6 +255,10 @@ class Partition(val topicPartition: TopicPartition,
 
   def isAddingReplica(replicaId: Int): Boolean = 
assignmentState.isAddingReplica(replicaId)
 
+  // For advancing the HW we assume the largest ISR even if the controller 
hasn't made the change yet
+  // This set includes the latest ISR (as we learned from LeaderAndIsr) and 
any replicas from a pending ISR expansion
+  def effectiveInSyncReplicaIds: Set[Int] = inSyncReplicaIds | 
pendingInSyncReplicaIds

Review comment:
   Since we are now only allowing one in-flight AlterIsr, I changed the 
semantics of pendingInSyncReplicaIds to be the maximal "effective" ISR. This 
way we don't need to compute it each time.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] mumrah commented on a change in pull request #9100: Add AlterISR RPC and use it for ISR modifications

2020-08-14 Thread GitBox


mumrah commented on a change in pull request #9100:
URL: https://github.com/apache/kafka/pull/9100#discussion_r470675300



##
File path: core/src/main/scala/kafka/server/AlterIsrChannelManager.scala
##
@@ -0,0 +1,132 @@
+package kafka.server
+
+import java.util
+import java.util.concurrent.{ScheduledFuture, TimeUnit}
+import java.util.concurrent.atomic.AtomicLong
+
+import kafka.api.LeaderAndIsr
+import kafka.metrics.KafkaMetricsGroup
+import kafka.utils.{Logging, Scheduler}
+import kafka.zk.KafkaZkClient
+import org.apache.kafka.clients.ClientResponse
+import org.apache.kafka.common.TopicPartition
+import 
org.apache.kafka.common.message.AlterIsrRequestData.{AlterIsrRequestPartitions, 
AlterIsrRequestTopics}
+import org.apache.kafka.common.message.{AlterIsrRequestData, 
AlterIsrResponseData}
+import org.apache.kafka.common.protocol.Errors
+import org.apache.kafka.common.requests.{AlterIsrRequest, AlterIsrResponse}
+
+import scala.collection.mutable
+import scala.jdk.CollectionConverters._
+
+/**
+ * Handles the sending of AlterIsr requests to the controller. Updating the 
ISR is an asynchronous operation,
+ * so partitions will learn about updates through LeaderAndIsr messages sent 
from the controller
+ */
+trait AlterIsrChannelManager {
+  val IsrChangePropagationBlackOut = 5000L
+  val IsrChangePropagationInterval = 6L
+
+  def enqueueIsrUpdate(alterIsrItem: AlterIsrItem): Unit
+
+  def clearPending(topicPartition: TopicPartition): Unit
+
+  def startup(): Unit
+
+  def shutdown(): Unit
+}
+
+case class AlterIsrItem(topicPartition: TopicPartition, leaderAndIsr: 
LeaderAndIsr)
+
+class AlterIsrChannelManagerImpl(val controllerChannelManager: 
BrokerToControllerChannelManager,
+ val zkClient: KafkaZkClient,
+ val scheduler: Scheduler,
+ val brokerId: Int,
+ val brokerEpoch: Long) extends 
AlterIsrChannelManager with Logging with KafkaMetricsGroup {
+
+  private val pendingIsrUpdates: mutable.Map[TopicPartition, AlterIsrItem] = 
new mutable.HashMap[TopicPartition, AlterIsrItem]()
+  private val lastIsrChangeMs = new AtomicLong(0)
+  private val lastIsrPropagationMs = new AtomicLong(0)
+
+  @volatile private var scheduledRequest: Option[ScheduledFuture[_]] = None
+
+  override def enqueueIsrUpdate(alterIsrItem: AlterIsrItem): Unit = {
+pendingIsrUpdates synchronized {
+  pendingIsrUpdates(alterIsrItem.topicPartition) = alterIsrItem
+  lastIsrChangeMs.set(System.currentTimeMillis())
+  // Rather than sending right away, we'll delay at most 50ms to allow for 
batching of ISR changes happening
+  // in fast succession
+  if (scheduledRequest.isEmpty) {
+scheduledRequest = Some(scheduler.schedule("propagate-alter-isr", 
propagateIsrChanges, 50, -1, TimeUnit.MILLISECONDS))
+  }
+}
+  }
+
+  override def clearPending(topicPartition: TopicPartition): Unit = {
+pendingIsrUpdates synchronized {
+  // when we get a new LeaderAndIsr, we clear out any pending requests
+  pendingIsrUpdates.remove(topicPartition)

Review comment:
   With the latest changes to prevent multiple in-flight requests, I don't 
think this should happen for a given partition. Even if it did, the retried 
in-flight request from BrokerToControllerRequestThread would fail on the 
controller with an old version. 
   
   I'm wondering if we even need this clearPending behavior. Since I changed 
the AlterIsr request to fire after at most 50ms, there is only a narrow window 
between enqueueing an ISR update and receiving a LeaderAndIsr. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] mumrah commented on a change in pull request #9100: Add AlterISR RPC and use it for ISR modifications

2020-08-14 Thread GitBox


mumrah commented on a change in pull request #9100:
URL: https://github.com/apache/kafka/pull/9100#discussion_r470673412



##
File path: core/src/main/scala/kafka/server/AlterIsrChannelManager.scala
##
@@ -0,0 +1,132 @@
+package kafka.server
+
+import java.util
+import java.util.concurrent.{ScheduledFuture, TimeUnit}
+import java.util.concurrent.atomic.AtomicLong
+
+import kafka.api.LeaderAndIsr
+import kafka.metrics.KafkaMetricsGroup
+import kafka.utils.{Logging, Scheduler}
+import kafka.zk.KafkaZkClient
+import org.apache.kafka.clients.ClientResponse
+import org.apache.kafka.common.TopicPartition
+import 
org.apache.kafka.common.message.AlterIsrRequestData.{AlterIsrRequestPartitions, 
AlterIsrRequestTopics}
+import org.apache.kafka.common.message.{AlterIsrRequestData, 
AlterIsrResponseData}
+import org.apache.kafka.common.protocol.Errors
+import org.apache.kafka.common.requests.{AlterIsrRequest, AlterIsrResponse}
+
+import scala.collection.mutable
+import scala.jdk.CollectionConverters._
+
+/**
+ * Handles the sending of AlterIsr requests to the controller. Updating the 
ISR is an asynchronous operation,
+ * so partitions will learn about updates through LeaderAndIsr messages sent 
from the controller
+ */
+trait AlterIsrChannelManager {
+  val IsrChangePropagationBlackOut = 5000L
+  val IsrChangePropagationInterval = 6L
+
+  def enqueueIsrUpdate(alterIsrItem: AlterIsrItem): Unit
+
+  def clearPending(topicPartition: TopicPartition): Unit
+
+  def startup(): Unit
+
+  def shutdown(): Unit
+}
+
+case class AlterIsrItem(topicPartition: TopicPartition, leaderAndIsr: 
LeaderAndIsr)
+
+class AlterIsrChannelManagerImpl(val controllerChannelManager: 
BrokerToControllerChannelManager,
+ val zkClient: KafkaZkClient,
+ val scheduler: Scheduler,
+ val brokerId: Int,
+ val brokerEpoch: Long) extends 
AlterIsrChannelManager with Logging with KafkaMetricsGroup {

Review comment:
   Fixed this by adding getBrokerEpoch to KafkaZkClient





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (KAFKA-10402) Upgrade python version in system tests

2020-08-14 Thread Nikolay Izhikov (Jira)
Nikolay Izhikov created KAFKA-10402:
---

 Summary: Upgrade python version in system tests
 Key: KAFKA-10402
 URL: https://issues.apache.org/jira/browse/KAFKA-10402
 Project: Kafka
  Issue Type: Improvement
Reporter: Nikolay Izhikov
Assignee: Nikolay Izhikov


Currently, the system tests use Python 2, which is outdated and no longer supported.

Since all dependencies of the system tests, including ducktape, support Python 3, we 
can migrate the system tests to Python 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] bbejeck commented on pull request #9108: KAFKA-9273: Extract testShouldAutoShutdownOnIncompleteMetadata from S…

2020-08-14 Thread GitBox


bbejeck commented on pull request #9108:
URL: https://github.com/apache/kafka/pull/9108#issuecomment-674091814


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] sanketfajage opened a new pull request #9180: MINOR: corrected unit tests

2020-08-14 Thread GitBox


sanketfajage opened a new pull request #9180:
URL: https://github.com/apache/kafka/pull/9180


   Corrected unit tests. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] jeqo commented on a change in pull request #9137: KAFKA-9929: Support reverse iterator on KeyValueStore

2020-08-14 Thread GitBox


jeqo commented on a change in pull request #9137:
URL: https://github.com/apache/kafka/pull/9137#discussion_r470536034



##
File path: 
streams/src/test/java/org/apache/kafka/streams/state/internals/AbstractKeyValueStoreTest.java
##
@@ -188,7 +188,55 @@ public void testPutGetRange() {
 }
 
 @Test
-public void testPutGetRangeWithDefaultSerdes() {
+public void testPutGetReverseRange() {

Review comment:
   Just realized that, I also thought that path was tested. Good catch!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Rohan Desai (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177610#comment-17177610
 ] 

Rohan Desai commented on KAFKA-10396:
-

In addition to configuring the write buffer manager, make sure you close the 
iterator you open in 

{{getDefaultDataFromStore}}
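
For illustration, a minimal sketch of such a read with the iterator reliably closed (the object and method names below are placeholders standing in for the getDefaultDataFromStore method in the attached application code, which is not shown here):

{code:scala}
import org.apache.kafka.streams.state.{KeyValueIterator, ReadOnlyKeyValueStore}
import scala.collection.mutable

object StoreReadSketch {
  // Hypothetical stand-in for getDefaultDataFromStore: read a (Global)KTable store and
  // always close the iterator; a leaked iterator pins RocksDB resources off-heap.
  def readAll[K, V](store: ReadOnlyKeyValueStore[K, V]): Map[K, V] = {
    val iter: KeyValueIterator[K, V] = store.all()
    try {
      val result = mutable.Map.empty[K, V]
      while (iter.hasNext) {
        val kv = iter.next()
        result += kv.key -> kv.value
      }
      result.toMap
    } finally {
      iter.close() // the important part: never leak the iterator
    }
  }
}
{code}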

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: MyStreamProcessor.java, kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of 
> off-heap memory and never releases them. This causes an OOM kill once memory 
> reaches the configured limit.
> We use Kafka Streams and GlobalKTables for many of our Kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is all zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as described at the link below, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Vagesh Mathapati (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177579#comment-17177579
 ] 

Vagesh Mathapati commented on KAFKA-10396:
--

I can see the following output from pmap -X:

7f806e181000 r-xp  08:90 13 540 204 110 204 0 0 0 libjemalloc.so
 7f806e208000 ---p 00087000 08:90 13 2044 0 0 0 0 0 0 libjemalloc.so
 7f806e407000 r--p 00086000 08:90 13 24 24 20 24 16 0 0 libjemalloc.so
 7f806e40d000 rw-p 0008c000 08:90 13 4 4 4 4 4 0 0 libjemalloc.so

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: MyStreamProcessor.java, kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of 
> off-heap memory and never releases them. This causes an OOM kill once memory 
> reaches the configured limit.
> We use Kafka Streams and GlobalKTables for many of our Kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is all zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as described at the link below, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka] chia7712 commented on pull request #9162: MINOR: refactor Log to get rid of "return" in nested anonymous function

2020-08-14 Thread GitBox


chia7712 commented on pull request #9162:
URL: https://github.com/apache/kafka/pull/9162#issuecomment-673942147


   only ```shouldUpgradeFromEosAlphaToEosBeta``` fails. retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Rohan Desai (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177577#comment-17177577
 ] 

Rohan Desai commented on KAFKA-10396:
-

It may also be a different problem. Something else to try would be to configure 
the write buffer manager from your rocksdb config setter (I don't think the 
example in the streams doc does this). You could take a look at the ksqlDB 
config-setter as an example:

[https://github.com/confluentinc/ksql/blob/master/ksqldb-rocksdb-config-setter/src/main/java/io/confluent/ksql/rocksdb/KsqlBoundedMemoryRocksDBConfigSetter.java]
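
For reference, a minimal sketch of a bounded-memory config setter in the spirit of that ksqlDB class (the class name and sizes below are illustrative only, and it assumes a kafka-streams / rocksdbjni combination, such as 2.5.x with 5.18.x, where Options#setWriteBufferManager is available):

{code:scala}
import java.util
import org.apache.kafka.streams.state.RocksDBConfigSetter
import org.rocksdb.{BlockBasedTableConfig, Cache, LRUCache, Options, WriteBufferManager}

object BoundedMemoryRocksDBConfigSetter {
  // Shared across all state store instances in the JVM; sizes are illustrative only.
  private val TotalOffHeapBytes: Long = 256L * 1024 * 1024
  private val TotalMemtableBytes: Long = 64L * 1024 * 1024
  private val cache: Cache = new LRUCache(TotalOffHeapBytes)
  private val writeBufferManager = new WriteBufferManager(TotalMemtableBytes, cache)
}

class BoundedMemoryRocksDBConfigSetter extends RocksDBConfigSetter {
  import BoundedMemoryRocksDBConfigSetter._

  override def setConfig(storeName: String, options: Options, configs: util.Map[String, AnyRef]): Unit = {
    // Count index/filter/data blocks and memtables against one shared cache
    val tableConfig = options.tableFormatConfig().asInstanceOf[BlockBasedTableConfig]
    tableConfig.setBlockCache(cache)
    tableConfig.setCacheIndexAndFilterBlocks(true)
    options.setWriteBufferManager(writeBufferManager)
    options.setTableFormatConfig(tableConfig)
  }

  override def close(storeName: String, options: Options): Unit = {
    // cache and writeBufferManager are shared between stores, so they are not closed here
  }
}
{code}

The setter is then registered through the rocksdb.config.setter Streams config.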

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: MyStreamProcessor.java, kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of 
> off-heap memory and never releases them. This causes an OOM kill once memory 
> reaches the configured limit.
> We use Kafka Streams and GlobalKTables for many of our Kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is all zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as described at the link below, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Rohan Desai (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177574#comment-17177574
 ] 

Rohan Desai commented on KAFKA-10396:
-

[~vmathapati] it should show up on your pmap. For example:

 

 
{code:java}
sudo pmap -X 13799
13799:   ./a.out
 Address Perm   Offset Device   Inode Size Rss Pss Referenced Anonymous ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked ProtectionKey Mapping
...
7fa5c36da000 r--p  103:02  274900   28  28  28 28 0  0  0   00   0  0     0 libjemalloc.so.2
7fa5c36e1000 r-xp 7000 103:02  274900  608 396 396 396 0  0  0   00   0  0 0 libjemalloc.so.2
7fa5c3779000 r--p 0009f000 103:02  274900   64  64  64 64 0  0  0   00   0  0     0 libjemalloc.so.2
7fa5c3789000 ---p 000af000 103:02  2749004   0   0  0 0  0  0   00   0  0   0 libjemalloc.so.2
7fa5c378a000 r--p 000af000 103:02  274900   20  20  20 2016  0  0   00   0  0     0 libjemalloc.so.2
7fa5c378f000 rw-p 000b4000 103:02  2749004   4   4  4 4  0  0   00   0  0   0 libjemalloc.so.2

...{code}
 

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: MyStreamProcessor.java, kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of 
> off-heap memory and never releases them. This causes an OOM kill once memory 
> reaches the configured limit.
> We use Kafka Streams and GlobalKTables for many of our Kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> 

[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Vagesh Mathapati (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177573#comment-17177573
 ] 

Vagesh Mathapati commented on KAFKA-10396:
--

Is there anything I can check to see whether jemalloc is really coming into the 
picture?

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: MyStreamProcessor.java, kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of 
> off-heap memory and never releases them. This causes an OOM kill once memory 
> reaches the configured limit.
> We use Kafka Streams and GlobalKTables for many of our Kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is all zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as described at the link below, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Vagesh Mathapati (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177566#comment-17177566
 ] 

Vagesh Mathapati commented on KAFKA-10396:
--

I tried with 
[https://centos.pkgs.org/8/epel-x86_64/jemalloc-5.2.1-2.el8.x86_64.rpm.html] 
and then used the rpm command, as I do not have the apt installer on my Kubernetes 
cluster. 

I can see "/usr/lib64/libjemalloc.so.2", so I copied only this file into my 
pod, renamed it to libjemalloc.so, and set the environment as follows.

- name: LD_PRELOAD
   value: /cmservice/data/libjemalloc.so 

After the test run I still observed the same issue where memory keeps on growing.

 

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: MyStreamProcessor.java, kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of 
> off-heap memory and never releases them. This causes an OOM kill once memory 
> reaches the configured limit.
> We use Kafka Streams and GlobalKTables for many of our Kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is all zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as described at the link below, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6585) Consolidate duplicated logic on reset tools

2020-08-14 Thread Francesco Nobilia (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177563#comment-17177563
 ] 

Francesco Nobilia commented on KAFKA-6585:
--

[~Samuel Hawker] are you still working on it? Otherwise, I am happy to take 
over. =) 

> Consolidate duplicated logic on reset tools
> ---
>
> Key: KAFKA-6585
> URL: https://issues.apache.org/jira/browse/KAFKA-6585
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Guozhang Wang
>Assignee: Samuel Hawker
>Priority: Minor
>  Labels: newbie
>
> The consumer reset tool and the streams reset tool today share a lot of common 
> logic, such as resetting to a datetime. We can consolidate it into a common 
> class that depends directly on the admin client and simply let these tools 
> use that class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10401) GroupMetadataManager ignores current_state_timestamp field for GROUP_METADATA_VALUE_SCHEMA_V3

2020-08-14 Thread Marek (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marek updated KAFKA-10401:
--
Description: 
While reading group metadata information from ByteBuffer GroupMetadataManager 
reads current_state_timestamp only for group schema version 2. For all other 
versions this value is set to "None". 

Piece of code responsible for the bug:

[https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]

Restarting kafka forces GroupMetadataManager manager to reload group metadata 
from file basically causing 
[KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]]
 to be only applicable for schema version 2. 

 

 

  was:
While reading group metadata information from ByteBuffer GroupMetadataManager 
reads current_state_timestamp only for group schema version 2. For all other 
versions this value is set to "None". 

Piece of code responsible for the bug:

[https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]

Restarting kafka forces GroupMetadataManager manager to reload group metadata 
from file basically causing 
[KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]]
  to be only applicable for schema version 2. 

 

 


> GroupMetadataManager ignores current_state_timestamp field for 
> GROUP_METADATA_VALUE_SCHEMA_V3
> -
>
> Key: KAFKA-10401
> URL: https://issues.apache.org/jira/browse/KAFKA-10401
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 2.1.1, 2.2.2, 2.4.1, 2.6.0, 2.5.1
>Reporter: Marek
>Priority: Major
>
> While reading group metadata information from ByteBuffer GroupMetadataManager 
> reads current_state_timestamp only for group schema version 2. For all other 
> versions this value is set to "None". 
> Piece of code responsible for the bug:
> [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]
> Restarting kafka forces GroupMetadataManager manager to reload group metadata 
> from file basically causing 
> [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]]
>  to be only applicable for schema version 2. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10401) GroupMetadataManager ignores current_state_timestamp field for GROUP_METADATA_VALUE_SCHEMA_V3

2020-08-14 Thread Marek (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marek updated KAFKA-10401:
--
Description: 
While reading group metadata information from ByteBuffer GroupMetadataManager 
reads current_state_timestamp only for group schema version 2. For all other 
versions this value is set to "None". 

Piece of code responsible for the bug:

[https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]

Restarting kafka forces GroupMetadataManager manager to reload group metadata 
from file basically causing 
[KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets]]
 to be only applicable for schema version 2. 

 

 

  was:
While reading group metadata information from ByteBuffer GroupMetadataManager 
reads current_state_timestamp only for group schema version 2. For all other 
versions this value is set to "None". 

Piece of code responsible for the bug:

[https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]

Restarting kafka forces GroupMetadataManager manager to reload group metadata 
from file basically causing 
[KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]]
 to be only applicable for schema version 2. 

 

 


> GroupMetadataManager ignores current_state_timestamp field for 
> GROUP_METADATA_VALUE_SCHEMA_V3
> -
>
> Key: KAFKA-10401
> URL: https://issues.apache.org/jira/browse/KAFKA-10401
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 2.1.1, 2.2.2, 2.4.1, 2.6.0, 2.5.1
>Reporter: Marek
>Priority: Major
>
> While reading group metadata information from ByteBuffer GroupMetadataManager 
> reads current_state_timestamp only for group schema version 2. For all other 
> versions this value is set to "None". 
> Piece of code responsible for the bug:
> [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]
> Restarting kafka forces GroupMetadataManager manager to reload group metadata 
> from file basically causing 
> [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets]]
>  to be only applicable for schema version 2. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10401) GroupMetadataManager ignores current_state_timestamp field for GROUP_METADATA_VALUE_SCHEMA_V3

2020-08-14 Thread Marek (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marek updated KAFKA-10401:
--
Description: 
While reading group metadata information from ByteBuffer GroupMetadataManager 
reads current_state_timestamp only for group schema version 2. For all other 
versions this value is set to "None". 

Piece of code responsible for the bug:

[https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]

Restarting kafka forces GroupMetadataManager manager to reload group metadata 
from file basically causing 
[KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]]
  to be only applicable for schema version 2. 

 

 

  was:
While reading group metadata information from ByteBuffer GroupMetadataManager 
reads current_state_timestamp only for group schema version 2. For all other 
versions this value is set to "None". 

Piece of code responsible for bug:

[https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]

Restarting kafka forces GroupMetadataManager manager to reload group metadata 
from file basically causing 
[KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]]
  to be only applicable for schema version 2. 

 

 


> GroupMetadataManager ignores current_state_timestamp field for 
> GROUP_METADATA_VALUE_SCHEMA_V3
> -
>
> Key: KAFKA-10401
> URL: https://issues.apache.org/jira/browse/KAFKA-10401
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 2.1.1, 2.2.2, 2.4.1, 2.6.0, 2.5.1
>Reporter: Marek
>Priority: Major
>
> While reading group metadata information from ByteBuffer GroupMetadataManager 
> reads current_state_timestamp only for group schema version 2. For all other 
> versions this value is set to "None". 
> Piece of code responsible for the bug:
> [https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]
> Restarting kafka forces GroupMetadataManager manager to reload group metadata 
> from file basically causing 
> [KIP-211|[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]]
>   to be only applicable for schema version 2. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10401) GroupMetadataManager ignores current_state_timestamp field for GROUP_METADATA_VALUE_SCHEMA_V3

2020-08-14 Thread Marek (Jira)
Marek created KAFKA-10401:
-

 Summary: GroupMetadataManager ignores current_state_timestamp 
field for GROUP_METADATA_VALUE_SCHEMA_V3
 Key: KAFKA-10401
 URL: https://issues.apache.org/jira/browse/KAFKA-10401
 Project: Kafka
  Issue Type: Bug
  Components: offset manager
Affects Versions: 2.5.1, 2.6.0, 2.4.1, 2.2.2, 2.1.1
Reporter: Marek


While reading group metadata information from a ByteBuffer, GroupMetadataManager 
reads current_state_timestamp only for group schema version 2. For all other 
versions this value is set to "None". 

Piece of code responsible for the bug:

[https://github.com/apache/kafka/blob/2.6.0/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L1412]

Restarting Kafka forces the GroupMetadataManager to reload group metadata 
from file, basically causing 
[KIP-211|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals?src=breadcrumbs-parent]
 to be only applicable for schema version 2. 
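
For clarity, the shape of a possible fix (a sketch with simplified types, not the actual GroupMetadataManager code): the field should be read for every schema version that carries it, i.e. version 2 and above, rather than for version 2 only.

{code:scala}
// Sketch only, with simplified types instead of the real GroupMetadataManager code.
// current_state_timestamp exists in the group metadata value schema from version 2
// onwards, so the version check needs to be ">= 2" rather than "== 2".
object CurrentStateTimestampSketch {
  def read(version: Short, readTimestampField: () => Long): Option[Long] =
    if (version >= 2) Some(readTimestampField())
    else None // schema versions 0 and 1 do not carry the field
}
{code}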

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Rohan Desai (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177530#comment-17177530
 ] 

Rohan Desai commented on KAFKA-10396:
-

[~vmathapati] you'd have to install it, and then look for it in the lib 
directories. The exact steps depend on the system. For example, on 
debian/ubuntu you can run:

{{apt install libjemalloc-dev}}

Then, run

{{sudo find /usr/lib/ -name libjemalloc.so}}

to get the so path.

 

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: MyStreamProcessor.java, kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of 
> off-heap memory and never releases them. This causes an OOM kill once memory 
> reaches the configured limit.
> We use Kafka Streams and GlobalKTables for many of our Kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is all zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as described at the link below, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10396) Overall memory of container keep on growing due to kafka stream / rocksdb and OOM killed once limit reached

2020-08-14 Thread Vagesh Mathapati (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177527#comment-17177527
 ] 

Vagesh Mathapati commented on KAFKA-10396:
--

[~desai.p.rohan] Thanks for responding. For debugging the same issue I preloaded 
a few of the .so files. I can try out the same steps.

I am not able to find the jemalloc.so file at the above link. Would it be 
possible for you to point me to the location of the .so file?

> Overall memory of container keep on growing due to kafka stream / rocksdb and 
> OOM killed once limit reached
> ---
>
> Key: KAFKA-10396
> URL: https://issues.apache.org/jira/browse/KAFKA-10396
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 2.5.0
>Reporter: Vagesh Mathapati
>Priority: Critical
> Attachments: MyStreamProcessor.java, kafkaStreamConfig.java
>
>
> We are observing that the overall memory of our container keeps growing and 
> never comes down.
> After analysis we found that rocksdbjni.so keeps allocating 64M chunks of 
> off-heap memory and never releases them. This causes an OOM kill once memory 
> reaches the configured limit.
> We use Kafka Streams and GlobalKTables for many of our Kafka topics.
> Below is our environment
>  * Kubernetes cluster
>  * openjdk 11.0.7 2020-04-14 LTS
>  * OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
>  * OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
>  * Springboot 2.3
>  * spring-kafka-2.5.0
>  * kafka-streams-2.5.0
>  * kafka-streams-avro-serde-5.4.0
>  * rocksdbjni-5.18.3
> Observed same result with kafka 2.3 version.
> Below is the snippet of our analysis
> from pmap output we took addresses from these 64M allocations (RSS)
> Address Kbytes RSS Dirty Mode Mapping
> 7f3ce800 65536 65532 65532 rw--- [ anon ]
> 7f3cf400 65536 65536 65536 rw--- [ anon ]
> 7f3d6400 65536 65536 65536 rw--- [ anon ]
> We tried to match with memory allocation logs enabled with the help of Azul 
> systems team.
> @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff7ca0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4]
>  - 0x7f3ce8ff9780
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ff9750
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ff97c0
>  @ 
> /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da]
>  - 0x7f3ce8ffccf0
>  @ /tmp/librocksdbjni6564497922441568920.so:
> _Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741]
>  - 0x7f3ce8ffcd10
> We also identified that the content of these 64M regions is all zeros, with no 
> data present in them.
> I tried to tune the RocksDB configuration as described at the link below, but it did not help. 
> [https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config]
>  
> Please let me know if you need any more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)