[ 
https://issues.apache.org/jira/browse/KAFKA-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-8165.
-------------------------------------------
    Resolution: Fixed

> Streams task causes Out Of Memory after connection issues and store 
> restoration
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-8165
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8165
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.1.0
>         Environment: 3 nodes, 22 topics, 16 partitions per topic, 1 window 
> store, 4 KV stores. 
> Kafka Streams application cluster: 3 AWS t2.large instances (8GB mem). 1 
> application instance, 2 threads per instance.
> Kafka 2.1, Kafka Streams 2.1
> Amazon Linux.
> Scala application, on Docker based on openJdk9. 
>            Reporter: Di Campo
>            Priority: Major
>
> Having a Kafka Streams 2.1 application, when Kafka brokers are stable, the 
> (largely stateful) application has been consuming ~160 messages per second at 
> a sustained rate for several hours. 
> However it started having connection issues to the brokers. 
> {code:java}
> Connection to node 3 (/172.31.36.118:9092) could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient){code}
> Also it began showing a lot of these errors: 
> {code:java}
> WARN [Consumer 
> clientId=stream-processor-81e1ce17-1765-49f8-9b44-117f983a2d19-StreamThread-2-consumer,
>  groupId=stream-processor] 1 partitions have leader brokers without a 
> matching listener, including [broker-2-health-check-0] 
> (org.apache.kafka.clients.NetworkClient){code}
> In fact, the _health-check_ topic is in the broker but not consumed by this 
> topology or used in any way by the Streams application (it is just broker 
> healthcheck). It does not complain about topics that are actually consumed by 
> the topology. 
> Some time after these errors (that appear at a rate of 24 appearances per 
> second during ~5 minutes), then the following logs appear: 
> {code:java}
> [2019-03-27 15:14:47,709] WARN [Consumer 
> clientId=stream-processor-81e1ce17-1765-49f8-9b44-117f983a2d19-StreamThread-1-restore-consumer,
>  groupId=] Connection to node -3 (/ip3:9092) could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient){code}
> In between 6 and then 3 lines of "Connection could not be established" error 
> messages, 3 of these ones slipped in: 
> {code:java}
> [2019-03-27 15:14:47,723] WARN Started Restoration of visitorCustomerStore 
> partition 15 total records to be restored 17 
> (com.divvit.dp.streams.applications.monitors.ConsoleGlobalRestoreListener){code}
>  
>  ... one for each different KV store I have (I still have another KV that 
> does not appear, and a WindowedStore store that also does not appear). 
>  Then I finally see "Restoration Complete" (using a logging 
> ConsoleGlobalRestoreListener as in docs) messages for all of my stores. So it 
> seems it may be fine now to restart the processing.
> Three minutes later, some events get processed, and I see an OOM error:  
> {code:java}
> java.lang.OutOfMemoryError: GC overhead limit exceeded{code}
>  
> ... so given that it usually allows to process during hours under same 
> circumstances, I'm wondering whether there is some memory leak in the 
> connection resources or somewhere in the handling of this scenario.
> Kafka and KafkaStreams 2.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to