Which Java vendor and version are you running? Which OS is this? Can you capture the lsof output for the broker process (on Linux) and paste it somewhere (a gist, for example) so we can see which descriptors are open?
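
To make a large lsof dump easier to read, the TCP entries can be tallied by connection state. The following is only a rough sketch: it assumes the text came from something like `lsof -nP -p <broker-pid>`, where TCP rows end with the state in parentheses, and the function name `count_tcp_states` is made up for illustration.

```python
import re
from collections import Counter

# lsof prints the TCP state in parentheses at the end of the NAME column,
# e.g. "TCP 10.0.0.1:9092->10.0.0.2:51234 (CLOSE_WAIT)".
_STATE_AT_END = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def count_tcp_states(lsof_text):
    """Count TCP descriptors per connection state in saved lsof output.

    Hypothetical helper: it only looks at rows containing a TCP column
    and ignores regular files, pipes, and rows without a state suffix.
    """
    counts = Counter()
    for line in lsof_text.splitlines():
        if " TCP " not in line:
            continue  # not a TCP socket row
        match = _STATE_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, after saving the output with `lsof -nP -p <broker-pid> > lsof.txt`, calling `count_tcp_states(open("lsof.txt").read())` gives a per-state count, which shows how many of the open descriptors are sockets sitting in CLOSE_WAIT.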

-Jaikiran

On Friday 26 August 2016 02:49 AM, Bharath Srinivasan wrote:
Hello:

We are running a data pipeline application stack using Kafka 0.8.2.2 in
production. We have been seeing intermittent but frequent CLOSE_WAIT
connections on our Kafka brokers, and they fill up the file handles pretty
quickly. By the time the open file count reaches around 40K, the node
becomes unresponsive and we see huge GC pauses. The only way out has been
to restart the node. When the nodes are working fine, the open file count
on each node stays around 6K during peak load and 3K on average.

Configurations:
- 5-broker cluster (Single node spec: 24 core processors, 250 GB RAM, 256 GB
SSD)
- 20 topics and 1100 partitions across all topics
- Replication factor of 3
- Java-based KafkaProducer and high-level consumers
(ZookeeperConsumerConnector)
- GC params { -Xmx32G -Xms4G -server -XX:MetaspaceSize=96m -XX:+UseG1GC
-XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
-XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50
-XX:MaxMetaspaceFreeRatio=80 }

Any pointers here? Appreciate your help.

Thanks,
Bharath

