Hi,

I am running a 3-node ZK ensemble on 3 VMs (2 CPUs, 32 GB RAM each) in our test environment. Lately, I have been getting OutOfMemoryError on all three ZK nodes. Each ZK node is configured with a 6 GB heap. The same ZK ensemble is shared between Kafka, HDFS HA, and another custom service.
I analyzed the heap dump, and more than 5.8 GB is being used by DataTree. I don't have a purge policy in place, and the ZK data directory is now ~14 GB. There is enough space on the disk holding the ZK data (20% used). As soon as I restart a ZK node, its heap usage grows to the full 6 GB and it starts a Full GC every 1-2 seconds. Within 3-5 minutes, it throws OutOfMemoryError: GC overhead limit exceeded.

I would appreciate any help in diagnosing the issue.

Thanks,
CP Mishra
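
For anyone wanting to cross-check the heap dump numbers, here is a minimal sketch of how the znode count and approximate data size could be read from a node using the `mntr` four-letter command. It assumes the default client port 2181 and that `mntr` is available on the server (3.4.0 or later; newer releases may require it to be whitelisted). The host name and the helper names `fetch_mntr`, `parse_mntr`, and `summarize` are made up for illustration and are not part of ZooKeeper.

```python
import socket


def parse_mntr(text):
    """Parse tab-separated `mntr` output into a dict of string values."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines that are not key<TAB>value
            stats[key.strip()] = value.strip()
    return stats


def fetch_mntr(host, port=2181, timeout=5.0):
    """Send `mntr` to a ZK node and return the raw reply as text.

    Port 2181 is an assumption (the default client port).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def summarize(stats):
    """Pick out the fields most relevant to DataTree memory usage."""
    return {
        "znodes": int(stats.get("zk_znode_count", 0)),
        "watches": int(stats.get("zk_watch_count", 0)),
        "ephemerals": int(stats.get("zk_ephemerals_count", 0)),
        "approx_data_bytes": int(stats.get("zk_approximate_data_size", 0)),
    }


# Example usage (hypothetical host name):
# print(summarize(parse_mntr(fetch_mntr("zk1.example.internal"))))
```

A very large `znodes` or `approx_data_bytes` value would suggest that one of the client services is creating znodes that never get cleaned up, which would be consistent with DataTree filling the heap right after a restart.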
