Karol, that's interesting. Can you send the Jira ticket, please? In our case, a rogue program added 300k entries via a service that persists data in ZK and is meant for only a handful of entries. Now we are dealing with deleting these entries, which take up more than 3 GB.
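
For reference, a minimal sketch of how a cleanup like this could be split into batches so that no single request carries all 300k deletes. It is not what the service above does: the parent path, the child names, and the `delete_batches` helper are hypothetical, and the actual ZK client call is deliberately left out.

```python
from typing import Iterable, Iterator, List


def delete_batches(parent: str, children: Iterable[str],
                   batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield lists of full znode paths, at most batch_size paths per list.

    Hypothetical helper: it only builds the batches. Issuing the deletes is
    left to whichever ZK client is in use.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    prefix = parent.rstrip("/")  # so a parent of "/" yields "/name", not "//name"
    batch: List[str] = []
    for name in children:
        batch.append(f"{prefix}/{name}")
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


if __name__ == "__main__":
    # Assumed example data; in practice the names would come from listing the parent znode.
    for paths in delete_batches("/svc/entries", ["a", "b", "c"], batch_size=2):
        print(paths)
```

Each batch could then be sent as one group of delete operations (for example a multi-op transaction, if the ZK version in use supports it), with a pause between batches to avoid putting more pressure on an ensemble that is already short of heap.
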

Thanks,
CP

On Fri, Apr 24, 2015 at 1:09 PM, Karol Dudzinski <[email protected]> wrote:

> Hi,
>
> Do you know if any of the services that use your ZK create ACLs that are
> potentially unique and one-time-ish? I recently hit a similar problem and
> discovered that the DataTree has an ACL cache that never gets anything
> removed from it. That was by far and away the largest memory consumer I
> found when analysing the heap dump. If this is the case then you should
> see lots of ACL objects on the heap.
>
> I filed a JIRA for this and keep meaning to submit a patch but sadly
> haven't got round to it. As an interim solution, I wrote a tool which uses
> the DataTree class and the serialisation utils to purge this cache of
> unused entries. In my case it shrank the snapshot from 500MB to 12MB! The
> time to write the snapshot went from 40 seconds to less than 1 second as a
> result.
>
> Thanks,
> Karol
>
>
> On 24 Apr 2015, at 18:45, CP Mishra <[email protected]> wrote:
> >
> > Hi,
> >
> > I am running a 3 node ZK ensemble on 3 VMs (2 CPU, 32GB RAM) in the test
> > environment. Lately, I have been getting OutOfMemoryError on all three ZK
> > nodes. ZK has been configured with 6GB heap size. The same ZK ensemble is
> > shared between Kafka, HDFS HA and another custom service.
> >
> > I analyzed the heap dump and 5.8+ GB is being used by DataTree. I don't
> > have a purge policy in place and size of ZK data directory stands at ~14 GB
> > now. There is enough space on the disk holding ZK data (20% used).
> >
> > As soon as I restart a ZK node, it grows to use all 6GB and starts Full GC
> > every 1-2 sec. In 3-5 minutes, it throws OOM: GC Overhead exceeded.
> >
> > I would appreciate any help in diagnosing the issue.
> >
> > Thanks,
> > CP Mishra
