Although the amount of Java heap that ZK is actually using may go down, the JVM process will still hold on to the physical memory allocated for it, and if there is no external pressure from other processes Linux has no reason to swap those pages to disk, hence the RSS will remain roughly constant.
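You can see this distinction from inside the JVM itself. Here is a minimal, standalone sketch (the class and method names are mine, not from this thread): heap *used* can drop after a collection, while the heap *committed* from the OS - the part that shows up in RSS - need not shrink at all.

```java
public class HeapVsRss {
    // Heap the JVM has committed from the OS; this is what contributes to RSS.
    static long committedHeap() {
        return Runtime.getRuntime().totalMemory();
    }

    // Heap actually occupied by objects (live or not yet collected).
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        byte[] big = new byte[64 * 1024 * 1024]; // grow the heap by ~64 MB
        long usedLive = usedHeap();
        big = null;          // drop the only reference
        System.gc();         // a hint, not a guarantee, but usually honoured here
        long usedAfterGc = usedHeap();
        System.out.println("used with the array live: " + usedLive + " bytes");
        System.out.println("used after GC:            " + usedAfterGc + " bytes");
        System.out.println("heap still committed:     " + committedHeap() + " bytes");
    }
}
```

Typically the "used" number falls after the collection while the "committed" number stays put - the same pattern as ZK's steady RSS after the znodes are deleted.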
That is, the amount of 'real' memory used by a JVM doesn't tell you how much of the JVM's heap is actually in use. Heap usage by ZK would only be a problem if GC could not find enough dead objects to reclaim; in that case, if you ever do come under real memory pressure, ZK will start swapping, which is bad. In general, processes on Linux don't give memory back - they hold on to the most they have ever needed at once, and the operating system eventually swaps out the unused pages if it needs to.

Can you paste the output of jmap -heap <zk-pid> into a reply? That will allow us to see how much of the heap is really being used.

Thanks,
Henry

On 23 May 2012 17:41, Brian Oki <[email protected]> wrote:
> Hello,
>
> We use ZooKeeper 3.3.3. On a 3-node site, we've been using Patrick Hunt's
> publicly available latencies test suite to create scenarios that will help
> us to understand the memory, CPU and disk requirements for a deployment of
> ZooKeeper for our type of workload. We use a fourth node as the ZooKeeper
> (ZK) client to conduct the tests.
>
> We modified zk-latencies.py slightly to just create-set-delete znodes only.
> In particular, we create 1000 permanent znodes, each written with 250,000
> bytes of data. We do this create-set-delete in a loop, sleeping for 5
> seconds between iterations.
>
> We observe at the ZK leader that the Resident Set Size (RSS) memory climbs
> rapidly to 2.6 GB on an 8 GB RAM node. The Java heap size of each ZK
> server daemon is 3 GB.
>
> Further, once the test has gone through 15 iterations, all the znodes
> created on behalf of the test have been deleted. There is no further write
> activity to ZK, and no read activity at all. The system is quiesced. No
> other services are competing for the disk, CPU or RAM during the test.
>
> Our question is this: The RSS of the ZK leader (and the followers) seems to
> remain at 2.6 GB after the test has completed. Why?
> We would expect that since all relevant znodes for the test have been
> deleted, the leader's RSS should have shrunk considerably, even after 1
> hour has passed. Are we missing something?
>
> We have used jmap to inspect the heap. To understand the heap contents
> requires detailed implementation knowledge that we don't have, so we didn't
> pursue this avenue any further.
>
> Configuration:
> 3 node servers running ZK daemons as 3-server ensemble
> 1 client machine
> each node has 8 GB RAM
> each node has 4 cores
> each node has a 465 GB disk
> ZK release: 3.3.3
> ZK server java heap size: 3 GB
> GC: concurrent low-pause garbage collector
> NIC: bonded 1 Gb NIC
>
> Thank you.
>
> Sincerely,
>
> Brian

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
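For reference, the used/committed/max figures that jmap -heap summarises can also be read in-process through the standard java.lang.management API - a minimal sketch (the class name is illustrative, not from this thread), which avoids needing to interpret the full jmap dump:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {
    // Snapshot of heap usage, similar in spirit to the summary `jmap -heap` prints.
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage u = heap();
        // 'used' is what objects occupy; 'committed' is what the JVM holds from
        // the OS (and what RSS reflects); 'max' is the -Xmx ceiling (or -1 if unset).
        System.out.println("used:      " + u.getUsed() + " bytes");
        System.out.println("committed: " + u.getCommitted() + " bytes");
        System.out.println("max:       " + u.getMax() + " bytes");
    }
}
```

On a quiesced ZK server exhibiting the behaviour described above, one would expect 'used' to be small while 'committed' stays near the observed 2.6 GB RSS.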
