Hi Curtis,

I suggest you use "org.apache.zookeeper.server.SnapshotFormatter" to inspect the contents of one of the large snapshots. It should give you a hint about what went wrong.
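
A minimal sketch of how the formatter could be invoked, in case it helps. It assumes a 3.4.x-style install where the ZooKeeper jar sits in the install root and its dependencies in lib/; the helper names and the example install path are hypothetical, so adjust them to your layout.

```python
import os
import subprocess

# Class named in this thread; it takes a single snapshot file as its argument.
FORMATTER_CLASS = "org.apache.zookeeper.server.SnapshotFormatter"


def build_formatter_cmd(zk_home, snapshot_path):
    """Build the java command line that dumps one snapshot file.

    Assumption: zk_home holds the ZooKeeper jar and zk_home/lib holds its
    dependencies, so both directories go on the classpath as wildcards.
    """
    classpath = os.pathsep.join([
        os.path.join(zk_home, "*"),
        os.path.join(zk_home, "lib", "*"),
    ])
    return ["java", "-cp", classpath, FORMATTER_CLASS, snapshot_path]


def dump_snapshot(zk_home, snapshot_path):
    """Run the formatter and return whatever it prints to stdout."""
    result = subprocess.run(
        build_formatter_cmd(zk_home, snapshot_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Example call (hypothetical install path, snapshot name from your listing):
# print(dump_snapshot("/opt/eg/zookeeper",
#                     "/opt/eg/zookeeper/data/version-2/snapshot.1a0061c975"))
```

The dump should list the znodes and sessions stored in the snapshot, so comparing one of the ~500 MB files against the 6 KB one should show which paths account for the difference.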

-Flavio

> On 21 Jun 2016, at 13:30, Cantrell, Curtis <[email protected]> wrote:
>
> This past week, we had a zookeeper outage. All clients lost contact with
> the quorum. I am still trying to understand what happened. One of our
> servers ran out of disk space because between 02:23 and 02:50, zookeeper
> created almost 16 GB data.
>
> ls -al /opt/eg/zookeeper/data/version-2
> total 1662608
> drwxr-xr-x 2 egadmin eggrp     61440 Jun 16 09:01 .
> drwxr-xr-x 3 egadmin eggrp      4096 Jun  5 07:21 ..
> -rw------- 1 egadmin eggrp         3 Jun 16 09:01 acceptedEpoch
> -rw------- 1 egadmin eggrp         3 Jun 16 09:01 currentEpoch
> -rw------- 1 egadmin eggrp      5986 Jun 16 09:01 snapshot.14d00018763
> -rw------- 1 egadmin eggrp 522637450 Jun 16 02:23 snapshot.1a0060ab78
> -rw------- 1 egadmin eggrp 523110346 Jun 16 02:24 snapshot.1a0060c200
> -rw------- 1 egadmin eggrp 528639820 Jun 16 02:36 snapshot.1a0061c975
> -rw------- 1 egadmin eggrp 128020480 Jun 16 02:50 snapshot.1a0062fd8c
> [root@jtcmpslegwap01 ~]#
>
> The zookeeper tree does not have much data in it.
>
> There are about 8 leaders, 1 pathcache with single strings, and one data
> element at a single zpath.
>
> What would cause something like this, creating so many large snapshots.
> What goes in the snapshots besides the data?
>
> Thank you,
> Curtis Cantrell
>
> The information contained in this message is proprietary and/or confidential.
> If you are not the intended recipient, please: (i) delete the message and all
> copies; (ii) do not disclose, distribute or use the message in any manner;
> and (iii) notify the sender immediately. In addition, please be aware that
> any message addressed to our domain is subject to archiving and review by
> persons other than the intended recipient. Thank you.
