Hi Curtis,

I suggest you use "org.apache.zookeeper.server.SnapshotFormatter" to look at 
the content of the snapshot; it should give you a hint about what has gone 
wrong.
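
As a rough sketch of how that inspection could go: SnapshotFormatter takes the 
snapshot file as its only argument and prints a text dump, so something like 
"java -cp <zookeeper jar plus its lib jars> 
org.apache.zookeeper.server.SnapshotFormatter <snapshot file> > dump.txt" 
should work (the classpath depends on your install and is left as a 
placeholder here). The Python below is only an illustration for summarising 
such a dump. It assumes each znode appears as a line starting with "/" followed 
by indented stat lines, with "dataLength = N" for nodes that carry data; check 
that against the output of your ZooKeeper version. The function names are made 
up for this example.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed dump layout: a path line starting with "/", then indented stat
# lines, one of which may be "  dataLength = N". Nodes without that line
# are counted as holding 0 bytes.
DATA_LENGTH = re.compile(r"\s*dataLength\s*=\s*(\d+)")


def parse_znodes(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (path, data_length) pairs found in the dump text."""
    nodes: List[Tuple[str, int]] = []
    path = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("/"):
            if path is not None:
                # Previous node had no dataLength line.
                nodes.append((path, 0))
            path = line.strip()
            continue
        match = DATA_LENGTH.match(line)
        if match and path is not None:
            nodes.append((path, int(match.group(1))))
            path = None
    if path is not None:
        nodes.append((path, 0))
    return nodes


def largest(nodes: List[Tuple[str, int]], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n nodes holding the most data, biggest first."""
    return sorted(nodes, key=lambda item: item[1], reverse=True)[:n]


def count_by_top_level(nodes: List[Tuple[str, int]]) -> Counter:
    """Count znodes under each top-level path such as /locks."""
    counts: Counter = Counter()
    for path, _ in nodes:
        top = "/" if path == "/" else "/" + path.split("/")[1]
        counts[top] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarise_dump.py dump.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        found = parse_znodes(handle)
    print("znodes:", len(found), "data bytes:", sum(size for _, size in found))
    for top, count in count_by_top_level(found).most_common(10):
        print(f"{count:>10}  {top}")
    for node_path, size in largest(found):
        print(f"{size:>10}  {node_path}")
```

A snapshot can be large either because a few znodes hold a lot of data or 
because there are very many znodes, so the per-prefix counts are worth reading 
alongside the list of largest nodes.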

-Flavio

> On 21 Jun 2016, at 13:30, Cantrell, Curtis <[email protected]> wrote:
> 
> This past week, we had a ZooKeeper outage. All clients lost contact with 
> the quorum. I am still trying to understand what happened. One of our 
> servers ran out of disk space because, between 02:23 and 02:50, ZooKeeper 
> created almost 16 GB of data.
> 
> ls -al /opt/eg/zookeeper/data/version-2
> total 1662608
> drwxr-xr-x 2 egadmin eggrp     61440 Jun 16 09:01 .
> drwxr-xr-x 3 egadmin eggrp      4096 Jun  5 07:21 ..
> -rw------- 1 egadmin eggrp         3 Jun 16 09:01 acceptedEpoch
> -rw------- 1 egadmin eggrp         3 Jun 16 09:01 currentEpoch
> -rw------- 1 egadmin eggrp      5986 Jun 16 09:01 snapshot.14d00018763
> -rw------- 1 egadmin eggrp 522637450 Jun 16 02:23 snapshot.1a0060ab78
> -rw------- 1 egadmin eggrp 523110346 Jun 16 02:24 snapshot.1a0060c200
> -rw------- 1 egadmin eggrp 528639820 Jun 16 02:36 snapshot.1a0061c975
> -rw------- 1 egadmin eggrp 128020480 Jun 16 02:50 snapshot.1a0062fd8c
> [root@jtcmpslegwap01 ~]#
> 
> The ZooKeeper tree does not have much data in it.
> 
> There are about 8 leaders, 1 pathcache with single strings, and one data 
> element at a single zpath.
> 
> What would cause something like this, with so many large snapshots being 
> created? What goes into the snapshots besides the data?
> 
> Thank you,
> Curtis Cantrell
