This past week we had a ZooKeeper outage in which all clients lost contact with
the quorum. I am still trying to understand what happened. One of our servers
ran out of disk space because, between 02:23 and 02:50, ZooKeeper created almost
16 GB of data.

ls -al /opt/eg/zookeeper/data/version-2
total 1662608
drwxr-xr-x 2 egadmin eggrp     61440 Jun 16 09:01 .
drwxr-xr-x 3 egadmin eggrp      4096 Jun  5 07:21 ..
-rw------- 1 egadmin eggrp         3 Jun 16 09:01 acceptedEpoch
-rw------- 1 egadmin eggrp         3 Jun 16 09:01 currentEpoch
-rw------- 1 egadmin eggrp      5986 Jun 16 09:01 snapshot.14d00018763
-rw------- 1 egadmin eggrp 522637450 Jun 16 02:23 snapshot.1a0060ab78
-rw------- 1 egadmin eggrp 523110346 Jun 16 02:24 snapshot.1a0060c200
-rw------- 1 egadmin eggrp 528639820 Jun 16 02:36 snapshot.1a0061c975
-rw------- 1 egadmin eggrp 128020480 Jun 16 02:50 snapshot.1a0062fd8c
[root@jtcmpslegwap01 ~]#
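
As a quick sanity check on the listing above, the sizes of the four large snapshot files can be totalled with a short script. This is only a minimal sketch: the byte counts are copied by hand from the ls output rather than read from the directory, and the variable names are my own.

```python
# Sizes in bytes of the four large snapshot files, copied from the ls output above.
snapshot_sizes = {
    "snapshot.1a0060ab78": 522637450,
    "snapshot.1a0060c200": 523110346,
    "snapshot.1a0061c975": 528639820,
    "snapshot.1a0062fd8c": 128020480,
}

# Total space used by the snapshots written between 02:23 and 02:50.
total_bytes = sum(snapshot_sizes.values())

print(f"{len(snapshot_sizes)} snapshots, {total_bytes} bytes "
      f"({total_bytes / 1e9:.2f} GB)")
```

This covers only the files still present in the directory when the listing was taken; the total agrees with the "total 1662608" line if ls is reporting 1 KB blocks.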

The ZooKeeper tree does not have much data in it.

There are about eight leaders, one pathcache holding single strings, and one
data element at a single zpath.

What would cause ZooKeeper to create so many large snapshots in such a short
time? What goes into the snapshots besides the data?

Thank you,
Curtis Cantrell
