Lukasz Osipiuk commented on ZOOKEEPER-713:
Thanks for the detailed answer.
Lukasz, thank you for reporting this problem and providing the details. We
analyzed your logs and found the following problems:
1) You are getting an OutOfMemoryError during the snapshot; that is why you are
getting the invalid snapshot file. It turns out the invalid file isn't
really a problem, since we can just use an older snapshot to recover, but
this may indicate that you are running very close to the heap limit and spending a
lot of time in GC. (This may aggravate the next issue.)
It appears we did not adjust the Java heap size and are running with 512M of
memory. I will increase the heap size tomorrow.
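For reference, a minimal sketch of how the heap could be raised, assuming the stock zkServer.sh startup (which sources conf/java.env if present); the 2g figure is only an illustrative placeholder, not a value recommended anywhere in this thread:

```shell
# conf/java.env -- sourced by zkEnv.sh/zkServer.sh at startup (hypothetical sizing)
# Raise the heap well above the 512M we were running with; tune to your data size,
# and keep it below physical RAM so the JVM never swaps.
export JVMFLAGS="-Xms1g -Xmx2g $JVMFLAGS"
```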
2) The initLimit and tickTime in zoo.cfg may be too low. How much data is stored
in ZooKeeper? Look at the snapshot file size: you need to be able to transmit
the snapshot within initLimit*tickTime. As an experiment, try scp-ing the
snapshots between the different servers and see how long it takes. tickTime
should be increased on WAN connections. You might try doubling the tickTime and
initLimit. (If you really are overloading the GC, that will slow things down as well.)
Our snapshot is quite small; at the time the problem occurred it was around 11M. I
checked with scp, and copying takes much less than a second.
I guess the JVM was swapping (as it was running out of memory), which caused delays
with the file transfer. What do you think? Is that possible?
One more question: is it normal that ZooKeeper consumed 0.5G of memory
handling such a small snapshot?
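For concreteness, doubling both values as suggested would look like this in zoo.cfg (starting from the initLimit=10, tickTime=2000 reported later in this issue; treat these as a starting point, not tuned numbers):

```
# zoo.cfg -- illustrative doubling of the current settings
tickTime=4000    # was 2000 (ms per tick)
initLimit=20     # was 10 (ticks)
# followers now get initLimit * tickTime = 20 * 4000 ms = 80 s to connect and sync
```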
3) We also noticed that it is taking a long time to read and write snapshots
(~17 and ~40 seconds in some cases). Do you have other things contending for
the disk? This is going to affect how long it takes for the leader to respond
to a client, and thus the initLimit.
Same as above: maybe the memory limit was the problem. Anyway, I will move the
ZooKeeper guests to Xen hosts that do not have any disk-intensive guests, to
limit disk contention; hopefully that helps. The Xen hosts on which ZooKeeper
currently runs also host some MySQL instances. They are not used very
intensively, but it is still good to fix.
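A quick way to gauge whether other guests are contending for the disk is a synced sequential write probe (a rough sketch; the path is hypothetical and the 64M size arbitrary; on a quiet disk this should complete quickly and consistently, while high or highly variable times suggest contention):

```shell
# Rough probe of write throughput on the disk holding the ZooKeeper data dir
# (hypothetical path). conv=fdatasync forces the data to disk before dd reports
# a time, so the result reflects the device rather than the page cache.
dd if=/dev/zero of=/tmp/zk-disk-probe bs=1M count=64 conv=fdatasync 2>&1
rm -f /tmp/zk-disk-probe
```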
4) We noticed that the JVM version you are using is pretty old. You may try
upgrading to the latest version, especially since you are using 64-bit Linux.
Well, Debian stable and "up to date" do not go well together. I will
consider building a newer Java deb package myself, but I would rather
treat that as a last resort. Do you really think a newer Java version could help?
> zookeeper fails to start - broken snapshot?
> Key: ZOOKEEPER-713
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-713
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.2
> Environment: debian lenny; ia64; xen virtualization
> Reporter: Lukasz Osipiuk
> Attachments: node1-version-2.tgz-aa, node1-version-2.tgz-ab,
> node1-zookeeper.log.gz, node2-version-2.tgz-aa, node2-version-2.tgz-ab,
> node2-version-2.tgz-ac, node2-zookeeper.log.gz, node3-version-2.tgz-aa,
> node3-version-2.tgz-ab, node3-version-2.tgz-ac, node3-zookeeper.log.gz,
> Hi guys,
> The following is not a bug report but rather a question - but as I am
> attaching large files I am posting it here rather than on the mailing list.
> Today we had a major failure in our production environment. The machines in the
> ZooKeeper cluster went wild and all clients got disconnected.
> We tried to restart the whole ZooKeeper cluster, but it got stuck in the leader
> election phase.
> Calling the stat command on any machine in the cluster resulted in a
> 'ZooKeeperServer not running' message.
> In one of the logs I noticed an 'Invalid snapshot' message, which disturbed me a bit.
> We did not manage to make the cluster work again with the data. We deleted all
> version-2 directories on all nodes and then the cluster started up without problems.
> Is it possible that the snapshot/log data got corrupted in a way which made the
> cluster unable to start?
> Fortunately we could rebuild the data we store in ZooKeeper, as we use it only for
> locks and most of the nodes are ephemeral.
> I am attaching contents of version-2 directory from all nodes and server logs.
> The source problem occurred some time before 15:00. The first cluster restart
> happened at 15:03.
> At some point later we experimented with deleting the version-2 directory, so I
> would not look at the following restarts, because they can be misleading due to our
> experiments.
> I am also attaching zoo.cfg. Maybe something is wrong there.
> As I now look into the logs I see a read timeout during the initialization phase
> after 20 secs (initLimit=10, tickTime=2000, i.e. initLimit*tickTime = 20000 ms).
> Maybe all I have to do is increase one or the other. Which one? Are there any
> downsides to increasing tickTime?
> Best regards, Łukasz Osipiuk
> PS. Due to the attachment size limit I used split. To untar, use
> cat nodeX-version-2.tgz-* |tar -xz