You have an 11 GB snapshot file. That's very large. Did someone unexpectedly overload the server with znode creations?
When a follower comes up, the leader needs to serialize the znodes to the snapshot file and stream it to the follower, which saves it locally and then deserializes it. (11 GB in 15 minutes averages about 12 MB/second for this whole process.) This is often exacerbated by the max heap size and GC interactions.

Patrick

On Tue, Jul 31, 2012 at 2:23 PM, Jordan Zimmerman <[email protected]> wrote:
> BTW - this is 3.3.5
>
> On Jul 31, 2012, at 2:22 PM, Jordan Zimmerman <[email protected]> wrote:
>
>> We've had a few outages of our ZK cluster recently. When trying to bring the
>> cluster back up it's been taking 10-15 minutes for the followers to sync
>> with the Leader. Any idea what might cause this? Here's an ls of the data
>> dir:
>>
>> -rw-r--r-- 1 zookeeperserverprod nac 67108880 Jul 31 20:39 log.3900a4bc75
>> -rw-r--r-- 1 zookeeperserverprod nac 67108880 Jul 31 20:40 log.3900a634ee
>> -rw-r--r-- 1 zookeeperserverprod nac 67108880 Jul 31 21:21 log.3a00000001
>> -rw-r--r-- 1 zookeeperserverprod nac 67108880 Jul 31 21:22 log.3a000139a2
>> -rw-r--r-- 1 zookeeperserverprod nac 9279729723 Jul 31 20:42 snapshot.3900a634ec
>> -rw-r--r-- 1 zookeeperserverprod nac 11126306780 Jul 31 21:09 snapshot.3900a6b149
>> -rw-r--r-- 1 zookeeperserverprod nac 4153727423 Jul 31 21:22 snapshot.3a000139a0
>>
>
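
P.S. For anyone who wants to check the throughput estimate, here is a rough back-of-envelope sketch. It assumes the largest snapshot size from the ls listing and the reported 10-15 minute sync window, and it treats 1 MB as 10**6 bytes. The function name is just illustrative; nothing here is a ZooKeeper API.

```python
# Snapshot sizes in bytes, copied from the ls listing in the quoted message.
SNAPSHOT_BYTES = {
    "snapshot.3900a634ec": 9279729723,
    "snapshot.3900a6b149": 11126306780,
    "snapshot.3a000139a0": 4153727423,
}


def sync_rate_mb_per_s(size_bytes, minutes):
    """Average end-to-end rate in MB/s (1 MB = 10**6 bytes, an assumption)."""
    return size_bytes / (minutes * 60) / 1e6


largest = max(SNAPSHOT_BYTES.values())

# The reported sync window was 10-15 minutes, so compute both ends.
for minutes in (10, 15):
    rate = sync_rate_mb_per_s(largest, minutes)
    print(f"{minutes} min: {rate:.1f} MB/s")
# prints:
# 10 min: 18.5 MB/s
# 15 min: 12.4 MB/s
```

That rate covers serializing, streaming, writing to disk, and deserializing together, so it says nothing about which step is the bottleneck.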
