On Wed, Apr 9, 2014 at 12:56 PM, Bae, Jae Hyeon <[email protected]> wrote:

> Let me clarify. a) is correct. There were normally running 5 instances
> with 3 as quorum. I restarted the leader instance and while re-electing
> leader, zookeeper cluster lost quorum for a minute and a few zookeeper
> clients lost connection. So, this is the form of losing quorum, correct?
>

Yes.

> Is there any way to avoid losing quorum while rolling restart of
> zookeeper cluster, specifically the leader instance?
>

No. You have to always have 3 ZK nodes live in order to maintain
continuous operation. Rolling restart implies that you wait long enough
after restarting each node so that it has a chance to rejoin the quorum.
If you do that then restarting the leader will result in a tiny moment
when writes will not be accepted and may require some ZK clients to
transparently reconnect to a different ZK node, but it should be hard to
detect any outage.

>
> Thank you
> Best, Jae
>
>
> On Wed, Apr 9, 2014 at 12:06 PM, Ted Dunning <[email protected]> wrote:
>
> > Your email is a little ambiguous.
> >
> > a) "5 instances with 3 as quorum" could mean 5 instances configured
> > and running normally.
> >
> > Or
> >
> > b) it could mean 5 instances with 2 instances that are down.
> >
> > In (a) restarting the leader instance *should* cause the cluster to
> > do a leader election again and form a new quorum. That is a form of
> > losing quorum. If that is what you mean, this is normal. A new quorum
> > should be formed and things should continue fairly soon.
> >
> > In (b), restarting the leader will result in only 2 instances running
> > which is not enough to maintain quorum and until you have at least 3
> > nodes running again, you can't proceed.
> >
> >
> > On Wed, Apr 9, 2014 at 11:03 AM, Bae, Jae Hyeon <[email protected]> wrote:
> >
> > > Hi zookeeper users
> > >
> > > While rolling restart zookeeper cluster of 5 instances with 3 as
> > > quorum, restarting the leader instance made quorum lost. Is this
> > > expected?
> > > Otherwise, how can I restart the leader instance without
> > > interrupting whole cluster? Or is this fixed in 3.4.6?
> > >
> > > Thank you
> > > Best, Jae
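
For anyone scripting this, here is a minimal sketch of the procedure
described at the top of the thread: restart one node, wait until it has
rejoined the quorum, and only then move on. It polls each server with the
"srvr" four-letter command on the client port and reads the "Mode:" line.
Several things here are assumptions and not from the thread: the host
names, the default port 2181, the timeouts, the restart_node callable
(a hypothetical hook you supply to actually restart a server), and the
choice to restart followers first and the leader last, which is a common
convention intended to trigger only one leader election.

```python
import socket
import time
from typing import Callable, Dict, List, Optional


def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (3 for a 5-node ensemble)."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, live_nodes: int) -> bool:
    """True if enough nodes are live to keep (or re-form) a quorum."""
    return live_nodes >= quorum_size(ensemble_size)


def four_letter_word(host: str, port: int, command: str,
                     timeout: float = 2.0) -> str:
    """Send a ZooKeeper four-letter command (e.g. 'srvr'); return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line, or None if there is none."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def restart_order(modes: Dict[str, Optional[str]]) -> List[str]:
    """Assumed convention: non-leaders first (sorted by name), leader last."""
    others = sorted(h for h, m in modes.items() if m != "leader")
    leaders = sorted(h for h, m in modes.items() if m == "leader")
    return others + leaders


def wait_until_rejoined(host: str, port: int = 2181,
                        deadline_s: float = 120.0,
                        poll_s: float = 2.0) -> bool:
    """Poll 'srvr' until the node reports leader or follower, or time out."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            reply = four_letter_word(host, port, "srvr")
            if parse_mode(reply) in ("leader", "follower"):
                return True
        except OSError:
            pass  # node is still down or not listening yet
        time.sleep(poll_s)
    return False


def rolling_restart(hosts: List[str],
                    restart_node: Callable[[str], None],
                    port: int = 2181) -> None:
    """Restart one node at a time, waiting for each to rejoin the quorum.

    restart_node is a hypothetical hook (for example, something that runs
    your service manager over ssh); it is not part of ZooKeeper.
    """
    modes = {h: parse_mode(four_letter_word(h, port, "srvr")) for h in hosts}
    for host in restart_order(modes):
        restart_node(host)
        if not wait_until_rejoined(host, port):
            raise RuntimeError(host + " did not rejoin; stopping here")
```

With 5 servers, quorum_size(5) is 3, so taking down one node at a time
leaves 4 live and the ensemble keeps quorum. In case (b) from the thread,
where 2 of 5 are already down, restarting one more leaves 2 live and
has_quorum(5, 2) is False. Aborting as soon as a node fails to rejoin
keeps the script from ever taking a second node down on top of a missing
one.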
