> Promoting the 2 observers to participant will be a manual step (as part of
> disaster recovery) to get the cluster up. During this manual step, if needed,
> we can shutdown/terminate the old AZ instances.
> We also have puppet managing configuration. Puppet module will be updated to
> reflect new cluster instances. So when, if the AZ comes up, puppet will see
> that these instances are no longer part of the zookeeper cluster and module
> will stop zookeeper service.

What guarantee do you have that all clients will have switched over to the
new cluster? Even if Puppet shuts down the old cluster, it will take time for
it to notice that a shutdown is needed, and that leaves a window in which
clients can connect to the old cluster and do stuff.

> A side question: Will observers will always in sync with entire cluster? in
> other words when observers will be in sync with the quorum participants?

By "in sync", I take it you mean that any write that was acknowledged by the
initial cluster exists in the failover cluster.

No, they may not be in sync. The observer will always have a prefix of the
participants' log. This prefix may be the entire log, or it may be missing the
latest writes.

This is true even if you have a participant in the failover AZ. For a write to
be acknowledged, it has to reach a majority of the participants. With 2 AZs,
one AZ will normally hold a majority on its own, so if that AZ goes down, the
other AZ may be missing writes.

The exception is when there is an equal number of participants in each AZ. In
that case, if one AZ goes down, you can no longer form a majority, but every
acknowledged write will exist on at least one server in each AZ. Maybe this
could be a path forward, since you accept that you will have manual failover.
I'm not sure how well this scenario is supported in the tooling though.

-Ivan
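
To make the majority argument above concrete, here is a minimal sketch of the arithmetic. It is not ZooKeeper code: the function names and the example participant counts are hypothetical, and it only counts voting participants (observers do not vote, so they are left out).

```python
def majority(total_participants: int) -> int:
    """Smallest number of voting participants that forms a strict majority."""
    return total_participants // 2 + 1


def analyze(participants_a: int, participants_b: int) -> dict:
    """Quorum arithmetic for a cluster split over two AZs (hypothetical helper).

    An AZ that holds a majority by itself can acknowledge a write without
    any server in the other AZ having seen it yet.
    """
    quorum = majority(participants_a + participants_b)
    a_alone = participants_a >= quorum
    b_alone = participants_b >= quorum
    return {
        "quorum": quorum,
        # If True, an acknowledged write may exist only in that AZ.
        "a_can_ack_alone": a_alone,
        "b_can_ack_alone": b_alone,
        # True only when neither AZ holds a majority, so every
        # acknowledged write reached at least one server in each AZ.
        "writes_reach_both_azs": not a_alone and not b_alone,
        # The cluster keeps accepting writes after losing an AZ only if
        # the surviving AZ still holds a majority by itself.
        "available_if_a_lost": b_alone,
        "available_if_b_lost": a_alone,
    }


if __name__ == "__main__":
    # Uneven split: AZ A can acknowledge writes that AZ B has not seen.
    print(analyze(3, 2))
    # Equal split: no AZ has a majority, so losing either one stops writes,
    # but nothing that was acknowledged is missing from the survivor.
    print(analyze(2, 2))
```

With an uneven split, the larger AZ is both the one that keeps the cluster available and the one whose loss can take acknowledged writes with it. With an equal split, you trade automatic availability for the guarantee that the surviving AZ has every acknowledged write, which fits a setup where failover is manual anyway.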