No. This should not cause data loss. As soon as ZK cannot replicate changes to a majority of machines, it refuses to accept any more changes. This is old ground and is required for correctness in the face of network partition. It is conceivable (barely) that *exactly* the minority that were behind were the survivors, but this is almost equivalent to a complete failure of the cluster choreographed in such a way that a few nodes come back from the dead just afterwards. That could cause some "completed" transactions to disappear from the surviving state, but at this level of massive failure, we have the same issues with any cluster.
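Just to make the majority rule concrete, here is a toy sketch (plain Python, not ZooKeeper code; the Ensemble class and transaction names are made up for illustration) of the behavior described above: a write commits only when a majority of the ensemble can acknowledge it, and the cluster refuses writes rather than diverge.

```python
# Toy model of the majority-commit rule (illustrative only, not ZK's actual protocol).
class Ensemble:
    def __init__(self, size):
        self.size = size
        self.up = set(range(size))   # servers currently reachable
        self.log = []                # committed transactions

    def majority(self):
        return self.size // 2 + 1

    def write(self, txn):
        # A change is accepted only if a majority of servers can ack it.
        if len(self.up) >= self.majority():
            self.log.append(txn)
            return True
        return False                 # refuse: cannot reach a quorum

ens = Ensemble(5)
assert ens.write("txn-1")            # 5/5 up: accepted
ens.up -= {3, 4}                     # a minority fails
assert ens.write("txn-2")            # 3/5 is still a majority: accepted
ens.up -= {2}                        # now only 2/5 remain
assert not ens.write("txn-3")        # refused: no quorum, no silent divergence
```

The point of the refusal in the last step is exactly the correctness-under-partition property: a minority is never allowed to accept changes that the majority might contradict later.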
To be explicit, you can cause any ZK cluster to back-track in time by doing the following:

a) take down a minority of machines
b) do some updates
c) take down the rest of the cluster
d) bring back the minority
e) reconfigure to tell the minority that they are everything
f) add new members to the cluster

At this point, you will have lost the transactions from (b), but I really, really am not going to worry about this happening either by plan or by accident. Without steps (e) and (f), the cluster will tell you that it knows something is wrong and that it cannot elect a leader. If you don't have *exact* coincidence of the survivor set and the set of laggards, then you won't have any data loss at all.

You have to decide if this is too much risk for you. My feeling is that it is an OK level of correctness for conventional weapon fire control, but not for nuclear weapons safeguards. Since my apps are considerably less sensitive than either of those, I am not much worried.

On Mon, Jul 6, 2009 at 12:40 PM, Henry Robinson <he...@cloudera.com> wrote:
> It seems like there is a
> correctness issue: if a majority of servers fail, with the remaining
> minority lagging the leader for some reason, won't the ensemble's current
> state be forever lost?
>
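For the curious, the lettered back-track scenario earlier in this message can be walked through as a toy simulation (plain Python, illustrative only; the per-server log dict and transaction names are invented for the example), tracking which servers ever saw which transactions:

```python
# Toy walk-through of the back-track scenario (not ZooKeeper code).
# Five servers, each with its own log of applied transactions.
logs = {s: ["txn-1"] for s in range(5)}    # everyone has txn-1

minority = {0, 1}                          # (a) take down a minority
survivors = set(logs) - minority

for s in survivors:                        # (b) updates replicate only to
    logs[s].append("txn-2")                #     the 3 remaining servers

# (c) take down the rest of the cluster,
# (d) bring back the minority,
# (e)+(f) reconfigure so the minority *is* the new cluster.
new_cluster = {s: logs[s] for s in minority}

# txn-2 committed legitimately (3/5 is a majority), but no member of the
# new cluster ever saw it: the ensemble has backtracked in time.
assert all("txn-2" not in log for log in new_cluster.values())
```

Note that the loss requires the deliberate reconfiguration in (e) and (f); without them, the two lagging servers cannot form a quorum, so the cluster reports failure instead of silently serving stale state.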