No.  This should not cause data loss.

As soon as ZK cannot replicate changes to a majority of machines, it refuses
to take any more changes.  This is old ground and is required for
correctness in the face of a network partition.  It is conceivable (barely)
that *exactly* the minority that were behind were the survivors, but this is
almost equivalent to a complete failure of the cluster, choreographed in
such a way that a few nodes come back from the dead just afterwards.  That
could cause some "completed" transactions to disappear from the recovered
state, but at this level of massive failure, we have the same issues with
any cluster.
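
To make the quorum arithmetic concrete, here is a minimal sketch of the
majority check (class and method names are mine, not ZooKeeper's; the real
logic lives in the server's quorum verifier code):

  // Sketch of majority-quorum arithmetic: a write commits only when a
  // strict majority of the ensemble has acknowledged it.
  public class QuorumSketch {
      private final int ensembleSize;

      public QuorumSketch(int ensembleSize) {
          this.ensembleSize = ensembleSize;
      }

      public boolean canCommit(int acks) {
          return acks > ensembleSize / 2;
      }

      public static void main(String[] args) {
          QuorumSketch five = new QuorumSketch(5);
          System.out.println(five.canCommit(3)); // true: 3 of 5 commits
          System.out.println(five.canCommit(2)); // false: a 2-node minority
                                                 // refuses further changes
      }
  }

The point is just that a minority can never commit, so lagging nodes cannot
silently diverge while a majority is still alive.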

To be explicit, you can cause any ZK cluster to back-track in time by doing
the following:

a) take down a minority of machines

b) do some updates

c) take down the rest of the cluster

d) bring back the minority

e) reconfigure to tell the minority that they are everything

f) add new members to the cluster

At this point, you will have lost the transactions from (b), but I really,
really am not going to worry about this happening either by plan or by
accident.  Without steps (e) and (f), the cluster will tell you that it
knows something is wrong and that it cannot elect a leader.  If you don't
have *exact* coincidence of the survivor set and the set of laggards, then
you won't have any data loss at all.
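
If you want to see the back-track spelled out, here is a toy walk-through
of steps (a) through (f) (all names and zxid numbers are invented; this is
just bookkeeping, not ZooKeeper code):

  // Toy walk-through of the back-track scenario above; not ZooKeeper code.
  public class BackTrackDemo {
      public static void main(String[] args) {
          long[] zxid = {10, 10, 10, 10, 10}; // last txn applied per server

          // (a) take down a minority: servers 3 and 4 stop, frozen at zxid 10.
          // (b) do some updates: the live majority 0, 1, 2 commits through 12.
          zxid[0] = zxid[1] = zxid[2] = 12;

          // (c) take down the rest; (d) bring back only the lagging minority.
          boolean[] up = {false, false, false, true, true};

          // (e) + (f) reconfigure so that the minority is the whole cluster.
          long newest = Long.MIN_VALUE;
          for (int i = 0; i < zxid.length; i++) {
              if (up[i]) newest = Math.max(newest, zxid[i]);
          }
          System.out.println("cluster restarts at zxid " + newest); // prints 10
          // Transactions 11 and 12 from step (b) are gone.
      }
  }

Without (e) and (f), the two survivors are only 2 of 5, cannot form a
quorum, and will refuse to elect a leader rather than serve that stale
state.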

You have to decide if this is too much risk for you.  My feeling is that it
is an OK level of correctness for conventional weapons fire control, but not
for
nuclear weapons safeguards.  Since my apps are considerably less sensitive
than either of those, I am not much worried.

On Mon, Jul 6, 2009 at 12:40 PM, Henry Robinson <he...@cloudera.com> wrote:

> It seems like there is a
> correctness issue: if a majority of servers fail, with the remaining
> minority lagging the leader for some reason, won't the ensemble's current
> state be forever lost?
>
