On Thu, Apr 16, 2009 at 12:38:19PM +0200, Dietmar Maurer wrote:
> Let's assume the cluster is partitioned:
> 
> Part1: node1 node2 node3
> Part2: node4 node5
> 
> After recovery, what join/leave messages do I receive with a CPG:
> 
> A.) JOIN: node4 node5
> 
> or
> 
> B.) JOIN: node1 node2 node3
> 
> or anything else?

In practice I believe you'll see:

nodes 1-3 get a confchg with members=1,2,3,4,5 joined=4,5
nodes 4-5 get a confchg with members=1,2,3,4,5 joined=1,2,3

Partitioning and merging has been a difficult topic over the years, and it
is a serious problem for any application requiring the properties of
virtual synchrony.

VS guarantees that all cpg members will see the same sequence of messages and
configuration changes, i.e. history of events.  If a cpg is partitioned, that
immediately violates VS.  One part must be killed so that the remaining nodes
will all agree on one version of history, thus maintaining VS.  Partitioning
can't be avoided, so an application must be able to deal with it and kill/stop
one part (assuming the app depends on VS.)

Once a partition exists, a merge back together doesn't change the fact that
the disagreement has already occurred (at partition time), and that
disagreement can only be resolved (to maintain VS) by killing nodes that
don't agree with one version of the history.

My applications use quorum to block activity in minority partitions.  They
also exchange messages to detect merges of prior partitions, and then
kill/block nodes that *were* in a minority partition to maintain VS in the
majority.

(Note that a *single* node:process joining the cpg doesn't mean that it wasn't
partitioned by itself and is now merging.)

corosync might make this easier by not merging cpgs (or even whole
clusters) that have been partitioned, but that raises other questions, and
I've been told that doing it would be next to impossible.

We have a lot of experience with these situations because of corosync's
tendency to form spurious, transient partitions where a partition is created
and then immediately merged again in fractions of a second.  This doesn't
happen much any more with small clusters, but it does when you get up toward
32 nodes.  This is the most significant item on the list of suggested
improvements I recently sent out.

Dave

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais