On Thu, Apr 16, 2009 at 12:38:19PM +0200, Dietmar Maurer wrote:
> Let's assume the cluster is partitioned:
>
> Part1: node1 node2 node3
> Part2: node4 node5
>
> After recovery, what join/leave messages do I receive with a CPG:
>
> A.) JOIN: node4 node5
>
> or
>
> B.) JOIN: node1 node2 node3
>
> or anything else?
In practice I believe you'll see:

  nodes 1-3 get a confchg with members=1,2,3,4,5 joined=4,5
  nodes 4-5 get a confchg with members=1,2,3,4,5 joined=1,2,3

Partitioning and merging has been a big issue over the years, and is a very serious problem for any application requiring the properties of virtual synchrony (VS). VS guarantees that all cpg members will see the same sequence of messages and configuration changes, i.e. the same history of events. If a cpg is partitioned, that immediately violates VS: one part must be killed so that the remaining nodes will all agree on one version of history, thus maintaining VS. Partitioning can't be avoided, so an application must be able to deal with it and kill/stop one part (assuming the app depends on VS).

Once a partition exists, a merge back together doesn't change the fact that the disagreement has already occurred (at partition time), and that disagreement can only be resolved (to maintain VS) by killing the nodes that don't agree with one version of the history.

My applications use quorum to block activity in minority partitions. They also exchange messages to detect merges of prior partitions, and then kill/block nodes that *were* in a minority partition to maintain VS in the majority. (Note that a *single* node:process joining the cpg doesn't mean that it wasn't partitioned by itself and is now merging.)

corosync might make this easier by not merging cpgs (or even whole clusters) that have been partitioned, but that raises other questions, and I've been told that doing it would be next to impossible.

We have a lot of experience with these situations because of corosync's tendency to form spurious, transient partitions, where a partition is created and then immediately merged again in fractions of a second. This doesn't happen much any more with small clusters, but it does when you get up toward 32 nodes. This is the most significant item on the list of suggested improvements I recently sent out.
Dave

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
