On Feb 4, 2007, at 1:36 PM, Jan Wieck wrote:

On 2/4/2007 10:53 AM, Theo Schlossnagle wrote:
As the clock must be incremented clusterwide, the need for it to be in sync with the system clock (on any or all of the systems) is obviated. In fact, since you can't guarantee that synchronicity, a time-based clock can be confusing -- one expects a time-based clock to be accurate to the time. A counter-based clock carries no such expectation.

For the fourth time, the clock is in the mix to allow operation to continue during a network outage. All your arguments seem to assume 100% network uptime. There will be no clusterwide clock or clusterwide increment when you lose connection. How does your idea cope with that?

That's exactly what a quorum algorithm is for.

Obviously the counters will immediately drift apart based on the transaction load of the nodes as soon as the network goes down. And in order to avoid this "clock" confusion and wrong expectation, you'd rather have a system with such a simple, non-clock based counter and accept that it starts behaving totally wonky when the cluster reconnects after a network outage? I would rather confuse a few people than have a last-update-wins conflict resolution that basically rolls dice to determine "last".
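The drift described above can be made concrete with a small sketch. It is only an illustration of the claim, not code from either proposal: the tuple layout, the node names, and the `last_update_wins` helper are all hypothetical, and the wall-clock times appear only as labels in the values.

```python
def last_update_wins(a, b):
    """Pick the version with the higher counter.

    Each version is a (counter, node, value) tuple. This layout is an
    assumption made for the example, not an existing format.
    """
    return max(a, b, key=lambda version: version[0])


# Both nodes are at counter 100 when the network partitions.
START = 100

# The busy node has already committed 10 transactions when it writes
# the row, early in the outage.
busy_version = (START + 10, "busy", "written at 09:00")

# The idle node commits only 3 transactions before it writes the same
# row, two hours later by the wall clock.
idle_version = (START + 3, "idle", "written at 11:00")

# On reconnect, the counter alone decides which write is "last".
winner = last_update_wins(busy_version, idle_version)
print(winner)
```

Here the counter favors the busy node's write even though the idle node's write happened later in wall-clock terms, because each counter reflects local transaction load rather than elapsed time.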

If your cluster partitions and you have hours of independent action, and upon merge you apply a conflict resolution algorithm that has the enormous effect of undoing portions of the last several hours of work on the nodes, you wouldn't call that "wonky"?

For sane disconnected (or more generally, partitioned) operation in multi-master environments, a quorum for the dataset must be established. Now, one can consider the "database" to be the dataset. So, on network partitions, those in "the" quorum are allowed to progress with data modification and the others may only read. However, there is no reason why the dataset _must_ be the database, or why multiple datasets _must_ share the same quorum algorithm. You could easily classify certain tables, schemas, or partitions into a specific dataset and apply a suitable quorum algorithm to that, and a different quorum algorithm to other disjoint datasets.
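A minimal sketch of the per-dataset quorum idea, under stated assumptions: every name here (`Dataset`, `has_quorum`, `writable_datasets`, the node and dataset names) is hypothetical, and a strict majority of voting members stands in for "a suitable quorum algorithm". A different rule could be substituted per dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A group of tables/schemas that shares one quorum (hypothetical type)."""
    name: str
    members: frozenset  # nodes that vote for this dataset

    def has_quorum(self, reachable):
        # Assumed rule: a strict majority of this dataset's voting
        # members must be reachable from the current partition.
        votes = len(self.members & set(reachable))
        return 2 * votes > len(self.members)


def writable_datasets(datasets, reachable):
    """Names of datasets this side of the partition may modify.

    Datasets without quorum stay read-only on this side.
    """
    return [d.name for d in datasets if d.has_quorum(reachable)]


orders = Dataset("orders", frozenset({"a", "b", "c"}))
sessions = Dataset("sessions", frozenset({"c", "d", "e"}))

# The network splits into {a, b} and {c, d, e}.
side_one = writable_datasets([orders, sessions], {"a", "b"})
side_two = writable_datasets([orders, sessions], {"c", "d", "e"})
print(side_one, side_two)
```

With disjoint datasets and separate voting members, each side of the partition keeps write access to the dataset it holds a majority for, and at most one side can hold a majority for any given dataset.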


// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/


