Re: [HACKERS] Proposal: Commit timestamp

Markus Schiltknecht Wed, 07 Feb 2007 18:28:48 -0800

Hi,

Jan Wieck wrote:

> Then let me give you a little puzzle just for the fun of it.
> A database containing customer contact information (among other things) is a two-node multimaster system. One serves the customer web portal, the other is used by the company staff, including the call center. At 13:45 the two servers lose connectivity to each other, yet the internal staff can still access the internal server while the web portal remains accessible from the outside. At 13:50 customer A updates their credit card information through the web portal, while customer B does the same through the call center. At 13:55 both customers change their minds and switch to yet another credit card; this time customer A phones the call center while customer B uses the web portal.

Phew, a mind twister... one customer would already be enough to trigger that sort of conflict...

> At 14:00 the two servers reconnect and go through conflict resolution. How do you intend to resolve both conflicts without using any "clock", given that the word seems to be a stopword causing instant rejection of whatever you propose? Needless to say, both customers will be dissatisfied if you charge the "wrong" credit card during your next billing cycle.

Correct. But do these cases justify storing a timestamp for each and every transaction you do? That's what I doubt, not the usefulness of time-based conflict resolution for certain cases.

You can always add time-based conflict resolution by adding a timestamp column and deciding based on that one. I'd guess that the overall costs are lower that way.
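A minimal sketch of that per-table approach, assuming a hypothetical application-maintained `updated_at` column and reasonably synchronized node clocks (which is exactly the "clock" Jan is asking about). Each conflicting row is resolved last-writer-wins on that column, with the origin node name as a deterministic tie-breaker; the class and function names are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RowVersion:
    """One node's version of a row that carries an application-maintained timestamp."""
    key: str              # primary key, e.g. the customer id
    value: str            # the contested data, e.g. a credit card reference
    updated_at: datetime  # hypothetical timestamp column, set on every UPDATE
    origin_node: str      # only used to break exact timestamp ties


def resolve(a: RowVersion, b: RowVersion) -> RowVersion:
    """Last-writer-wins: keep the version with the newer timestamp."""
    if a.key != b.key:
        raise ValueError("can only resolve two versions of the same row")
    # Comparing (timestamp, node) makes every node pick the same winner, even on ties.
    return max(a, b, key=lambda v: (v.updated_at, v.origin_node))


def merge(local: dict[str, RowVersion], remote: dict[str, RowVersion]) -> dict[str, RowVersion]:
    """Merge the rows changed on both sides of a partition, row by row."""
    merged = dict(local)
    for key, theirs in remote.items():
        mine = merged.get(key)
        merged[key] = theirs if mine is None else resolve(mine, theirs)
    return merged
```

With the puzzle's data (web node: A changed at 13:50, B at 13:55; staff node: A at 13:55, B at 13:50), both customers end up with their 13:55 card whichever direction the merge runs, but only the tables that carry such a column pay for it.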


But you've withdrawn that proposal already, so...

> Which is a good discussion, because one of the reasons why I stopped looking into Postgres-R is the fact that it is based on the idea of pushing all the replication information through a system that generates a globally serialized message queue. That by itself isn't the problem, but implementing a globally serialized message queue has serious throughput issues that are (among other details) linked to the speed of light.

Agreed. Nevertheless, there are use cases for such systems, because they put fewer limitations on the application. One could even argue that your above example would be one ;-)

> I am trying to start with a system that doesn't rely on such a mechanism for everything. I do intend to add an option later that allows declaring a UNIQUE NOT NULL constraint to be synchronous. What that means is that any INSERT, UPDATE, DELETE and SELECT FOR UPDATE will require the node to currently be a member of the (quorum or priority defined) majority of the cluster.
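The majority test itself is small. Below is a minimal sketch, assuming a hypothetical per-node vote weight (all 1 for a plain quorum, higher weights to express priority) and a node that knows which peers are currently reachable; nothing here reflects how the option was actually to be implemented.

```python
from dataclasses import dataclass


@dataclass
class ClusterView:
    """One node's (hypothetical) view of cluster membership."""
    weights: dict[str, int]  # vote weight per node: all 1 for a plain quorum, >1 for priority
    reachable: set[str]      # nodes in this node's current partition, including itself

    def in_majority(self) -> bool:
        total = sum(self.weights.values())
        present = sum(w for node, w in self.weights.items() if node in self.reachable)
        # Strict majority: two disjoint partitions can never both pass this test.
        return 2 * present > total


# Statements that must be synchronous against a "synchronous" UNIQUE NOT NULL key.
SYNC_STATEMENTS = {"INSERT", "UPDATE", "DELETE", "SELECT FOR UPDATE"}


def check_synchronous_key(view: ClusterView, statement: str) -> None:
    """Reject writes to a synchronous unique key unless this node is in the majority."""
    if statement in SYNC_STATEMENTS and not view.in_majority():
        raise PermissionError(f"{statement} needs this node to be in the cluster majority")
```

In the two-node puzzle, giving the staff node weight 2 would let the call center keep modifying synchronous keys during the partition while the web portal could not; plain SELECTs stay allowed on both sides.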


Sounds reasonable.

> An advisory lock system, based on total order group communication, will grant the lock on the unique key values on a first come, first served basis. Every node in the cluster will keep those keys as "locked" until the asynchronous replication stream reports the locking transaction as ended. If another remote transaction in the meantime requires updating such a key, the incoming stream from that node will be put on hold until the lock is cleared. This is to protect against node B replicating a transaction from node A, and a later update on node B then arriving on C before C got the first event from A. A node that got disconnected from the cluster must rebuild the current advisory lock list upon reconnecting to the cluster.
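A minimal single-node sketch of the two mechanisms described above, under stated assumptions: lock requests are fed in the same total order on every node (by some group communication system, not modeled here), a transaction's end is signalled by the replication stream, and each incoming change is reduced to a hypothetical `(txn, key)` pair. An origin's stream is held at the first change whose key another transaction still owns, so nothing from that origin overtakes it.

```python
from collections import defaultdict, deque


class AdvisoryLockTable:
    """Each node's copy of the cluster-wide advisory locks on unique key values.

    Assumes lock requests arrive in the same total order on every node, so all
    nodes grant the same locks to the same transactions.
    """

    def __init__(self) -> None:
        self.holder: dict[str, str] = {}                 # key -> txn holding it
        self.waiters: dict[str, deque] = defaultdict(deque)  # FIFO of waiting txns per key

    def request(self, txn: str, key: str) -> bool:
        """Handle one totally ordered lock request; True if txn now holds the key."""
        if self.holder.get(key, txn) == txn:
            self.holder[key] = txn
            return True
        self.waiters[key].append(txn)  # first come, first served
        return False

    def release(self, txn: str) -> None:
        """Called once the replication stream reports txn as committed or aborted."""
        for queue in self.waiters.values():
            if txn in queue:
                queue.remove(txn)
        for key in [k for k, t in self.holder.items() if t == txn]:
            queue = self.waiters[key]
            if queue:
                self.holder[key] = queue.popleft()  # hand the lock to the next waiter
            else:
                del self.holder[key]

    def blocked(self, txn: str, key: str) -> bool:
        """True if key is locked by some transaction other than txn."""
        owner = self.holder.get(key)
        return owner is not None and owner != txn


class ReplicaApplier:
    """Applies the asynchronous streams from other nodes, holding a stream at a locked key."""

    def __init__(self, locks: AdvisoryLockTable) -> None:
        self.locks = locks
        self.pending: dict[str, deque] = defaultdict(deque)  # origin node -> held changes
        self.applied: list[tuple[str, str]] = []             # (txn, key) in apply order

    def deliver(self, origin: str, txn: str, key: str) -> None:
        self.pending[origin].append((txn, key))
        self._drain(origin)

    def on_txn_end(self, txn: str) -> None:
        self.locks.release(txn)
        for origin in list(self.pending):
            self._drain(origin)

    def _drain(self, origin: str) -> None:
        queue = self.pending[origin]
        # Apply in arrival order and stop at the first change whose key is locked by
        # another transaction, so later changes from that origin cannot overtake it.
        while queue and not self.locks.blocked(*queue[0]):
            self.applied.append(queue.popleft())
```

On node C, if t1 from A was granted key `k` first and t2 from B queued behind it, B's update to `k` is held even when it arrives before A's; it is applied only after t1's change has been applied and t1 has ended.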


Yeah, this is a convenient way to replicate sequences via a GCS (group communication system).

> I think that this will be a way to overcome Postgres-R's communication bottleneck, as well as to allow limited update activity even while a node is completely disconnected. Synchronous or group communication messages are reduced to the cases where the application cannot be implemented in a conflict-free way, like allocating a natural primary key. There is absolutely no need to synchronize, for example, the creation of a sales order.

Agreed, such cases can easily be optimized. But you have to be aware of the limitations these optimizations cause. Postgres-R is targeted much more at very general use cases.

> An application can use globally unique IDs for the order number. And everything possibly referenced by an order (items, customers, ...) is stored in a way that the references are never updated. Deletes of those possibly referenced objects are implemented as a two-step process, where they are first marked obsolete, and later on things that have been marked obsolete for X long are deleted. A REPLICA TRIGGER on inserting an order will simply reset the obsolete flag of the referenced objects. If a node is disconnected longer than X, you have a problem - hunt down the guy who defined X.
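The scheme can be illustrated with a small in-memory model. This is a sketch under assumptions: the node-id-plus-counter order number is just one possible globally unique ID scheme, `obsolete_since` is a hypothetical flag column, `on_order_insert` stands in for what the replica trigger would do, and times are plain seconds.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, Optional


def order_numbers(node_id: int) -> Iterator[str]:
    """One possible globally unique ID scheme: node id plus a node-local counter."""
    for n in count(1):
        yield f"{node_id}-{n}"


@dataclass
class ReferencedRow:
    obsolete_since: Optional[float] = None  # hypothetical flag column; None means live


class ReferencedTable:
    """Items, customers, ... that orders may reference; references are never updated."""

    def __init__(self, retention: float) -> None:
        self.retention = retention  # Jan's "X", in seconds
        self.rows: dict[str, ReferencedRow] = {}

    def add(self, key: str) -> None:
        self.rows[key] = ReferencedRow()

    def mark_obsolete(self, key: str, now: float) -> None:
        """Step 1 of a delete: only flag the row, so existing references stay valid."""
        row = self.rows[key]
        if row.obsolete_since is None:
            row.obsolete_since = now

    def on_order_insert(self, referenced_keys: list[str]) -> None:
        """What the replica trigger on order insert would do: revive referenced rows.

        Raises KeyError if a row was already purged, i.e. the order comes from a node
        that was disconnected for longer than the retention period.
        """
        for key in referenced_keys:
            self.rows[key].obsolete_since = None

    def purge(self, now: float) -> list[str]:
        """Step 2: physically delete rows that have been obsolete for longer than X."""
        doomed = [key for key, row in self.rows.items()
                  if row.obsolete_since is not None and now - row.obsolete_since > self.retention]
        for key in doomed:
            del self.rows[key]
        return doomed
```

The `KeyError` path is the "hunt down the guy who defined X" case: an order replicated after the referenced row was purged has nothing left to revive.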

Yeah, that's another very nice optimization. Again, as long as you know the limitations, that's all well and fine.

> Merging certain ideas to come up with an async/sync hybrid? Seems to me we have similar enough ideas to need conflict resolution, because we had them simultaneously but communicate them asynchronously.


Huh? Sorry, I didn't get what you're trying to say here.

Regards

Markus

