Re: [HACKERS] Proposal: Commit timestamp

Naz Gassiep Fri, 26 Jan 2007 00:03:08 -0800

I would be *very* concerned that system time is not a guaranteed monotonic entity. Surely a counter or other internally managed mechanism would be a better solution.

Furthermore, what would be the ramifications of master and slave system times being out of sync?

Finally, what if system time is rolled forward a few minutes as part of a correction and there were transactions completed in that time? There is a chance, albeit small, that two transactions will have the same timestamp. More importantly, this will throw all kinds of issues in when the slave sees transactions in the future. Even with regular NTP syncs, drift can cause a clock to be rolled forward a few milliseconds, possibly resulting in duplicate transaction IDs.

In summary, I don't think the use of system time has any place in PostgreSQL's internal consistency mechanisms; it is too unreliable an environment property. Why can't a counter be used for this instead?


- Naz.

Jan Wieck wrote:

For a future multimaster replication system, I will need a couple of features in the PostgreSQL server itself. I will submit separate proposals per feature so that discussions can be kept focused on one feature per thread.
For conflict resolution purposes in an asynchronous multimaster system, the "last update" definition often comes into play. For this to work, the system must provide a monotonically increasing timestamp taken at the commit of a transaction. During replication, the replication process must be able to provide the remote node's timestamp so that the replicated data will be "as of the time it was written on the remote node", and not the current local time of the replica, which is by definition of "asynchronous" later.
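
As a rough illustration of the "last update" rule mentioned above, here is a minimal Python sketch. It is not taken from the proposal: `RowVersion`, its fields, and `resolve_last_update_wins` are hypothetical names, and commit timestamps are modelled as integer microseconds since the epoch. It also assumes timestamps from different nodes are comparable at all, which is exactly the clock-skew question raised in the reply.

```python
from typing import NamedTuple


class RowVersion(NamedTuple):
    """Hypothetical record: one version of a row plus its commit timestamp."""
    commit_ts_us: int   # commit timestamp, microseconds since epoch (assumed unit)
    origin: str         # name of the node where the row was written
    payload: dict       # the row contents


def resolve_last_update_wins(local: RowVersion, remote: RowVersion) -> RowVersion:
    """Keep whichever version carries the later commit timestamp.

    On an exact tie the local version is kept; the proposal aims to make
    ties impossible, so this branch is only a defensive default.
    """
    return remote if remote.commit_ts_us > local.commit_ts_us else local
```

The point of carrying the originating node's timestamp through replication is that `remote.commit_ts_us` reflects when the row was written there, not when it arrived here.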
To provide this data, I would like to add another "log" directory, pg_tslog. The files in this directory will be similar to the clog, but contain arrays of timestamptz values. On commit, the current system time will be taken. As long as this time is lower than or equal to the last taken time in this PostgreSQL instance, the value will be increased by one microsecond. The resulting time will be added to the commit WAL record and written into the pg_tslog file.
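
The commit rule described above can be sketched in a few lines of Python. This is only one reading of the paragraph, assuming that "increased by one microsecond" means "last issued timestamp plus one microsecond"; `CommitClock` and `next_commit_timestamp` are hypothetical names, not anything in the PostgreSQL source.

```python
from datetime import datetime, timedelta, timezone


class CommitClock:
    """Hypothetical per-instance state: remembers the last timestamp issued."""

    def __init__(self):
        self.last = None  # last commit timestamp handed out, or None

    def next_commit_timestamp(self, now: datetime) -> datetime:
        """Return a timestamp strictly greater than every earlier one.

        `now` is the system time read at commit. If the clock has not
        advanced (or has been set back), use last + 1 microsecond instead.
        """
        if self.last is not None and now <= self.last:
            now = self.last + timedelta(microseconds=1)
        self.last = now
        return now
```

Under this reading, a clock that is set back does not produce duplicate or decreasing values on a single instance; the issued timestamps simply run ahead of wall-clock time until the system clock catches up.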
If a per-database configurable tslog_priority is given, the timestamp will be truncated to milliseconds and the increment logic is done on milliseconds. The priority is added to the timestamp. This guarantees that no two timestamps for commits will ever be exactly identical, even across different servers.
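
A minimal Python sketch of the tslog_priority variant follows. The proposal does not say what range or unit the priority has; the sketch assumes it is an integer from 0 to 999 that is unique per node and occupies the sub-millisecond (microsecond) digits. `next_commit_us` is a hypothetical helper name.

```python
def next_commit_us(now_us, last_ms, priority):
    """Return (commit timestamp in microseconds, new last_ms).

    now_us   : system time at commit, microseconds since epoch
    last_ms  : millisecond value used for the previous commit, or None
    priority : assumed per-node unique integer in 0..999 (an assumption,
               the proposal does not specify the range)
    """
    if not 0 <= priority < 1000:
        raise ValueError("priority must fit in the sub-millisecond digits")
    ms = now_us // 1000                  # truncate to milliseconds
    if last_ms is not None and ms <= last_ms:
        ms = last_ms + 1                 # increment logic works on milliseconds
    return ms * 1000 + priority, ms      # priority fills the microsecond part
```

With distinct priorities, two nodes that read the same system time still produce different timestamps, because the low three digits differ; on one node, the millisecond part always advances.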
The COMMIT syntax will get extended to

    COMMIT [TRANSACTION] [WITH TIMESTAMP <timestamptz>];
The extension is limited to superusers and will override the normally generated commit timestamp. This will be used to give the replicating transaction on the replica the exact same timestamp it got on the originating master node.
The pg_tslog segments will be purged like the clog segments, after all transactions belonging to them have been stamped frozen. A frozen xid by definition has a timestamp of epoch. To ensure a system using this timestamp feature has enough time to perform its work, a new GUC variable defining an interval will prevent vacuum from freezing xids that are younger than that.
A function get_commit_timestamp(xid) returning timestamptz will return the commit time of a transaction as recorded by this feature.
Comments, changes, additions?

Jan

