RE: Commit Timestamp and LSN Inversion issue

Zhijie Hou (Fujitsu) Wed, 06 Nov 2024 18:30:47 -0800

On Tuesday, November 5, 2024 9:59 PM Jan Wieck <[email protected]> wrote:
> 
> Hi Hackers,
> 
> On 9/5/24 01:39, Amit Kapila wrote:
> >
> > We can't forget CDR completely as this could only be a potential
> > problem in that context. Right now, we don't have any built-in
> > resolution strategies, so this can't impact but if this is a problem
> > then we need to have a solution for it before considering a solution
> > like "last_write_wins" strategy.
> 
> I agree that we can't forget about CDR. This is precisely the problem we ran 
> into
> here at pgEdge and why we came up with a solution (attached).
> 
> > Now, instead of discussing LSN<->timestamp inversion issue, you
> > started to discuss "last_write_wins" strategy itself which we have
> > discussed to some extent in the thread [2]. BTW, we are planning to
> > start a separate thread as well just to discuss the clock skew problem
> > w.r.t resolution strategies like "last_write_wins" strategy. So, we
> > can discuss clock skew in that thread and keep the focus of this
> > thread LSN<->timestamp inversion problem.
> 
> Fact is that "last_write_wins" together with some implementation of Conflict
> free Replicated Data Types (CRDT) is good enough for many real world
> situations. Anything resembling a TPC-B or TPC-C is quite happy with it.
> 
> The attached solution is minimally invasive because it doesn't move the
> timestamp generation (clock_gettime() call) into the critical section of
> ReserveXLogInsertLocation() that is protected by a spinlock. Instead it keeps
> track of the last commit-ts written to WAL in shared memory and simply bumps
> that by one microsecond if the next one is below or equal.
> There is one extra condition in that code section plus a function call by 
> pointer
> for every WAL record. In the unlikely case of encountering a stalled or
> backwards running commit-ts, the expensive part of recalculating the CRC of
> the altered commit WAL-record is done later, after releasing the spinlock. I 
> have
> not been able to measure any performance impact on a machine with 2x
> Xeon-Silver (32 HT cores).
> 
> This will work fine until we have systems that can sustain a commit rate of 
> one
> million transactions per second or higher for more than a few milliseconds.


Thanks for the patch! I am reading the patch and noticed few minor things.

1.
+       /*
+        * This is a local transaction. Make sure that the xact_time
+        * higher than any timestamp we have seen thus far.
+        *
+        * TODO: This is not postmaster restart safe. If the local
+        * system clock is further behind other nodes than it takes
+        * for the postmaster to restart (time between it stops
+        * accepting new transactions and time when it becomes ready
+        * to accept new transactions), local transactions will not
+        * be bumped into the future correctly.
+        */

The TODO section mentions other nodes, but I believe think patch currently do
not have the handling of timestamps for other nodes. Should we either remove
this part or add a brief explanation to clarify the relationship with other
nodes?

2.
+/*
+ * Hook function to be called while holding the WAL insert spinlock.
+ * to adjust commit timestamps via Lamport clock if needed.
+ */

The second line seems can be improved:
"to adjust commit timestamps .." => "It adjusts commit timestamps ..."


Best Regards,
Hou zj

RE: Commit Timestamp and LSN Inversion issue

Reply via email to