On Tuesday, November 5, 2024 9:59 PM Jan Wieck <j...@wi3ck.info> wrote: > > Hi Hackers, > > On 9/5/24 01:39, Amit Kapila wrote: > > > > We can't forget CDR completely as this could only be a potential > > problem in that context. Right now, we don't have any built-in > > resolution strategies, so this can't impact but if this is a problem > > then we need to have a solution for it before considering a solution > > like "last_write_wins" strategy. > > I agree that we can't forget about CDR. This is precisely the problem we ran > into > here at pgEdge and why we came up with a solution (attached). > > > Now, instead of discussing LSN<->timestamp inversion issue, you > > started to discuss "last_write_wins" strategy itself which we have > > discussed to some extent in the thread [2]. BTW, we are planning to > > start a separate thread as well just to discuss the clock skew problem > > w.r.t resolution strategies like "last_write_wins" strategy. So, we > > can discuss clock skew in that thread and keep the focus of this > > thread LSN<->timestamp inversion problem. > > Fact is that "last_write_wins" together with some implementation of Conflict > free Replicated Data Types (CRDT) is good enough for many real world > situations. Anything resembling a TPC-B or TPC-C is quite happy with it. > > The attached solution is minimally invasive because it doesn't move the > timestamp generation (clock_gettime() call) into the critical section of > ReserveXLogInsertLocation() that is protected by a spinlock. Instead it keeps > track of the last commit-ts written to WAL in shared memory and simply bumps > that by one microsecond if the next one is below or equal. > There is one extra condition in that code section plus a function call by > pointer > for every WAL record. In the unlikely case of encountering a stalled or > backwards running commit-ts, the expensive part of recalculating the CRC of > the altered commit WAL-record is done later, after releasing the spinlock. I > have > not been able to measure any performance impact on a machine with 2x > Xeon-Silver (32 HT cores). > > This will work fine until we have systems that can sustain a commit rate of > one > million transactions per second or higher for more than a few milliseconds.
Thanks for the patch! I am reading the patch and noticed few minor things. 1. + /* + * This is a local transaction. Make sure that the xact_time + * higher than any timestamp we have seen thus far. + * + * TODO: This is not postmaster restart safe. If the local + * system clock is further behind other nodes than it takes + * for the postmaster to restart (time between it stops + * accepting new transactions and time when it becomes ready + * to accept new transactions), local transactions will not + * be bumped into the future correctly. + */ The TODO section mentions other nodes, but I believe think patch currently do not have the handling of timestamps for other nodes. Should we either remove this part or add a brief explanation to clarify the relationship with other nodes? 2. +/* + * Hook function to be called while holding the WAL insert spinlock. + * to adjust commit timestamps via Lamport clock if needed. + */ The second line seems can be improved: "to adjust commit timestamps .." => "It adjusts commit timestamps ..." Best Regards, Hou zj