On Wed, Sep 4, 2024 at 12:23 PM shveta malik <[email protected]> wrote:
>
> Hello hackers,
> (Cc people involved in the earlier discussion)
>
> I would like to discuss the $Subject.
>
> While discussing Logical Replication's Conflict Detection and
> Resolution (CDR) design in [1] , it came to our notice that the
> commit LSN and timestamp may not correlate perfectly i.e. commits may
> happen with LSN1 < LSN2 but with Ts1 > Ts2. This issue may arise
> because, during the commit process, the timestamp (xactStopTimestamp)
> is captured slightly earlier than when space is reserved in the WAL.
>
> ~~
>
> Reproducibility of conflict-resolution problem due to the timestamp inversion
> ------------------------------------------------
> It was suggested that timestamp inversion *may* impact the time-based
> resolutions such as last_update_wins (targeted to be implemented in
> [1]) as we may end up making wrong decisions if timestamps and LSNs
> are not correctly ordered. And thus we tried some tests but failed to
> find any practical scenario where it could be a problem.
>
> Basically, the proposed conflict resolution is a row-level resolution,
> and to cause the row value to be inconsistent, we need to modify the
> same row in concurrent transactions and commit the changes
> concurrently. But this doesn't seem possible because concurrent
> updates on the same row are disallowed (e.g., the later update will be
> blocked due to the row lock). See [2] for the details.
>
> We tried to give some thoughts on multi table cases as well e.g.,
> update table A with foreign key and update the table B that table A
> refers to. But update on table A will block the update on table B as
> well, so we could not reproduce data-divergence due to the
> LSN/timestamp mismatch issue there.
>
> ~~
>
> Idea proposed to fix the timestamp inversion issue
> ------------------------------------------------
> There was a suggestion in [3] to acquire the timestamp while reserving
> the space (because that happens in LSN order). The clock would need to
> be monotonic (easy enough with CLOCK_MONOTONIC), but also cheap. The
> main problem why it's being done outside the critical section, because
> gettimeofday() may be quite expensive. There's a concept of hybrid
> clock, combining "time" and logical counter, which might be useful
> independently of CDR.
>
> On further analyzing this idea, we found that CLOCK_MONOTONIC can be
> accepted only by clock_gettime() which has more precision than
> gettimeofday() and thus is equally or more expensive theoretically (we
> plan to test it and post the results). It does not look like a good
> idea to call any of these when holding spinlock to reserve the wal
> position. As for the suggested solution "hybrid clock", it might not
> help here because the logical counter is only used to order the
> transactions with the same timestamp. The problem here is how to get
> the timestamp along with wal position
> reservation(ReserveXLogInsertLocation).
>
Here are the tests done to compare clock_gettime() and gettimeofday()
performance.
Machine details:
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
CPU(s): 120; 800GB RAM
Three functions were tested across three different call volumes (1
million, 100 million, and 1 billion):
1) clock_gettime() with CLOCK_REALTIME
2) clock_gettime() with CLOCK_MONOTONIC
3) gettimeofday()
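Summary: clock_gettime() with CLOCK_MONOTONIC is sometimes slightly faster
than gettimeofday(), but not consistently (gettimeofday() was faster on
average in the 100 million case). The difference between all three
functions is minimal: the averages differ by no more than ~2.5%, with
CLOCK_REALTIME the slowest in every case, and each function costs roughly
17 to 18 ns per call. Overall, the performance of CLOCK_MONOTONIC and
gettimeofday() is essentially the same.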
Test results (each test was run twice for consistency):
1) For 1 million calls:
1a) clock_gettime() with CLOCK_REALTIME:
- Run 1: 0.01770 seconds, Run 2: 0.01772 seconds, Average: 0.01771 seconds.
1b) clock_gettime() with CLOCK_MONOTONIC:
- Run 1: 0.01753 seconds, Run 2: 0.01748 seconds, Average: 0.01750 seconds.
1c) gettimeofday():
- Run 1: 0.01742 seconds, Run 2: 0.01777 seconds, Average: 0.01760 seconds.
2) For 100 million calls:
2a) clock_gettime() with CLOCK_REALTIME:
- Run 1: 1.76649 seconds, Run 2: 1.76602 seconds, Average: 1.76625 seconds.
2b) clock_gettime() with CLOCK_MONOTONIC:
- Run 1: 1.72768 seconds, Run 2: 1.72988 seconds, Average: 1.72878 seconds.
2c) gettimeofday():
- Run 1: 1.72436 seconds, Run 2: 1.72174 seconds, Average: 1.72305 seconds.
3) For 1 billion calls:
3a) clock_gettime() with CLOCK_REALTIME:
- Run 1: 17.63859 seconds, Run 2: 17.65529 seconds, Average:
17.64694 seconds.
3b) clock_gettime() with CLOCK_MONOTONIC:
- Run 1: 17.15109 seconds, Run 2: 17.27406 seconds, Average:
17.21257 seconds.
3c) gettimeofday():
- Run 1: 17.21368 seconds, Run 2: 17.22983 seconds, Average:
17.22175 seconds.
~~~~
The scripts used for the tests are attached.
--
Thanks,
Nisha
<<attachment: clock_gettime_test.zip>>
