On Thu, Aug 29, 2024 at 2:31 AM John H <johnh...@gmail.com> wrote: > > Hi Amit, > > On Mon, Aug 26, 2024 at 11:00 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > I wanted a simple test where in the first case you use > > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' and in the second case > > use standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot. You > > can try some variations of it as well. The idea is that even if the > > performance is less for synchronous_standby_names configuration, we > > should be able to document it. This will help users to decide what is > > ... > > What is the difference between "Test failover_slots with > > synchronized_standby_slots = 'rr_1, rr_2, > > > rr_3, rr_4, rr_5'" and "Test failover_slots waiting on sync_rep no new > > > shared cache"? I want to know what configuration did you used for > > > synchronous_standby_names in the latter case. > > Sorry for the confusion due to the bad-naming of the test cases, let > me rephrase. > All three tests had synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' > set with synchronous_commit = 'on', and failover_slots = 'on' > for the 10 logical slots. > > # Test failover_slots with synchronized_standby_slots = 'rr_1, rr_2, > rr_3, rr_4, rr_5' > This is the test you wanted where the logical clients are waiting on > all 5 slots to acknowledge the change since > 'synchronized_standby_slots' takes priority when set. > > # Test failover_slots sync rep no cache > This test has 'synchronized_standby_slots' commented out, and without > relying on the new cache introduced in 0003. > Logical clients will wait on synchronous_standby_names in this case. > > # Test failover slots with additional shared cache > This test also has 'synchronized_standby_slots' commented out, and > logical clients will wait on the LSNs > reported from synchronous_standby_names but it relies on a new cache > to reduce contention on SyncRepLock. > > > The idea is that even if the > > performance is less for synchronous_standby_names configuration, we > > should be able to document it. This will help users to decide what is > > best for them. > > Makes sense. > > > I am also not sure especially as the test results didn't shown much > > improvement and the code also becomes bit complicated. BTW, in the > > 0003 version in the below code: > > That's fair, I've updated to be more in line with 0002. > > > + /* Cache values to reduce contention */ > > + LWLockAcquire(SyncRepLock, LW_SHARED); > > + memcpy((XLogRecPtr *) walsndctl->cached_lsn, (XLogRecPtr *) > > walsndctl->lsn, sizeof(lsn)); > > + LWLockRelease(SyncRepLock); > > > > Which mode lsn is being copied? I am not sure if I understood this > > part of the code. > > All of the mode LSNs are being copied in case SyncRepWaitMode changes in > the next iteration. I've removed that part but kept: > > > + memcpy(lsn, (XLogRecPtr *) walsndctl->lsn, sizeof(lsn)); > > as suggested by Bertrand to avoid the for loop updating values one-by-one. > > Here's what's logged after the memcpy: > > 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[0] after memcpy is: > 279/752C7FF0 > 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[1] after memcpy is: > 279/752C7F20 > 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[2] after memcpy is: > 279/752C7F20 > > > In the 0002 version, in the following code [1], you are referring to > > LSN mode which is enabled for logical walsender irrespective of the > > mode used by the physical walsender. It is possible that they are > > always the same but that is not evident from the code or comments in > > the patch. > > They are almost always the same, I tried to indicate that with the > following comment in the patch, but I could make it more explicit? > > /* Initialize value in case SIGHUP changing to SYNC_REP_NO_WAIT */ > > At the beginning we set > > > int mode = SyncRepWaitMode; > > At this time, the logical walsender mode it's checking against is the > same as what the physical walsenders are using. > It's possible that this mode is no longer the same when we execute the > following check: > > > if (lsn[mode] >= wait_for_lsn) > > because of a SIGHUP to synchronous_commit that changes SyncRepWaitMode > to some other value > > We cache the value instead of > > if (lsn[SyncRepWaitMode] >= wait_for_lsn) > > because SYNC_REP_NO_WAIT is -1. If SyncRepWaitMode is set to this it > leads to out of bounds access. > > I've attached a new patch that removes the shared cache introduced in 0003. >
Thanks for the patch. Few comments and queries: 1) + static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE]; We shall name it as 'lsns' as there are multiple. 2) + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++) + { + lsn[i] = InvalidXLogRecPtr; + } Can we do it like below similar to what you have done at another place: memset(lsn, InvalidXLogRecPtr, sizeof(lsn)); 3) + if (!initialized) + { + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++) + { + lsn[i] = InvalidXLogRecPtr; + } + } I do not see 'initialized' set to TRUE anywhere. Can you please elaborate the intent here? 4) + int mode = SyncRepWaitMode; + int i; + + if (!initialized) + { + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++) + { + lsn[i] = InvalidXLogRecPtr; + } + } + if (mode == SYNC_REP_NO_WAIT) + return true; I do not understand this code well. As Amit also pointed out, 'mode' may change. When we initialize 'mode' lets say SyncRepWaitMode is SYNC_REP_NO_WAIT but by the time we check 'if (mode == SYNC_REP_NO_WAIT)', SyncRepWaitMode has changed to say SYNC_REP_WAIT_FLUSH, if so, then we will wrongly return true from here. Is that a possibility? ProcessConfigFile() is in the caller, and thus we may end up using the wrong mode. thanks Shveta