RE: Synchronizing slots from primary to standby
On Monday, April 29, 2024 5:11 PM shveta malik wrote:
> On Mon, Apr 29, 2024 at 11:38 AM shveta malik wrote:
> >
> > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) wrote:
> > >
> > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > > > Hi,
> > > > >
> > > > > Since the standby_slot_names patch has been committed, I am
> > > > > attaching the last doc patch for review.
> > > >
> > > > Thanks!
> > > >
> > > > 1 ===
> > > >
> > > > +    continue subscribing to publications now on the new primary server without
> > > > +    any data loss.
> > > >
> > > > I think "without any data loss" should be re-worded in this
> > > > context. Data loss in the sense "data committed on the primary and
> > > > not visible on the subscriber in case of failover" can still occur
> > > > (in case synchronous replication is not used).
> > > >
> > > > 2 ===
> > > >
> > > > +    If the result (failover_ready) of both above steps is
> > > > +    true, existing subscriptions will be able to continue without data loss.
> > > > +
> > > >
> > > > I don't think that's true if synchronous replication is not used. Say,
> > > >
> > > > - synchronous replication is not used
> > > > - primary is not able to reach the standby anymore and standby_slot_names is set
> > > > - new data is inserted into the primary
> > > > - then not replicated to subscriber (due to standby_slot_names)
> > > >
> > > > Then I think the both above steps will return true but data would
> > > > be lost in case of failover.
> > >
> > > Thanks for the comments, attached is the new version patch which
> > > rewords the above places.
> >
> > Thanks for the patch.
> >
> > Few comments:
> >
> > 1) Tested the steps, one of the queries still refers to
> > 'conflict_reason'. I think it should refer to 'conflicting'.

Thanks for catching this. Fixed.
> > 2) Will it be good to mention that in case of planned promotion, it is
> > recommended to run pg_sync_replication_slots() as a last sync attempt
> > before we run the failover-ready validation steps? This can be mentioned
> > in high-availability.sgml of the current patch.
>
> I recall now that with the latest fix, we cannot run
> pg_sync_replication_slots() unless we disable the slot-sync worker.
> Considering that, I think it will be too many steps just to run the SQL
> function at the end without much value added. Thus we can skip this
> point; we can rely on the slot sync worker completely.

Agreed. I didn't change this.

Here is the V3 doc patch.

Best Regards,
Hou zj

v3-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
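For readers following along, the validation discussed here boils down to two queries. A hedged sketch of the idea (slot and subscription names are placeholders, and the exact column set follows the PG17 pg_subscription / pg_replication_slots catalogs, where "conflicting" replaced "conflict_reason" late in the cycle):

```sql
-- On the subscriber: list the slots of subscriptions that use failover.
SELECT subname, subslotname
FROM pg_subscription
WHERE subfailover;

-- On the standby: a slot is failover-ready once it has been synchronized
-- and has not been invalidated.
SELECT slot_name,
       (synced AND NOT temporary AND NOT conflicting) AS failover_ready
FROM pg_replication_slots
WHERE slot_name IN ('sub1_slot');  -- names taken from the first query
```

As Bertrand pointed out up-thread, failover_ready = true does not by itself rule out data loss unless synchronous replication is in use.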
RE: Synchronizing slots from primary to standby
On Friday, March 15, 2024 10:45 PM Bertrand Drouvot wrote:
>
> Hi,
>
> On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote:
> > Hi,
> >
> > Since the standby_slot_names patch has been committed, I am attaching
> > the last doc patch for review.
>
> Thanks!
>
> 1 ===
>
> +    continue subscribing to publications now on the new primary server
> +    without any data loss.
>
> I think "without any data loss" should be re-worded in this context.
> Data loss in the sense "data committed on the primary and not visible on
> the subscriber in case of failover" can still occur (in case synchronous
> replication is not used).
>
> 2 ===
>
> +    If the result (failover_ready) of both above steps is
> +    true, existing subscriptions will be able to continue without data loss.
> +
>
> I don't think that's true if synchronous replication is not used. Say,
>
> - synchronous replication is not used
> - primary is not able to reach the standby anymore and standby_slot_names is set
> - new data is inserted into the primary
> - then not replicated to subscriber (due to standby_slot_names)
>
> Then I think the both above steps will return true but data would be lost
> in case of failover.

Thanks for the comments, attached is the new version patch which rewords the
above places.

Best Regards,
Hou zj

v2-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
RE: Improving the latch handling between logical replication launcher and worker processes.
On Thursday, April 25, 2024 4:59 PM vignesh C wrote:
>
> Hi,
>
> Currently the launcher's latch is used for the following: a) worker
> process attach b) worker process exit and c) subscription creation.
> Since this same latch is used for multiple cases, the launcher process is
> not able to handle concurrent scenarios like: a) the launcher started a
> new apply worker and is waiting for the apply worker to attach, and b)
> CREATE SUBSCRIPTION sub2 sends the launcher a wakeup signal. In this
> scenario, both of them will set the latch of the launcher process, and the
> launcher is not able to identify that both operations have occurred:
> 1) a worker has attached and 2) a subscription has been created and an
> apply worker should be started. As a result the apply worker does not get
> started for the newly created subscription immediately, and only gets
> started after the timeout of 180 seconds.
>
> I have started a new thread for this based on suggestions at [1].
>
> a) Introduce a new latch to handle worker attach and exit.

I found the startup process also uses two latches (see recoveryWakeupLatch)
for different purposes, so maybe this is OK. But note that both the logical
launcher and the apply worker call logicalrep_worker_launch(), so if we only
add one new latch, both workers will wait on the same new latch, and the
launcher may consume a SetLatch that should have been consumed by the apply
worker (although it's rare).

> b) Add a new GUC launcher_retry_time which gives more flexibility to users
> as suggested by Amit at [1].

Before 5a3a953, wal_retrieve_retry_interval played a similar role to the
suggested new GUC launcher_retry_time, e.g. even if a worker was launched,
the launcher only waited wal_retrieve_retry_interval before the next round.
IIUC, the issue does not happen frequently, and may not be noticeable where
apply workers aren't restarted often. So, I am slightly unsure if it's worth
adding a new GUC.
> c) Don't reset the latch at worker attach and allow launcher main to
> identify and handle it. For this there is a patch v6-0002 available at [2].

This seems simple. I found we are doing something similar in
RegisterSyncRequest() and WalSummarizerMain().

Best Regards,
Hou zj
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Wednesday, April 24, 2024 6:29 PM vignesh C wrote:
>
> On Wed, 24 Apr 2024 at 11:59, Amit Kapila wrote:
> >
> > On Wed, Mar 13, 2024 at 9:19 AM vignesh C wrote:
> > >
> > > On Tue, 12 Mar 2024 at 09:34, Ajin Cherian wrote:
> > > >
> > > > On Tue, Mar 12, 2024 at 2:59 PM vignesh C wrote:
> > > >>
> > > >> Thanks, I have created the following Commitfest entry for this:
> > > >> https://commitfest.postgresql.org/47/4816/
> > > >>
> > > >> Regards,
> > > >> Vignesh
> > > >
> > > > Thanks for the patch, I have verified that the fix works well by
> > > > following the steps mentioned to reproduce the problem.
> > > > Reviewing the patch, it seems good and is well documented. Just one
> > > > minor comment I had was probably to change the name of the variable
> > > > table_states_valid to table_states_validity. The current name made
> > > > sense when it was a bool, but now that it is a tri-state enum, it
> > > > doesn't fit well.
> > >
> > > Thanks for reviewing the patch, the attached v6 patch has the changes
> > > for the same.
> >
> > v6_0001* looks good to me. This should be backpatched unless you or
> > others think otherwise.
>
> This patch needs to be committed in master, PG16 and PG15.
> It is not required in PG14 and earlier because those branches don't have
> the HasSubscriptionRelations call before updating table_states_valid:
>
>     /*
>      * Does the subscription have tables?
>      *
>      * If there were not-READY relations found then we know it does. But
>      * if table_states_not_ready was empty we still need to check again to
>      * see if there are 0 tables.
>      */
>     has_subrels = (table_states_not_ready != NIL) ||
>         HasSubscriptionRelations(MySubscription->oid);
>
> So the invalidation function will not be called there.
> Whereas for PG15 and newer versions this is applicable:
> HasSubscriptionRelations calls the table_open function, which triggers the
> invalidation callback like in:
>
> HasSubscriptionRelations -> table_open -> relation_open -> LockRelationOid
> -> AcceptInvalidationMessages -> ReceiveSharedInvalidMessages ->
> LocalExecuteInvalidationMessage -> CallSyscacheCallbacks ->
> invalidate_syncing_table_states
>
> The attached patch
> v7-0001-Table-sync-missed-due-to-race-condition-in-subscr.patch
> applies for master and the PG16 branch,
> v7-0001-Table-sync-missed-due-to-race-condition-in-subscr_PG15.patch
> applies for the PG15 branch.

Thanks, I have verified that the patches apply cleanly and fix the issue on
each branch. The regression test also passes after applying the patch on my
machine.

Best Regards,
Hou zj
RE: promotion related handling in pg_sync_replication_slots()
On Friday, April 19, 2024 4:22 PM shveta malik wrote:
> On Fri, Apr 19, 2024 at 11:37 AM shveta malik wrote:
> >
> > On Fri, Apr 19, 2024 at 10:53 AM Bertrand Drouvot wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Apr 18, 2024 at 05:36:05PM +0530, shveta malik wrote:
> > > > Please find v8 attached. Changes are:
> > >
> > > Thanks!
> > >
> > > A few comments:
> >
> > Thanks for reviewing.
> >
> > > 1 ===
> > >
> > > @@ -1440,7 +1461,7 @@ ReplSlotSyncWorkerMain(char *startup_data, size_t startup_data_len)
> > >      * slotsync_worker_onexit() but that will need the connection to be made
> > >      * global and we want to avoid introducing global for this purpose.
> > >      */
> > > -   before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn));
> > > +   before_shmem_exit(slotsync_worker_disconnect, PointerGetDatum(wrconn));
> > >
> > > The comment above this change still states "Register the failure
> > > callback once we have the connection", I think it has to be reworded
> > > a bit now that v8 is making use of slotsync_worker_disconnect().
> > >
> > > 2 ===
> > >
> > > +    * Register slotsync_worker_onexit() before we register
> > > +    * ReplicationSlotShmemExit() in BaseInit(), to ensure that during exit of
> > > +    * slot sync worker, ReplicationSlotShmemExit() is called first, followed
> > > +    * by slotsync_worker_onexit(). Startup process during promotion waits for
> > >
> > > Worth to mention in shmem_exit() (where it "while (--before_shmem_exit_index >= 0)"
> > > or before the shmem_exit() definition) that ReplSlotSyncWorkerMain()
> > > relies on this LIFO behavior? (not sure if there is other "strong"
> > > LIFO requirement in other part of the code).
> >
> > I see other modules as well relying on LIFO behavior.
> > Please see applyparallelworker.c where
> > 'before_shmem_exit(pa_shutdown)' is needed to be done after
> > 'before_shmem_exit(logicalrep_worker_onexit)' (commit id 3d144c6).
> > Also in postinit.c, I see such comments atop
> > 'before_shmem_exit(ShutdownPostgres, 0)'.
> > I feel we can skip adding this specific comment about
> > ReplSlotSyncWorkerMain() in ipc.c, as none of the other modules have
> > added any. I will address the rest of your comments in the next version.
> >
> > > 3 ===
> > >
> > > +    * Startup process during promotion waits for slot sync to finish and it
> > > +    * does that by checking the 'syncing' flag.
> > >
> > > worth to mention ShutDownSlotSync()?
> > >
> > > 4 ===
> > >
> > > I did a few tests manually (launching ShutDownSlotSync() through gdb
> > > / with and without sync worker and with / without
> > > pg_sync_replication_slots() running concurrently) and it looks like
> > > it works as designed.
> >
> > Thanks for testing it.
> >
> > > Having said that, the logic that is in place to take care of the
> > > corner cases described up-thread seems reasonable to me.
>
> Please find v9 with the above comments addressed.

Thanks, the patch looks good to me. I also tested a few concurrent
promotion/function execution cases and didn't find any issues.

Best Regards,
Hou zj
RE: Disallow changing slot's failover option in transaction block
On Friday, April 19, 2024 10:54 AM Kuroda, Hayato/黒田 隼人 wrote:
> In your patch, the pg_dump.c was updated. IIUC the main reason was that
> pg_restore executes some queries as single transactions so that ALTER
> SUBSCRIPTION cannot be done, right?

Yes, and please see below for other reasons.

> Also, failover was synchronized only when we were in the upgrade mode, but
> your patch seems to remove the condition. Can you clarify the reason?

We used ALTER SUBSCRIPTION in upgrade mode because it was not allowed to use
connect=false and failover=true together in CREATE SUBSCRIPTION. But since we
don't have this restriction anymore (we don't alter the slot when creating a
subscription anymore), we can directly specify failover in CREATE SUBSCRIPTION
and do that in non-upgrade mode as well.

Attached is the V3 patch which also addresses Shveta's [1] and Bertrand's [2]
comments.

[1] https://www.postgresql.org/message-id/CAJpy0uD3YOeDg-tTCUi3EZ8vznRDfDqO_k6LepJpXUV1Z_%3DgkA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/ZiIxuaiINsuaWuDK%40ip-10-97-1-34.eu-west-3.compute.internal

Best Regards,
Hou zj

v3-0001-Fix-the-handling-of-failover-option-in-subscripti.patch
RE: Disallow changing slot's failover option in transaction block
On Thursday, April 18, 2024 1:52 PM Amit Kapila wrote:
>
> On Tue, Apr 16, 2024 at 5:06 PM shveta malik wrote:
> >
> > On Tue, Apr 16, 2024 at 1:45 PM Hayato Kuroda (Fujitsu) wrote:
> > >
> > > Dear Hou,
> > >
> > > > Kuroda-San reported an issue off-list that:
> > > >
> > > > If user execute ALTER SUBSCRIPTION SET (failover) command inside a
> > > > txn block and rollback, only the subscription option change can be
> > > > rolled back, while the replication slot's failover change is preserved.
> > > >
> > > > This is because ALTER SUBSCRIPTION SET (failover) command internally
> > > > executes the replication command ALTER_REPLICATION_SLOT to change
> > > > the replication slot's failover property, but this replication
> > > > command execution cannot be rolled back.
> > > >
> > > > To fix it, I think we can prevent user from executing ALTER
> > > > SUBSCRIPTION SET (failover) inside a txn block, which is also
> > > > consistent with the ALTER SUBSCRIPTION REFRESH/DROP SUBSCRIPTION
> > > > commands. Attach a small patch to address this.
> > >
> > > Thanks for posting the patch, the fix is same as my expectation.
> > > Also, should we add the restriction to the doc? I feel [1] can be updated.
> >
> > +1.
> >
> > Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
> > because we call alter_replication_slot in CREATE SUB as well, for the
> > case when slot_name is provided and create_slot=false. But the tricky
> > part is we call alter_replication_slot() when creating a subscription
> > for both failover=true and false. That means if we want to fix it on
> > the similar line of ALTER SUB, we have to disallow user from executing
> > CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems to
> > break some existing use cases. (Previously, a user could execute such
> > a command inside a txn block.)
> >
> > So, we need to think if there are better ways to fix it. After
> > discussion with Hou-San offlist, here are some ideas:
> >
> > 1. Do not alter the replication slot's failover option when CREATE
> > SUBSCRIPTION WITH failover=false. This means we alter the replication
> > slot only when failover is set to true. And thus we can disallow
> > CREATE SUB WITH (slot_name = xx, failover = true, create_slot = false)
> > inside a txn block.
> >
> > This option allows the user to run CREATE SUB (create_slot=false) with
> > failover=false in a txn block like earlier. But on the downside, it
> > makes the behavior inconsistent for an otherwise simple option like
> > failover, i.e. with failover=true, CREATE SUB is not allowed in a txn
> > block while with failover=false, it is allowed. It makes it slightly
> > complex to be understood by the user.
> >
> > 2. Let's not disallow CREATE SUB in a txn block as earlier, just don't
> > perform the internal alter-failover during CREATE SUB for existing
> > slots (create_slot=false, slot_name=xx), i.e. when create_slot is
> > false, we will ignore the failover parameter of CREATE SUB and it is
> > the user's responsibility to set it appropriately using the ALTER SUB
> > command. For create_slot=true, the behaviour of CREATE SUB remains the
> > same as earlier.
> >
> > This option does not add a new restriction for CREATE SUB wrt txn
> > blocks. In the context of failover with create_slot=false, we already
> > have a similar restriction (a documented one) for ALTER SUB, i.e. with
> > 'ALTER SUBSCRIPTION SET (slot_name = new)', the user needs to alter
> > the new slot's failover by himself. CREATE SUB can also be documented
> > in a similar way. This seems simpler to be understood considering the
> > existing ALTER SUB behavior as well. Plus, this will make the CREATE
> > SUB code slightly simpler and thus more easily manageable.
>
> +1 for option 2 as it sounds logical to me and consistent with ALTER
> SUBSCRIPTION.

+1. Here is the V2 patch which includes the changes for CREATE SUBSCRIPTION
as suggested. Since we no longer connect to the publisher to alter the slot
when create_slot=false, the restriction that disallows failover=true when
connect=false is also removed.

Best Regards,
Hou zj

v2-0001-Fix-the-handling-of-failover-option-in-subscripti.patch
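A sketch of the behavior under option 2 as implemented by the V2 patch; subscription, publication, slot names, and the connection string are illustrative only, and the requirement to disable the subscription before altering failover follows the documented ALTER SUBSCRIPTION restriction:

```sql
-- With create_slot = false the failover option is no longer applied to the
-- pre-existing slot, so this stays usable inside a transaction block:
BEGIN;
CREATE SUBSCRIPTION sub1
    CONNECTION 'dbname=postgres host=primary'
    PUBLICATION pub1
    WITH (create_slot = false, slot_name = 'myslot');
COMMIT;

-- Enabling failover on the existing slot is then the user's responsibility,
-- done outside any transaction block (ALTER_REPLICATION_SLOT cannot be
-- rolled back) and with the subscription disabled:
ALTER SUBSCRIPTION sub1 DISABLE;
ALTER SUBSCRIPTION sub1 SET (failover = true);
ALTER SUBSCRIPTION sub1 ENABLE;
```

This mirrors the existing documented rule for ALTER SUBSCRIPTION SET (slot_name = new): the user adjusts the new slot's failover property themselves.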
RE: promotion related handling in pg_sync_replication_slots()
On Tuesday, April 16, 2024 2:52 PM Bertrand Drouvot wrote:
>
> Hi,
>
> On Tue, Apr 16, 2024 at 10:00:04AM +0530, shveta malik wrote:
> > Please find v5 addressing above comments.
>
> Thanks!
>
> @@ -1634,9 +1677,14 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
> {
>     PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
>     {
> +       check_flags_and_set_sync_info(InvalidPid);
> +
>
> Given the fact that if the sync worker is running it won't be possible to
> trigger a manual sync with pg_sync_replication_slots(), what about also
> checking the "sync_replication_slots" value at the start of
> SyncReplicationSlots() and emit an error if sync_replication_slots is set
> to on? (The message could explicitly state that it's not possible to use
> the function if sync_replication_slots is set to on.)

I personally feel adding the additional check for sync_replication_slots may
not improve the situation here. Because the GUC sync_replication_slots can
change at any point, it could be false when performing this additional check
and be set to true immediately after the check, so it would not simplify the
logic anyway.

Best Regards,
Hou zj
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Wednesday, March 13, 2024 11:49 AM vignesh C wrote:
>
> On Tue, 12 Mar 2024 at 09:34, Ajin Cherian wrote:
> >
> > On Tue, Mar 12, 2024 at 2:59 PM vignesh C wrote:
> >>
> >> Thanks, I have created the following Commitfest entry for this:
> >> https://commitfest.postgresql.org/47/4816/
> >>
> >> Regards,
> >> Vignesh
> >
> > Thanks for the patch, I have verified that the fix works well by
> > following the steps mentioned to reproduce the problem.
> > Reviewing the patch, it seems good and is well documented. Just one
> > minor comment I had was probably to change the name of the variable
> > table_states_valid to table_states_validity. The current name made
> > sense when it was a bool, but now that it is a tri-state enum, it
> > doesn't fit well.
>
> Thanks for reviewing the patch, the attached v6 patch has the changes for
> the same.

FYI, I noticed that there is one open item on
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items which is related
to the fix in this thread:

--
Intermittent failures in 040_standby_failover_slots_sync test
Possible solution in this thread: Race condition in FetchTableStates
--

AFAICS, the bug discussed here is not a new issue on PG17, so I am thinking
of moving the item to the "Older bugs affecting stable branches" section if
there are no objections.

Best Regards,
Hou zj
Disallow changing slot's failover option in transaction block
Hi,

Kuroda-San reported an issue off-list that:

If a user executes the ALTER SUBSCRIPTION SET (failover) command inside a
txn block and rolls back, only the subscription option change can be rolled
back, while the replication slot's failover change is preserved.

This is because the ALTER SUBSCRIPTION SET (failover) command internally
executes the replication command ALTER_REPLICATION_SLOT to change the
replication slot's failover property, but this replication command execution
cannot be rolled back.

To fix it, I think we can prevent the user from executing ALTER SUBSCRIPTION
SET (failover) inside a txn block, which is also consistent with the ALTER
SUBSCRIPTION REFRESH / DROP SUBSCRIPTION commands. Attached is a small patch
to address this.

Best Regards,
Hou Zhijie

v1-0001-Disallow-alter-subscription-s-failover-option-ins.patch
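The inconsistency can be observed with a sequence like the following sketch (subscription and slot names are illustrative; subfailover and the failover column of pg_replication_slots follow the PG17 catalogs):

```sql
-- On the subscriber, with subscription "sub" disabled, before the fix:
BEGIN;
ALTER SUBSCRIPTION sub SET (failover = true);
ROLLBACK;

-- The catalog change is rolled back on the subscriber:
SELECT subname, subfailover FROM pg_subscription WHERE subname = 'sub';

-- ...but the walsender has already executed ALTER_REPLICATION_SLOT, so on
-- the publisher the slot's failover property is not restored:
SELECT slot_name, failover FROM pg_replication_slots WHERE slot_name = 'sub';
```

Disallowing the command in a transaction block removes the window in which the two sides can diverge.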
RE: promotion related handling in pg_sync_replication_slots()
On Monday, April 15, 2024 6:09 PM shveta malik wrote:
>
> Please find v4 addressing the above comments.

Thanks for the patch. Here are a few comments:

1.
+       ereport(ERROR,
+               errmsg("promotion in progress, can not synchronize replication slots"));
+   }

I think an errcode is needed. The style of the error message also seems a
bit unnatural to me. I suggest:
"cannot synchronize replication slots when standby promotion is ongoing"

2.
+   if (worker_pid != InvalidPid)
+       Assert(SlotSyncCtx->pid == InvalidPid);

We could merge the checks into one Assert():
Assert(SlotSyncCtx->pid == InvalidPid || worker_pid == InvalidPid);

3.
-   pqsignal(SIGINT, SignalHandlerForShutdownRequest);

I realized that we should register this before setting SlotSyncCtx->pid,
otherwise if the standby is promoted after setting the pid and before
registering the signal handler function, the slotsync worker could miss
handling the SIGINT sent by the startup process (ShutDownSlotSync). This is
an existing issue for the slotsync worker, but maybe we could fix it
together with the patch.

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Friday, April 12, 2024 11:31 AM Amit Kapila wrote:
>
> On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) wrote:
> >
> > On Thursday, April 11, 2024 12:11 PM Amit Kapila wrote:
> > >
> > > 2.
> > > -   if (remote_slot->restart_lsn < slot->data.restart_lsn)
> > > +   if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
> > >         elog(ERROR,
> > >              "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> > >
> > > Can we be more specific in this message? How about splitting it into
> > > error_message as "cannot synchronize local slot \"%s\"" and then
> > > errdetail as "Local slot's start streaming location LSN(%X/%X) is
> > > ahead of remote slot's LSN(%X/%X)"?
> >
> > Your version looks better. Since the above two messages both have an
> > errdetail, I used the style of ereport(ERROR, errmsg_internal(),
> > errdetail_internal() ...) in the patch, which is equivalent to
> > elog(ERROR, ...) but with an additional detail message.
>
> makes sense.
>
> > Here is V5 patch set.
>
> I think we should move the check to not advance the slot when one of the
> remote_slot's restart_lsn or catalog_xmin is less than the local slot's
> inside update_local_synced_slot(), as we want to prevent updating the slot
> in those cases even during slot synchronization.

Agreed. Here is the V6 patch which addresses this. I have merged the two
patches into one.

Best Regards,
Hou zj

v6-0001-Fix-the-handling-of-LSN-and-xmin-in-slot-sync.patch
RE: Synchronizing slots from primary to standby
On Thursday, April 11, 2024 12:11 PM Amit Kapila wrote:
>
> On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) wrote:
> >
> > On Thursday, April 4, 2024 5:37 PM Amit Kapila wrote:
> > >
> > > BTW, while thinking on this one, I noticed that in the function
> > > LogicalConfirmReceivedLocation(), we first update the disk copy, see
> > > comment [1], and then the in-memory one, whereas the same is not true
> > > in update_local_synced_slot() for the case when a snapshot exists.
> > > Now, do we have the same risk here in case of standby? Because I
> > > think we will use these xmins while sending the feedback message (in
> > > XLogWalRcvSendHSFeedback()).
> > >
> > > * We have to write the changed xmin to disk *before* we change
> > > * the in-memory value, otherwise after a crash we wouldn't know
> > > * that some catalog tuples might have been removed already.
> >
> > Yes, I think we have the risk on the standby, I can reproduce the case
> > that if the server crashes after updating the in-memory value and
> > before saving them to disk, the synced slot could be invalidated after
> > restarting from crash, because the necessary rows have been removed on
> > the primary. The steps can be found in [1].
> >
> > I think we'd better fix the order in update_local_synced_slot() as
> > well. I tried to make the fix in 0002; 0001 is Shveta's patch to fix
> > another issue in this thread. Since they are touching the same
> > function, I attach them together for review.
>
> Few comments:
> ===
> 1.
> +
> +   /* Sanity check */
> +   if (slot->data.confirmed_flush != remote_slot->confirmed_lsn)
> +       ereport(LOG, errmsg("synchronized confirmed_flush for slot \"%s\" differs from remote slot",
> +                           remote_slot->name),
>
> Is there a reason to use elevel as LOG instead of ERROR? I think it should
> be elog(ERROR, ... as this is an unexpected case.

Agreed.

> 2.
> -   if (remote_slot->restart_lsn < slot->data.restart_lsn)
> +   if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
>         elog(ERROR,
>              "cannot synchronize local slot \"%s\" LSN(%X/%X)"
>
> Can we be more specific in this message? How about splitting it into
> error_message as "cannot synchronize local slot \"%s\"" and then errdetail
> as "Local slot's start streaming location LSN(%X/%X) is ahead of remote
> slot's LSN(%X/%X)"?

Your version looks better. Since the above two messages both have an
errdetail, I used the style of ereport(ERROR, errmsg_internal(),
errdetail_internal() ...) in the patch, which is equivalent to
elog(ERROR, ...) but with an additional detail message.

Here is V5 patch set.

Best Regards,
Hou zj

v5-0002-write-the-changed-xmin-to-disk-before-computing-g.patch
v5-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
RE: Synchronizing slots from primary to standby
On Thursday, April 4, 2024 5:37 PM Amit Kapila wrote:
>
> BTW, while thinking on this one, I noticed that in the function
> LogicalConfirmReceivedLocation(), we first update the disk copy, see
> comment [1], and then the in-memory one, whereas the same is not true in
> update_local_synced_slot() for the case when a snapshot exists. Now, do we
> have the same risk here in case of standby? Because I think we will use
> these xmins while sending the feedback message (in
> XLogWalRcvSendHSFeedback()).
>
> * We have to write the changed xmin to disk *before* we change
> * the in-memory value, otherwise after a crash we wouldn't know
> * that some catalog tuples might have been removed already.

Yes, I think we have the risk on the standby. I can reproduce the case that
if the server crashes after updating the in-memory value and before saving
them to disk, the synced slot could be invalidated after restarting from the
crash, because the necessary rows have been removed on the primary. The
steps can be found in [1].

I think we'd better fix the order in update_local_synced_slot() as well. I
tried to make the fix in 0002; 0001 is Shveta's patch to fix another issue
in this thread. Since they are touching the same function, I attach them
together for review.

[1]
-- Primary:
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true);

-- Standby:
SELECT 'init' FROM pg_create_logical_replication_slot('standbylogicalslot', 'test_decoding', false, false, false);
SELECT pg_sync_replication_slots();

-- Primary:
CREATE TABLE test (a int);
INSERT INTO test VALUES(1);
DROP TABLE test;
SELECT txid_current();
SELECT txid_current();
SELECT txid_current();
SELECT pg_log_standby_snapshot();
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());

-- Standby:
- Wait for the standby to replay all the changes on the primary.
- This is to serialize snapshots:
SELECT pg_replication_slot_advance('standbylogicalslot', pg_last_wal_replay_lsn());
- Use gdb to stop at the place after calling the ReplicationSlotsComputexx()
  functions and before calling ReplicationSlotSave().
SELECT pg_sync_replication_slots();

-- Primary:
- First, wait for the primary slot (the physical slot)'s catalog xmin to be
  updated to the same as the failover slot.
VACUUM FULL;
- Wait for the VACUUM FULL to be replayed on the standby.

-- Standby:
- For the process which is blocked by gdb, let the process crash
  (elog(PANIC, ...)).

After restarting the standby from the crash, we can see the synced slot is
invalidated:

LOG:  invalidating obsolete replication slot "logicalslot"
DETAIL:  The slot conflicted with xid horizon 741.
CONTEXT:  WAL redo at 0/3059B90 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 741, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 7, nunused: 0, dead: [22, 23, 24, 25, 26, 27, 28]; blkref #0: rel 1663/5/1249, blk 16

Best Regards,
Hou zj

v4-0002-write-the-changed-xmin-to-disk-before-computing-g.patch
v4-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
RE: Synchronizing slots from primary to standby
On Thursday, April 4, 2024 4:25 PM Masahiko Sawada wrote: Hi, > On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila > wrote: > > > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila > wrote: > > > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > > wrote: > > > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > > > +if (DecodingContextReady(ctx) && > found_consistent_snapshot) > > > > +*found_consistent_snapshot = true; > > > > > > > > Can the found_consistent_snapshot be checked first to help avoid > > > > the function call DecodingContextReady() for > > > > pg_replication_slot_advance callers? > > > > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > > message. I'll push this patch after some more testing. > > > > > > > Pushed! > > While testing this change, I realized that it could happen that the server > logs > are flooded with the following logical decoding logs that are written every > 200 > ms: Thanks for reporting! > > 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.270 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: There are no running > transactions. > 2024-04-04 16:15:19.477 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.477 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: There are no running > transactions. > > For example, I could reproduce it with the following steps: > > 1. create the primary and start. > 2. run "pgbench -i -s 100" on the primary. > 3. 
run pg_basebackup to create the standby. > 4. configure slotsync setup on the standby and start. > 5. create a publication for all tables on the primary. > 6. create the subscriber and start. > 7. run "pgbench -i -Idtpf" on the subscriber. > 8. create a subscription on the subscriber (initial data copy will start). > > The logical decoding logs were written every 200 ms during the initial data > synchronization. > > Looking at the new changes for update_local_synced_slot(): ... > We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn, > restart_lsn, and catalog_xmin is different between the remote slot and the local > slot. In my test case, while the initial sync was being performed, only catalog_xmin was > different and there was no serialized snapshot at restart_lsn, so the slotsync > worker called LogicalSlotAdvanceAndCheckSnapState(). However, no slot > properties were changed even after the function, yet it set slot_updated = > true. > So it starts the next slot synchronization after 200 ms. I was trying to reproduce this and check why the catalog_xmin differs between the synced slot and the remote slot, but I was not able to reproduce the case where there are lots of logical decoding logs. The script I used is attached. Would it be possible for you to share the script you used to reproduce this issue? Alternatively, could you please share the log files from both the primary and standby servers after reproducing the problem (it would be a great help if you could set the log level to DEBUG2)? Best Regards, Hou zj test.sh Description: test.sh
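The repeated-decoding behavior described above — a catalog_xmin-only difference re-triggering decoding every ~200 ms — can be modeled with a small sketch. The field and function names here are illustrative, not the actual slotsync.c code:

```python
from dataclasses import dataclass

@dataclass
class Slot:
    confirmed_lsn: int
    restart_lsn: int
    catalog_xmin: int

def needs_advance(local: Slot, remote: Slot) -> bool:
    # The sync worker considers the local slot out of date if any of
    # the three synced properties differs from the remote slot.
    return (local.confirmed_lsn != remote.confirmed_lsn
            or local.restart_lsn != remote.restart_lsn
            or local.catalog_xmin != remote.catalog_xmin)

def sync_cycle(local: Slot, remote: Slot) -> bool:
    """One sync-worker pass; returns the slot_updated flag."""
    if not needs_advance(local, remote):
        return False  # nothing to do; the worker can nap longer
    # LogicalSlotAdvanceAndCheckSnapState() would run here.  In the
    # reported case it emits the "starting logical decoding" LOG lines
    # but changes no slot property, yet slot_updated is still set, so
    # the worker retries after the short (~200 ms) nap and logs again.
    return True

# Only catalog_xmin differs, as in the reported test case:
local = Slot(0x50006F48, 0x50006F10, 740)
remote = Slot(0x50006F48, 0x50006F10, 741)
assert sync_cycle(local, remote) is True
```

Under this model, every cycle in which decoding cannot actually resolve the difference produces a fresh burst of decoding LOG messages, matching the flood shown above.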
RE: Synchronizing slots from primary to standby
On Monday, April 8, 2024 6:32 PM Amit Kapila wrote: > > On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Saturday, April 6, 2024 12:43 PM Amit Kapila > wrote: > > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > > > wrote: > > > > > > Yeah, that could be the first step. We can probably add an injection > > > point to control the bgwriter behavior and then add tests involving > > > walsender performing the decoding. But I think it is important to > > > have sufficient tests in this area as I see they are quite helpful in > > > uncovering the issues. > > > > Here is the patch to drop the subscription in the beginning so that > > the restart_lsn of the lsub1_slot won't be advanced due to concurrent > > xl_running_xacts from bgwriter. The subscription will be re-created > > after all the slots are sync-ready. I think maybe we can use this to > > stabilize the test as a first step and then think about how to make > > use of injection point to add more tests if it's worth it. > > > > Pushed. Thanks for pushing. I checked the BF status and noticed one BF failure, which I think is caused by an omission in the test code. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 From the following log, I can see the sync failed because the standby is lagging behind the failover slot. - # No postmaster PID for node "cascading_standby" error running SQL: 'psql::1: ERROR: skipping slot synchronization as the received slot sync LSN 0/4000148 for slot "snap_test_slot" is ahead of the standby position 0/4000114' while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_sync_replication_slots();' at /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 2042. # Postmaster PID for node "publisher" is 3715298 - I think it's because we missed calling wait_for_replay_catchup before syncing slots.
- $primary->safe_psql('postgres', "SELECT pg_create_logical_replication_slot('snap_test_slot', 'test_decoding', false, false, true);" ); # ? missed to wait here $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); - While testing, I noticed another place where we were calling wait_for_replay_catchup before doing pg_replication_slot_advance, which also leaves a small window for the failover slot to get ahead of the standby if some WAL is written between these two steps. So, I adjusted them together. Here is a small patch to improve the test. Best Regards, Hou zj 0001-Ensure-the-standby-is-not-lagging-behind-the-failove.patch Description: 0001-Ensure-the-standby-is-not-lagging-behind-the-failove.patch
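The failing check above can be illustrated with a rough sketch (the helper names are made up for illustration; the real check lives in the slot sync code): the standby refuses to sync a slot whose received LSN is ahead of its own replay position, with LSNs in the usual high/low 32-bit %X/%X notation:

```python
def parse_lsn(text: str) -> int:
    """Parse an LSN in PostgreSQL's %X/%X notation into a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def can_sync(remote_slot_lsn: str, standby_replay_lsn: str) -> bool:
    # Sync is skipped when the received slot LSN is ahead of the
    # standby's replay position (mirrors the ERROR in the buildfarm log).
    return parse_lsn(remote_slot_lsn) <= parse_lsn(standby_replay_lsn)

# The failing buildfarm case: slot at 0/4000148, standby only at 0/4000114.
assert can_sync("0/4000148", "0/4000114") is False
# After wait_for_replay_catchup the standby has replayed past the slot LSN:
assert can_sync("0/4000148", "0/4000160") is True
```

This is why the missing wait_for_replay_catchup call matters: it is what guarantees the standby's replay position has moved past the slot's LSN before pg_sync_replication_slots() runs.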
RE: Synchronizing slots from primary to standby
On Saturday, April 6, 2024 12:43 PM Amit Kapila wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > wrote: > > > > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila > wrote: > > > Thinking more on this, it doesn't seem related to > > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't > > > change any locking or something like that which impacts write positions. > > > > Agree. > > > > > I think what has happened here is that running_xact record written > > > by the background writer [1] is not written to the kernel or disk > > > (see LogStandbySnapshot()), before pg_current_wal_lsn() checks the > > > current_lsn to be compared with replayed LSN. > > > > Agree, I think it's not visible through pg_current_wal_lsn() yet. > > > > Also I think that the DEBUG message in LogCurrentRunningXacts() > > > > " > > elog(DEBUG2, > > "snapshot of %d+%d running transaction ids (lsn %X/%X oldest > xid %u latest complete %u next xid %u)", > > CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, > > LSN_FORMAT_ARGS(recptr), > > CurrRunningXacts->oldestRunningXid, > > CurrRunningXacts->latestCompletedXid, > > CurrRunningXacts->nextXid); " > > > > should be located after the XLogSetAsyncXactLSN() call. Indeed, the > > new LSN is visible after the spinlock (XLogCtl->info_lck) in > > XLogSetAsyncXactLSN() is released, > > > > I think the new LSN can be visible only when the corresponding WAL is written > by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can make it > visible. In your experiment below, isn't it possible that in the meantime WAL > writer has written that WAL due to which you are seeing an updated location? 
> > >see: > > > > \watch on Session 1 provides: > > > > pg_current_wal_lsn > > > > 0/87D110 > > (1 row) > > > > Until: > > > > Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at > xlog.c:2579 > > 2579 XLogRecPtr WriteRqstPtr = asyncXactLSN; > > (gdb) n > > 2581 bool wakeup = false; > > (gdb) > > 2584 SpinLockAcquire(&XLogCtl->info_lck); > > (gdb) > > 2585 RefreshXLogWriteResult(LogwrtResult); > > (gdb) > > 2586 sleeping = XLogCtl->WalWriterSleeping; > > (gdb) > > 2587 prevAsyncXactLSN = XLogCtl->asyncXactLSN; > > (gdb) > > 2588 if (XLogCtl->asyncXactLSN < asyncXactLSN) > > (gdb) > > 2589 XLogCtl->asyncXactLSN = asyncXactLSN; > > (gdb) > > 2590 SpinLockRelease(&XLogCtl->info_lck); > > (gdb) p/x (uint32) XLogCtl->asyncXactLSN > > $1 = 0x87d148 > > > > Then session 1 provides: > > > > pg_current_wal_lsn > > > > 0/87D148 > > (1 row) > > > > So, when we see in the log: > > > > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: > > snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 > > latest complete 739 next xid 740) > > 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: > statement: SELECT '0/360' <= replay_lsn AND state = 'streaming' > > > > It's indeed possible that the new LSN was not visible yet (spinlock > > not released?) before the query began (because we can not rely on the > > time the DEBUG message has been emitted). > > > > > Note that the reason why > > > walsender has picked the running_xact written by background writer > > > is because it has checked after pg_current_wal_lsn() query, see LOGs [2]. > > > I think we can probably try to reproduce manually via debugger. > > > > > > If this theory is correct > > > > I think it is. > > > > > then I think we will need to use injection points to control the > > > behavior of bgwriter or use the slots created via SQL API for > > > syncing in tests. > > > > > > Thoughts?
> > > > I think that maybe as a first step we should move the "elog(DEBUG2," > > message as proposed above to help debugging (that could help to confirm > the above theory). > > > > I think I am missing how exactly moving DEBUG2 can confirm the above theory. > > > If the theory is proven then I'm not sure we need the extra complexity > > of injection point here, maybe just relying on the slots created via > > SQL API could be enough. > > > > Yeah, that could be the first step. We can probably add an injection point to > control the bgwriter behavior and then add tests involving walsender > performing the decoding. But I think it is important to have sufficient tests in > this area as I see they are quite helpful in uncovering the issues. Here is the patch to drop the subscription in the beginning so that the restart_lsn of the lsub1_slot won't be advanced due to concurrent xl_running_xacts from bgwriter. The subscription will be re-created after all the slots are sync-ready. I think maybe we can use this to stabilize the test as a first step and then think about how to make use of injection point to add more tests if it's worth it.
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 8:49 PM Bharath Rupireddy wrote: > > On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu) > wrote: > > > > CFbot[1] complained about one query result's order in the tap-test, so I am > > attaching a V7 patch set which fixed this. There are no changes in 0001. > > > > [1] https://cirrus-ci.com/task/6375962162495488 > > Thanks. Here are some comments: Thanks for the comments. > > 1. Can we just remove pg_logical_replication_slot_advance and use > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to > pg_logical_replication_slot_advance? > > + * Advance our logical replication slot forward. See > + * LogicalSlotAdvanceAndCheckSnapState for details. > */ > static XLogRecPtr > pg_logical_replication_slot_advance(XLogRecPtr moveto) > { It was commented[1] that it's not appropriate for pg_logical_replication_slot_advance to have an out parameter 'ready_for_decoding', which looks a bit awkward as the functionality doesn't match the name, and is also not consistent with the style of pg_physical_replication_slot_advance(). So, we added a new function. > > 2. > + if (!ready_for_decoding) > + { > + elog(DEBUG1, "could not find consistent point for synced > slot; restart_lsn = %X/%X", > + LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > Can we specify the slot name in the message? Added. > > 3. Also, can the "could not find consistent point for synced slot; > restart_lsn = %X/%X" be emitted at LOG level just like other messages > in update_and_persist_local_synced_slot. Although, I see "XXX should > this be changed to elog(DEBUG1) perhaps?", these messages need to be > at LOG level as they help debug issues if at all they are hit. Changed to LOG and reworded the message. > > 4. How about using found_consistent_snapshot instead of > ready_for_decoding?
A general understanding is that the synced slots > are not allowed for decoding (although with this fix, we do that for > internal purposes), ready_for_decoding looks a bit misleading. Agreed and renamed. > > 5. As far as the test case for this issue is concerned, I'm fine with > adding one using an INJECTION point because we seem to be having no > consistent way to control postgres writing current snapshot to WAL. Since my colleagues and I can reproduce the issue consistently after applying 0002, and it could be rare for concurrent xl_running_xacts to happen, we discussed[2] adding the INJECTION point after pushing the main fix. > > 6. A nit: can we use "fast_forward mode" instead of "fast-forward > mode" just to be consistent? > + * logical changes unless we are in fast-forward mode where no changes > are > > 7. > + /* > + * We need to access the system tables during decoding to build the > + * logical changes unless we are in fast-forward mode where no changes are > + * generated. > + */ > + if (slot->data.database != MyDatabaseId && !fast_forward) > > May I know if we need this change for this fix? The slotsync worker needs to advance slots from different databases in fast_forward mode. So, we need to skip this check in fast_forward mode. The analysis can be found in [3]. Attach the V8 patch which addressed the above comments. [1] https://www.postgresql.org/message-id/CAA4eK1%2BwkaRi2BrLLC_0gKbHN68Awc9dRp811G3An6A6fuqdOg%40mail.gmail.com [2] https://www.postgresql.org/message-id/ZgvI9iAUWCZ17z5V%40ip-10-97-1-34.eu-west-3.compute.internal [3] https://www.postgresql.org/message-id/CAJpy0uCQ2PDCAqcnbdOz6q_ZqmBfMyBpVqKDqL_XZBP%3DeK-1yw%40mail.gmail.com Best Regards, Hou zj v8-0002-test-the-data-loss-case.patch Description: v8-0002-test-the-data-loss-case.patch v8-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v8-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 3:21 PM Zhijie Hou (Fujitsu) wrote: > On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Monday, April 1, 2024 7:30 PM Amit Kapila > > wrote: > > > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > > > > > wrote: > > > > > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > 2. > > > > > +extern XLogRecPtr > > > > > +pg_logical_replication_slot_advance(XLogRecPtr > > > moveto, > > > > > + bool *found_consistent_point); > > > > > + > > > > > > > > > > This API looks a bit awkward as the functionality doesn't match > > > > > the name. How about having a function with name > > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > > > ready_for_decoding) with the same functionality as your patch > > > > > has for > > > > > pg_logical_replication_slot_advance() and then invoke it both > > > > > from pg_logical_replication_slot_advance and slotsync.c. The > > > > > function name is too big, we can think of a shorter name. Any ideas? > > > > > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > > > LogicalSlotAdvanceAndCheckDecoding()? > > > > > > > > > > It is about snapbuild state, so how about naming the function as > > > LogicalSlotAdvanceAndCheckSnapState()? > > > > It looks better to me, so changed. > > > > > > > > I have made quite a few cosmetic changes in comments and code. See > > > attached. This is atop your latest patch. Can you please review and > > > include these changes in the next version? > > > > Thanks, I have reviewed and merged them. > > Attach the V5 patch set which addressed above comments and ran pgindent. > > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which > can > reproduce the data loss issue consistently on my machine. 
It may not > reproduce in some rare cases if concurrent xl_running_xacts are written by > bgwriter, but I think it's still valuable if it can verify the fix in most > cases. The test > will fail if directly applied on HEAD, and will pass after applying atop of > 0001. CFbot[1] complained about one query result's order in the tap-test, so I am attaching a V7 patch set which fixed this. There are no changes in 0001. [1] https://cirrus-ci.com/task/6375962162495488 Best Regards, Hou zj v7-0002-test-the-data-loss-case.patch Description: v7-0002-test-the-data-loss-case.patch v7-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v7-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) wrote: > > On Monday, April 1, 2024 7:30 PM Amit Kapila > wrote: > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > > > > > > > > 2. > > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > > moveto, > > > > + bool *found_consistent_point); > > > > + > > > > > > > > This API looks a bit awkward as the functionality doesn't match > > > > the name. How about having a function with name > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > > ready_for_decoding) with the same functionality as your patch has > > > > for > > > > pg_logical_replication_slot_advance() and then invoke it both from > > > > pg_logical_replication_slot_advance and slotsync.c. The function > > > > name is too big, we can think of a shorter name. Any ideas? > > > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > > LogicalSlotAdvanceAndCheckDecoding()? > > > > > > > It is about snapbuild state, so how about naming the function as > > LogicalSlotAdvanceAndCheckSnapState()? > > It looks better to me, so changed. > > > > > I have made quite a few cosmetic changes in comments and code. See > > attached. This is atop your latest patch. Can you please review and > > include these changes in the next version? > > Thanks, I have reviewed and merged them. > Attach the V5 patch set which addressed above comments and ran pgindent. I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can reproduce the data loss issue consistently on my machine. It may not reproduce in some rare cases if concurrent xl_running_xacts are written by bgwriter, but I think it's still valuable if it can verify the fix in most cases. The test will fail if directly applied on HEAD, and will pass after applying atop of 0001. 
Best Regards, Hou zj v6-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v6-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch v6-0002-test-the-data-loss-case.patch Description: v6-0002-test-the-data-loss-case.patch
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 8:43 AM Bharath Rupireddy wrote: > > On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu) > wrote: > > > > Attach the V4 patch which includes the optimization to skip the > > decoding if the snapshot at the syncing restart_lsn is already > > serialized. It can avoid most of the duplicate decoding in my test, and I am > doing some more tests locally. > > Thanks for the patch. I'm thinking if we can reduce the amount of work that we > do for synced slots in each sync worker cycle. With that context in mind, why > do > we need to create decoding context every time? > Can't we create it once, store it in an in-memory structure and use it for > each > sync worker cycle? Is there any problem with it? What do you think? Thanks for the idea. The cost of creating the decoding context seems relatively minor compared to the I/O cost. After generating profiles for the tests shared by Nisha[1], it appears that StartupDecodingContext is not an issue. While the suggested refactoring is an option, I think we can consider it a future improvement and address it only if we encounter scenarios where the creation of the decoding context becomes a bottleneck. [1] https://www.postgresql.org/message-id/CALj2ACUeij5tFzJ1-cuoUh%2Bmhj33v%2BYgqD_gHYUpRdXSCSBbhw%40mail.gmail.com Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Monday, April 1, 2024 7:30 PM Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > wrote: > > > > > > > > > > > > > > 2. > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, > > > + bool *found_consistent_point); > > > + > > > > > > This API looks a bit awkward as the functionality doesn't match the > > > name. How about having a function with name > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > ready_for_decoding) with the same functionality as your patch has > > > for > > > pg_logical_replication_slot_advance() and then invoke it both from > > > pg_logical_replication_slot_advance and slotsync.c. The function > > > name is too big, we can think of a shorter name. Any ideas? > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > LogicalSlotAdvanceAndCheckDecoding()? > > > > It is about snapbuild state, so how about naming the function as > LogicalSlotAdvanceAndCheckSnapState()? It looks better to me, so changed. > > I have made quite a few cosmetic changes in comments and code. See > attached. This is atop your latest patch. Can you please review and include > these changes in the next version? Thanks, I have reviewed and merged them. Attach the V5 patch set which addressed above comments and ran pgindent. I will think and test the improvement suggested by Bertrand[1] and reply after that. [1] https://www.postgresql.org/message-id/Zgp8n9QD5nYSESnM%40ip-10-97-1-34.eu-west-3.compute.internal Best Regards, Hou zj v5-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v5-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila > wrote: > > > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > > > > Attach a new version patch which fixed an un-initialized variable > > > issue and added some comments. > > > > > > > The other approach to fix this issue could be that the slotsync worker > > get the serialized snapshot using pg_read_binary_file() corresponding > > to restart_lsn and writes those at standby. But there are cases when > > we won't have such a file like (a) when we initially create the slot > > and reach the consistent_point, or (b) also by the time the slotsync > > worker starts to read the remote snapshot file, the snapshot file > > could have been removed by the checkpointer on the primary (if the > > restart_lsn of the remote has been advanced in this window). So, in > > such cases, we anyway need to advance the slot. I think these could be > optimizations that we could do in the future. > > > > Few comments: > > Thanks for the comments. > > > = > > 1. > > - if (slot->data.database != MyDatabaseId) > > + if (slot->data.database != MyDatabaseId && !fast_forward) > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("replication slot \"%s\" was not created in this database", > > @@ -526,7 > > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, > > * Do not allow consumption of a "synchronized" slot until the standby > > * gets promoted. > > */ > > - if (RecoveryInProgress() && slot->data.synced) > > + if (RecoveryInProgress() && slot->data.synced && > > + !IsSyncingReplicationSlots()) > > > > > > Add comments at both of the above places. > > Added. > > > > > > > 2. > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, > > + bool *found_consistent_point); > > + > > > > This API looks a bit awkward as the functionality doesn't match the > > name. 
How about having a function with name > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > ready_for_decoding) with the same functionality as your patch has for > > pg_logical_replication_slot_advance() and then invoke it both from > > pg_logical_replication_slot_advance and slotsync.c. The function name > > is too big, we can think of a shorter name. Any ideas? > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested > LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be > renamed in next version if we agree). > > Attach the V3 patch which addressed above comments and Kuroda-san's > comments[1]. I also adjusted the tap-test to only check the > confirmed_flush_lsn > after syncing, as the restart_lsn could be different from the remote one due > to > the new slot_advance() call. I am also testing some optimization idea locally > and > will share if ready. Attach the V4 patch which includes the optimization to skip the decoding if the snapshot at the syncing restart_lsn is already serialized. It can avoid most of the duplicate decoding in my test, and I am doing some more tests locally. Best Regards, Hou zj v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Friday, March 29, 2024 2:50 PM Amit Kapila wrote: > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > Attach a new version patch which fixed an un-initialized variable > > issue and added some comments. > > > > The other approach to fix this issue could be that the slotsync worker get the > serialized snapshot using pg_read_binary_file() corresponding to restart_lsn > and writes those at standby. But there are cases when we won't have such a > file > like (a) when we initially create the slot and reach the consistent_point, or > (b) > also by the time the slotsync worker starts to read the remote snapshot file, > the > snapshot file could have been removed by the checkpointer on the primary (if > the restart_lsn of the remote has been advanced in this window). So, in such > cases, we anyway need to advance the slot. I think these could be > optimizations > that we could do in the future. > > Few comments: Thanks for the comments. > = > 1. > - if (slot->data.database != MyDatabaseId) > + if (slot->data.database != MyDatabaseId && !fast_forward) > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("replication slot \"%s\" was not created in this database", @@ -526,7 > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, > * Do not allow consumption of a "synchronized" slot until the standby > * gets promoted. > */ > - if (RecoveryInProgress() && slot->data.synced) > + if (RecoveryInProgress() && slot->data.synced && > + !IsSyncingReplicationSlots()) > > > Add comments at both of the above places. Added. > > > 2. > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, > + bool *found_consistent_point); > + > > This API looks a bit awkward as the functionality doesn't match the name. 
How > about having a function with name > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > ready_for_decoding) with the same functionality as your patch has for > pg_logical_replication_slot_advance() and then invoke it both from > pg_logical_replication_slot_advance and slotsync.c. The function name is too > big, we can think of a shorter name. Any ideas? How about LogicalSlotAdvanceAndCheckDecodingState() Or just LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be renamed in next version if we agree). Attach the V3 patch which addressed above comments and Kuroda-san's comments[1]. I also adjusted the tap-test to only check the confirmed_flush_lsn after syncing, as the restart_lsn could be different from the remote one due to the new slot_advance() call. I am also testing some optimization idea locally and will share if ready. [1] https://www.postgresql.org/message-id/TYCPR01MB1207757BB2A32B6815CE1CCE7F53A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com Best Regards, Hou zj v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Friday, March 29, 2024 2:48 PM Bertrand Drouvot wrote: > > Hi, > > On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote: > > Attach a new version patch which fixed an uninitialized variable > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > Thanks! > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > + { > + /* > + * By advancing the restart_lsn, confirmed_lsn, and xmin using > + * fast-forward logical decoding, we ensure that the required snapshots > + * are saved to disk. This enables logical decoding to quickly reach a > + * consistent point at the restart_lsn, eliminating the risk of missing > + * data during snapshot creation. > + */ > + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > + found_consistent_point); > + ReplicationSlotsComputeRequiredLSN(); > + updated_lsn = true; > + } > > Instead of using pg_logical_replication_slot_advance() for each synced slot and > during sync cycles what about?: > > - keep sync slot synchronization as it is currently (not using > pg_logical_replication_slot_advance()) > - create "a hidden" logical slot if the sync slot feature is on > - at the time of promotion use pg_logical_replication_slot_advance() on this > hidden slot only to advance to the max lsn of the synced slots > > I'm not sure that would be enough, just asking your thoughts on this (the benefit > would be to avoid calling pg_logical_replication_slot_advance() on each synced > slot during the sync cycles). Thanks for the idea! I thought about this. I think advancing the "hidden" slot on promotion may be a bit late, because if we cannot reach the consistent point after advancing the "hidden" slot, then it means we may need to remove all the synced slots, as we are not sure if they are usable (i.e., will not lose data) after promotion.
It may also confuse users a bit, as they have already seen these slots become sync-ready. The current approach is to mark such not-yet-consistent slots as temporary and persist them once they reach a consistent point, so that users can be sure a persisted slot is usable after promotion. Another optimization idea is to check for the existence of the snapshot file before calling slot_advance(). If the file already exists, we skip the decoding and directly update the restart_lsn. This way, we could also avoid some duplicate decoding work. Best Regards, Hou zj
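The snapshot-file-existence idea could look roughly like this — a minimal sketch assuming serialized snapshots are named after their LSN as pg_logical/snapshots/%X-%X.snap (the naming SnapBuildSerialize uses); the helper names and the directory argument are made up for illustration:

```python
import os

def snapshot_path(snapshots_dir: str, restart_lsn: int) -> str:
    # Serialized snapshot file name for a given LSN, %X-%X style.
    hi = restart_lsn >> 32
    lo = restart_lsn & 0xFFFFFFFF
    return os.path.join(snapshots_dir, "%X-%X.snap" % (hi, lo))

def need_decoding(snapshots_dir: str, restart_lsn: int) -> bool:
    # Skip the fast-forward decoding pass when a serialized snapshot
    # already exists at the syncing restart_lsn; otherwise decode so
    # that a snapshot gets serialized for use after promotion.
    return not os.path.exists(snapshot_path(snapshots_dir, restart_lsn))

# e.g. restart_lsn 0/50006F10 maps to pg_logical/snapshots/0-50006F10.snap
assert snapshot_path("pg_logical/snapshots",
                     0x50006F10).endswith("0-50006F10.snap")
```

The point of the optimization is that when the file is already on disk, logical decoding would find a consistent snapshot immediately anyway, so the decoding pass adds nothing.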
RE: Synchronizing slots from primary to standby
On Thursday, March 28, 2024 10:02 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, March 28, 2024 7:32 PM Amit Kapila > wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > When analyzing one BF error[1], we found an issue with slotsync: Since > > > we don't perform logical decoding for the synced slots when syncing > > > the lsn/xmin of the slot, no logical snapshots will be serialized to > > > disk. So, when a user starts to use these synced slots after > > > promotion, logical decoding needs to re-build the consistent snapshot > > > from the restart_lsn if the WAL (xl_running_xacts) at the restart_lsn > > > position indicates that there are running transactions. This, however, > > > could cause data from before the consistent point to be missed[2]. > > > > > > This issue doesn't exist on the primary because the snapshot at > > > restart_lsn should have been serialized to disk > > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if > > > logical decoding restarts, it can find a consistent snapshot > > > immediately at restart_lsn. > > > > > > To fix this, we could use fast-forward logical decoding to > > > advance the synced slot's lsn/xmin when syncing these values instead > > > of directly updating the slot's info. This way, the snapshot will be > > > serialized to disk when decoding. > > > If we cannot reach the consistent point at the remote > > > restart_lsn, the slot is marked as temp and will be persisted once > > > it reaches the consistent point. I am still analyzing the fix and > > > will share it once ready. > > > > > > > Yes, we can use this but one thing to note is that > > CreateDecodingContext() will expect that the slot's database and the > > current database are the same. I think the reason for that is we need to > > check system tables of the current database while decoding and sending > > data to the output_plugin, which won't be a requirement for the > > fast_forward case.
So, we need to skip that check in fast_forward mode. > > Agreed. > > > > Next, I was thinking about the case of the first time updating the > > restart and confirmed_flush LSN while syncing the slots. I think we > > can keep the current logic as it is based on the following analysis. > > > > For each logical slot, cases possible on the primary: > > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet > > reached the consistent point. > > 2. The restart_lsn doesn't have a serialized snapshot but has reached > > a consistent point. > > 3. The restart_lsn has a serialized snapshot, which means it has > > reached a consistent point as well. > > > > Considering we keep the logic to reserve initial WAL positions the > > same as the current (Reserve WAL for the currently active local slot > > using the specified WAL location (restart_lsn). If the given WAL > > location has been removed, reserve WAL using the oldest existing WAL > > segment.), I could think of the below scenarios: > > A. For 1, we shouldn't sync the slot as it still wouldn't have been > > marked persistent on the primary. > > B. For 2, we would sync the slot > > B1. If remote_restart_lsn >= local_restart_lsn, then advance the > > slot by calling pg_logical_replication_slot_advance(). > > B11. If we reach the consistent point, then it should be okay > > because after promotion as well we should reach the consistent point. > > B111. But again, is it possible that there is some xact > > that comes before the consistent_point on the primary and the same is after > > the consistent_point on the standby? This shouldn't matter as we will start > > decoding transactions after confirmed_flush_lsn, which would be the same on > the primary and standby. > > B22. If we haven't reached the consistent_point, then we won't mark > > the slot as persistent, and at the next sync we will do the same till > > it reaches the consistent_point. At that time, the situation will be similar to > > B11. > > B2.
If remote_restart_lsn < local_restart_lsn, then we will wait > > for the next sync cycle and keep the slot as temporary. Once in the > > next or some consecutive sync cycle, we reach the condition > > remote_restart_lsn >= local_restart_lsn, we will proceed to advance > > the slot and we should have the same behavior as B1. > > C. For 3, we would sync the slot, but the behavior should be the same as B. > > > > Thoughts? > > Looks reasonable to me. > > Here is the patch based on above lines. > I am also testing and verifying the patch locally. Attached a new version of the patch, which fixes an uninitialized variable issue and adds some comments. Also, DEBUG2 is temporarily enabled for the 040 tap-test so that we can analyze possible CFbot failures more easily. Best Regards, Hou zj v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
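The A/B/C analysis above can be summarized as a small standalone decision function. This is only an illustrative sketch, not PostgreSQL code: the state names and the `sync_decision` helper are invented for this model, and the real slot sync advances slots via fast-forward decoding rather than a plain comparison.

```python
# Hypothetical model of the slot-sync decision discussed above.
# "remote_consistent" means the remote slot's restart_lsn has reached a
# consistent decoding point (cases 2 and 3); case 1 has not.

def sync_decision(remote_consistent, remote_restart_lsn, local_restart_lsn):
    # A: the remote slot hasn't reached a consistent point, so it would
    # still be temporary on the primary; don't sync it at all.
    if not remote_consistent:
        return "skip"
    # B2: the remote slot is behind our local WAL reservation; keep the
    # synced slot temporary and retry on the next sync cycle.
    if remote_restart_lsn < local_restart_lsn:
        return "keep_temporary_and_retry"
    # B1 (and C): advance the local slot (via fast-forward decoding in the
    # patch); it is persisted once a consistent point is reached locally.
    return "advance"

print(sync_decision(False, 100, 50))  # case A
print(sync_decision(True, 40, 50))    # case B2
print(sync_decision(True, 100, 50))   # cases B1 / C
```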
RE: Synchronizing slots from primary to standby
On Thursday, March 28, 2024 7:32 PM Amit Kapila wrote: > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > wrote: > > > > When analyzing one BF error[1], we find an issue of slotsync: Since we > > don't perform logical decoding for the synced slots when syncing the > > lsn/xmin of slot, no logical snapshots will be serialized to disk. So, > > when user starts to use these synced slots after promotion, it needs > > to re-build the consistent snapshot from the restart_lsn if the > > WAL(xl_running_xacts) at restart_lsn position indicates that there are > > running transactions. This however could cause the data that before the > consistent point to be missed[2]. > > > > This issue doesn't exist on the primary because the snapshot at > > restart_lsn should have been serialized to disk > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the > > logical decoding restarts, it can find consistent snapshot immediately at > restart_lsn. > > > > To fix this, we could use the fast forward logical decoding to advance > > the synced slot's lsn/xmin when syncing these values instead of > > directly updating the slot's info. This way, the snapshot will be > > serialized to > disk when decoding. > > If we could not reach to the consistent point at the remote > > restart_lsn, the slot is marked as temp and will be persisted once it > > reaches the consistent point. I am still analyzing the fix and will share > > once > ready. > > > > Yes, we can use this but one thing to note is that > CreateDecodingContext() will expect that the slot's and current database are > the same. I think the reason for that is we need to check system tables of the > current database while decoding and sending data to the output_plugin which > won't be a requirement for the fast_forward case. So, we need to skip that > check in fast_forward mode. Agreed. > > Next, I was thinking about the case of the first time updating the restart and > confirmed_flush LSN while syncing the slots. 
I think we can keep the current > logic as it is based on the following analysis. > > For each logical slot, cases possible on the primary: > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet reached > the > consistent point. > 2. The restart_lsn doesn't have a serialized snapshot but has reached a > consistent point. > 3. The restart_lsn has a serialized snapshot which means it has reached a > consistent point as well. > > Considering we keep the logic to reserve initial WAL positions the same as the > current (Reserve WAL for the currently active local slot using the specified > WAL > location (restart_lsn). If the given WAL location has been removed, reserve > WAL using the oldest existing WAL segment.), I could think of the below > scenarios: > A. For 1, we shouldn't sync the slot as it still wouldn't have been marked > persistent on the primary. > B. For 2, we would sync the slot >B1. If remote_restart_lsn >= local_resart_lsn, then advance the slot by > calling > pg_logical_replication_slot_advance(). >B11. If we reach consistent point, then it should be okay because after > promotion as well we should reach consistent point. > B111. But again is it possible that there is some xact that comes > before consistent_point on primary and the same is after consistent_point on > standby? This shouldn't matter as we will start decoding transactions after > confirmed_flush_lsn which would be the same on primary and standby. >B22. If we haven't reached consistent_point, then we won't mark the > slot > as persistent, and at the next sync we will do the same till it reaches > consistent_point. At that time, the situation will be similar to B11. >B2. If remote_restart_lsn < local_restart_lsn, then we will wait for the > next > sync cycle and keep the slot as temporary. 
Once in the next or some > consecutive sync cycle, we reach the condition remote_restart_lsn >= > local_restart_lsn, we will proceed to advance the slot and we should have the > same behavior as B1. > C. For 3, we would sync the slot, but the behavior should be the same as B. > > Thoughts? Looks reasonable to me. Here is the patch based on above lines. I am also testing and verifying the patch locally. Best Regards, Hou zj 0001-advance-the-restart_lsn-of-synced-slots-using-logica.patch Description: 0001-advance-the-restart_lsn-of-synced-slots-using-logica.patch
RE: Synchronizing slots from primary to standby
Hi, When analyzing one BF error[1], we found an issue with slotsync: since we don't perform logical decoding for the synced slots when syncing the lsn/xmin of a slot, no logical snapshots will be serialized to disk. So, when a user starts to use these synced slots after promotion, the decoder needs to re-build the consistent snapshot from the restart_lsn if the WAL (xl_running_xacts) at the restart_lsn position indicates that there are running transactions. This could cause data before the consistent point to be missed[2]. This issue doesn't exist on the primary because the snapshot at restart_lsn should have been serialized to disk (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if logical decoding restarts, it can find a consistent snapshot immediately at restart_lsn. To fix this, we could use fast-forward logical decoding to advance the synced slot's lsn/xmin when syncing these values, instead of directly updating the slot's info. This way, the snapshot will be serialized to disk during decoding. If we cannot reach the consistent point at the remote restart_lsn, the slot is marked as temporary and will be persisted once it reaches the consistent point. I am still analyzing the fix and will share it once ready. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2024-03-19%2010%3A03%3A06 [2] The steps to reproduce the data miss issue on a primary->standby setup: Note, we need to set LOG_SNAPSHOT_INTERVAL_MS to a bigger number (150) to prevent a concurrent LogStandbySnapshot() call, and enable sync_replication_slots on the standby. 1. Create a failover logical slot on the primary. SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true); 2. Use the following steps to advance the restart_lsn of the failover slot to a position where the xl_running_xacts at that position indicates that there is a running transaction.
TXN1: BEGIN; create table dummy1(a int);
TXN2: SELECT pg_log_standby_snapshot();
TXN1: COMMIT;
TXN1: BEGIN; create table dummy2(a int);
TXN2: SELECT pg_log_standby_snapshot();
TXN1: COMMIT;

-- The restart_lsn will be advanced to a position where there was 1 running
-- transaction, and we need to wait for the restart_lsn to be synced to the standby.
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());

-- Insert some data here before calling the next pg_log_standby_snapshot().
INSERT INTO reptable VALUES(999);

3. Promote the standby and try to consume the change (999) from the synced slot on the standby. We will find that no change is returned.

select * from pg_logical_slot_get_changes('logicalslot', NULL, NULL);

Best Regards, Hou zj
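To see why the change can be skipped, the snapshot-building behavior can be modeled roughly: decoding starts at restart_lsn, and changes are only emitted once a consistent snapshot exists. The following is a toy model with invented record names, not the real snapbuild machinery:

```python
# Toy model of snapshot building during logical decoding. WAL records are
# (kind, payload) tuples; a change is emitted only after the snapshot
# builder has reached a consistent state.

def decode(wal, start_consistent):
    consistent = start_consistent  # True if a serialized snapshot was restored
    emitted = []
    for kind, payload in wal:
        if kind == "running_xacts":
            consistent = True      # simplification: assume no running xacts
        elif kind == "change" and consistent:
            emitted.append(payload)
    return emitted

wal = [("change", 999), ("running_xacts", None), ("change", 1000)]

# On the primary, the snapshot at restart_lsn was serialized: nothing is lost.
print(decode(wal, start_consistent=True))   # [999, 1000]

# On the promoted standby, no snapshot was serialized, so the change before
# the consistent point is silently skipped.
print(decode(wal, start_consistent=False))  # [1000]
```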
Buildfarm failure on tamandua - "timed out waiting for subscriber to synchronize data"
Hi, There is a failure in 040_standby_failover_slots_sync on tamandua: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua=2024-03-21%2008%3A28%3A58 The reason for the timeout is that the table sync worker for the new table is not started after executing ALTER SUBSCRIPTION REFRESH PUBLICATION. We have seen similar failures in other tests as well[1]. AFAICS, they all have the same cause: a race condition in the logical replication apply worker. The analysis has been posted on another thread[2], and the fix is also being reviewed. [1] https://www.postgresql.org/message-id/OSZPR01MB6310D6F48372F52F1D85E1C5FD609%40OSZPR01MB6310.jpnprd01.prod.outlook.com [2] https://www.postgresql.org/message-id/flat/CALDaNm1XeB3bF%2BVEJZi%3DBT31PZAL_UVys-26%2BYSv_AxCq0G2eg%40mail.gmail.com#87b153fd7676652746406a6f114eb67b Best Regards, Hou Zhijie
RE: Synchronizing slots from primary to standby
Hi, Since the standby_slot_names patch has been committed, I am attaching the last doc patch for review. Best Regards, Hou zj v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Wednesday, March 13, 2024 11:49 AM vignesh C wrote: > On Tue, 12 Mar 2024 at 09:34, Ajin Cherian wrote: > > > > > > > > On Tue, Mar 12, 2024 at 2:59 PM vignesh C wrote: > >> > >> > >> Thanks, I have created the following Commitfest entry for this: > >> https://commitfest.postgresql.org/47/4816/ > >> > >> Regards, > >> Vignesh > > > > > > Thanks for the patch, I have verified that the fix works well by following > > the > steps mentioned to reproduce the problem. > > Reviewing the patch, it seems good and is well documented. Just one minor > comment I had was probably to change the name of the variable > table_states_valid to table_states_validity. The current name made sense when > it was a bool, but now that it is a tri-state enum, it doesn't fit well. > > Thanks for reviewing the patch, the attached v6 patch has the changes for the > same. Thanks for the patches. I saw a recent similar BF error[1] which seems related to the issue that the 0001 patch is trying to solve, i.e., the table sync worker is somehow not started after refreshing the publication on the subscriber. I didn't see other related ERRORs in the log, so I think the reason is the same as the one being discussed in this thread, namely that the table state invalidation got lost. The 0001 patch looks good to me. For 0002, instead of avoiding resetting the latch, is it possible to let the logical rep worker wake up the launcher once after attaching? [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin=2024-03-11%2000%3A52%3A42 Best Regards, Hou zj
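The rename discussed above exists because a boolean valid flag can lose an invalidation that arrives while the table-state list is being rebuilt, which is why the variable became a tri-state. A standalone sketch of the idea (the state names and class are made up for illustration; the actual enum in the patch may differ):

```python
# Sketch of why a tri-state beats a boolean for cache invalidation.
# States: "needs_rebuild" -> "rebuild_started" -> "valid".

class TableStates:
    def __init__(self):
        self.validity = "needs_rebuild"

    def invalidate(self):
        # Called from the invalidation callback, possibly mid-rebuild.
        self.validity = "needs_rebuild"

    def fetch(self):
        if self.validity != "valid":
            self.validity = "rebuild_started"
            # ... the catalog would be read here; an invalidation arriving
            # now flips us back to "needs_rebuild" ...
            if self.validity == "rebuild_started":
                self.validity = "valid"
        return self.validity

t = TableStates()
print(t.fetch())                     # a plain rebuild ends up "valid"

# Simulate an invalidation firing in the middle of a rebuild: with a
# boolean flag the rebuild would blindly set valid=True and lose it;
# with the tri-state the final check refuses to mark the cache valid.
s = TableStates()
s.validity = "rebuild_started"
s.invalidate()
if s.validity == "rebuild_started":  # it isn't, so stay invalid
    s.validity = "valid"
print(s.validity)                    # needs_rebuild
```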
RE: Test failures of 100_bugs.pl
On Monday, April 24, 2023 5:50 PM Yu Shi (Fujitsu) wrote: > > On Fri, Apr 21, 2023 1:48 PM Yu Shi (Fujitsu) wrote: > > > > I wrote a patch to dump rel state in wait_for_subscription_sync() as > suggested. > > Please see the attached patch. > > I will try to add some debug logs in code later. > > > > Please see the attached v2 patch. > I added some debug logs when invalidating syncing table states and updating > table_states_not_ready list. I also adjusted the message level in the tests > which > failed before. Just for reference: I think a similar issue has been analyzed in another thread[1], and the reason seems clear: the table state cache invalidation got lost due to a race condition. The fix is also being discussed there. [1] https://www.postgresql.org/message-id/flat/711a6afe-edb7-1211-cc27-1bef8239eec7%40gmail.com Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Thursday, March 7, 2024 12:46 PM Amit Kapila wrote: > > On Thu, Mar 7, 2024 at 7:35 AM Peter Smith > wrote: > > > > Here are some review comments for v107-0001 > > > > == > > src/backend/replication/slot.c > > > > 1. > > +/* > > + * Struct for the configuration of standby_slot_names. > > + * > > + * Note: this must be a flat representation that can be held in a single > > chunk > > + * of guc_malloc'd memory, so that it can be stored as the "extra" data for > the > > + * standby_slot_names GUC. > > + */ > > +typedef struct > > +{ > > + int slot_num; > > + > > + /* slot_names contains nmembers consecutive nul-terminated C strings */ > > + char slot_names[FLEXIBLE_ARRAY_MEMBER]; > > +} StandbySlotConfigData; > > + > > > > 1a. > > To avoid any ambiguity this 1st field is somehow a slot ID number, I > > felt a better name would be 'nslotnames' or even just 'n' or 'count', > > > > We can probably just add a comment above slot_num and that should be > sufficient but I am fine with 'nslotnames' as well, in anycase let's > add a comment for the same. Added. > > > > > 6b. > > IMO this function would be tidier written such that the > > MyReplicationSlot->data.name is passed as a parameter. Then you can > > name the function more naturally like: > > > > IsSlotInStandbySlotNames(const char *slot_name) > > > > +1. How about naming it as SlotExistsinStandbySlotNames(char > *slot_name) and pass the slot_name from MyReplicationSlot? Otherwise, > we need an Assert for MyReplicationSlot in this function. Changed as suggested. > > Also, can we add a comment like below before the loop: > + /* > + * XXX: We are not expecting this list to be long so a linear search > + * shouldn't hurt but if that turns out not to be true then we can cache > + * this information for each WalSender as well. > + */ Added. Attach the V108 patch set which addressed above and Peter's comments. I also removed the check for "*" in guc check hook. 
Best Regards, Hou zj v108-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch Description: v108-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
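The flat representation discussed in this review (a count plus consecutive NUL-terminated names in a single chunk, searched linearly by SlotExistsinStandbySlotNames) can be mimicked outside C. A rough Python model of just the data layout and search; `pack_slot_names` and the slot names are invented for this sketch, and the real struct lives in guc_malloc'd memory:

```python
# Model of the StandbySlotNamesConfigData layout: a count plus consecutive
# NUL-terminated names packed into one flat buffer.

def pack_slot_names(names):
    return len(names), b"".join(n.encode() + b"\x00" for n in names)

def slot_in_standby_slot_names(config, slot_name):
    if config is None:  # GUC unset: no standbys to wait for
        return False
    nslotnames, buf = config
    pos = 0
    # The list is expected to be short, so a linear search is fine.
    for _ in range(nslotnames):
        end = buf.index(b"\x00", pos)
        if buf[pos:end].decode() == slot_name:
            return True
        pos = end + 1
    return False

cfg = pack_slot_names(["physical_slot1", "physical_slot2"])
print(slot_in_standby_slot_names(cfg, "physical_slot2"))  # True
print(slot_in_standby_slot_names(cfg, "other_slot"))      # False
print(slot_in_standby_slot_names(None, "physical_slot2")) # False
```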
RE: Synchronizing slots from primary to standby
On Thursday, March 7, 2024 10:05 AM Peter Smith wrote: > > Here are some review comments for v107-0001 Thanks for the comments. > > == > src/backend/replication/slot.c > > 1. > +/* > + * Struct for the configuration of standby_slot_names. > + * > + * Note: this must be a flat representation that can be held in a > +single chunk > + * of guc_malloc'd memory, so that it can be stored as the "extra" data > +for the > + * standby_slot_names GUC. > + */ > +typedef struct > +{ > + int slot_num; > + > + /* slot_names contains nmembers consecutive nul-terminated C strings > +*/ char slot_names[FLEXIBLE_ARRAY_MEMBER]; > +} StandbySlotConfigData; > + > > 1a. > To avoid any ambiguity this 1st field is somehow a slot ID number, I felt a > better > name would be 'nslotnames' or even just 'n' or 'count', Changed to 'nslotnames'. > > ~ > > 1b. > (fix typo) > > SUGGESTION for the 2nd field comment > slot_names is a chunk of 'n' X consecutive null-terminated C strings Changed. > > ~ > > 1c. > A more explanatory name for this typedef maybe is > 'StandbySlotNamesConfigData' ? Changed. > > ~~~ > > > 2. > +/* This is parsed and cached configuration for standby_slot_names */ > +static StandbySlotConfigData *standby_slot_config; > > > 2a. > /This is parsed and cached configuration for .../This is the parsed and cached > configuration for .../ Changed. > > ~ > > 2b. > Similar to above -- since this only has name information maybe it is more > correct to call it 'standby_slot_names_config'? > Changed. > ~~~ > > 3. > +/* > + * A helper function to validate slots specified in GUC standby_slot_names. > + * > + * The rawname will be parsed, and the parsed result will be saved into > + * *elemlist. > + */ > +static bool > +validate_standby_slots(char *rawname, List **elemlist) > > /and the parsed result/and the result/ > Changed. > ~~~ > > 4. check_standby_slot_names > > + /* Need a modifiable copy of string */ rawname = pstrdup(*newval); > > /copy of string/copy of the GUC string/ > Changed. 
> ~~~ > > 5. > +assign_standby_slot_names(const char *newval, void *extra) { > + /* > + * The standby slots may have changed, so we must recompute the oldest > + * LSN. > + */ > + ss_oldest_flush_lsn = InvalidXLogRecPtr; > + > + standby_slot_config = (StandbySlotConfigData *) extra; } > > To avoid leaking don't we need to somewhere take care to free any memory > used by a previous value (if any) of this 'standby_slot_config'? > The memory of extra is maintained by the GUC mechanism. It will be automatically freed when the associated GUC setting is no longer of interest. See src/backend/utils/misc/README for details. > ~~~ > > 6. AcquiredStandbySlot > > +/* > + * Return true if the currently acquired slot is specified in > + * standby_slot_names GUC; otherwise, return false. > + */ > +bool > +AcquiredStandbySlot(void) > +{ > + const char *name; > + > + /* Return false if there is no value in standby_slot_names */ if > + (standby_slot_config == NULL) return false; > + > + name = standby_slot_config->slot_names; for (int i = 0; i < > + standby_slot_config->slot_num; i++) { if (strcmp(name, > + NameStr(MyReplicationSlot->data.name)) == 0) return true; > + > + name += strlen(name) + 1; > + } > + > + return false; > +} > > 6a. > Just checking "(standby_slot_config == NULL)" doesn't seem enough to me, > because IIUIC it is possible when 'standby_slot_names' has no value then > maybe standby_slot_config is not NULL but standby_slot_config->slot_num is > 0. The standby_slot_config will always be NULL if there is no value in it. While checking, I did find a rare case that if there are only some white space in the standby_slot_names, then slot_num will be 0, and have fixed it so that standby_slot_config will always be NULL if there is no meaning value in guc. > > ~ > > 6b. > IMO this function would be tidier written such that the > MyReplicationSlot->data.name is passed as a parameter. 
Then you can > name the function more naturally like: > > IsSlotInStandbySlotNames(const char *slot_name) Changed it to SlotExistsInStandbySlotNames. > > ~ > > 6c. > IMO the body of the function will be tidier if written so there are only 2 > returns > instead of 3 like > > SUGGESTION: > if (...) > { > for (...) > { > ... > return true; > } > } > return false; I personally prefer the current style. > > ~~~ > > 7. > + /* > + * Don't need to wait for the standbys to catch up if there is no value > + in > + * standby_slot_names. > + */ > + if (standby_slot_config == NULL) > + return true; > > (similar to a previous review comment) > > This check doesn't seem enough because IIUIC it is possible when > 'standby_slot_names' has no value then maybe standby_slot_config is not NULL > but standby_slot_config->slot_num is 0. Same as above. > > ~~~ > > 8. WaitForStandbyConfirmation > > + /* > + * Don't need to wait for the standby to catch up if the current > + acquired > + * slot is not a
RE: Synchronizing slots from primary to standby
On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > > wrote: > > > > Hi, > > > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > > > > > wrote: > > > > > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > > > wrote: > > > > > > > > > > > > > > > --- > > > > > +void > > > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > > > +List *standby_slots; > > > > > +MemoryContext oldcxt; > > > > > +char *standby_slot_names_cpy = extra; > > > > > + > > > > > > > > > > Given that the newval and extra have the same data > > > > > (standby_slot_names value), why do we not use newval instead? I > > > > > think that if we use newval, we don't need to guc_strdup() in > > > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > > > instead, though. It's not clear to me as there is no comment. > > > > > > > > I think SplitIdentifierString will modify the passed in string, so > > > > we'd better not pass the newval to it, otherwise the stored guc > > > > string(standby_slot_names) will be changed. I can see we are doing > > > > similar thing in other GUC check/assign function as well. > > > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > > > > > Why does it have to be a List in the first place? > > > > I thought the List type is convenient to use here, as we have existing > > list build function(SplitIdentifierString), and have convenient list > > macro to loop the > > list(foreach_ptr) which can save some codes. > > > > > In earlier version patches, we > > > used to copy the list and delete the element until it became empty, > > > while waiting for physical wal senders. But we now just refer to > > > each slot name in the list. 
The current code assumes that > > > stnadby_slot_names_cpy is allocated in GUCMemoryContext but once it > > > changes, it will silently get broken. I think we can check and > > > assign standby_slot_names in a similar way to > > > check/assign_temp_tablespaces and > check/assign_synchronous_standby_names. > > > > Yes, we could do follow it by allocating an array and copy each slot > > name into it, but it also requires some codes to build and scan the > > array. So, is it possible to expose the GucMemorycontext or have an API like > guc_copy_list instead ? > > If we don't want to touch the guc api, I am ok with using an array as well. > > I rethink about this and realize that it's not good to do the memory > allocation in > assign hook function. As the "src/backend/utils/misc/README" said, we'd > better do that in check hook function and pass it via extra to assign hook > function. And thus array is a good choice in this case rather than a List > which > cannot be passed to *extra. > > Here is the V107 patch set which parse and cache the standby slot names in an > array instead of a List. The patch needs to be rebased due to a recent commit. Attached the V107_2 patch set. There are no code changes in this version. Best Regards, Hou zj v107_2-0001-Allow-logical-walsenders-to-wait-for-the-physi.patch Description: v107_2-0001-Allow-logical-walsenders-to-wait-for-the-physi.patch v107_2-0002-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v107_2-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Synchronizing slots from primary to standby
On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > wrote: > > Hi, > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > > wrote: > > > > > > > > > > > > --- > > > > +void > > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > > +List *standby_slots; > > > > +MemoryContext oldcxt; > > > > +char *standby_slot_names_cpy = extra; > > > > + > > > > > > > > Given that the newval and extra have the same data > > > > (standby_slot_names value), why do we not use newval instead? I > > > > think that if we use newval, we don't need to guc_strdup() in > > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > > instead, though. It's not clear to me as there is no comment. > > > > > > I think SplitIdentifierString will modify the passed in string, so > > > we'd better not pass the newval to it, otherwise the stored guc > > > string(standby_slot_names) will be changed. I can see we are doing > > > similar thing in other GUC check/assign function as well. > > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > > > Why does it have to be a List in the first place? > > I thought the List type is convenient to use here, as we have existing list > build > function(SplitIdentifierString), and have convenient list macro to loop the > list(foreach_ptr) which can save some codes. > > > In earlier version patches, we > > used to copy the list and delete the element until it became empty, > > while waiting for physical wal senders. But we now just refer to each > > slot name in the list. The current code assumes that > > stnadby_slot_names_cpy is allocated in GUCMemoryContext but once it > > changes, it will silently get broken. 
I think we can check and assign > > standby_slot_names in a similar way to check/assign_temp_tablespaces > > and check/assign_synchronous_standby_names. > > Yes, we could do follow it by allocating an array and copy each slot name > into it, > but it also requires some codes to build and scan the array. So, is it > possible to > expose the GucMemorycontext or have an API like guc_copy_list instead ? > If we don't want to touch the guc api, I am ok with using an array as well. I rethought about this and realized that it's not good to do the memory allocation in the assign hook function. As "src/backend/utils/misc/README" says, we'd better do that in the check hook function and pass the result via extra to the assign hook function. Thus an array is a good choice in this case, rather than a List, which cannot be passed via *extra. Here is the V107 patch set, which parses and caches the standby slot names in an array instead of a List. Best Regards, Hou zj v107-0002-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v107-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v107-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch Description: v107-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada wrote: Hi, > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > wrote: > > > > > > > > > --- > > > +void > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > +List *standby_slots; > > > +MemoryContext oldcxt; > > > +char *standby_slot_names_cpy = extra; > > > + > > > > > > Given that the newval and extra have the same data > > > (standby_slot_names value), why do we not use newval instead? I > > > think that if we use newval, we don't need to guc_strdup() in > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > instead, though. It's not clear to me as there is no comment. > > > > I think SplitIdentifierString will modify the passed in string, so > > we'd better not pass the newval to it, otherwise the stored guc > > string(standby_slot_names) will be changed. I can see we are doing > > similar thing in other GUC check/assign function as well. > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > Why does it have to be a List in the first place? I thought the List type is convenient to use here, as we have existing list build function(SplitIdentifierString), and have convenient list macro to loop the list(foreach_ptr) which can save some codes. > In earlier version patches, we > used to copy the list and delete the element until it became empty, while > waiting for physical wal senders. But we now just refer to each slot name in > the > list. The current code assumes that stnadby_slot_names_cpy is allocated in > GUCMemoryContext but once it changes, it will silently get broken. I think we > can check and assign standby_slot_names in a similar way to > check/assign_temp_tablespaces and > check/assign_synchronous_standby_names. 
Yes, we could follow that by allocating an array and copying each slot name into it, but it also requires some code to build and scan the array. So, is it possible to expose the GucMemoryContext or have an API like guc_copy_list instead? If we don't want to touch the GUC API, I am OK with using an array as well. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, March 5, 2024 2:35 PM shveta malik wrote: > > On Tue, Mar 5, 2024 at 9:15 AM Amit Kapila wrote: > > > > On Tue, Mar 5, 2024 at 6:10 AM Peter Smith > wrote: > > > > > > == > > > src/backend/replication/walsender.c > > > > > > 5. NeedToWaitForWal > > > > > > + /* > > > + * Check if the standby slots have caught up to the flushed > > > + position. It > > > + * is good to wait up to the flushed position and then let the > > > + WalSender > > > + * send the changes to logical subscribers one by one which are > > > + already > > > + * covered by the flushed position without needing to wait on every > > > + change > > > + * for standby confirmation. > > > + */ > > > + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) return true; > > > + > > > + *wait_event = 0; > > > + return false; > > > +} > > > + > > > > > > 5a. > > > The comment (or part of it?) seems misplaced because it is talking > > > WalSender sending changes, but that is not happening in this function. > > > > > > > I don't think so. This is invoked only by walsender and a static > > function. I don't see any other better place to mention this. > > > > > Also, isn't what this is saying already described by the other > > > comment in the caller? e.g.: > > > > > > > Oh no, here we are explaining the wait order. > > I think there is a scope of improvement here. The comment inside > NeedToWaitForWal() which states that we need to wait here for standbys on > flush-position(and not on each change) should be outside of this function. It > is > too embedded. And the comment which states the order of wait (first flush and > then standbys confirmation) should be outside the for-loop in > WalSndWaitForWal(), but yes we do need both the comments. Attached a > patch (.txt) for comments improvement, please merge if appropriate. Thanks, I have slightly modified the top-up patch and merged it. Attach the V106 patch which addressed above and Peter's comments[1]. 
[1] https://www.postgresql.org/message-id/CAHut%2BPsATK8z1TEcfFE8zWoS1hagqsvaWYCgom_zYtScfwO7uQ%40mail.gmail.com Best Regards, Hou zj v106-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch Description: v106-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch v106-0002-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v106-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Synchronizing slots from primary to standby
On Tuesday, March 5, 2024 8:40 AM Peter Smith wrote: > > Here are some review comments for v105-0001 > > == > doc/src/sgml/config.sgml > > 1. > + > +The standbys corresponding to the physical replication slots in > +standby_slot_names must configure > +sync_replication_slots = true so they can receive > +logical failover slots changes from the primary. > + > > /slots changes/slot changes/ Changed. > > == > doc/src/sgml/func.sgml > > 2. > +The function may be waiting if the specified slot is a logical > failover > +slot and linkend="guc-standby-slot-names">standby_slot_names me> > +is configured. > > I know this has been through multiple versions already, but this > latest wording "may be waiting..." doesn't seem very good to me. > > How about one of these? > > * The function may not be able to return immediately if the specified > slot is a logical failover slot and standby_slot_names is configured. > > * The function return might be blocked if the specified slot is a > logical failover slot and standby_slot_names is configured. > > * If the specified slot is a logical failover slot then the function > will block until all physical slots specified in standby_slot_names > have confirmed WAL receipt. > > * If the specified slot is a logical failover slot then the function > will not return until all physical slots specified in > standby_slot_names have confirmed WAL receipt. I prefer the last one. > > ~~~ > > 3. > +slot may return to an earlier position. The function may be waiting > if > +the specified slot is a logical failover slot and > + linkend="guc-standby-slot-names">standby_slot_names me> > > > Same as previous review comment #2 Changed. > > == > src/backend/replication/slot.c > > 4. WaitForStandbyConfirmation > > + * Used by logical decoding SQL functions that acquired logical failover > slot. > > IIUC it doesn't work like that. pg_logical_slot_get_changes_guts() > calls here unconditionally (i.e. 
the SQL functions don't even check if > they are failover slots before calling this) so the comment seems > misleading/redundant. I removed the "acquired logical failover slot.". > > == > src/backend/replication/walsender.c > > 5. NeedToWaitForWal > > + /* > + * Check if the standby slots have caught up to the flushed position. It > + * is good to wait up to the flushed position and then let the WalSender > + * send the changes to logical subscribers one by one which are already > + * covered by the flushed position without needing to wait on every change > + * for standby confirmation. > + */ > + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) > + return true; > + > + *wait_event = 0; > + return false; > +} > + > > 5a. > The comment (or part of it?) seems misplaced because it is talking > WalSender sending changes, but that is not happening in this function. > > Also, isn't what this is saying already described by the other comment > in the caller? e.g.: > > + /* > + * Within the loop, we wait for the necessary WALs to be flushed to > + * disk first, followed by waiting for standbys to catch up if there > + * are enough WALs or upon receiving the shutdown signal. To avoid the > + * scenario where standbys need to catch up to a newer WAL location in > + * each iteration, we update our idea of the currently flushed > + * position only if we are not waiting for standbys to catch up. > + */ > I moved these comments based on Shveta's suggestion. > ~ > > 5b. > Most of the code is unnecessary. AFAICT all this is exactly same as just 1 > line: > > return NeedToWaitForStandbys(flushed_lsn, wait_event); Changed. > > ~~~ > > 6. WalSndWaitForWal > > + /* > + * Within the loop, we wait for the necessary WALs to be flushed to > + * disk first, followed by waiting for standbys to catch up if there > + * are enough WALs or upon receiving the shutdown signal. 
To avoid the > + * scenario where standbys need to catch up to a newer WAL location in > + * each iteration, we update our idea of the currently flushed > + * position only if we are not waiting for standbys to catch up. > + */ > > Regarding that 1st sentence: maybe this logic used to be done > explicitly "within the loop" but IIUC this logic is now hidden inside > NeedToWaitForWal() so the comment should mention that. Changed. Best Regards, Hou zj
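The refactoring agreed in review comment 5b (collapse the tail of NeedToWaitForWal() into a single delegating line) can be illustrated with a toy model. This is not the actual PostgreSQL code: the type `XLogRecPtr` is simplified to an integer, and the wait-event codes and function names (`need_to_wait_for_wal`, `need_to_wait_for_standbys`) are hypothetical stand-ins for the real symbols discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* simplified stand-in for PostgreSQL's LSN type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical wait-event codes standing in for the real enum values. */
enum { WE_NONE = 0, WE_WAIT_FOR_WAL, WE_WAIT_FOR_STANDBY };

/*
 * Toy model of NeedToWaitForStandbys(): we must wait if the oldest
 * standby flush position is unknown or behind the LSN we want to send.
 */
static bool
need_to_wait_for_standbys(XLogRecPtr oldest_standby_flush,
                          XLogRecPtr wait_for_lsn, uint32_t *wait_event)
{
    if (oldest_standby_flush == InvalidXLogRecPtr ||
        oldest_standby_flush < wait_for_lsn)
    {
        *wait_event = WE_WAIT_FOR_STANDBY;
        return true;
    }
    *wait_event = WE_NONE;
    return false;
}

/*
 * Toy model of NeedToWaitForWal(): first require the WAL up to
 * target_lsn to be flushed locally; after that, per review comment 5b,
 * the tail of the function is a single line delegating to the standby
 * check for the flushed position.
 */
static bool
need_to_wait_for_wal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn,
                     XLogRecPtr oldest_standby_flush, uint32_t *wait_event)
{
    if (target_lsn > flushed_lsn)
    {
        *wait_event = WE_WAIT_FOR_WAL;
        return true;
    }
    return need_to_wait_for_standbys(oldest_standby_flush, flushed_lsn,
                                     wait_event);
}
```

The out-parameter tells the caller which wait event to report, which is why the helper returns a bool plus `wait_event` rather than the event alone.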
RE: Synchronizing slots from primary to standby
On Monday, March 4, 2024 11:44 PM Bertrand Drouvot wrote: > > Hi, > > On Mon, Mar 04, 2024 at 01:28:04PM +0000, Zhijie Hou (Fujitsu) wrote: > > Attach the V105 patch set > > Thanks! > > Sorry I missed those during the previous review: No problem, thanks for the comments! > > 1 === > > Commit message: "these functions will block until" > > s/block/wait/ ? > > 2 === > > +when used with logical failover slots, will block until all > > s/block/wait/ ? > > It seems those are the 2 remaining "block" that could deserve the proposed > above change. I prefer using 'block' here, and it seems others also suggested changing 'wait' to 'block' [1]. > > 3 === > > + invalidated = slot->data.invalidated != RS_INVAL_NONE; > + inactive = slot->active_pid == 0; > > invalidated = (slot->data.invalidated != RS_INVAL_NONE); inactive = > (slot->active_pid == 0); > > instead? > > I think it's easier to read and it looks like this is the way it's written in > other > places (at least the few I checked). I think the current code is consistent with other similar code in slot.c. (grep "data.invalidated != RS_INVAL_NONE"). [1] https://www.postgresql.org/message-id/CAHut%2BPsATK8z1TEcfFE8zWoS1hagqsvaWYCgom_zYtScfwO7uQ%40mail.gmail.com Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Monday, March 4, 2024 5:52 PM Amit Kapila wrote: > > On Mon, Mar 4, 2024 at 9:35 AM Peter Smith > wrote: > > > > OK, if the code will remain as-is wouldn't it be better to anyway > > change the function name to indicate what it really does? > > > > e.g. NeedToWaitForWal --> NeedToWaitForWalFlushOrStandbys > > > > This seems too long. I would prefer the current name NeedToWaitForWal as > waiting for WAL means waiting to flush the WAL and waiting to replicate it to > standby. On similar lines, the variable name standby_slot_oldest_flush_lsn > looks > too long. How about ss_oldest_flush_lsn (where ss indicates standby_slots)? > > Apart from this, I have made minor modifications in the attached. Thanks, I have merged it. Attach the V105 patch set which addressed Peter, Amit and Bertrand's comments. This version also includes the following changes: * We found a string matching issue for query_until() and fixed it. * Removed one unused parameter from NeedToWaitForStandbys. * Disable the sub before testing pg_logical_slot_get_changes in 040.pl; this is to prevent this test from catching the warning from another walsender. * Ran pgindent. Best Regards, Hou zj v105-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch v105-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Synchronizing slots from primary to standby
On Monday, March 4, 2024 9:55 AM Peter Smith wrote: > > On Sun, Mar 3, 2024 at 6:51 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Sunday, March 3, 2024 7:47 AM Peter Smith > wrote: > > > > > > > 3. > > > + > > > + > > > + Value * is not accepted as it is > > > inappropriate to > > > + block logical replication for physical slots that either lack > > > + associated standbys or have standbys associated that are > > > + not > > > enabled > > > + for replication slot synchronization. (see > > > + > > linkend="logicaldecoding-replication-slots-synchronization"/>). > > > + > > > + > > > > > > Why does the document need to provide an excuse/reason for the rule? > > > You could just say something like: > > > > > > SUGGESTION > > > The slots must be named explicitly. For example, specifying wildcard > > > values like * is not permitted. > > > > As suggested by Amit, I moved this to code comments. > > Was the total removal of this note deliberate? I only suggested removing the > *reason* for the rule, not the entire rule. Otherwise, the user won't know to > avoid doing this until they try it and get an error. OK, Added. > > > > > > > > 9. NeedToWaitForWal > > > > > > + /* > > > + * Check if the standby slots have caught up to the flushed position. > > > + It > > > + * is good to wait up to flushed position and then let it send the > > > + changes > > > + * to logical subscribers one by one which are already covered in > > > + flushed > > > + * position without needing to wait on every change for standby > > > + * confirmation. Note that after receiving the shutdown signal, an > > > + ERROR > > > + * is reported if any slots are dropped, invalidated, or inactive. > > > + This > > > + * measure is taken to prevent the walsender from waiting indefinitely. 
> > > + */ > > > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > > > + return true; > > > > > > I felt it was confusing things for this function to also call to the > > > other one -- it seems an overlapping/muddling of the purpose of these. > > > I think it will be easier to understand if the calling code just > > > calls to one or both of these functions as required. > > > > Same as Amit, I didn't change this. > > AFAICT my previous review comment was misinterpreted. Please see [1] for > more details. > > > > Here are some more review comments for v104-1 Thanks! > > == > Commit message > > 1. > Additionally, The SQL functions pg_logical_slot_get_changes, > pg_logical_slot_peek_changes and pg_replication_slot_advance are modified > to wait for the replication slots specified in 'standby_slot_names' to catch > up > before returning. > > ~ > > Maybe that should be expressed using similar wording as the docs... > > SUGGESTION > Additionally, The SQL functions ... are modified. Now, when used with > failover-enabled logical slots, these functions will block until all physical > slots > specified in 'standby_slot_names' have confirmed WAL receipt. Changed. > > == > doc/src/sgml/config.sgml > > 2. > +pg_logical_slot_peek_changes, > +when used with failover enabled logical slots, will block until all > +physical slots specified in > standby_slot_names have > +confirmed WAL receipt. > > /failover enabled logical slots/failover-enabled logical slots/ Changed. Note that for this comment and the remaining comments, I used the later wording we agreed on (logical failover slot). > > == > doc/src/sgml/func.sgml > > 3. > +The function may be blocked if the specified slot is a failover > enabled > +slot and standby_slot_names > +is configured. > > > /a failover enabled slot//a failover-enabled slot/ Changed. > > ~~~ > > 4. > +slot may return to an earlier position.
The function may be blocked > if > +the specified slot is a failover enabled slot and > + linkend="guc-standby-slot-names">standby_slot_names me> > +is configured. > > /a failover enabled slot//a failover-enabled slot/ Changed. > > == > src/backend/replication/slot.c > > 5. > +/* > + * Wait for physical standbys to conf
RE: Synchronizing slots from primary to standby
On Monday, March 4, 2024 7:22 PM Bertrand Drouvot wrote: > > Hi, > > On Sun, Mar 03, 2024 at 07:56:32AM +0000, Zhijie Hou (Fujitsu) wrote: > > Here is the V104 patch which addressed above and Peter's comments. > > Thanks! > > A few more random comments: Thanks for the comments! > > 1 === > > +The function may be blocked if the specified slot is a failover > + enabled > > s/blocked/waiting/ ? Changed. > > 2 === > > +* specified slot when waiting for them to catch up. See > +* StandbySlotsHaveCaughtup for details. > > s/StandbySlotsHaveCaughtup/StandbySlotsHaveCaughtup()/ ? Changed. > > 3 === > > + /* Now verify if the specified slots really exist and have > + correct type */ > > remove "really"? Changed. > > 4 === > > + /* > +* Don't need to wait for the standbys to catch up if there is no > value in > +* standby_slot_names. > +*/ > + if (standby_slot_names_list == NIL) > + return true; > + > + /* > +* Don't need to wait for the standbys to catch up if we are on a > standby > +* server, since we do not support syncing slots to cascading > standbys. > +*/ > + if (RecoveryInProgress()) > + return true; > + > + /* > +* Don't need to wait for the standbys to catch up if they are already > +* beyond the specified WAL location. > +*/ > + if (!XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) && > + standby_slot_oldest_flush_lsn >= wait_for_lsn) > + return true; > > What about using OR conditions instead? > > 5 === > > +static bool > +NeedToWaitForStandby(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > +uint32 *wait_event) > > Not a big deal but does it need to return a bool? (I mean it all depends of > the > *wait_event value). Is it for better code readability in the caller? > > 6 === > > +static bool > +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > +uint32 *wait_event) > > Same questions as for NeedToWaitForStandby(). I also feel the current style looks a bit cleaner, so didn’t change these. Best Regards, Hou zj
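Comment 4 above (separate early-return `if` guards versus one OR-expression) is easy to see in isolation. Below is a toy model, not the real `StandbySlotsHaveCaughtup()`: the struct fields are hypothetical stand-ins for `standby_slot_names_list`, `RecoveryInProgress()`, and `standby_slot_oldest_flush_lsn`. The three guards are kept as separate `if` statements, the style the patch retained; folding them into a single OR-condition would be behaviorally identical, so the choice is purely about readability of the per-guard comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical stand-ins for the server state the real guards consult. */
struct wait_state
{
    size_t      n_standby_slots;    /* length of standby_slot_names_list */
    bool        in_recovery;        /* RecoveryInProgress() */
    XLogRecPtr  oldest_flush_lsn;   /* standby_slot_oldest_flush_lsn */
};

static bool
standbys_have_caught_up(const struct wait_state *st, XLogRecPtr wait_for_lsn)
{
    /* Don't need to wait if there is no value in standby_slot_names. */
    if (st->n_standby_slots == 0)
        return true;

    /*
     * Don't need to wait if we are on a standby server, since syncing
     * slots to cascading standbys is not supported.
     */
    if (st->in_recovery)
        return true;

    /* Don't need to wait if standbys are already beyond the location. */
    if (st->oldest_flush_lsn != InvalidXLogRecPtr &&
        st->oldest_flush_lsn >= wait_for_lsn)
        return true;

    return false;
}
```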
RE: Synchronizing slots from primary to standby
On Saturday, March 2, 2024 6:55 PM Amit Kapila wrote: > > On Sat, Mar 2, 2024 at 9:21 AM Zhijie Hou (Fujitsu) > wrote: > > > > Apart from the comments, the code in WalSndWaitForWal was refactored a > > bit to make it neater. Thanks Shveta for helping writing the code and doc. > > > > A few more comments: Thanks for the comments. > == > 1. > +# Wait until the primary server logs a warning indicating that it is > +waiting # for the sb1_slot to catch up. > +$primary->wait_for_log( > + qr/replication slot \"sb1_slot\" specified in parameter > standby_slot_names does not have active_pid/, > + $offset); > > Shouldn't we wait for such a LOG even in the first test as well which > involves two > standbys and two logical subscribers? Yes, we should. Added. > > 2. > +## > +# Test that logical replication will wait for the user-created inactive > +# physical slot to catch up until we remove the slot from standby_slot_names. > +## > > > I don't see anything different tested in this test from what we already > tested in > the first test involving two standbys and two logical subscribers. Can you > please clarify if I am missing something? I think the intention was to test that the wait loop is ended due to GUC config reload, while the first test is for the case when the loop is ended due to restart_lsn movement. But it seems we tested the config reload with xx_get_changes() as well, so I can remove it if you agree. > > 3. > Note that after receiving the shutdown signal, an ERROR > + * is reported if any slots are dropped, invalidated, or inactive. This > + * measure is taken to prevent the walsender from waiting indefinitely. > + */ > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > > Isn't this part of the comment should be moved inside > NeedToWaitForStandby()? Moved. > > 4. 
> + /* > + * Update our idea of the currently flushed position only if we are > + * not waiting for standbys to catch up, otherwise the standby would > + * have to catch up to a newer WAL location in each cycle. > + */ > + if (wait_event != WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION) > + { > > This functionality (in function WalSndWaitForWal()) seems to ensure that we > first wait for the required WAL to be flushed and then wait for standbys. If > true, > we should cover that point in the comments here or somewhere in the function > WalSndWaitForWal(). > > Apart from this, I have made a few modifications in the comments. Thanks. I have reviewed and merged them. Here is the V104 patch which addressed above and Peter's comments. Best Regards, Hou zj v104-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch v104-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
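The ordering Amit describes in comment 4 (wait for the local flush first, then for standbys, and freeze our idea of the flushed position while waiting on standbys so they never chase a moving target) can be demonstrated with a small simulation. This is a toy model, not the real `WalSndWaitForWal()`: the tick arrays and the function name are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Toy simulation of the two-phase wait loop: per tick, advance our idea
 * of the locally flushed position ONLY when we are not already waiting
 * for standbys; then wait for the local flush to reach loc, and finally
 * for the standbys to confirm the (frozen) flushed position.
 */
static XLogRecPtr
wait_for_wal_model(XLogRecPtr loc,
                   const XLogRecPtr *flush_progress,   /* local flush per tick */
                   const XLogRecPtr *standby_progress, /* standby flush per tick */
                   int nticks)
{
    XLogRecPtr flushed = 0;
    bool waiting_for_standby = false;

    for (int t = 0; t < nticks; t++)
    {
        /* Don't move the goalposts while standbys are catching up. */
        if (!waiting_for_standby)
            flushed = flush_progress[t];

        if (loc > flushed)
            continue;                   /* phase 1: wait for local flush */

        if (standby_progress[t] < flushed)
        {
            waiting_for_standby = true; /* phase 2: wait for standbys */
            continue;
        }
        return flushed;                 /* flushed locally and confirmed */
    }
    return flushed;
}
```

In the test below the local flush keeps advancing (to 7, then 9) while the standby catches up, but the returned position stays at 5, the value frozen when the standby wait began, which is exactly the behavior the quoted comment is explaining.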
RE: Synchronizing slots from primary to standby
On Sunday, March 3, 2024 7:47 AM Peter Smith wrote: > > On Sat, Mar 2, 2024 at 2:51 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 1, 2024 12:23 PM Peter Smith > wrote: > > > > ... > > > == > > > src/backend/replication/slot.c > > > > > > 2. validate_standby_slots > > > > > > + else if (!ReplicationSlotCtl) > > > + { > > > + /* > > > + * We cannot validate the replication slot if the replication slots' > > > + * data has not been initialized. This is ok as we will validate > > > + the > > > + * specified slot when waiting for them to catch up. See > > > + * StandbySlotsHaveCaughtup for details. > > > + */ > > > + } > > > + else > > > + { > > > + /* > > > + * If the replication slots' data have been initialized, verify if > > > + the > > > + * specified slots exist and are logical slots. > > > + */ > > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > > > > > IMO that 2nd comment does not need to say "If the replication slots' > > > data have been initialized," because that is implicit from the if/else. > > > > Removed. > > I only meant to suggest removing the 1st part, not the entire comment. > I thought it is still useful to have a comment like: > > /* Check that the specified slots exist and are logical slots.*/ OK, I misunderstood. Fixed. > > == > > And, here are some review comments for v103-0001. Thanks for the comments. > > == > Commit message > > 1. > Additionally, The SQL functions pg_logical_slot_get_changes, > pg_logical_slot_peek_changes and pg_replication_slot_advance are modified > to wait for the replication slots mentioned in 'standby_slot_names' to catch > up > before returning. > > ~ > > (use the same wording as previous in this message) > > /mentioned in/specified in/ Changed. > > == > doc/src/sgml/config.sgml > > 2. 
> +Additionally, when using the replication management functions > + > +pg_replication_slot_advance, > + > +pg_logical_slot_get_changes, and > + > +pg_logical_slot_peek_changes, > +with failover enabled logical slots, the functions will wait for the > +physical slots specified in > standby_slot_names to > +confirm WAL receipt before proceeding. > + > > It says "the ... functions" twice so maybe reword it slightly. > > BEFORE > Additionally, when using the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, with failover enabled logical slots, the > functions > will wait for the physical slots specified in standby_slot_names to confirm > WAL > receipt before proceeding. > > SUGGESTION > Additionally, the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, when used with failover enabled logical slots, > will wait for the physical slots specified in standby_slot_names to confirm > WAL > receipt before proceeding. > > (Actually the "will wait ... before proceeding" is also a bit tricky, so > below is > another possible rewording) > > SUGGESTION #2 > Additionally, the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, when used with failover enabled logical slots, > will block until all physical slots specified in standby_slot_names have > confirmed WAL receipt. I prefer the #2 version. > > ~~~ > > 3. > + > + > + Value * is not accepted as it is inappropriate to > + block logical replication for physical slots that either lack > + associated standbys or have standbys associated that are not > enabled > + for replication slot synchronization. (see > + linkend="logicaldecoding-replication-slots-synchronization"/>). > + > + > > Why does the document need to provide an excuse/reason for the rule? 
> You could just say something like: > > SUGGESTION > The slots must be named explicitly. For example, specifying wildcard values > like > * is not permitted. As suggested by Amit, I moved this to code comments. > > == > doc/src/sgml/func.sgml > > 4. > @@ -28150,7 +28150,7 @@ postgres=# SELECT '0/0'::pg_lsn + > pd.segment_number * ps.setting::int + :offset > > > > -
RE: Synchronizing slots from primary to standby
On Thursday, February 29, 2024 11:16 AM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, February 28, 2024 7:36 PM Bertrand Drouvot > wrote: > > > > 4 === > > > > Regarding the test, what about adding one to test the "new" behavior > > discussed up-thread? (logical replication will wait if slot mentioned > > in standby_slot_names is dropped and/or does not exist when the engine > > starts?) > > Will think about this. I think we could add a test to check the warning message for a dropped slot, but since similar wait/warning functionality has already been tested, I prefer to leave this for now; we can consider it again after the main patch and tests have stabilized, given the previous buildfarm (BF) instability with this feature. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, March 1, 2024 12:23 PM Peter Smith wrote: > > Here are some review comments for v102-0001. > > == > doc/src/sgml/config.sgml > > 1. > + > +Lists the streaming replication standby server slot names that > logical > +WAL sender processes will wait for. Logical WAL sender processes will > +send decoded changes to plugins only after the specified replication > +slots confirm receiving WAL. This guarantees that logical replication > +slots with failover enabled do not consume changes until those > changes > +are received and flushed to corresponding physical standbys. If a > +logical replication connection is meant to switch to a physical > standby > +after the standby is promoted, the physical replication slot for the > +standby should be listed here. Note that logical replication will not > +proceed if the slots specified in the standby_slot_names do > not exist or > +are invalidated. > + > > Should this also mention the effect this GUC has on those 2 SQL functions? > E.g. > Commit message says: > > Additionally, The SQL functions pg_logical_slot_get_changes and > pg_replication_slot_advance are modified to wait for the replication slots > mentioned in 'standby_slot_names' to catch up before returning. Added. > > == > src/backend/replication/slot.c > > 2. validate_standby_slots > > + else if (!ReplicationSlotCtl) > + { > + /* > + * We cannot validate the replication slot if the replication slots' > + * data has not been initialized. This is ok as we will validate the > + * specified slot when waiting for them to catch up. See > + * StandbySlotsHaveCaughtup for details. > + */ > + } > + else > + { > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > IMO that 2nd comment does not need to say "If the replication slots' > data have been initialized," because that is implicit from the if/else. Removed. > > ~~~ > > 3. 
GetStandbySlotList > > +List * > +GetStandbySlotList(void) > +{ > + if (RecoveryInProgress()) > + return NIL; > + else > + return standby_slot_names_list; > +} > > The 'else' is not needed. IMO is better without it, but YMMV. Removed. > > ~~~ > > 4. StandbySlotsHaveCaughtup > > +/* > + * Return true if the slots specified in standby_slot_names have caught > +up to > + * the given WAL location, false otherwise. > + * > + * The elevel parameter determines the error level used for logging > +messages > + * related to slots that do not exist, are invalidated, or are inactive. > + */ > +bool > +StandbySlotsHaveCaughtup(XLogRecPtr wait_for_lsn, int elevel) > > /determines/specifies/ Changed. > > ~ > > 5. > + /* > + * Don't need to wait for the standby to catch up if there is no value > + in > + * standby_slot_names. > + */ > + if (!standby_slot_names_list) > + return true; > + > + /* > + * If we are on a standby server, we do not need to wait for the > + standby to > + * catch up since we do not support syncing slots to cascading standbys. > + */ > + if (RecoveryInProgress()) > + return true; > + > + /* > + * Return true if all the standbys have already caught up to the passed > + in > + * WAL localtion. > + */ > + if (!XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) && > + standby_slot_oldest_flush_lsn >= wait_for_lsn) return true; > > > 5a. > I felt all these comments should be worded in a consistent way like "Don't > need > to wait ..." > > e.g. > 1. Don't need to wait for the standbys to catch up if there is no value in > 'standby_slot_names'. > 2. Don't need to wait for the standbys to catch up if we are on a standby > server, > since we do not support syncing slots to cascading standbys. > 3. Don't need to wait for the standbys to catch up if they are already beyond > the specified WAL location. Changed. > > ~ > > 5b. > typo > /WAL localtion/WAL location/ > Fixed. > ~~~ > > 6. 
+ if (!slot) > + { > + /* > + * It may happen that the slot specified in standby_slot_names GUC > + * value is dropped, so let's skip over it. > + */ > + ereport(elevel, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("replication slot \"%s\" specified in parameter %s does not exist", > +name, "standby_slot_names")); > + continue; > + } > > Is "is dropped" strictly the only reason? IIUC another reason is the slot > may just not have existed in the first place, but it was not detected before now > because initial validation was skipped. Changed the comment. > > ~~~ > > 7. > + /* > + * Return false if not all the standbys have caught up to the specified > + WAL > + * location. > + */ > + if (caught_up_slot_num != list_length(standby_slot_names_list)) > + return false; > > Somehow it seems complicated to have a counter of the slots as you process > then compare that counter to the list_length
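Peter's comment 7 questions counting caught-up slots and comparing the counter with `list_length()` at the end. The alternative he hints at, returning `false` as soon as any slot is missing or behind, can be sketched as below. This is a simplified model, not the real slot.c code: the `struct slot` and function name are invented, and it omits the warning-reporting side effects that motivate the real code's continue-and-count structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal slot record for the sketch; the real code walks entries in
 * ReplicationSlotCtl shared memory. */
struct slot
{
    bool        exists;     /* slot present (not dropped)? */
    uint64_t    flush_lsn;  /* last position the standby confirmed */
};

/*
 * Early-return rewrite of the counting pattern quoted above: bail out
 * the moment one configured slot cannot satisfy the wait, instead of
 * counting the ones that can and comparing with the list length.
 */
static bool
all_slots_caught_up(const struct slot *slots, size_t nslots,
                    uint64_t wait_for_lsn)
{
    for (size_t i = 0; i < nslots; i++)
    {
        if (!slots[i].exists)
            return false;   /* dropped or never created */
        if (slots[i].flush_lsn < wait_for_lsn)
            return false;   /* standby has not confirmed this LSN yet */
    }
    return true;
}
```

The counting form remains useful when the loop must keep going to emit a warning for every problematic slot rather than only the first one, which is a plausible reason the patch kept it.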
RE: Synchronizing slots from primary to standby
On Friday, March 1, 2024 2:11 PM Masahiko Sawada wrote: > > On Fri, Mar 1, 2024 at 12:42 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 1, 2024 10:17 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > > > > Attach the V102 patch set which addressed Amit and Shveta's comments. > > > Thanks Shveta for helping addressing the comments off-list. > > > > The cfbot reported a compile warning, here is the new version patch > > which fixed it, Also removed some outdate comments in this version. > > > > I've reviewed the v102-0001 patch. Here are some comments: Thanks for the comments ! > > --- > I got a compiler warning: > > walsender.c:1829:6: warning: variable 'wait_event' is used uninitialized > whenever '&&' condition is false [-Wsometimes-uninitialized] > if (!XLogRecPtrIsInvalid(RecentFlushPtr) && > ^~~~ > walsender.c:1871:7: note: uninitialized use occurs here > if (wait_event == > WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL) > ^~ > walsender.c:1829:6: note: remove the '&&' if its condition is always true > if (!XLogRecPtrIsInvalid(RecentFlushPtr) && > ^~~ > walsender.c:1818:20: note: initialize the variable 'wait_event' to silence > this > warning > uint32 wait_event; > ^ >= 0 > 1 warning generated. Thanks for reporting, it was fixed in V102_2. > > --- > +void > +assign_standby_slot_names(const char *newval, void *extra) { > +List *standby_slots; > +MemoryContext oldcxt; > +char *standby_slot_names_cpy = extra; > + > > Given that the newval and extra have the same data (standby_slot_names > value), why do we not use newval instead? I think that if we use > newval, we don't need to guc_strdup() in check_standby_slot_names(), > we might need to do list_copy_deep() instead, though. It's not clear > to me as there is no comment. I think SplitIdentifierString will modify the passed in string, so we'd better not pass the newval to it, otherwise the stored guc string(standby_slot_names) will be changed. 
I can see we are doing a similar thing in other GUC check/assign functions as well. (check_wal_consistency_checking/ assign_wal_consistency_checking, check_createrole_self_grant/ assign_createrole_self_grant ...). > --- > +/* > + * Switch to the memory context under which GUC variables are > allocated > + * (GUCMemoryContext). > + */ > +oldcxt = > MemoryContextSwitchTo(GetMemoryChunkContext(standby_slot_names_cpy > )); > +standby_slot_names_list = list_copy(standby_slots); > +MemoryContextSwitchTo(oldcxt); > > Why do we not explicitly switch to GUCMemoryContext? I think it's because the GUCMemoryContext is not exposed. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, March 1, 2024 10:17 AM Zhijie Hou (Fujitsu) wrote: > > > Attach the V102 patch set which addressed Amit and Shveta's comments. > Thanks Shveta for helping addressing the comments off-list. The cfbot reported a compile warning; here is a new version patch which fixes it. Also removed some outdated comments in this version. Best Regards, Hou zj v102_2-0001-Allow-logical-walsenders-to-wait-for-the-physi.patch v102_2-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 29, 2024 5:54 PM Amit Kapila wrote: > > On Thu, Feb 29, 2024 at 8:29 AM Peter Smith > wrote: > > > > On Wed, Feb 28, 2024 at 1:23 PM Zhijie Hou (Fujitsu) > > wrote: > > > > > > On Tuesday, February 27, 2024 3:18 PM Peter Smith > wrote: > > ... > > > > 20. > > > > +# > > > > +# | > standby1 (primary_slot_name = sb1_slot) # | > standby2 > > > > +(primary_slot_name = sb2_slot) # primary - | # | > subscriber1 > > > > +(failover = true) # | > subscriber2 (failover = false) > > > > > > > > In the diagram, the "--->" means a mixture of physical standbys and > logical > > > > pub/sub replication. Maybe it can be a bit clearer? > > > > > > > > SUGGESTION > > > > > > > > # primary (publisher) > > > > # > > > > # (physical standbys) > > > > # | > standby1 (primary_slot_name = sb1_slot) > > > > # | > standby2 (primary_slot_name = sb2_slot) > > > > # > > > > # (logical replication) > > > > # | > subscriber1 (failover = true, slot_name = lsub1_slot) > > > > # | > subscriber2 (failover = false, slot_name = lsub2_slot) > > > > > > > > > > I think one can distinguish it based on the 'standby' and 'subscriber' as > > > well, > because > > > 'standby' normally refer to physical standby while the other refer to > > > logical. > > > > > I think Peter's suggestion will make the setup clear. Changed. > > > > > Ok, but shouldn't it at least include info about the logical slot > > names associated with the subscribers (slot_name = lsub1_slot, > > slot_name = lsub2_slot) like suggested above? > > > > == > > > > Here are some more review comments for v100-0001 > > > > == > > doc/src/sgml/config.sgml > > > > 1. > > + > > +Lists the streaming replication standby server slot names that > logical > > +WAL sender processes will wait for. Logical WAL sender processes > will > > +send decoded changes to plugins only after the specified > replication > > +slots confirm receiving WAL. 
This guarantees that logical > > replication > > +slots with failover enabled do not consume changes until those > changes > > +are received and flushed to corresponding physical standbys. If a > > +logical replication connection is meant to switch to a physical > standby > > +after the standby is promoted, the physical replication slot for > > the > > +standby should be listed here. Note that logical replication will > > not > > +proceed if the slots specified in the standby_slot_names do > > not exist or > > +are invalidated. > > + > > > > Is that note ("Note that logical replication will not proceed if the > > slots specified in the standby_slot_names do not exist or are > > invalidated") meant only for subscriptions marked for 'failover' or > > any subscription? Maybe wording can be modified to help clarify it? > > > > I think it is implicit that here we are talking about failover slots. > I think clarifying again the same could be repetitive considering the > previous sentence: "This guarantees that logical replication slots > with failover enabled do not consume .." have mentioned it. > > > == > > src/backend/replication/slot.c > > > > 2. > > +/* > > + * A helper function to validate slots specified in GUC standby_slot_names. > > + */ > > +static bool > > +validate_standby_slots(char **newval) > > +{ > > + char *rawname; > > + List *elemlist; > > + bool ok; > > + > > + /* Need a modifiable copy of string */ > > + rawname = pstrdup(*newval); > > + > > + /* Verify syntax and parse string into a list of identifiers */ > > + ok = SplitIdentifierString(rawname, ',', &elemlist); > > + > > + if (!ok) > > + { > > + GUC_check_errdetail("List syntax is invalid."); > > + } > > + > > + /* > > + * If the replication slots' data have been initialized, verify if the > > + * specified slots exist and are logical slots. > > + */ > > + else if (ReplicationSlotCtl) > > + { > > + foreach_ptr(char, name, elemlist) > > + { > > + ReplicationSlot *slot; > > +
RE: Synchronizing slots from primary to standby
On Thursday, February 29, 2024 7:36 PM shveta malik wrote: > > On Thu, Feb 29, 2024 at 7:04 AM Zhijie Hou (Fujitsu) > wrote: > > > > Here is the v101 patch set which addressed above comments. > > > > Thanks for the patch. Few comments: > > 1) Shall we mention in doc that shutdown will wait for standbys in > standby_slot_names to confirm receiving WAL: > > Suggestion for logicaldecoding.sgml: > > When standby_slot_names is utilized, the primary > server will not completely shut down until the corresponding standbys, > associated with the physical replication slots specified in > standby_slot_names, have confirmed receiving the > WAL up to the latest flushed position on the primary server. > > slot.c > 2) > /* > * If a logical slot name is provided in standby_slot_names, report > * a message and skip it. Although logical slots are disallowed in > * the GUC check_hook(validate_standby_slots), it is still > * possible for a user to drop an existing physical slot and > * recreate a logical slot with the same name. > */ > > This is not completely true, we can still specify a logical slot during > instance > start and it will accept it. > > Suggestion: > /* > * If a logical slot name is provided in standby_slot_names, report > * a message and skip it. It is possible for user to specify a > * logical slot name in standby_slot_names just before the server > * startup. The GUC check_hook(validate_standby_slots) can not > * validate such a slot during startup as the ReplicationSlotCtl > * shared memory is not initialized by that time. It is also > * possible for user to drop an existing physical slot and > * recreate a logical slot with the same name. > */ > > 3. Wait for physical standby to confirm receiving the given lsn > > standby -->standbys > > > 4. 
> In StandbyConfirmedFlush(), is it better to have the below errdetail in all > problematic cases: > Logical replication is waiting on the standby associated with \"%s\" > > We have it only for the inactive pid case but we are waiting in all cases. Thanks for the comments, I have addressed them. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Thursday, February 29, 2024 7:17 PM Amit Kapila wrote: > > On Thu, Feb 29, 2024 at 3:23 PM Amit Kapila > wrote: > > > > Few additional comments on the latest patch: > > = > > 1. > > static XLogRecPtr > > WalSndWaitForWal(XLogRecPtr loc) > > { > > ... > > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr && > > + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) { > > + /* > > + * Fast path to avoid acquiring the spinlock in case we already know > > + * we have enough WAL available and all the standby servers have > > + * confirmed receipt of WAL up to RecentFlushPtr. This is > > + particularly > > + * interesting if we're far behind. > > + */ > > return RecentFlushPtr; > > + } > > ... > > ... > > + * Wait for WALs to be flushed to disk only if the postmaster has not > > + * asked us to stop. > > + */ > > + if (loc > RecentFlushPtr && !got_STOPPING) wait_event = > > + WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > > + > > + /* > > + * Check if the standby slots have caught up to the flushed position. > > + * It is good to wait up to RecentFlushPtr and then let it send the > > + * changes to logical subscribers one by one which are already > > + covered > > + * in RecentFlushPtr without needing to wait on every change for > > + * standby confirmation. Note that after receiving the shutdown > > + signal, > > + * an ERROR is reported if any slots are dropped, invalidated, or > > + * inactive. This measure is taken to prevent the walsender from > > + * waiting indefinitely. > > + */ > > + else if (replication_active && > > + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) { wait_event = > > + WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; > > + wait_for_standby = true; > > + } > > + else > > + { > > + /* Already caught up and doesn't need to wait for standby_slots. */ > > break; > > + } > > ... > > } > > > > Can we try to move these checks into a separate function that returns > > a boolean and has an out parameter as wait_event? > > > > 2. 
How about naming StandbyConfirmedFlush() as > StandbySlotsAreCaughtup? > > > > Some more comments: > == > 1. > + foreach_ptr(char, name, elemlist) > + { > + ReplicationSlot *slot; > + > + slot = SearchNamedReplicationSlot(name, true); > + > + if (!slot) > + { > + GUC_check_errdetail("replication slot \"%s\" does not exist", name); > + ok = false; break; } > + > + if (!SlotIsPhysical(slot)) > + { > + GUC_check_errdetail("\"%s\" is not a physical replication slot", > + name); ok = false; break; } } > > Won't the second check need protection via ReplicationSlotControlLock? Yes, added. > > 2. In WaitForStandbyConfirmation(), we are anyway calling > StandbyConfirmedFlush() before the actual sleep in the loop, so does calling > it > at the beginning of the function will serve any purpose? If so, it is better > to add > some comments explaining the same. It is used as a fast-path to avoid calling condition variable stuff, I think we can directly put failover check and list check in the beginning instead of calling that function. > > 3. Also do we need to perform the validation checks done in > StandbyConfirmedFlush() repeatedly when it is invoked in a loop? We can > probably separate those checks and perform them just once. I have removed slot.failover check from the StandbyConfirmedFlush function, so that we can do it when necessary. I didn’t change the check for standby_slot_names_list because the list could be changed in the loop when reloading config. > > 4. > + * > + * XXX: If needed, this can be attempted in future. > > Remove this part of the comment. Removed. Attach the V102 patch set which addressed Amit and Shveta's comments. Thanks Shveta for helping addressing the comments off-list. 
Best Regards, Hou zj v102-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v102-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
On Wednesday, February 28, 2024 7:36 PM Bertrand Drouvot wrote: > > On Wed, Feb 28, 2024 at 02:23:27AM +, Zhijie Hou (Fujitsu) wrote: > > Attach the V100 patch set which addressed above comments. > > A few random comments: Thanks for the comments! > > 1 === > > + if (!ok) > + { > + GUC_check_errdetail("List syntax is invalid."); > + } > > What about to get rid of the brackets here? I personally prefer the current style. > > 2 === > > + > + /* > +* If the replication slots' data have been initialized, verify if the > +* specified slots exist and are logical slots. > +*/ > > remove the empty line above the comment? I feel it would be clean to have an empty line before the comments. > > 3 === > > +check_standby_slot_names(char **newval, void **extra, GucSource source) > +{ > + if ((*newval)[0] == '\0') > + return true; > > I think "**newval == '\0'" is easier to read but that's a matter of taste and > check_synchronous_standby_names() is already using the same so it's a nit. I don't have a strong opinion on this, so will change if others also feel so. > > 4 === > > Regarding the test, what about adding one to test the "new" behavior > discussed up-thread? (logical replication will wait if slot mentioned in > standby_slot_names is dropped and/or does not exist when the engine starts?) Will think about this. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Monday, February 26, 2024 7:52 PM Amit Kapila wrote: > > On Mon, Feb 26, 2024 at 7:49 AM Zhijie Hou (Fujitsu) > wrote: > > > > Attach the V98 patch set which addressed above comments. > > > > Few comments: > = > 1. > WalSndWaitForWal(XLogRecPtr loc) > { > int wakeEvents; > + bool wait_for_standby = false; > + uint32 wait_event; > + List *standby_slots = NIL; > static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr; > > + if (MyReplicationSlot->data.failover && replication_active) > + standby_slots = GetStandbySlotList(true); > + > /* > - * Fast path to avoid acquiring the spinlock in case we already know we > - * have enough WAL available. This is particularly interesting if we're > - * far behind. > + * Check if all the standby servers have confirmed receipt of WAL up to > + * RecentFlushPtr even when we already know we have enough WAL available. > + * > + * Note that we cannot directly return without checking the status of > + * standby servers because the standby_slot_names may have changed, > + which > + * means there could be new standby slots in the list that have not yet > + * caught up to the RecentFlushPtr. > */ > - if (RecentFlushPtr != InvalidXLogRecPtr && > - loc <= RecentFlushPtr) > - return RecentFlushPtr; > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr) { > + FilterStandbySlots(RecentFlushPtr, &standby_slots); > > I think even if the slot list is not changed, we will always process each slot > mentioned in standby_slot_names once. Can't we cache the previous list of > slots for we have already waited for? In that case, we won't even need to copy > the list via GetStandbySlotList() unless we need to wait. > > 2. > WalSndWaitForWal(XLogRecPtr loc) > { > + /* > + * Update the standby slots that have not yet caught up to the flushed > + * position. 
It is good to wait up to RecentFlushPtr and then let it > + * send the changes to logical subscribers one by one which are > + * already covered in RecentFlushPtr without needing to wait on every > + * change for standby confirmation. > + */ > + if (wait_for_standby) > + FilterStandbySlots(RecentFlushPtr, &standby_slots); > + > /* Update our idea of the currently flushed position. */ > - if (!RecoveryInProgress()) > + else if (!RecoveryInProgress()) > RecentFlushPtr = GetFlushRecPtr(NULL); > else > RecentFlushPtr = GetXLogReplayRecPtr(NULL); ... > /* > * If postmaster asked us to stop, don't wait anymore. > * > * It's important to do this check after the recomputation of > * RecentFlushPtr, so we can send all remaining data before shutting > * down. > */ > if (got_STOPPING) > break; > > I think because 'wait_for_standby' may not be set in the first or consecutive > cycles we may send the WAL to the logical subscriber before sending it to the > physical subscriber during shutdown. Here is the v101 patch set which addressed the above comments. This version caches the oldest standby slot's LSN each time we wait for the standbys to catch up. The cached LSN is invalidated when we reload the GUC config. In the WalSndWaitForWal function, instead of traversing the entire standby list each time, we can check the cached LSN to quickly determine if the standbys have caught up. When a shutdown signal is received, we continue to wait for the standby slots to catch up. When waiting for the standbys to catch up after receiving the shutdown signal, an ERROR is reported if any slots are dropped, invalidated, or inactive. This measure is taken to prevent the walsender from waiting indefinitely. Best Regards, Hou zj v101-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v101-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
On Wednesday, February 28, 2024 2:38 PM Bertrand Drouvot wrote: > On Wed, Feb 28, 2024 at 08:49:19AM +0530, Amit Kapila wrote: > > On Mon, Feb 26, 2024 at 9:13 AM shveta malik > wrote: > > > > > > On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot > > > wrote: > > > > > > > > Hi, > > > > > I think to set secure search path for remote connection, the > > > > > standard approach could be to extend the code in > > > > > libpqrcv_connect[1], so that we don't need to schema qualify all the > operators in the queries. > > > > > > > > > > And for local connection, I agree it's also needed to add a > > > > > SetConfigOption("search_path", "" call in the slotsync worker. > > > > > > > > > > [1] > > > > > libpqrcv_connect > > > > > ... > > > > > if (logical) > > > > > ... > > > > > res = libpqrcv_PQexec(conn->streamConn, > > > > > > > > > > ALWAYS_SECURE_SEARCH_PATH_SQL); > > > > > > > > > > > > > Agree, something like in the attached? (it's .txt to not disturb the CF > > > > bot). > > > > > > Thanks for the patch, changes look good. I have corporated it in the > > > patch which addresses the rest of your comments in [1]. I have > > > attached the patch as .txt > > > > > > > Few comments: > > === > > 1. > > - if (logical) > > + if (logical || !replication) > > { > > > > Can we add a comment about connection types that require > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > Yeah, will do. > > > > > 2. > > Can we add a test case to demonstrate that the '=' operator can be > > hijacked to do different things when the slotsync worker didn't use > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > I don't think that's good to create a test to show how to hijack an operator > within a background worker. > > I had a quick look and did not find existing tests in this area around > ALWAYS_SECURE_SEARCH_PATH_SQL / search_patch and background worker. I think a similar commit 11da970 has added a test for the search_path, e.g. 
# Create some preexisting content on publisher $node_publisher->safe_psql( 'postgres', "CREATE FUNCTION public.pg_get_replica_identity_index(int) RETURNS regclass LANGUAGE sql AS 'SELECT 1/0'");# shall not call Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, February 27, 2024 3:18 PM Peter Smith wrote: > > Here are some review comments for v99-0001 Thanks for the comments! > Commit Message > > 1. > A new parameter named standby_slot_names is introduced. > > Maybe quote the GUC names here to make it more readable. Added. > > ~~ > > 2. > Additionally, The SQL functions pg_logical_slot_get_changes and > pg_replication_slot_advance are modified to wait for the replication slots > mentioned in standby_slot_names to catch up before returning the changes to > the user. > > ~ > > 2a. > "pg_replication_slot_advance" is a typo? Did you mean > pg_logical_replication_slot_advance? pg_logical_replication_slot_advance is not a user visible function. So the pg_replication_slot_advance is correct. > > ~ > > 2b. > The "before returning the changes to the user" seems like it is referring > only to > the first function. > > Maybe needs slight rewording like: > /before returning the changes to the user./ before returning./ Changed. > > == > doc/src/sgml/config.sgml > > 3. standby_slot_names > > + > +List of physical slots guarantees that logical replication slots with > +failover enabled do not consume changes until those changes > are received > +and flushed to corresponding physical standbys. If a logical > replication > +connection is meant to switch to a physical standby after the > standby is > +promoted, the physical replication slot for the standby > should be listed > +here. Note that logical replication will not proceed if the slots > +specified in the standby_slot_names do not exist or are invalidated. > + > > The wording doesn't seem right. IMO this should be worded much like how this > GUC is described in guc_tables.c > > e.g something a bit like: > > Lists the streaming replication standby server slot names that logical WAL > sender processes will wait for. Logical WAL sender processes will send > decoded changes to plugins only after the specified replication slots confirm > receiving WAL. 
This guarantees that logical replication slots with failover > enabled do not consume changes until those changes are received and flushed > to corresponding physical standbys... Changed. > > == > doc/src/sgml/logicaldecoding.sgml > > 4. Section 48.2.3 Replication Slot Synchronization > > + It's also highly recommended that the said physical replication slot > + is named in > + linkend="guc-standby-slot-names">standby_slot_names me> > + list on the primary, to prevent the subscriber from consuming changes > + faster than the hot standby. But once we configure it, then > certain latency > + is expected in sending changes to logical subscribers due to wait on > + physical replication slots in > + + > linkend="guc-standby-slot-names">standby_slot_names me> > + > > 4a. > /It's also highly/It is highly/ > > ~ > > 4b. > > BEFORE > But once we configure it, then certain latency is expected in sending changes > to logical subscribers due to wait on physical replication slots in linkend="guc-standby-slot-names">standby_slot_names me> > > SUGGESTION > Even when correctly configured, some latency is expected when sending > changes to logical subscribers due to the waiting on slots named in > standby_slot_names. Changed. > > == > .../replication/logical/logicalfuncs.c > > 5. pg_logical_slot_get_changes_guts > > + if (XLogRecPtrIsInvalid(upto_lsn)) > + wait_for_wal_lsn = end_of_wal; > + else > + wait_for_wal_lsn = Min(upto_lsn, end_of_wal); > + > + /* > + * Wait for specified streaming replication standby servers (if any) > + * to confirm receipt of WAL up to wait_for_wal_lsn. > + */ > + WaitForStandbyConfirmation(wait_for_wal_lsn); > > Perhaps those statements all belong together with the comment up-front. e.g. > > + /* > + * Wait for specified streaming replication standby servers (if any) > + * to confirm receipt of WAL up to wait_for_wal_lsn. 
> + */ > + if (XLogRecPtrIsInvalid(upto_lsn)) > + wait_for_wal_lsn = end_of_wal; > + else > + wait_for_wal_lsn = Min(upto_lsn, end_of_wal); > + WaitForStandbyConfirmation(wait_for_wal_lsn); Changed. > > == > src/backend/replication/logical/slotsync.c > > == > src/backend/replication/slot.c > > 6. > +static bool > +validate_standby_slots(char **newval) > +{ > + char *rawname; > + List *elemlist; > + ListCell *lc; > + bool ok; > + > + /* Need a modifiable copy of string */ rawname = pstrdup(*newval); > + > + /* Verify syntax and parse string into a list of identifiers */ ok = > + SplitIdentifierString(rawname, ',', &elemlist); > + > + if (!ok) > + { > + GUC_check_errdetail("List syntax is invalid."); } > + > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > + else if (ReplicationSlotCtl) > + { > + foreach(lc, elemlist) > > 6a. > So, if the ReplicationSlotCtl is
RE: Synchronizing slots from primary to standby
On Monday, February 26, 2024 1:19 PM Amit Kapila wrote: > > On Fri, Feb 23, 2024 at 4:45 PM Bertrand Drouvot > wrote: > > > > On Fri, Feb 23, 2024 at 09:46:00AM +, Zhijie Hou (Fujitsu) wrote: > > > > > > Besides, I'd like to clarify and discuss the behavior of > > > standby_slot_names > once. > > > > > > As it stands in the patch, If the slots specified in > > > standby_slot_names are dropped or invalidated, the logical walsender > > > will issue a WARNING and continue to replicate the changes. Another > > > option for this could be to have the walsender pause until the slot > > > in standby_slot_names is re-created or becomes valid again. Does anyone > else have an opinion on this matter ? > > > > Good point, I'd vote for: the only reasons not to wait are: > > > > - slots mentioned in standby_slot_names exist and valid and do catch > > up or > > - standby_slot_names is empty > > > > The reason is that setting standby_slot_names to a non empty value > > means that one wants the walsender to wait until the standby catchup. > > The way to remove this intentional behavior should be by changing the > > standby_slot_names value (not the existence or the state of the slot(s) it > points too). > > > > It seems we already do wait for the case when there is an inactive slot as > per the > below code [1] in the patch. So, probably waiting in other cases is also okay > and > also as this parameter is a SIGHUP parameter, users should be easily able to > change its value if required. Do you think it is a good idea to mention this > in > docs as well? > > I think it is important to raise WARNING as the patch is doing in all the > cases > where the slot is not being processed so that users can be notified and they > can > take the required action. Agreed. Here is the V99 patch which addressed the above. This version also includes: 1. list_free the slot list when reloading the list due to GUC change. 2. Refactored the validate_standby_slots based on Shveta's suggestion. 3. 
Added an errcode for the warnings, as most of the existing ones have errcodes. Amit's latest comments[1] are pending; we will address them in the next version. [1] https://www.postgresql.org/message-id/CAA4eK1LJdmGATWG%3DxOD1CB9cogukk2cLNBGH8h-n-ZDJuwBdJg%40mail.gmail.com Best Regards, Hou zj v99-0002-Document-the-steps-to-check-if-the-standby-is-re.patch v99-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 6:12 PM Bertrand Drouvot wrote: > > Here are some random comments: Thanks for the comments! > > 1 === > > Commit message "Allow logical walsenders to wait for the physical" > > s/physical/physical standby/? > > 2 == > > +++ b/src/backend/replication/logical/logicalfuncs.c > @@ -30,6 +30,7 @@ > #include "replication/decode.h" > #include "replication/logical.h" > #include "replication/message.h" > +#include "replication/walsender.h" > > Is this include needed? Removed. > > 3 === > > +* Slot sync is currently not supported on the cascading > + standby. This is > > s/on the/on a/? Changed. > > 4 === > > + if (!ok) > + GUC_check_errdetail("List syntax is invalid."); > + > + /* > +* If there is a syntax error in the name or if the replication slots' > +* data is not initialized yet (i.e., we are in the startup process), > skip > +* the slot verification. > +*/ > + if (!ok || !ReplicationSlotCtl) > + { > + pfree(rawname); > + list_free(elemlist); > + return ok; > + } > > we are testing the "ok" value twice, what about using if...else if... instead > and > test it once? If so, it might be worth to put the: > > " > + pfree(rawname); > + list_free(elemlist); > + return ok; > " > > in a "goto". There were comments to remove the 'goto' statement and avoid duplicate free code, so I prefer the current style. > > 5 === > > +* for which all standbys to wait for. Even if we have > + physical-slots > > s/physical-slots/physical slots/? Changed. > > 6 === > > * Switch to the same memory context under which GUC variables are > > s/to the same memory/to the memory/? Changed. > > 7 === > > + * Return a copy of standby_slot_names_list if the copy flag is set to > + true, > > Not sure, but would it be worth explaining why one would want to set to flag > to > true or false? (i.e why one would not want to receive the original list). 
I think the usage can be found in the caller's code, e.g. we need to remove the slots that have caught up from the list each time, so we cannot directly modify the global list. The GetStandbySlotList function is a general function and I feel we can avoid adding more comments here. > > 8 === > > + if (RecoveryInProgress()) > + return NIL; > > The need is well documented just above, but are we not violating the fact that > we return the original list or a copy of it? (that's what the comment above > the > GetStandbySlotList() function definition is saying). > > I think the comment above the GetStandbySlotList() function needs a bit of > rewording to cover that case. Adjusted. > > 9 === > > +* harmless, a WARNING should be enough, no need to > error-out. > > s/error-out/error out/? Changed. > > 10 === > > + if (slot->data.invalidated != RS_INVAL_NONE) > + { > + /* > +* Specified physical slot have been > invalidated, > so no point > +* in waiting for it. > > We discovered in [1], that if the wal_status is "unreserved" then the slot is > still > serving the standby. I think we should handle this case differently, thoughts? I think the fact that an 'invalidated' slot can still be used is a separate bug. Because once the slot is invalidated, it can protect neither WALs nor ROWs from being removed, even if the restart_lsn of the slot can be moved forward after being invalidated. If the standby can move restart_lsn forward for invalidated slots, then it should also set the 'invalidated' flag back to NONE, otherwise the slot cannot serve its purpose anymore. I also reported a similar bug before[1]. > > 11 === > > +* Specified physical slot have been > + invalidated, so no point > > s/have been/has been/? Changed. > > 12 === > > +++ b/src/backend/replication/slotfuncs.c > @@ -22,6 +22,7 @@ > #include "replication/logical.h" > #include "replication/slot.h" > #include "replication/slotsync.h" > +#include "replication/walsender.h" > > Is this include needed? No, it's not needed. Removed.
Attached is the V98 patch set, which addressed the above comments. I also adjusted a few comments based on off-list feedback from Shveta. The discussion on the wait behavior is ongoing, so I didn't change that behavior in this version. [1] https://www.postgresql.org/message-id/flat/os0pr01mb5716a626a4af5814e057cee394...@os0pr01mb5716.jpnprd01.prod.outlook.com Best Regards, Hou zj v98-0002-Document-the-steps-to-check-if-the-standby-is-re.patch v98-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 1:22 PM shveta malik wrote: > > Thanks for the patches. Had a quick look at v95_2, here are some > trivial comments: Thanks for the comments. > 6) streaming replication standby server slot names that logical walsender > processes will wait for > > Is it better to say it like this? (I leave this to your preference) > > streaming replication standby server slot names for which logical > walsender processes will wait. I feel the current one reads better, so I didn't change it. The other comments have been addressed. Here is the V97 patch set which addressed Shveta's comments. Besides, I'd like to clarify and discuss the behavior of standby_slot_names once more. As it stands in the patch, if the slots specified in standby_slot_names are dropped or invalidated, the logical walsender will issue a WARNING and continue to replicate the changes. Another option could be to have the walsender pause until the slot in standby_slot_names is re-created or becomes valid again. Does anyone else have an opinion on this matter? Best Regards, Hou zj v97-0002-Document-the-steps-to-check-if-the-standby-is-re.patch v97-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 5:07 PM Bertrand Drouvot wrote: > > On Fri, Feb 23, 2024 at 02:15:11PM +0530, shveta malik wrote: > > On Fri, Feb 23, 2024 at 1:28 PM Bertrand Drouvot > > wrote: > > > > > > Hi, > > > > > > Because one could create say the "=" OPERATOR in their own schema, > > > attach a function to it doing undesired stuff and change the > > > search_path for the database the sync slot worker connects to. > > > > > > Then this new "=" operator would be used (instead of the > > > pg_catalog.= one), triggering the "undesired" function as superuser. > > > > Thanks for the details. I understand it now. We do not use '=' in our > > main slots-fetch query but we do use '=' in remote-validation query. > > See validate_remote_info(). > > Oh, right, I missed it during the review. > > > Do you think instead of doing the above, we can override search-path > > with empty string in the slot-sync case. > > SImilar to logical apply worker and autovacuum worker case (see > > InitializeLogRepWorker(), AutoVacWorkerMain()). > > Yeah, we should definitively ensure that any operators being used in the query > is coming from the pg_catalog schema (could be by setting the search path or > using the up-thread proposal). > > Setting the search path would prevent any risks in case the query is changed > later on, so I'd vote for changing the search path in validate_remote_info() > and > in synchronize_slots() to be on the safe side. I think to set secure search path for remote connection, the standard approach could be to extend the code in libpqrcv_connect[1], so that we don't need to schema qualify all the operators in the queries. And for local connection, I agree it's also needed to add a SetConfigOption("search_path", "" call in the slotsync worker. [1] libpqrcv_connect ... if (logical) ... res = libpqrcv_PQexec(conn->streamConn, ALWAYS_SECURE_SEARCH_PATH_SQL); Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 10:18 AM Zhijie Hou (Fujitsu) wrote: > > > > Hi, > > > > Since the slotsync worker patch has been committed, I rebased the > > remaining patches. > > And here is the V95 patch set. > > > > Also, I fixed a bug in the current 0001 patch where the member of the > > standby slot names list pointed to the freed memory after calling > ProcessConfigFile(). > > Now, we will obtain a new list when we call ProcessConfigFile(). The > > optimization to only get the new list when the names actually change > > has been removed. I think this change is acceptable because > > ProcessConfigFile is not a frequent occurrence. > > > > Additionally, I reordered the tests in > > 040_standby_failover_slots_sync.pl. Now the new test will be conducted > > after the sync slot test to prevent the risk of the logical slot > > occasionally not catching up to the latest catalog_xmin and, as a result, > > not > being able to be synced immediately. > > There is one unexpected change in the previous version, sorry for that. > Here is the correct version. I noticed one CFbot failure[1] which is because the tap-test doesn't wait for the standby to catch up before promoting, thus the data inserted after promotion could not be replicated to the subscriber. Added a wait_for_replay_catchup call to fix it. Apart from this, I also adjusted some variable names in the tap-test to be consistent, and added back a mistakenly removed ProcessConfigFile call. [1] https://cirrus-ci.com/task/6126787437002752?logs=check_world#L312 Best Regards, Hou zj v96-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch v96-0002-Document-the-steps-to-check-if-the-standby-is-re.patch
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 10:02 AM Zhijie Hou (Fujitsu) wrote: > > Hi, > > Since the slotsync worker patch has been committed, I rebased the remaining > patches. > And here is the V95 patch set. > > Also, I fixed a bug in the current 0001 patch where the member of the standby > slot names list pointed to the freed memory after calling ProcessConfigFile(). > Now, we will obtain a new list when we call ProcessConfigFile(). The > optimization to only get the new list when the names actually change has been > removed. I think this change is acceptable because ProcessConfigFile is not a > frequent occurrence. > > Additionally, I reordered the tests in 040_standby_failover_slots_sync.pl. Now > the new test will be conducted after the sync slot test to prevent the risk > of the > logical slot occasionally not catching up to the latest catalog_xmin and, as a > result, not being able to be synced immediately. There is one unexpected change in the previous version, sorry for that. Here is the correct version. Best Regards, Hou zj v95_2-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v95_2-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
Hi, Since the slotsync worker patch has been committed, I rebased the remaining patches. And here is the V95 patch set. Also, I fixed a bug in the current 0001 patch where the member of the standby slot names list pointed to the freed memory after calling ProcessConfigFile(). Now, we will obtain a new list when we call ProcessConfigFile(). The optimization to only get the new list when the names actually change has been removed. I think this change is acceptable because ProcessConfigFile is not a frequent occurrence. Additionally, I reordered the tests in 040_standby_failover_slots_sync.pl. Now the new test will be conducted after the sync slot test to prevent the risk of the logical slot occasionally not catching up to the latest catalog_xmin and, as a result, not being able to be synced immediately. Best Regards, Hou zj v95-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v95-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 22, 2024 8:41 PM Amit Kapila wrote: > > On Thu, Feb 22, 2024 at 5:23 PM Amit Kapila > wrote: > > > > On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot > > wrote: > > > > > > On Thu, Feb 22, 2024 at 04:01:34PM +0530, shveta malik wrote: > > > > On Thu, Feb 22, 2024 at 3:44 PM Bertrand Drouvot > > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > Thanks! > > > > > > > > > > Some random comments about v92_001 (Sorry if it has already been > > > > > discussed > > > > > up-thread): > > > > > > > > Thanks for the feedback. The patch is pushed 15 minutes back. > > > > > > Yeah, saw that after I send the comments ;-) > > > > > > > There is a BF failure. See > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-0 2-22%2010%3A13%3A03. > > > > The initial analysis suggests that for some reason, the primary went > > down after the slot sync worker was invoked the first time. See the > > below in the primary's LOG: > > > > The reason is that the test failed waiting on below LOG: > > ### Reloading node "standby1" > # Running: pg_ctl -D > /home/ec2-user/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_ > 040_standby_failover_slots_sync_standby1_data/pgdata > reload > server signaled > timed out waiting for match: (?^:LOG: slot sync worker started) at > t/040_standby_failover_slots_sync.pl line 376. > > Now, on standby, we see a LOG like 2024-02-22 10:57:35.432 UTC [2721638:1] > LOG: 0: slot sync worker started. Even then the test failed, and the reason is > that there is an extra error code before the actual message, which is due to > log_error_verbosity = verbose in the config. I think here the test's log matching > code needs to be more robust. Agreed. Here is a small patch to change the msg in wait_for_log so that it only searches for the message part. Best Regards, Hou zj 0001-Make-recovery-test-pass-with-log_error_verbosity-ver.patch
RE: Synchronizing slots from primary to standby
On Monday, February 19, 2024 11:39 AM shveta malik wrote: > > On Sun, Feb 18, 2024 at 7:40 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, February 16, 2024 6:41 PM shveta malik > wrote: > > Thanks for the patch. Here are few comments: > > Thanks for the comments. > > > > > 2. > > > > +static bool > > +validate_remote_info(WalReceiverConn *wrconn, int elevel) > > ... > > + > > + return (!remote_in_recovery && primary_slot_valid); > > > > The primary_slot_valid could be uninitialized in this function. > > return (!remote_in_recovery && primary_slot_valid); > > Here if remote_in_recovery is true, it will not even read primary_slot_valid. > It > will read primary_slot_valid only if remote_in_recovery is false and in such a > case primary_slot_valid will always be initialized in the else block above, > let me > know if you still feel we shall initialize this to some default? I understand that it will not be used, but some compilers could report a WARNING for the uninitialized variable. The cfbot[1] seems to complain about this as well. [1] https://cirrus-ci.com/task/5416851522453504 Best Regards, Hou zj
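The diagnostic in question is the classic "may be used uninitialized" warning: even though the `&&` short-circuits before the variable is read on the early path, the compiler cannot always prove that. A minimal sketch of the pattern and the conventional fix (this is a simplified stand-in, not the actual validate_remote_info()):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for validate_remote_info(): primary_slot_valid is
 * only assigned on the !remote_in_recovery path.  The final expression
 * never reads it otherwise (short-circuit evaluation), but some
 * compilers still emit -Wmaybe-uninitialized; initializing it to false
 * silences the warning without changing behavior. */
static bool
validate_remote_info_sketch(bool remote_in_recovery, bool slot_found)
{
	bool primary_slot_valid = false;	/* explicit default for the warning */

	if (!remote_in_recovery)
		primary_slot_valid = slot_found;

	return (!remote_in_recovery && primary_slot_valid);
}
```

Initializing to false is safe here precisely because the value is only consulted on the path that also assigns it.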
RE: Synchronizing slots from primary to standby
On Friday, February 16, 2024 6:41 PM shveta malik wrote: > > On Thu, Feb 15, 2024 at 10:48 PM Bertrand Drouvot > wrote: > > > > Looking at v88_0001, random comments: > > Thanks for the feedback. > > > > > 1 === > > > > Commit message "Be enabling slot synchronization" > > > > Typo? s:Be/By > > Modified. > > > 2 === > > > > +It enables a physical standby to synchronize logical failover slots > > +from the primary server so that logical subscribers are not blocked > > +after failover. > > > > Not sure "not blocked" is the right wording. > > "can be resumed from the new primary" maybe? (was discussed in [1]) > > Modified. > > > 3 === > > > > +#define SlotSyncWorkerAllowed()\ > > + (sync_replication_slots && pmState == PM_HOT_STANDBY && \ > > +SlotSyncWorkerCanRestart()) > > > > Maybe add a comment above the macro explaining the logic? > > Done. > > > 4 === > > > > +#include "replication/walreceiver.h" > > #include "replication/slotsync.h" > > > > should be reverse order? > > Removed walreceiver.h inclusion as it was not needed. > > > 5 === > > > > + if (SlotSyncWorker->syncing) > > { > > - SpinLockRelease(>mutex); > > + SpinLockRelease(>mutex); > > ereport(ERROR, > > > errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("cannot synchronize replication > slots concurrently")); > > } > > > > worth to add a test in 040_standby_failover_slots_sync.pl for it? > > It will be very difficult to stabilize this test as we have to make sure that > the > concurrent users (SQL function(s) and/or worker(s)) are in that target > function at > the same time to hit it. > > > > > 6 === > > > > +static void > > +slotsync_reread_config(bool restart) > > +{ > > > > worth to add test(s) in 040_standby_failover_slots_sync.pl for it? > > Added test. > > Please find v89 patch set. The other changes are: Thanks for the patch. Here are few comments: 1. 
+static char *
+get_dbname_from_conninfo(const char *conninfo)
+{
+	static char *dbname;
+
+	if (dbname)
+		return dbname;
+	else
+		dbname = walrcv_get_dbname_from_conninfo(conninfo);
+
+	return dbname;
+}

I think it's not necessary to have a static variable here, because the GUC check doesn't seem performance sensitive. Additionally, it does not work well with the slotsync SQL functions, because the dbname will be allocated in a temporary memory context when reaching here via a SQL function, so it's not safe to access this static variable in the next function call.

2.

+static bool
+validate_remote_info(WalReceiverConn *wrconn, int elevel)
...
+
+	return (!remote_in_recovery && primary_slot_valid);

The primary_slot_valid could be uninitialized in this function. Best Regards, Hou zj
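Hou's objection to the static cache can be demonstrated in miniature. The sketch below is hypothetical (it is not PostgreSQL code): a reused scratch buffer stands in for a memory context that is reset between calls, so the cached static pointer ends up pointing at recycled memory:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for walrcv_get_dbname_from_conninfo(): it parses
 * "dbname=<x>" out of a conninfo string into a scratch buffer that is
 * reused on every call, the way a short-lived memory context would be. */
static char scratch[64];

static char *
get_dbname_fresh(const char *conninfo)
{
	const char *p = strstr(conninfo, "dbname=");

	strncpy(scratch, p ? p + strlen("dbname=") : "", sizeof(scratch) - 1);
	scratch[sizeof(scratch) - 1] = '\0';
	return scratch;
}

/* The pattern under review: cache the first answer in a static pointer.
 * After the underlying storage is reused, the cache silently returns
 * whatever now occupies that memory. */
static char *
get_dbname_cached(const char *conninfo)
{
	static char *dbname;

	if (dbname)
		return dbname;			/* may point at recycled storage */
	dbname = get_dbname_fresh(conninfo);
	return dbname;
}
```

The safe alternative — which is what the review suggests — is simply to call the underlying function every time, since the GUC check is not performance sensitive.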
RE: Synchronizing slots from primary to standby
On Friday, February 16, 2024 3:43 PM Amit Kapila wrote: > > On Fri, Feb 16, 2024 at 11:43 AM Amit Kapila wrote: > > > > Thanks for noticing this. I have pushed all your debug patches. Let's > > hope if there is a BF failure next time, we can gather enough > > information to know the reason of the same. > > > > There is a new BF failure [1] after adding these LOGs and I think I know what > is > going wrong. First, let's look at standby LOGs: > > 2024-02-16 06:18:18.442 UTC [241414][client backend][2/14:0] DEBUG: > segno: 4 of purposed restart_lsn for the synced slot, oldest_segno: 4 > available > 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] DEBUG: > xmin required by slots: data 0, catalog 741 > 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] LOG: could > not sync slot information as remote slot precedes local slot: remote slot > "lsub1_slot": > LSN (0/4000168), catalog xmin (739) local slot: LSN (0/4000168), catalog xmin > (741) > > So, from the above LOG, it is clear that the remote slot's catalog xmin (739) > precedes the local catalog xmin (741), which prevents the sync on the standby > from completing. > > Next, let's look at the LOG from the primary during the nearby time: > 2024-02-16 06:18:11.354 UTC [238037][autovacuum worker][5/17:0] DEBUG: > analyzing "pg_catalog.pg_depend" > 2024-02-16 06:18:11.360 UTC [238037][autovacuum worker][5/17:0] DEBUG: > "pg_depend": scanned 13 of 13 pages, containing 1709 live rows and 0 dead > rows; 1709 rows in sample, 1709 estimated total rows ...
> 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/0:0] DEBUG: > Autovacuum VacuumUpdateCosts(db=1, rel=14050, dobalance=yes, > cost_limit=200, cost_delay=2 active=yes failsafe=no) > 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/19:0] DEBUG: > analyzing "information_schema.sql_features" > 2024-02-16 06:18:11.377 UTC [238037][autovacuum worker][5/19:0] DEBUG: > "sql_features": scanned 8 of 8 pages, containing 756 live rows and 0 dead > rows; > 756 rows in sample, 756 estimated total rows > > It shows us that the autovacuum worker has analyzed a catalog table and, to update > its statistics in the pg_statistic table, it would have acquired a new > transaction id. Now, > after the slot creation, a new transaction id that has updated the catalog is > generated on the primary and would have been replicated to the standby. Due to this, > the catalog_xmin of the primary's slot would precede the standby's catalog_xmin and we see > this failure. > > As per this theory, we should disable autovacuum on the primary to avoid updates > to > catalog_xmin values. > Agreed. Here is the patch to disable autovacuum in the test. Best Regards, Hou zj 0001-Disable-autovacuum-on-primary-to-stabilize-the-040_s.patch Description: 0001-Disable-autovacuum-on-primary-to-stabilize-the-040_s.patch
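The conclusion that catalog xmin 739 "precedes" 741 rests on PostgreSQL's wraparound-aware XID ordering rather than a plain `<`. A simplified version of that rule (mirroring the core idea of TransactionIdPrecedes() for normal XIDs; the special-casing of permanent XIDs is omitted here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified wraparound-aware comparison: transaction IDs live on a
 * modulo-2^32 circle, so "id1 precedes id2" is decided by the sign of
 * the 32-bit difference, not by ordinary unsigned less-than. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t diff = (int32_t) (id1 - id2);

	return diff < 0;
}
```

With this ordering, the sync logic refusing to move the local slot's catalog_xmin (741) backwards to the remote's (739) is exactly the behavior the new LOG message reports.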
RE: Synchronizing slots from primary to standby
On Friday, February 16, 2024 8:33 AM Zhijie Hou (Fujitsu) wrote: > On Thursday, February 15, 2024 8:29 PM Amit Kapila > wrote: > > > > > On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot > > wrote: > > > > > > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote: > > > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > Attach the v2 patch here. > > > > > > > > > > Apart from the new log message. I think we can add one more > > > > > debug message in reserve_wal_for_local_slot, this could be > > > > > useful to analyze the > > failure. > > > > > > > > Yeah, that can also be helpful, but the added message looks naive to me. > > > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, > > > > + segno); > > > > > > > > Instead of the above, how about something like: "segno: %ld of > > > > purposed restart_lsn for the synced slot, oldest_segno: %ld > > > > available"? > > > > > > Looks good to me. I'm not sure if it would make more sense to elog > > > only if segno < oldest_segno means just before the > > XLogSegNoOffsetToRecPtr() call? > > > > > > But I'm fine with the proposed location too. > > > > > > > I am also fine either way but the current location gives required > > information in more number of cases and could be helpful in debugging this > new facility. > > > > > > > > > > > And we > > > > > can also enable the DEBUG log in the 040 tap-test, I see we have > > > > > similar setting in 010_logical_decoding_timline and logging > > > > > debug1 message doesn't increase noticable time on my machine. > > > > > These are done > > in 0002. > > > > > > > > > > > > > I haven't tested it but I think this can help in debugging BF > > > > failures, if any. I am not sure if to keep it always like that but > > > > till the time these tests are stabilized, this sounds like a good > > > > idea. So, how, about just making test changes as a separate patch > > > > so that later if required we can revert/remove it easily? 
> > > > Bertrand, do you have any thoughts on this? > > > > > > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I > > > +think we > > > took the same approach for 035_standby_logical_decoding.pl IIRC) and > > > then revert it back. > > > > > > > Good to know! > > > > > Also I was thinking: what about adding an output to > > pg_sync_replication_slots()? > > > The output could be the number of sync slots that have been created > > > and are not considered as sync-ready during the execution. > > > > > > > Yeah, we can consider outputting some information via this function > > like how many slots are synced and persisted but not sure what would > > be appropriate here. Because one can anyway find that or more > > information by querying pg_replication_slots. I think we can keep > > discussing what makes more sense as a return value but let's commit > > the debug/log patches as they will be helpful to analyze BF failures or any > other issues reported. > > Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it > could > provide more information about communication between primary and standby. > This also doesn't increase noticeable testing time on my machine for debug > build. Sorry, there was a miss in the DEBUG message, I should have used UINT64_FORMAT for XLogSegNo(uint64) instead of %ld. Here is a small patch to fix this. Best Regards, Hou zj 0001-fix-the-format-for-uint64.patch Description: 0001-fix-the-format-for-uint64.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 15, 2024 8:29 PM Amit Kapila wrote: > > On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot > wrote: > > > > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote: > > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) > > > wrote: > > > > Attach the v2 patch here. > > > > > > > > Apart from the new log message. I think we can add one more debug > > > > message in reserve_wal_for_local_slot, this could be useful to analyze > > > > the > failure. > > > > > > Yeah, that can also be helpful, but the added message looks naive to me. > > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, segno); > > > > > > Instead of the above, how about something like: "segno: %ld of > > > purposed restart_lsn for the synced slot, oldest_segno: %ld > > > available"? > > > > Looks good to me. I'm not sure if it would make more sense to elog > > only if segno < oldest_segno means just before the > XLogSegNoOffsetToRecPtr() call? > > > > But I'm fine with the proposed location too. > > > > I am also fine either way but the current location gives required information > in > more number of cases and could be helpful in debugging this new facility. > > > > > > > > And we > > > > can also enable the DEBUG log in the 040 tap-test, I see we have > > > > similar setting in 010_logical_decoding_timline and logging debug1 > > > > message doesn't increase noticable time on my machine. These are done > in 0002. > > > > > > > > > > I haven't tested it but I think this can help in debugging BF > > > failures, if any. I am not sure if to keep it always like that but > > > till the time these tests are stabilized, this sounds like a good > > > idea. So, how, about just making test changes as a separate patch so > > > that later if required we can revert/remove it easily? Bertrand, do > > > you have any thoughts on this? 
> > > > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I > > +think we > > took the same approach for 035_standby_logical_decoding.pl IIRC) and > > then revert it back. > > > > Good to know! > > > Also I was thinking: what about adding an output to > pg_sync_replication_slots()? > > The output could be the number of sync slots that have been created > > and are not considered as sync-ready during the execution. > > > > Yeah, we can consider outputting some information via this function like how > many slots are synced and persisted but not sure what would be appropriate > here. Because one can anyway find that or more information by querying > pg_replication_slots. I think we can keep discussing what makes more sense as > a > return value but let's commit the debug/log patches as they will be helpful to > analyze BF failures or any other issues reported. Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it could provide more information about communication between primary and standby. This also doesn't increase noticeable testing time on my machine for debug build. Best Regards, Hou zj v3-0002-Add-a-DEBUG1-message-for-slot-sync.patch Description: v3-0002-Add-a-DEBUG1-message-for-slot-sync.patch v3-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch Description: v3-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch v3-0003-change-the-log-level-of-sync-slot-test-to-debug2.patch Description: v3-0003-change-the-log-level-of-sync-slot-test-to-debug2.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 15, 2024 5:20 PM Amit Kapila wrote: > On Thu, Feb 15, 2024 at 9:05 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Thursday, February 15, 2024 10:49 AM Amit Kapila > wrote: > > > > > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot > > > > > > Right, we can do that or probably this test would have made more > > > sense with a worker patch where we could wait for the slot to be synced. > > > Anyway, let's try to recreate the slot/subscription idea. BTW, do > > > you think that adding a LOG when we are not able to sync will help > > > in debugging such problems? I think eventually we can change it to > > > DEBUG1 but for now, it can help with stabilizing BF and or some other > reported issues. > > > > Here is the patch that attempts the re-create sub idea. > > > > Pushed this. > > > > I also think that a LOG/DEBUG > > would be useful for such analysis, so the 0002 is to add such a log. > > > > I feel such a LOG would be useful. > > + ereport(LOG, > + errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin" > +" (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)", > > I think waiting is a bit misleading here, how about something like: > "could not sync slot information as remote slot precedes local slot: > remote slot \"%s\": LSN (%X/%X), catalog xmin (%u) local slot: LSN (%X/%X), > catalog xmin (%u)" Changed. Attached is the v2 patch. Apart from the new log message, I think we can add one more debug message in reserve_wal_for_local_slot; this could be useful to analyze the failure. And we can also enable the DEBUG log in the 040 tap-test; I see we have a similar setting in 010_logical_decoding_timelines, and logging the debug1 message doesn't increase noticeable time on my machine. These are done in 0002.
Best Regards, Hou zj v2-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch Description: v2-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch v2-0002-Add-a-debug-message-and-enable-DEBUG-log-in-slots.patch Description: v2-0002-Add-a-debug-message-and-enable-DEBUG-log-in-slots.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 15, 2024 10:49 AM Amit Kapila wrote: > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot > wrote: > > > > On Wed, Feb 14, 2024 at 10:40:11AM +, Zhijie Hou (Fujitsu) wrote: > > > On Wednesday, February 14, 2024 6:05 PM Amit Kapila > wrote: > > > > > > > > To ensure that restart_lsn has been moved to a recent position, we > > > > need to log XLOG_RUNNING_XACTS and make sure the same is processed > > > > as well by walsender. The attached patch does the required change. > > > > > > > > Hou-San can reproduce this problem by adding additional > > > > checkpoints in the test and after applying the attached it fixes > > > > the problem. Now, this patch is mostly based on the theory we > > > > formed based on LOGs on BF and a reproducer by Hou-San, so still, > > > > there is some chance that this doesn't fix the BF failures in which > > > > case I'll > again look into those. > > > > > > I have verified that the patch can fix the issue on my machine(after > > > adding few more checkpoints before slot invalidation test.) I also > > > added one more check in the test to confirm the synced slot is not temp > > > slot. > Here is the v2 patch. > > > > Thanks! > > > > +# To ensure that restart_lsn has moved to a recent WAL position, we > > +need # to log XLOG_RUNNING_XACTS and make sure the same is processed > > +as well $primary->psql('postgres', "CHECKPOINT"); > > > > Instead of "CHECKPOINT" wouldn't a less heavy "SELECT > pg_log_standby_snapshot();" > > be enough? > > > > Yeah, that would be enough. However, the test still fails randomly due to the > same reason. See [1]. So, as mentioned yesterday, now, I feel it is better to > recreate the subscription/slot so that it can get the latest restart_lsn > rather than > relying on pg_log_standby_snapshot() to move it. > > > Not a big deal but maybe we could do the change while modifying > > 040_standby_failover_slots_sync.pl in the next patch "Add a new slotsync > worker". 
> > > > Right, we can do that or probably this test would have made more sense with a > worker patch where we could wait for the slot to be synced. > Anyway, let's try to recreate the slot/subscription idea. BTW, do you think > that > adding a LOG when we are not able to sync will help in debugging such > problems? I think eventually we can change it to DEBUG1 but for now, it can > help > with stabilizing BF and or some other reported issues. Here is the patch that attempts the re-create sub idea. I also think that a LOG/DEBUG would be useful for such analysis, so the 0002 is to add such a log. Best Regards, Hou zj 0002-Add-a-log-if-remote-slot-didn-t-catch-up-to-locally-.patch Description: 0002-Add-a-log-if-remote-slot-didn-t-catch-up-to-locally-.patch 0001-fix-BF-error-take-2.patch Description: 0001-fix-BF-error-take-2.patch
RE: Synchronizing slots from primary to standby
On Wednesday, February 14, 2024 6:05 PM Amit Kapila wrote: > > On Wed, Feb 14, 2024 at 2:14 PM Amit Kapila wrote: > > > > On Wed, Feb 14, 2024 at 9:34 AM Zhijie Hou (Fujitsu) > > wrote: > > > > > > Here is V87 patch that adds test for the suggested cases. > > > > > > > I have pushed this patch and it leads to a BF failure: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris > > t=2024-02-14%2004%3A43%3A37 > > > > The test failures are: > > # Failed test 'logical decoding is not allowed on synced slot' > > # at > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > ailover_slots_sync.pl > > line 272. > > # Failed test 'synced slot on standby cannot be altered' > > # at > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > ailover_slots_sync.pl > > line 281. > > # Failed test 'synced slot on standby cannot be dropped' > > # at > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > ailover_slots_sync.pl > > line 287. > > > > The reason is that in LOGs, we see a different ERROR message than what > > is expected: > > 2024-02-14 04:52:32.916 UTC [1767765][client backend][3/4:0] ERROR: > > replication slot "lsub1_slot" is active for PID 1760871 > > > > Now, we see the slot still active because a test before these tests (# > > Test that if the synchronized slot is invalidated while the remote > > slot is still valid, ) is not able to successfully persist the > > slot and the synced temporary slot remains active. > > > > The reason is clear by referring to below standby LOGS: > > > > LOG: connection authorized: user=bf database=postgres > > application_name=040_standby_failover_slots_sync.pl > > LOG: statement: SELECT pg_sync_replication_slots(); > > LOG: dropped replication slot "lsub1_slot" of dbid 5 > > STATEMENT: SELECT pg_sync_replication_slots(); ... 
> > SELECT conflict_reason IS NULL AND synced FROM pg_replication_slots > > WHERE slot_name = 'lsub1_slot'; > > > > In the above LOGs, we should ideally see: "newly created slot > > "lsub1_slot" is sync-ready now" after the "LOG: dropped replication > > slot "lsub1_slot" of dbid 5" but lack of that means the test didn't > > accomplish what it was supposed to. Ideally, the same test should have > > failed but the pass criteria for the test failed to check whether the > > slot is persisted or not. > > > > The probable reason for failure is that remote_slot's restart_lsn lags > > behind the oldest WAL segment on standby. Now, in the test, we do > > ensure that the publisher and subscriber are caught up by following > > steps: > > # Enable the subscription to let it catch up to the latest wal > > position $subscriber1->safe_psql('postgres', > > "ALTER SUBSCRIPTION regress_mysub1 ENABLE"); > > > > $primary->wait_for_catchup('regress_mysub1'); > > > > However, this doesn't guarantee that restart_lsn is moved to a > > position new enough that standby has a WAL corresponding to it. > > > > To ensure that restart_lsn has been moved to a recent position, we need to log > XLOG_RUNNING_XACTS and make sure the same is processed as well by > walsender. The attached patch does the required change. > > Hou-San can reproduce this problem by adding additional checkpoints in the > test and after applying the attached it fixes the problem. Now, this patch is > mostly based on the theory we formed based on LOGs on BF and a reproducer > by Hou-San, so still, there is some chance that this doesn't fix the BF > failures in > which case I'll again look into those. I have verified that the patch can fix the issue on my machine (after adding a few more checkpoints before the slot invalidation test). I also added one more check in the test to confirm the synced slot is not a temporary slot. Here is the v2 patch.
Best Regards, Hou zj v2-0001-fix-040-standby-failover-slots-sync.patch Description: v2-0001-fix-040-standby-failover-slots-sync.patch
RE: Synchronizing slots from primary to standby
On Wednesday, February 14, 2024 10:40 AM Amit Kapila wrote: > > On Tue, Feb 13, 2024 at 9:25 PM Bertrand Drouvot > wrote: > > > > On Tue, Feb 13, 2024 at 05:20:35PM +0530, Amit Kapila wrote: > > > On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot > > > wrote: > > > > - 84% of the slotsync.c code is covered, the parts that are not > > > > are mainly related to "errors". > > > > > > > > Worth to try to extend the coverage? (I've in mind 731, 739, 766, > > > > 778, 786, 796, > > > > 808) > > > > > > > > > > All these additional line numbers mentioned by you are ERROR paths. > > > I think if we want we can easily cover most of those but I am not > > > sure if there is a benefit to cover each error path. > > > > Yeah, I think 731, 739 and one among the remaining ones mentioned > > up-thread should be enough, thoughts? > > > > I don't know how beneficial those selective ones would be but if I have to > pick a > few among those then I would pick the ones at 731 and 808. The reason is that > 731 is related to cascading standby restriction which we may uplift in the > future > and at that time one needs to be careful about the behavior, for 808 as well, > in > the future, we may have a separate GUC for slot_db_name. These may not be > good enough reasons as to why we add tests for these ERROR cases but not for > others, however, if we have to randomly pick a few among all ERROR paths, > these seem better to me than others. Here is V87 patch that adds test for the suggested cases. Best Regards, Hou zj v87-0001-Add-a-slot-synchronization-function.patch Description: v87-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Tuesday, February 13, 2024 7:30 PM Bertrand Drouvot wrote: > > On Tue, Feb 13, 2024 at 04:08:23AM +, Zhijie Hou (Fujitsu) wrote: > > On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > Here is the new version patch which addressed above and most of > > > Bertrand's comments. > > > > > > TODO: trying to add one test for the case the slot is valid on > > > primary while the synced slots is invalidated on the standby. > > > > Here is the V85_2 patch set that added the test and fixed one typo, > > there are no other code changes. > > Thanks! > > Out of curiosity I ran a code coverage and the result for slotsync.c can be found > in [1]. > > It appears that: > > - only one function is not covered (slotsync_failure_callback()). Thanks for the test! I think slotsync_failure_callback can be covered more easily in the next slotsync worker patch, on worker exit; I will post that after rebasing. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, February 13, 2024 2:51 PM Amit Kapila wrote: > > On Tue, Feb 13, 2024 at 9:38 AM Zhijie Hou (Fujitsu) > wrote: > > > > Here is the V85_2 patch set that added the test and fixed one typo, > > there are no other code changes. > > > > Few comments on the latest changes: Thanks for the comments. > == > 1. > +# Confirm that the invalidated slot has been dropped. > +$standby1->wait_for_log(qr/dropped replication slot "lsub1_slot" of > +dbid 5/, $log_offset); > > Is it okay to hardcode dbid 5? I am a bit worried that it can lead to > instability in > the test. > > 2. > +check_primary_info(WalReceiverConn *wrconn, int elevel) { > .. > + bool primary_info_valid; > > I don't think for 0001, we need an elevel as an argument, so let's remove it. > Additionally, can we change the variable name primary_info_valid to > primary_slot_valid? Also, can we change the function name to > validate_remote_info() as the remote can be both primary or standby? > > 3. > +SyncReplicationSlots(WalReceiverConn *wrconn) { > +PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, > +PointerGetDatum(wrconn)); { check_primary_info(wrconn, ERROR); > + > + synchronize_slots(wrconn); > + } > + PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, > PointerGetDatum(wrconn)); > + > + walrcv_disconnect(wrconn); > > It is better to disconnect in the caller where we have made the connection. All the above comments look good to me. Here is the V86 patch that addresses them. This version also includes some other minor changes: 1. Added a few comments for the temporary slot creation and XLogGetOldestSegno. 2. Adjusted the doc for the SQL function. 3. Reordered two error messages in the slot create function. 4. Fixed a few typos. Thanks to Shveta for the off-list discussions. Best Regards, Hou zj v86-0001-Add-a-slot-synchronization-function.patch Description: v86-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) wrote: > > Here is the new version patch which addressed above and most of Bertrand's > comments. > > TODO: trying to add one test for the case the slot is valid on primary while > the > synced slots is invalidated on the standby. Here is the V85_2 patch set that added the test and fixed one typo, there are no other code changes. Best Regards, Hou zj v85_2-0001-Add-a-slot-synchronization-function.patch Description: v85_2-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Monday, February 12, 2024 5:40 PM Amit Kapila wrote: > > On Sun, Feb 11, 2024 at 6:53 PM Zhijie Hou (Fujitsu) > wrote: > > > > Agreed. Here is the V84 patch which addressed this. > > > > Few comments: > = > 1. Isn't the new function (pg_sync_replication_slots()) allowed to sync the > slots > from physical standby to another cascading standby? > Won't it be better to simply disallow syncing slots on cascading standby to > keep > it consistent with slotsync worker behavior? > > 2. > Previously, I commented to keep the declaration and definition of functions in > the same order but I see that it still doesn't match in the below case: > > @@ -44,6 +46,7 @@ extern void WalSndWakeup(bool physical, bool logical); > extern void WalSndInitStopping(void); extern void WalSndWaitStopping(void); > extern void HandleWalSndInitStopping(void); > +extern XLogRecPtr GetStandbyFlushRecPtr(TimeLineID *tli); > extern void WalSndRqstFileReload(void); > > I think we can keep the new declaration just before WalSndSignals(). > That would be more consistent. > > 3. > + > + True if this is a logical slot that was synced from a primary server. > + > + > + On a hot standby, the slots with the synced column marked as true can > + neither be used for logical decoding nor dropped by the user. > + The value > > I don't think we need a separate para here. > > Apart from this, I have made several cosmetic changes in the attached. > Please include these in the next version unless you see any problems. Thanks for the comments, I have addressed them. Here is the new version patch which addressed above and most of Bertrand's comments. TODO: trying to add one test for the case the slot is valid on primary while the synced slots is invalidated on the standby. Best Regards, Houzj v85-0001-Add-a-slot-synchronization-function.patch Description: v85-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Monday, February 12, 2024 6:03 PM Bertrand Drouvot wrote: > > Hi, > > On Sun, Feb 11, 2024 at 01:23:19PM +0000, Zhijie Hou (Fujitsu) wrote: > > On Saturday, February 10, 2024 9:10 PM Amit Kapila > wrote: > > > > > > On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada > > > > > > wrote: > > > > > > > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > > > > Another alternative is to register the callback when calling > > > > > slotsync functions and unregister it after the function call. > > > > > And register the callback in > > > > > slotsyncworkmain() for the slotsync worker patch, although this > > > > > may adds a few more codes. > > > > > > > > Another idea is that SyncReplicationSlots() calls > > > > synchronize_slots() in PG_ENSURE_ERROR_CLEANUP() block instead of > > > > PG_TRY(), to make sure to clear the flag in case of ERROR or > > > > FATAL. And the slotsync worker uses the before_shmem_callback to clear > the flag. > > > > > > > > > > +1. This sounds like a better way to clear the flag. > > > > Agreed. Here is the V84 patch which addressed this. > > > > Apart from above, I removed the txn start/end codes from 0001 as they > > are used in the slotsync worker patch. And I also ran pgindent and > > pgperltidy for the patch. > > > > Thanks! > > A few random comments: Thanks for the comments. > > 001 === > > " > For > the synchronization to work, it is mandatory to have a physical replication > slot > between the primary and the standby, " > > Maybe mention "primary_slot_name" here? Added. > > 002 === > > + > +Synchronize the logical failover slots from the primary server to the > standby server. > > should we say "logical failover replication slots" instead? Changed. 
> > 003 === > > + If, after executing the function, > + > + hot_standby_feedback is disabled > on > + the standby or the physical slot configured in > + > + primary_slot_name is > + removed, > > I think another option that could lead to slot invalidation is if > primary_slot_name > is NULL or miss-configured. Indeed hot_standby_feedback would be working > (for the catalog_xmin) but only as long as the standby is up and running. I didn't change this based on the discussion. > > 004 === > > + on the standby. For the synchronization to work, it is mandatory to > + have a physical replication slot between the primary and the > + standby, > > should we mention primary_slot_name here? Added. > > 005 === > > + To resume logical replication after failover from the synced logical > + slots, the subscription's 'conninfo' must be altered > > Only in a pub/sub context but not for other ways of using the logical > replication > slot(s). I am not very sure about this, because third-party logical replication consumers can also have their own replication origin, so I didn't change this for now, but will think it over. > > 006 === > > + neither be used for logical decoding nor dropped by the user > > what about "nor dropped manually"? Changed. > > 007 === > > +typedef struct SlotSyncCtxStruct > +{ > > Should we remove "Struct" from the struct name? The name was chosen based on another comment, to be consistent with LogicalReplCtxStruct, so I didn't change it. If others also prefer the name without "Struct", we can change it later. > 008 === > > + ereport(LOG, > + errmsg("dropped replication slot > \"%s\" of dbid %d", > + > NameStr(local_slot->data.name), > + > + local_slot->data.database)); > > We emit a message when an "invalidated" slot is dropped but not when we > create a slot. Shouldn't we emit a message when we create a synced slot on the > standby? > > I think that could be confusing to see "a drop" message not followed by "a > create" > one when it's expected (slot valid on the primary for example).
I think we will report "sync-ready" for a newly synced slot. For newly created temporary slots, I am not sure whether we need to log their creation, because they will be dropped on promotion anyway. But if others also prefer to log it, I am fine with that. > > 009 === > > Regarding 040_standby_failover_slots_sync.pl what about adding tests for? > > - synced slot invalidation (and ensure it's recreated once > pg_sync_replication_slots() is called and when the slot in primary is valid) Will try this in the next version. > - cannot enable failover for a temporary replication slot Added. > - replication slots can only be synchronized from a standby server Added. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Saturday, February 10, 2024 9:10 PM Amit Kapila wrote: > > On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada > wrote: > > > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) > > wrote: > > > > > Another alternative is to register the callback when calling > > > slotsync functions and unregister it after the function call. And > > > register the callback in > > > slotsyncworkmain() for the slotsync worker patch, although this may > > > adds a few more codes. > > > > Another idea is that SyncReplicationSlots() calls synchronize_slots() > > in PG_ENSURE_ERROR_CLEANUP() block instead of PG_TRY(), to make sure > > to clear the flag in case of ERROR or FATAL. And the slotsync worker > > uses the before_shmem_callback to clear the flag. > > > > +1. This sounds like a better way to clear the flag. Agreed. Here is the V84 patch which addressed this. Apart from above, I removed the txn start/end codes from 0001 as they are used in the slotsync worker patch. And I also ran pgindent and pgperltidy for the patch. Best Regards, Hou zj v84-0001-Add-a-slot-synchronization-function.patch Description: v84-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Saturday, February 10, 2024 11:37 AM Zhijie Hou (Fujitsu) wrote: > > Attach the V83 patch which addressed Peter[1][2], Amit and Sawada-san's[3] > comments. Only 0001 is sent in this version, we will send other patches after > rebasing. > > [1] > https://www.postgresql.org/message-id/CAHut%2BPvW8s6AYD2UD0xadM%2B > 3VqBkXP2LjD30LEGRkHUa-Szm%2BQ%40mail.gmail.com > [2] > https://www.postgresql.org/message-id/CAHut%2BPv88vp9mNxX37c_Bc5FDBs > TS%2BdhV02Vgip9Wqwh7GBYSg%40mail.gmail.com > [3] > https://www.postgresql.org/message-id/CAD21AoDvyLu%3D2-mqfGn_T_3jUa > mR34w%2BsxKvYnVzKqTCpyq_FQ%40mail.gmail.com I noticed one cfbot failure where the slot is not synced when the standby is lagging behind the subscriber. I have modified the test to disable the subscription before syncing, to avoid this failure. Attached is the V83_2 patch; no other code changes are included in this version. Best Regards, Hou zj v83_2-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Friday, February 9, 2024 4:13 PM Peter Smith wrote: > > FYI -- I checked patch v81-0001 to find which of the #includes are strictly > needed. Thanks! > > 1. > ... > > Many of these #includes seem unnecessary. e.g. I was able to remove > all those that are commented-out below, and the file still compiles OK > with no warnings: Removed. > > > == > src/backend/replication/slot.c > > > > 2. > #include "pgstat.h" > +#include "replication/slotsync.h" > #include "replication/slot.h" > +#include "replication/walsender.h" > #include "storage/fd.h" > > The #include "replication/walsender.h" seems to be unnecessary. Removed. > > == > src/backend/replication/walsender.c > > 3. > #include "replication/logical.h" > +#include "replication/slotsync.h" > #include "replication/slot.h" > > The #include "replication/slotsync.h" is needed, but only for Assert code: > Assert(am_cascading_walsender || IsSyncingReplicationSlots()); > > So you could #ifdef around that #include if you wish to. I am not sure if it's necessary and didn't find similar coding, so didn't change. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, February 9, 2024 12:27 PM Peter Smith wrote: > > Here are some review comments for patch v81-0001. Thanks for the comments. > . GENERAL - ReplicationSlotInvalidationCause enum. > > I was thinking that the ReplicationSlotInvalidationCause should > explicitly set RS_INVAL_NONE = 0 (it's zero anyway, but making it > explicit with a comment / Must be zero. /. will stop it from being > changed in the future). I think the current code is better, so didn't change this. > 5. synchronize_slots > > + /* > + * The primary_slot_name is not set yet or WALs not received yet. > + * Synchronization is not possible if the walreceiver is not started. > + */ > + latestWalEnd = GetWalRcvLatestWalEnd(); > + SpinLockAcquire(>mutex); > + if ((WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(latestWalEnd)) > + { > + SpinLockRelease(>mutex); > + return; > + } > + SpinLockRelease(>mutex); > > The comment talks about the GUC "primary_slot_name", but the code is > checking the WalRcv's slotname. It may be the same, but the difference > is confusing. This part has been removed. > 7. > + /* > + * Use shared lock to prevent a conflict with > + * ReplicationSlotsDropDBSlots(), trying to drop the same slot during > + * a drop-database operation. > + */ > + LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock); > + > + synchronize_one_slot(remote_slot, remote_dbid); > + > + UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, > + AccessShareLock); > > IMO remove the blank lines (e.g., you don't use this kind of formatting for > spin > locks) I am not sure if it will look better, so didn't change this. Other comments look good. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, February 9, 2024 6:45 PM Amit Kapila wrote: > > On Fri, Feb 9, 2024 at 10:00 AM Zhijie Hou (Fujitsu) > wrote: > > > > Few comments on 0001 > === > 1. Shouldn't pg_sync_replication_slots() check whether the user has > replication > privilege? Yes, added. > > 2. The function declarations in slotsync.h don't seem to be in the same order > as > they are defined in slotsync.c. For example, see ValidateSlotSyncParams(). The > same is true for new functions exposed via walreceiver.h and walsender.h. > Please > check the patch for other such inconsistencies. I reordered the function declarations. > > 3. > +# Wait for the standby to finish sync > +$standby1->wait_for_log( > + qr/LOG: ( [A-Z0-9]+:)? newly created slot \"lsub1_slot\" is sync-ready > +now/, $offset); > + > +$standby1->wait_for_log( > + qr/LOG: ( [A-Z0-9]+:)? newly created slot \"lsub2_slot\" is sync-ready > +now/, $offset); > + > +# Confirm that the logical failover slots are created on the standby > +and are # flagged as 'synced' > +is($standby1->safe_psql('postgres', > + q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN > ('lsub1_slot', 'lsub2_slot') AND synced;}), > + "t", > + 'logical slots have synced as true on standby'); > > Isn't the last test that queried pg_replication_slots sufficient? I think > wait_for_log() would be required for slotsync worker or am I missing > something? I think it's not needed in 0001, so removed. > Apart from the above, I have modified a few comments in the attached. Thanks, it looks good to me, so applied. Attach the V83 patch which addressed Peter[1][2], Amit and Sawada-san's[3] comments. Only 0001 is sent in this version, we will send other patches after rebasing. 
[1] https://www.postgresql.org/message-id/CAHut%2BPvW8s6AYD2UD0xadM%2B3VqBkXP2LjD30LEGRkHUa-Szm%2BQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHut%2BPv88vp9mNxX37c_Bc5FDBsTS%2BdhV02Vgip9Wqwh7GBYSg%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAD21AoDvyLu%3D2-mqfGn_T_3jUamR34w%2BsxKvYnVzKqTCpyq_FQ%40mail.gmail.com Best Regards, Hou zj v83-0001-Add-a-slot-synchronization-function.patch Description: v83-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Friday, February 9, 2024 2:44 PM Masahiko Sawada wrote: > > On Thu, Feb 8, 2024 at 8:01 PM shveta malik wrote: > > > > On Thu, Feb 8, 2024 at 12:08 PM Peter Smith > wrote: > > > > > > Here are some review comments for patch v80_2-0001. > > > > Thanks for the feedback Peter. Addressed the comments in v81. > > Attached patch001 for early feedback. Rest of the patches need > > rebasing and thus will post those later. > > > > It also addresses comments by Amit in [1]. > > Thank you for updating the patch! Here are random comments: Thanks for the comments! > > --- > + > +/* > + * Register the callback function to clean up the shared memory of > slot > + * synchronization. > + */ > +SlotSyncInitialize(); > > I think it would have a wider impact than expected. IIUC this callback is > needed > only for processes who calls synchronize_slots(). Why do we want all processes > to register this callback? I think the current style is similar to the ReplicationSlotInitialize() above it. For backends, both of them are only used when a user calls the slot SQL functions. So, I think it is fine to register it in this general place, which also avoids registering it again for the later slotsync worker patch. Another alternative is to register the callback when calling the slotsync functions and unregister it after the function call, and register the callback in slotsyncworkmain() for the slotsync worker patch, although this may add a bit more code. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Wednesday, February 7, 2024 9:13 AM Masahiko Sawada wrote: > > On Tue, Feb 6, 2024 at 8:21 PM Amit Kapila wrote: > > > > On Tue, Feb 6, 2024 at 3:33 PM Amit Kapila > wrote: > > > > > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada > wrote: > > > > > > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila > wrote: > > > > > > > > > > I think users can refer to LOGs to see if it has changed since > > > > > the first time it was configured. I tried by existing parameter > > > > > and see the following in LOG: > > > > > LOG: received SIGHUP, reloading configuration files > > > > > 2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" > changed to "on" > > > > > > > > > > If the user can't confirm then it is better to follow the steps > > > > > mentioned in the patch. Do you want something else to be written > > > > > in docs for this? If so, what? > > > > > > > > IIUC even if a wrong slot name is specified to standby_slot_names > > > > or even standby_slot_names is empty, the standby server might not > > > > be lagging behind the subscribers depending on the timing. But > > > > when checking it the next time, the standby server might lag > > > > behind the subscribers. So what I wanted to know is how the user > > > > can confirm if a failover-enabled subscription is ensured not to > > > > go in front of failover-candidate standbys (i.e., standbys using > > > > the slots listed in standby_slot_names). > > > > > > > > > > But isn't the same explained by two steps ((a) Firstly, on the > > > subscriber node check the last replayed WAL. (b) Next, on the > > > standby server check that the last-received WAL location is ahead of > > > the replayed WAL location on the subscriber identified above.) in > > > the latest *_0004 patch. > > > > > > > Additionally, I would like to add that the users can use the queries > > mentioned in the doc after the primary has failed and before promoting > > the standby. 
If she wants to do that when both primary and standby are > > available, the value of 'standby_slot_names' on primary should be > > referred. Isn't those two sufficient that there won't be false > > positives? > > From a user perspective, I'd like to confirm the following two points : > > 1. replication slots used by subscribers are synchronized to the standby. > 2. it's guaranteed that logical replication doesn't go ahead of physical > replication to the standby. > > These checks are necessary at least when building a replication setup > (primary, > standby, and subscriber). Otherwise, it's too late if we find out that no > standby > is failover-ready when the primary fails and we're about to do a failover. > > As for the point 1 above, we can use the step 1 described in the doc. > > As for point 2, the step 2 described in the doc could return true even if > standby_slot_names isn't working. For example, standby_slot_names is empty, > the user changed the standby_slot_names but forgot to reload the config file, > and the walsender doesn't reflect the standby_slot_names update yet for some > reason etc. It's possible that standby's last-received WAL location just > happens > to be ahead of the replayed WAL location on the subscriber. So even if the > check query returns true once, it could return false when we check it again, > if > standby_slot_names is not working. On the other hand, IIUC if the point 2 is > ensured, the check query always returns true. I think it would be good if we > could provide a reliable way to check point 2 ideally via SQL queries > (especially > for tools). Based on off-list discussions with Sawada-san and Amit, an alternative approach to improve this would be collecting the names of the standby slots that each walsender has waited for, which will be visible in the pg_stat_replication view. 
By checking this information, users can confirm that the GUC standby_slot_names is correctly configured and that logical replication is not lagging behind the standbys that hold these slots. To achieve this, we can implement the collection of slot information within each logical walsender that has acquired a failover slot, when waiting for the standby to catch up (WalSndWaitForWal). For each valid standby slot that the walsender has waited for, we will store the slot names in a shared memory area specific to each walsender. To optimize performance, we can rebuild the slot names only if the GUC has changed. We can track this by introducing a flag to monitor GUC modifications. When a user queries the pg_stat_replication view, we will retrieve the collected slot names from the shared memory area associated with each walsender. However, before returning the slot names, we can verify their validity once again. If any of the collected slots have been dropped or invalidated during this time, we will exclude them from the result returned to the user. Apart from the above design, I feel that since users currently have a way to detect this manually, as mentioned in the 0004 patch (we can improve the
RE: Synchronizing slots from primary to standby
On Monday, February 5, 2024 10:25 PM Masahiko Sawada wrote: lvh.no-ip.org> > Subject: Re: Synchronizing slots from primary to standby > > On Mon, Feb 5, 2024 at 8:26 PM shveta malik > wrote: > > > > On Mon, Feb 5, 2024 at 10:57 AM Ajin Cherian wrote: > > > > > > Just noticed that doc/src/sgml/config.sgml still refers to enable_synclot > instead of sync_replication_slots: > > > > > > The standbys corresponding to the physical replication slots in > > > standby_slot_names must configure > > > enable_syncslot = true so they can receive > > > failover logical slots changes from the primary. > > > > Thanks Ajin for pointing this out. Here are v78 patches, corrected there. > > > > Other changes are: > > > > 1) Rebased the patches as the v77-001 is now pushed. > > 2) Enabled executing pg_sync_replication_slots() on cascading-standby. > > 3) Rearranged the code around parameter validity checks. Changed > > function names and changed the way how dbname is extracted as > > suggested by Amit offlist. > > 4) Rearranged the code around check_primary_info(). Removed output > args. > > 5) Few other trivial changes. > > > > Thank you for updating the patch! Here are some comments: > > --- > Since Two processes (e.g. the slotsync worker and > pg_sync_replication_slots()) concurrently fetch and update the slot > information, > there is a race condition where slot's confirmed_flush_lsn goes backward. . We > have the following check but it doesn't prevent the slot's confirmed_flush_lsn > from moving backward if the restart_lsn does't change: > > /* > * Sanity check: As long as the invalidations are handled > * appropriately as above, this should never happen. 
> */ > if (remote_slot->restart_lsn < slot->data.restart_lsn) > elog(ERROR, > "cannot synchronize local slot \"%s\" LSN(%X/%X)" > " to remote slot's LSN(%X/%X) as synchronization" > " would move it backwards", remote_slot->name, > LSN_FORMAT_ARGS(slot->data.restart_lsn), > LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > As discussed, I added a flag in shared memory to control the concurrent slot sync. > --- > + It is recommended that subscriptions are first disabled before > + promoting > f+ the standby and are enabled back after altering the connection string. > > I think it's better to describe the reason why it's recommended to disable > subscriptions before the standby promotion. Added. > > --- > +/* Slot sync worker objects */ > +extern PGDLLIMPORT char *PrimaryConnInfo; extern PGDLLIMPORT char > +*PrimarySlotName; > > These two variables are declared also in xlogrecovery.h. Is it intentional? > If so, I > think it's better to write comments. Will address. > > --- > Global functions and variables used by the slotsync worker are declared in > logicalworker.h and worker_internal.h. But is it really okay to make a > dependency between the slotsync worker and logical replication workers? IIUC > the slotsync worker is conceptually a separate feature from the logical > replication. I think the slotsync worker can have its own header file. Added. > > --- > + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || > '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname > > and > > + SELECT (CASE WHEN r.srsubstate = 'f' THEN > pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || > r.srrelid), false) > > If we use CONCAT function, we can replace '||' with ','. > Will address. > --- > + Confirm that the standby server is not lagging behind the subscribers. > + This step can be skipped if > + linkend="guc-standby-slot-names">standby_slot_names me> > + has been correctly configured. 
> > How can the user confirm if standby_slot_names is correctly configured? Will address after concluding. Thanks Shveta for helping addressing the comments. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, February 6, 2024 3:39 PM Masahiko Sawada wrote: > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila wrote: > > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada > wrote: > > > > > > --- > > > Since Two processes (e.g. the slotsync worker and > > > pg_sync_replication_slots()) concurrently fetch and update the slot > > > information, there is a race condition where slot's > > > confirmed_flush_lsn goes backward. > > > > > > > Right, this is possible, though there shouldn't be a problem because > > anyway, slotsync is an async process. Till we hold restart_lsn, the > > required WAL won't be removed. Having said that, I can think of two > > ways to avoid it: (a) We can have some flag in shared memory using > > which we can detect whether any other process is doing slot > > syncronization and then either error out at that time or simply wait > > or may take nowait kind of parameter from user to decide what to do? > > If this is feasible, we can simply error out for the first version and > > extend it later if we see any use cases for the same (b) similar to > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an > > error, this is good for now but in future we may still have another > > similar issue, so I would prefer (a) among these but I am fine if you > > prefer (b) or have some other ideas like just note down in comments > > that this is a harmless case and can happen only very rarely. > > Thank you for sharing the ideas. I would prefer (a). For (b), the same issue > still > happens for other fields. Attached is the V79 patch, which includes the following changes. (Note that only 0001 is sent in this version; we will send the later patches after rebasing.) 1. Address all the comments from Amit[1], all the comments from Peter[2], and some of the comments from Sawada-san[3]. 2. Use a flag in shared memory to restrict concurrent slot sync. 3. Add more TAP tests for the pg_sync_replication_slots function. 
[1] https://www.postgresql.org/message-id/CAA4eK1KGHT9S-Bst_G1CUNQvRep%3DipMs5aTBNRQFVi6TogbJ9w%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHut%2BPtyoRf3adoLoTrbL6momzkhXAFKz656Vv9YRu4cp%3D6Yig%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAD21AoCEkcTaPb%2BGdOhSQE49_mKJG6D64quHcioJGx6RCqMv%2BQ%40mail.gmail.com Best Regards, Hou zj v79-0001-Add-a-slot-synchronization-function.patch Description: v79-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Friday, February 2, 2024 2:03 PM Bertrand Drouvot wrote: > > Hi, > > On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote: > > Attached v75 patch-set. Changes are: > > > > 1) Re-arranged the patches: > > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > > separated out in v75-001 as those are independent changes. > > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special > > process' and 'App-name changes' are now merged to single patch which > > makes v75-002. > > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation > > Document' patches are maintained as is (v75-003 and v75-004 now). > > Thanks! > > I only looked at the commit message for v75-0002 and see that it has changed > since the comment done in [1], but it still does not look correct to me. > > " > If a logical slot on the primary is valid but is invalidated on the standby, > then > that slot is dropped and recreated on the standby in next sync-cycle provided > the slot still exists on the primary server. It is okay to recreate such > slots as long > as these are not consumable on the standby (which is the case currently). This > situation may occur due to the following reasons: > - The max_slot_wal_keep_size on the standby is insufficient to retain WAL > records from the restart_lsn of the slot. > - primary_slot_name is temporarily reset to null and the physical slot is > removed. > - The primary changes wal_level to a level lower than logical. > " > > If a logical decoding slot "still exists on the primary server" then the > primary > can not change the wal_level to lower than logical, one would get something > like: > > "FATAL: logical replication slot "logical_slot" exists, but wal_level < > logical" > > and then slots won't get invalidated on the standby. I've the feeling that the > wal_level conflict part may need to be explained separately? 
(I think it's not > possible that they end up being re-created on the standby for this conflict, > they will be simply removed as it would mean the counterpart one on the > primary does not exist anymore). This is possible in some extreme cases, because the slot is synced asynchronously. For example: if on the primary the wal_level is changed to 'replica' and then back to 'logical', the standby receives two XLOG_PARAMETER_CHANGE WAL records. Before the standby replays these records, a user can create a failover slot on the primary (because wal_level is logical there), and if the slotsync worker syncs the slot before the startup process replays the XLOG_PARAMETER_CHANGE, then the just-synced slot will be invalidated when that record is replayed. That said, this doesn't seem like a real-world case, so I am not sure it is worth a separate explanation. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Thursday, February 1, 2024 12:20 PM Amit Kapila wrote: > > On Thu, Feb 1, 2024 at 8:15 AM Euler Taveira wrote: > > > > On Mon, Jan 29, 2024, at 10:17 AM, Zhijie Hou (Fujitsu) wrote: > > > > Attach the V72-0001 which addressed above comments, other patches will > be > > rebased and posted after pushing first patch. Thanks Shveta for helping > address > > the comments. > > > > > > While working on another patch I noticed a new NOTICE message: > > > > NOTICE: changed the failover state of replication slot "foo" on publisher > > to > false > > > > I wasn't paying much attention to this thread then I start reading the 2 > > patches that was recently committed. The message above surprises me > because > > pg_createsubscriber starts to emit this message. The reason is that it > > doesn't > > create the replication slot during the CREATE SUBSCRIPTION. Instead, it > creates > > the replication slot with failover = false and no such option is informed > > during CREATE SUBSCRIPTION which means it uses the default value (failover > = > > false). I expect that I don't see any message because it is *not* changing > > the > > behavior. I was wrong. It doesn't check the failover state on publisher, it > > just executes walrcv_alter_slot() and emits a message. > > > > IMO if we are changing an outstanding property on node A from node B, > node B > > already knows (or might know) about that behavior change (because it is > sending > > the command), however, node A doesn't (unless log_replication_commands > = on -- > > it is not the default). > > > > Do we really need this message as NOTICE? > > > > The reason for adding this NOTICE was to keep it similar to other > Notice messages in these commands like create/drop slot. However, here > the difference is we may not have altered the slot as the property is > already the same as we want to set on the publisher. So, I am not sure > whether we should follow the existing behavior or just get rid of it. 
> And then do we remove similar NOTICE in AlterSubscription() as well? > Normally, I think NOTICE intends to let users know if we did anything > with slots while executing subscription commands. Does anyone else > have an opinion on this point? > > A related point, I think we can avoid setting the 'failover' property > in ReplicationSlotAlter() if it is not changed, the advantage is we > will avoid saving slots. OTOH, this won't be a frequent operation so > we can leave it as it is as well. Here is a patch to remove the NOTICE and improve ReplicationSlotAlter. The patch also includes a few cleanups based on Peter's feedback. Best Regards, Hou zj v2-0001-clean-up-for-776621a5.patch
RE: Synchronizing slots from primary to standby
On Wednesday, January 31, 2024 9:57 AM Peter Smith wrote: > > Hi, > > I saw that v73-0001 was pushed, but it included some last-minute > changes that I did not get a chance to check yesterday. > > Here are some review comments for the new parts of that patch. > > == > doc/src/sgml/ref/create_subscription.sgml > > 1. > connect (boolean) > > Specifies whether the CREATE SUBSCRIPTION command should connect > to the publisher at all. The default is true. Setting this to false > will force the values of create_slot, enabled, copy_data, and failover > to false. (You cannot combine setting connect to false with setting > create_slot, enabled, copy_data, or failover to true.) > > ~ > > I don't think the first part "Setting this to false will force the > values ... failover to false." is strictly correct. > > I think is correct to say all those *other* properties (create_slot, > enabled, copy_data) are forced to false because those otherwise have > default true values. But the 'failover' has default false, so it > cannot get force-changed at all because you can't set connect to false > when failover is true as the second part ("You cannot combine...") > explains. > > IMO remove 'failover' from that first sentence. > > > 3. > dump can be restored without requiring network access to the remote > servers. It is then up to the user to reactivate the subscriptions in a > suitable way. If the involved hosts have changed, the connection > - information might have to be changed. It might also be appropriate to > + information might have to be changed. If the subscription needs to > + be enabled for > +linkend="sql-createsubscription-params-with-failover">failover eral>, > + then same needs to be done by executing > + > + ALTER SUBSCRIPTION ... SET(failover = true) > + after the slot has been created. It might also be appropriate to > > "then same needs to be done" (English?) 
> > BEFORE > If the subscription needs to be enabled for failover, then same needs > to be done by executing ALTER SUBSCRIPTION ... SET(failover = true) > after the slot has been created. > > SUGGESTION > If the subscription needs to be enabled for failover, execute ALTER > SUBSCRIPTION ... SET(failover = true) after the slot has been created. > > == > src/backend/commands/subscriptioncmds.c > > 4. > #define SUBOPT_RUN_AS_OWNER 0x1000 > -#define SUBOPT_LSN 0x2000 > -#define SUBOPT_ORIGIN 0x4000 > +#define SUBOPT_FAILOVER 0x2000 > +#define SUBOPT_LSN 0x4000 > +#define SUBOPT_ORIGIN 0x8000 > + > > A spurious blank line was added. > Here is a small patch to address comments 3 and 4. The discussion for comment 1 is still going on, so we can update the patch once it's concluded. Best Regards, Hou zj 0001-clean-up-for-776621a5.patch
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Tuesday, January 30, 2024 11:21 AM vignesh C wrote: > > On Tue, 30 Jan 2024 at 07:24, Zhijie Hou (Fujitsu) > wrote: > > > > On Monday, January 29, 2024 9:22 PM vignesh C > wrote: > > > > > > On Fri, 26 Jan 2024 at 11:30, Alexander Lakhin > wrote: > > > > > > > > Hello hackers, > > > > > > > > After determining a possible cause for intermittent failures of > > > > the test subscription/031_column_list [1], I was wondering what > > > > makes another subscription test (014_binary) fail on the buildfarm: > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snakefly > > > > t=20 > > > > 24-01-22%2001%3A19%3A03 > > > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=2 > > > 02 > > > > 4-01-14%2018%3A19%3A20 > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet > > > > =202 > > > > 3-12-21%2001%3A11%3A52 > > > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=2 > > > 02 > > > > 3-11-27%2001%3A42%3A39 > > > > > > > > All those failures caused by a timeout when waiting for a message > > > > expected in _subscriber.log. For example, in the snakefly's case: > > > > [01:14:46.158](1.937s) ok 7 - check synced data on subscriber with > > > > custom type timed out waiting for match: (?^:ERROR: ( [A-Z0-9]+:)? > > > > incorrect binary data format) at > > > /home/bf/bf-build/piculet/HEAD/pgsql/src/test/subscription/t/014_bin > > > ary.pl > > > line 269. > > > > > > > > _subscriber.log contains: > > > > 2023-12-21 01:14:46.215 UTC [410039] 014_binary.pl LOG: statement: > > > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > > > > 2023-12-21 01:17:46.756 UTC [40] ERROR: could not receive > > > > data from WAL stream: server closed the connection unexpectedly > > > > This probably means the server terminated abnormally > > > > before or while processing the request. 
> > > > 2023-12-21 01:17:46.760 UTC [405057] LOG: background worker > > > > "logical replication apply worker" (PID 40) exited with exit > > > > code 1 > > > > 2023-12-21 01:17:46.779 UTC [532857] LOG: logical replication > > > > apply worker for subscription "tsub" has started ... > > > > > > > > While _subscriber.log from a successful test run contains: > > > > 2024-01-26 03:49:07.065 UTC [9726:5] 014_binary.pl LOG: statement: > > > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > > > > 2024-01-26 03:49:07.075 UTC [9726:6] 014_binary.pl LOG: disconnection: > > > > session time: 0:00:00.014 user=postgres database=postgres > > > > host=[local] > > > > 2024-01-26 03:49:07.558 UTC [9729:1] LOG: logical replication > > > > apply worker for subscription "tsub" has started > > > > 2024-01-26 03:49:07.563 UTC [9731:1] LOG: logical replication > > > > table synchronization worker for subscription "tsub", table > > > > "test_mismatching_types" has started > > > > 2024-01-26 03:49:07.585 UTC [9731:2] ERROR: incorrect binary data > > > > format > > > > 2024-01-26 03:49:07.585 UTC [9731:3] CONTEXT: COPY > > > > test_mismatching_types, line 1, column a > > > > > > > > In this case, "logical replication apply worker for subscription > > > > "tsub" has started" appears just after "ALTER SUBSCRIPTION", not 3 > > > > minutes > > > later. > > > > > > > > I've managed to reproduce this failure locally by running multiple > > > > tests in parallel, and my analysis shows that it is caused by a > > > > race condition when accessing variable table_states_valid inside > tablesync.c. > > > > > > > > tablesync.c does the following with table_states_valid: > > > > /* > > > > * Callback from syscache invalidation. > > > > */ > > > > void > > > > invalidate_syncing_table_states(Datum arg, int cacheid, uint32 > > > > hashvalue) { > > > > table_states_valid = false; > > > > } > > > > ... > > > > static bool > > > > FetchTableStates(bool *started_tx) { ... > > > > if (!table_states_valid) > > >
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Monday, January 29, 2024 9:22 PM vignesh C wrote: > > On Fri, 26 Jan 2024 at 11:30, Alexander Lakhin wrote: > > > > Hello hackers, > > > > After determining a possible cause for intermittent failures of the > > test subscription/031_column_list [1], I was wondering what makes > > another subscription test (014_binary) fail on the buildfarm: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snakefly=20 > > 24-01-22%2001%3A19%3A03 > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=202 > > 4-01-14%2018%3A19%3A20 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet=202 > > 3-12-21%2001%3A11%3A52 > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=202 > > 3-11-27%2001%3A42%3A39 > > > > All those failures caused by a timeout when waiting for a message > > expected in _subscriber.log. For example, in the snakefly's case: > > [01:14:46.158](1.937s) ok 7 - check synced data on subscriber with > > custom type timed out waiting for match: (?^:ERROR: ( [A-Z0-9]+:)? > > incorrect binary data format) at > /home/bf/bf-build/piculet/HEAD/pgsql/src/test/subscription/t/014_binary.pl > line 269. > > > > _subscriber.log contains: > > 2023-12-21 01:14:46.215 UTC [410039] 014_binary.pl LOG: statement: > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > > 2023-12-21 01:17:46.756 UTC [40] ERROR: could not receive data > > from WAL stream: server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > 2023-12-21 01:17:46.760 UTC [405057] LOG: background worker "logical > > replication apply worker" (PID 40) exited with exit code 1 > > 2023-12-21 01:17:46.779 UTC [532857] LOG: logical replication apply > > worker for subscription "tsub" has started ... 
> > > > While _subscriber.log from a successful test run contains: > > 2024-01-26 03:49:07.065 UTC [9726:5] 014_binary.pl LOG: statement: > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > > 2024-01-26 03:49:07.075 UTC [9726:6] 014_binary.pl LOG: disconnection: > > session time: 0:00:00.014 user=postgres database=postgres host=[local] > > 2024-01-26 03:49:07.558 UTC [9729:1] LOG: logical replication apply > > worker for subscription "tsub" has started > > 2024-01-26 03:49:07.563 UTC [9731:1] LOG: logical replication table > > synchronization worker for subscription "tsub", table > > "test_mismatching_types" has started > > 2024-01-26 03:49:07.585 UTC [9731:2] ERROR: incorrect binary data > > format > > 2024-01-26 03:49:07.585 UTC [9731:3] CONTEXT: COPY > > test_mismatching_types, line 1, column a > > > > In this case, "logical replication apply worker for subscription > > "tsub" has started" appears just after "ALTER SUBSCRIPTION", not 3 minutes > later. > > > > I've managed to reproduce this failure locally by running multiple > > tests in parallel, and my analysis shows that it is caused by a race > > condition when accessing variable table_states_valid inside tablesync.c. > > > > tablesync.c does the following with table_states_valid: > > /* > > * Callback from syscache invalidation. > > */ > > void > > invalidate_syncing_table_states(Datum arg, int cacheid, uint32 > > hashvalue) { > > table_states_valid = false; > > } > > ... > > static bool > > FetchTableStates(bool *started_tx) > > { > > ... > > if (!table_states_valid) > > { > > ... > > /* Fetch all non-ready tables. */ > > rstates = GetSubscriptionRelations(MySubscription->oid, > > true); ... > > table_states_valid = true; > > } > > > > So, when syscache invalidation occurs inside the code block "if > > (!table_states_valid)", that invalidation is effectively ignored. > > > > In a failed case I observe the following events: > > 1. 
logical replication apply worker performs > > LogicalRepApplyLoop() -> process_syncing_tables() -> > > process_syncing_tables_for_apply() -> FetchTableStates() periodically. > > > > 2. ALTER SUBSCRIPTION tsub REFRESH PUBLICATION invalidates syscache > > for SUBSCRIPTIONRELMAP, and that leads to calling > > invalidate_syncing_table_states(). > > > > 3. If the apply worker manages to fetch no non-ready tables in > > FetchTableStates() and ignore "table_states_valid = false" from > > invalidate_syncing_table_states(), then it just misses the invalidation > > event, so it keeps working without noticing non-ready tables appeared as > > the result of ALTER SUBSCRIPTION (process_syncing_tables_for_apply() > skips > > a loop "foreach(lc, table_states_not_ready) ..." until some other event > > occurs). > > > > pg_usleep(10) added just below GetSubscriptionRelations(...) > > proves my analysis -- without it, I need tens of iterations (with 50 > > tests running in > > parallel) to catch the failure, but with it, I get the failure on the > > first iteration. > > Thanks for the analysis, I was able to reproduce the issue with the steps you > had >
RE: Synchronizing slots from primary to standby
On Monday, January 29, 2024 9:17 PM Zhijie Hou (Fujitsu) wrote: > > On Monday, January 29, 2024 7:30 PM Amit Kapila > wrote: > > > > On Mon, Jan 29, 2024 at 3:11 PM shveta malik > > wrote: > > > > > > PFA v71 patch set with above changes. > > > > > > > Few comments on 0001 > > Thanks for the comments. > > > === > > 1. > > parse_subscription_options() > > { > > ... > > /* > > * We've been explicitly asked to not connect, that requires some > > * additional processing. > > */ > > if (!opts->connect && IsSet(supported_opts, SUBOPT_CONNECT)) { > > > > Here, along with other options, we need an explicit check for > > failover, so that if connect=false and failover=true, the statement > > should give error. I was expecting the below statement to fail but it passed > with WARNING. > > postgres=# create subscription sub2 connection 'dbname=postgres' > > publication pub2 with(connect=false, failover=true); > > WARNING: subscription was created, but is not connected > > HINT: To initiate replication, you must manually create the > > replication slot, enable the subscription, and refresh the subscription. > > CREATE SUBSCRIPTION > > Added. > > > > > 2. > > @@ -148,6 +153,10 @@ typedef struct Subscription > > List*publications; /* List of publication names to subscribe to */ > > char*origin; /* Only publish data originating from the > > * specified origin */ > > + bool failover; /* True if the associated replication slots > > + * (i.e. the main slot and the table sync > > + * slots) in the upstream database are enabled > > + * to be synchronized to the standbys. */ > > } Subscription; > > > > Let's add this new field immediately after "bool runasowner;" as is > > done for other boolean members. This will help avoid increasing the > > size of the structure due to alignment when we add any new pointer > > field in the future. Also, that would be consistent with what we do for > > other > new boolean members. > > Moved this field as suggested. 
> Attach the V72-0001 which addressed above comments, other patches will be
> rebased and posted after pushing first patch. Thanks Shveta for helping
> address the comments.

Apart from the above comments, the new V72 patch also includes the following changes:

1. Moved the test 'altering failover for enabled sub' to the tap-test where most of the alter-sub behaviors are tested.

2. Renamed the tap-test from 050_standby_failover_slots_sync.pl to 040_standby_failover_slots_sync.pl (the big number 050 was used to avoid conflict with other newly committed tests), and added the test to meson.build, which was missed.

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Monday, January 29, 2024 7:30 PM Amit Kapila wrote:
> On Mon, Jan 29, 2024 at 3:11 PM shveta malik wrote:
> >
> > PFA v71 patch set with above changes.
>
> Few comments on 0001

Thanks for the comments.

> ===
> 1.
> parse_subscription_options()
> {
> ...
> /*
>  * We've been explicitly asked to not connect, that requires some
>  * additional processing.
>  */
> if (!opts->connect && IsSet(supported_opts, SUBOPT_CONNECT)) {
>
> Here, along with other options, we need an explicit check for failover, so
> that if connect=false and failover=true, the statement should give error. I was
> expecting the below statement to fail but it passed with WARNING.
> postgres=# create subscription sub2 connection 'dbname=postgres'
> publication pub2 with(connect=false, failover=true);
> WARNING:  subscription was created, but is not connected
> HINT:  To initiate replication, you must manually create the replication slot,
> enable the subscription, and refresh the subscription.
> CREATE SUBSCRIPTION

Added.

> 2.
> @@ -148,6 +153,10 @@ typedef struct Subscription
>   List    *publications; /* List of publication names to subscribe to */
>   char    *origin; /* Only publish data originating from the
>    * specified origin */
> + bool failover; /* True if the associated replication slots
> +  * (i.e. the main slot and the table sync
> +  * slots) in the upstream database are enabled
> +  * to be synchronized to the standbys. */
> } Subscription;
>
> Let's add this new field immediately after "bool runasowner;" as is done for
> other boolean members. This will help avoid increasing the size of the structure
> due to alignment when we add any new pointer field in the future. Also, that
> would be consistent with what we do for other new boolean members.

Moved this field as suggested.

Attach the V72-0001 which addressed the above comments; other patches will be
rebased and posted after pushing the first patch. Thanks Shveta for helping
address the comments.
Best Regards,
Hou zj

v72-0001-Add-a-failover-option-to-subscriptions.patch
Update the doc that refers to the removed column of pg_replication_slots.
Hi,

I noticed a minor issue: the upgrade doc refers to the old 'conflicting' column
of pg_replication_slots. As the column has been changed to 'conflict_reason',
I think the doc needs to be updated as in the attachment.

Best Regards,
Hou Zhijie

0001-Update-the-doc-for-upgrading-slots.patch
RE: Synchronizing slots from primary to standby
On Wednesday, January 24, 2024 1:11 PM Masahiko Sawada wrote:
> Here are random comments on slotsyncworker.c (v66):

Thanks for the comments:

> ---
> + elog(ERROR,
> +"cannot synchronize local slot \"%s\" LSN(%X/%X)"
> +" to remote slot's LSN(%X/%X) as synchronization"
> +" would move it backwards", remote_slot->name,
>
> Many error messages in slotsync.c are splitted into several lines, but I think it
> would reduce the greppability when the user looks for the error message in the
> source code.

Thanks for the suggestion! We combined most of the messages in the new version
patch. Some messages, including the above one, were kept split because they are
too long (> 120 columns including the indent) to fit on the screen, so I feel
it's better to keep them split.

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Monday, January 22, 2024 11:36 AM shveta malik wrote:

Hi,

> On Fri, Jan 19, 2024 at 4:18 PM shveta malik wrote:
> >
> > PFA v64.
>
> V64 fails to apply to HEAD due to a recent commit. Rebased it. PFA v64_2. It has
> no new changes.

I noticed a few things while analyzing the patch.

1.
sleep_ms = Min(sleep_ms * 2, MAX_WORKER_NAPTIME_MS);

The initial value of sleep_ms is 0 (the default value for a static variable),
which will never be advanced by this expression. We should initialize sleep_ms
to a positive number.

2.
/* Wait a bit, we don't expect to have to wait long */
rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
               10L, WAIT_EVENT_BGWORKER_SHUTDOWN);

The slotsync worker is not a bgworker anymore after the 0003 patch, so I think a
new wait event is needed.

3.
slot->effective_catalog_xmin = xmin_horizon;

The assignment is also needed in local_slot_update() to make
ReplicationSlotsComputeRequiredXmin work.

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Wednesday, January 17, 2024 6:30 PM shveta malik wrote:
> PFA v63.

I analyzed the security of the slotsync worker and the replication connection a
bit, and didn't find any issues. Here are the details:

1) data security

First, we are using the role specified in primary_conninfo. The role used here
is required to have REPLICATION or SUPERUSER privilege [1], which means it is
reasonable for the role to modify and read replication slots on the primary. On
the primary, the slotsync worker only queries the pg_replication_slots view,
which doesn't involve any system or user table access, so I think it's safe. On
the standby server, the slotsync worker will not read/write any user table
either, so we don't have the risk of executing arbitrary code in triggers.

2) privilege check

The SQL queries of the slotsync worker are subject to the normal privilege
checks on the primary. If I revoke the execute privilege on
pg_get_replication_slots from the replication user, then the slotsync worker
won't be able to query the pg_replication_slots view. The same is true for the
pg_is_in_recovery function. The slotsync worker keeps reporting an ERROR after
the revoke, which is as expected.

Based on the above, I didn't find any security issues with the slotsync worker.

[1] https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-PRIMARY-CONNINFO

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, January 16, 2024 9:27 AM Peter Smith wrote:
>
> Here are some review comments for patch v61-0002

Thanks for the comments.

> ==
> doc/src/sgml/logical-replication.sgml
>
> 1.
> + Examples: logical replication failover
>
> The current documentation structure (after the patch is applied) looks
> like this:
>
> 30.1. Publication
> 30.2. Subscription
> 30.2.1. Replication Slot Management
> 30.2.2. Examples: Set Up Logical Replication
> 30.2.3. Examples: Deferred Replication Slot Creation
> 30.2.4. Examples: logical replication failover
>
> I don't think it is ideal.
>
> Firstly, I think this new section is not just "Examples:"; it is more
> like instructions for steps to check if a successful failover is
> possible. IMO call it something like "Logical Replication Failover" or
> "Replication Slot Failover".
>
> Secondly, I don't think this new section strictly belongs underneath
> the "Subscription" section anymore because IMO it is just as much
> about the promotion of the publications. Now that you are adding this
> new (2nd) section about slots, I think the whole structure of this
> document should be changed like below:
>
> SUGGESTION #1 (make a new section 30.3 just for slot-related topics)
>
> 30.1. Publication
> 30.2. Subscription
> 30.2.1. Examples: Set Up Logical Replication
> 30.3. Logical Replication Slots
> 30.3.1. Replication Slot Management
> 30.3.2. Examples: Deferred Replication Slot Creation
> 30.3.3. Logical Replication Failover
>
> ~
>
> SUGGESTION #2 (keep the existing structure, but give the failover its
> own new section 30.3)
>
> 30.1. Publication
> 30.2. Subscription
> 30.2.1. Replication Slot Management
> 30.2.2. Examples: Set Up Logical Replication
> 30.2.3. Examples: Deferred Replication Slot Creation
> 30.3 Logical Replication Failover

I used this version for now, as I am not sure about changing the other sections.

> ~
>
> SUGGESTION #2a (and maybe later you can extract some of the failover
> examples further)
>
> 30.1. Publication
> 30.2. Subscription
> 30.2.1. Replication Slot Management
> 30.2.2. Examples: Set Up Logical Replication
> 30.2.3. Examples: Deferred Replication Slot Creation
> 30.3 Logical Replication Failover
> 30.3.1. Examples: Checking if failover ready
>
> ~~~
>
> 2.
> +
> +In a logical replication setup, if the publisher server is also the primary
> +server of the streaming replication, the logical slots on the primary server
> +can be synchronized to the standby server by specifying failover = true
> +when creating the subscription. Enabling failover ensures a seamless
> +transition of the subscription to the promoted standby, allowing it to
> +subscribe to the new primary server without any data loss.
> +
>
> I was initially confused by the wording. How about like below:
>
> SUGGESTION
> When the publisher server is the primary server of a streaming
> replication, the logical slots on that primary server can be
> synchronized to the standby server by specifying failover =
> true when creating subscriptions for those publications.
> Enabling failover ensures a seamless transition of those subscriptions
> after the standby is promoted. They can continue subscribing to
> publications now on the new primary server without any data loss.

Changed as suggested.

> ~~~
>
> 3.
> +
> +However, the replication slots are copied asynchronously, which means it's necessary
> +to confirm that replication slots have been synced to the standby server
> +before the failover happens. Additionally, to ensure a successful failover,
> +the standby server must not lag behind the subscriber. To confirm
> +that the standby server is ready for failover, follow these steps:
> +
>
> Minor rewording
>
> SUGGESTION
> Because the slot synchronization logic copies asynchronously, it is
> necessary to confirm that replication slots have been synced to the
> standby server before the failover happens. Furthermore, to ensure a
> successful failover, the standby server must not be lagging behind the
> subscriber. To confirm that the standby server is indeed ready for
> failover, follow these 2 steps:

Changed as suggested.

> ~~~
>
> 4.
> The instructions said "follow these steps", so the next parts should
> be rendered as 2 "steps" (using markup?)
>
> SUGGESTION (show as steps 1,2 and also some minor rewording of the
> step heading)
>
> 1. Confirm that all the necessary logical replication slots have been
> synced to the standby server.
> 2. Confirm that the standby server is not lagging behind the subscribers.

Changed as suggested.

> ~~~
>
> 5.
> +
> +Check if all the necessary logical replication slots have been synced to
> +the standby server.
> +
>
> SUGGESTION
> Confirm that all the necessary logical replication slots have been
> synced to the standby server.

Changed as suggested.
RE: Synchronizing slots from primary to standby
On Thursday, January 11, 2024 11:42 PM Bertrand Drouvot wrote: Hi, > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila > wrote: > > > > > > +static bool > > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot > > > +*remote_slot) > > > { > > > ... > > > + /* Slot ready for sync, so sync it. */ else { > > > + /* > > > + * Sanity check: With hot_standby_feedback enabled and > > > + * invalidations handled appropriately as above, this should never > > > + * happen. > > > + */ > > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) elog(ERROR, > > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > + " to remote slot's LSN(%X/%X) as synchronization" > > > + " would move it backwards", remote_slot->name, > > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > > ... > > > } > > > > > > I was thinking about the above code in the patch and as far as I can > > > think this can only occur if the same name slot is re-created with > > > prior restart_lsn after the existing slot is dropped. Normally, the > > > newly created slot (with the same name) will have higher restart_lsn > > > but one can mimic it by copying some older slot by using > > > pg_copy_logical_replication_slot(). > > > > > > I don't think as mentioned in comments even if hot_standby_feedback > > > is temporarily set to off, the above shouldn't happen. It can only > > > lead to invalidated slots on standby. > > I also think so. > > > > > > > To close the above race, I could think of the following ways: > > > 1. Drop and re-create the slot. > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > your previous comment, we need to update all other fields as well. > > > If we follow this then we probably need to have a check for > > > catalog_xmin as well. 
> IIUC, this would be a sync slot (so not usable until promotion) that could not be
> used anyway (invalidated), so I'll vote for drop / re-create then.

Such a race can happen when a user drops and re-creates the same failover slot
on the primary as well. For example, if a user dropped one failover slot and
then immediately created a new one by copying from an old slot (using
pg_copy_logical_replication_slot), the slotsync worker would find that the
restart_lsn of this slot has gone backwards.

The steps:

SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput', false, false, true);
SELECT 'init' FROM pg_create_logical_replication_slot('test', 'pgoutput', false, false, true);

-- Advance the restart_lsn of the 'test' slot
CREATE TABLE test2(a int);
INSERT INTO test2 SELECT generate_series(1,1,1);
SELECT slot_name FROM pg_replication_slot_advance('test', pg_current_wal_lsn());

-- Re-create the test slot, but based on the old isolation_slot
SELECT pg_drop_replication_slot('test');
SELECT 'copy' FROM pg_copy_logical_replication_slot('isolation_slot', 'test');

Then the restart_lsn of the 'test' slot will have gone backwards.

Best Regards,
Hou zj