RE: Synchronizing slots from primary to standby
On Monday, April 29, 2024 5:11 PM shveta malik wrote:
> On Mon, Apr 29, 2024 at 11:38 AM shveta malik wrote:
> >
> > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) wrote:
> > >
> > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > > > Hi,
> > > > >
> > > > > Since the standby_slot_names patch has been committed, I am
> > > > > attaching the last doc patch for review.
> > > >
> > > > Thanks!
> > > >
> > > > 1 ===
> > > >
> > > > +    continue subscribing to publications now on the new primary server without
> > > > +    any data loss.
> > > >
> > > > I think "without any data loss" should be re-worded in this
> > > > context. Data loss in the sense "data committed on the primary and
> > > > not visible on the subscriber in case of failover" can still occur
> > > > (in case synchronous replication is not used).
> > > >
> > > > 2 ===
> > > >
> > > > +    If the result (failover_ready) of both above steps is
> > > > +    true, existing subscriptions will be able to continue without data loss.
> > > > +
> > > >
> > > > I don't think that's true if synchronous replication is not used. Say,
> > > >
> > > > - synchronous replication is not used
> > > > - primary is not able to reach the standby anymore and standby_slot_names is set
> > > > - new data is inserted into the primary
> > > > - then not replicated to subscriber (due to standby_slot_names)
> > > >
> > > > Then I think the both above steps will return true but data would
> > > > be lost in case of failover.
> > >
> > > Thanks for the comments, attached is the new version patch which
> > > rewords the above places.
> >
> > Thanks for the patch.
> >
> > Few comments:
> >
> > 1) Tested the steps, one of the queries still refers to
> > 'conflict_reason'. I think it should refer to 'conflicting'.

Thanks for catching this. Fixed.
> > 2) Will it be good to mention that in case of planned promotion, it is
> > recommended to run pg_sync_replication_slots() as a last sync attempt
> > before we run the failover-ready validation steps? This can be mentioned
> > in high-availability.sgml of the current patch.
>
> I recall now that with the latest fix, we cannot run
> pg_sync_replication_slots() unless we disable the slot-sync worker.
> Considering that, I think it will be too many steps just to run the SQL
> function at the end without much value added. Thus we can skip this
> point; we can rely on the slot sync worker completely.

Agreed. I didn't change this.

Here is the V3 doc patch.

Best Regards,
Hou zj

v3-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
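For readers following along, the validation discussed here boils down to two queries. A hedged sketch of the idea (slot and subscription names are placeholders, and the exact column set follows the PG17 pg_subscription / pg_replication_slots catalogs, where "conflicting" replaced "conflict_reason" late in the cycle):

```sql
-- On the subscriber: list the slots of subscriptions that use failover.
SELECT subname, subslotname
FROM pg_subscription
WHERE subfailover;

-- On the standby: a slot is failover-ready once it has been synchronized
-- and has not been invalidated.
SELECT slot_name,
       (synced AND NOT temporary AND NOT conflicting) AS failover_ready
FROM pg_replication_slots
WHERE slot_name IN ('sub1_slot');  -- names taken from the first query
```

As Bertrand pointed out up-thread, failover_ready = true does not by itself rule out data loss unless synchronous replication is in use.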
RE: Synchronizing slots from primary to standby
On Friday, March 15, 2024 10:45 PM Bertrand Drouvot wrote:
>
> Hi,
>
> On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote:
> > Hi,
> >
> > Since the standby_slot_names patch has been committed, I am attaching
> > the last doc patch for review.
>
> Thanks!
>
> 1 ===
>
> +    continue subscribing to publications now on the new primary server
> +    without any data loss.
>
> I think "without any data loss" should be re-worded in this context.
> Data loss in the sense "data committed on the primary and not visible on
> the subscriber in case of failover" can still occur (in case synchronous
> replication is not used).
>
> 2 ===
>
> +    If the result (failover_ready) of both above steps is
> +    true, existing subscriptions will be able to continue without data loss.
> +
>
> I don't think that's true if synchronous replication is not used. Say,
>
> - synchronous replication is not used
> - primary is not able to reach the standby anymore and standby_slot_names is set
> - new data is inserted into the primary
> - then not replicated to subscriber (due to standby_slot_names)
>
> Then I think the both above steps will return true but data would be lost
> in case of failover.

Thanks for the comments, attached is the new version patch which rewords the
above places.

Best Regards,
Hou zj

v2-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
RE: Improving the latch handling between logical replication launcher and worker processes.
On Thursday, April 25, 2024 4:59 PM vignesh C wrote:
>
> Hi,
>
> Currently the launcher's latch is used for the following: a) worker
> process attach b) worker process exit and c) subscription creation.
> Since this same latch is used for multiple cases, the launcher process is
> not able to handle concurrent scenarios like: a) the launcher started a
> new apply worker and is waiting for the apply worker to attach, and b)
> CREATE SUBSCRIPTION sub2 sends the launcher a wakeup signal. In this
> scenario, both of them will set the latch of the launcher process, and the
> launcher is not able to identify that both operations have occurred:
> 1) a worker has attached and 2) a subscription has been created and an
> apply worker should be started. As a result the apply worker does not get
> started for the newly created subscription immediately, and only gets
> started after the timeout of 180 seconds.
>
> I have started a new thread for this based on suggestions at [1].
>
> a) Introduce a new latch to handle worker attach and exit.

I found the startup process also uses two latches (see recoveryWakeupLatch)
for different purposes, so maybe this is OK. But note that both the logical
launcher and the apply worker call logicalrep_worker_launch(), so if we only
add one new latch, both workers will wait on the same new latch, and the
launcher may consume a SetLatch that should have been consumed by the apply
worker (although it's rare).

> b) Add a new GUC launcher_retry_time which gives more flexibility to users
> as suggested by Amit at [1].

Before 5a3a953, wal_retrieve_retry_interval played a similar role to the
suggested new GUC launcher_retry_time, e.g. even if a worker was launched,
the launcher only waited wal_retrieve_retry_interval before the next round.
IIUC, the issue does not happen frequently, and may not be noticeable where
apply workers aren't restarted often. So, I am slightly unsure if it's worth
adding a new GUC.
> c) Don't reset the latch at worker attach and allow launcher main to
> identify and handle it. For this there is a patch v6-0002 available at [2].

This seems simple. I found we are doing something similar in
RegisterSyncRequest() and WalSummarizerMain().

Best Regards,
Hou zj
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Wednesday, April 24, 2024 6:29 PM vignesh C wrote:
>
> On Wed, 24 Apr 2024 at 11:59, Amit Kapila wrote:
> >
> > On Wed, Mar 13, 2024 at 9:19 AM vignesh C wrote:
> > >
> > > On Tue, 12 Mar 2024 at 09:34, Ajin Cherian wrote:
> > > >
> > > > On Tue, Mar 12, 2024 at 2:59 PM vignesh C wrote:
> > > >>
> > > >> Thanks, I have created the following Commitfest entry for this:
> > > >> https://commitfest.postgresql.org/47/4816/
> > > >>
> > > >> Regards,
> > > >> Vignesh
> > > >
> > > > Thanks for the patch, I have verified that the fix works well by
> > > > following the steps mentioned to reproduce the problem.
> > > > Reviewing the patch, it seems good and is well documented. Just one
> > > > minor comment I had was probably to change the name of the variable
> > > > table_states_valid to table_states_validity. The current name made
> > > > sense when it was a bool, but now that it is a tri-state enum, it
> > > > doesn't fit well.
> > >
> > > Thanks for reviewing the patch, the attached v6 patch has the changes
> > > for the same.
> >
> > v6_0001* looks good to me. This should be backpatched unless you or
> > others think otherwise.
>
> This patch needs to be committed in master, PG16 and PG15.
> It is not required in PG14 and earlier because those branches don't have
> the HasSubscriptionRelations call before updating table_states_valid:
>
>     /*
>      * Does the subscription have tables?
>      *
>      * If there were not-READY relations found then we know it does. But
>      * if table_states_not_ready was empty we still need to check again to
>      * see if there are 0 tables.
>      */
>     has_subrels = (table_states_not_ready != NIL) ||
>         HasSubscriptionRelations(MySubscription->oid);
>
> So the invalidation function will not be called there.
> Whereas for PG15 and newer versions this is applicable:
> HasSubscriptionRelations calls the table_open function, which triggers the
> invalidation callback like in:
>
> HasSubscriptionRelations -> table_open -> relation_open -> LockRelationOid
> -> AcceptInvalidationMessages -> ReceiveSharedInvalidMessages ->
> LocalExecuteInvalidationMessage -> CallSyscacheCallbacks ->
> invalidate_syncing_table_states
>
> The attached patch
> v7-0001-Table-sync-missed-due-to-race-condition-in-subscr.patch
> applies for master and the PG16 branch,
> v7-0001-Table-sync-missed-due-to-race-condition-in-subscr_PG15.patch
> applies for the PG15 branch.

Thanks, I have verified that the patches apply cleanly and fix the issue on
each branch. The regression test also passes after applying the patch on my
machine.

Best Regards,
Hou zj
RE: promotion related handling in pg_sync_replication_slots()
On Friday, April 19, 2024 4:22 PM shveta malik wrote:
> On Fri, Apr 19, 2024 at 11:37 AM shveta malik wrote:
> >
> > On Fri, Apr 19, 2024 at 10:53 AM Bertrand Drouvot wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Apr 18, 2024 at 05:36:05PM +0530, shveta malik wrote:
> > > > Please find v8 attached. Changes are:
> > >
> > > Thanks!
> > >
> > > A few comments:
> >
> > Thanks for reviewing.
> >
> > > 1 ===
> > >
> > > @@ -1440,7 +1461,7 @@ ReplSlotSyncWorkerMain(char *startup_data, size_t startup_data_len)
> > >      * slotsync_worker_onexit() but that will need the connection to be made
> > >      * global and we want to avoid introducing global for this purpose.
> > >      */
> > > -   before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn));
> > > +   before_shmem_exit(slotsync_worker_disconnect, PointerGetDatum(wrconn));
> > >
> > > The comment above this change still states "Register the failure
> > > callback once we have the connection", I think it has to be reworded
> > > a bit now that v8 is making use of slotsync_worker_disconnect().
> > >
> > > 2 ===
> > >
> > > +    * Register slotsync_worker_onexit() before we register
> > > +    * ReplicationSlotShmemExit() in BaseInit(), to ensure that during exit of
> > > +    * slot sync worker, ReplicationSlotShmemExit() is called first, followed
> > > +    * by slotsync_worker_onexit(). Startup process during promotion waits for
> > >
> > > Worth to mention in shmem_exit() (where it "while (--before_shmem_exit_index >= 0)"
> > > or before the shmem_exit() definition) that ReplSlotSyncWorkerMain()
> > > relies on this LIFO behavior? (not sure if there is other "strong"
> > > LIFO requirement in other part of the code).
> >
> > I see other modules as well relying on LIFO behavior.
> > Please see applyparallelworker.c where
> > 'before_shmem_exit(pa_shutdown)' is needed to be done after
> > 'before_shmem_exit(logicalrep_worker_onexit)' (commit id 3d144c6).
> > Also in postinit.c, I see such comments atop
> > 'before_shmem_exit(ShutdownPostgres, 0)'.
> > I feel we can skip adding this specific comment about
> > ReplSlotSyncWorkerMain() in ipc.c, as none of the other modules have
> > added any. I will address the rest of your comments in the next version.
> >
> > > 3 ===
> > >
> > > +    * Startup process during promotion waits for slot sync to finish and it
> > > +    * does that by checking the 'syncing' flag.
> > >
> > > worth to mention ShutDownSlotSync()?
> > >
> > > 4 ===
> > >
> > > I did a few tests manually (launching ShutDownSlotSync() through gdb
> > > / with and without sync worker and with / without
> > > pg_sync_replication_slots() running concurrently) and it looks like
> > > it works as designed.
> >
> > Thanks for testing it.
> >
> > > Having said that, the logic that is in place to take care of the
> > > corner cases described up-thread seems reasonable to me.
>
> Please find v9 with the above comments addressed.

Thanks, the patch looks good to me. I also tested a few concurrent
promotion/function execution cases and didn't find any issues.

Best Regards,
Hou zj
RE: Disallow changing slot's failover option in transaction block
On Friday, April 19, 2024 10:54 AM Kuroda, Hayato/黒田 隼人 wrote:
> In your patch, the pg_dump.c was updated. IIUC the main reason was that
> pg_restore executes some queries as single transactions so that ALTER
> SUBSCRIPTION cannot be done, right?

Yes, and please see below for other reasons.

> Also, failover was synchronized only when we were in the upgrade mode, but
> your patch seems to remove the condition. Can you clarify the reason?

We used ALTER SUBSCRIPTION in upgrade mode because it was not allowed to use
connect=false and failover=true together in CREATE SUBSCRIPTION. But since we
don't have this restriction anymore (we don't alter the slot when creating a
subscription anymore), we can directly specify failover in CREATE SUBSCRIPTION
and do that in non-upgrade mode as well.

Attached is the V3 patch which also addresses Shveta's [1] and Bertrand's [2]
comments.

[1] https://www.postgresql.org/message-id/CAJpy0uD3YOeDg-tTCUi3EZ8vznRDfDqO_k6LepJpXUV1Z_%3DgkA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/ZiIxuaiINsuaWuDK%40ip-10-97-1-34.eu-west-3.compute.internal

Best Regards,
Hou zj

v3-0001-Fix-the-handling-of-failover-option-in-subscripti.patch
RE: Disallow changing slot's failover option in transaction block
On Thursday, April 18, 2024 1:52 PM Amit Kapila wrote:
>
> On Tue, Apr 16, 2024 at 5:06 PM shveta malik wrote:
> >
> > On Tue, Apr 16, 2024 at 1:45 PM Hayato Kuroda (Fujitsu) wrote:
> > >
> > > Dear Hou,
> > >
> > > > Kuroda-San reported an issue off-list that:
> > > >
> > > > If user execute ALTER SUBSCRIPTION SET (failover) command inside a
> > > > txn block and rollback, only the subscription option change can be
> > > > rolled back, while the replication slot's failover change is preserved.
> > > >
> > > > This is because ALTER SUBSCRIPTION SET (failover) command internally
> > > > executes the replication command ALTER_REPLICATION_SLOT to change
> > > > the replication slot's failover property, but this replication
> > > > command execution cannot be rolled back.
> > > >
> > > > To fix it, I think we can prevent user from executing ALTER
> > > > SUBSCRIPTION SET (failover) inside a txn block, which is also
> > > > consistent with the ALTER SUBSCRIPTION REFRESH/DROP SUBSCRIPTION
> > > > commands. Attach a small patch to address this.
> > >
> > > Thanks for posting the patch, the fix is same as my expectation.
> > > Also, should we add the restriction to the doc? I feel [1] can be updated.
> >
> > +1.
> >
> > Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
> > because we call alter_replication_slot in CREATE SUB as well, for the
> > case when slot_name is provided and create_slot=false. But the tricky
> > part is we call alter_replication_slot() when creating a subscription
> > for both failover=true and false. That means if we want to fix it on
> > the similar line of ALTER SUB, we have to disallow user from executing
> > CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems to
> > break some existing use cases. (Previously, a user could execute such
> > a command inside a txn block.)
> >
> > So, we need to think if there are better ways to fix it. After
> > discussion with Hou-San offlist, here are some ideas:
> >
> > 1. Do not alter the replication slot's failover option when CREATE
> > SUBSCRIPTION WITH failover=false. This means we alter the replication
> > slot only when failover is set to true. And thus we can disallow
> > CREATE SUB WITH (slot_name = xx, failover = true, create_slot = false)
> > inside a txn block.
> >
> > This option allows the user to run CREATE SUB (create_slot=false) with
> > failover=false in a txn block like earlier. But on the downside, it
> > makes the behavior inconsistent for an otherwise simple option like
> > failover, i.e. with failover=true, CREATE SUB is not allowed in a txn
> > block while with failover=false, it is allowed. It makes it slightly
> > complex to be understood by the user.
> >
> > 2. Let's not disallow CREATE SUB in a txn block as earlier, just don't
> > perform the internal alter-failover during CREATE SUB for existing
> > slots (create_slot=false, slot_name=xx), i.e. when create_slot is
> > false, we will ignore the failover parameter of CREATE SUB and it is
> > the user's responsibility to set it appropriately using the ALTER SUB
> > command. For create_slot=true, the behaviour of CREATE SUB remains the
> > same as earlier.
> >
> > This option does not add a new restriction for CREATE SUB wrt txn
> > blocks. In the context of failover with create_slot=false, we already
> > have a similar restriction (a documented one) for ALTER SUB, i.e. with
> > 'ALTER SUBSCRIPTION SET (slot_name = new)', the user needs to alter
> > the new slot's failover by himself. CREATE SUB can also be documented
> > in a similar way. This seems simpler to be understood considering the
> > existing ALTER SUB behavior as well. Plus, this will make the CREATE
> > SUB code slightly simpler and thus more easily manageable.
>
> +1 for option 2 as it sounds logical to me and consistent with ALTER
> SUBSCRIPTION.

+1. Here is the V2 patch which includes the changes for CREATE SUBSCRIPTION
as suggested. Since we no longer connect to the publisher to alter the slot
when create_slot=false, the restriction that disallows failover=true when
connect=false is also removed.

Best Regards,
Hou zj

v2-0001-Fix-the-handling-of-failover-option-in-subscripti.patch
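A sketch of the behavior under option 2 as implemented by the V2 patch; subscription, publication, slot names, and the connection string are illustrative only, and the requirement to disable the subscription before altering failover follows the documented ALTER SUBSCRIPTION restriction:

```sql
-- With create_slot = false the failover option is no longer applied to the
-- pre-existing slot, so this stays usable inside a transaction block:
BEGIN;
CREATE SUBSCRIPTION sub1
    CONNECTION 'dbname=postgres host=primary'
    PUBLICATION pub1
    WITH (create_slot = false, slot_name = 'myslot');
COMMIT;

-- Enabling failover on the existing slot is then the user's responsibility,
-- done outside any transaction block (ALTER_REPLICATION_SLOT cannot be
-- rolled back) and with the subscription disabled:
ALTER SUBSCRIPTION sub1 DISABLE;
ALTER SUBSCRIPTION sub1 SET (failover = true);
ALTER SUBSCRIPTION sub1 ENABLE;
```

This mirrors the existing documented rule for ALTER SUBSCRIPTION SET (slot_name = new): the user adjusts the new slot's failover property themselves.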
RE: promotion related handling in pg_sync_replication_slots()
On Tuesday, April 16, 2024 2:52 PM Bertrand Drouvot wrote:
>
> Hi,
>
> On Tue, Apr 16, 2024 at 10:00:04AM +0530, shveta malik wrote:
> > Please find v5 addressing above comments.
>
> Thanks!
>
> @@ -1634,9 +1677,14 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
> {
>     PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
>     {
> +       check_flags_and_set_sync_info(InvalidPid);
> +
>
> Given the fact that if the sync worker is running it won't be possible to
> trigger a manual sync with pg_sync_replication_slots(), what about also
> checking the "sync_replication_slots" value at the start of
> SyncReplicationSlots() and emit an error if sync_replication_slots is set
> to on? (The message could explicitly state that it's not possible to use
> the function if sync_replication_slots is set to on.)

I personally feel adding the additional check for sync_replication_slots may
not improve the situation here. Because the GUC sync_replication_slots can
change at any point, it could be false when performing this additional check
and be set to true immediately after the check, so it would not simplify the
logic anyway.

Best Regards,
Hou zj
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Wednesday, March 13, 2024 11:49 AM vignesh C wrote:
>
> On Tue, 12 Mar 2024 at 09:34, Ajin Cherian wrote:
> >
> > On Tue, Mar 12, 2024 at 2:59 PM vignesh C wrote:
> >>
> >> Thanks, I have created the following Commitfest entry for this:
> >> https://commitfest.postgresql.org/47/4816/
> >>
> >> Regards,
> >> Vignesh
> >
> > Thanks for the patch, I have verified that the fix works well by
> > following the steps mentioned to reproduce the problem.
> > Reviewing the patch, it seems good and is well documented. Just one
> > minor comment I had was probably to change the name of the variable
> > table_states_valid to table_states_validity. The current name made
> > sense when it was a bool, but now that it is a tri-state enum, it
> > doesn't fit well.
>
> Thanks for reviewing the patch, the attached v6 patch has the changes for
> the same.

FYI, I noticed that there is one open item on
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items which is related
to the fix in this thread:

--
Intermittent failures in 040_standby_failover_slots_sync test
Possible solution in this thread: Race condition in FetchTableStates
--

AFAICS, the bug discussed here is not a new issue on PG17, so I am thinking
of moving the item to the "Older bugs affecting stable branches" section if
there are no objections.

Best Regards,
Hou zj
Disallow changing slot's failover option in transaction block
Hi,

Kuroda-San reported an issue off-list that:

If a user executes the ALTER SUBSCRIPTION SET (failover) command inside a
txn block and rolls back, only the subscription option change can be rolled
back, while the replication slot's failover change is preserved.

This is because the ALTER SUBSCRIPTION SET (failover) command internally
executes the replication command ALTER_REPLICATION_SLOT to change the
replication slot's failover property, but this replication command execution
cannot be rolled back.

To fix it, I think we can prevent the user from executing ALTER SUBSCRIPTION
SET (failover) inside a txn block, which is also consistent with the ALTER
SUBSCRIPTION REFRESH / DROP SUBSCRIPTION commands. Attached is a small patch
to address this.

Best Regards,
Hou Zhijie

v1-0001-Disallow-alter-subscription-s-failover-option-ins.patch
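The inconsistency can be observed with a sequence like the following sketch (subscription and slot names are illustrative; subfailover and the failover column of pg_replication_slots follow the PG17 catalogs):

```sql
-- On the subscriber, with subscription "sub" disabled, before the fix:
BEGIN;
ALTER SUBSCRIPTION sub SET (failover = true);
ROLLBACK;

-- The catalog change is rolled back on the subscriber:
SELECT subname, subfailover FROM pg_subscription WHERE subname = 'sub';

-- ...but the walsender has already executed ALTER_REPLICATION_SLOT, so on
-- the publisher the slot's failover property is not restored:
SELECT slot_name, failover FROM pg_replication_slots WHERE slot_name = 'sub';
```

Disallowing the command in a transaction block removes the window in which the two sides can diverge.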
RE: promotion related handling in pg_sync_replication_slots()
On Monday, April 15, 2024 6:09 PM shveta malik wrote:
>
> Please find v4 addressing the above comments.

Thanks for the patch. Here are a few comments:

1.
+       ereport(ERROR,
+               errmsg("promotion in progress, can not synchronize replication slots"));
+   }

I think an errcode is needed. The style of the error message also seems a
bit unnatural to me. I suggest:
"cannot synchronize replication slots when standby promotion is ongoing"

2.
+   if (worker_pid != InvalidPid)
+       Assert(SlotSyncCtx->pid == InvalidPid);

We could merge the checks into one Assert():
Assert(SlotSyncCtx->pid == InvalidPid || worker_pid == InvalidPid);

3.
-   pqsignal(SIGINT, SignalHandlerForShutdownRequest);

I realized that we should register this before setting SlotSyncCtx->pid,
otherwise if the standby is promoted after setting the pid and before
registering the signal handler function, the slotsync worker could miss
handling the SIGINT sent by the startup process (ShutDownSlotSync). This is
an existing issue for the slotsync worker, but maybe we could fix it
together with the patch.

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Friday, April 12, 2024 11:31 AM Amit Kapila wrote:
>
> On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) wrote:
> >
> > On Thursday, April 11, 2024 12:11 PM Amit Kapila wrote:
> > >
> > > 2.
> > > -   if (remote_slot->restart_lsn < slot->data.restart_lsn)
> > > +   if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
> > >         elog(ERROR,
> > >              "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> > >
> > > Can we be more specific in this message? How about splitting it into
> > > error_message as "cannot synchronize local slot \"%s\"" and then
> > > errdetail as "Local slot's start streaming location LSN(%X/%X) is
> > > ahead of remote slot's LSN(%X/%X)"?
> >
> > Your version looks better. Since the above two messages both have an
> > errdetail, I used the style of ereport(ERROR, errmsg_internal(),
> > errdetail_internal() ...) in the patch, which is equivalent to
> > elog(ERROR, ...) but with an additional detail message.
>
> makes sense.
>
> > Here is V5 patch set.
>
> I think we should move the check to not advance the slot when one of the
> remote_slot's restart_lsn or catalog_xmin is less than the local slot's
> inside update_local_synced_slot(), as we want to prevent updating the slot
> in those cases even during slot synchronization.

Agreed. Here is the V6 patch which addresses this. I have merged the two
patches into one.

Best Regards,
Hou zj

v6-0001-Fix-the-handling-of-LSN-and-xmin-in-slot-sync.patch
RE: Synchronizing slots from primary to standby
On Thursday, April 11, 2024 12:11 PM Amit Kapila wrote:
>
> On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) wrote:
> >
> > On Thursday, April 4, 2024 5:37 PM Amit Kapila wrote:
> > >
> > > BTW, while thinking on this one, I noticed that in the function
> > > LogicalConfirmReceivedLocation(), we first update the disk copy, see
> > > comment [1], and then the in-memory one, whereas the same is not true
> > > in update_local_synced_slot() for the case when a snapshot exists.
> > > Now, do we have the same risk here in case of standby? Because I
> > > think we will use these xmins while sending the feedback message (in
> > > XLogWalRcvSendHSFeedback()).
> > >
> > > * We have to write the changed xmin to disk *before* we change
> > > * the in-memory value, otherwise after a crash we wouldn't know
> > > * that some catalog tuples might have been removed already.
> >
> > Yes, I think we have the risk on the standby, I can reproduce the case
> > that if the server crashes after updating the in-memory value and
> > before saving them to disk, the synced slot could be invalidated after
> > restarting from crash, because the necessary rows have been removed on
> > the primary. The steps can be found in [1].
> >
> > I think we'd better fix the order in update_local_synced_slot() as
> > well. I tried to make the fix in 0002; 0001 is Shveta's patch to fix
> > another issue in this thread. Since they are touching the same
> > function, I attach them together for review.
>
> Few comments:
> ===
> 1.
> +
> +   /* Sanity check */
> +   if (slot->data.confirmed_flush != remote_slot->confirmed_lsn)
> +       ereport(LOG, errmsg("synchronized confirmed_flush for slot \"%s\" differs from remote slot",
> +                           remote_slot->name),
>
> Is there a reason to use elevel as LOG instead of ERROR? I think it should
> be elog(ERROR, ... as this is an unexpected case.

Agreed.

> 2.
> -   if (remote_slot->restart_lsn < slot->data.restart_lsn)
> +   if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
>         elog(ERROR,
>              "cannot synchronize local slot \"%s\" LSN(%X/%X)"
>
> Can we be more specific in this message? How about splitting it into
> error_message as "cannot synchronize local slot \"%s\"" and then errdetail
> as "Local slot's start streaming location LSN(%X/%X) is ahead of remote
> slot's LSN(%X/%X)"?

Your version looks better. Since the above two messages both have an
errdetail, I used the style of ereport(ERROR, errmsg_internal(),
errdetail_internal() ...) in the patch, which is equivalent to
elog(ERROR, ...) but with an additional detail message.

Here is V5 patch set.

Best Regards,
Hou zj

v5-0002-write-the-changed-xmin-to-disk-before-computing-g.patch
v5-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
RE: Synchronizing slots from primary to standby
On Thursday, April 4, 2024 5:37 PM Amit Kapila wrote:
>
> BTW, while thinking on this one, I noticed that in the function
> LogicalConfirmReceivedLocation(), we first update the disk copy, see
> comment [1], and then the in-memory one, whereas the same is not true in
> update_local_synced_slot() for the case when a snapshot exists. Now, do we
> have the same risk here in case of standby? Because I think we will use
> these xmins while sending the feedback message (in
> XLogWalRcvSendHSFeedback()).
>
> * We have to write the changed xmin to disk *before* we change
> * the in-memory value, otherwise after a crash we wouldn't know
> * that some catalog tuples might have been removed already.

Yes, I think we have the risk on the standby. I can reproduce the case that
if the server crashes after updating the in-memory value and before saving
them to disk, the synced slot could be invalidated after restarting from the
crash, because the necessary rows have been removed on the primary. The
steps can be found in [1].

I think we'd better fix the order in update_local_synced_slot() as well. I
tried to make the fix in 0002; 0001 is Shveta's patch to fix another issue
in this thread. Since they are touching the same function, I attach them
together for review.

[1]
-- Primary:
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true);

-- Standby:
SELECT 'init' FROM pg_create_logical_replication_slot('standbylogicalslot', 'test_decoding', false, false, false);
SELECT pg_sync_replication_slots();

-- Primary:
CREATE TABLE test (a int);
INSERT INTO test VALUES(1);
DROP TABLE test;
SELECT txid_current();
SELECT txid_current();
SELECT txid_current();
SELECT pg_log_standby_snapshot();
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());

-- Standby:
- Wait for the standby to replay all the changes on the primary.
- This is to serialize snapshots:
SELECT pg_replication_slot_advance('standbylogicalslot', pg_last_wal_replay_lsn());
- Use gdb to stop at the place after calling the ReplicationSlotsComputexx()
  functions and before calling ReplicationSlotSave().
SELECT pg_sync_replication_slots();

-- Primary:
- First, wait for the primary slot (the physical slot)'s catalog xmin to be
  updated to the same as the failover slot.
VACUUM FULL;
- Wait for the VACUUM FULL to be replayed on the standby.

-- Standby:
- For the process which is blocked by gdb, let the process crash
  (elog(PANIC, ...)).

After restarting the standby from the crash, we can see the synced slot is
invalidated:

LOG:  invalidating obsolete replication slot "logicalslot"
DETAIL:  The slot conflicted with xid horizon 741.
CONTEXT:  WAL redo at 0/3059B90 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 741, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 7, nunused: 0, dead: [22, 23, 24, 25, 26, 27, 28]; blkref #0: rel 1663/5/1249, blk 16

Best Regards,
Hou zj

v4-0002-write-the-changed-xmin-to-disk-before-computing-g.patch
v4-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
RE: Synchronizing slots from primary to standby
On Thursday, April 4, 2024 4:25 PM Masahiko Sawada wrote: Hi, > On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila > wrote: > > > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila > wrote: > > > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > > wrote: > > > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > > > +if (DecodingContextReady(ctx) && > found_consistent_snapshot) > > > > +*found_consistent_snapshot = true; > > > > > > > > Can the found_consistent_snapshot be checked first to help avoid > > > > the function call DecodingContextReady() for > > > > pg_replication_slot_advance callers? > > > > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > > message. I'll push this patch after some more testing. > > > > > > > Pushed! > > While testing this change, I realized that it could happen that the server > logs > are flooded with the following logical decoding logs that are written every > 200 > ms: Thanks for reporting! > > 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.270 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: There are no running > transactions. > 2024-04-04 16:15:19.477 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.477 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: There are no running > transactions. > > For example, I could reproduce it with the following steps: > > 1. create the primary and start. > 2. run "pgbench -i -s 100" on the primary. > 3. 
run pg_basebackup to create the standby. > 4. configure slotsync setup on the standby and start. > 5. create a publication for all tables on the primary. > 6. create the subscriber and start. > 7. run "pgbench -i -Idtpf" on the subscriber. > 8. create a subscription on the subscriber (initial data copy will start). > > The logical decoding logs were written every 200 ms during the initial data > synchronization. > > Looking at the new changes for update_local_synced_slot(): ... > We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn, > restart_lsn, and catalog_xmin is different between the remote slot and the local > slot. In my test case, while the initial sync was being performed, only catalog_xmin was > different and there was no serialized snapshot at restart_lsn, so the slotsync > worker called LogicalSlotAdvanceAndCheckSnapState(). However, no slot > properties were changed even after the function, yet it set slot_updated = > true. > So it starts the next slot synchronization after 200 ms. I was trying to reproduce this and check why the catalog_xmin differs between the synced slot and the remote slot, but I was not able to reproduce the case where there are lots of logical decoding logs. The script I used is attached. Would it be possible for you to share the script you used to reproduce this issue? Alternatively, could you please share the log files from both the primary and standby servers after reproducing the problem (it would be a great help if you could set the log level to DEBUG2)? Best Regards, Hou zj test.sh Description: test.sh
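The repeated-decoding behavior described above — a catalog_xmin-only difference re-triggering decoding every ~200 ms — can be modeled with a small sketch. The field and function names here are illustrative, not the actual slotsync.c code:

```python
from dataclasses import dataclass

@dataclass
class Slot:
    confirmed_lsn: int
    restart_lsn: int
    catalog_xmin: int

def needs_advance(local: Slot, remote: Slot) -> bool:
    # The sync worker considers the local slot out of date if any of
    # the three synced properties differs from the remote slot.
    return (local.confirmed_lsn != remote.confirmed_lsn
            or local.restart_lsn != remote.restart_lsn
            or local.catalog_xmin != remote.catalog_xmin)

def sync_cycle(local: Slot, remote: Slot) -> bool:
    """One sync-worker pass; returns the slot_updated flag."""
    if not needs_advance(local, remote):
        return False  # nothing to do; the worker can nap longer
    # LogicalSlotAdvanceAndCheckSnapState() would run here.  In the
    # reported case it emits the "starting logical decoding" LOG lines
    # but changes no slot property, yet slot_updated is still set, so
    # the worker retries after the short (~200 ms) nap and logs again.
    return True

# Only catalog_xmin differs, as in the reported test case:
local = Slot(0x50006F48, 0x50006F10, 740)
remote = Slot(0x50006F48, 0x50006F10, 741)
assert sync_cycle(local, remote) is True
```

Under this model, every cycle in which decoding cannot actually resolve the difference produces a fresh burst of decoding LOG messages, matching the flood shown above.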
RE: Synchronizing slots from primary to standby
On Monday, April 8, 2024 6:32 PM Amit Kapila wrote: > > On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Saturday, April 6, 2024 12:43 PM Amit Kapila > wrote: > > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > > > wrote: > > > > > > Yeah, that could be the first step. We can probably add an injection > > > point to control the bgwriter behavior and then add tests involving > > > walsender performing the decoding. But I think it is important to > > > have sufficient tests in this area as I see they are quite helpful in > > > uncovering the issues. > > > > Here is the patch to drop the subscription in the beginning so that > > the restart_lsn of the lsub1_slot won't be advanced due to concurrent > > xl_running_xacts from bgwriter. The subscription will be re-created > > after all the slots are sync-ready. I think maybe we can use this to > > stabilize the test as a first step and then think about how to make > > use of injection point to add more tests if it's worth it. > > > > Pushed. Thanks for pushing. I checked the BF status and noticed one BF failure, which I think is caused by an omission in the test code. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 From the following log, I can see the sync failed because the standby is lagging behind the failover slot. - # No postmaster PID for node "cascading_standby" error running SQL: 'psql::1: ERROR: skipping slot synchronization as the received slot sync LSN 0/4000148 for slot "snap_test_slot" is ahead of the standby position 0/4000114' while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_sync_replication_slots();' at /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 2042. # Postmaster PID for node "publisher" is 3715298 - I think it's because we missed calling wait_for_replay_catchup before syncing slots.
- $primary->safe_psql('postgres', "SELECT pg_create_logical_replication_slot('snap_test_slot', 'test_decoding', false, false, true);" ); # ? missed to wait here $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); - While testing, I noticed another place where we were calling wait_for_replay_catchup before doing pg_replication_slot_advance, which also leaves a small window for the failover slot to get ahead of the standby if some WAL is written between these two steps. So, I adjusted them together. Here is a small patch to improve the test. Best Regards, Hou zj 0001-Ensure-the-standby-is-not-lagging-behind-the-failove.patch Description: 0001-Ensure-the-standby-is-not-lagging-behind-the-failove.patch
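The failing check above can be illustrated with a rough sketch (the helper names are made up for illustration; the real check lives in the slot sync code): the standby refuses to sync a slot whose received LSN is ahead of its own replay position, with LSNs in the usual high/low 32-bit %X/%X notation:

```python
def parse_lsn(text: str) -> int:
    """Parse an LSN in PostgreSQL's %X/%X notation into a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def can_sync(remote_slot_lsn: str, standby_replay_lsn: str) -> bool:
    # Sync is skipped when the received slot LSN is ahead of the
    # standby's replay position (mirrors the ERROR in the buildfarm log).
    return parse_lsn(remote_slot_lsn) <= parse_lsn(standby_replay_lsn)

# The failing buildfarm case: slot at 0/4000148, standby only at 0/4000114.
assert can_sync("0/4000148", "0/4000114") is False
# After wait_for_replay_catchup the standby has replayed past the slot LSN:
assert can_sync("0/4000148", "0/4000160") is True
```

This is why the missing wait_for_replay_catchup call matters: it is what guarantees the standby's replay position has moved past the slot's LSN before pg_sync_replication_slots() runs.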
RE: Synchronizing slots from primary to standby
On Saturday, April 6, 2024 12:43 PM Amit Kapila wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > wrote: > > > > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila > wrote: > > > Thinking more on this, it doesn't seem related to > > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't > > > change any locking or something like that which impacts write positions. > > > > Agree. > > > > > I think what has happened here is that running_xact record written > > > by the background writer [1] is not written to the kernel or disk > > > (see LogStandbySnapshot()), before pg_current_wal_lsn() checks the > > > current_lsn to be compared with replayed LSN. > > > > Agree, I think it's not visible through pg_current_wal_lsn() yet. > > > > Also I think that the DEBUG message in LogCurrentRunningXacts() > > > > " > > elog(DEBUG2, > > "snapshot of %d+%d running transaction ids (lsn %X/%X oldest > xid %u latest complete %u next xid %u)", > > CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, > > LSN_FORMAT_ARGS(recptr), > > CurrRunningXacts->oldestRunningXid, > > CurrRunningXacts->latestCompletedXid, > > CurrRunningXacts->nextXid); " > > > > should be located after the XLogSetAsyncXactLSN() call. Indeed, the > > new LSN is visible after the spinlock (XLogCtl->info_lck) in > > XLogSetAsyncXactLSN() is released, > > > > I think the new LSN can be visible only when the corresponding WAL is written > by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can make it > visible. In your experiment below, isn't it possible that in the meantime WAL > writer has written that WAL due to which you are seeing an updated location? 
> > >see: > > > > \watch on Session 1 provides: > > > > pg_current_wal_lsn > > > > 0/87D110 > > (1 row) > > > > Until: > > > > Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at > xlog.c:2579 > > 2579 XLogRecPtr WriteRqstPtr = asyncXactLSN; > > (gdb) n > > 2581 bool wakeup = false; > > (gdb) > > 2584 SpinLockAcquire(&XLogCtl->info_lck); > > (gdb) > > 2585 RefreshXLogWriteResult(LogwrtResult); > > (gdb) > > 2586 sleeping = XLogCtl->WalWriterSleeping; > > (gdb) > > 2587 prevAsyncXactLSN = XLogCtl->asyncXactLSN; > > (gdb) > > 2588 if (XLogCtl->asyncXactLSN < asyncXactLSN) > > (gdb) > > 2589 XLogCtl->asyncXactLSN = asyncXactLSN; > > (gdb) > > 2590 SpinLockRelease(&XLogCtl->info_lck); > > (gdb) p/x (uint32) XLogCtl->asyncXactLSN > > $1 = 0x87d148 > > > > Then session 1 provides: > > > > pg_current_wal_lsn > > > > 0/87D148 > > (1 row) > > > > So, when we see in the log: > > > > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: > > snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 > > latest complete 739 next xid 740) > > 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: > statement: SELECT '0/360' <= replay_lsn AND state = 'streaming' > > > > It's indeed possible that the new LSN was not visible yet (spinlock > > not released?) before the query began (because we can not rely on the > > time the DEBUG message has been emitted). > > > > > Note that the reason why > > > walsender has picked the running_xact written by background writer > > > is because it has checked after pg_current_wal_lsn() query, see LOGs [2]. > > > I think we can probably try to reproduce manually via debugger. > > > > > > If this theory is correct > > > > I think it is. > > > > > then I think we will need to use injection points to control the > > > behavior of bgwriter or use the slots created via SQL API for > > > syncing in tests. > > > > > > Thoughts?
> > > > I think that maybe as a first step we should move the "elog(DEBUG2," > > message as proposed above to help debugging (that could help to confirm > the above theory). > > > > I think I am missing how exactly moving DEBUG2 can confirm the above theory. > > > If the theory is proven then I'm not sure we need the extra complexity > > of injection point here, maybe just relying on the slots created via > > SQL API could be enough. > > > > Yeah, that could be the first step. We can probably add an injection point to > control the bgwriter behavior and then add tests involving walsender > performing the decoding. But I think it is important to have sufficient tests in > this area as I see they are quite helpful in uncovering the issues. Here is the patch to drop the subscription in the beginning so that the restart_lsn of the lsub1_slot won't be advanced due to concurrent xl_running_xacts from bgwriter. The subscription will be re-created after all the slots are sync-ready. I think maybe we can use this to stabilize the test as a first step and then think about how to make use of injection point to add more tests if it's worth it.
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 8:49 PM Bharath Rupireddy wrote: > > On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu) > wrote: > > > > CFbot[1] complained about one query result's order in the tap-test, so I am > > attaching a V7 patch set which fixed this. There are no changes in 0001. > > > > [1] https://cirrus-ci.com/task/6375962162495488 > > Thanks. Here are some comments: Thanks for the comments. > > 1. Can we just remove pg_logical_replication_slot_advance and use > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to > pg_logical_replication_slot_advance? > > + * Advance our logical replication slot forward. See > + * LogicalSlotAdvanceAndCheckSnapState for details. > */ > static XLogRecPtr > pg_logical_replication_slot_advance(XLogRecPtr moveto) > { It was commented[1] that it's not appropriate for pg_logical_replication_slot_advance to have an out parameter 'ready_for_decoding', which looks a bit awkward as the functionality doesn't match the name, and is also not consistent with the style of pg_physical_replication_slot_advance(). So, we added a new function. > > 2. > + if (!ready_for_decoding) > + { > + elog(DEBUG1, "could not find consistent point for synced > slot; restart_lsn = %X/%X", > + LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > Can we specify the slot name in the message? Added. > > 3. Also, can the "could not find consistent point for synced slot; > restart_lsn = %X/%X" be emitted at LOG level just like other messages > in update_and_persist_local_synced_slot. Although, I see "XXX should > this be changed to elog(DEBUG1) perhaps?", these messages need to be > at LOG level as they help debug issues if at all they are hit. Changed to LOG and reworded the message. > > 4. How about using found_consistent_snapshot instead of > ready_for_decoding?
A general understanding is that the synced slots > are not allowed for decoding (although with this fix, we do that for > internal purposes), ready_for_decoding looks a bit misleading. Agreed and renamed. > > 5. As far as the test case for this issue is concerned, I'm fine with > adding one using an INJECTION point because we seem to be having no > consistent way to control postgres writing current snapshot to WAL. Since my colleagues and I can reproduce the issue consistently after applying 0002, and it could be rare for concurrent xl_running_xacts to happen, we discussed[2] adding the INJECTION point after pushing the main fix. > > 6. A nit: can we use "fast_forward mode" instead of "fast-forward > mode" just to be consistent? > + * logical changes unless we are in fast-forward mode where no changes > are > > 7. > + /* > + * We need to access the system tables during decoding to build the > + * logical changes unless we are in fast-forward mode where no changes are > + * generated. > + */ > + if (slot->data.database != MyDatabaseId && !fast_forward) > > May I know if we need this change for this fix? The slotsync worker needs to advance slots from different databases in fast_forward mode. So, we need to skip this check in fast_forward mode. The analysis can be found in [3]. Attach the V8 patch which addressed the above comments. [1] https://www.postgresql.org/message-id/CAA4eK1%2BwkaRi2BrLLC_0gKbHN68Awc9dRp811G3An6A6fuqdOg%40mail.gmail.com [2] https://www.postgresql.org/message-id/ZgvI9iAUWCZ17z5V%40ip-10-97-1-34.eu-west-3.compute.internal [3] https://www.postgresql.org/message-id/CAJpy0uCQ2PDCAqcnbdOz6q_ZqmBfMyBpVqKDqL_XZBP%3DeK-1yw%40mail.gmail.com Best Regards, Hou zj v8-0002-test-the-data-loss-case.patch Description: v8-0002-test-the-data-loss-case.patch v8-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v8-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 3:21 PM Zhijie Hou (Fujitsu) wrote: > On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Monday, April 1, 2024 7:30 PM Amit Kapila > > wrote: > > > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > > > > > wrote: > > > > > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > 2. > > > > > +extern XLogRecPtr > > > > > +pg_logical_replication_slot_advance(XLogRecPtr > > > moveto, > > > > > + bool *found_consistent_point); > > > > > + > > > > > > > > > > This API looks a bit awkward as the functionality doesn't match > > > > > the name. How about having a function with name > > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > > > ready_for_decoding) with the same functionality as your patch > > > > > has for > > > > > pg_logical_replication_slot_advance() and then invoke it both > > > > > from pg_logical_replication_slot_advance and slotsync.c. The > > > > > function name is too big, we can think of a shorter name. Any ideas? > > > > > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > > > LogicalSlotAdvanceAndCheckDecoding()? > > > > > > > > > > It is about snapbuild state, so how about naming the function as > > > LogicalSlotAdvanceAndCheckSnapState()? > > > > It looks better to me, so changed. > > > > > > > > I have made quite a few cosmetic changes in comments and code. See > > > attached. This is atop your latest patch. Can you please review and > > > include these changes in the next version? > > > > Thanks, I have reviewed and merged them. > > Attach the V5 patch set which addressed above comments and ran pgindent. > > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which > can > reproduce the data loss issue consistently on my machine. 
It may not > reproduce in some rare cases if concurrent xl_running_xacts are written by > bgwriter, but I think it's still valuable if it can verify the fix in most > cases. The test > will fail if directly applied on HEAD, and will pass after applying atop of > 0001. CFbot[1] complained about one query result's order in the tap-test, so I am attaching a V7 patch set which fixed this. There are no changes in 0001. [1] https://cirrus-ci.com/task/6375962162495488 Best Regards, Hou zj v7-0002-test-the-data-loss-case.patch Description: v7-0002-test-the-data-loss-case.patch v7-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v7-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) wrote: > > On Monday, April 1, 2024 7:30 PM Amit Kapila > wrote: > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > > > > > > > > 2. > > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > > moveto, > > > > + bool *found_consistent_point); > > > > + > > > > > > > > This API looks a bit awkward as the functionality doesn't match > > > > the name. How about having a function with name > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > > ready_for_decoding) with the same functionality as your patch has > > > > for > > > > pg_logical_replication_slot_advance() and then invoke it both from > > > > pg_logical_replication_slot_advance and slotsync.c. The function > > > > name is too big, we can think of a shorter name. Any ideas? > > > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > > LogicalSlotAdvanceAndCheckDecoding()? > > > > > > > It is about snapbuild state, so how about naming the function as > > LogicalSlotAdvanceAndCheckSnapState()? > > It looks better to me, so changed. > > > > > I have made quite a few cosmetic changes in comments and code. See > > attached. This is atop your latest patch. Can you please review and > > include these changes in the next version? > > Thanks, I have reviewed and merged them. > Attach the V5 patch set which addressed above comments and ran pgindent. I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can reproduce the data loss issue consistently on my machine. It may not reproduce in some rare cases if concurrent xl_running_xacts are written by bgwriter, but I think it's still valuable if it can verify the fix in most cases. The test will fail if directly applied on HEAD, and will pass after applying atop of 0001. 
Best Regards, Hou zj v6-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v6-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch v6-0002-test-the-data-loss-case.patch Description: v6-0002-test-the-data-loss-case.patch
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 8:43 AM Bharath Rupireddy wrote: > > On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu) > wrote: > > > > Attach the V4 patch which includes the optimization to skip the > > decoding if the snapshot at the syncing restart_lsn is already > > serialized. It can avoid most of the duplicate decoding in my test, and I am > doing some more tests locally. > > Thanks for the patch. I'm thinking if we can reduce the amount of work that we > do for synced slots in each sync worker cycle. With that context in mind, why > do > we need to create decoding context every time? > Can't we create it once, store it in an in-memory structure and use it for > each > sync worker cycle? Is there any problem with it? What do you think? Thanks for the idea. The cost of creating the decoding context seems relatively minor compared to the I/O cost. After generating profiles for the tests shared by Nisha[1], it appears that StartupDecodingContext is not an issue. While the suggested refactoring is an option, I think we can consider it a future improvement and address it only if we encounter scenarios where the creation of the decoding context becomes a bottleneck. [1] https://www.postgresql.org/message-id/CALj2ACUeij5tFzJ1-cuoUh%2Bmhj33v%2BYgqD_gHYUpRdXSCSBbhw%40mail.gmail.com Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Monday, April 1, 2024 7:30 PM Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > wrote: > > > > > > > > > > > > > > 2. > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, > > > + bool *found_consistent_point); > > > + > > > > > > This API looks a bit awkward as the functionality doesn't match the > > > name. How about having a function with name > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > ready_for_decoding) with the same functionality as your patch has > > > for > > > pg_logical_replication_slot_advance() and then invoke it both from > > > pg_logical_replication_slot_advance and slotsync.c. The function > > > name is too big, we can think of a shorter name. Any ideas? > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > LogicalSlotAdvanceAndCheckDecoding()? > > > > It is about snapbuild state, so how about naming the function as > LogicalSlotAdvanceAndCheckSnapState()? It looks better to me, so changed. > > I have made quite a few cosmetic changes in comments and code. See > attached. This is atop your latest patch. Can you please review and include > these changes in the next version? Thanks, I have reviewed and merged them. Attach the V5 patch set which addressed above comments and ran pgindent. I will think and test the improvement suggested by Bertrand[1] and reply after that. [1] https://www.postgresql.org/message-id/Zgp8n9QD5nYSESnM%40ip-10-97-1-34.eu-west-3.compute.internal Best Regards, Hou zj v5-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v5-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila > wrote: > > > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > > > > Attach a new version patch which fixed an un-initialized variable > > > issue and added some comments. > > > > > > > The other approach to fix this issue could be that the slotsync worker > > get the serialized snapshot using pg_read_binary_file() corresponding > > to restart_lsn and writes those at standby. But there are cases when > > we won't have such a file like (a) when we initially create the slot > > and reach the consistent_point, or (b) also by the time the slotsync > > worker starts to read the remote snapshot file, the snapshot file > > could have been removed by the checkpointer on the primary (if the > > restart_lsn of the remote has been advanced in this window). So, in > > such cases, we anyway need to advance the slot. I think these could be > optimizations that we could do in the future. > > > > Few comments: > > Thanks for the comments. > > > = > > 1. > > - if (slot->data.database != MyDatabaseId) > > + if (slot->data.database != MyDatabaseId && !fast_forward) > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("replication slot \"%s\" was not created in this database", > > @@ -526,7 > > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, > > * Do not allow consumption of a "synchronized" slot until the standby > > * gets promoted. > > */ > > - if (RecoveryInProgress() && slot->data.synced) > > + if (RecoveryInProgress() && slot->data.synced && > > + !IsSyncingReplicationSlots()) > > > > > > Add comments at both of the above places. > > Added. > > > > > > > 2. > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, > > + bool *found_consistent_point); > > + > > > > This API looks a bit awkward as the functionality doesn't match the > > name. 
How about having a function with name > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > ready_for_decoding) with the same functionality as your patch has for > > pg_logical_replication_slot_advance() and then invoke it both from > > pg_logical_replication_slot_advance and slotsync.c. The function name > > is too big, we can think of a shorter name. Any ideas? > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested > LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be > renamed in next version if we agree). > > Attach the V3 patch which addressed above comments and Kuroda-san's > comments[1]. I also adjusted the tap-test to only check the > confirmed_flush_lsn > after syncing, as the restart_lsn could be different from the remote one due > to > the new slot_advance() call. I am also testing some optimization idea locally > and > will share if ready. Attach the V4 patch which includes the optimization to skip the decoding if the snapshot at the syncing restart_lsn is already serialized. It can avoid most of the duplicate decoding in my test, and I am doing some more tests locally. Best Regards, Hou zj v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Friday, March 29, 2024 2:50 PM Amit Kapila wrote: > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > Attach a new version patch which fixed an un-initialized variable > > issue and added some comments. > > > > The other approach to fix this issue could be that the slotsync worker get the > serialized snapshot using pg_read_binary_file() corresponding to restart_lsn > and writes those at standby. But there are cases when we won't have such a > file > like (a) when we initially create the slot and reach the consistent_point, or > (b) > also by the time the slotsync worker starts to read the remote snapshot file, > the > snapshot file could have been removed by the checkpointer on the primary (if > the restart_lsn of the remote has been advanced in this window). So, in such > cases, we anyway need to advance the slot. I think these could be > optimizations > that we could do in the future. > > Few comments: Thanks for the comments. > = > 1. > - if (slot->data.database != MyDatabaseId) > + if (slot->data.database != MyDatabaseId && !fast_forward) > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("replication slot \"%s\" was not created in this database", @@ -526,7 > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, > * Do not allow consumption of a "synchronized" slot until the standby > * gets promoted. > */ > - if (RecoveryInProgress() && slot->data.synced) > + if (RecoveryInProgress() && slot->data.synced && > + !IsSyncingReplicationSlots()) > > > Add comments at both of the above places. Added. > > > 2. > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, > + bool *found_consistent_point); > + > > This API looks a bit awkward as the functionality doesn't match the name. 
How > about having a function with name > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > ready_for_decoding) with the same functionality as your patch has for > pg_logical_replication_slot_advance() and then invoke it both from > pg_logical_replication_slot_advance and slotsync.c. The function name is too > big, we can think of a shorter name. Any ideas? How about LogicalSlotAdvanceAndCheckDecodingState() Or just LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be renamed in next version if we agree). Attach the V3 patch which addressed above comments and Kuroda-san's comments[1]. I also adjusted the tap-test to only check the confirmed_flush_lsn after syncing, as the restart_lsn could be different from the remote one due to the new slot_advance() call. I am also testing some optimization idea locally and will share if ready. [1] https://www.postgresql.org/message-id/TYCPR01MB1207757BB2A32B6815CE1CCE7F53A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com Best Regards, Hou zj v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Friday, March 29, 2024 2:48 PM Bertrand Drouvot wrote: > > Hi, > > On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote: > > Attach a new version patch which fixed an uninitialized variable > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > Thanks! > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > + { > + /* > + * By advancing the restart_lsn, confirmed_lsn, and xmin using > + * fast-forward logical decoding, we ensure that the required snapshots > + * are saved to disk. This enables logical decoding to quickly reach a > + * consistent point at the restart_lsn, eliminating the risk of missing > + * data during snapshot creation. > + */ > + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > + found_consistent_point); > + ReplicationSlotsComputeRequiredLSN(); > + updated_lsn = true; > + } > > Instead of using pg_logical_replication_slot_advance() for each synced slot and > during sync cycles what about?: > > - keep sync slot synchronization as it is currently (not using > pg_logical_replication_slot_advance()) > - create "a hidden" logical slot if the sync slot feature is on > - at the time of promotion use pg_logical_replication_slot_advance() on this > hidden slot only to advance to the max lsn of the synced slots > > I'm not sure that would be enough, just asking your thoughts on this (the benefit > would be to avoid calling pg_logical_replication_slot_advance() on each synced > slot during the sync cycles). Thanks for the idea! I thought about this. I think advancing the "hidden" slot on promotion may be a bit late, because if we cannot reach the consistent point after advancing the "hidden" slot, then it means we may need to remove all the synced slots, as we are not sure if they are usable (i.e., will not lose data) after promotion.
It may also confuse users a bit, as they have already seen these slots become sync-ready. The current approach is to mark such not-yet-consistent slots as temporary and persist them once they reach a consistent point, so that users can be sure a persisted slot is usable after promotion. Another optimization idea is to check for the existence of the snapshot file before calling slot_advance(). If the file already exists, we skip the decoding and directly update the restart_lsn. This way, we could also avoid some duplicate decoding work. Best Regards, Hou zj
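The snapshot-file-existence idea could look roughly like this — a minimal sketch assuming serialized snapshots are named after their LSN as pg_logical/snapshots/%X-%X.snap (the naming SnapBuildSerialize uses); the helper names and the directory argument are made up for illustration:

```python
import os

def snapshot_path(snapshots_dir: str, restart_lsn: int) -> str:
    # Serialized snapshot file name for a given LSN, %X-%X style.
    hi = restart_lsn >> 32
    lo = restart_lsn & 0xFFFFFFFF
    return os.path.join(snapshots_dir, "%X-%X.snap" % (hi, lo))

def need_decoding(snapshots_dir: str, restart_lsn: int) -> bool:
    # Skip the fast-forward decoding pass when a serialized snapshot
    # already exists at the syncing restart_lsn; otherwise decode so
    # that a snapshot gets serialized for use after promotion.
    return not os.path.exists(snapshot_path(snapshots_dir, restart_lsn))

# e.g. restart_lsn 0/50006F10 maps to pg_logical/snapshots/0-50006F10.snap
assert snapshot_path("pg_logical/snapshots",
                     0x50006F10).endswith("0-50006F10.snap")
```

The point of the optimization is that when the file is already on disk, logical decoding would find a consistent snapshot immediately anyway, so the decoding pass adds nothing.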
RE: Synchronizing slots from primary to standby
On Thursday, March 28, 2024 10:02 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, March 28, 2024 7:32 PM Amit Kapila > wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > When analyzing one BF error[1], we found an issue with slotsync: Since > > > we don't perform logical decoding for the synced slots when syncing > > > the lsn/xmin of the slot, no logical snapshots will be serialized to > > > disk. So, when a user starts to use these synced slots after > > > promotion, logical decoding needs to re-build the consistent snapshot > > > from the restart_lsn if the WAL (xl_running_xacts) at the restart_lsn > > > position indicates that there are running transactions. This, however, > > > could cause data from before the consistent point to be missed[2]. > > > > > > This issue doesn't exist on the primary because the snapshot at > > > restart_lsn should have been serialized to disk > > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if > > > logical decoding restarts, it can find a consistent snapshot > > > immediately at restart_lsn. > > > > > > To fix this, we could use fast-forward logical decoding to > > > advance the synced slot's lsn/xmin when syncing these values instead > > > of directly updating the slot's info. This way, the snapshot will be > > > serialized to disk when decoding. > > > If we cannot reach the consistent point at the remote > > > restart_lsn, the slot is marked as temp and will be persisted once > > > it reaches the consistent point. I am still analyzing the fix and > > > will share it once ready. > > > > > > > Yes, we can use this but one thing to note is that > > CreateDecodingContext() will expect that the slot's database and the > > current database are the same. I think the reason for that is we need to > > check system tables of the current database while decoding and sending > > data to the output_plugin, which won't be a requirement for the > > fast_forward case.
So, we need to skip that check in fast_forward mode. > > Agreed. > > > > Next, I was thinking about the case of the first time updating the > > restart and confirmed_flush LSN while syncing the slots. I think we > > can keep the current logic as it is based on the following analysis. > > > > For each logical slot, cases possible on the primary: > > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet > > reached the consistent point. > > 2. The restart_lsn doesn't have a serialized snapshot but has reached > > a consistent point. > > 3. The restart_lsn has a serialized snapshot, which means it has > > reached a consistent point as well. > > > > Considering we keep the logic to reserve initial WAL positions the > > same as the current (Reserve WAL for the currently active local slot > > using the specified WAL location (restart_lsn). If the given WAL > > location has been removed, reserve WAL using the oldest existing WAL > > segment.), I could think of the below scenarios: > > A. For 1, we shouldn't sync the slot as it still wouldn't have been > > marked persistent on the primary. > > B. For 2, we would sync the slot > > B1. If remote_restart_lsn >= local_restart_lsn, then advance the > > slot by calling pg_logical_replication_slot_advance(). > > B11. If we reach the consistent point, then it should be okay > > because after promotion as well we should reach the consistent point. > > B111. But again, is it possible that there is some xact > > that comes before the consistent_point on the primary and the same is after > > the consistent_point on the standby? This shouldn't matter as we will start > > decoding transactions after confirmed_flush_lsn, which would be the same on > the primary and standby. > > B22. If we haven't reached the consistent_point, then we won't mark > > the slot as persistent, and at the next sync we will do the same till > > it reaches the consistent_point. At that time, the situation will be similar to > > B11. > > B2.
If remote_restart_lsn < local_restart_lsn, then we will wait > > for the next sync cycle and keep the slot as temporary. Once in the > > next or some consecutive sync cycle, we reach the condition > > remote_restart_lsn >= local_restart_lsn, we will proceed to advance > > the slot and we should have the same behavior as B1. > > C. For 3, we would sync the slot, but the behavior should be the same as B. > > > > Thoughts? > > Looks reasonable to me. > > Here is the patch based on above lines. > I am also testing and verifying the patch locally. Attached a new version of the patch, which fixes an uninitialized variable issue and adds some comments. Also, DEBUG2 is temporarily enabled for the 040 tap-test so that we can analyze possible CFbot failures more easily. Best Regards, Hou zj v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
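The A/B/C analysis above can be summarized as a small standalone decision function. This is only an illustrative sketch, not PostgreSQL code: the state names and the `sync_decision` helper are invented for this model, and the real slot sync advances slots via fast-forward decoding rather than a plain comparison.

```python
# Hypothetical model of the slot-sync decision discussed above.
# "remote_consistent" means the remote slot's restart_lsn has reached a
# consistent decoding point (cases 2 and 3); case 1 has not.

def sync_decision(remote_consistent, remote_restart_lsn, local_restart_lsn):
    # A: the remote slot hasn't reached a consistent point, so it would
    # still be temporary on the primary; don't sync it at all.
    if not remote_consistent:
        return "skip"
    # B2: the remote slot is behind our local WAL reservation; keep the
    # synced slot temporary and retry on the next sync cycle.
    if remote_restart_lsn < local_restart_lsn:
        return "keep_temporary_and_retry"
    # B1 (and C): advance the local slot (via fast-forward decoding in the
    # patch); it is persisted once a consistent point is reached locally.
    return "advance"

print(sync_decision(False, 100, 50))  # case A
print(sync_decision(True, 40, 50))    # case B2
print(sync_decision(True, 100, 50))   # cases B1 / C
```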
RE: Synchronizing slots from primary to standby
On Thursday, March 28, 2024 7:32 PM Amit Kapila wrote: > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > wrote: > > > > When analyzing one BF error[1], we find an issue of slotsync: Since we > > don't perform logical decoding for the synced slots when syncing the > > lsn/xmin of slot, no logical snapshots will be serialized to disk. So, > > when user starts to use these synced slots after promotion, it needs > > to re-build the consistent snapshot from the restart_lsn if the > > WAL(xl_running_xacts) at restart_lsn position indicates that there are > > running transactions. This however could cause the data that before the > consistent point to be missed[2]. > > > > This issue doesn't exist on the primary because the snapshot at > > restart_lsn should have been serialized to disk > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the > > logical decoding restarts, it can find consistent snapshot immediately at > restart_lsn. > > > > To fix this, we could use the fast forward logical decoding to advance > > the synced slot's lsn/xmin when syncing these values instead of > > directly updating the slot's info. This way, the snapshot will be > > serialized to > disk when decoding. > > If we could not reach to the consistent point at the remote > > restart_lsn, the slot is marked as temp and will be persisted once it > > reaches the consistent point. I am still analyzing the fix and will share > > once > ready. > > > > Yes, we can use this but one thing to note is that > CreateDecodingContext() will expect that the slot's and current database are > the same. I think the reason for that is we need to check system tables of the > current database while decoding and sending data to the output_plugin which > won't be a requirement for the fast_forward case. So, we need to skip that > check in fast_forward mode. Agreed. > > Next, I was thinking about the case of the first time updating the restart and > confirmed_flush LSN while syncing the slots. 
I think we can keep the current > logic as it is based on the following analysis. > > For each logical slot, cases possible on the primary: > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet reached > the > consistent point. > 2. The restart_lsn doesn't have a serialized snapshot but has reached a > consistent point. > 3. The restart_lsn has a serialized snapshot which means it has reached a > consistent point as well. > > Considering we keep the logic to reserve initial WAL positions the same as the > current (Reserve WAL for the currently active local slot using the specified > WAL > location (restart_lsn). If the given WAL location has been removed, reserve > WAL using the oldest existing WAL segment.), I could think of the below > scenarios: > A. For 1, we shouldn't sync the slot as it still wouldn't have been marked > persistent on the primary. > B. For 2, we would sync the slot >B1. If remote_restart_lsn >= local_resart_lsn, then advance the slot by > calling > pg_logical_replication_slot_advance(). >B11. If we reach consistent point, then it should be okay because after > promotion as well we should reach consistent point. > B111. But again is it possible that there is some xact that comes > before consistent_point on primary and the same is after consistent_point on > standby? This shouldn't matter as we will start decoding transactions after > confirmed_flush_lsn which would be the same on primary and standby. >B22. If we haven't reached consistent_point, then we won't mark the > slot > as persistent, and at the next sync we will do the same till it reaches > consistent_point. At that time, the situation will be similar to B11. >B2. If remote_restart_lsn < local_restart_lsn, then we will wait for the > next > sync cycle and keep the slot as temporary. 
Once in the next or some > consecutive sync cycle, we reach the condition remote_restart_lsn >= > local_restart_lsn, we will proceed to advance the slot and we should have the > same behavior as B1. > C. For 3, we would sync the slot, but the behavior should be the same as B. > > Thoughts? Looks reasonable to me. Here is the patch based on above lines. I am also testing and verifying the patch locally. Best Regards, Hou zj 0001-advance-the-restart_lsn-of-synced-slots-using-logica.patch Description: 0001-advance-the-restart_lsn-of-synced-slots-using-logica.patch
RE: Synchronizing slots from primary to standby
Hi, When analyzing one BF error[1], we found an issue with slotsync: since we don't perform logical decoding for the synced slots when syncing the lsn/xmin of a slot, no logical snapshots will be serialized to disk. So, when a user starts to use these synced slots after promotion, the decoder needs to re-build the consistent snapshot from the restart_lsn if the WAL (xl_running_xacts) at the restart_lsn position indicates that there are running transactions. This could cause data before the consistent point to be missed[2]. This issue doesn't exist on the primary because the snapshot at restart_lsn should have been serialized to disk (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if logical decoding restarts, it can find a consistent snapshot immediately at restart_lsn. To fix this, we could use fast-forward logical decoding to advance the synced slot's lsn/xmin when syncing these values, instead of directly updating the slot's info. This way, the snapshot will be serialized to disk during decoding. If we cannot reach the consistent point at the remote restart_lsn, the slot is marked as temporary and will be persisted once it reaches the consistent point. I am still analyzing the fix and will share it once ready. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2024-03-19%2010%3A03%3A06 [2] The steps to reproduce the data miss issue on a primary->standby setup: Note, we need to set LOG_SNAPSHOT_INTERVAL_MS to a bigger number (150) to prevent a concurrent LogStandbySnapshot() call, and enable sync_replication_slots on the standby. 1. Create a failover logical slot on the primary. SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true); 2. Use the following steps to advance the restart_lsn of the failover slot to a position where the xl_running_xacts at that position indicates that there is a running transaction.
TXN1: BEGIN; create table dummy1(a int);
TXN2: SELECT pg_log_standby_snapshot();
TXN1: COMMIT;
TXN1: BEGIN; create table dummy2(a int);
TXN2: SELECT pg_log_standby_snapshot();
TXN1: COMMIT;

-- The restart_lsn will be advanced to a position where there was 1 running
-- transaction, and we need to wait for the restart_lsn to be synced to the standby.
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());

-- Insert some data here before calling the next pg_log_standby_snapshot().
INSERT INTO reptable VALUES(999);

3. Promote the standby and try to consume the change (999) from the synced slot on the standby. We will find that no change is returned.

select * from pg_logical_slot_get_changes('logicalslot', NULL, NULL);

Best Regards, Hou zj
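To see why the change can be skipped, the snapshot-building behavior can be modeled roughly: decoding starts at restart_lsn, and changes are only emitted once a consistent snapshot exists. The following is a toy model with invented record names, not the real snapbuild machinery:

```python
# Toy model of snapshot building during logical decoding. WAL records are
# (kind, payload) tuples; a change is emitted only after the snapshot
# builder has reached a consistent state.

def decode(wal, start_consistent):
    consistent = start_consistent  # True if a serialized snapshot was restored
    emitted = []
    for kind, payload in wal:
        if kind == "running_xacts":
            consistent = True      # simplification: assume no running xacts
        elif kind == "change" and consistent:
            emitted.append(payload)
    return emitted

wal = [("change", 999), ("running_xacts", None), ("change", 1000)]

# On the primary, the snapshot at restart_lsn was serialized: nothing is lost.
print(decode(wal, start_consistent=True))   # [999, 1000]

# On the promoted standby, no snapshot was serialized, so the change before
# the consistent point is silently skipped.
print(decode(wal, start_consistent=False))  # [1000]
```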
Buildfarm failure on tamandua - "timed out waiting for subscriber to synchronize data"
Hi, There is a failure in 040_standby_failover_slots_sync on tamandua: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua=2024-03-21%2008%3A28%3A58 The reason for the timeout is that the table sync worker for the new table is not started after executing ALTER SUBSCRIPTION REFRESH PUBLICATION. We have seen similar failures in other tests as well[1]. AFAICS, they all have the same cause: a race condition in the logical replication apply worker. The analysis has been posted on another thread[2], and the fix is also being reviewed. [1] https://www.postgresql.org/message-id/OSZPR01MB6310D6F48372F52F1D85E1C5FD609%40OSZPR01MB6310.jpnprd01.prod.outlook.com [2] https://www.postgresql.org/message-id/flat/CALDaNm1XeB3bF%2BVEJZi%3DBT31PZAL_UVys-26%2BYSv_AxCq0G2eg%40mail.gmail.com#87b153fd7676652746406a6f114eb67b Best Regards, Hou Zhijie
RE: Synchronizing slots from primary to standby
Hi, Since the standby_slot_names patch has been committed, I am attaching the last doc patch for review. Best Regards, Hou zj v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Wednesday, March 13, 2024 11:49 AM vignesh C wrote: > On Tue, 12 Mar 2024 at 09:34, Ajin Cherian wrote: > > > > > > > > On Tue, Mar 12, 2024 at 2:59 PM vignesh C wrote: > >> > >> > >> Thanks, I have created the following Commitfest entry for this: > >> https://commitfest.postgresql.org/47/4816/ > >> > >> Regards, > >> Vignesh > > > > > > Thanks for the patch, I have verified that the fix works well by following > > the > steps mentioned to reproduce the problem. > > Reviewing the patch, it seems good and is well documented. Just one minor > comment I had was probably to change the name of the variable > table_states_valid to table_states_validity. The current name made sense when > it was a bool, but now that it is a tri-state enum, it doesn't fit well. > > Thanks for reviewing the patch, the attached v6 patch has the changes for the > same. Thanks for the patches. I saw a recent similar BF error[1] which seems related to the issue that the 0001 patch is trying to solve, i.e., the table sync worker is somehow not started after refreshing the publication on the subscriber. I didn't see other related ERRORs in the log, so I think the reason is the same as the one being discussed in this thread, namely that the table state invalidation got lost. The 0001 patch looks good to me. For 0002, instead of avoiding resetting the latch, is it possible to let the logical rep worker wake up the launcher once after attaching? [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin=2024-03-11%2000%3A52%3A42 Best Regards, Hou zj
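The rename discussed above exists because a boolean valid flag can lose an invalidation that arrives while the table-state list is being rebuilt, which is why the variable became a tri-state. A standalone sketch of the idea (the state names and class are made up for illustration; the actual enum in the patch may differ):

```python
# Sketch of why a tri-state beats a boolean for cache invalidation.
# States: "needs_rebuild" -> "rebuild_started" -> "valid".

class TableStates:
    def __init__(self):
        self.validity = "needs_rebuild"

    def invalidate(self):
        # Called from the invalidation callback, possibly mid-rebuild.
        self.validity = "needs_rebuild"

    def fetch(self):
        if self.validity != "valid":
            self.validity = "rebuild_started"
            # ... the catalog would be read here; an invalidation arriving
            # now flips us back to "needs_rebuild" ...
            if self.validity == "rebuild_started":
                self.validity = "valid"
        return self.validity

t = TableStates()
print(t.fetch())                     # a plain rebuild ends up "valid"

# Simulate an invalidation firing in the middle of a rebuild: with a
# boolean flag the rebuild would blindly set valid=True and lose it;
# with the tri-state the final check refuses to mark the cache valid.
s = TableStates()
s.validity = "rebuild_started"
s.invalidate()
if s.validity == "rebuild_started":  # it isn't, so stay invalid
    s.validity = "valid"
print(s.validity)                    # needs_rebuild
```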
RE: Test failures of 100_bugs.pl
On Monday, April 24, 2023 5:50 PM Yu Shi (Fujitsu) wrote: > > On Fri, Apr 21, 2023 1:48 PM Yu Shi (Fujitsu) wrote: > > > > I wrote a patch to dump rel state in wait_for_subscription_sync() as > suggested. > > Please see the attached patch. > > I will try to add some debug logs in code later. > > > > Please see the attached v2 patch. > I added some debug logs when invalidating syncing table states and updating > table_states_not_ready list. I also adjusted the message level in the tests > which > failed before. Just for reference: I think a similar issue has been analyzed in another thread[1], and the reason seems clear: the table state cache invalidation got lost due to a race condition. The fix is also being discussed there. [1] https://www.postgresql.org/message-id/flat/711a6afe-edb7-1211-cc27-1bef8239eec7%40gmail.com Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Thursday, March 7, 2024 12:46 PM Amit Kapila wrote: > > On Thu, Mar 7, 2024 at 7:35 AM Peter Smith > wrote: > > > > Here are some review comments for v107-0001 > > > > == > > src/backend/replication/slot.c > > > > 1. > > +/* > > + * Struct for the configuration of standby_slot_names. > > + * > > + * Note: this must be a flat representation that can be held in a single > > chunk > > + * of guc_malloc'd memory, so that it can be stored as the "extra" data for > the > > + * standby_slot_names GUC. > > + */ > > +typedef struct > > +{ > > + int slot_num; > > + > > + /* slot_names contains nmembers consecutive nul-terminated C strings */ > > + char slot_names[FLEXIBLE_ARRAY_MEMBER]; > > +} StandbySlotConfigData; > > + > > > > 1a. > > To avoid any ambiguity this 1st field is somehow a slot ID number, I > > felt a better name would be 'nslotnames' or even just 'n' or 'count', > > > > We can probably just add a comment above slot_num and that should be > sufficient but I am fine with 'nslotnames' as well, in anycase let's > add a comment for the same. Added. > > > > > 6b. > > IMO this function would be tidier written such that the > > MyReplicationSlot->data.name is passed as a parameter. Then you can > > name the function more naturally like: > > > > IsSlotInStandbySlotNames(const char *slot_name) > > > > +1. How about naming it as SlotExistsinStandbySlotNames(char > *slot_name) and pass the slot_name from MyReplicationSlot? Otherwise, > we need an Assert for MyReplicationSlot in this function. Changed as suggested. > > Also, can we add a comment like below before the loop: > + /* > + * XXX: We are not expecting this list to be long so a linear search > + * shouldn't hurt but if that turns out not to be true then we can cache > + * this information for each WalSender as well. > + */ Added. Attach the V108 patch set which addressed above and Peter's comments. I also removed the check for "*" in guc check hook. 
Best Regards, Hou zj v108-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch Description: v108-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
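The flat representation discussed in this review (a count plus consecutive NUL-terminated names in a single chunk, searched linearly by SlotExistsinStandbySlotNames) can be mimicked outside C. A rough Python model of just the data layout and search; `pack_slot_names` and the slot names are invented for this sketch, and the real struct lives in guc_malloc'd memory:

```python
# Model of the StandbySlotNamesConfigData layout: a count plus consecutive
# NUL-terminated names packed into one flat buffer.

def pack_slot_names(names):
    return len(names), b"".join(n.encode() + b"\x00" for n in names)

def slot_in_standby_slot_names(config, slot_name):
    if config is None:  # GUC unset: no standbys to wait for
        return False
    nslotnames, buf = config
    pos = 0
    # The list is expected to be short, so a linear search is fine.
    for _ in range(nslotnames):
        end = buf.index(b"\x00", pos)
        if buf[pos:end].decode() == slot_name:
            return True
        pos = end + 1
    return False

cfg = pack_slot_names(["physical_slot1", "physical_slot2"])
print(slot_in_standby_slot_names(cfg, "physical_slot2"))  # True
print(slot_in_standby_slot_names(cfg, "other_slot"))      # False
print(slot_in_standby_slot_names(None, "physical_slot2")) # False
```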
RE: Synchronizing slots from primary to standby
On Thursday, March 7, 2024 10:05 AM Peter Smith wrote: > > Here are some review comments for v107-0001 Thanks for the comments. > > == > src/backend/replication/slot.c > > 1. > +/* > + * Struct for the configuration of standby_slot_names. > + * > + * Note: this must be a flat representation that can be held in a > +single chunk > + * of guc_malloc'd memory, so that it can be stored as the "extra" data > +for the > + * standby_slot_names GUC. > + */ > +typedef struct > +{ > + int slot_num; > + > + /* slot_names contains nmembers consecutive nul-terminated C strings > +*/ char slot_names[FLEXIBLE_ARRAY_MEMBER]; > +} StandbySlotConfigData; > + > > 1a. > To avoid any ambiguity this 1st field is somehow a slot ID number, I felt a > better > name would be 'nslotnames' or even just 'n' or 'count', Changed to 'nslotnames'. > > ~ > > 1b. > (fix typo) > > SUGGESTION for the 2nd field comment > slot_names is a chunk of 'n' X consecutive null-terminated C strings Changed. > > ~ > > 1c. > A more explanatory name for this typedef maybe is > 'StandbySlotNamesConfigData' ? Changed. > > ~~~ > > > 2. > +/* This is parsed and cached configuration for standby_slot_names */ > +static StandbySlotConfigData *standby_slot_config; > > > 2a. > /This is parsed and cached configuration for .../This is the parsed and cached > configuration for .../ Changed. > > ~ > > 2b. > Similar to above -- since this only has name information maybe it is more > correct to call it 'standby_slot_names_config'? > Changed. > ~~~ > > 3. > +/* > + * A helper function to validate slots specified in GUC standby_slot_names. > + * > + * The rawname will be parsed, and the parsed result will be saved into > + * *elemlist. > + */ > +static bool > +validate_standby_slots(char *rawname, List **elemlist) > > /and the parsed result/and the result/ > Changed. > ~~~ > > 4. check_standby_slot_names > > + /* Need a modifiable copy of string */ rawname = pstrdup(*newval); > > /copy of string/copy of the GUC string/ > Changed. 
> ~~~ > > 5. > +assign_standby_slot_names(const char *newval, void *extra) { > + /* > + * The standby slots may have changed, so we must recompute the oldest > + * LSN. > + */ > + ss_oldest_flush_lsn = InvalidXLogRecPtr; > + > + standby_slot_config = (StandbySlotConfigData *) extra; } > > To avoid leaking don't we need to somewhere take care to free any memory > used by a previous value (if any) of this 'standby_slot_config'? > The memory of extra is maintained by the GUC mechanism. It will be automatically freed when the associated GUC setting is no longer of interest. See src/backend/utils/misc/README for details. > ~~~ > > 6. AcquiredStandbySlot > > +/* > + * Return true if the currently acquired slot is specified in > + * standby_slot_names GUC; otherwise, return false. > + */ > +bool > +AcquiredStandbySlot(void) > +{ > + const char *name; > + > + /* Return false if there is no value in standby_slot_names */ if > + (standby_slot_config == NULL) return false; > + > + name = standby_slot_config->slot_names; for (int i = 0; i < > + standby_slot_config->slot_num; i++) { if (strcmp(name, > + NameStr(MyReplicationSlot->data.name)) == 0) return true; > + > + name += strlen(name) + 1; > + } > + > + return false; > +} > > 6a. > Just checking "(standby_slot_config == NULL)" doesn't seem enough to me, > because IIUIC it is possible when 'standby_slot_names' has no value then > maybe standby_slot_config is not NULL but standby_slot_config->slot_num is > 0. The standby_slot_config will always be NULL if there is no value in it. While checking, I did find a rare case that if there are only some white space in the standby_slot_names, then slot_num will be 0, and have fixed it so that standby_slot_config will always be NULL if there is no meaning value in guc. > > ~ > > 6b. > IMO this function would be tidier written such that the > MyReplicationSlot->data.name is passed as a parameter. 
Then you can > name the function more naturally like: > > IsSlotInStandbySlotNames(const char *slot_name) Changed it to SlotExistsInStandbySlotNames. > > ~ > > 6c. > IMO the body of the function will be tidier if written so there are only 2 > returns > instead of 3 like > > SUGGESTION: > if (...) > { > for (...) > { > ... > return true; > } > } > return false; I personally prefer the current style. > > ~~~ > > 7. > + /* > + * Don't need to wait for the standbys to catch up if there is no value > + in > + * standby_slot_names. > + */ > + if (standby_slot_config == NULL) > + return true; > > (similar to a previous review comment) > > This check doesn't seem enough because IIUIC it is possible when > 'standby_slot_names' has no value then maybe standby_slot_config is not NULL > but standby_slot_config->slot_num is 0. Same as above. > > ~~~ > > 8. WaitForStandbyConfirmation > > + /* > + * Don't need to wait for the standby to catch up if the current > + acquired > + * slot is not a
RE: Synchronizing slots from primary to standby
On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > > wrote: > > > > Hi, > > > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > > > > > wrote: > > > > > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > > > wrote: > > > > > > > > > > > > > > > --- > > > > > +void > > > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > > > +List *standby_slots; > > > > > +MemoryContext oldcxt; > > > > > +char *standby_slot_names_cpy = extra; > > > > > + > > > > > > > > > > Given that the newval and extra have the same data > > > > > (standby_slot_names value), why do we not use newval instead? I > > > > > think that if we use newval, we don't need to guc_strdup() in > > > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > > > instead, though. It's not clear to me as there is no comment. > > > > > > > > I think SplitIdentifierString will modify the passed in string, so > > > > we'd better not pass the newval to it, otherwise the stored guc > > > > string(standby_slot_names) will be changed. I can see we are doing > > > > similar thing in other GUC check/assign function as well. > > > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > > > > > Why does it have to be a List in the first place? > > > > I thought the List type is convenient to use here, as we have existing > > list build function(SplitIdentifierString), and have convenient list > > macro to loop the > > list(foreach_ptr) which can save some codes. > > > > > In earlier version patches, we > > > used to copy the list and delete the element until it became empty, > > > while waiting for physical wal senders. But we now just refer to > > > each slot name in the list. 
The current code assumes that > > > stnadby_slot_names_cpy is allocated in GUCMemoryContext but once it > > > changes, it will silently get broken. I think we can check and > > > assign standby_slot_names in a similar way to > > > check/assign_temp_tablespaces and > check/assign_synchronous_standby_names. > > > > Yes, we could do follow it by allocating an array and copy each slot > > name into it, but it also requires some codes to build and scan the > > array. So, is it possible to expose the GucMemorycontext or have an API like > guc_copy_list instead ? > > If we don't want to touch the guc api, I am ok with using an array as well. > > I rethink about this and realize that it's not good to do the memory > allocation in > assign hook function. As the "src/backend/utils/misc/README" said, we'd > better do that in check hook function and pass it via extra to assign hook > function. And thus array is a good choice in this case rather than a List > which > cannot be passed to *extra. > > Here is the V107 patch set which parse and cache the standby slot names in an > array instead of a List. The patch needs to be rebased due to a recent commit. Attached the V107_2 patch set. There are no code changes in this version. Best Regards, Hou zj v107_2-0001-Allow-logical-walsenders-to-wait-for-the-physi.patch Description: v107_2-0001-Allow-logical-walsenders-to-wait-for-the-physi.patch v107_2-0002-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v107_2-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Synchronizing slots from primary to standby
On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > wrote: > > Hi, > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > > wrote: > > > > > > > > > > > > --- > > > > +void > > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > > +List *standby_slots; > > > > +MemoryContext oldcxt; > > > > +char *standby_slot_names_cpy = extra; > > > > + > > > > > > > > Given that the newval and extra have the same data > > > > (standby_slot_names value), why do we not use newval instead? I > > > > think that if we use newval, we don't need to guc_strdup() in > > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > > instead, though. It's not clear to me as there is no comment. > > > > > > I think SplitIdentifierString will modify the passed in string, so > > > we'd better not pass the newval to it, otherwise the stored guc > > > string(standby_slot_names) will be changed. I can see we are doing > > > similar thing in other GUC check/assign function as well. > > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > > > Why does it have to be a List in the first place? > > I thought the List type is convenient to use here, as we have existing list > build > function(SplitIdentifierString), and have convenient list macro to loop the > list(foreach_ptr) which can save some codes. > > > In earlier version patches, we > > used to copy the list and delete the element until it became empty, > > while waiting for physical wal senders. But we now just refer to each > > slot name in the list. The current code assumes that > > stnadby_slot_names_cpy is allocated in GUCMemoryContext but once it > > changes, it will silently get broken. 
I think we can check and assign > > standby_slot_names in a similar way to check/assign_temp_tablespaces > > and check/assign_synchronous_standby_names. > > Yes, we could do follow it by allocating an array and copy each slot name > into it, > but it also requires some codes to build and scan the array. So, is it > possible to > expose the GucMemorycontext or have an API like guc_copy_list instead ? > If we don't want to touch the guc api, I am ok with using an array as well. I rethought about this and realized that it's not good to do the memory allocation in the assign hook function. As "src/backend/utils/misc/README" says, we'd better do that in the check hook function and pass the result via extra to the assign hook function. Thus an array is a good choice in this case, rather than a List, which cannot be passed via *extra. Here is the V107 patch set, which parses and caches the standby slot names in an array instead of a List. Best Regards, Hou zj v107-0002-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v107-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v107-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch Description: v107-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada wrote: Hi, > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > wrote: > > > > > > > > > --- > > > +void > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > +List *standby_slots; > > > +MemoryContext oldcxt; > > > +char *standby_slot_names_cpy = extra; > > > + > > > > > > Given that the newval and extra have the same data > > > (standby_slot_names value), why do we not use newval instead? I > > > think that if we use newval, we don't need to guc_strdup() in > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > instead, though. It's not clear to me as there is no comment. > > > > I think SplitIdentifierString will modify the passed in string, so > > we'd better not pass the newval to it, otherwise the stored guc > > string(standby_slot_names) will be changed. I can see we are doing > > similar thing in other GUC check/assign function as well. > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > Why does it have to be a List in the first place? I thought the List type is convenient to use here, as we have existing list build function(SplitIdentifierString), and have convenient list macro to loop the list(foreach_ptr) which can save some codes. > In earlier version patches, we > used to copy the list and delete the element until it became empty, while > waiting for physical wal senders. But we now just refer to each slot name in > the > list. The current code assumes that stnadby_slot_names_cpy is allocated in > GUCMemoryContext but once it changes, it will silently get broken. I think we > can check and assign standby_slot_names in a similar way to > check/assign_temp_tablespaces and > check/assign_synchronous_standby_names. 
Yes, we could follow that by allocating an array and copying each slot name into it, but it also requires some code to build and scan the array. So, is it possible to expose the GucMemoryContext or have an API like guc_copy_list instead? If we don't want to touch the GUC API, I am OK with using an array as well. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, March 5, 2024 2:35 PM shveta malik wrote: > > On Tue, Mar 5, 2024 at 9:15 AM Amit Kapila wrote: > > > > On Tue, Mar 5, 2024 at 6:10 AM Peter Smith > wrote: > > > > > > == > > > src/backend/replication/walsender.c > > > > > > 5. NeedToWaitForWal > > > > > > + /* > > > + * Check if the standby slots have caught up to the flushed > > > + position. It > > > + * is good to wait up to the flushed position and then let the > > > + WalSender > > > + * send the changes to logical subscribers one by one which are > > > + already > > > + * covered by the flushed position without needing to wait on every > > > + change > > > + * for standby confirmation. > > > + */ > > > + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) return true; > > > + > > > + *wait_event = 0; > > > + return false; > > > +} > > > + > > > > > > 5a. > > > The comment (or part of it?) seems misplaced because it is talking > > > WalSender sending changes, but that is not happening in this function. > > > > > > > I don't think so. This is invoked only by walsender and a static > > function. I don't see any other better place to mention this. > > > > > Also, isn't what this is saying already described by the other > > > comment in the caller? e.g.: > > > > > > > Oh no, here we are explaining the wait order. > > I think there is a scope of improvement here. The comment inside > NeedToWaitForWal() which states that we need to wait here for standbys on > flush-position(and not on each change) should be outside of this function. It > is > too embedded. And the comment which states the order of wait (first flush and > then standbys confirmation) should be outside the for-loop in > WalSndWaitForWal(), but yes we do need both the comments. Attached a > patch (.txt) for comments improvement, please merge if appropriate. Thanks, I have slightly modified the top-up patch and merged it. Attach the V106 patch which addressed above and Peter's comments[1]. 
[1] https://www.postgresql.org/message-id/CAHut%2BPsATK8z1TEcfFE8zWoS1hagqsvaWYCgom_zYtScfwO7uQ%40mail.gmail.com Best Regards, Hou zj v106-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch Description: v106-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch v106-0002-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v106-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Synchronizing slots from primary to standby
On Tuesday, March 5, 2024 8:40 AM Peter Smith wrote: > > Here are some review comments for v105-0001 > > == > doc/src/sgml/config.sgml > > 1. > + > +The standbys corresponding to the physical replication slots in > +standby_slot_names must configure > +sync_replication_slots = true so they can receive > +logical failover slots changes from the primary. > + > > /slots changes/slot changes/ Changed. > > == > doc/src/sgml/func.sgml > > 2. > +The function may be waiting if the specified slot is a logical > failover > +slot and linkend="guc-standby-slot-names">standby_slot_names me> > +is configured. > > I know this has been through multiple versions already, but this > latest wording "may be waiting..." doesn't seem very good to me. > > How about one of these? > > * The function may not be able to return immediately if the specified > slot is a logical failover slot and standby_slot_names is configured. > > * The function return might be blocked if the specified slot is a > logical failover slot and standby_slot_names is configured. > > * If the specified slot is a logical failover slot then the function > will block until all physical slots specified in standby_slot_names > have confirmed WAL receipt. > > * If the specified slot is a logical failover slot then the function > will not return until all physical slots specified in > standby_slot_names have confirmed WAL receipt. I prefer the last one. > > ~~~ > > 3. > +slot may return to an earlier position. The function may be waiting > if > +the specified slot is a logical failover slot and > + linkend="guc-standby-slot-names">standby_slot_names me> > > > Same as previous review comment #2 Changed. > > == > src/backend/replication/slot.c > > 4. WaitForStandbyConfirmation > > + * Used by logical decoding SQL functions that acquired logical failover > slot. > > IIUC it doesn't work like that. pg_logical_slot_get_changes_guts() > calls here unconditionally (i.e. 
the SQL functions don't even check if > they are failover slots before calling this) so the comment seems > misleading/redundant. I removed the "acquired logical failover slot.". > > == > src/backend/replication/walsender.c > > 5. NeedToWaitForWal > > + /* > + * Check if the standby slots have caught up to the flushed position. It > + * is good to wait up to the flushed position and then let the WalSender > + * send the changes to logical subscribers one by one which are already > + * covered by the flushed position without needing to wait on every change > + * for standby confirmation. > + */ > + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) > + return true; > + > + *wait_event = 0; > + return false; > +} > + > > 5a. > The comment (or part of it?) seems misplaced because it is talking > WalSender sending changes, but that is not happening in this function. > > Also, isn't what this is saying already described by the other comment > in the caller? e.g.: > > + /* > + * Within the loop, we wait for the necessary WALs to be flushed to > + * disk first, followed by waiting for standbys to catch up if there > + * are enough WALs or upon receiving the shutdown signal. To avoid the > + * scenario where standbys need to catch up to a newer WAL location in > + * each iteration, we update our idea of the currently flushed > + * position only if we are not waiting for standbys to catch up. > + */ > I moved these comments based on Shveta's suggestion. > ~ > > 5b. > Most of the code is unnecessary. AFAICT all this is exactly same as just 1 > line: > > return NeedToWaitForStandbys(flushed_lsn, wait_event); Changed. > > ~~~ > > 6. WalSndWaitForWal > > + /* > + * Within the loop, we wait for the necessary WALs to be flushed to > + * disk first, followed by waiting for standbys to catch up if there > + * are enough WALs or upon receiving the shutdown signal. 
To avoid the > + * scenario where standbys need to catch up to a newer WAL location in > + * each iteration, we update our idea of the currently flushed > + * position only if we are not waiting for standbys to catch up. > + */ > > Regarding that 1st sentence: maybe this logic used to be done > explicitly "within the loop" but IIUC this logic is now hidden inside > NeedToWaitForWal() so the comment should mention that. Changed. Best Regards, Hou zj
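The refactoring agreed in review comment 5b (collapse the tail of NeedToWaitForWal() into a single delegating line) can be illustrated with a toy model. This is not the actual PostgreSQL code: the type `XLogRecPtr` is simplified to an integer, and the wait-event codes and function names (`need_to_wait_for_wal`, `need_to_wait_for_standbys`) are hypothetical stand-ins for the real symbols discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* simplified stand-in for PostgreSQL's LSN type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical wait-event codes standing in for the real enum values. */
enum { WE_NONE = 0, WE_WAIT_FOR_WAL, WE_WAIT_FOR_STANDBY };

/*
 * Toy model of NeedToWaitForStandbys(): we must wait if the oldest
 * standby flush position is unknown or behind the LSN we want to send.
 */
static bool
need_to_wait_for_standbys(XLogRecPtr oldest_standby_flush,
                          XLogRecPtr wait_for_lsn, uint32_t *wait_event)
{
    if (oldest_standby_flush == InvalidXLogRecPtr ||
        oldest_standby_flush < wait_for_lsn)
    {
        *wait_event = WE_WAIT_FOR_STANDBY;
        return true;
    }
    *wait_event = WE_NONE;
    return false;
}

/*
 * Toy model of NeedToWaitForWal(): first require the WAL up to
 * target_lsn to be flushed locally; after that, per review comment 5b,
 * the tail of the function is a single line delegating to the standby
 * check for the flushed position.
 */
static bool
need_to_wait_for_wal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn,
                     XLogRecPtr oldest_standby_flush, uint32_t *wait_event)
{
    if (target_lsn > flushed_lsn)
    {
        *wait_event = WE_WAIT_FOR_WAL;
        return true;
    }
    return need_to_wait_for_standbys(oldest_standby_flush, flushed_lsn,
                                     wait_event);
}
```

The out-parameter tells the caller which wait event to report, which is why the helper returns a bool plus `wait_event` rather than the event alone.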
RE: Synchronizing slots from primary to standby
On Monday, March 4, 2024 11:44 PM Bertrand Drouvot wrote: > > Hi, > > On Mon, Mar 04, 2024 at 01:28:04PM +0000, Zhijie Hou (Fujitsu) wrote: > > Attach the V105 patch set > > Thanks! > > Sorry I missed those during the previous review: No problem, thanks for the comments! > > 1 === > > Commit message: "these functions will block until" > > s/block/wait/ ? > > 2 === > > +when used with logical failover slots, will block until all > > s/block/wait/ ? > > It seems those are the 2 remaining "block" that could deserve the proposed > above change. I prefer using 'block' here, and it seems others also suggested changing 'wait' to 'block' [1]. > > 3 === > > + invalidated = slot->data.invalidated != RS_INVAL_NONE; > + inactive = slot->active_pid == 0; > > invalidated = (slot->data.invalidated != RS_INVAL_NONE); inactive = > (slot->active_pid == 0); > > instead? > > I think it's easier to read and it looks like this is the way it's written in > other > places (at least the few I checked). I think the current code is consistent with other similar code in slot.c. (grep "data.invalidated != RS_INVAL_NONE"). [1] https://www.postgresql.org/message-id/CAHut%2BPsATK8z1TEcfFE8zWoS1hagqsvaWYCgom_zYtScfwO7uQ%40mail.gmail.com Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Monday, March 4, 2024 5:52 PM Amit Kapila wrote: > > On Mon, Mar 4, 2024 at 9:35 AM Peter Smith > wrote: > > > > OK, if the code will remain as-is wouldn't it be better to anyway > > change the function name to indicate what it really does? > > > > e.g. NeedToWaitForWal --> NeedToWaitForWalFlushOrStandbys > > > > This seems too long. I would prefer the current name NeedToWaitForWal as > waiting for WAL means waiting to flush the WAL and waiting to replicate it to > standby. On similar lines, the variable name standby_slot_oldest_flush_lsn > looks > too long. How about ss_oldest_flush_lsn (where ss indicates standby_slots)? > > Apart from this, I have made minor modifications in the attached. Thanks, I have merged it. Attach the V105 patch set which addressed Peter, Amit and Bertrand's comments. This version also includes the following changes: * We found a string matching issue for query_until() and fixed it. * Removed one unused parameter from NeedToWaitForStandbys. * Disable the sub before testing pg_logical_slot_get_changes in 040.pl; this is to prevent this test from catching the warning from another walsender. * Ran pgindent. Best Regards, Hou zj v105-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch v105-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Synchronizing slots from primary to standby
On Monday, March 4, 2024 9:55 AM Peter Smith wrote: > > On Sun, Mar 3, 2024 at 6:51 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Sunday, March 3, 2024 7:47 AM Peter Smith > wrote: > > > > > > > 3. > > > + > > > + > > > + Value * is not accepted as it is > > > inappropriate to > > > + block logical replication for physical slots that either lack > > > + associated standbys or have standbys associated that are > > > + not > > > enabled > > > + for replication slot synchronization. (see > > > + > > linkend="logicaldecoding-replication-slots-synchronization"/>). > > > + > > > + > > > > > > Why does the document need to provide an excuse/reason for the rule? > > > You could just say something like: > > > > > > SUGGESTION > > > The slots must be named explicitly. For example, specifying wildcard > > > values like * is not permitted. > > > > As suggested by Amit, I moved this to code comments. > > Was the total removal of this note deliberate? I only suggested removing the > *reason* for the rule, not the entire rule. Otherwise, the user won't know to > avoid doing this until they try it and get an error. OK, Added. > > > > > > > > 9. NeedToWaitForWal > > > > > > + /* > > > + * Check if the standby slots have caught up to the flushed position. > > > + It > > > + * is good to wait up to flushed position and then let it send the > > > + changes > > > + * to logical subscribers one by one which are already covered in > > > + flushed > > > + * position without needing to wait on every change for standby > > > + * confirmation. Note that after receiving the shutdown signal, an > > > + ERROR > > > + * is reported if any slots are dropped, invalidated, or inactive. > > > + This > > > + * measure is taken to prevent the walsender from waiting indefinitely. 
> > > + */ > > > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > > > + return true; > > > > > > I felt it was confusing things for this function to also call to the > > > other one -- it seems an overlapping/muddling of the purpose of these. > > > I think it will be easier to understand if the calling code just > > > calls to one or both of these functions as required. > > > > Same as Amit, I didn't change this. > > AFAICT my previous review comment was misinterpreted. Please see [1] for > more details. > > > > Here are some more review comments for v104-1 Thanks! > > == > Commit message > > 1. > Additionally, The SQL functions pg_logical_slot_get_changes, > pg_logical_slot_peek_changes and pg_replication_slot_advance are modified > to wait for the replication slots specified in 'standby_slot_names' to catch > up > before returning. > > ~ > > Maybe that should be expressed using similar wording as the docs... > > SUGGESTION > Additionally, The SQL functions ... are modified. Now, when used with > failover-enabled logical slots, these functions will block until all physical > slots > specified in 'standby_slot_names' have confirmed WAL receipt. Changed. > > == > doc/src/sgml/config.sgml > > 2. > +pg_logical_slot_peek_changes, > +when used with failover enabled logical slots, will block until all > +physical slots specified in > standby_slot_names have > +confirmed WAL receipt. > > /failover enabled logical slots/failover-enabled logical slots/ Changed. Note that for this comment and the remaining comments, I used the later wording we agreed on (logical failover slot). > > == > doc/src/sgml/func.sgml > > 3. > +The function may be blocked if the specified slot is a failover > enabled > +slot and standby_slot_names > +is configured. > > > /a failover enabled slot//a failover-enabled slot/ Changed. > > ~~~ > > 4. > +slot may return to an earlier position.
The function may be blocked > if > +the specified slot is a failover enabled slot and > + linkend="guc-standby-slot-names">standby_slot_names me> > +is configured. > > /a failover enabled slot//a failover-enabled slot/ Changed. > > == > src/backend/replication/slot.c > > 5. > +/* > + * Wait for physical standbys to conf
RE: Synchronizing slots from primary to standby
On Monday, March 4, 2024 7:22 PM Bertrand Drouvot wrote: > > Hi, > > On Sun, Mar 03, 2024 at 07:56:32AM +0000, Zhijie Hou (Fujitsu) wrote: > > Here is the V104 patch which addressed above and Peter's comments. > > Thanks! > > A few more random comments: Thanks for the comments! > > 1 === > > +The function may be blocked if the specified slot is a failover > + enabled > > s/blocked/waiting/ ? Changed. > > 2 === > > +* specified slot when waiting for them to catch up. See > +* StandbySlotsHaveCaughtup for details. > > s/StandbySlotsHaveCaughtup/StandbySlotsHaveCaughtup()/ ? Changed. > > 3 === > > + /* Now verify if the specified slots really exist and have > + correct type */ > > remove "really"? Changed. > > 4 === > > + /* > +* Don't need to wait for the standbys to catch up if there is no > value in > +* standby_slot_names. > +*/ > + if (standby_slot_names_list == NIL) > + return true; > + > + /* > +* Don't need to wait for the standbys to catch up if we are on a > standby > +* server, since we do not support syncing slots to cascading > standbys. > +*/ > + if (RecoveryInProgress()) > + return true; > + > + /* > +* Don't need to wait for the standbys to catch up if they are already > +* beyond the specified WAL location. > +*/ > + if (!XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) && > + standby_slot_oldest_flush_lsn >= wait_for_lsn) > + return true; > > What about using OR conditions instead? > > 5 === > > +static bool > +NeedToWaitForStandby(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > +uint32 *wait_event) > > Not a big deal but does it need to return a bool? (I mean it all depends of > the > *wait_event value). Is it for better code readability in the caller? > > 6 === > > +static bool > +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > +uint32 *wait_event) > > Same questions as for NeedToWaitForStandby(). I also feel the current style looks a bit cleaner, so didn’t change these. Best Regards, Hou zj
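Comment 4 above (separate early-return `if` guards versus one OR-expression) is easy to see in isolation. Below is a toy model, not the real `StandbySlotsHaveCaughtup()`: the struct fields are hypothetical stand-ins for `standby_slot_names_list`, `RecoveryInProgress()`, and `standby_slot_oldest_flush_lsn`. The three guards are kept as separate `if` statements, the style the patch retained; folding them into a single OR-condition would be behaviorally identical, so the choice is purely about readability of the per-guard comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical stand-ins for the server state the real guards consult. */
struct wait_state
{
    size_t      n_standby_slots;    /* length of standby_slot_names_list */
    bool        in_recovery;        /* RecoveryInProgress() */
    XLogRecPtr  oldest_flush_lsn;   /* standby_slot_oldest_flush_lsn */
};

static bool
standbys_have_caught_up(const struct wait_state *st, XLogRecPtr wait_for_lsn)
{
    /* Don't need to wait if there is no value in standby_slot_names. */
    if (st->n_standby_slots == 0)
        return true;

    /*
     * Don't need to wait if we are on a standby server, since syncing
     * slots to cascading standbys is not supported.
     */
    if (st->in_recovery)
        return true;

    /* Don't need to wait if standbys are already beyond the location. */
    if (st->oldest_flush_lsn != InvalidXLogRecPtr &&
        st->oldest_flush_lsn >= wait_for_lsn)
        return true;

    return false;
}
```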
RE: Synchronizing slots from primary to standby
On Saturday, March 2, 2024 6:55 PM Amit Kapila wrote: > > On Sat, Mar 2, 2024 at 9:21 AM Zhijie Hou (Fujitsu) > wrote: > > > > Apart from the comments, the code in WalSndWaitForWal was refactored a > > bit to make it neater. Thanks Shveta for helping writing the code and doc. > > > > A few more comments: Thanks for the comments. > == > 1. > +# Wait until the primary server logs a warning indicating that it is > +waiting # for the sb1_slot to catch up. > +$primary->wait_for_log( > + qr/replication slot \"sb1_slot\" specified in parameter > standby_slot_names does not have active_pid/, > + $offset); > > Shouldn't we wait for such a LOG even in the first test as well which > involves two > standbys and two logical subscribers? Yes, we should. Added. > > 2. > +## > +# Test that logical replication will wait for the user-created inactive > +# physical slot to catch up until we remove the slot from standby_slot_names. > +## > > > I don't see anything different tested in this test from what we already > tested in > the first test involving two standbys and two logical subscribers. Can you > please clarify if I am missing something? I think the intention was to test that the wait loop is ended due to GUC config reload, while the first test is for the case when the loop is ended due to restart_lsn movement. But it seems we tested the config reload with xx_get_changes() as well, so I can remove it if you agree. > > 3. > Note that after receiving the shutdown signal, an ERROR > + * is reported if any slots are dropped, invalidated, or inactive. This > + * measure is taken to prevent the walsender from waiting indefinitely. > + */ > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > > Isn't this part of the comment should be moved inside > NeedToWaitForStandby()? Moved. > > 4. 
> + /* > + * Update our idea of the currently flushed position only if we are > + * not waiting for standbys to catch up, otherwise the standby would > + * have to catch up to a newer WAL location in each cycle. > + */ > + if (wait_event != WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION) > + { > > This functionality (in function WalSndWaitForWal()) seems to ensure that we > first wait for the required WAL to be flushed and then wait for standbys. If > true, > we should cover that point in the comments here or somewhere in the function > WalSndWaitForWal(). > > Apart from this, I have made a few modifications in the comments. Thanks. I have reviewed and merged them. Here is the V104 patch which addressed above and Peter's comments. Best Regards, Hou zj v104-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch v104-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
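The ordering Amit describes in comment 4 (wait for the local flush first, then for standbys, and freeze our idea of the flushed position while waiting on standbys so they never chase a moving target) can be demonstrated with a small simulation. This is a toy model, not the real `WalSndWaitForWal()`: the tick arrays and the function name are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Toy simulation of the two-phase wait loop: per tick, advance our idea
 * of the locally flushed position ONLY when we are not already waiting
 * for standbys; then wait for the local flush to reach loc, and finally
 * for the standbys to confirm the (frozen) flushed position.
 */
static XLogRecPtr
wait_for_wal_model(XLogRecPtr loc,
                   const XLogRecPtr *flush_progress,   /* local flush per tick */
                   const XLogRecPtr *standby_progress, /* standby flush per tick */
                   int nticks)
{
    XLogRecPtr flushed = 0;
    bool waiting_for_standby = false;

    for (int t = 0; t < nticks; t++)
    {
        /* Don't move the goalposts while standbys are catching up. */
        if (!waiting_for_standby)
            flushed = flush_progress[t];

        if (loc > flushed)
            continue;                   /* phase 1: wait for local flush */

        if (standby_progress[t] < flushed)
        {
            waiting_for_standby = true; /* phase 2: wait for standbys */
            continue;
        }
        return flushed;                 /* flushed locally and confirmed */
    }
    return flushed;
}
```

In the test below the local flush keeps advancing (to 7, then 9) while the standby catches up, but the returned position stays at 5, the value frozen when the standby wait began, which is exactly the behavior the quoted comment is explaining.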
RE: Synchronizing slots from primary to standby
On Sunday, March 3, 2024 7:47 AM Peter Smith wrote: > > On Sat, Mar 2, 2024 at 2:51 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 1, 2024 12:23 PM Peter Smith > wrote: > > > > ... > > > == > > > src/backend/replication/slot.c > > > > > > 2. validate_standby_slots > > > > > > + else if (!ReplicationSlotCtl) > > > + { > > > + /* > > > + * We cannot validate the replication slot if the replication slots' > > > + * data has not been initialized. This is ok as we will validate > > > + the > > > + * specified slot when waiting for them to catch up. See > > > + * StandbySlotsHaveCaughtup for details. > > > + */ > > > + } > > > + else > > > + { > > > + /* > > > + * If the replication slots' data have been initialized, verify if > > > + the > > > + * specified slots exist and are logical slots. > > > + */ > > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > > > > > IMO that 2nd comment does not need to say "If the replication slots' > > > data have been initialized," because that is implicit from the if/else. > > > > Removed. > > I only meant to suggest removing the 1st part, not the entire comment. > I thought it is still useful to have a comment like: > > /* Check that the specified slots exist and are logical slots.*/ OK, I misunderstood. Fixed. > > == > > And, here are some review comments for v103-0001. Thanks for the comments. > > == > Commit message > > 1. > Additionally, The SQL functions pg_logical_slot_get_changes, > pg_logical_slot_peek_changes and pg_replication_slot_advance are modified > to wait for the replication slots mentioned in 'standby_slot_names' to catch > up > before returning. > > ~ > > (use the same wording as previous in this message) > > /mentioned in/specified in/ Changed. > > == > doc/src/sgml/config.sgml > > 2. 
> +Additionally, when using the replication management functions > + > +pg_replication_slot_advance, > + > +pg_logical_slot_get_changes, and > + > +pg_logical_slot_peek_changes, > +with failover enabled logical slots, the functions will wait for the > +physical slots specified in > standby_slot_names to > +confirm WAL receipt before proceeding. > + > > It says "the ... functions" twice so maybe reword it slightly. > > BEFORE > Additionally, when using the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, with failover enabled logical slots, the > functions > will wait for the physical slots specified in standby_slot_names to confirm > WAL > receipt before proceeding. > > SUGGESTION > Additionally, the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, when used with failover enabled logical slots, > will wait for the physical slots specified in standby_slot_names to confirm > WAL > receipt before proceeding. > > (Actually the "will wait ... before proceeding" is also a bit tricky, so > below is > another possible rewording) > > SUGGESTION #2 > Additionally, the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, when used with failover enabled logical slots, > will block until all physical slots specified in standby_slot_names have > confirmed WAL receipt. I prefer the #2 version. > > ~~~ > > 3. > + > + > + Value * is not accepted as it is inappropriate to > + block logical replication for physical slots that either lack > + associated standbys or have standbys associated that are not > enabled > + for replication slot synchronization. (see > + linkend="logicaldecoding-replication-slots-synchronization"/>). > + > + > > Why does the document need to provide an excuse/reason for the rule? 
> You could just say something like: > > SUGGESTION > The slots must be named explicitly. For example, specifying wildcard values > like > * is not permitted. As suggested by Amit, I moved this to code comments. > > == > doc/src/sgml/func.sgml > > 4. > @@ -28150,7 +28150,7 @@ postgres=# SELECT '0/0'::pg_lsn + > pd.segment_number * ps.setting::int + :offset > > > > -
RE: Synchronizing slots from primary to standby
On Thursday, February 29, 2024 11:16 AM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, February 28, 2024 7:36 PM Bertrand Drouvot > wrote: > > > > 4 === > > > > Regarding the test, what about adding one to test the "new" behavior > > discussed up-thread? (logical replication will wait if slot mentioned > > in standby_slot_names is dropped and/or does not exist when the engine > > starts?) > > Will think about this. I think we could add a test to check the warning message for a dropped slot, but since similar wait/warning functionality has already been tested, I prefer to leave this for now; we can consider it again after the main patch and tests have stabilized, given the previous buildfarm (BF) instability with this feature. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, March 1, 2024 12:23 PM Peter Smith wrote: > > Here are some review comments for v102-0001. > > == > doc/src/sgml/config.sgml > > 1. > + > +Lists the streaming replication standby server slot names that > logical > +WAL sender processes will wait for. Logical WAL sender processes will > +send decoded changes to plugins only after the specified replication > +slots confirm receiving WAL. This guarantees that logical replication > +slots with failover enabled do not consume changes until those > changes > +are received and flushed to corresponding physical standbys. If a > +logical replication connection is meant to switch to a physical > standby > +after the standby is promoted, the physical replication slot for the > +standby should be listed here. Note that logical replication will not > +proceed if the slots specified in the standby_slot_names do > not exist or > +are invalidated. > + > > Should this also mention the effect this GUC has on those 2 SQL functions? > E.g. > Commit message says: > > Additionally, The SQL functions pg_logical_slot_get_changes and > pg_replication_slot_advance are modified to wait for the replication slots > mentioned in 'standby_slot_names' to catch up before returning. Added. > > == > src/backend/replication/slot.c > > 2. validate_standby_slots > > + else if (!ReplicationSlotCtl) > + { > + /* > + * We cannot validate the replication slot if the replication slots' > + * data has not been initialized. This is ok as we will validate the > + * specified slot when waiting for them to catch up. See > + * StandbySlotsHaveCaughtup for details. > + */ > + } > + else > + { > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > IMO that 2nd comment does not need to say "If the replication slots' > data have been initialized," because that is implicit from the if/else. Removed. > > ~~~ > > 3. 
GetStandbySlotList > > +List * > +GetStandbySlotList(void) > +{ > + if (RecoveryInProgress()) > + return NIL; > + else > + return standby_slot_names_list; > +} > > The 'else' is not needed. IMO is better without it, but YMMV. Removed. > > ~~~ > > 4. StandbySlotsHaveCaughtup > > +/* > + * Return true if the slots specified in standby_slot_names have caught > +up to > + * the given WAL location, false otherwise. > + * > + * The elevel parameter determines the error level used for logging > +messages > + * related to slots that do not exist, are invalidated, or are inactive. > + */ > +bool > +StandbySlotsHaveCaughtup(XLogRecPtr wait_for_lsn, int elevel) > > /determines/specifies/ Changed. > > ~ > > 5. > + /* > + * Don't need to wait for the standby to catch up if there is no value > + in > + * standby_slot_names. > + */ > + if (!standby_slot_names_list) > + return true; > + > + /* > + * If we are on a standby server, we do not need to wait for the > + standby to > + * catch up since we do not support syncing slots to cascading standbys. > + */ > + if (RecoveryInProgress()) > + return true; > + > + /* > + * Return true if all the standbys have already caught up to the passed > + in > + * WAL localtion. > + */ > + if (!XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) && > + standby_slot_oldest_flush_lsn >= wait_for_lsn) return true; > > > 5a. > I felt all these comments should be worded in a consistent way like "Don't > need > to wait ..." > > e.g. > 1. Don't need to wait for the standbys to catch up if there is no value in > 'standby_slot_names'. > 2. Don't need to wait for the standbys to catch up if we are on a standby > server, > since we do not support syncing slots to cascading standbys. > 3. Don't need to wait for the standbys to catch up if they are already beyond > the specified WAL location. Changed. > > ~ > > 5b. > typo > /WAL localtion/WAL location/ > Fixed. > ~~~ > > 6. 
+ if (!slot) > + { > + /* > + * It may happen that the slot specified in standby_slot_names GUC > + * value is dropped, so let's skip over it. > + */ > + ereport(elevel, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("replication slot \"%s\" specified in parameter %s does not exist", > +name, "standby_slot_names")); > + continue; > + } > > Is "is dropped" strictly the only reason? IIUC another reason is the slot > may just not have existed in the first place, but it was not detected before now > because initial validation was skipped. Changed the comment. > > ~~~ > > 7. > + /* > + * Return false if not all the standbys have caught up to the specified > + WAL > + * location. > + */ > + if (caught_up_slot_num != list_length(standby_slot_names_list)) > + return false; > > Somehow it seems complicated to have a counter of the slots as you process > then compare that counter to the list_length
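Peter's comment 7 questions counting caught-up slots and comparing the counter with `list_length()` at the end. The alternative he hints at, returning `false` as soon as any slot is missing or behind, can be sketched as below. This is a simplified model, not the real slot.c code: the `struct slot` and function name are invented, and it omits the warning-reporting side effects that motivate the real code's continue-and-count structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal slot record for the sketch; the real code walks entries in
 * ReplicationSlotCtl shared memory. */
struct slot
{
    bool        exists;     /* slot present (not dropped)? */
    uint64_t    flush_lsn;  /* last position the standby confirmed */
};

/*
 * Early-return rewrite of the counting pattern quoted above: bail out
 * the moment one configured slot cannot satisfy the wait, instead of
 * counting the ones that can and comparing with the list length.
 */
static bool
all_slots_caught_up(const struct slot *slots, size_t nslots,
                    uint64_t wait_for_lsn)
{
    for (size_t i = 0; i < nslots; i++)
    {
        if (!slots[i].exists)
            return false;   /* dropped or never created */
        if (slots[i].flush_lsn < wait_for_lsn)
            return false;   /* standby has not confirmed this LSN yet */
    }
    return true;
}
```

The counting form remains useful when the loop must keep going to emit a warning for every problematic slot rather than only the first one, which is a plausible reason the patch kept it.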
RE: Synchronizing slots from primary to standby
On Friday, March 1, 2024 2:11 PM Masahiko Sawada wrote: > > On Fri, Mar 1, 2024 at 12:42 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 1, 2024 10:17 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > > > > Attach the V102 patch set which addressed Amit and Shveta's comments. > > > Thanks Shveta for helping addressing the comments off-list. > > > > The cfbot reported a compile warning, here is the new version patch > > which fixed it, Also removed some outdate comments in this version. > > > > I've reviewed the v102-0001 patch. Here are some comments: Thanks for the comments ! > > --- > I got a compiler warning: > > walsender.c:1829:6: warning: variable 'wait_event' is used uninitialized > whenever '&&' condition is false [-Wsometimes-uninitialized] > if (!XLogRecPtrIsInvalid(RecentFlushPtr) && > ^~~~ > walsender.c:1871:7: note: uninitialized use occurs here > if (wait_event == > WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL) > ^~ > walsender.c:1829:6: note: remove the '&&' if its condition is always true > if (!XLogRecPtrIsInvalid(RecentFlushPtr) && > ^~~ > walsender.c:1818:20: note: initialize the variable 'wait_event' to silence > this > warning > uint32 wait_event; > ^ >= 0 > 1 warning generated. Thanks for reporting, it was fixed in V102_2. > > --- > +void > +assign_standby_slot_names(const char *newval, void *extra) { > +List *standby_slots; > +MemoryContext oldcxt; > +char *standby_slot_names_cpy = extra; > + > > Given that the newval and extra have the same data (standby_slot_names > value), why do we not use newval instead? I think that if we use > newval, we don't need to guc_strdup() in check_standby_slot_names(), > we might need to do list_copy_deep() instead, though. It's not clear > to me as there is no comment. I think SplitIdentifierString will modify the passed in string, so we'd better not pass the newval to it, otherwise the stored guc string(standby_slot_names) will be changed. 
I can see we are doing a similar thing in other GUC check/assign functions as well. (check_wal_consistency_checking/ assign_wal_consistency_checking, check_createrole_self_grant/ assign_createrole_self_grant ...). > --- > +/* > + * Switch to the memory context under which GUC variables are > allocated > + * (GUCMemoryContext). > + */ > +oldcxt = > MemoryContextSwitchTo(GetMemoryChunkContext(standby_slot_names_cpy > )); > +standby_slot_names_list = list_copy(standby_slots); > +MemoryContextSwitchTo(oldcxt); > > Why do we not explicitly switch to GUCMemoryContext? I think it's because the GUCMemoryContext is not exposed. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, March 1, 2024 10:17 AM Zhijie Hou (Fujitsu) wrote: > > > Attach the V102 patch set which addressed Amit and Shveta's comments. > Thanks Shveta for helping addressing the comments off-list. The cfbot reported a compile warning; here is a new version patch which fixes it. Also removed some outdated comments in this version. Best Regards, Hou zj v102_2-0001-Allow-logical-walsenders-to-wait-for-the-physi.patch v102_2-0002-Document-the-steps-to-check-if-the-standby-is-r.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 29, 2024 5:54 PM Amit Kapila wrote: > > On Thu, Feb 29, 2024 at 8:29 AM Peter Smith > wrote: > > > > On Wed, Feb 28, 2024 at 1:23 PM Zhijie Hou (Fujitsu) > > wrote: > > > > > > On Tuesday, February 27, 2024 3:18 PM Peter Smith > wrote: > > ... > > > > 20. > > > > +# > > > > +# | > standby1 (primary_slot_name = sb1_slot) # | > standby2 > > > > +(primary_slot_name = sb2_slot) # primary - | # | > subscriber1 > > > > +(failover = true) # | > subscriber2 (failover = false) > > > > > > > > In the diagram, the "--->" means a mixture of physical standbys and > logical > > > > pub/sub replication. Maybe it can be a bit clearer? > > > > > > > > SUGGESTION > > > > > > > > # primary (publisher) > > > > # > > > > # (physical standbys) > > > > # | > standby1 (primary_slot_name = sb1_slot) > > > > # | > standby2 (primary_slot_name = sb2_slot) > > > > # > > > > # (logical replication) > > > > # | > subscriber1 (failover = true, slot_name = lsub1_slot) > > > > # | > subscriber2 (failover = false, slot_name = lsub2_slot) > > > > > > > > > > I think one can distinguish it based on the 'standby' and 'subscriber' as > > > well, > because > > > 'standby' normally refer to physical standby while the other refer to > > > logical. > > > > > I think Peter's suggestion will make the setup clear. Changed. > > > > > Ok, but shouldn't it at least include info about the logical slot > > names associated with the subscribers (slot_name = lsub1_slot, > > slot_name = lsub2_slot) like suggested above? > > > > == > > > > Here are some more review comments for v100-0001 > > > > == > > doc/src/sgml/config.sgml > > > > 1. > > + > > +Lists the streaming replication standby server slot names that > logical > > +WAL sender processes will wait for. Logical WAL sender processes > will > > +send decoded changes to plugins only after the specified > replication > > +slots confirm receiving WAL. 
This guarantees that logical > > replication > > +slots with failover enabled do not consume changes until those > changes > > +are received and flushed to corresponding physical standbys. If a > > +logical replication connection is meant to switch to a physical > standby > > +after the standby is promoted, the physical replication slot for > > the > > +standby should be listed here. Note that logical replication will > > not > > +proceed if the slots specified in the standby_slot_names do > > not exist or > > +are invalidated. > > + > > > > Is that note ("Note that logical replication will not proceed if the > > slots specified in the standby_slot_names do not exist or are > > invalidated") meant only for subscriptions marked for 'failover' or > > any subscription? Maybe wording can be modified to help clarify it? > > > > I think it is implicit that here we are talking about failover slots. > I think clarifying again the same could be repetitive considering the > previous sentence: "This guarantees that logical replication slots > with failover enabled do not consume .." have mentioned it. > > > == > > src/backend/replication/slot.c > > > > 2. > > +/* > > + * A helper function to validate slots specified in GUC standby_slot_names. > > + */ > > +static bool > > +validate_standby_slots(char **newval) > > +{ > > + char *rawname; > > + List *elemlist; > > + bool ok; > > + > > + /* Need a modifiable copy of string */ > > + rawname = pstrdup(*newval); > > + > > + /* Verify syntax and parse string into a list of identifiers */ > > + ok = SplitIdentifierString(rawname, ',', &elemlist); > > + > > + if (!ok) > > + { > > + GUC_check_errdetail("List syntax is invalid."); > > + } > > + > > + /* > > + * If the replication slots' data have been initialized, verify if the > > + * specified slots exist and are logical slots. > > + */ > > + else if (ReplicationSlotCtl) > > + { > > + foreach_ptr(char, name, elemlist) > > + { > > + ReplicationSlot *slot; > > +
RE: Synchronizing slots from primary to standby
On Thursday, February 29, 2024 7:36 PM shveta malik wrote: > > On Thu, Feb 29, 2024 at 7:04 AM Zhijie Hou (Fujitsu) > wrote: > > > > Here is the v101 patch set which addressed above comments. > > > > Thanks for the patch. Few comments: > > 1) Shall we mention in doc that shutdown will wait for standbys in > standby_slot_names to confirm receiving WAL: > > Suggestion for logicaldecoding.sgml: > > When standby_slot_names is utilized, the primary > server will not completely shut down until the corresponding standbys, > associated with the physical replication slots specified in > standby_slot_names, have confirmed receiving the > WAL up to the latest flushed position on the primary server. > > slot.c > 2) > /* > * If a logical slot name is provided in standby_slot_names, report > * a message and skip it. Although logical slots are disallowed in > * the GUC check_hook(validate_standby_slots), it is still > * possible for a user to drop an existing physical slot and > * recreate a logical slot with the same name. > */ > > This is not completely true, we can still specify a logical slot during > instance > start and it will accept it. > > Suggestion: > /* > * If a logical slot name is provided in standby_slot_names, report > * a message and skip it. It is possible for user to specify a > * logical slot name in standby_slot_names just before the server > * startup. The GUC check_hook(validate_standby_slots) can not > * validate such a slot during startup as the ReplicationSlotCtl > * shared memory is not initialized by that time. It is also > * possible for user to drop an existing physical slot and > * recreate a logical slot with the same name. > */ > > 3. Wait for physical standby to confirm receiving the given lsn > > standby -->standbys > > > 4. 
> In StandbyConfirmedFlush(), is it better to have the below errdetail in all > problematic cases: > Logical replication is waiting on the standby associated with \"%s\" > > We have it only for the inactive pid case but we are waiting in all cases. Thanks for the comments, I have addressed them. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Thursday, February 29, 2024 7:17 PM Amit Kapila wrote: > > On Thu, Feb 29, 2024 at 3:23 PM Amit Kapila > wrote: > > > > Few additional comments on the latest patch: > > = > > 1. > > static XLogRecPtr > > WalSndWaitForWal(XLogRecPtr loc) > > { > > ... > > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr && > > + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) { > > + /* > > + * Fast path to avoid acquiring the spinlock in case we already know > > + * we have enough WAL available and all the standby servers have > > + * confirmed receipt of WAL up to RecentFlushPtr. This is > > + particularly > > + * interesting if we're far behind. > > + */ > > return RecentFlushPtr; > > + } > > ... > > ... > > + * Wait for WALs to be flushed to disk only if the postmaster has not > > + * asked us to stop. > > + */ > > + if (loc > RecentFlushPtr && !got_STOPPING) wait_event = > > + WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > > + > > + /* > > + * Check if the standby slots have caught up to the flushed position. > > + * It is good to wait up to RecentFlushPtr and then let it send the > > + * changes to logical subscribers one by one which are already > > + covered > > + * in RecentFlushPtr without needing to wait on every change for > > + * standby confirmation. Note that after receiving the shutdown > > + signal, > > + * an ERROR is reported if any slots are dropped, invalidated, or > > + * inactive. This measure is taken to prevent the walsender from > > + * waiting indefinitely. > > + */ > > + else if (replication_active && > > + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) { wait_event = > > + WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; > > + wait_for_standby = true; > > + } > > + else > > + { > > + /* Already caught up and doesn't need to wait for standby_slots. */ > > break; > > + } > > ... > > } > > > > Can we try to move these checks into a separate function that returns > > a boolean and has an out parameter as wait_event? > > > > 2. 
How about naming StandbyConfirmedFlush() as > StandbySlotsAreCaughtup? > > > > Some more comments: > == > 1. > + foreach_ptr(char, name, elemlist) > + { > + ReplicationSlot *slot; > + > + slot = SearchNamedReplicationSlot(name, true); > + > + if (!slot) > + { > + GUC_check_errdetail("replication slot \"%s\" does not exist", name); > + ok = false; break; } > + > + if (!SlotIsPhysical(slot)) > + { > + GUC_check_errdetail("\"%s\" is not a physical replication slot", > + name); ok = false; break; } } > > Won't the second check need protection via ReplicationSlotControlLock? Yes, added. > > 2. In WaitForStandbyConfirmation(), we are anyway calling > StandbyConfirmedFlush() before the actual sleep in the loop, so does calling > it > at the beginning of the function will serve any purpose? If so, it is better > to add > some comments explaining the same. It is used as a fast-path to avoid calling condition variable stuff, I think we can directly put failover check and list check in the beginning instead of calling that function. > > 3. Also do we need to perform the validation checks done in > StandbyConfirmedFlush() repeatedly when it is invoked in a loop? We can > probably separate those checks and perform them just once. I have removed slot.failover check from the StandbyConfirmedFlush function, so that we can do it when necessary. I didn’t change the check for standby_slot_names_list because the list could be changed in the loop when reloading config. > > 4. > + * > + * XXX: If needed, this can be attempted in future. > > Remove this part of the comment. Removed. Attach the V102 patch set which addressed Amit and Shveta's comments. Thanks Shveta for helping addressing the comments off-list. 
Best Regards, Hou zj v102-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v102-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
On Wednesday, February 28, 2024 7:36 PM Bertrand Drouvot wrote: > > On Wed, Feb 28, 2024 at 02:23:27AM +, Zhijie Hou (Fujitsu) wrote: > > Attach the V100 patch set which addressed above comments. > > A few random comments: Thanks for the comments! > > 1 === > > + if (!ok) > + { > + GUC_check_errdetail("List syntax is invalid."); > + } > > What about to get rid of the brackets here? I personally prefer the current style. > > 2 === > > + > + /* > +* If the replication slots' data have been initialized, verify if the > +* specified slots exist and are logical slots. > +*/ > > remove the empty line above the comment? I feel it would be clean to have an empty line before the comments. > > 3 === > > +check_standby_slot_names(char **newval, void **extra, GucSource source) > +{ > + if ((*newval)[0] == '\0') > + return true; > > I think "**newval == '\0'" is easier to read but that's a matter of taste and > check_synchronous_standby_names() is already using the same so it's a nit. I don't have a strong opinion on this, so will change if others also feel so. > > 4 === > > Regarding the test, what about adding one to test the "new" behavior > discussed up-thread? (logical replication will wait if slot mentioned in > standby_slot_names is dropped and/or does not exist when the engine starts?) Will think about this. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Monday, February 26, 2024 7:52 PM Amit Kapila wrote: > > On Mon, Feb 26, 2024 at 7:49 AM Zhijie Hou (Fujitsu) > wrote: > > > > Attach the V98 patch set which addressed above comments. > > > > Few comments: > = > 1. > WalSndWaitForWal(XLogRecPtr loc) > { > int wakeEvents; > + bool wait_for_standby = false; > + uint32 wait_event; > + List *standby_slots = NIL; > static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr; > > + if (MyReplicationSlot->data.failover && replication_active) > + standby_slots = GetStandbySlotList(true); > + > /* > - * Fast path to avoid acquiring the spinlock in case we already know we > - * have enough WAL available. This is particularly interesting if we're > - * far behind. > + * Check if all the standby servers have confirmed receipt of WAL up to > + * RecentFlushPtr even when we already know we have enough WAL available. > + * > + * Note that we cannot directly return without checking the status of > + * standby servers because the standby_slot_names may have changed, > + which > + * means there could be new standby slots in the list that have not yet > + * caught up to the RecentFlushPtr. > */ > - if (RecentFlushPtr != InvalidXLogRecPtr && > - loc <= RecentFlushPtr) > - return RecentFlushPtr; > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr) { > + FilterStandbySlots(RecentFlushPtr, &standby_slots); > > I think even if the slot list is not changed, we will always process each slot > mentioned in standby_slot_names once. Can't we cache the previous list of > slots for we have already waited for? In that case, we won't even need to copy > the list via GetStandbySlotList() unless we need to wait. > > 2. > WalSndWaitForWal(XLogRecPtr loc) > { > + /* > + * Update the standby slots that have not yet caught up to the flushed > + * position. 
It is good to wait up to RecentFlushPtr and then let it > + * send the changes to logical subscribers one by one which are > + * already covered in RecentFlushPtr without needing to wait on every > + * change for standby confirmation. > + */ > + if (wait_for_standby) > + FilterStandbySlots(RecentFlushPtr, &standby_slots); > + > /* Update our idea of the currently flushed position. */ > - if (!RecoveryInProgress()) > + else if (!RecoveryInProgress()) > RecentFlushPtr = GetFlushRecPtr(NULL); > else > RecentFlushPtr = GetXLogReplayRecPtr(NULL); ... > /* > * If postmaster asked us to stop, don't wait anymore. > * > * It's important to do this check after the recomputation of > * RecentFlushPtr, so we can send all remaining data before shutting > * down. > */ > if (got_STOPPING) > break; > > I think because 'wait_for_standby' may not be set in the first or consecutive > cycles we may send the WAL to the logical subscriber before sending it to the > physical subscriber during shutdown. Here is the v101 patch set which addressed the above comments. This version caches the oldest standby slot's LSN each time we wait for the standbys to catch up. The cached LSN is invalidated when we reload the GUC config. In the WalSndWaitForWal function, instead of traversing the entire standby list each time, we can check the cached LSN to quickly determine if the standbys have caught up. When a shutdown signal is received, we continue to wait for the standby slots to catch up. When waiting for the standbys to catch up after receiving the shutdown signal, an ERROR is reported if any slots are dropped, invalidated, or inactive. This measure is taken to prevent the walsender from waiting indefinitely. Best Regards, Hou zj v101-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v101-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
On Wednesday, February 28, 2024 2:38 PM Bertrand Drouvot wrote: > On Wed, Feb 28, 2024 at 08:49:19AM +0530, Amit Kapila wrote: > > On Mon, Feb 26, 2024 at 9:13 AM shveta malik > wrote: > > > > > > On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot > > > wrote: > > > > > > > > Hi, > > > > > I think to set secure search path for remote connection, the > > > > > standard approach could be to extend the code in > > > > > libpqrcv_connect[1], so that we don't need to schema qualify all the > operators in the queries. > > > > > > > > > > And for local connection, I agree it's also needed to add a > > > > > SetConfigOption("search_path", "" call in the slotsync worker. > > > > > > > > > > [1] > > > > > libpqrcv_connect > > > > > ... > > > > > if (logical) > > > > > ... > > > > > res = libpqrcv_PQexec(conn->streamConn, > > > > > > > > > > ALWAYS_SECURE_SEARCH_PATH_SQL); > > > > > > > > > > > > > Agree, something like in the attached? (it's .txt to not disturb the CF > > > > bot). > > > > > > Thanks for the patch, changes look good. I have corporated it in the > > > patch which addresses the rest of your comments in [1]. I have > > > attached the patch as .txt > > > > > > > Few comments: > > === > > 1. > > - if (logical) > > + if (logical || !replication) > > { > > > > Can we add a comment about connection types that require > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > Yeah, will do. > > > > > 2. > > Can we add a test case to demonstrate that the '=' operator can be > > hijacked to do different things when the slotsync worker didn't use > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > I don't think that's good to create a test to show how to hijack an operator > within a background worker. > > I had a quick look and did not find existing tests in this area around > ALWAYS_SECURE_SEARCH_PATH_SQL / search_patch and background worker. I think a similar commit 11da970 has added a test for the search_path, e.g. 
# Create some preexisting content on publisher $node_publisher->safe_psql( 'postgres', "CREATE FUNCTION public.pg_get_replica_identity_index(int) RETURNS regclass LANGUAGE sql AS 'SELECT 1/0'");# shall not call Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, February 27, 2024 3:18 PM Peter Smith wrote: > > Here are some review comments for v99-0001 Thanks for the comments! > Commit Message > > 1. > A new parameter named standby_slot_names is introduced. > > Maybe quote the GUC names here to make it more readable. Added. > > ~~ > > 2. > Additionally, The SQL functions pg_logical_slot_get_changes and > pg_replication_slot_advance are modified to wait for the replication slots > mentioned in standby_slot_names to catch up before returning the changes to > the user. > > ~ > > 2a. > "pg_replication_slot_advance" is a typo? Did you mean > pg_logical_replication_slot_advance? pg_logical_replication_slot_advance is not a user visible function. So the pg_replication_slot_advance is correct. > > ~ > > 2b. > The "before returning the changes to the user" seems like it is referring > only to > the first function. > > Maybe needs slight rewording like: > /before returning the changes to the user./ before returning./ Changed. > > == > doc/src/sgml/config.sgml > > 3. standby_slot_names > > + > +List of physical slots guarantees that logical replication slots with > +failover enabled do not consume changes until those changes > are received > +and flushed to corresponding physical standbys. If a logical > replication > +connection is meant to switch to a physical standby after the > standby is > +promoted, the physical replication slot for the standby > should be listed > +here. Note that logical replication will not proceed if the slots > +specified in the standby_slot_names do not exist or are invalidated. > + > > The wording doesn't seem right. IMO this should be worded much like how this > GUC is described in guc_tables.c > > e.g something a bit like: > > Lists the streaming replication standby server slot names that logical WAL > sender processes will wait for. Logical WAL sender processes will send > decoded changes to plugins only after the specified replication slots confirm > receiving WAL. 
This guarantees that logical replication slots with failover > enabled do not consume changes until those changes are received and flushed > to corresponding physical standbys... Changed. > > == > doc/src/sgml/logicaldecoding.sgml > > 4. Section 48.2.3 Replication Slot Synchronization > > + It's also highly recommended that the said physical replication slot > + is named in > + linkend="guc-standby-slot-names">standby_slot_names me> > + list on the primary, to prevent the subscriber from consuming changes > + faster than the hot standby. But once we configure it, then > certain latency > + is expected in sending changes to logical subscribers due to wait on > + physical replication slots in > + + > linkend="guc-standby-slot-names">standby_slot_names me> > + > > 4a. > /It's also highly/It is highly/ > > ~ > > 4b. > > BEFORE > But once we configure it, then certain latency is expected in sending changes > to logical subscribers due to wait on physical replication slots in linkend="guc-standby-slot-names">standby_slot_names me> > > SUGGESTION > Even when correctly configured, some latency is expected when sending > changes to logical subscribers due to the waiting on slots named in > standby_slot_names. Changed. > > == > .../replication/logical/logicalfuncs.c > > 5. pg_logical_slot_get_changes_guts > > + if (XLogRecPtrIsInvalid(upto_lsn)) > + wait_for_wal_lsn = end_of_wal; > + else > + wait_for_wal_lsn = Min(upto_lsn, end_of_wal); > + > + /* > + * Wait for specified streaming replication standby servers (if any) > + * to confirm receipt of WAL up to wait_for_wal_lsn. > + */ > + WaitForStandbyConfirmation(wait_for_wal_lsn); > > Perhaps those statements all belong together with the comment up-front. e.g. > > + /* > + * Wait for specified streaming replication standby servers (if any) > + * to confirm receipt of WAL up to wait_for_wal_lsn. 
> + */ > + if (XLogRecPtrIsInvalid(upto_lsn)) > + wait_for_wal_lsn = end_of_wal; > + else > + wait_for_wal_lsn = Min(upto_lsn, end_of_wal); > + WaitForStandbyConfirmation(wait_for_wal_lsn); Changed. > > == > src/backend/replication/logical/slotsync.c > > == > src/backend/replication/slot.c > > 6. > +static bool > +validate_standby_slots(char **newval) > +{ > + char *rawname; > + List *elemlist; > + ListCell *lc; > + bool ok; > + > + /* Need a modifiable copy of string */ rawname = pstrdup(*newval); > + > + /* Verify syntax and parse string into a list of identifiers */ ok = > + SplitIdentifierString(rawname, ',', &elemlist); > + > + if (!ok) > + { > + GUC_check_errdetail("List syntax is invalid."); } > + > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > + else if (ReplicationSlotCtl) > + { > + foreach(lc, elemlist) > > 6a. > So, if the ReplicationSlotCtl is
RE: Synchronizing slots from primary to standby
On Monday, February 26, 2024 1:19 PM Amit Kapila wrote: > > On Fri, Feb 23, 2024 at 4:45 PM Bertrand Drouvot > wrote: > > > > On Fri, Feb 23, 2024 at 09:46:00AM +, Zhijie Hou (Fujitsu) wrote: > > > > > > Besides, I'd like to clarify and discuss the behavior of > > > standby_slot_names > once. > > > > > > As it stands in the patch, If the slots specified in > > > standby_slot_names are dropped or invalidated, the logical walsender > > > will issue a WARNING and continue to replicate the changes. Another > > > option for this could be to have the walsender pause until the slot > > > in standby_slot_names is re-created or becomes valid again. Does anyone > else have an opinion on this matter ? > > > > Good point, I'd vote for: the only reasons not to wait are: > > > > - slots mentioned in standby_slot_names exist and valid and do catch > > up or > > - standby_slot_names is empty > > > > The reason is that setting standby_slot_names to a non empty value > > means that one wants the walsender to wait until the standby catchup. > > The way to remove this intentional behavior should be by changing the > > standby_slot_names value (not the existence or the state of the slot(s) it > points too). > > > > It seems we already do wait for the case when there is an inactive slot as > per the > below code [1] in the patch. So, probably waiting in other cases is also okay > and > also as this parameter is a SIGHUP parameter, users should be easily able to > change its value if required. Do you think it is a good idea to mention this > in > docs as well? > > I think it is important to raise WARNING as the patch is doing in all the > cases > where the slot is not being processed so that users can be notified and they > can > take the required action. Agreed. Here is the V99 patch which addressed the above. This version also includes: 1. list_free the slot list when reloading the list due to GUC change. 2. Refactored the validate_standby_slots based on Shveta's suggestion. 3. 
Added an errcode for the warnings, as most of the existing ones have errcodes. Amit's latest comments[1] are pending; we will address them in the next version. [1] https://www.postgresql.org/message-id/CAA4eK1LJdmGATWG%3DxOD1CB9cogukk2cLNBGH8h-n-ZDJuwBdJg%40mail.gmail.com Best Regards, Hou zj v99-0002-Document-the-steps-to-check-if-the-standby-is-re.patch v99-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 6:12 PM Bertrand Drouvot wrote: > > Here are some random comments: Thanks for the comments! > > 1 === > > Commit message "Allow logical walsenders to wait for the physical" > > s/physical/physical standby/? > > 2 == > > +++ b/src/backend/replication/logical/logicalfuncs.c > @@ -30,6 +30,7 @@ > #include "replication/decode.h" > #include "replication/logical.h" > #include "replication/message.h" > +#include "replication/walsender.h" > > Is this include needed? Removed. > > 3 === > > +* Slot sync is currently not supported on the cascading > + standby. This is > > s/on the/on a/? Changed. > > 4 === > > + if (!ok) > + GUC_check_errdetail("List syntax is invalid."); > + > + /* > +* If there is a syntax error in the name or if the replication slots' > +* data is not initialized yet (i.e., we are in the startup process), > skip > +* the slot verification. > +*/ > + if (!ok || !ReplicationSlotCtl) > + { > + pfree(rawname); > + list_free(elemlist); > + return ok; > + } > > we are testing the "ok" value twice, what about using if...else if... instead > and > test it once? If so, it might be worth to put the: > > " > + pfree(rawname); > + list_free(elemlist); > + return ok; > " > > in a "goto". There were comments to remove the 'goto' statement and avoid duplicate free code, so I prefer the current style. > > 5 === > > +* for which all standbys to wait for. Even if we have > + physical-slots > > s/physical-slots/physical slots/? Changed. > > 6 === > > * Switch to the same memory context under which GUC variables are > > s/to the same memory/to the memory/? Changed. > > 7 === > > + * Return a copy of standby_slot_names_list if the copy flag is set to > + true, > > Not sure, but would it be worth explaining why one would want to set to flag > to > true or false? (i.e why one would not want to receive the original list). 
I think the usage can be found in the caller's code, e.g. we need to remove the slots that have caught up from the list each time, so we cannot directly modify the global list. The GetStandbySlotList function is a general function and I feel we can avoid adding more comments here. > > 8 === > > + if (RecoveryInProgress()) > + return NIL; > > The need is well documented just above, but are we not violating the fact that > we return the original list or a copy of it? (that's what the comment above > the > GetStandbySlotList() function definition is saying). > > I think the comment above the GetStandbySlotList() function needs a bit of > rewording to cover that case. Adjusted. > > 9 === > > +* harmless, a WARNING should be enough, no need to > error-out. > > s/error-out/error out/? Changed. > > 10 === > > + if (slot->data.invalidated != RS_INVAL_NONE) > + { > + /* > +* Specified physical slot have been > invalidated, > so no point > +* in waiting for it. > > We discovered in [1], that if the wal_status is "unreserved" then the slot is > still > serving the standby. I think we should handle this case differently, thoughts? I think the fact that an 'invalidated' slot can still be used is a separate bug. Because once the slot is invalidated, it can protect neither WALs nor ROWs from being removed, even if the restart_lsn of the slot can be moved forward after being invalidated. If the standby can move restart_lsn forward for invalidated slots, then it should also set the 'invalidated' flag back to NONE, otherwise the slot cannot serve its purpose anymore. I also reported a similar bug before[1]. > > 11 === > > +* Specified physical slot have been > + invalidated, so no point > > s/have been/has been/? Changed. > > 12 === > > +++ b/src/backend/replication/slotfuncs.c > @@ -22,6 +22,7 @@ > #include "replication/logical.h" > #include "replication/slot.h" > #include "replication/slotsync.h" > +#include "replication/walsender.h" > > Is this include needed? No, it's not needed. Removed.
Attached is the V98 patch set, which addressed the above comments. I also adjusted a few comments based on off-list feedback from Shveta. The discussion on the wait behavior is ongoing, so I didn't change that behavior in this version. [1] https://www.postgresql.org/message-id/flat/os0pr01mb5716a626a4af5814e057cee394...@os0pr01mb5716.jpnprd01.prod.outlook.com Best Regards, Hou zj v98-0002-Document-the-steps-to-check-if-the-standby-is-re.patch v98-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 1:22 PM shveta malik wrote: > > Thanks for the patches. Had a quick look at v95_2, here are some > trivial comments: Thanks for the comments. > 6) streaming replication standby server slot names that logical walsender > processes will wait for > > Is it better to say it like this? (I leave this to your preference) > > streaming replication standby server slot names for which logical > walsender processes will wait. I feel the current one reads better, so I didn't change it. The other comments have been addressed. Here is the V97 patch set which addressed Shveta's comments. Besides, I'd like to clarify and discuss the behavior of standby_slot_names once more. As it stands in the patch, if the slots specified in standby_slot_names are dropped or invalidated, the logical walsender will issue a WARNING and continue to replicate the changes. Another option could be to have the walsender pause until the slot in standby_slot_names is re-created or becomes valid again. Does anyone else have an opinion on this matter? Best Regards, Hou zj v97-0002-Document-the-steps-to-check-if-the-standby-is-re.patch v97-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 5:07 PM Bertrand Drouvot wrote: > > On Fri, Feb 23, 2024 at 02:15:11PM +0530, shveta malik wrote: > > On Fri, Feb 23, 2024 at 1:28 PM Bertrand Drouvot > > wrote: > > > > > > Hi, > > > > > > Because one could create say the "=" OPERATOR in their own schema, > > > attach a function to it doing undesired stuff and change the > > > search_path for the database the sync slot worker connects to. > > > > > > Then this new "=" operator would be used (instead of the > > > pg_catalog.= one), triggering the "undesired" function as superuser. > > > > Thanks for the details. I understand it now. We do not use '=' in our > > main slots-fetch query but we do use '=' in remote-validation query. > > See validate_remote_info(). > > Oh, right, I missed it during the review. > > > Do you think instead of doing the above, we can override search-path > > with empty string in the slot-sync case. > > SImilar to logical apply worker and autovacuum worker case (see > > InitializeLogRepWorker(), AutoVacWorkerMain()). > > Yeah, we should definitively ensure that any operators being used in the query > is coming from the pg_catalog schema (could be by setting the search path or > using the up-thread proposal). > > Setting the search path would prevent any risks in case the query is changed > later on, so I'd vote for changing the search path in validate_remote_info() > and > in synchronize_slots() to be on the safe side. I think to set secure search path for remote connection, the standard approach could be to extend the code in libpqrcv_connect[1], so that we don't need to schema qualify all the operators in the queries. And for local connection, I agree it's also needed to add a SetConfigOption("search_path", "" call in the slotsync worker. [1] libpqrcv_connect ... if (logical) ... res = libpqrcv_PQexec(conn->streamConn, ALWAYS_SECURE_SEARCH_PATH_SQL); Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 10:18 AM Zhijie Hou (Fujitsu) wrote: > > > > Hi, > > > > Since the slotsync worker patch has been committed, I rebased the > > remaining patches. > > And here is the V95 patch set. > > > > Also, I fixed a bug in the current 0001 patch where the member of the > > standby slot names list pointed to the freed memory after calling > ProcessConfigFile(). > > Now, we will obtain a new list when we call ProcessConfigFile(). The > > optimization to only get the new list when the names actually change > > has been removed. I think this change is acceptable because > > ProcessConfigFile is not a frequent occurrence. > > > > Additionally, I reordered the tests in > > 040_standby_failover_slots_sync.pl. Now the new test will be conducted > > after the sync slot test to prevent the risk of the logical slot > > occasionally not catching up to the latest catalog_xmin and, as a result, > > not > being able to be synced immediately. > > There is one unexpected change in the previous version, sorry for that. > Here is the correct version. I noticed one CFbot failure[1] which is because the tap-test doesn't wait for the standby to catch up before promoting, thus the data inserted after promotion could not be replicated to the subscriber. Added a wait_for_replay_catchup call to fix it. Apart from this, I also adjusted some variable names in the tap-test to be consistent, and added back a mistakenly removed ProcessConfigFile call. [1] https://cirrus-ci.com/task/6126787437002752?logs=check_world#L312 Best Regards, Hou zj v96-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch v96-0002-Document-the-steps-to-check-if-the-standby-is-re.patch
RE: Synchronizing slots from primary to standby
On Friday, February 23, 2024 10:02 AM Zhijie Hou (Fujitsu) wrote: > > Hi, > > Since the slotsync worker patch has been committed, I rebased the remaining > patches. > And here is the V95 patch set. > > Also, I fixed a bug in the current 0001 patch where the member of the standby > slot names list pointed to the freed memory after calling ProcessConfigFile(). > Now, we will obtain a new list when we call ProcessConfigFile(). The > optimization to only get the new list when the names actually change has been > removed. I think this change is acceptable because ProcessConfigFile is not a > frequent occurrence. > > Additionally, I reordered the tests in 040_standby_failover_slots_sync.pl. Now > the new test will be conducted after the sync slot test to prevent the risk > of the > logical slot occasionally not catching up to the latest catalog_xmin and, as a > result, not being able to be synced immediately. There is one unexpected change in the previous version, sorry for that. Here is the correct version. Best Regards, Hou zj v95_2-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v95_2-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
Hi, Since the slotsync worker patch has been committed, I rebased the remaining patches. And here is the V95 patch set. Also, I fixed a bug in the current 0001 patch where the member of the standby slot names list pointed to the freed memory after calling ProcessConfigFile(). Now, we will obtain a new list when we call ProcessConfigFile(). The optimization to only get the new list when the names actually change has been removed. I think this change is acceptable because ProcessConfigFile is not a frequent occurrence. Additionally, I reordered the tests in 040_standby_failover_slots_sync.pl. Now the new test will be conducted after the sync slot test to prevent the risk of the logical slot occasionally not catching up to the latest catalog_xmin and, as a result, not being able to be synced immediately. Best Regards, Hou zj v95-0002-Document-the-steps-to-check-if-the-standby-is-r.patch v95-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 22, 2024 8:41 PM Amit Kapila wrote: > > On Thu, Feb 22, 2024 at 5:23 PM Amit Kapila > wrote: > > > > On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot > > wrote: > > > > > > On Thu, Feb 22, 2024 at 04:01:34PM +0530, shveta malik wrote: > > > > On Thu, Feb 22, 2024 at 3:44 PM Bertrand Drouvot > > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > Thanks! > > > > > > > > > > Some random comments about v92_001 (Sorry if it has already been > > > > > discussed > > > > > up-thread): > > > > > > > > Thanks for the feedback. The patch is pushed 15 minutes back. > > > > > > Yeah, saw that after I send the comments ;-) > > > > > > > There is a BF failure. See > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-0 2-22%2010%3A13%3A03. > > > > The initial analysis suggests that for some reason, the primary went > > down after the slot sync worker was invoked the first time. See the > > below in the primary's LOG: > > > > The reason is that the test failed waiting on below LOG: > > ### Reloading node "standby1" > # Running: pg_ctl -D > /home/ec2-user/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_ > 040_standby_failover_slots_sync_standby1_data/pgdata > reload > server signaled > timed out waiting for match: (?^:LOG: slot sync worker started) at > t/040_standby_failover_slots_sync.pl line 376. > > Now, on standby, we see a LOG like 2024-02-22 10:57:35.432 UTC [2721638:1] > LOG: 0: slot sync worker started. Even then the test failed, and the reason is > that there is an extra error code before the actual message, which is due to > log_error_verbosity = verbose in the config. I think here the test's log matching > code needs to be more robust. Agreed. Here is a small patch to change the msg in wait_for_log so that it only searches for the message part. Best Regards, Hou zj 0001-Make-recovery-test-pass-with-log_error_verbosity-ver.patch
RE: Synchronizing slots from primary to standby
On Monday, February 19, 2024 11:39 AM shveta malik wrote: > > On Sun, Feb 18, 2024 at 7:40 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, February 16, 2024 6:41 PM shveta malik > wrote: > > Thanks for the patch. Here are few comments: > > Thanks for the comments. > > > > > 2. > > > > +static bool > > +validate_remote_info(WalReceiverConn *wrconn, int elevel) > > ... > > + > > + return (!remote_in_recovery && primary_slot_valid); > > > > The primary_slot_valid could be uninitialized in this function. > > return (!remote_in_recovery && primary_slot_valid); > > Here if remote_in_recovery is true, it will not even read primary_slot_valid. > It > will read primary_slot_valid only if remote_in_recovery is false and in such a > case primary_slot_valid will always be initialized in the else block above, > let me > know if you still feel we shall initialize this to some default? I understand that it will not be used, but some compilers could report a WARNING for the uninitialized variable. The cfbot[1] seems to complain about this as well. [1] https://cirrus-ci.com/task/5416851522453504 Best Regards, Hou zj
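The diagnostic in question is the classic "may be used uninitialized" warning: even though the `&&` short-circuits before the variable is read on the early path, the compiler cannot always prove that. A minimal sketch of the pattern and the conventional fix (this is a simplified stand-in, not the actual validate_remote_info()):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for validate_remote_info(): primary_slot_valid is
 * only assigned on the !remote_in_recovery path.  The final expression
 * never reads it otherwise (short-circuit evaluation), but some
 * compilers still emit -Wmaybe-uninitialized; initializing it to false
 * silences the warning without changing behavior. */
static bool
validate_remote_info_sketch(bool remote_in_recovery, bool slot_found)
{
	bool primary_slot_valid = false;	/* explicit default for the warning */

	if (!remote_in_recovery)
		primary_slot_valid = slot_found;

	return (!remote_in_recovery && primary_slot_valid);
}
```

Initializing to false is safe here precisely because the value is only consulted on the path that also assigns it.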
RE: Synchronizing slots from primary to standby
On Friday, February 16, 2024 6:41 PM shveta malik wrote: > > On Thu, Feb 15, 2024 at 10:48 PM Bertrand Drouvot > wrote: > > > > Looking at v88_0001, random comments: > > Thanks for the feedback. > > > > > 1 === > > > > Commit message "Be enabling slot synchronization" > > > > Typo? s:Be/By > > Modified. > > > 2 === > > > > +It enables a physical standby to synchronize logical failover slots > > +from the primary server so that logical subscribers are not blocked > > +after failover. > > > > Not sure "not blocked" is the right wording. > > "can be resumed from the new primary" maybe? (was discussed in [1]) > > Modified. > > > 3 === > > > > +#define SlotSyncWorkerAllowed()\ > > + (sync_replication_slots && pmState == PM_HOT_STANDBY && \ > > +SlotSyncWorkerCanRestart()) > > > > Maybe add a comment above the macro explaining the logic? > > Done. > > > 4 === > > > > +#include "replication/walreceiver.h" > > #include "replication/slotsync.h" > > > > should be reverse order? > > Removed walreceiver.h inclusion as it was not needed. > > > 5 === > > > > + if (SlotSyncWorker->syncing) > > { > > - SpinLockRelease(>mutex); > > + SpinLockRelease(>mutex); > > ereport(ERROR, > > > errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("cannot synchronize replication > slots concurrently")); > > } > > > > worth to add a test in 040_standby_failover_slots_sync.pl for it? > > It will be very difficult to stabilize this test as we have to make sure that > the > concurrent users (SQL function(s) and/or worker(s)) are in that target > function at > the same time to hit it. > > > > > 6 === > > > > +static void > > +slotsync_reread_config(bool restart) > > +{ > > > > worth to add test(s) in 040_standby_failover_slots_sync.pl for it? > > Added test. > > Please find v89 patch set. The other changes are: Thanks for the patch. Here are few comments: 1. 
+static char *
+get_dbname_from_conninfo(const char *conninfo)
+{
+	static char *dbname;
+
+	if (dbname)
+		return dbname;
+	else
+		dbname = walrcv_get_dbname_from_conninfo(conninfo);
+
+	return dbname;
+}

I think it's not necessary to have a static variable here, because the GUC check doesn't seem performance sensitive. Additionally, it does not work well with the slotsync SQL functions, because the dbname will be allocated in a temporary memory context when reaching here via a SQL function, so it's not safe to access this static variable in the next function call.

2.

+static bool
+validate_remote_info(WalReceiverConn *wrconn, int elevel)
...
+
+	return (!remote_in_recovery && primary_slot_valid);

The primary_slot_valid could be uninitialized in this function. Best Regards, Hou zj
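Hou's objection to the static cache can be demonstrated in miniature. The sketch below is hypothetical (it is not PostgreSQL code): a reused scratch buffer stands in for a memory context that is reset between calls, so the cached static pointer ends up pointing at recycled memory:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for walrcv_get_dbname_from_conninfo(): it parses
 * "dbname=<x>" out of a conninfo string into a scratch buffer that is
 * reused on every call, the way a short-lived memory context would be. */
static char scratch[64];

static char *
get_dbname_fresh(const char *conninfo)
{
	const char *p = strstr(conninfo, "dbname=");

	strncpy(scratch, p ? p + strlen("dbname=") : "", sizeof(scratch) - 1);
	scratch[sizeof(scratch) - 1] = '\0';
	return scratch;
}

/* The pattern under review: cache the first answer in a static pointer.
 * After the underlying storage is reused, the cache silently returns
 * whatever now occupies that memory. */
static char *
get_dbname_cached(const char *conninfo)
{
	static char *dbname;

	if (dbname)
		return dbname;			/* may point at recycled storage */
	dbname = get_dbname_fresh(conninfo);
	return dbname;
}
```

The safe alternative — which is what the review suggests — is simply to call the underlying function every time, since the GUC check is not performance sensitive.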
RE: Synchronizing slots from primary to standby
On Friday, February 16, 2024 3:43 PM Amit Kapila wrote: > > On Fri, Feb 16, 2024 at 11:43 AM Amit Kapila wrote: > > > > Thanks for noticing this. I have pushed all your debug patches. Let's > > hope if there is a BF failure next time, we can gather enough > > information to know the reason of the same. > > > > There is a new BF failure [1] after adding these LOGs and I think I know what > is > going wrong. First, let's look at standby LOGs: > > 2024-02-16 06:18:18.442 UTC [241414][client backend][2/14:0] DEBUG: > segno: 4 of purposed restart_lsn for the synced slot, oldest_segno: 4 > available > 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] DEBUG: > xmin required by slots: data 0, catalog 741 > 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] LOG: could > not sync slot information as remote slot precedes local slot: remote slot > "lsub1_slot": > LSN (0/4000168), catalog xmin (739) local slot: LSN (0/4000168), catalog xmin > (741) > > So, from the above LOG, it is clear that the remote slot's catalog xmin (739) > precedes the local catalog xmin (741), which prevents the sync on the standby > from completing. > > Next, let's look at the LOG from the primary during the nearby time: > 2024-02-16 06:18:11.354 UTC [238037][autovacuum worker][5/17:0] DEBUG: > analyzing "pg_catalog.pg_depend" > 2024-02-16 06:18:11.360 UTC [238037][autovacuum worker][5/17:0] DEBUG: > "pg_depend": scanned 13 of 13 pages, containing 1709 live rows and 0 dead > rows; 1709 rows in sample, 1709 estimated total rows ...
> 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/0:0] DEBUG: > Autovacuum VacuumUpdateCosts(db=1, rel=14050, dobalance=yes, > cost_limit=200, cost_delay=2 active=yes failsafe=no) > 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/19:0] DEBUG: > analyzing "information_schema.sql_features" > 2024-02-16 06:18:11.377 UTC [238037][autovacuum worker][5/19:0] DEBUG: > "sql_features": scanned 8 of 8 pages, containing 756 live rows and 0 dead > rows; > 756 rows in sample, 756 estimated total rows > > It shows us that the autovacuum worker has analyzed a catalog table and, to update > its statistics in the pg_statistic table, it would have acquired a new > transaction id. Now, > after the slot creation, a new transaction id that has updated the catalog is > generated on the primary and would have been replicated to the standby. Due to this, > the catalog_xmin of the primary's slot would precede the standby's catalog_xmin and we see > this failure. > > As per this theory, we should disable autovacuum on the primary to avoid updates > to > catalog_xmin values. > Agreed. Here is the patch to disable autovacuum in the test. Best Regards, Hou zj 0001-Disable-autovacuum-on-primary-to-stabilize-the-040_s.patch Description: 0001-Disable-autovacuum-on-primary-to-stabilize-the-040_s.patch
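The conclusion that catalog xmin 739 "precedes" 741 rests on PostgreSQL's wraparound-aware XID ordering rather than a plain `<`. A simplified version of that rule (mirroring the core idea of TransactionIdPrecedes() for normal XIDs; the special-casing of permanent XIDs is omitted here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified wraparound-aware comparison: transaction IDs live on a
 * modulo-2^32 circle, so "id1 precedes id2" is decided by the sign of
 * the 32-bit difference, not by ordinary unsigned less-than. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t diff = (int32_t) (id1 - id2);

	return diff < 0;
}
```

With this ordering, the sync logic refusing to move the local slot's catalog_xmin (741) backwards to the remote's (739) is exactly the behavior the new LOG message reports.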
RE: Synchronizing slots from primary to standby
On Friday, February 16, 2024 8:33 AM Zhijie Hou (Fujitsu) wrote: > On Thursday, February 15, 2024 8:29 PM Amit Kapila > wrote: > > > > > On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot > > wrote: > > > > > > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote: > > > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > Attach the v2 patch here. > > > > > > > > > > Apart from the new log message. I think we can add one more > > > > > debug message in reserve_wal_for_local_slot, this could be > > > > > useful to analyze the > > failure. > > > > > > > > Yeah, that can also be helpful, but the added message looks naive to me. > > > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, > > > > + segno); > > > > > > > > Instead of the above, how about something like: "segno: %ld of > > > > purposed restart_lsn for the synced slot, oldest_segno: %ld > > > > available"? > > > > > > Looks good to me. I'm not sure if it would make more sense to elog > > > only if segno < oldest_segno means just before the > > XLogSegNoOffsetToRecPtr() call? > > > > > > But I'm fine with the proposed location too. > > > > > > > I am also fine either way but the current location gives required > > information in more number of cases and could be helpful in debugging this > new facility. > > > > > > > > > > > And we > > > > > can also enable the DEBUG log in the 040 tap-test, I see we have > > > > > similar setting in 010_logical_decoding_timline and logging > > > > > debug1 message doesn't increase noticable time on my machine. > > > > > These are done > > in 0002. > > > > > > > > > > > > > I haven't tested it but I think this can help in debugging BF > > > > failures, if any. I am not sure if to keep it always like that but > > > > till the time these tests are stabilized, this sounds like a good > > > > idea. So, how, about just making test changes as a separate patch > > > > so that later if required we can revert/remove it easily? 
> > > > Bertrand, do you have any thoughts on this? > > > > > > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I > > > +think we > > > took the same approach for 035_standby_logical_decoding.pl IIRC) and > > > then revert it back. > > > > > > > Good to know! > > > > > Also I was thinking: what about adding an output to > > pg_sync_replication_slots()? > > > The output could be the number of sync slots that have been created > > > and are not considered as sync-ready during the execution. > > > > > > > Yeah, we can consider outputting some information via this function > > like how many slots are synced and persisted but not sure what would > > be appropriate here. Because one can anyway find that or more > > information by querying pg_replication_slots. I think we can keep > > discussing what makes more sense as a return value but let's commit > > the debug/log patches as they will be helpful to analyze BF failures or any > other issues reported. > > Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it > could > provide more information about communication between primary and standby. > This also doesn't increase noticeable testing time on my machine for debug > build. Sorry, there was a miss in the DEBUG message, I should have used UINT64_FORMAT for XLogSegNo(uint64) instead of %ld. Here is a small patch to fix this. Best Regards, Hou zj 0001-fix-the-format-for-uint64.patch Description: 0001-fix-the-format-for-uint64.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 15, 2024 8:29 PM Amit Kapila wrote: > > On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot > wrote: > > > > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote: > > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) > > > wrote: > > > > Attach the v2 patch here. > > > > > > > > Apart from the new log message. I think we can add one more debug > > > > message in reserve_wal_for_local_slot, this could be useful to analyze > > > > the > failure. > > > > > > Yeah, that can also be helpful, but the added message looks naive to me. > > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, segno); > > > > > > Instead of the above, how about something like: "segno: %ld of > > > purposed restart_lsn for the synced slot, oldest_segno: %ld > > > available"? > > > > Looks good to me. I'm not sure if it would make more sense to elog > > only if segno < oldest_segno means just before the > XLogSegNoOffsetToRecPtr() call? > > > > But I'm fine with the proposed location too. > > > > I am also fine either way but the current location gives required information > in > more number of cases and could be helpful in debugging this new facility. > > > > > > > > And we > > > > can also enable the DEBUG log in the 040 tap-test, I see we have > > > > similar setting in 010_logical_decoding_timline and logging debug1 > > > > message doesn't increase noticable time on my machine. These are done > in 0002. > > > > > > > > > > I haven't tested it but I think this can help in debugging BF > > > failures, if any. I am not sure if to keep it always like that but > > > till the time these tests are stabilized, this sounds like a good > > > idea. So, how, about just making test changes as a separate patch so > > > that later if required we can revert/remove it easily? Bertrand, do > > > you have any thoughts on this? 
> > > > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I > > +think we > > took the same approach for 035_standby_logical_decoding.pl IIRC) and > > then revert it back. > > > > Good to know! > > > Also I was thinking: what about adding an output to > pg_sync_replication_slots()? > > The output could be the number of sync slots that have been created > > and are not considered as sync-ready during the execution. > > > > Yeah, we can consider outputting some information via this function like how > many slots are synced and persisted but not sure what would be appropriate > here. Because one can anyway find that or more information by querying > pg_replication_slots. I think we can keep discussing what makes more sense as > a > return value but let's commit the debug/log patches as they will be helpful to > analyze BF failures or any other issues reported. Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it could provide more information about communication between primary and standby. This also doesn't increase noticeable testing time on my machine for debug build. Best Regards, Hou zj v3-0002-Add-a-DEBUG1-message-for-slot-sync.patch Description: v3-0002-Add-a-DEBUG1-message-for-slot-sync.patch v3-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch Description: v3-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch v3-0003-change-the-log-level-of-sync-slot-test-to-debug2.patch Description: v3-0003-change-the-log-level-of-sync-slot-test-to-debug2.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 15, 2024 5:20 PM Amit Kapila wrote: > On Thu, Feb 15, 2024 at 9:05 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Thursday, February 15, 2024 10:49 AM Amit Kapila > wrote: > > > > > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot > > > > > > Right, we can do that or probably this test would have made more > > > sense with a worker patch where we could wait for the slot to be synced. > > > Anyway, let's try to recreate the slot/subscription idea. BTW, do > > > you think that adding a LOG when we are not able to sync will help > > > in debugging such problems? I think eventually we can change it to > > > DEBUG1 but for now, it can help with stabilizing BF and or some other > reported issues. > > > > Here is the patch that attempts the re-create sub idea. > > > > Pushed this. > > > > I also think that a LOG/DEBUG > > would be useful for such analysis, so the 0002 is to add such a log. > > > > I feel such a LOG would be useful. > > + ereport(LOG, > + errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin" > +" (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)", > > I think waiting is a bit misleading here, how about something like: > "could not sync slot information as remote slot precedes local slot: > remote slot \"%s\": LSN (%X/%X), catalog xmin (%u) local slot: LSN (%X/%X), > catalog xmin (%u)" Changed. Attached is the v2 patch. Apart from the new log message, I think we can add one more debug message in reserve_wal_for_local_slot; this could be useful to analyze the failure. And we can also enable the DEBUG log in the 040 tap-test; I see we have a similar setting in 010_logical_decoding_timelines, and logging the debug1 message doesn't increase noticeable time on my machine. These are done in 0002.
Best Regards, Hou zj v2-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch Description: v2-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch v2-0002-Add-a-debug-message-and-enable-DEBUG-log-in-slots.patch Description: v2-0002-Add-a-debug-message-and-enable-DEBUG-log-in-slots.patch
RE: Synchronizing slots from primary to standby
On Thursday, February 15, 2024 10:49 AM Amit Kapila wrote: > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot > wrote: > > > > On Wed, Feb 14, 2024 at 10:40:11AM +, Zhijie Hou (Fujitsu) wrote: > > > On Wednesday, February 14, 2024 6:05 PM Amit Kapila > wrote: > > > > > > > > To ensure that restart_lsn has been moved to a recent position, we > > > > need to log XLOG_RUNNING_XACTS and make sure the same is processed > > > > as well by walsender. The attached patch does the required change. > > > > > > > > Hou-San can reproduce this problem by adding additional > > > > checkpoints in the test and after applying the attached it fixes > > > > the problem. Now, this patch is mostly based on the theory we > > > > formed based on LOGs on BF and a reproducer by Hou-San, so still, > > > > there is some chance that this doesn't fix the BF failures in which > > > > case I'll > again look into those. > > > > > > I have verified that the patch can fix the issue on my machine(after > > > adding few more checkpoints before slot invalidation test.) I also > > > added one more check in the test to confirm the synced slot is not temp > > > slot. > Here is the v2 patch. > > > > Thanks! > > > > +# To ensure that restart_lsn has moved to a recent WAL position, we > > +need # to log XLOG_RUNNING_XACTS and make sure the same is processed > > +as well $primary->psql('postgres', "CHECKPOINT"); > > > > Instead of "CHECKPOINT" wouldn't a less heavy "SELECT > pg_log_standby_snapshot();" > > be enough? > > > > Yeah, that would be enough. However, the test still fails randomly due to the > same reason. See [1]. So, as mentioned yesterday, now, I feel it is better to > recreate the subscription/slot so that it can get the latest restart_lsn > rather than > relying on pg_log_standby_snapshot() to move it. > > > Not a big deal but maybe we could do the change while modifying > > 040_standby_failover_slots_sync.pl in the next patch "Add a new slotsync > worker". 
> > > > Right, we can do that or probably this test would have made more sense with a > worker patch where we could wait for the slot to be synced. > Anyway, let's try to recreate the slot/subscription idea. BTW, do you think > that > adding a LOG when we are not able to sync will help in debugging such > problems? I think eventually we can change it to DEBUG1 but for now, it can > help > with stabilizing BF and or some other reported issues. Here is the patch that attempts the re-create sub idea. I also think that a LOG/DEBUG would be useful for such analysis, so the 0002 is to add such a log. Best Regards, Hou zj 0002-Add-a-log-if-remote-slot-didn-t-catch-up-to-locally-.patch Description: 0002-Add-a-log-if-remote-slot-didn-t-catch-up-to-locally-.patch 0001-fix-BF-error-take-2.patch Description: 0001-fix-BF-error-take-2.patch
RE: Synchronizing slots from primary to standby
On Wednesday, February 14, 2024 6:05 PM Amit Kapila wrote: > > On Wed, Feb 14, 2024 at 2:14 PM Amit Kapila wrote: > > > > On Wed, Feb 14, 2024 at 9:34 AM Zhijie Hou (Fujitsu) > > wrote: > > > > > > Here is V87 patch that adds test for the suggested cases. > > > > > > > I have pushed this patch and it leads to a BF failure: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris > > t=2024-02-14%2004%3A43%3A37 > > > > The test failures are: > > # Failed test 'logical decoding is not allowed on synced slot' > > # at > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > ailover_slots_sync.pl > > line 272. > > # Failed test 'synced slot on standby cannot be altered' > > # at > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > ailover_slots_sync.pl > > line 281. > > # Failed test 'synced slot on standby cannot be dropped' > > # at > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > ailover_slots_sync.pl > > line 287. > > > > The reason is that in LOGs, we see a different ERROR message than what > > is expected: > > 2024-02-14 04:52:32.916 UTC [1767765][client backend][3/4:0] ERROR: > > replication slot "lsub1_slot" is active for PID 1760871 > > > > Now, we see the slot still active because a test before these tests (# > > Test that if the synchronized slot is invalidated while the remote > > slot is still valid, ) is not able to successfully persist the > > slot and the synced temporary slot remains active. > > > > The reason is clear by referring to below standby LOGS: > > > > LOG: connection authorized: user=bf database=postgres > > application_name=040_standby_failover_slots_sync.pl > > LOG: statement: SELECT pg_sync_replication_slots(); > > LOG: dropped replication slot "lsub1_slot" of dbid 5 > > STATEMENT: SELECT pg_sync_replication_slots(); ... 
> > SELECT conflict_reason IS NULL AND synced FROM pg_replication_slots > > WHERE slot_name = 'lsub1_slot'; > > > > In the above LOGs, we should ideally see: "newly created slot > > "lsub1_slot" is sync-ready now" after the "LOG: dropped replication > > slot "lsub1_slot" of dbid 5" but lack of that means the test didn't > > accomplish what it was supposed to. Ideally, the same test should have > > failed but the pass criteria for the test failed to check whether the > > slot is persisted or not. > > > > The probable reason for failure is that remote_slot's restart_lsn lags > > behind the oldest WAL segment on standby. Now, in the test, we do > > ensure that the publisher and subscriber are caught up by following > > steps: > > # Enable the subscription to let it catch up to the latest wal > > position $subscriber1->safe_psql('postgres', > > "ALTER SUBSCRIPTION regress_mysub1 ENABLE"); > > > > $primary->wait_for_catchup('regress_mysub1'); > > > > However, this doesn't guarantee that restart_lsn is moved to a > > position new enough that standby has a WAL corresponding to it. > > > > To ensure that restart_lsn has been moved to a recent position, we need to log > XLOG_RUNNING_XACTS and make sure the same is processed as well by > walsender. The attached patch does the required change. > > Hou-San can reproduce this problem by adding additional checkpoints in the > test and after applying the attached it fixes the problem. Now, this patch is > mostly based on the theory we formed based on LOGs on BF and a reproducer > by Hou-San, so still, there is some chance that this doesn't fix the BF > failures in > which case I'll again look into those. I have verified that the patch can fix the issue on my machine (after adding a few more checkpoints before the slot invalidation test). I also added one more check in the test to confirm the synced slot is not a temporary slot. Here is the v2 patch.
Best Regards, Hou zj v2-0001-fix-040-standby-failover-slots-sync.patch Description: v2-0001-fix-040-standby-failover-slots-sync.patch
RE: Synchronizing slots from primary to standby
On Wednesday, February 14, 2024 10:40 AM Amit Kapila wrote: > > On Tue, Feb 13, 2024 at 9:25 PM Bertrand Drouvot > wrote: > > > > On Tue, Feb 13, 2024 at 05:20:35PM +0530, Amit Kapila wrote: > > > On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot > > > wrote: > > > > - 84% of the slotsync.c code is covered, the parts that are not > > > > are mainly related to "errors". > > > > > > > > Worth to try to extend the coverage? (I've in mind 731, 739, 766, > > > > 778, 786, 796, > > > > 808) > > > > > > > > > > All these additional line numbers mentioned by you are ERROR paths. > > > I think if we want we can easily cover most of those but I am not > > > sure if there is a benefit to cover each error path. > > > > Yeah, I think 731, 739 and one among the remaining ones mentioned > > up-thread should be enough, thoughts? > > > > I don't know how beneficial those selective ones would be but if I have to > pick a > few among those then I would pick the ones at 731 and 808. The reason is that > 731 is related to cascading standby restriction which we may uplift in the > future > and at that time one needs to be careful about the behavior, for 808 as well, > in > the future, we may have a separate GUC for slot_db_name. These may not be > good enough reasons as to why we add tests for these ERROR cases but not for > others, however, if we have to randomly pick a few among all ERROR paths, > these seem better to me than others. Here is V87 patch that adds test for the suggested cases. Best Regards, Hou zj v87-0001-Add-a-slot-synchronization-function.patch Description: v87-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Tuesday, February 13, 2024 7:30 PM Bertrand Drouvot wrote: > > On Tue, Feb 13, 2024 at 04:08:23AM +, Zhijie Hou (Fujitsu) wrote: > > On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > Here is the new version patch which addressed above and most of > > > Bertrand's comments. > > > > > > TODO: trying to add one test for the case the slot is valid on > > > primary while the synced slots is invalidated on the standby. > > > > Here is the V85_2 patch set that added the test and fixed one typo, > > there are no other code changes. > > Thanks! > > Out of curiosity I ran a code coverage and the result for slotsync.c can be found > in [1]. > > It appears that: > > - only one function is not covered (slotsync_failure_callback()). Thanks for the test! I think slotsync_failure_callback can be covered more easily in the next slotsync worker patch, on worker exit; I will post that after rebasing. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, February 13, 2024 2:51 PM Amit Kapila wrote: > > On Tue, Feb 13, 2024 at 9:38 AM Zhijie Hou (Fujitsu) > wrote: > > > > Here is the V85_2 patch set that added the test and fixed one typo, > > there are no other code changes. > > > > Few comments on the latest changes: Thanks for the comments. > == > 1. > +# Confirm that the invalidated slot has been dropped. > +$standby1->wait_for_log(qr/dropped replication slot "lsub1_slot" of > +dbid 5/, $log_offset); > > Is it okay to hardcode dbid 5? I am a bit worried that it can lead to > instability in > the test. > > 2. > +check_primary_info(WalReceiverConn *wrconn, int elevel) { > .. > + bool primary_info_valid; > > I don't think for 0001, we need an elevel as an argument, so let's remove it. > Additionally, can we change the variable name primary_info_valid to > primary_slot_valid? Also, can we change the function name to > validate_remote_info() as the remote can be both primary or standby? > > 3. > +SyncReplicationSlots(WalReceiverConn *wrconn) { > +PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, > +PointerGetDatum(wrconn)); { check_primary_info(wrconn, ERROR); > + > + synchronize_slots(wrconn); > + } > + PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, > PointerGetDatum(wrconn)); > + > + walrcv_disconnect(wrconn); > > It is better to disconnect in the caller where we have made the connection. All the above comments look good to me. Here is the V86 patch that addresses them. This version also includes some other minor changes: 1. Added a few comments for the temporary slot creation and XLogGetOldestSegno. 2. Adjusted the doc for the SQL function. 3. Reordered two error messages in the slot create function. 4. Fixed a few typos. Thanks to Shveta for the off-list discussions. Best Regards, Hou zj v86-0001-Add-a-slot-synchronization-function.patch Description: v86-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) wrote: > > Here is the new version patch which addressed above and most of Bertrand's > comments. > > TODO: trying to add one test for the case the slot is valid on primary while > the > synced slots is invalidated on the standby. Here is the V85_2 patch set that added the test and fixed one typo, there are no other code changes. Best Regards, Hou zj v85_2-0001-Add-a-slot-synchronization-function.patch Description: v85_2-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Monday, February 12, 2024 5:40 PM Amit Kapila wrote: > > On Sun, Feb 11, 2024 at 6:53 PM Zhijie Hou (Fujitsu) > wrote: > > > > Agreed. Here is the V84 patch which addressed this. > > > > Few comments: > = > 1. Isn't the new function (pg_sync_replication_slots()) allowed to sync the > slots > from physical standby to another cascading standby? > Won't it be better to simply disallow syncing slots on cascading standby to > keep > it consistent with slotsync worker behavior? > > 2. > Previously, I commented to keep the declaration and definition of functions in > the same order but I see that it still doesn't match in the below case: > > @@ -44,6 +46,7 @@ extern void WalSndWakeup(bool physical, bool logical); > extern void WalSndInitStopping(void); extern void WalSndWaitStopping(void); > extern void HandleWalSndInitStopping(void); > +extern XLogRecPtr GetStandbyFlushRecPtr(TimeLineID *tli); > extern void WalSndRqstFileReload(void); > > I think we can keep the new declaration just before WalSndSignals(). > That would be more consistent. > > 3. > + > + True if this is a logical slot that was synced from a primary server. > + > + > + On a hot standby, the slots with the synced column marked as true can > + neither be used for logical decoding nor dropped by the user. > + The value > > I don't think we need a separate para here. > > Apart from this, I have made several cosmetic changes in the attached. > Please include these in the next version unless you see any problems. Thanks for the comments, I have addressed them. Here is the new version patch which addressed above and most of Bertrand's comments. TODO: trying to add one test for the case the slot is valid on primary while the synced slots is invalidated on the standby. Best Regards, Houzj v85-0001-Add-a-slot-synchronization-function.patch Description: v85-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Monday, February 12, 2024 6:03 PM Bertrand Drouvot wrote: > > Hi, > > On Sun, Feb 11, 2024 at 01:23:19PM +0000, Zhijie Hou (Fujitsu) wrote: > > On Saturday, February 10, 2024 9:10 PM Amit Kapila > wrote: > > > > > > On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada > > > > > > wrote: > > > > > > > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > > > > Another alternative is to register the callback when calling > > > > > slotsync functions and unregister it after the function call. > > > > > And register the callback in > > > > > slotsyncworkmain() for the slotsync worker patch, although this > > > > > may adds a few more codes. > > > > > > > > Another idea is that SyncReplicationSlots() calls > > > > synchronize_slots() in PG_ENSURE_ERROR_CLEANUP() block instead of > > > > PG_TRY(), to make sure to clear the flag in case of ERROR or > > > > FATAL. And the slotsync worker uses the before_shmem_callback to clear > the flag. > > > > > > > > > > +1. This sounds like a better way to clear the flag. > > > > Agreed. Here is the V84 patch which addressed this. > > > > Apart from above, I removed the txn start/end codes from 0001 as they > > are used in the slotsync worker patch. And I also ran pgindent and > > pgperltidy for the patch. > > > > Thanks! > > A few random comments: Thanks for the comments. > > 001 === > > " > For > the synchronization to work, it is mandatory to have a physical replication > slot > between the primary and the standby, " > > Maybe mention "primary_slot_name" here? Added. > > 002 === > > + > +Synchronize the logical failover slots from the primary server to the > standby server. > > should we say "logical failover replication slots" instead? Changed. 
> > 003 === > > + If, after executing the function, > + > + hot_standby_feedback is disabled > on > + the standby or the physical slot configured in > + > + primary_slot_name is > + removed, > > I think another option that could lead to slot invalidation is if > primary_slot_name > is NULL or miss-configured. Indeed hot_standby_feedback would be working > (for the catalog_xmin) but only as long as the standby is up and running. I didn't change this based on the discussion. > > 004 === > > + on the standby. For the synchronization to work, it is mandatory to > + have a physical replication slot between the primary and the > + standby, > > should we mention primary_slot_name here? Added. > > 005 === > > + To resume logical replication after failover from the synced logical > + slots, the subscription's 'conninfo' must be altered > > Only in a pub/sub context but not for other ways of using the logical > replication > slot(s). I am not very sure about this, because third-party logical replication consumers can also have their own replication origin, so I didn't change this for now, but will think it over. > > 006 === > > + neither be used for logical decoding nor dropped by the user > > what about "nor dropped manually"? Changed. > > 007 === > > +typedef struct SlotSyncCtxStruct > +{ > > Should we remove "Struct" from the struct name? The name was chosen based on another comment, to be consistent with LogicalReplCtxStruct, so I didn't change it. If others also prefer the name without "Struct", we can change it later. > 008 === > > + ereport(LOG, > + errmsg("dropped replication slot > \"%s\" of dbid %d", > + > NameStr(local_slot->data.name), > + > + local_slot->data.database)); > > We emit a message when an "invalidated" slot is dropped but not when we > create a slot. Shouldn't we emit a message when we create a synced slot on the > standby? > > I think that could be confusing to see "a drop" message not followed by "a > create" > one when it's expected (slot valid on the primary for example).
I think we will report "sync-ready" for a newly synced slot. For newly created temporary slots, I am not sure whether we need to log their creation, because they will be dropped on promotion anyway. But if others also prefer to log it, I am fine with that. > > 009 === > > Regarding 040_standby_failover_slots_sync.pl what about adding tests for? > > - synced slot invalidation (and ensure it's recreated once > pg_sync_replication_slots() is called and when the slot in primary is valid) Will try this in the next version. > - cannot enable failover for a temporary replication slot Added. > - replication slots can only be synchronized from a standby server Added. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Saturday, February 10, 2024 9:10 PM Amit Kapila wrote: > > On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada > wrote: > > > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) > > wrote: > > > > > Another alternative is to register the callback when calling > > > slotsync functions and unregister it after the function call. And > > > register the callback in > > > slotsyncworkmain() for the slotsync worker patch, although this may > > > adds a few more codes. > > > > Another idea is that SyncReplicationSlots() calls synchronize_slots() > > in PG_ENSURE_ERROR_CLEANUP() block instead of PG_TRY(), to make sure > > to clear the flag in case of ERROR or FATAL. And the slotsync worker > > uses the before_shmem_callback to clear the flag. > > > > +1. This sounds like a better way to clear the flag. Agreed. Here is the V84 patch which addressed this. Apart from above, I removed the txn start/end codes from 0001 as they are used in the slotsync worker patch. And I also ran pgindent and pgperltidy for the patch. Best Regards, Hou zj v84-0001-Add-a-slot-synchronization-function.patch Description: v84-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Saturday, February 10, 2024 11:37 AM Zhijie Hou (Fujitsu) wrote: > > Attach the V83 patch which addressed Peter[1][2], Amit and Sawada-san's[3] > comments. Only 0001 is sent in this version, we will send other patches after > rebasing. > > [1] > https://www.postgresql.org/message-id/CAHut%2BPvW8s6AYD2UD0xadM%2B > 3VqBkXP2LjD30LEGRkHUa-Szm%2BQ%40mail.gmail.com > [2] > https://www.postgresql.org/message-id/CAHut%2BPv88vp9mNxX37c_Bc5FDBs > TS%2BdhV02Vgip9Wqwh7GBYSg%40mail.gmail.com > [3] > https://www.postgresql.org/message-id/CAD21AoDvyLu%3D2-mqfGn_T_3jUa > mR34w%2BsxKvYnVzKqTCpyq_FQ%40mail.gmail.com I noticed one cfbot failure where the slot is not synced when the standby is lagging behind the subscriber. I have modified the test to disable the subscription before syncing, to avoid this failure. Attached is the V83_2 patch; no other code changes are included in this version. Best Regards, Hou zj v83_2-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Friday, February 9, 2024 4:13 PM Peter Smith wrote: > > FYI -- I checked patch v81-0001 to find which of the #includes are strictly > needed. Thanks! > > 1. > ... > > Many of these #includes seem unnecessary. e.g. I was able to remove > all those that are commented-out below, and the file still compiles OK > with no warnings: Removed. > > > == > src/backend/replication/slot.c > > > > 2. > #include "pgstat.h" > +#include "replication/slotsync.h" > #include "replication/slot.h" > +#include "replication/walsender.h" > #include "storage/fd.h" > > The #include "replication/walsender.h" seems to be unnecessary. Removed. > > == > src/backend/replication/walsender.c > > 3. > #include "replication/logical.h" > +#include "replication/slotsync.h" > #include "replication/slot.h" > > The #include "replication/slotsync.h" is needed, but only for Assert code: > Assert(am_cascading_walsender || IsSyncingReplicationSlots()); > > So you could #ifdef around that #include if you wish to. I am not sure if it's necessary and didn't find similar coding, so didn't change. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, February 9, 2024 12:27 PM Peter Smith wrote: > > Here are some review comments for patch v81-0001. Thanks for the comments. > . GENERAL - ReplicationSlotInvalidationCause enum. > > I was thinking that the ReplicationSlotInvalidationCause should > explicitly set RS_INVAL_NONE = 0 (it's zero anyway, but making it > explicit with a comment / Must be zero. /. will stop it from being > changed in the future). I think the current code is better, so didn't change this. > 5. synchronize_slots > > + /* > + * The primary_slot_name is not set yet or WALs not received yet. > + * Synchronization is not possible if the walreceiver is not started. > + */ > + latestWalEnd = GetWalRcvLatestWalEnd(); > + SpinLockAcquire(>mutex); > + if ((WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(latestWalEnd)) > + { > + SpinLockRelease(>mutex); > + return; > + } > + SpinLockRelease(>mutex); > > The comment talks about the GUC "primary_slot_name", but the code is > checking the WalRcv's slotname. It may be the same, but the difference > is confusing. This part has been removed. > 7. > + /* > + * Use shared lock to prevent a conflict with > + * ReplicationSlotsDropDBSlots(), trying to drop the same slot during > + * a drop-database operation. > + */ > + LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock); > + > + synchronize_one_slot(remote_slot, remote_dbid); > + > + UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, > + AccessShareLock); > > IMO remove the blank lines (e.g., you don't use this kind of formatting for > spin > locks) I am not sure if it will look better, so didn't change this. Other comments look good. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Friday, February 9, 2024 6:45 PM Amit Kapila wrote: > > On Fri, Feb 9, 2024 at 10:00 AM Zhijie Hou (Fujitsu) > wrote: > > > > Few comments on 0001 > === > 1. Shouldn't pg_sync_replication_slots() check whether the user has > replication > privilege? Yes, added. > > 2. The function declarations in slotsync.h don't seem to be in the same order > as > they are defined in slotsync.c. For example, see ValidateSlotSyncParams(). The > same is true for new functions exposed via walreceiver.h and walsender.h. > Please > check the patch for other such inconsistencies. I reordered the function declarations. > > 3. > +# Wait for the standby to finish sync > +$standby1->wait_for_log( > + qr/LOG: ( [A-Z0-9]+:)? newly created slot \"lsub1_slot\" is sync-ready > +now/, $offset); > + > +$standby1->wait_for_log( > + qr/LOG: ( [A-Z0-9]+:)? newly created slot \"lsub2_slot\" is sync-ready > +now/, $offset); > + > +# Confirm that the logical failover slots are created on the standby > +and are # flagged as 'synced' > +is($standby1->safe_psql('postgres', > + q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN > ('lsub1_slot', 'lsub2_slot') AND synced;}), > + "t", > + 'logical slots have synced as true on standby'); > > Isn't the last test that queried pg_replication_slots sufficient? I think > wait_for_log() would be required for slotsync worker or am I missing > something? I think it's not needed in 0001, so removed. > Apart from the above, I have modified a few comments in the attached. Thanks, it looks good to me, so applied. Attach the V83 patch which addressed Peter[1][2], Amit and Sawada-san's[3] comments. Only 0001 is sent in this version, we will send other patches after rebasing. 
[1] https://www.postgresql.org/message-id/CAHut%2BPvW8s6AYD2UD0xadM%2B3VqBkXP2LjD30LEGRkHUa-Szm%2BQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHut%2BPv88vp9mNxX37c_Bc5FDBsTS%2BdhV02Vgip9Wqwh7GBYSg%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAD21AoDvyLu%3D2-mqfGn_T_3jUamR34w%2BsxKvYnVzKqTCpyq_FQ%40mail.gmail.com Best Regards, Hou zj v83-0001-Add-a-slot-synchronization-function.patch Description: v83-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Friday, February 9, 2024 2:44 PM Masahiko Sawada wrote: > > On Thu, Feb 8, 2024 at 8:01 PM shveta malik wrote: > > > > On Thu, Feb 8, 2024 at 12:08 PM Peter Smith > wrote: > > > > > > Here are some review comments for patch v80_2-0001. > > > > Thanks for the feedback Peter. Addressed the comments in v81. > > Attached patch001 for early feedback. Rest of the patches need > > rebasing and thus will post those later. > > > > It also addresses comments by Amit in [1]. > > Thank you for updating the patch! Here are random comments: Thanks for the comments! > > --- > + > +/* > + * Register the callback function to clean up the shared memory of > slot > + * synchronization. > + */ > +SlotSyncInitialize(); > > I think it would have a wider impact than expected. IIUC this callback is > needed > only for processes who calls synchronize_slots(). Why do we want all processes > to register this callback? I think the current style is similar to the ReplicationSlotInitialize() above it. For backends, both of them are only used when a user calls the slot SQL functions. So, I think it is fine to register it in this general place, which also avoids registering it again for the later slotsync worker patch. Another alternative is to register the callback when calling the slotsync functions and unregister it after the function call, and register the callback in slotsyncworkmain() for the slotsync worker patch, although this may add a bit more code. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Wednesday, February 7, 2024 9:13 AM Masahiko Sawada wrote: > > On Tue, Feb 6, 2024 at 8:21 PM Amit Kapila wrote: > > > > On Tue, Feb 6, 2024 at 3:33 PM Amit Kapila > wrote: > > > > > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada > wrote: > > > > > > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila > wrote: > > > > > > > > > > I think users can refer to LOGs to see if it has changed since > > > > > the first time it was configured. I tried by existing parameter > > > > > and see the following in LOG: > > > > > LOG: received SIGHUP, reloading configuration files > > > > > 2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" > changed to "on" > > > > > > > > > > If the user can't confirm then it is better to follow the steps > > > > > mentioned in the patch. Do you want something else to be written > > > > > in docs for this? If so, what? > > > > > > > > IIUC even if a wrong slot name is specified to standby_slot_names > > > > or even standby_slot_names is empty, the standby server might not > > > > be lagging behind the subscribers depending on the timing. But > > > > when checking it the next time, the standby server might lag > > > > behind the subscribers. So what I wanted to know is how the user > > > > can confirm if a failover-enabled subscription is ensured not to > > > > go in front of failover-candidate standbys (i.e., standbys using > > > > the slots listed in standby_slot_names). > > > > > > > > > > But isn't the same explained by two steps ((a) Firstly, on the > > > subscriber node check the last replayed WAL. (b) Next, on the > > > standby server check that the last-received WAL location is ahead of > > > the replayed WAL location on the subscriber identified above.) in > > > the latest *_0004 patch. > > > > > > > Additionally, I would like to add that the users can use the queries > > mentioned in the doc after the primary has failed and before promoting > > the standby. 
If she wants to do that when both primary and standby are > > available, the value of 'standby_slot_names' on primary should be > > referred. Isn't those two sufficient that there won't be false > > positives? > > From a user perspective, I'd like to confirm the following two points : > > 1. replication slots used by subscribers are synchronized to the standby. > 2. it's guaranteed that logical replication doesn't go ahead of physical > replication to the standby. > > These checks are necessary at least when building a replication setup > (primary, > standby, and subscriber). Otherwise, it's too late if we find out that no > standby > is failover-ready when the primary fails and we're about to do a failover. > > As for the point 1 above, we can use the step 1 described in the doc. > > As for point 2, the step 2 described in the doc could return true even if > standby_slot_names isn't working. For example, standby_slot_names is empty, > the user changed the standby_slot_names but forgot to reload the config file, > and the walsender doesn't reflect the standby_slot_names update yet for some > reason etc. It's possible that standby's last-received WAL location just > happens > to be ahead of the replayed WAL location on the subscriber. So even if the > check query returns true once, it could return false when we check it again, > if > standby_slot_names is not working. On the other hand, IIUC if the point 2 is > ensured, the check query always returns true. I think it would be good if we > could provide a reliable way to check point 2 ideally via SQL queries > (especially > for tools). Based on off-list discussions with Sawada-san and Amit, an alternative approach to improve this would be collecting the names of the standby slots that each walsender has waited for, which will be visible in the pg_stat_replication view. 
By checking this information, users can confirm that the GUC standby_slot_names is correctly configured and that logical replication is not lagging behind the standbys that hold these slots. To achieve this, we can implement the collection of slot information within each logical walsender that has acquired a failover slot, when waiting for the standby to catch up (WalSndWaitForWal). For each valid standby slot that the walsender has waited for, we will store the slot names in a shared memory area specific to each walsender. To optimize performance, we can rebuild the slot names only if the GUC has changed. We can track this by introducing a flag to monitor GUC modifications. When a user queries the pg_stat_replication view, we will retrieve the collected slot names from the shared memory area associated with each walsender. However, before returning the slot names, we can verify their validity once again. If any of the collected slots have been dropped or invalidated during this time, we will exclude them from the result returned to the user. Apart from the above design, I feel that since users currently have a way to detect this manually, as mentioned in the 0004 patch (we can improve the
RE: Synchronizing slots from primary to standby
On Monday, February 5, 2024 10:25 PM Masahiko Sawada wrote: lvh.no-ip.org> > Subject: Re: Synchronizing slots from primary to standby > > On Mon, Feb 5, 2024 at 8:26 PM shveta malik > wrote: > > > > On Mon, Feb 5, 2024 at 10:57 AM Ajin Cherian wrote: > > > > > > Just noticed that doc/src/sgml/config.sgml still refers to enable_synclot > instead of sync_replication_slots: > > > > > > The standbys corresponding to the physical replication slots in > > > standby_slot_names must configure > > > enable_syncslot = true so they can receive > > > failover logical slots changes from the primary. > > > > Thanks Ajin for pointing this out. Here are v78 patches, corrected there. > > > > Other changes are: > > > > 1) Rebased the patches as the v77-001 is now pushed. > > 2) Enabled executing pg_sync_replication_slots() on cascading-standby. > > 3) Rearranged the code around parameter validity checks. Changed > > function names and changed the way how dbname is extracted as > > suggested by Amit offlist. > > 4) Rearranged the code around check_primary_info(). Removed output > args. > > 5) Few other trivial changes. > > > > Thank you for updating the patch! Here are some comments: > > --- > Since Two processes (e.g. the slotsync worker and > pg_sync_replication_slots()) concurrently fetch and update the slot > information, > there is a race condition where slot's confirmed_flush_lsn goes backward. . We > have the following check but it doesn't prevent the slot's confirmed_flush_lsn > from moving backward if the restart_lsn does't change: > > /* > * Sanity check: As long as the invalidations are handled > * appropriately as above, this should never happen. 
> */ > if (remote_slot->restart_lsn < slot->data.restart_lsn) > elog(ERROR, > "cannot synchronize local slot \"%s\" LSN(%X/%X)" > " to remote slot's LSN(%X/%X) as synchronization" > " would move it backwards", remote_slot->name, > LSN_FORMAT_ARGS(slot->data.restart_lsn), > LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > As discussed, I added a flag in shared memory to control the concurrent slot sync. > --- > + It is recommended that subscriptions are first disabled before > + promoting > f+ the standby and are enabled back after altering the connection string. > > I think it's better to describe the reason why it's recommended to disable > subscriptions before the standby promotion. Added. > > --- > +/* Slot sync worker objects */ > +extern PGDLLIMPORT char *PrimaryConnInfo; extern PGDLLIMPORT char > +*PrimarySlotName; > > These two variables are declared also in xlogrecovery.h. Is it intentional? > If so, I > think it's better to write comments. Will address. > > --- > Global functions and variables used by the slotsync worker are declared in > logicalworker.h and worker_internal.h. But is it really okay to make a > dependency between the slotsync worker and logical replication workers? IIUC > the slotsync worker is conceptually a separate feature from the logical > replication. I think the slotsync worker can have its own header file. Added. > > --- > + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || > '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname > > and > > + SELECT (CASE WHEN r.srsubstate = 'f' THEN > pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || > r.srrelid), false) > > If we use CONCAT function, we can replace '||' with ','. > Will address. > --- > + Confirm that the standby server is not lagging behind the subscribers. > + This step can be skipped if > + linkend="guc-standby-slot-names">standby_slot_names me> > + has been correctly configured. 
> > How can the user confirm if standby_slot_names is correctly configured? Will address after concluding. Thanks Shveta for helping addressing the comments. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, February 6, 2024 3:39 PM Masahiko Sawada wrote: > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila wrote: > > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada > wrote: > > > > > > --- > > > Since Two processes (e.g. the slotsync worker and > > > pg_sync_replication_slots()) concurrently fetch and update the slot > > > information, there is a race condition where slot's > > > confirmed_flush_lsn goes backward. > > > > > > > Right, this is possible, though there shouldn't be a problem because > > anyway, slotsync is an async process. Till we hold restart_lsn, the > > required WAL won't be removed. Having said that, I can think of two > > ways to avoid it: (a) We can have some flag in shared memory using > > which we can detect whether any other process is doing slot > > syncronization and then either error out at that time or simply wait > > or may take nowait kind of parameter from user to decide what to do? > > If this is feasible, we can simply error out for the first version and > > extend it later if we see any use cases for the same (b) similar to > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an > > error, this is good for now but in future we may still have another > > similar issue, so I would prefer (a) among these but I am fine if you > > prefer (b) or have some other ideas like just note down in comments > > that this is a harmless case and can happen only very rarely. > > Thank you for sharing the ideas. I would prefer (a). For (b), the same issue > still > happens for other fields. Attached is the V79 patch, which includes the following changes. (Note that only 0001 is sent in this version; we will send the later patches after rebasing.) 1. Address all the comments from Amit[1], all the comments from Peter[2], and some of the comments from Sawada-san[3]. 2. Use a flag in shared memory to restrict concurrent slot sync. 3. Add more TAP tests for the pg_sync_replication_slots function. 
[1] https://www.postgresql.org/message-id/CAA4eK1KGHT9S-Bst_G1CUNQvRep%3DipMs5aTBNRQFVi6TogbJ9w%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHut%2BPtyoRf3adoLoTrbL6momzkhXAFKz656Vv9YRu4cp%3D6Yig%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAD21AoCEkcTaPb%2BGdOhSQE49_mKJG6D64quHcioJGx6RCqMv%2BQ%40mail.gmail.com Best Regards, Hou zj v79-0001-Add-a-slot-synchronization-function.patch Description: v79-0001-Add-a-slot-synchronization-function.patch
RE: Synchronizing slots from primary to standby
On Friday, February 2, 2024 2:03 PM Bertrand Drouvot wrote: > > Hi, > > On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote: > > Attached v75 patch-set. Changes are: > > > > 1) Re-arranged the patches: > > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > > separated out in v75-001 as those are independent changes. > > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special > > process' and 'App-name changes' are now merged to single patch which > > makes v75-002. > > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation > > Document' patches are maintained as is (v75-003 and v75-004 now). > > Thanks! > > I only looked at the commit message for v75-0002 and see that it has changed > since the comment done in [1], but it still does not look correct to me. > > " > If a logical slot on the primary is valid but is invalidated on the standby, > then > that slot is dropped and recreated on the standby in next sync-cycle provided > the slot still exists on the primary server. It is okay to recreate such > slots as long > as these are not consumable on the standby (which is the case currently). This > situation may occur due to the following reasons: > - The max_slot_wal_keep_size on the standby is insufficient to retain WAL > records from the restart_lsn of the slot. > - primary_slot_name is temporarily reset to null and the physical slot is > removed. > - The primary changes wal_level to a level lower than logical. > " > > If a logical decoding slot "still exists on the primary server" then the > primary > can not change the wal_level to lower than logical, one would get something > like: > > "FATAL: logical replication slot "logical_slot" exists, but wal_level < > logical" > > and then slots won't get invalidated on the standby. I've the feeling that the > wal_level conflict part may need to be explained separately? 
(I think it's not > possible that they end up being re-created on the standby for this conflict, > they will be simply removed as it would mean the counterpart one on the > primary does not exist anymore). This is possible in some extreme cases, because the slot is synced asynchronously. For example: if on the primary the wal_level is changed to 'replica' and then back to 'logical', the standby receives two XLOG_PARAMETER_CHANGE WAL records. Before the standby replays these records, a user can create a failover slot on the primary (because wal_level is logical there), and if the slotsync worker syncs the slot before the startup process replays the XLOG_PARAMETER_CHANGE, then the just-synced slot will be invalidated when that record is replayed. That said, this doesn't seem like a real-world case, so I am not sure it is worth a separate explanation. Best Regards, Hou zj
RE: Synchronizing slots from primary to standby
On Thursday, February 1, 2024 12:20 PM Amit Kapila wrote: > > On Thu, Feb 1, 2024 at 8:15 AM Euler Taveira wrote: > > > > On Mon, Jan 29, 2024, at 10:17 AM, Zhijie Hou (Fujitsu) wrote: > > > > Attach the V72-0001 which addressed above comments, other patches will > be > > rebased and posted after pushing first patch. Thanks Shveta for helping > address > > the comments. > > > > > > While working on another patch I noticed a new NOTICE message: > > > > NOTICE: changed the failover state of replication slot "foo" on publisher > > to > false > > > > I wasn't paying much attention to this thread then I start reading the 2 > > patches that was recently committed. The message above surprises me > because > > pg_createsubscriber starts to emit this message. The reason is that it > > doesn't > > create the replication slot during the CREATE SUBSCRIPTION. Instead, it > creates > > the replication slot with failover = false and no such option is informed > > during CREATE SUBSCRIPTION which means it uses the default value (failover > = > > false). I expect that I don't see any message because it is *not* changing > > the > > behavior. I was wrong. It doesn't check the failover state on publisher, it > > just executes walrcv_alter_slot() and emits a message. > > > > IMO if we are changing an outstanding property on node A from node B, > node B > > already knows (or might know) about that behavior change (because it is > sending > > the command), however, node A doesn't (unless log_replication_commands > = on -- > > it is not the default). > > > > Do we really need this message as NOTICE? > > > > The reason for adding this NOTICE was to keep it similar to other > Notice messages in these commands like create/drop slot. However, here > the difference is we may not have altered the slot as the property is > already the same as we want to set on the publisher. So, I am not sure > whether we should follow the existing behavior or just get rid of it. 
> And then do we remove similar NOTICE in AlterSubscription() as well? > Normally, I think NOTICE intends to let users know if we did anything > with slots while executing subscription commands. Does anyone else > have an opinion on this point? > > A related point, I think we can avoid setting the 'failover' property > in ReplicationSlotAlter() if it is not changed, the advantage is we > will avoid saving slots. OTOH, this won't be a frequent operation so > we can leave it as it is as well. Here is a patch to remove the NOTICE and improve ReplicationSlotAlter. The patch also includes a few cleanups based on Peter's feedback. Best Regards, Hou zj v2-0001-clean-up-for-776621a5.patch
RE: Synchronizing slots from primary to standby
On Wednesday, January 31, 2024 9:57 AM Peter Smith wrote: > > Hi, > > I saw that v73-0001 was pushed, but it included some last-minute > changes that I did not get a chance to check yesterday. > > Here are some review comments for the new parts of that patch. > > == > doc/src/sgml/ref/create_subscription.sgml > > 1. > connect (boolean) > > Specifies whether the CREATE SUBSCRIPTION command should connect > to the publisher at all. The default is true. Setting this to false > will force the values of create_slot, enabled, copy_data, and failover > to false. (You cannot combine setting connect to false with setting > create_slot, enabled, copy_data, or failover to true.) > > ~ > > I don't think the first part "Setting this to false will force the > values ... failover to false." is strictly correct. > > I think is correct to say all those *other* properties (create_slot, > enabled, copy_data) are forced to false because those otherwise have > default true values. But the 'failover' has default false, so it > cannot get force-changed at all because you can't set connect to false > when failover is true as the second part ("You cannot combine...") > explains. > > IMO remove 'failover' from that first sentence. > > > 3. > dump can be restored without requiring network access to the remote > servers. It is then up to the user to reactivate the subscriptions in a > suitable way. If the involved hosts have changed, the connection > - information might have to be changed. It might also be appropriate to > + information might have to be changed. If the subscription needs to > + be enabled for > +linkend="sql-createsubscription-params-with-failover">failover eral>, > + then same needs to be done by executing > + > + ALTER SUBSCRIPTION ... SET(failover = true) > + after the slot has been created. It might also be appropriate to > > "then same needs to be done" (English?) 
> > BEFORE > If the subscription needs to be enabled for failover, then same needs > to be done by executing ALTER SUBSCRIPTION ... SET(failover = true) > after the slot has been created. > > SUGGESTION > If the subscription needs to be enabled for failover, execute ALTER > SUBSCRIPTION ... SET(failover = true) after the slot has been created. > > == > src/backend/commands/subscriptioncmds.c > > 4. > #define SUBOPT_RUN_AS_OWNER 0x1000 > -#define SUBOPT_LSN 0x2000 > -#define SUBOPT_ORIGIN 0x4000 > +#define SUBOPT_FAILOVER 0x2000 > +#define SUBOPT_LSN 0x4000 > +#define SUBOPT_ORIGIN 0x8000 > + > > A spurious blank line was added. > Here is a small patch to address comments 3 and 4. The discussion for comment 1 is still going on, so we can update the patch once it's concluded. Best Regards, Hou zj 0001-clean-up-for-776621a5.patch
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Tuesday, January 30, 2024 11:21 AM vignesh C wrote: > > On Tue, 30 Jan 2024 at 07:24, Zhijie Hou (Fujitsu) > wrote: > > > > On Monday, January 29, 2024 9:22 PM vignesh C > wrote: > > > > > > On Fri, 26 Jan 2024 at 11:30, Alexander Lakhin > wrote: > > > > > > > > Hello hackers, > > > > > > > > After determining a possible cause for intermittent failures of > > > > the test subscription/031_column_list [1], I was wondering what > > > > makes another subscription test (014_binary) fail on the buildfarm: > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snakefly > > > > t=20 > > > > 24-01-22%2001%3A19%3A03 > > > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=2 > > > 02 > > > > 4-01-14%2018%3A19%3A20 > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet > > > > =202 > > > > 3-12-21%2001%3A11%3A52 > > > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=2 > > > 02 > > > > 3-11-27%2001%3A42%3A39 > > > > > > > > All those failures caused by a timeout when waiting for a message > > > > expected in _subscriber.log. For example, in the snakefly's case: > > > > [01:14:46.158](1.937s) ok 7 - check synced data on subscriber with > > > > custom type timed out waiting for match: (?^:ERROR: ( [A-Z0-9]+:)? > > > > incorrect binary data format) at > > > /home/bf/bf-build/piculet/HEAD/pgsql/src/test/subscription/t/014_bin > > > ary.pl > > > line 269. > > > > > > > > _subscriber.log contains: > > > > 2023-12-21 01:14:46.215 UTC [410039] 014_binary.pl LOG: statement: > > > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > > > > 2023-12-21 01:17:46.756 UTC [40] ERROR: could not receive > > > > data from WAL stream: server closed the connection unexpectedly > > > > This probably means the server terminated abnormally > > > > before or while processing the request. 
> > > > 2023-12-21 01:17:46.760 UTC [405057] LOG: background worker > > > > "logical replication apply worker" (PID 40) exited with exit > > > > code 1 > > > > 2023-12-21 01:17:46.779 UTC [532857] LOG: logical replication > > > > apply worker for subscription "tsub" has started ... > > > > > > > > While _subscriber.log from a successful test run contains: > > > > 2024-01-26 03:49:07.065 UTC [9726:5] 014_binary.pl LOG: statement: > > > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > > > > 2024-01-26 03:49:07.075 UTC [9726:6] 014_binary.pl LOG: disconnection: > > > > session time: 0:00:00.014 user=postgres database=postgres > > > > host=[local] > > > > 2024-01-26 03:49:07.558 UTC [9729:1] LOG: logical replication > > > > apply worker for subscription "tsub" has started > > > > 2024-01-26 03:49:07.563 UTC [9731:1] LOG: logical replication > > > > table synchronization worker for subscription "tsub", table > > > > "test_mismatching_types" has started > > > > 2024-01-26 03:49:07.585 UTC [9731:2] ERROR: incorrect binary data > > > > format > > > > 2024-01-26 03:49:07.585 UTC [9731:3] CONTEXT: COPY > > > > test_mismatching_types, line 1, column a > > > > > > > > In this case, "logical replication apply worker for subscription > > > > "tsub" has started" appears just after "ALTER SUBSCRIPTION", not 3 > > > > minutes > > > later. > > > > > > > > I've managed to reproduce this failure locally by running multiple > > > > tests in parallel, and my analysis shows that it is caused by a > > > > race condition when accessing variable table_states_valid inside > tablesync.c. > > > > > > > > tablesync.c does the following with table_states_valid: > > > > /* > > > > * Callback from syscache invalidation. > > > > */ > > > > void > > > > invalidate_syncing_table_states(Datum arg, int cacheid, uint32 > > > > hashvalue) { > > > > table_states_valid = false; > > > > } > > > > ... > > > > static bool > > > > FetchTableStates(bool *started_tx) { ... > > > > if (!table_states_valid) > > >
RE: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Monday, January 29, 2024 9:22 PM vignesh C wrote: > > On Fri, 26 Jan 2024 at 11:30, Alexander Lakhin wrote: > > > > Hello hackers, > > > > After determining a possible cause for intermittent failures of the > > test subscription/031_column_list [1], I was wondering what makes > > another subscription test (014_binary) fail on the buildfarm: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snakefly=20 > > 24-01-22%2001%3A19%3A03 > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=202 > > 4-01-14%2018%3A19%3A20 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet=202 > > 3-12-21%2001%3A11%3A52 > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=202 > > 3-11-27%2001%3A42%3A39 > > > > All those failures caused by a timeout when waiting for a message > > expected in _subscriber.log. For example, in the snakefly's case: > > [01:14:46.158](1.937s) ok 7 - check synced data on subscriber with > > custom type timed out waiting for match: (?^:ERROR: ( [A-Z0-9]+:)? > > incorrect binary data format) at > /home/bf/bf-build/piculet/HEAD/pgsql/src/test/subscription/t/014_binary.pl > line 269. > > > > _subscriber.log contains: > > 2023-12-21 01:14:46.215 UTC [410039] 014_binary.pl LOG: statement: > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > > 2023-12-21 01:17:46.756 UTC [40] ERROR: could not receive data > > from WAL stream: server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > 2023-12-21 01:17:46.760 UTC [405057] LOG: background worker "logical > > replication apply worker" (PID 40) exited with exit code 1 > > 2023-12-21 01:17:46.779 UTC [532857] LOG: logical replication apply > > worker for subscription "tsub" has started ... 
> > > > While _subscriber.log from a successful test run contains: > > 2024-01-26 03:49:07.065 UTC [9726:5] 014_binary.pl LOG: statement: > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > > 2024-01-26 03:49:07.075 UTC [9726:6] 014_binary.pl LOG: disconnection: > > session time: 0:00:00.014 user=postgres database=postgres host=[local] > > 2024-01-26 03:49:07.558 UTC [9729:1] LOG: logical replication apply > > worker for subscription "tsub" has started > > 2024-01-26 03:49:07.563 UTC [9731:1] LOG: logical replication table > > synchronization worker for subscription "tsub", table > > "test_mismatching_types" has started > > 2024-01-26 03:49:07.585 UTC [9731:2] ERROR: incorrect binary data > > format > > 2024-01-26 03:49:07.585 UTC [9731:3] CONTEXT: COPY > > test_mismatching_types, line 1, column a > > > > In this case, "logical replication apply worker for subscription > > "tsub" has started" appears just after "ALTER SUBSCRIPTION", not 3 minutes > later. > > > > I've managed to reproduce this failure locally by running multiple > > tests in parallel, and my analysis shows that it is caused by a race > > condition when accessing variable table_states_valid inside tablesync.c. > > > > tablesync.c does the following with table_states_valid: > > /* > > * Callback from syscache invalidation. > > */ > > void > > invalidate_syncing_table_states(Datum arg, int cacheid, uint32 > > hashvalue) { > > table_states_valid = false; > > } > > ... > > static bool > > FetchTableStates(bool *started_tx) > > { > > ... > > if (!table_states_valid) > > { > > ... > > /* Fetch all non-ready tables. */ > > rstates = GetSubscriptionRelations(MySubscription->oid, > > true); ... > > table_states_valid = true; > > } > > > > So, when syscache invalidation occurs inside the code block "if > > (!table_states_valid)", that invalidation is effectively ignored. > > > > In a failed case I observe the following events: > > 1. 
logical replication apply worker performs > > LogicalRepApplyLoop() -> process_syncing_tables() -> > > process_syncing_tables_for_apply() -> FetchTableStates() periodically. > > > > 2. ALTER SUBSCRIPTION tsub REFRESH PUBLICATION invalidates syscache > > for SUBSCRIPTIONRELMAP, and that leads to calling > > invalidate_syncing_table_states(). > > > > 3. If the apply worker manages to fetch no non-ready tables in > > FetchTableStates() and ignore "table_states_valid = false" from > > invalidate_syncing_table_states(), then it just misses the invalidation > > event, so it keeps working without noticing non-ready tables appeared as > > the result of ALTER SUBSCRIPTION (process_syncing_tables_for_apply() > skips > > a loop "foreach(lc, table_states_not_ready) ..." until some other event > > occurs). > > > > pg_usleep(10) added just below GetSubscriptionRelations(...) > > proves my analysis -- without it, I need tens of iterations (with 50 > > tests running in > > parallel) to catch the failure, but with it, I get the failure on the > > first iteration. > > Thanks for the analysis, I was able to reproduce the issue with the steps you > had >
RE: Synchronizing slots from primary to standby
On Monday, January 29, 2024 9:17 PM Zhijie Hou (Fujitsu) wrote: > > On Monday, January 29, 2024 7:30 PM Amit Kapila > wrote: > > > > On Mon, Jan 29, 2024 at 3:11 PM shveta malik > > wrote: > > > > > > PFA v71 patch set with above changes. > > > > > > > Few comments on 0001 > > Thanks for the comments. > > > === > > 1. > > parse_subscription_options() > > { > > ... > > /* > > * We've been explicitly asked to not connect, that requires some > > * additional processing. > > */ > > if (!opts->connect && IsSet(supported_opts, SUBOPT_CONNECT)) { > > > > Here, along with other options, we need an explicit check for > > failover, so that if connect=false and failover=true, the statement > > should give error. I was expecting the below statement to fail but it passed > with WARNING. > > postgres=# create subscription sub2 connection 'dbname=postgres' > > publication pub2 with(connect=false, failover=true); > > WARNING: subscription was created, but is not connected > > HINT: To initiate replication, you must manually create the > > replication slot, enable the subscription, and refresh the subscription. > > CREATE SUBSCRIPTION > > Added. > > > > > 2. > > @@ -148,6 +153,10 @@ typedef struct Subscription > > List*publications; /* List of publication names to subscribe to */ > > char*origin; /* Only publish data originating from the > > * specified origin */ > > + bool failover; /* True if the associated replication slots > > + * (i.e. the main slot and the table sync > > + * slots) in the upstream database are enabled > > + * to be synchronized to the standbys. */ > > } Subscription; > > > > Let's add this new field immediately after "bool runasowner;" as is > > done for other boolean members. This will help avoid increasing the > > size of the structure due to alignment when we add any new pointer > > field in the future. Also, that would be consistent with what we do for > > other > new boolean members. > > Moved this field as suggested. 
> Attach the V72-0001 which addressed above comments, other patches will be
> rebased and posted after pushing first patch. Thanks Shveta for helping
> address the comments.

Apart from the above comments, the new V72 patch also includes the following changes:

1. Moved the test 'altering failover for enabled sub' to the tap-test where most of the alter-sub behaviors are tested.

2. Renamed the tap-test from 050_standby_failover_slots_sync.pl to 040_standby_failover_slots_sync.pl (the big number 050 was used to avoid conflict with other newly committed tests), and added the test to meson.build, which was missed.

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Monday, January 29, 2024 7:30 PM Amit Kapila wrote:
> On Mon, Jan 29, 2024 at 3:11 PM shveta malik wrote:
> >
> > PFA v71 patch set with above changes.
>
> Few comments on 0001

Thanks for the comments.

> ===
> 1.
> parse_subscription_options()
> {
> ...
> /*
>  * We've been explicitly asked to not connect, that requires some
>  * additional processing.
>  */
> if (!opts->connect && IsSet(supported_opts, SUBOPT_CONNECT)) {
>
> Here, along with other options, we need an explicit check for failover, so
> that if connect=false and failover=true, the statement should give error. I was
> expecting the below statement to fail but it passed with WARNING.
> postgres=# create subscription sub2 connection 'dbname=postgres'
> publication pub2 with(connect=false, failover=true);
> WARNING:  subscription was created, but is not connected
> HINT:  To initiate replication, you must manually create the replication slot,
> enable the subscription, and refresh the subscription.
> CREATE SUBSCRIPTION

Added.

> 2.
> @@ -148,6 +153,10 @@ typedef struct Subscription
>   List    *publications; /* List of publication names to subscribe to */
>   char    *origin; /* Only publish data originating from the
>    * specified origin */
> + bool failover; /* True if the associated replication slots
> +  * (i.e. the main slot and the table sync
> +  * slots) in the upstream database are enabled
> +  * to be synchronized to the standbys. */
> } Subscription;
>
> Let's add this new field immediately after "bool runasowner;" as is done for
> other boolean members. This will help avoid increasing the size of the structure
> due to alignment when we add any new pointer field in the future. Also, that
> would be consistent with what we do for other new boolean members.

Moved this field as suggested.

Attach the V72-0001 which addressed the above comments; other patches will be
rebased and posted after pushing the first patch. Thanks Shveta for helping
address the comments.
Best Regards,
Hou zj

v72-0001-Add-a-failover-option-to-subscriptions.patch
Update the doc that refers to the removed column of pg_replication_slots.
Hi,

I noticed a minor issue: the upgrade doc refers to the old 'conflicting' column
of pg_replication_slots. As the column has been changed to 'conflict_reason',
I think the doc needs to be updated as in the attachment.

Best Regards,
Hou Zhijie

0001-Update-the-doc-for-upgrading-slots.patch
RE: Synchronizing slots from primary to standby
On Wednesday, January 24, 2024 1:11 PM Masahiko Sawada wrote:
> Here are random comments on slotsyncworker.c (v66):

Thanks for the comments:

> ---
> + elog(ERROR,
> +"cannot synchronize local slot \"%s\" LSN(%X/%X)"
> +" to remote slot's LSN(%X/%X) as synchronization"
> +" would move it backwards", remote_slot->name,
>
> Many error messages in slotsync.c are splitted into several lines, but I think it
> would reduce the greppability when the user looks for the error message in the
> source code.

Thanks for the suggestion! We combined most of the messages in the new version
patch. Some messages, including the above one, were kept split because they are
too long (> 120 columns including the indent) to fit on the screen, so I feel
it's better to keep them split.

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Monday, January 22, 2024 11:36 AM shveta malik wrote:

Hi,

> On Fri, Jan 19, 2024 at 4:18 PM shveta malik wrote:
> >
> > PFA v64.
>
> V64 fails to apply to HEAD due to a recent commit. Rebased it. PFA v64_2. It has
> no new changes.

I noticed a few things while analyzing the patch.

1.
sleep_ms = Min(sleep_ms * 2, MAX_WORKER_NAPTIME_MS);

The initial value of sleep_ms is 0 (the default value for a static variable),
which will never be advanced by this expression. We should initialize sleep_ms
to a positive number.

2.
/* Wait a bit, we don't expect to have to wait long */
rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
               10L, WAIT_EVENT_BGWORKER_SHUTDOWN);

The slotsync worker is not a bgworker anymore after the 0003 patch, so I think a
new wait event is needed.

3.
slot->effective_catalog_xmin = xmin_horizon;

The assignment is also needed in local_slot_update() to make
ReplicationSlotsComputeRequiredXmin work.

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Wednesday, January 17, 2024 6:30 PM shveta malik wrote:
> PFA v63.

I analyzed the security of the slotsync worker and the replication connection a
bit, and didn't find any issues. Here are the details:

1) data security

First, we are using the role specified in primary_conninfo. The role used here
is required to have REPLICATION or SUPERUSER privilege [1], which means it is
reasonable for the role to modify and read replication slots on the primary. On
the primary, the slotsync worker only queries the pg_replication_slots view,
which doesn't involve any system or user table access, so I think it's safe. On
the standby server, the slotsync worker will not read/write any user table
either, so we don't have the risk of executing arbitrary code in triggers.

2) privilege check

The SQL queries of the slotsync worker are subject to the normal privilege
checks on the primary. If I revoke the execute privilege on
pg_get_replication_slots from the replication user, then the slotsync worker
won't be able to query the pg_replication_slots view. The same is true for the
pg_is_in_recovery function. The slotsync worker keeps reporting an ERROR after
the revoke, which is as expected.

Based on the above, I didn't find any security issues with the slotsync worker.

[1] https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-PRIMARY-CONNINFO

Best Regards,
Hou zj
RE: Synchronizing slots from primary to standby
On Tuesday, January 16, 2024 9:27 AM Peter Smith wrote:
>
> Here are some review comments for patch v61-0002

Thanks for the comments.

> ==
> doc/src/sgml/logical-replication.sgml
>
> 1.
> + Examples: logical replication failover
>
> The current documentation structure (after the patch is applied) looks
> like this:
>
> 30.1. Publication
> 30.2. Subscription
> 30.2.1. Replication Slot Management
> 30.2.2. Examples: Set Up Logical Replication
> 30.2.3. Examples: Deferred Replication Slot Creation
> 30.2.4. Examples: logical replication failover
>
> I don't think it is ideal.
>
> Firstly, I think this new section is not just "Examples:"; it is more
> like instructions for steps to check if a successful failover is
> possible. IMO call it something like "Logical Replication Failover" or
> "Replication Slot Failover".
>
> Secondly, I don't think this new section strictly belongs underneath
> the "Subscription" section anymore because IMO it is just as much
> about the promotion of the publications. Now that you are adding this
> new (2nd) section about slots, I think the whole structure of this
> document should be changed like below:
>
> SUGGESTION #1 (make a new section 30.3 just for slot-related topics)
>
> 30.1. Publication
> 30.2. Subscription
> 30.2.1. Examples: Set Up Logical Replication
> 30.3. Logical Replication Slots
> 30.3.1. Replication Slot Management
> 30.3.2. Examples: Deferred Replication Slot Creation
> 30.3.3. Logical Replication Failover
>
> ~
>
> SUGGESTION #2 (keep the existing structure, but give the failover its
> own new section 30.3)
>
> 30.1. Publication
> 30.2. Subscription
> 30.2.1. Replication Slot Management
> 30.2.2. Examples: Set Up Logical Replication
> 30.2.3. Examples: Deferred Replication Slot Creation
> 30.3 Logical Replication Failover

I used this version for now, as I am not sure about changing the other sections.

> ~
>
> SUGGESTION #2a (and maybe later you can extract some of the failover
> examples further)
>
> 30.1. Publication
> 30.2. Subscription
> 30.2.1. Replication Slot Management
> 30.2.2. Examples: Set Up Logical Replication
> 30.2.3. Examples: Deferred Replication Slot Creation
> 30.3 Logical Replication Failover
> 30.3.1. Examples: Checking if failover ready
>
> ~~~
>
> 2.
> +
> +In a logical replication setup, if the publisher server is also the primary
> +server of the streaming replication, the logical slots on the primary server
> +can be synchronized to the standby server by specifying failover = true
> +when creating the subscription. Enabling failover ensures a seamless
> +transition of the subscription to the promoted standby, allowing it to
> +subscribe to the new primary server without any data loss.
> +
>
> I was initially confused by the wording. How about like below:
>
> SUGGESTION
> When the publisher server is the primary server of a streaming
> replication, the logical slots on that primary server can be
> synchronized to the standby server by specifying failover =
> true when creating subscriptions for those publications.
> Enabling failover ensures a seamless transition of those subscriptions
> after the standby is promoted. They can continue subscribing to
> publications now on the new primary server without any data loss.

Changed as suggested.

> ~~~
>
> 3.
> +
> +However, the replication slots are copied asynchronously, which means it's necessary
> +to confirm that replication slots have been synced to the standby server
> +before the failover happens. Additionally, to ensure a successful failover,
> +the standby server must not lag behind the subscriber. To confirm
> +that the standby server is ready for failover, follow these steps:
> +
>
> Minor rewording
>
> SUGGESTION
> Because the slot synchronization logic copies asynchronously, it is
> necessary to confirm that replication slots have been synced to the
> standby server before the failover happens. Furthermore, to ensure a
> successful failover, the standby server must not be lagging behind the
> subscriber. To confirm that the standby server is indeed ready for
> failover, follow these 2 steps:

Changed as suggested.

> ~~~
>
> 4.
> The instructions said "follow these steps", so the next parts should
> be rendered as 2 "steps" (using markup?)
>
> SUGGESTION (show as steps 1,2 and also some minor rewording of the
> step heading)
>
> 1. Confirm that all the necessary logical replication slots have been
> synced to the standby server.
> 2. Confirm that the standby server is not lagging behind the subscribers.

Changed as suggested.

> ~~~
>
> 5.
> +
> +Check if all the necessary logical replication slots have been synced to
> +the standby server.
> +
>
> SUGGESTION
> Confirm that all the necessary logical replication slots have been
> synced to the standby server.

Changed as suggested.
RE: Synchronizing slots from primary to standby
On Thursday, January 11, 2024 11:42 PM Bertrand Drouvot wrote: Hi, > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila > wrote: > > > > > > +static bool > > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot > > > +*remote_slot) > > > { > > > ... > > > + /* Slot ready for sync, so sync it. */ else { > > > + /* > > > + * Sanity check: With hot_standby_feedback enabled and > > > + * invalidations handled appropriately as above, this should never > > > + * happen. > > > + */ > > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) elog(ERROR, > > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > + " to remote slot's LSN(%X/%X) as synchronization" > > > + " would move it backwards", remote_slot->name, > > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > > ... > > > } > > > > > > I was thinking about the above code in the patch and as far as I can > > > think this can only occur if the same name slot is re-created with > > > prior restart_lsn after the existing slot is dropped. Normally, the > > > newly created slot (with the same name) will have higher restart_lsn > > > but one can mimic it by copying some older slot by using > > > pg_copy_logical_replication_slot(). > > > > > > I don't think as mentioned in comments even if hot_standby_feedback > > > is temporarily set to off, the above shouldn't happen. It can only > > > lead to invalidated slots on standby. > > I also think so. > > > > > > > To close the above race, I could think of the following ways: > > > 1. Drop and re-create the slot. > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > your previous comment, we need to update all other fields as well. > > > If we follow this then we probably need to have a check for > > > catalog_xmin as well. 
> IIUC, this would be a sync slot (so not usable until promotion) that could not be
> used anyway (invalidated), so I'll vote for drop / re-create then.

Such a race can happen when a user drops and re-creates the same failover slot
on the primary as well. For example, if a user dropped one failover slot and
then immediately created a new one by copying from an old slot (using
pg_copy_logical_replication_slot), the slotsync worker would find that the
restart_lsn of this slot has gone backwards.

The steps:

SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput', false, false, true);
SELECT 'init' FROM pg_create_logical_replication_slot('test', 'pgoutput', false, false, true);

-- Advance the restart_lsn of the 'test' slot
CREATE TABLE test2(a int);
INSERT INTO test2 SELECT generate_series(1,1,1);
SELECT slot_name FROM pg_replication_slot_advance('test', pg_current_wal_lsn());

-- Re-create the test slot, but based on the old isolation_slot
SELECT pg_drop_replication_slot('test');
SELECT 'copy' FROM pg_copy_logical_replication_slot('isolation_slot', 'test');

Then the restart_lsn of the 'test' slot will have gone backwards.

Best Regards,
Hou zj