Re: Synchronizing slots from primary to standby

2024-06-07 Thread Amit Kapila
On Fri, Jun 7, 2024 at 7:57 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Thanks for the comments! Here is the V6 patch that addressed these.
>

I have pushed this after making minor changes to the wording. I have
also changed one of the queries in the docs to ignore NULL slot_name
values.
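
For readers following the thread: subscriptions created with slot_name = NONE
have a NULL subslotname, so the main-slot arm of the documented query can emit
NULL rows. A minimal sketch of such a filter (an assumption; the exact
committed wording is not reproduced in this thread):

test_sub=# SELECT array_agg(slot_name) AS slots
    FROM ((
        SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name
        FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
        WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
    ) UNION (
        SELECT s.oid AS subid, s.subslotname AS slot_name
        FROM pg_subscription s
        WHERE s.subfailover
    ))
    WHERE slot_name IS NOT NULL;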

-- 
With Regards,
Amit Kapila.




RE: Synchronizing slots from primary to standby

2024-06-06 Thread Zhijie Hou (Fujitsu)
On Thursday, June 6, 2024 12:21 PM Peter Smith 
> 
> Hi, here are some review comments for the docs patch v5-0001.

Thanks for the comments! Here is the V6 patch that addressed these.

Best Regards,
Hou zj


v6-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
Description:  v6-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch


Re: Synchronizing slots from primary to standby

2024-06-05 Thread Peter Smith
Hi, here are some review comments for the docs patch v5-0001.

Apart from these it LGTM.

==
doc/src/sgml/logical-replication.sgml

1.
+
+ On the subscriber node, use the following SQL to identify which slots
+ should be synced to the standby that we plan to promote. This query will
+ return the relevant replication slots, including the main slots and table
+ synchronization slots associated with the failover enabled subscriptions.

/failover enabled/failover-enabled/

~~~

2.
+  
+   If all the slots are present on the standby server and result
+   (failover_ready) of is true, then existing subscriptions
+   can continue subscribing to publications now on the new primary server
+   without any loss of data.
+  

Hmm. It looks like there is some typo or missing words here: "of is true".

Did you mean something like: "of the above SQL query is true"?

==
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Synchronizing slots from primary to standby

2024-06-05 Thread Zhijie Hou (Fujitsu)
On Wednesday, June 5, 2024 2:32 PM Peter Smith  wrote:
 
> Hi. Here are some minor review comments for the docs patch v4-0001.

Thanks for the comments!

> The SGML file wrapping can be fixed to fill up to 80 cols for some of the
> paragraphs.

Unlike comments in C code, I think we don't enforce the 80-column limit in doc
files unless a line is too long to read. I checked the doc once and think it's
OK.

Here is the V5 patch which addressed Peter's comments and Amit's comments[1].

[1] 
https://www.postgresql.org/message-id/CAA4eK1%2Bq1MYGgF3-LZCj6Xd0idujnjbTsfk-RqU%2BC51wYGaD5g%40mail.gmail.com

Best Regards,
Hou zj


v5-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
Description:  v5-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch


Re: Synchronizing slots from primary to standby

2024-06-05 Thread Peter Smith
Hi. Here are some minor review comments for the docs patch v4-0001.

==
doc/src/sgml/logical-replication.sgml

1. General

The SGML file wrapping can be fixed to fill up to 80 cols for some of
the paragraphs.

~~~

2.
+   standby is promoted. They can continue subscribing to publications
now on the
+   new primary server without any loss of data. But please note that in case of
+   asynchronous replication, there remains a risk of data loss for transactions
+   that have been committed on the former primary server but have yet to be
+   replicated to the new primary server.
+  

/in case/in the case/

/But please note that.../Note that.../

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Synchronizing slots from primary to standby

2024-06-05 Thread Amit Kapila
On Wed, Jun 5, 2024 at 7:52 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Attach the V4 doc patch which addressed Peter and Bertrand's comments.
>

Few comments:

1.
+   On the subscriber node, use the following SQL to identify
+   which slots should be synced to the standby that we plan to promote.
+
+test_sub=# SELECT
+   array_agg(slot_name) AS slots
+   FROM
+   ((
+   SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name
+   FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
+   WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
+   ) UNION (
+   SELECT s.oid AS subid, s.subslotname as slot_name
+   FROM pg_subscription s
+   WHERE s.subfailover
+   ));
+ slots
+---
+ {sub1,sub2,sub3}

This should additionally say what exactly this SQL is doing to fetch
the required slots.
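
For reference, an annotated version of the same query (a sketch of the
explanation being requested; comments added, logic unchanged):

test_sub=# SELECT
    array_agg(slot_name) AS slots
    FROM ((
        -- tablesync slots of failover-enabled subscriptions whose table
        -- synchronization is still in progress (srsubstate = 'f'); their
        -- names combine the subscription OID, relation OID, and the
        -- system identifier
        SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name
        FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
        WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
    ) UNION (
        -- the main replication slot of every failover-enabled subscription
        SELECT s.oid AS subid, s.subslotname AS slot_name
        FROM pg_subscription s
        WHERE s.subfailover
    ));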

2.
If standby_slot_names is
+ not configured correctly, it is highly recommended to run this step after
+ the primary server is down, otherwise the results of the query can vary
+ due to the ongoing replication on the logical subscribers from the primary
+ server.
+
+ 
+  
+   
+On the subscriber node, check the last replayed WAL.
+This step needs to be run on any database that includes failover enabled subscriptions.
+
+test_sub=# SELECT
+   MAX(remote_lsn) AS remote_lsn_on_subscriber
+   FROM

If 'standby_slot_names' is not configured, then we can't ensure that the
standby is always ahead: what if, immediately after running this query,
additional WAL gets synced to the subscriber before the standby? Now, as you
mentioned, users can first shut down the primary to ensure that no additional
WAL is sent to the subscriber. After that, one could use these complex queries
to verify that the subscriber is behind the standby, but it is better to
encourage users to use standby_slot_names to ensure the same. If we ever get
such use cases or requirements, we can add these additional steps after
understanding the users' requirements. For now, we should remove them.

-- 
With Regards,
Amit Kapila.




RE: Synchronizing slots from primary to standby

2024-06-04 Thread Zhijie Hou (Fujitsu)
On Thursday, May 23, 2024 1:34 PM Peter Smith  wrote:

Thanks for the comments. I addressed most of the comments except the
following one which I am not sure:

> 5b.
> Patch says "on the subscriber node", but isn't that the simplest case?
> e.g. maybe there are multiple nodes having subscriptions for these
> publications. Maybe the sentence needs to account for case of subscribers on
> >1 nodes.

I think it's not necessary to mention the multiple-node case; in that case, the
user can just perform the same steps on each node that has failover
subscriptions.

> Is there no way to discover this information by querying the publisher?

I am not aware of a way for the user to get the necessary info, such as
replication origin progress, on the publisher, because such information is
only available on the subscriber.

Attach the V4 doc patch which addressed Peter and Bertrand's comments.

Best Regards,
Hou zj


v4-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
Description:  v4-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch


RE: Synchronizing slots from primary to standby

2024-06-04 Thread Zhijie Hou (Fujitsu)
On Wednesday, May 8, 2024 5:21 PM Bertrand Drouvot 
 wrote:
> A few comments:
Thanks for the comments!

> 2 ===
> 
> +test_sub=# SELECT
> +   array_agg(slotname) AS slots
> +   FROM
> +   ((
> +   SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slotname
> +   FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
> +   WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
> +   ) UNION (
> 
> I guess this format comes from ReplicationSlotNameForTablesync(). What
> about creating a SQL-callable function on top of it and making use of it in the
> query above? (That would ensure the doc stays up to date even if the format
> changes in ReplicationSlotNameForTablesync().)

We could add a new function as suggested, but I think it's not the right
time (beta1) to add it, because a new function would require a catversion
bump, which may not be worth it at this stage. I think we can consider this
after the release, once we have gathered more use cases for the function you
suggested.

> 
> 3 ===
> 
> +test_sub=# SELECT
> +   MAX(remote_lsn) AS remote_lsn_on_subscriber
> +   FROM
> +   ((
> +   SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_', r.srrelid), false)
> +   WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn END) AS remote_lsn
> +   FROM pg_subscription_rel r, pg_subscription s
> +   WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND s.subfailover
> +   ) UNION (
> +   SELECT pg_replication_origin_progress(CONCAT('pg_', s.oid), false) AS remote_lsn
> +   FROM pg_subscription s
> +   WHERE s.subfailover
> +   ));
> 
> What about adding a join to pg_replication_origin to get rid of the "hardcoded"
> format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"?

I tried a bit, but it doesn't seem feasible to derive the relationship between
a subscription and its origin by querying pg_subscription and
pg_replication_origin.
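
To illustrate the difficulty: the only link between the two catalogs is the
naming convention itself, so even a join would still hard-code the format (a
sketch, assuming the 'pg_' || suboid origin naming discussed in this thread):

test_sub=# SELECT s.subname, o.roname
    FROM pg_subscription s
    JOIN pg_replication_origin o ON o.roname = CONCAT('pg_', s.oid);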

Best Regards,
Hou zj


Re: Synchronizing slots from primary to standby

2024-05-22 Thread Peter Smith
Here are some review comments for the docs patch v3-0001.

==
Commit message

1.
This patch adds detailed documentation for the slot sync feature
including examples to guide users on how to verify that all slots have
been successfully synchronized to the standby server and how to
confirm whether the subscription can continue subscribing to
publications on the promoted standby server.

~

This may be easier to read if you put it in bullet form like below:

SUGGESTION

It includes examples to guide the user:

* How to verify that all slots have been successfully synchronized to
the standby server

* How to confirm whether the subscription can continue subscribing to
publications on the promoted standby server

==
doc/src/sgml/high-availability.sgml

2.
+   
+If you have opted for synchronization of logical slots (see
+),
+then before switching to the standby server, it is recommended to check
+if the logical slots synchronized on the standby server are ready
+for failover. This can be done by following the steps described in
+.
+   
+

Maybe it is better to call this feature "logical replication slot
synchronization" to be more consistent with the title of section
47.2.3.

SUGGESTION
If you have opted for logical replication slot synchronization (see ...

==
doc/src/sgml/logical-replication.sgml

3.
+  
+   When the publisher server is the primary server of a streaming replication,
+   the logical slots on that primary server can be synchronized to the standby
+   server by specifying failover = true when creating
+   subscriptions for those publications. Enabling failover ensures a seamless
+   transition of those subscriptions after the standby is promoted. They can
+   continue subscribing to publications now on the new primary server without
+   losing any data that has been flushed to the new primary server.
+  
+

3a.
BEFORE
When the publisher server is the primary server of...

SUGGESTION
When publications are defined on the primary server of...

~

3b.
Enabling failover...

Maybe say "Enabling the failover parameter..." and IMO there should
also be a link to the CREATE SUBSCRIPTION failover parameter so the
user can easily navigate there to read more about it.
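
For context, enabling the parameter at subscription creation looks roughly like
this (a sketch; the connection string and names are illustrative):

test_sub=# CREATE SUBSCRIPTION sub1
    CONNECTION 'host=primary dbname=postgres'
    PUBLICATION pub1
    WITH (failover = true);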

~

3c.
BEFORE
They can continue subscribing to publications now on the new primary
server without losing any data that has been flushed to the new
primary server.

SUGGESTION (removes some extra info I did not think was needed)
They can continue subscribing to publications now on the new primary
server without any loss of data.

~~~

4.
+  
+   Because the slot synchronization logic copies asynchronously, it is
+   necessary to confirm that replication slots have been synced to the standby
+   server before the failover happens. Furthermore, to ensure a successful
+   failover, the standby server must not be lagging behind the subscriber. It
+   is highly recommended to use
+   standby_slot_names
+   to prevent the subscriber from consuming changes faster than the
hot standby.
+   To confirm that the standby server is indeed ready for failover, follow
+   these 2 steps:
+  

IMO the last sentence "To confirm..." should be a new paragraph.
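
As an aside, configuring the GUC mentioned here amounts to something like the
following on the primary (a sketch; the slot name is illustrative, and the
parameter can equally be set in postgresql.conf):

test_primary=# ALTER SYSTEM SET standby_slot_names = 'physical_standby_slot';
test_primary=# SELECT pg_reload_conf();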

~~~

5.
+  
+   Firstly, on the subscriber node, use the following SQL to identify
+   which slots should be synced to the standby that we plan to promote.

AND

+  
+   Next, check that the logical replication slots identified above exist on
+   the standby server and are ready for failover.

~~

5a.
I don't think you need to say "Firstly," and "Next," because the order
to do these steps is already self-evident.

~

5b.
Patch says "on the subscriber node", but isn't that the simplest case?
e.g. maybe there are multiple nodes having subscriptions for these
publications. Maybe the sentence needs to account for case of
subscribers on >1 nodes.

Is there no way to discover this information by querying the publisher?

~~~

6.
+
+test_sub=# SELECT
+   array_agg(slotname) AS slots
+   FROM

...

+
+test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS failover_ready
+   FROM pg_replication_slots
+   WHERE slot_name IN ('sub1','sub2','sub3');


The example SQL for "1a" refers to 'slotname', but the example SQL for
"1b" refers to "slot_name" (i.e. with underscore). It might be better
if those are consistently called 'slot_name'.

~~~

7.
+   
+
+ Confirm that the standby server is not lagging behind the subscribers.
+ This step can be skipped if
+ standby_slot_names
+ has been correctly configured. If standby_slot_names is not configured
+ correctly, it is highly recommended to run this step after the primary
+ server is down, otherwise the results of the query may vary at different
+ points of time due to the ongoing replication on the logical subscribers
+ from the primary server.
+

7a.
I felt that the step should just say "Confirm that the 

Re: Synchronizing slots from primary to standby

2024-05-08 Thread Bertrand Drouvot
Hi,

On Mon, Apr 29, 2024 at 11:58:09AM +, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 29, 2024 5:11 PM shveta malik  wrote:
> > 
> > On Mon, Apr 29, 2024 at 11:38 AM shveta malik 
> > wrote:
> > >
> > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> > >  wrote:
> > > >
> > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
> >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Since the standby_slot_names patch has been committed, I am
> > > > > > attaching the last doc patch for review.
> > > > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > > 1 ===
> > > > >
> > > > > +   continue subscribing to publications now on the new primary
> > > > > + server
> > > > > without
> > > > > +   any data loss.
> > > > >
> > > > > I think "without any data loss" should be re-worded in this
> > > > > context. Data loss in the sense "data committed on the primary and
> > > > > not visible on the subscriber in case of failover" can still occurs 
> > > > > (in case
> > synchronous replication is not used).
> > > > >
> > > > > 2 ===
> > > > >
> > > > > +   If the result (failover_ready) of both above 
> > > > > steps is
> > > > > +   true, existing subscriptions will be able to continue without data
> > loss.
> > > > > +  
> > > > >
> > > > > I don't think that's true if synchronous replication is not used.
> > > > > Say,
> > > > >
> > > > > - synchronous replication is not used
> > > > > - primary is not able to reach the standby anymore and
> > > > > standby_slot_names is set
> > > > > - new data is inserted into the primary
> > > > > - then not replicated to subscriber (due to standby_slot_names)
> > > > >
> > > > > Then I think both the above steps will return true but data would
> > > > > be lost in case of failover.
> > > >
> > > > Thanks for the comments, attach the new version patch which reworded
> > > > the above places.

Thanks!

> Here is the V3 doc patch.

Thanks! A few comments:

1 ===

+   losing any data that has been flushed to the new primary server.

Worth adding a few words about the possible data loss, something like:

Please note that in case synchronous replication is not used and standby_slot_names
is set correctly, it might be possible to lose data that would have been committed
on the old primary server (in case the standby was not reachable during that time,
for example).

2 ===

+test_sub=# SELECT
+   array_agg(slotname) AS slots
+   FROM
+   ((
+   SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slotname
+   FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
+   WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
+   ) UNION (

I guess this format comes from ReplicationSlotNameForTablesync(). What about
creating a SQL-callable function on top of it and making use of it in the query
above? (That would ensure the doc stays up to date even if the format changes
in ReplicationSlotNameForTablesync().)
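
A hypothetical sketch of the shape such a helper could take (the name and the
SQL-level reimplementation are purely illustrative; the suggestion above is
really about a C-level function exposing ReplicationSlotNameForTablesync()):

CREATE FUNCTION pg_tablesync_slot_name(subid oid, relid oid)
RETURNS text LANGUAGE sql
AS $$
    -- mirrors the 'pg_<subid>_sync_<relid>_<sysid>' format used above
    SELECT CONCAT('pg_', subid, '_sync_', relid, '_',
                  (SELECT system_identifier FROM pg_control_system()))
$$;

The documented query could then call pg_tablesync_slot_name(r.srsubid, r.srrelid)
instead of spelling out the format inline.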

3 ===

+test_sub=# SELECT
+   MAX(remote_lsn) AS remote_lsn_on_subscriber
+   FROM
+   ((
+   SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_', r.srrelid), false)
+   WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn END) AS remote_lsn
+   FROM pg_subscription_rel r, pg_subscription s
+   WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND s.subfailover
+   ) UNION (
+   SELECT pg_replication_origin_progress(CONCAT('pg_', s.oid), false) AS remote_lsn
+   FROM pg_subscription s
+   WHERE s.subfailover
+   ));

What about adding a join to pg_replication_origin to get rid of the "hardcoded"
format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"?

The idea behind 2 === and 3 === is to have the queries as generic as possible and
not rely on a hardcoded format (which would be more difficult to maintain should
those formats change in the future).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-29 Thread shveta malik
On Mon, Apr 29, 2024 at 5:28 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Monday, April 29, 2024 5:11 PM shveta malik  wrote:
> >
> > On Mon, Apr 29, 2024 at 11:38 AM shveta malik 
> > wrote:
> > >
> > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> > >  wrote:
> > > >
> > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
> >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Since the standby_slot_names patch has been committed, I am
> > > > > > attaching the last doc patch for review.
> > > > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > > 1 ===
> > > > >
> > > > > +   continue subscribing to publications now on the new primary
> > > > > + server
> > > > > without
> > > > > +   any data loss.
> > > > >
> > > > > I think "without any data loss" should be re-worded in this
> > > > > context. Data loss in the sense "data committed on the primary and
> > > > not visible on the subscriber in case of failover" can still occur
> > > > (in case synchronous replication is not used).
> > > > >
> > > > > 2 ===
> > > > >
> > > > > +   If the result (failover_ready) of both above 
> > > > > steps is
> > > > > +   true, existing subscriptions will be able to continue without data
> > loss.
> > > > > +  
> > > > >
> > > > > I don't think that's true if synchronous replication is not used.
> > > > > Say,
> > > > >
> > > > > - synchronous replication is not used
> > > > > - primary is not able to reach the standby anymore and
> > > > > standby_slot_names is set
> > > > > - new data is inserted into the primary
> > > > > - then not replicated to subscriber (due to standby_slot_names)
> > > > >
> > > > > Then I think both the above steps will return true but data would
> > > > > be lost in case of failover.
> > > >
> > > > Thanks for the comments, attach the new version patch which reworded
> > > > the above places.
> > >
> > > Thanks for the patch.
> > >
> > > Few comments:
> > >
> > > 1)  Tested the steps; one of the queries still refers to
> > > 'conflict_reason'. I think it should refer to 'conflicting'.
>
> Thanks for catching this. Fixed.
>
> > >
> > > 2) Would it be good to mention that, in case of planned promotion, it is
> > > recommended to run pg_sync_replication_slots() as a last sync attempt
> > > before we run the failover-ready validation steps? This can be mentioned
> > > in high-availability.sgml of the current patch.
> >
> > I recall now that with the latest fix, we cannot run
> > pg_sync_replication_slots() unless we disable the slot-sync worker.
> > Considering that, I think it would be too many steps just to run the SQL
> > function at the end without much value added. Thus we can skip this point
> > and rely on the slot-sync worker completely.
>
> Agreed. I didn't change this.
>
> Here is the V3 doc patch.

Thanks for the patch.

It would be good if 1a could produce a quoted list of slot names as output,
which could be used directly in step 1b's query; otherwise, it is a little
inconvenient to feed the output into 1b if the number of slots is huge, since
the user needs to manually quote each slot name.
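
For example, wrapping the step-1a query in quote_literal() and string_agg()
would emit a ready-to-paste list (a sketch, reusing the query from step 1a):

test_sub=# SELECT string_agg(quote_literal(slot_name), ', ') AS quoted_slots
    FROM ((
        -- main-slot arm shown; add the tablesync arm from step 1a as well
        SELECT s.oid AS subid, s.subslotname AS slot_name
        FROM pg_subscription s
        WHERE s.subfailover
    ));
      quoted_slots
------------------------
 'sub1', 'sub2', 'sub3'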

Other than this, the patch looks good to me.

thanks
Shveta




RE: Synchronizing slots from primary to standby

2024-04-29 Thread Zhijie Hou (Fujitsu)
On Monday, April 29, 2024 5:11 PM shveta malik  wrote:
> 
> On Mon, Apr 29, 2024 at 11:38 AM shveta malik 
> wrote:
> >
> > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
>  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > > > > Hi,
> > > > >
> > > > > Since the standby_slot_names patch has been committed, I am
> > > > > attaching the last doc patch for review.
> > > > >
> > > >
> > > > Thanks!
> > > >
> > > > 1 ===
> > > >
> > > > +   continue subscribing to publications now on the new primary
> > > > + server
> > > > without
> > > > +   any data loss.
> > > >
> > > > I think "without any data loss" should be re-worded in this
> > > > context. Data loss in the sense "data committed on the primary and
> > > > not visible on the subscriber in case of failover" can still occur
> > > > (in case synchronous replication is not used).
> > > >
> > > > 2 ===
> > > >
> > > > +   If the result (failover_ready) of both above 
> > > > steps is
> > > > +   true, existing subscriptions will be able to continue without data
> loss.
> > > > +  
> > > >
> > > > I don't think that's true if synchronous replication is not used.
> > > > Say,
> > > >
> > > > - synchronous replication is not used
> > > > - primary is not able to reach the standby anymore and
> > > > standby_slot_names is set
> > > > - new data is inserted into the primary
> > > > - then not replicated to subscriber (due to standby_slot_names)
> > > >
> > > > Then I think both the above steps will return true but data would
> > > > be lost in case of failover.
> > >
> > > Thanks for the comments, attach the new version patch which reworded
> > > the above places.
> >
> > Thanks for the patch.
> >
> > Few comments:
> >
> > 1)  Tested the steps; one of the queries still refers to
> > 'conflict_reason'. I think it should refer to 'conflicting'.

Thanks for catching this. Fixed.

> >
> > 2) Would it be good to mention that, in case of planned promotion, it is
> > recommended to run pg_sync_replication_slots() as a last sync attempt
> > before we run the failover-ready validation steps? This can be mentioned
> > in high-availability.sgml of the current patch.
> 
> I recall now that with the latest fix, we cannot run
> pg_sync_replication_slots() unless we disable the slot-sync worker.
> Considering that, I think it would be too many steps just to run the SQL
> function at the end without much value added. Thus we can skip this point
> and rely on the slot-sync worker completely.

Agreed. I didn't change this.

Here is the V3 doc patch.

Best Regards,
Hou zj


v3-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
Description:  v3-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch


Re: Synchronizing slots from primary to standby

2024-04-29 Thread shveta malik
On Mon, Apr 29, 2024 at 11:38 AM shveta malik  wrote:
>
> On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot 
> >  wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > > > Hi,
> > > >
> > > > Since the standby_slot_names patch has been committed, I am attaching
> > > > the last doc patch for review.
> > > >
> > >
> > > Thanks!
> > >
> > > 1 ===
> > >
> > > +   continue subscribing to publications now on the new primary server
> > > without
> > > +   any data loss.
> > >
> > > I think "without any data loss" should be re-worded in this context. Data 
> > > loss in
> > > the sense "data committed on the primary and not visible on the 
> > > subscriber in
> > > case of failover" can still occurs (in case synchronous replication is 
> > > not used).
> > >
> > > 2 ===
> > >
> > > +   If the result (failover_ready) of both above steps 
> > > is
> > > +   true, existing subscriptions will be able to continue without data 
> > > loss.
> > > +  
> > >
> > > I don't think that's true if synchronous replication is not used. Say,
> > >
> > > - synchronous replication is not used
> > > - primary is not able to reach the standby anymore and standby_slot_names 
> > > is
> > > set
> > > - new data is inserted into the primary
> > > - then not replicated to subscriber (due to standby_slot_names)
> > >
> > > Then I think both the above steps will return true but data would be lost
> > > in case of failover.
> >
> > Thanks for the comments, attach the new version patch which reworded the
> > above places.
>
> Thanks for the patch.
>
> Few comments:
>
> 1)  Tested the steps; one of the queries still refers to
> 'conflict_reason'. I think it should refer to 'conflicting'.
>
> 2) Would it be good to mention that, in case of planned promotion, it is
> recommended to run pg_sync_replication_slots() as a last sync attempt
> before we run the failover-ready validation steps? This can be mentioned
> in high-availability.sgml of the current patch.

I recall now that with the latest fix, we cannot run
pg_sync_replication_slots() unless we disable the slot-sync worker.
Considering that, I think it would be too many steps just to run the
SQL function at the end without much value added. Thus we can skip
this point and rely on the slot-sync worker completely.
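
For context, the rejected sequence would look roughly like this on the standby
(a sketch; sync_replication_slots is the GUC that controls the slot-sync
worker, and the exact steps are an assumption):

ALTER SYSTEM SET sync_replication_slots = off;  -- stop the slot-sync worker
SELECT pg_reload_conf();
SELECT pg_sync_replication_slots();             -- one final manual sync
ALTER SYSTEM SET sync_replication_slots = on;   -- re-enable the worker
SELECT pg_reload_conf();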

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-04-29 Thread shveta malik
On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Friday, March 15, 2024 10:45 PM Bertrand Drouvot 
>  wrote:
> >
> > Hi,
> >
> > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > > Hi,
> > >
> > > Since the standby_slot_names patch has been committed, I am attaching
> > > the last doc patch for review.
> > >
> >
> > Thanks!
> >
> > 1 ===
> >
> > +   continue subscribing to publications now on the new primary server
> > without
> > +   any data loss.
> >
> > I think "without any data loss" should be re-worded in this context. Data 
> > loss in
> > the sense "data committed on the primary and not visible on the subscriber 
> > in
> > case of failover" can still occurs (in case synchronous replication is not 
> > used).
> >
> > 2 ===
> >
> > +   If the result (failover_ready) of both above steps is
> > +   true, existing subscriptions will be able to continue without data loss.
> > +  
> >
> > I don't think that's true if synchronous replication is not used. Say,
> >
> > - synchronous replication is not used
> > - primary is not able to reach the standby anymore and standby_slot_names is
> > set
> > - new data is inserted into the primary
> > - then not replicated to subscriber (due to standby_slot_names)
> >
> > Then I think both the above steps will return true but data would be lost
> > in case of failover.
>
> Thanks for the comments, attach the new version patch which reworded the
> above places.

Thanks for the patch.

Few comments:

1)  Tested the steps; one of the queries still refers to
'conflict_reason'. I think it should refer to 'conflicting' (see the
sketch after these comments).

2) Would it be good to mention that, in case of planned promotion, it is
recommended to run pg_sync_replication_slots() as a last sync attempt
before we run the failover-ready validation steps? This can be mentioned
in high-availability.sgml of the current patch.
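
For reference, the corrected standby-side check would use the conflicting
column, along the lines of the query shown elsewhere in this thread:

test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS failover_ready
    FROM pg_replication_slots
    WHERE slot_name IN ('sub1','sub2','sub3');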

thanks
Shveta




RE: Synchronizing slots from primary to standby

2024-04-28 Thread Zhijie Hou (Fujitsu)
On Friday, March 15, 2024 10:45 PM Bertrand Drouvot 
 wrote:
> 
> Hi,
> 
> On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > Hi,
> >
> > Since the standby_slot_names patch has been committed, I am attaching
> > the last doc patch for review.
> >
> 
> Thanks!
> 
> 1 ===
> 
> +   continue subscribing to publications now on the new primary server
> without
> +   any data loss.
> 
> I think "without any data loss" should be re-worded in this context. Data 
> loss in
> the sense "data committed on the primary and not visible on the subscriber in
> case of failover" can still occurs (in case synchronous replication is not 
> used).
> 
> 2 ===
> 
> +   If the result (failover_ready) of both above steps is
> +   true, existing subscriptions will be able to continue without data loss.
> +  
> 
> I don't think that's true if synchronous replication is not used. Say,
> 
> - synchronous replication is not used
> - primary is not able to reach the standby anymore and standby_slot_names is
> set
> - new data is inserted into the primary
> - then not replicated to subscriber (due to standby_slot_names)
> 
> Then I think both the above steps will return true but data would be lost
> in case of failover.

Thanks for the comments; attached is the new version patch, which rewords the
above places.

Best Regards,
Hou zj


v2-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
Description:  v2-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch


RE: Synchronizing slots from primary to standby

2024-04-12 Thread Zhijie Hou (Fujitsu)
On Friday, April 12, 2024 11:31 AM Amit Kapila  wrote:
> 
> On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > On Thursday, April 11, 2024 12:11 PM Amit Kapila 
> wrote:
> >
> > >
> > > 2.
> > > - if (remote_slot->restart_lsn < slot->data.restart_lsn)
> > > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
> > >   elog(ERROR,
> > >   "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> > >
> > > Can we be more specific in this message? How about splitting it into
> > > error_message as "cannot synchronize local slot \"%s\"" and then
> > > errdetail as "Local slot's start streaming location LSN(%X/%X) is
> > > ahead of remote slot's LSN(%X/%X)"?
> >
> > Your version looks better. Since the above two messages both have
> > errdetail, I used the style of ereport(ERROR, errmsg_internal(),
> > errdetail_internal(), ...) in the patch, which is equivalent to
> > elog(ERROR, ...) but has an additional detail message.
> >
> 
> makes sense.
> 
> > Here is V5 patch set.
> >
> 
> I think we should move the check to not advance the slot when one of the
> remote_slot's restart_lsn or catalog_xmin is less than the local slot's inside
> update_local_synced_slot(), as we want to prevent updating the slot in those
> cases even during slot synchronization.

Agreed. Here is the V6 patch which addressed this. I have merged the
two patches into one.

Best Regards,
Hou zj


v6-0001-Fix-the-handling-of-LSN-and-xmin-in-slot-sync.patch
Description:  v6-0001-Fix-the-handling-of-LSN-and-xmin-in-slot-sync.patch


Re: Synchronizing slots from primary to standby

2024-04-11 Thread Amit Kapila
On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Thursday, April 11, 2024 12:11 PM Amit Kapila  
> wrote:
>
> >
> > 2.
> > - if (remote_slot->restart_lsn < slot->data.restart_lsn)
> > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
> >   elog(ERROR,
> >   "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> >
> > Can we be more specific in this message? How about splitting it into
> > error_message as "cannot synchronize local slot \"%s\"" and then errdetail 
> > as
> > "Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's
> > LSN(%X/%X)"?
>
> Your version looks better. Since the above two messages both have errdetail, I
> used the style of ereport(ERROR, errmsg_internal(), errdetail_internal(), ...) in
> the patch, which is equivalent to elog(ERROR, ...) but has an additional detail
> message.
>

makes sense.

> Here is V5 patch set.
>

I think we should move the check to not advance the slot when one of the
remote_slot's restart_lsn or catalog_xmin is less than the local slot's
inside update_local_synced_slot(), as we want to prevent updating the
slot in those cases even during slot synchronization.

-- 
With Regards,
Amit Kapila.




RE: Synchronizing slots from primary to standby

2024-04-11 Thread Zhijie Hou (Fujitsu)
On Thursday, April 11, 2024 12:11 PM Amit Kapila  
wrote:
> 
> On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > On Thursday, April 4, 2024 5:37 PM Amit Kapila 
> wrote:
> > >
> > > BTW, while thinking on this one, I
> > > noticed that in the function LogicalConfirmReceivedLocation(), we
> > > first update the disk copy, see comment [1] and then in-memory
> > > whereas the same is not true in
> > > update_local_synced_slot() for the case when snapshot exists. Now,
> > > do we have the same risk here in case of standby? Because I think we
> > > will use these xmins while sending the feedback message (in
> XLogWalRcvSendHSFeedback()).
> > >
> > > * We have to write the changed xmin to disk *before* we change
> > > * the in-memory value, otherwise after a crash we wouldn't know
> > > * that some catalog tuples might have been removed already.
> >
> > Yes, I think we have the risk on the standby, I can reproduce the case
> > that if the server crashes after updating the in-memory value and
> > before saving them to disk, the synced slot could be invalidated after
> > restarting from crash, because the necessary rows have been removed on
> > the primary. The steps can be found in [1].
> >
> > I think we'd better fix the order in update_local_synced_slot() as
> > well. I tried to make the fix in 0002; 0001 is Shveta's patch to fix
> > another issue in this thread. Since they touch the same function, I
> > attach them together for review.
> >
> 
> Few comments:
> ===
> 1.
> +
> + /* Sanity check */
> + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn)
> + ereport(LOG,
> + errmsg("synchronized confirmed_flush for slot \"%s\" differs from remote slot",
> +remote_slot->name),
> 
> Is there a reason to use elevel as LOG instead of ERROR? I think it should be
> elog(ERROR, .. as this is an unexpected case.

Agreed.

> 
> 2.
> - if (remote_slot->restart_lsn < slot->data.restart_lsn)
> + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
>   elog(ERROR,
>   "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> 
> Can we be more specific in this message? How about splitting it into
> error_message as "cannot synchronize local slot \"%s\"" and then errdetail as
> "Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's
> LSN(%X/%X)"?

Your version looks better. Since the above two messages both have errdetail, I
used the style of ereport(ERROR, errmsg_internal(), errdetail_internal(), ...) in
the patch, which is equivalent to elog(ERROR, ...) but has an additional detail
message.

Here is V5 patch set.

Best Regards,
Hou zj


v5-0002-write-the-changed-xmin-to-disk-before-computing-g.patch
Description:  v5-0002-write-the-changed-xmin-to-disk-before-computing-g.patch


v5-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
Description:  v5-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch


Re: Synchronizing slots from primary to standby

2024-04-10 Thread Amit Kapila
On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Thursday, April 4, 2024 5:37 PM Amit Kapila  
> wrote:
> >
> > BTW, while thinking on this one, I
> > noticed that in the function LogicalConfirmReceivedLocation(), we first 
> > update
> > the disk copy, see comment [1] and then in-memory whereas the same is not
> > true in
> > update_local_synced_slot() for the case when snapshot exists. Now, do we 
> > have
> > the same risk here in case of standby? Because I think we will use these 
> > xmins
> > while sending the feedback message (in XLogWalRcvSendHSFeedback()).
> >
> > * We have to write the changed xmin to disk *before* we change
> > * the in-memory value, otherwise after a crash we wouldn't know
> > * that some catalog tuples might have been removed already.
>
> Yes, I think we have the risk on the standby. I can reproduce a case where, if
> the server crashes after updating the in-memory value and before saving it to
> disk, the synced slot is invalidated after restarting from the crash, because
> the necessary rows have already been removed on the primary. The steps can be
> found in [1].
>
> I think we'd better fix the order in update_local_synced_slot() as well. I
> tried to make the fix in 0002; 0001 is Shveta's patch to fix another issue in
> this thread. Since they touch the same function, I attach them together for
> review.
>

Few comments:
===
1.
+
+ /* Sanity check */
+ if (slot->data.confirmed_flush != remote_slot->confirmed_lsn)
+ ereport(LOG,
+ errmsg("synchronized confirmed_flush for slot \"%s\" differs from
remote slot",
+remote_slot->name),

Is there a reason to use elevel as LOG instead of ERROR? I think it
should be elog(ERROR, .. as this is an unexpected case.

2.
- if (remote_slot->restart_lsn < slot->data.restart_lsn)
+ if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
  elog(ERROR,
  "cannot synchronize local slot \"%s\" LSN(%X/%X)"

Can we be more specific in this message? How about splitting it into
error_message as "cannot synchronize local slot \"%s\"" and then
errdetail as "Local slot's start streaming location LSN(%X/%X) is
ahead of remote slot's LSN(%X/%X)"?

-- 
With Regards,
Amit Kapila.




RE: Synchronizing slots from primary to standby

2024-04-10 Thread Zhijie Hou (Fujitsu)

On Thursday, April 4, 2024 5:37 PM Amit Kapila  wrote:
> 
> BTW, while thinking on this one, I
> noticed that in the function LogicalConfirmReceivedLocation(), we first update
> the disk copy, see comment [1] and then in-memory whereas the same is not
> true in
> update_local_synced_slot() for the case when snapshot exists. Now, do we have
> the same risk here in case of standby? Because I think we will use these xmins
> while sending the feedback message (in XLogWalRcvSendHSFeedback()).
>
> * We have to write the changed xmin to disk *before* we change
> * the in-memory value, otherwise after a crash we wouldn't know
> * that some catalog tuples might have been removed already.

Yes, I think we have the risk on the standby. I can reproduce a case where, if
the server crashes after updating the in-memory value and before saving it to
disk, the synced slot is invalidated after restarting from the crash, because
the necessary rows have already been removed on the primary. The steps can be
found in [1].

I think we'd better fix the order in update_local_synced_slot() as well. I
tried to make the fix in 0002; 0001 is Shveta's patch to fix another issue in
this thread. Since they touch the same function, I attach them together for
review.

[1]
-- Primary:
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 
'test_decoding', false, false, true);

-- Standby:
SELECT 'init' FROM pg_create_logical_replication_slot('standbylogicalslot', 
'test_decoding', false, false, false);
SELECT pg_sync_replication_slots();

-- Primary:
CREATE TABLE test (a int);
INSERT INTO test VALUES(1);
DROP TABLE test;

SELECT txid_current();
SELECT txid_current();
SELECT txid_current();
SELECT pg_log_standby_snapshot();

SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());

-- Standby:
- wait for standby to replay all the changes on the primary.

- this is to serialize snapshots.
SELECT pg_replication_slot_advance('standbylogicalslot', 
pg_last_wal_replay_lsn());

- Use gdb to stop at the place after calling ReplicationSlotsComputexx()
  functions and before calling ReplicationSlotSave().
SELECT pg_sync_replication_slots();

-- Primary:

- First, wait for the primary slot(the physical slot)'s catalog xmin to be
  updated to the same as the failover slot.

VACUUM FULL;

- Wait for VACUMM FULL to be replayed on standby.

-- Standby:

- For the process which is blocked by gdb, let the process crash (elog(PANIC,
  ...)).

After restarting the standby from crash, we can see the synced slot is 
invalidated.

LOG:  invalidating obsolete replication slot "logicalslot"
DETAIL:  The slot conflicted with xid horizon 741.
CONTEXT:  WAL redo at 0/3059B90 for Heap2/PRUNE_ON_ACCESS: 
snapshotConflictHorizon: 741, isCatalogRel: T, nplans: 0, nredirected: 0, 
ndead: 7, nunused: 0, dead: [22, 23, 24, 25, 26, 27, 28]; blkref #0: rel 
1663/5/1249, blk 16


Best Regards,
Hou zj


v4-0002-write-the-changed-xmin-to-disk-before-computing-g.patch
Description:  v4-0002-write-the-changed-xmin-to-disk-before-computing-g.patch


v4-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
Description:  v4-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch


RE: Synchronizing slots from primary to standby

2024-04-09 Thread Zhijie Hou (Fujitsu)
On Thursday, April 4, 2024 4:25 PM Masahiko Sawada  
wrote:

Hi,

> On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila 
> wrote:
> >
> > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila 
> wrote:
> > >
> > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
> > >  wrote:
> > >
> > > > I quickly looked at v8, and have a nit, rest all looks good.
> > > >
> > > > +if (DecodingContextReady(ctx) &&
> found_consistent_snapshot)
> > > > +*found_consistent_snapshot = true;
> > > >
> > > > Can the found_consistent_snapshot be checked first to help avoid
> > > > the function call DecodingContextReady() for
> > > > pg_replication_slot_advance callers?
> > > >
> > >
> > > Okay, changed. Additionally, I have updated the comments and commit
> > > message. I'll push this patch after some more testing.
> > >
> >
> > Pushed!
> 
> While testing this change, I realized that it could happen that the server 
> logs
> are flooded with the following logical decoding logs that are written every 
> 200
> ms:

Thanks for reporting!

> 
> 2024-04-04 16:15:19.270 JST [3838739] LOG:  starting logical decoding for slot
> "test_sub"
> 2024-04-04 16:15:19.270 JST [3838739] DETAIL:  Streaming transactions
> committing after 0/50006F48, reading WAL from 0/50006F10.
> 2024-04-04 16:15:19.270 JST [3838739] LOG:  logical decoding found
> consistent point at 0/50006F10
> 2024-04-04 16:15:19.270 JST [3838739] DETAIL:  There are no running
> transactions.
> 2024-04-04 16:15:19.477 JST [3838739] LOG:  starting logical decoding for slot
> "test_sub"
> 2024-04-04 16:15:19.477 JST [3838739] DETAIL:  Streaming transactions
> committing after 0/50006F48, reading WAL from 0/50006F10.
> 2024-04-04 16:15:19.477 JST [3838739] LOG:  logical decoding found
> consistent point at 0/50006F10
> 2024-04-04 16:15:19.477 JST [3838739] DETAIL:  There are no running
> transactions.
> 
> For example, I could reproduce it with the following steps:
> 
> 1. create the primary and start.
> 2. run "pgbench -i -s 100" on the primary.
> 3. run pg_basebackup to create the standby.
> 4. configure slotsync setup on the standby and start.
> 5. create a publication for all tables on the primary.
> 6. create the subscriber and start.
> 7. run "pgbench -i -Idtpf" on the subscriber.
> 8. create a subscription on the subscriber (initial data copy will start).
> 
> The logical decoding logs were written every 200 ms during the initial data
> synchronization.
> 
> Looking at the new changes for update_local_synced_slot():
...
> We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn,
> restart_lsn, and catalog_xmin is different between the remote slot and the 
> local
> slot. In my test case, during the initial sync performing, only catalog_xmin 
> was
> different and there was no serialized snapshot at restart_lsn, and the 
> slotsync
> worker called LogicalSlotAdvanceAndCheckSnapState(). However no slot
> properties were changed even after the function and it set slot_updated = 
> true.
> So it starts the next slot synchronization after 200ms.

I was trying to reproduce this and check why the catalog_xmin differs between
the synced slot and the remote slot, but I was not able to reproduce the case
where there are lots of logical decoding logs. The script I used is attached.

Would it be possible for you to share the script you used to reproduce this
issue? Alternatively, could you please share the log files from both the
primary and standby servers after reproducing the problem (it would be very
helpful if you could set the log level to DEBUG2)?

Best Regards,
Hou zj


test.sh
Description: test.sh


Re: Synchronizing slots from primary to standby

2024-04-08 Thread Amit Kapila
On Mon, Apr 8, 2024 at 7:01 PM Zhijie Hou (Fujitsu)
 wrote:
>
> Thanks for pushing.
>
> I checked the BF status and noticed one BF failure, which I think is caused
> by an omission in the test code.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27
>
> From the following log, I can see the sync failed because the standby is
> lagging behind the failover slot.
>
> -
> # No postmaster PID for node "cascading_standby"
> error running SQL: 'psql:<stdin>:1: ERROR:  skipping slot synchronization as 
> the received slot sync LSN 0/4000148 for slot "snap_test_slot" is ahead of 
> the standby position 0/4000114'
> while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI 
> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT 
> pg_sync_replication_slots();' at 
> /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm 
> line 2042.
> # Postmaster PID for node "publisher" is 3715298
> -
>
> I think it's because we missed calling wait_for_replay_catchup before syncing
> slots.
>
> -
> $primary->safe_psql('postgres',
> "SELECT pg_create_logical_replication_slot('snap_test_slot', 
> 'test_decoding', false, false, true);"
> );
> # ? missed to wait here
> $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
> -
>
> While testing, I noticed another place where we were calling
> wait_for_replay_catchup before doing pg_replication_slot_advance, which also
> has a small chance of leaving the failover slot ahead of the standby if some
> WAL is written between these two steps. So, I adjusted these together.
>
> Here is a small patch to improve the test.
>

LGTM. I'll push this tomorrow morning unless there are any more
comments or suggestions.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-08 Thread Amit Kapila
On Mon, Apr 8, 2024 at 9:49 PM Andres Freund  wrote:
>
> On 2024-04-08 16:01:41 +0530, Amit Kapila wrote:
> > Pushed.
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27
>
> This unfortunately is a commit after
>

Right, and thanks for the report. Hou-San has analyzed and shared the
patch [1] for this yesterday. I'll review it today.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB571665359F2F5DCD3ADABC9F94002%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-08 Thread Andres Freund
Hi,

On 2024-04-08 16:01:41 +0530, Amit Kapila wrote:
> Pushed.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27

This unfortunately is a commit after

commit 6f3d8d5e7cc
Author: Amit Kapila 
Date:   2024-04-08 13:21:55 +0530

Fix the intermittent buildfarm failures in 040_standby_failover_slots_sync.


Greetings,

Andres




RE: Synchronizing slots from primary to standby

2024-04-08 Thread Zhijie Hou (Fujitsu)
On Monday, April 8, 2024 6:32 PM Amit Kapila  wrote:
> 
> On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > On Saturday, April 6, 2024 12:43 PM Amit Kapila 
> wrote:
> > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> > >  wrote:
> > >
> > > Yeah, that could be the first step. We can probably add an injection
> > > point to control the bgwriter behavior and then add tests involving
> > > walsender performing the decoding. But I think it is important to
> > > have sufficient tests in this area as I see they are quite helpful in
> > > uncovering the issues.
> >
> > Here is the patch to drop the subscription in the beginning so that
> > the restart_lsn of the lsub1_slot won't be advanced due to concurrent
> > xl_running_xacts from bgwriter. The subscription will be re-created
> > after all the slots are sync-ready. I think maybe we can use this to
> > stabilize the test as a first step and then think about how to make
> > use of injection point to add more tests if it's worth it.
> >
> 
> Pushed.

Thanks for pushing.

I checked the BF status and noticed one BF failure, which I think is caused by
an omission in the test code.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27

From the following log, I can see the sync failed because the standby is
lagging behind the failover slot.

-
# No postmaster PID for node "cascading_standby"
error running SQL: 'psql:<stdin>:1: ERROR:  skipping slot synchronization as 
the received slot sync LSN 0/4000148 for slot "snap_test_slot" is ahead of the 
standby position 0/4000114'
while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI dbname='postgres' 
-f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_sync_replication_slots();' at 
/home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm 
line 2042.
# Postmaster PID for node "publisher" is 3715298
-

I think it's because we missed calling wait_for_replay_catchup before syncing
slots.

-
$primary->safe_psql('postgres',
"SELECT pg_create_logical_replication_slot('snap_test_slot', 
'test_decoding', false, false, true);"
);
# ? missed to wait here
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
-
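
In SQL terms, the missing wait boils down to checking that the standby has
replayed up to the slot's position before syncing (a sketch; the LSN literal is
illustrative, taken from the error message above):

-- On the primary, right after creating the slot:
SELECT pg_current_wal_lsn();                    -- e.g. 0/4000148

-- On the standby, poll until replay has caught up, then sync:
SELECT pg_last_wal_replay_lsn() >= '0/4000148'::pg_lsn AS caught_up;
SELECT pg_sync_replication_slots();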

While testing, I noticed another place where we were calling
wait_for_replay_catchup before doing pg_replication_slot_advance, which also
has a small chance of leaving the failover slot ahead of the standby if some
WAL is written between these two steps. So, I adjusted these together.

Here is a small patch to improve the test.

Best Regards,
Hou zj


0001-Ensure-the-standby-is-not-lagging-behind-the-failove.patch
Description:  0001-Ensure-the-standby-is-not-lagging-behind-the-failove.patch


Re: Synchronizing slots from primary to standby

2024-04-08 Thread Amit Kapila
On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Saturday, April 6, 2024 12:43 PM Amit Kapila  
> wrote:
> > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> >  wrote:
> >
> > Yeah, that could be the first step. We can probably add an injection point
> > to control the bgwriter behavior and then add tests involving walsender
> > performing the decoding. But I think it is important to have sufficient
> > tests in this area as I see they are quite helpful in uncovering the issues.
>
> Here is the patch to drop the subscription in the beginning so that the
> restart_lsn of the lsub1_slot won't be advanced due to concurrent
> xl_running_xacts from bgwriter. The subscription will be re-created after all
> the slots are sync-ready. I think maybe we can use this to stabilize the test
> as a first step and then think about how to make use of injection point to add
> more tests if it's worth it.
>

Pushed.

-- 
With Regards,
Amit Kapila.




RE: Synchronizing slots from primary to standby

2024-04-08 Thread Zhijie Hou (Fujitsu)
On Saturday, April 6, 2024 12:43 PM Amit Kapila  wrote:
> On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
>  wrote:
> >
> > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila 
> wrote:
> > > Thinking more on this, it doesn't seem related to
> > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't
> > > change any locking or something like that which impacts write positions.
> >
> > Agree.
> >
> > > I think what has happened here is that running_xact record written
> > > by the background writer [1] is not written to the kernel or disk
> > > (see LogStandbySnapshot()), before pg_current_wal_lsn() checks the
> > > current_lsn to be compared with replayed LSN.
> >
> > Agree, I think it's not visible through pg_current_wal_lsn() yet.
> >
> > Also I think that the DEBUG message in LogCurrentRunningXacts()
> >
> > "
> > elog(DEBUG2,
> >  "snapshot of %d+%d running transaction ids (lsn %X/%X oldest
> xid %u latest complete %u next xid %u)",
> >  CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt,
> >  LSN_FORMAT_ARGS(recptr),
> >  CurrRunningXacts->oldestRunningXid,
> >  CurrRunningXacts->latestCompletedXid,
> >  CurrRunningXacts->nextXid); "
> >
> > should be located after the XLogSetAsyncXactLSN() call. Indeed, the
> > new LSN is visible after the spinlock (XLogCtl->info_lck) in
> > XLogSetAsyncXactLSN() is released,
> >
> 
> I think the new LSN can be visible only when the corresponding WAL is written
> by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can make it
> visible. In your experiment below, isn't it possible that in the meantime WAL
> writer has written that WAL due to which you are seeing an updated location?
> 
> >see:
> >
> > \watch on Session 1 provides:
> >
> >  pg_current_wal_lsn
> > 
> >  0/87D110
> > (1 row)
> >
> > Until:
> >
> > Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at
> xlog.c:2579
> > 2579XLogRecPtr  WriteRqstPtr = asyncXactLSN;
> > (gdb) n
> > 2581boolwakeup = false;
> > (gdb)
> > 2584SpinLockAcquire(&XLogCtl->info_lck);
> > (gdb)
> > 2585RefreshXLogWriteResult(LogwrtResult);
> > (gdb)
> > 2586sleeping = XLogCtl->WalWriterSleeping;
> > (gdb)
> > 2587prevAsyncXactLSN = XLogCtl->asyncXactLSN;
> > (gdb)
> > 2588if (XLogCtl->asyncXactLSN < asyncXactLSN)
> > (gdb)
> > 2589XLogCtl->asyncXactLSN = asyncXactLSN;
> > (gdb)
> > 2590SpinLockRelease(>info_lck);
> > (gdb) p p/x (uint32) XLogCtl->asyncXactLSN
> > $1 = 0x87d148
> >
> > Then session 1 provides:
> >
> >  pg_current_wal_lsn
> > --------------------
> >  0/87D148
> > (1 row)
> >
> > So, when we see in the log:
> >
> > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:
> > snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740
> > latest complete 739 next xid 740)
> > 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG:
> statement: SELECT '0/360' <= replay_lsn AND state = 'streaming'
> >
> > It's indeed possible that the new LSN was not visible yet (spinlock
> > not released?) before the query began (because we can not rely on the
> > time the DEBUG message has been emitted).
> >
> > > Note that the reason why
> > > walsender has picked the running_xact written by background writer
> > > is because it has checked after pg_current_wal_lsn() query, see LOGs [2].
> > > I think we can probably try to reproduce manually via debugger.
> > >
> > > If this theory is correct
> >
> I think it is.
> >
> > > then I think we will need to use injection points to control the
> > > behavior of bgwriter or use the slots created via SQL API for
> > > syncing in tests.
> > >
> > > Thoughts?
> >
> > I think that maybe as a first step we should move the "elog(DEBUG2,"
> > message as proposed above to help debugging (that could help to confirm
> the above theory).
> >
> 
> I think I am missing how exactly moving DEBUG2 can confirm the above theory.
> 
> > If the theory is proven then I'm not sure we need the extra complexity
> > of injection point here, maybe just relying on the slots created via
> > SQL API could be enough.
> >
> 
> Yeah, that could be the first step. We can probably add an injection point to
> control the bgwriter behavior and then add tests involving walsender
> performing the decoding. But I think it is important to have sufficient tests 
> in
> this area as I see they are quite helpful in uncovering the issues.

Here is the patch to drop the subscription in the beginning so that the
restart_lsn of the lsub1_slot won't be advanced due to concurrent
xl_running_xacts from bgwriter. The subscription will be re-created after all
the slots are sync-ready. I think maybe we can use this to stabilize the test
as a first step and then think about how to make use of injection point 

Re: Synchronizing slots from primary to standby

2024-04-07 Thread Amit Kapila
On Sun, Apr 7, 2024 at 3:06 AM Andres Freund  wrote:
>
> On 2024-04-06 10:58:32 +0530, Amit Kapila wrote:
> > On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila  wrote:
> > >
> >
> > There are still a few pending issues to be fixed in this feature but
> > otherwise, we have committed all the main patches, so I marked the CF
> > entry corresponding to this work as committed.
>
> There are a fair number of failures of 040_standby_failover_slots_sync in
> the buildfarm.  It'd be nice to get those fixed soon-ish.
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris=2024-04-06%2020%3A58%3A50
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-04-06%2015%3A18%3A08
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo=2024-04-06%2010%3A13%3A58
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit=2024-04-05%2016%3A04%3A10
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo=2024-04-05%2014%3A59%3A40
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder=2024-04-05%2014%3A59%3A07
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit=2024-04-05%2014%3A18%3A07
>
> The symptoms are similar, but not entirely identical across all of them, I 
> think.
>

I have analyzed these failures and there are two different tests that
are failing but the underlying reason is the same as being discussed
with Bertrand. We are working on the fix.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-06 Thread Andres Freund
Hi,

On 2024-04-06 10:58:32 +0530, Amit Kapila wrote:
> On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila  wrote:
> >
> 
> There are still a few pending issues to be fixed in this feature but
> otherwise, we have committed all the main patches, so I marked the CF
> entry corresponding to this work as committed.

There are a fair number of failures of 040_standby_failover_slots_sync in
the buildfarm.  It'd be nice to get those fixed soon-ish.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris=2024-04-06%2020%3A58%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2024-04-06%2015%3A18%3A08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo=2024-04-06%2010%3A13%3A58
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit=2024-04-05%2016%3A04%3A10
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo=2024-04-05%2014%3A59%3A40
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder=2024-04-05%2014%3A59%3A07
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit=2024-04-05%2014%3A18%3A07

The symptoms are similar, but not entirely identical across all of them, I 
think.

I've also seen a bunch of failures of this test locally.

Greetings,

Andres Freund




Re: Synchronizing slots from primary to standby

2024-04-06 Thread Bertrand Drouvot
Hi,

On Sat, Apr 06, 2024 at 10:13:00AM +0530, Amit Kapila wrote:
> On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
>  wrote:
> 
> I think the new LSN can be visible only when the corresponding WAL is
> written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can
> make it visible. In your experiment below, isn't it possible that in
> the meantime WAL writer has written that WAL due to which you are
> seeing an updated location?

What I did is:

session 1:  select pg_current_wal_lsn();\watch 1
session 2:  select pg_backend_pid();

terminal 1: tail -f logfile | grep -i snap 
terminal 2 : gdb -p <pid from session 2>, with a breakpoint on
LogCurrentRunningXacts():

Breakpoint 1, LogCurrentRunningXacts (...) at standby.c:1346
1346    {
(gdb) n
1350
 
Then I stepped with next until the DEBUG message was emitted (confirmed in terminal 1).

At this stage the DEBUG message shows the new LSN while session 1 still displays
the previous LSN.

Then once XLogSetAsyncXactLSN() is done in the gdb session (terminal 2) then
session 1 displays the new LSN.

This is reproducible as desired.

With more debugging I can see that once the spinlock is released in
XLogSetAsyncXactLSN(), XLogWrite() does its job and then session 1 does see
the new value (it happens in this order and, as you said, that's expected).

My point is that while the DEBUG message is emitted, session 1 still
sees the old LSN (until the new LSN is visible). I think that we should emit
the DEBUG message once session 1 can see the new value (if not, I think the
timestamp of the DEBUG message can be misleading for debugging purposes).

> I think I am missing how exactly moving DEBUG2 can confirm the above theory.

I meant to say that instead of seeing:

2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:  snapshot 
of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 
739 next xid 740)
2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG:  statement: 
SELECT '0/360' <= replay_lsn AND state = 'streaming'

We would probably see something like:

2024-04-05 04:37:05. UTC [3866475][client backend][2/4:0] LOG:  
statement: SELECT '0/360' <= replay_lsn AND state = 'streaming'
2024-04-05 04:37:05.+xx UTC [3854278][background writer][:0] DEBUG:  
snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest 
complete 739 next xid 740)

And then it would be clear that the query had run before the new LSN was visible.

> > If the theory is proven then I'm not sure we need the extra complexity of
> > injection point here, maybe just relying on the slots created via SQL API 
> > could
> > be enough.
> >
> 
> Yeah, that could be the first step. We can probably add an injection
> point to control the bgwriter behavior and then add tests involving
> walsender performing the decoding. But I think it is important to have
> sufficient tests in this area as I see they are quite helpful in
> uncovering the issues.
>

Yeah agree. 

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila  wrote:
>

There are still a few pending issues to be fixed in this feature but
otherwise, we have committed all the main patches, so I marked the CF
entry corresponding to this work as committed.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
 wrote:
>
> On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila  wrote:
> > Thinking more on this, it doesn't seem related to
> > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change
> > any locking or something like that which impacts write positions.
>
> Agree.
>
> > I think what has happened here is that running_xact record written by
> > the background writer [1] is not written to the kernel or disk (see
> > LogStandbySnapshot()), before pg_current_wal_lsn() checks the
> > current_lsn to be compared with replayed LSN.
>
> Agree, I think it's not visible through pg_current_wal_lsn() yet.
>
> Also I think that the DEBUG message in LogCurrentRunningXacts()
>
> "
> elog(DEBUG2,
>  "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid 
> %u latest complete %u next xid %u)",
>  CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt,
>  LSN_FORMAT_ARGS(recptr),
>  CurrRunningXacts->oldestRunningXid,
>  CurrRunningXacts->latestCompletedXid,
>  CurrRunningXacts->nextXid);
> "
>
> should be located after the XLogSetAsyncXactLSN() call. Indeed, the new LSN is
> visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is
> released,
>

I think the new LSN can be visible only when the corresponding WAL is
written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can
make it visible. In your experiment below, isn't it possible that in
the meantime WAL writer has written that WAL due to which you are
seeing an updated location?

>see:
>
> \watch on Session 1 provides:
>
>  pg_current_wal_lsn
> --------------------
>  0/87D110
> (1 row)
>
> Until:
>
> Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579
> 2579XLogRecPtr  WriteRqstPtr = asyncXactLSN;
> (gdb) n
> 2581boolwakeup = false;
> (gdb)
> 2584SpinLockAcquire(>info_lck);
> (gdb)
> 2585RefreshXLogWriteResult(LogwrtResult);
> (gdb)
> 2586sleeping = XLogCtl->WalWriterSleeping;
> (gdb)
> 2587prevAsyncXactLSN = XLogCtl->asyncXactLSN;
> (gdb)
> 2588if (XLogCtl->asyncXactLSN < asyncXactLSN)
> (gdb)
> 2589XLogCtl->asyncXactLSN = asyncXactLSN;
> (gdb)
> 2590SpinLockRelease(>info_lck);
> (gdb) p p/x (uint32) XLogCtl->asyncXactLSN
> $1 = 0x87d148
>
> Then session 1 provides:
>
>  pg_current_wal_lsn
> --------------------
>  0/87D148
> (1 row)
>
> So, when we see in the log:
>
> 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:  snapshot 
> of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 
> 739 next xid 740)
> 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG:  statement: 
> SELECT '0/360' <= replay_lsn AND state = 'streaming'
>
> It's indeed possible that the new LSN was not visible yet (spinlock not 
> released?)
> before the query began (because we can not rely on the time the DEBUG message 
> has
> been emitted).
>
> > Note that the reason why
> > walsender has picked the running_xact written by background writer is
> > because it has checked after pg_current_wal_lsn() query, see LOGs [2].
> > I think we can probably try to reproduce manually via debugger.
> >
> > If this theory is correct
>
> I think it is.
>
> > then I think we will need to use injection
> > points to control the behavior of bgwriter or use the slots created
> > via SQL API for syncing in tests.
> >
> > Thoughts?
>
> I think that maybe as a first step we should move the "elog(DEBUG2," message 
> as
> proposed above to help debugging (that could help to confirm the above 
> theory).
>

I think I am missing how exactly moving DEBUG2 can confirm the above theory.

> If the theory is proven then I'm not sure we need the extra complexity of
> injection point here, maybe just relying on the slots created via SQL API 
> could
> be enough.
>

Yeah, that could be the first step. We can probably add an injection
point to control the bgwriter behavior and then add tests involving
walsender performing the decoding. But I think it is important to have
sufficient tests in this area as I see they are quite helpful in
uncovering the issues.
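
For illustration only, here is a rough sketch of where such an injection
point could sit in the background writer (the point name and exact placement
are assumptions, not taken from any posted patch):

/*
 * bgwriter.c, inside BackgroundWriterMain() -- hypothetical sketch.
 * A TAP test could attach a callback (for example, a "wait") to this
 * point to hold off the periodic xl_running_xacts record until the
 * LSN-sensitive step of the test is done.
 */
if (now >= timeout &&
	last_snapshot_lsn <= GetLastImportantRecPtr())
{
	INJECTION_POINT("bgwriter-log-standby-snapshot"); /* hypothetical name */

	last_snapshot_lsn = LogStandbySnapshot();
	last_snapshot_ts = now;
}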

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-05 Thread Bertrand Drouvot
Hi,

On Fri, Apr 05, 2024 at 02:35:42PM +, Bertrand Drouvot wrote:
> I think that maybe as a first step we should move the "elog(DEBUG2," message 
> as
> proposed above to help debugging (that could help to confirm the above 
> theory).

If you agree and think that makes sense, please find attached a tiny patch doing
so.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From a414e81a4c5c88963b2412d1641f3de1262c15c0 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 5 Apr 2024 14:49:51 +
Subject: [PATCH v1] Move DEBUG message in LogCurrentRunningXacts()

Indeed the new LSN is visible to other sessions (say through pg_current_wal_lsn())
only after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is released.

So moving the DEBUG message after the XLogSetAsyncXactLSN() call seems more
appropriate for debugging purposes.
---
 src/backend/storage/ipc/standby.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)
 100.0% src/backend/storage/ipc/

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 87b04e51b3..ba549acf50 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -1366,6 +1366,17 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 
 	recptr = XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS);
 
+	/*
+	 * Ensure running_xacts information is synced to disk not too far in the
+	 * future. We don't want to stall anything though (i.e. use XLogFlush()),
+	 * so we let the wal writer do it during normal operation.
+	 * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced
+	 * and nudge the WALWriter into action if sleeping. Check
+	 * XLogBackgroundFlush() for details why a record might not be flushed
+	 * without it.
+	 */
+	XLogSetAsyncXactLSN(recptr);
+
 	if (CurrRunningXacts->subxid_overflow)
 		elog(DEBUG2,
 			 "snapshot of %d running transactions overflowed (lsn %X/%X oldest xid %u latest complete %u next xid %u)",
@@ -1383,17 +1394,6 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 			 CurrRunningXacts->latestCompletedXid,
 			 CurrRunningXacts->nextXid);
 
-	/*
-	 * Ensure running_xacts information is synced to disk not too far in the
-	 * future. We don't want to stall anything though (i.e. use XLogFlush()),
-	 * so we let the wal writer do it during normal operation.
-	 * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced
-	 * and nudge the WALWriter into action if sleeping. Check
-	 * XLogBackgroundFlush() for details why a record might not be flushed
-	 * without it.
-	 */
-	XLogSetAsyncXactLSN(recptr);
-
 	return recptr;
 }
 
-- 
2.34.1



Re: Synchronizing slots from primary to standby

2024-04-05 Thread Bertrand Drouvot
Hi,

On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila  wrote:
> Thinking more on this, it doesn't seem related to
> c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change
> any locking or something like that which impacts write positions.

Agree.

> I think what has happened here is that running_xact record written by
> the background writer [1] is not written to the kernel or disk (see
> LogStandbySnapshot()), before pg_current_wal_lsn() checks the
> current_lsn to be compared with replayed LSN.

Agree, I think it's not visible through pg_current_wal_lsn() yet.

Also I think that the DEBUG message in LogCurrentRunningXacts() 

"
elog(DEBUG2,
 "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid 
%u latest complete %u next xid %u)",
 CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt,
 LSN_FORMAT_ARGS(recptr),
 CurrRunningXacts->oldestRunningXid,
 CurrRunningXacts->latestCompletedXid,
 CurrRunningXacts->nextXid);
"

should be located after the XLogSetAsyncXactLSN() call. Indeed, the new LSN is
visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is
released, see:

\watch on Session 1 provides: 

 pg_current_wal_lsn
--------------------
 0/87D110
(1 row)

Until:

Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579 
2579XLogRecPtr  WriteRqstPtr = asyncXactLSN;
(gdb) n
2581boolwakeup = false;
(gdb)
2584SpinLockAcquire(>info_lck);
(gdb)
2585RefreshXLogWriteResult(LogwrtResult);
(gdb)
2586sleeping = XLogCtl->WalWriterSleeping;
(gdb)
2587prevAsyncXactLSN = XLogCtl->asyncXactLSN;
(gdb)
2588if (XLogCtl->asyncXactLSN < asyncXactLSN)
(gdb)
2589XLogCtl->asyncXactLSN = asyncXactLSN;
(gdb)
2590SpinLockRelease(>info_lck);
(gdb) p p/x (uint32) XLogCtl->asyncXactLSN
$1 = 0x87d148

Then session 1 provides:

 pg_current_wal_lsn
--------------------
 0/87D148
(1 row)

So, when we see in the log:

2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:  snapshot 
of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 
739 next xid 740)
2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG:  statement: 
SELECT '0/360' <= replay_lsn AND state = 'streaming'

It's indeed possible that the new LSN was not visible yet (spinlock not 
released?)
before the query began (because we can not rely on the time the DEBUG message 
has
been emitted).

> Note that the reason why
> walsender has picked the running_xact written by background writer is
> because it has checked after pg_current_wal_lsn() query, see LOGs [2].
> I think we can probably try to reproduce manually via debugger.
> 
> If this theory is correct

I think it is.

> then I think we will need to use injection
> points to control the behavior of bgwriter or use the slots created
> via SQL API for syncing in tests.
> 
> Thoughts?

I think that maybe as a first step we should move the "elog(DEBUG2," message as
proposed above to help debugging (that could help to confirm the above theory).

If the theory is proven then I'm not sure we need the extra complexity of
injection point here, maybe just relying on the slots created via SQL API could
be enough.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila  wrote:
>
> On Thu, Apr 4, 2024 at 2:59 PM shveta malik  wrote:
> >
> > There is an intermittent BF failure observed at [1] after this commit 
> > (2ec005b).
> >
>
> Thanks for analyzing and providing the patch. I'll look into it. There
> is another BF failure [1] which I have analyzed. The main reason for
> failure is the following test:
>
> #   Failed test 'logical slots have synced as true on standby'
> #   at 
> /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl
> line 198.
> #  got: 'f'
> # expected: 't'
>
> Here, we are expecting that the two logical slots (lsub1_slot, and
> lsub2_slot), one created via subscription and another one via API
> pg_create_logical_replication_slot() are synced. The standby LOGs,
> which are as follows, show that the one created by API 'lsub2_slot' is
> synced but not the other one, 'lsub1_slot':
>
> LOG for lsub1_slot:
> 
> 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL:
> Streaming transactions committing after 0/0, reading WAL from
> 0/360.
> 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0]
> STATEMENT:  SELECT pg_sync_replication_slots();
> 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG:
> xmin required by slots: data 0, catalog 740
> 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG:
> could not sync slot "lsub1_slot"
>
> LOG for lsub2_slot:
> 
> 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG:
> xmin required by slots: data 0, catalog 740
> 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG:
> newly created slot "lsub2_slot" is sync-ready now
> 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0]
> STATEMENT:  SELECT pg_sync_replication_slots();
>
> We can see from the log of lsub1_slot that its restart_lsn is
> 0/360 which means it will start reading from the WAL from that
> location. Now, if we check the publisher log, we have running_xacts
> record at that location. See following LOGs:
>
> 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG:
> statement: SELECT pg_create_logical_replication_slot('lsub2_slot',
> 'test_decoding', false, false, true);
> 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG:
> snapshot of 0+0 running transaction ids (lsn 0/360 oldest xid 740
> latest complete 739 next xid 740)
> 
> 
> 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:
> snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740
> latest complete 739 next xid 740)
>
> The first running_xact record ends at 360 and the second one at
> 398. So, the start location of the second running_xact is 360,
> the same can be confirmed by the following LOG line of walsender:
>
> 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG:
> serializing snapshot to pg_logical/snapshots/0-360.snap
>
> This shows that while processing running_xact at location 360, we
> have serialized the snapshot. As there is no running transaction in
> the WAL at 360, ideally we should have reached a consistent state
> after processing that record on standby. But the reason standby didn't
> process that record is that the confirmed_flush LSN is also at the same
> location so the function LogicalSlotAdvanceAndCheckSnapState() exits
> without reading the WAL at that location. Now, this can be confirmed
> by the below walsender-specific LOG in publisher:
>
> 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write
> 0/360 flush 0/360 apply 0/360 reply_time 2024-04-05
> 04:36:59.155181+00
>
> We update the confirmed_flush location with the flush location after
> receiving the above feedback. You can notice that we didn't receive
> the feedback for the 398 location and hence both the
> confirmed_flush and restart_lsn are at the same location 0/360.
> Now, the test is waiting for the subscriber to send feedback of the
> last WAL write location by
> $primary->wait_for_catchup('regress_mysub1'); As noticed from the
> publisher LOGs, the query we used for wait is:
>
> SELECT '0/360' <= replay_lsn AND state = 'streaming'
> FROM pg_catalog.pg_stat_replication
> WHERE application_name IN ('regress_mysub1', 'walreceiver')
>
> Here, instead of '0/360' it should have used '0/398' which is
> the last write location. This position we get via function
> pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And
> this variable seems to be touched by commit
> c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely, could
> c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for failure? At
> this stage, I am not sure so just sharing with others to see if what I
> am saying sounds logical. I'll think more about this.
>

Thinking more on this, it doesn't seem related to
c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit 

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Thu, Apr 4, 2024 at 2:59 PM shveta malik  wrote:
>
> There is an intermittent BF failure observed at [1] after this commit 
> (2ec005b).
>

Thanks for analyzing and providing the patch. I'll look into it. There
is another BF failure [1] which I have analyzed. The main reason for
failure is the following test:

#   Failed test 'logical slots have synced as true on standby'
#   at 
/home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl
line 198.
#  got: 'f'
# expected: 't'

Here, we are expecting that the two logical slots (lsub1_slot, and
lsub2_slot), one created via subscription and another one via API
pg_create_logical_replication_slot() are synced. The standby LOGs,
which are as follows, show that the one created by API 'lsub2_slot' is
synced but not the other one, 'lsub1_slot':

LOG for lsub1_slot:

2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL:
Streaming transactions committing after 0/0, reading WAL from
0/360.
2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0]
STATEMENT:  SELECT pg_sync_replication_slots();
2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG:
xmin required by slots: data 0, catalog 740
2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG:
could not sync slot "lsub1_slot"

LOG for lsub2_slot:

2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG:
xmin required by slots: data 0, catalog 740
2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG:
newly created slot "lsub2_slot" is sync-ready now
2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0]
STATEMENT:  SELECT pg_sync_replication_slots();

We can see from the log of lsub1_slot that its restart_lsn is
0/360 which means it will start reading from the WAL from that
location. Now, if we check the publisher log, we have running_xacts
record at that location. See following LOGs:

2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG:
statement: SELECT pg_create_logical_replication_slot('lsub2_slot',
'test_decoding', false, false, true);
2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG:
snapshot of 0+0 running transaction ids (lsn 0/360 oldest xid 740
latest complete 739 next xid 740)


2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:
snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740
latest complete 739 next xid 740)

The first running_xact record ends at 360 and the second one at
398. So, the start location of the second running_xact is 360,
the same can be confirmed by the following LOG line of walsender:

2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG:
serializing snapshot to pg_logical/snapshots/0-360.snap

This shows that while processing running_xact at location 360, we
have serialized the snapshot. As there is no running transaction in
the WAL at 360, ideally we should have reached a consistent state
after processing that record on standby. But the reason standby didn't
process that record is that the confirmed_flush LSN is also at the same
location so the function LogicalSlotAdvanceAndCheckSnapState() exits
without reading the WAL at that location. Now, this can be confirmed
by the below walsender-specific LOG in publisher:

2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write
0/360 flush 0/360 apply 0/360 reply_time 2024-04-05
04:36:59.155181+00

We update the confirmed_flush location with the flush location after
receiving the above feedback. You can notice that we didn't receive
the feedback for the 398 location and hence both the
confirmed_flush and restart_lsn are at the same location 0/360.
Now, the test is waiting for the subscriber to send feedback of the
last WAL write location by
$primary->wait_for_catchup('regress_mysub1'); As noticed from the
publisher LOGs, the query we used for wait is:

SELECT '0/360' <= replay_lsn AND state = 'streaming'
FROM pg_catalog.pg_stat_replication
WHERE application_name IN ('regress_mysub1', 'walreceiver')

Here, instead of '0/360' it should have used '0/398' which is
the last write location. This position we get via function
pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And
this variable seems to be touched by commit
c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely, could
c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for failure? At
this stage, I am not sure so just sharing with others to see if what I
am saying sounds logical. I'll think more about this.
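
For reference, a simplified sketch of what pg_current_wal_lsn() ultimately
reads after that commit (from memory, not an exact copy of xlog.c):

XLogRecPtr
GetXLogWriteRecPtr(void)
{
	/* refresh the local copy from shared memory (XLogCtl->logwrtResult) */
	RefreshXLogWriteResult(LogwrtResult);

	return LogwrtResult.Write;
}

So the value only moves once XLogWrite() has updated the shared write
position, which fits the theory above.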


[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus=2024-04-05%2004%3A34%3A27

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-05 Thread shveta malik
On Fri, Apr 5, 2024 at 4:31 PM Bertrand Drouvot
 wrote:
>
> BTW, I just realized that the LSNs I used in my example in the 
> LSN_FORMAT_ARGS()
> are not the right ones.

Noted. Thanks.

Please find v3 with the comments addressed.

thanks
Shveta


v3-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-04-05 Thread Bertrand Drouvot
Hi,

On Fri, Apr 05, 2024 at 04:09:01PM +0530, shveta malik wrote:
> On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot
>  wrote:
> >
> > What about something like?
> >
> > ereport(LOG,
> > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs 
> > from remote slot",
> > remote_slot->name),
> > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.",
> > LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> > LSN_FORMAT_ARGS(slot->data.restart_lsn));
> >
> > Regards,
> 
> +1. Better than earlier. I will update and post the patch.
> 

Thanks!

BTW, I just realized that the LSNs I used in my example in the LSN_FORMAT_ARGS()
are not the right ones.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-05 Thread shveta malik
On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot
 wrote:
>
> What about something like?
>
> ereport(LOG,
> errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from 
> remote slot",
> remote_slot->name),
> errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.",
> LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> LSN_FORMAT_ARGS(slot->data.restart_lsn));
>
> Regards,

+1. Better than earlier. I will update and post the patch.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-04-04 Thread Bertrand Drouvot
Hi,

On Fri, Apr 05, 2024 at 09:43:35AM +0530, shveta malik wrote:
> On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot
>  wrote:
> >
> > Hi,
> >
> > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote:
> > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik  
> > > wrote:
> > 2 ===
> >
> > +   if (slot->data.confirmed_flush != 
> > remote_slot->confirmed_lsn)
> > +   elog(LOG,
> > +"could not synchronize local slot 
> > \"%s\" LSN(%X/%X)"
> > +" to remote slot's LSN(%X/%X) ",
> > +remote_slot->name,
> > +
> > LSN_FORMAT_ARGS(slot->data.confirmed_flush),
> > +
> > LSN_FORMAT_ARGS(remote_slot->confirmed_lsn));
> >
> > I don't think that the message is correct here. Unless I am missing 
> > something
> > there is nothing in the following code path that would prevent the slot to 
> > be
> > sync during this cycle.
> 
> This is a sanity check,  I will put a comment to indicate the same.

Thanks!

> We
> want to ensure if anything changes in future, we get correct logs to
> indicate that.

Right, understood that way.

> If required, the LOG msg can be changed. Kindly suggest if you have
> anything better in mind.
> 

What about something like?

ereport(LOG,
errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from 
remote slot",
remote_slot->name),
errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.",
LSN_FORMAT_ARGS(remote_slot->restart_lsn),
LSN_FORMAT_ARGS(slot->data.restart_lsn));

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-04 Thread shveta malik
On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote:
> > On Thu, Apr 4, 2024 at 2:59 PM shveta malik  wrote:
> > >
> > >
> > > Prior to commit 2ec005b, this check was okay, as we did not expect
> > > restart_lsn of the synced slot to be ahead of remote since we were
> > > directly copying the lsns. But now when we use 'advance' to do logical
> > > decoding on standby, there is a possibility that restart lsn of the
> > > synced slot is ahead of remote slot, if there are running txns records
> > > found after reaching consistent-point while consuming WALs from
> > > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance
> > > may end up serializing snapshots and setting restart_lsn to the
> > > serialized snapshot point, ahead of remote one.
> > >
> > > Fix:
> > > The sanity check needs to be corrected. Attached a patch to address the 
> > > issue.
> >
>
> Thanks for reporting, explaining the issue and providing a patch.
>
> Regarding the patch:
>
> 1 ===
>
> +* Attempt to sync lsns and xmins only if remote slot is ahead of 
> local
>
> s/lsns/LSNs/?
>
> 2 ===
>
> +   if (slot->data.confirmed_flush != 
> remote_slot->confirmed_lsn)
> +   elog(LOG,
> +"could not synchronize local slot 
> \"%s\" LSN(%X/%X)"
> +" to remote slot's LSN(%X/%X) ",
> +remote_slot->name,
> +
> LSN_FORMAT_ARGS(slot->data.confirmed_flush),
> +
> LSN_FORMAT_ARGS(remote_slot->confirmed_lsn));
>
> I don't think that the message is correct here. Unless I am missing something
> there is nothing in the following code path that would prevent the slot to be
> sync during this cycle.

This is a sanity check; I will put a comment to indicate the same. We
want to ensure that if anything changes in the future, we get correct logs
to indicate that.
If required, the LOG msg can be changed. Kindly suggest if you have
anything better in mind.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-04-04 Thread Bertrand Drouvot
Hi,

On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote:
> On Thu, Apr 4, 2024 at 2:59 PM shveta malik  wrote:
> >
> >
> > Prior to commit 2ec005b, this check was okay, as we did not expect
> > restart_lsn of the synced slot to be ahead of remote since we were
> > directly copying the lsns. But now when we use 'advance' to do logical
> > decoding on standby, there is a possibility that restart lsn of the
> > synced slot is ahead of remote slot, if there are running txns records
> > found after reaching consistent-point while consuming WALs from
> > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance
> > may end up serializing snapshots and setting restart_lsn to the
> > serialized snapshot point, ahead of remote one.
> >
> > Fix:
> > The sanity check needs to be corrected. Attached a patch to address the 
> > issue.
> 

Thanks for reporting, explaining the issue and providing a patch.

Regarding the patch:

1 ===

+* Attempt to sync lsns and xmins only if remote slot is ahead of local

s/lsns/LSNs/?

2 ===

+   if (slot->data.confirmed_flush != 
remote_slot->confirmed_lsn)
+   elog(LOG,
+"could not synchronize local slot 
\"%s\" LSN(%X/%X)"
+" to remote slot's LSN(%X/%X) ",
+remote_slot->name,
+
LSN_FORMAT_ARGS(slot->data.confirmed_flush),
+
LSN_FORMAT_ARGS(remote_slot->confirmed_lsn));

I don't think that the message is correct here. Unless I am missing something
there is nothing in the following code path that would prevent the slot to be
sync during this cycle. 

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-04 Thread shveta malik
On Thu, Apr 4, 2024 at 2:59 PM shveta malik  wrote:
>
>
> Prior to commit 2ec005b, this check was okay, as we did not expect
> restart_lsn of the synced slot to be ahead of remote since we were
> directly copying the lsns. But now when we use 'advance' to do logical
> decoding on standby, there is a possibility that restart lsn of the
> synced slot is ahead of remote slot, if there are running txns records
> found after reaching consistent-point while consuming WALs from
> restart_lsn till confirmed_lsn. In such a case, slot-sync's advance
> may end up serializing snapshots and setting restart_lsn to the
> serialized snapshot point, ahead of remote one.
>
> Fix:
> The sanity check needs to be corrected. Attached a patch to address the issue.

Please find v2, which has a detailed commit message and some more comments in the code.

thanks
Shveta


v2-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-04-04 Thread Amit Kapila
On Thu, Apr 4, 2024 at 1:55 PM Masahiko Sawada  wrote:
>
> While testing this change, I realized that it could happen that the
> server logs are flooded with the following logical decoding logs that
> are written every 200 ms:
>
> 2024-04-04 16:15:19.270 JST [3838739] LOG:  starting logical decoding
> for slot "test_sub"
...
...
>
> For example, I could reproduce it with the following steps:
>
> 1. create the primary and start.
> 2. run "pgbench -i -s 100" on the primary.
> 3. run pg_basebackup to create the standby.
> 4. configure slotsync setup on the standby and start.
> 5. create a publication for all tables on the primary.
> 6. create the subscriber and start.
> 7. run "pgbench -i -Idtpf" on the subscriber.
> 8. create a subscription on the subscriber (initial data copy will start).
>
> The logical decoding logs were written every 200 ms during the initial
> data synchronization.
>
> Looking at the new changes for update_local_synced_slot():
>
> if (remote_slot->confirmed_lsn != slot->data.confirmed_flush ||
> remote_slot->restart_lsn != slot->data.restart_lsn ||
> remote_slot->catalog_xmin != slot->data.catalog_xmin)
> {
> /*
>  * We can't directly copy the remote slot's LSN or xmin unless there
>  * exists a consistent snapshot at that point. Otherwise, after
>  * promotion, the slots may not reach a consistent point before the
>  * confirmed_flush_lsn which can lead to a data loss. To avoid data
>  * loss, we let slot machinery advance the slot which ensures that
>  * snapbuilder/slot statuses are updated properly.
>  */
> if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
> {
> /*
>  * Update the slot info directly if there is a serialized snapshot
>  * at the restart_lsn, as the slot can quickly reach consistency
>  * at restart_lsn by restoring the snapshot.
>  */
> SpinLockAcquire(>mutex);
> slot->data.restart_lsn = remote_slot->restart_lsn;
> slot->data.confirmed_flush = remote_slot->confirmed_lsn;
> slot->data.catalog_xmin = remote_slot->catalog_xmin;
> slot->effective_catalog_xmin = remote_slot->catalog_xmin;
> SpinLockRelease(>mutex);
>
> if (found_consistent_snapshot)
> *found_consistent_snapshot = true;
> }
> else
> {
> LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn,
> found_consistent_snapshot);
> }
>
> ReplicationSlotsComputeRequiredXmin(false);
> ReplicationSlotsComputeRequiredLSN();
>
> slot_updated = true;
>
> We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn,
> restart_lsn, and catalog_xmin is different between the remote slot and
> the local slot. In my test case, during the initial sync performing,
> only catalog_xmin was different and there was no serialized snapshot
> at restart_lsn, and the slotsync worker called
> LogicalSlotAdvanceAndCheckSnapState(). However no slot properties were
> changed even after the function and it set slot_updated = true. So it
> starts the next slot synchronization after 200ms.
>
> It seems to me that we can skip calling
> LogicalSlotAdvanceAndCheckSnapState() at least when the remote and
> local have the same restart_lsn and confirmed_lsn.
>

I think we can do that, but do we know what caused catalog_xmin to be
updated regularly without any change in restart/confirmed_flush LSN? I
think the LSNs are not updated during the initial sync (copy) time, but
then how is catalog_xmin getting updated for the same slot?

BTW, if we look at it, we will probably accept this xmin anyway, as it is due
to the following code in LogicalIncreaseXminForSlot():

LogicalIncreaseXminForSlot()
{
/*
* If the client has already confirmed up to this lsn, we directly can
* mark this as accepted. This can happen if we restart decoding in a
* slot.
*/
else if (current_lsn <= slot->data.confirmed_flush)
{
slot->candidate_catalog_xmin = xmin;
slot->candidate_xmin_lsn = current_lsn;

/* our candidate can directly be used */
updated_xmin = true;
}

> I'm not sure there are other scenarios but is it worth fixing this symptom?
>

I think so, but let's investigate this a bit more. BTW, while thinking
on this one, I noticed that in the function
LogicalConfirmReceivedLocation(), we first update the disk copy (see
comment [1]) and then the in-memory value, whereas the same is not true in
update_local_synced_slot() for the case when a snapshot exists. Now, do
we have the same risk here in the case of standby? Because I think we will
use these xmins while sending the feedback message (in
XLogWalRcvSendHSFeedback()).

[1]
/*
* We have to write the changed xmin to disk *before* we change
* the in-memory value, otherwise after a crash we wouldn't know
* that some catalog tuples might have been removed already.
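
For reference, a simplified sketch of the ordering
LogicalConfirmReceivedLocation() follows (error handling and the restart_lsn
side omitted; this is the ordering being contrasted with
update_local_synced_slot()):

/* simplified sketch of LogicalConfirmReceivedLocation()'s ordering */
SpinLockAcquire(&MyReplicationSlot->mutex);
MyReplicationSlot->data.confirmed_flush = lsn;
MyReplicationSlot->data.catalog_xmin =
	MyReplicationSlot->candidate_catalog_xmin;
SpinLockRelease(&MyReplicationSlot->mutex);

/* persist the new catalog_xmin *before* exposing it in memory */
ReplicationSlotMarkDirty();
ReplicationSlotSave();

/* only now let the effective (in-memory) value advance */
SpinLockAcquire(&MyReplicationSlot->mutex);
MyReplicationSlot->effective_catalog_xmin =
	MyReplicationSlot->data.catalog_xmin;
SpinLockRelease(&MyReplicationSlot->mutex);

ReplicationSlotsComputeRequiredXmin(false);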


Re: Synchronizing slots from primary to standby

2024-04-04 Thread shveta malik
On Wed, Apr 3, 2024 at 3:36 PM Amit Kapila  wrote:
>
> On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila  wrote:
> >
> > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
> >  wrote:
> >
> > > I quickly looked at v8, and have a nit, rest all looks good.
> > >
> > > +if (DecodingContextReady(ctx) && found_consistent_snapshot)
> > > +*found_consistent_snapshot = true;
> > >
> > > Can the found_consistent_snapshot be checked first to help avoid the
> > > function call DecodingContextReady() for pg_replication_slot_advance
> > > callers?
> > >
> >
> > Okay, changed. Additionally, I have updated the comments and commit
> > message. I'll push this patch after some more testing.
> >
>
> Pushed!

There is an intermittent BF failure observed at [1] after this commit (2ec005b).

Analysis:
We see in the BF logs that during the first call of the sync function,
restart_lsn of the synced slot is advanced to an LSN which is greater than
the remote slot's restart_lsn. And when the sync call is done a second time
without any further change on the primary, it hits the error:
  ERROR:  cannot synchronize local slot "lsub1_slot" LSN(0/360) to
remote slot's LSN(0/328) as synchronization would move it
backwards

Relevant BF logs are given at [2]. This reproduces intermittently,
depending upon whether bgwriter logs a running-txns record while the test is
running. We were able to mimic the test case to reproduce the failure.
Please see attached bf-test.txt for steps.

Issue:
The issue is that we have a wrong sanity check based on
'restart_lsn' in synchronize_one_slot():

if (remote_slot->restart_lsn < slot->data.restart_lsn)
elog(ERROR, ...);

Prior to commit 2ec005b, this check was okay, as we did not expect
restart_lsn of the synced slot to be ahead of remote since we were
directly copying the lsns. But now when we use 'advance' to do logical
decoding on standby, there is a possibility that the restart_lsn of the
synced slot is ahead of the remote slot, if running-txns records are
found after reaching the consistent point while consuming WALs from
restart_lsn till confirmed_lsn. In such a case, slot-sync's advance
may end up serializing snapshots and setting restart_lsn to the
serialized snapshot point, ahead of the remote one.

Fix:
The sanity check needs to be corrected. Attached a patch to address the issue.
a) The sanity check is corrected to compare confirmed_lsn rather than
restart_lsn (see the sketch after this list).
Additional changes:
b) A log has been added after LogicalSlotAdvanceAndCheckSnapState() to
log the case when the local and remote slots' confirmed-lsn were not
found to be the same after sync (if at all).
c) Now we attempt to sync in update_local_synced_slot() if one of
confirmed_lsn, restart_lsn, and catalog_xmin for remote slot is ahead
of local slot instead of them just being unequal.
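
To make (a) concrete, here is a sketch of the corrected check in
synchronize_one_slot() (simplified; see the attached patch for the exact
change and message wording):

/*
 * Sanity check (sketch): compare confirmed_flush rather than restart_lsn,
 * since a local restart_lsn ahead of the remote one is now legitimate.
 */
if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
	elog(ERROR,
		 "cannot synchronize local slot \"%s\" LSN(%X/%X) to remote slot's"
		 " LSN(%X/%X) as synchronization would move it backwards",
		 remote_slot->name,
		 LSN_FORMAT_ARGS(slot->data.confirmed_flush),
		 LSN_FORMAT_ARGS(remote_slot->confirmed_lsn));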

[1]:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2024-04-03%2017%3A57%3A28

[2]:
2024-04-03 18:00:41.896 UTC [3239617][client backend][0/2:0] LOG:
statement: SELECT pg_sync_replication_slots();
LOG:  starting logical decoding for slot "lsub1_slot"
DETAIL:  Streaming transactions committing after 0/0, reading WAL from
0/328.
LOG:  logical decoding found consistent point at 0/328
DEBUG:  serializing snapshot to pg_logical/snapshots/0-360.snap
DEBUG:  got new restart lsn 0/360 at 0/360
LOG:  newly created slot "lsub1_slot" is sync-ready now

2024-04-03 18:00:45.218 UTC [3243487][client backend][2/2:0] LOG:
statement: SELECT pg_sync_replication_slots();
  ERROR:  cannot synchronize local slot "lsub1_slot" LSN(0/360) to
remote slot's LSN(0/328) as synchronization would move it
backwards

thanks
Shveta
Keep sync_replication_slots as disabled on standby.

--primary:
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 
'test_decoding', false, false, true);

SELECT pg_log_standby_snapshot();
SELECT pg_log_standby_snapshot();
SELECT pg_log_standby_snapshot();
SELECT pg_log_standby_snapshot();
select pg_sleep(2);

SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());
select slot_name,database,xmin,catalog_xmin 
cxmin,restart_lsn,confirmed_flush_lsn flushLsn,wal_status,conflicting conf, 
failover,synced,temporary temp, active from pg_replication_slots order by 
slot_name;

--standby:
select pg_sync_replication_slots();
select pg_sync_replication_slots(); -- ERROR here


v1-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-04-04 Thread Masahiko Sawada
On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila  wrote:
>
> On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila  wrote:
> >
> > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
> >  wrote:
> >
> > > I quickly looked at v8, and have a nit, rest all looks good.
> > >
> > > +if (DecodingContextReady(ctx) && found_consistent_snapshot)
> > > +*found_consistent_snapshot = true;
> > >
> > > Can the found_consistent_snapshot be checked first to help avoid the
> > > function call DecodingContextReady() for pg_replication_slot_advance
> > > callers?
> > >
> >
> > Okay, changed. Additionally, I have updated the comments and commit
> > message. I'll push this patch after some more testing.
> >
>
> Pushed!

While testing this change, I realized that it could happen that the
server logs are flooded with the following logical decoding logs that
are written every 200 ms:

2024-04-04 16:15:19.270 JST [3838739] LOG:  starting logical decoding
for slot "test_sub"
2024-04-04 16:15:19.270 JST [3838739] DETAIL:  Streaming transactions
committing after 0/50006F48, reading WAL from 0/50006F10.
2024-04-04 16:15:19.270 JST [3838739] LOG:  logical decoding found
consistent point at 0/50006F10
2024-04-04 16:15:19.270 JST [3838739] DETAIL:  There are no running
transactions.
2024-04-04 16:15:19.477 JST [3838739] LOG:  starting logical decoding
for slot "test_sub"
2024-04-04 16:15:19.477 JST [3838739] DETAIL:  Streaming transactions
committing after 0/50006F48, reading WAL from 0/50006F10.
2024-04-04 16:15:19.477 JST [3838739] LOG:  logical decoding found
consistent point at 0/50006F10
2024-04-04 16:15:19.477 JST [3838739] DETAIL:  There are no running
transactions.

For example, I could reproduce it with the following steps:

1. create the primary and start.
2. run "pgbench -i -s 100" on the primary.
3. run pg_basebackup to create the standby.
4. configure slotsync setup on the standby and start.
5. create a publication for all tables on the primary.
6. create the subscriber and start.
7. run "pgbench -i -Idtpf" on the subscriber.
8. create a subscription on the subscriber (initial data copy will start).

The logical decoding logs were written every 200 ms during the initial
data synchronization.

Looking at the new changes for update_local_synced_slot():

if (remote_slot->confirmed_lsn != slot->data.confirmed_flush ||
remote_slot->restart_lsn != slot->data.restart_lsn ||
remote_slot->catalog_xmin != slot->data.catalog_xmin)
{
/*
 * We can't directly copy the remote slot's LSN or xmin unless there
 * exists a consistent snapshot at that point. Otherwise, after
 * promotion, the slots may not reach a consistent point before the
 * confirmed_flush_lsn which can lead to a data loss. To avoid data
 * loss, we let slot machinery advance the slot which ensures that
 * snapbuilder/slot statuses are updated properly.
 */
if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
{
/*
 * Update the slot info directly if there is a serialized snapshot
 * at the restart_lsn, as the slot can quickly reach consistency
 * at restart_lsn by restoring the snapshot.
 */
SpinLockAcquire(>mutex);
slot->data.restart_lsn = remote_slot->restart_lsn;
slot->data.confirmed_flush = remote_slot->confirmed_lsn;
slot->data.catalog_xmin = remote_slot->catalog_xmin;
slot->effective_catalog_xmin = remote_slot->catalog_xmin;
SpinLockRelease(>mutex);

if (found_consistent_snapshot)
*found_consistent_snapshot = true;
}
else
{
LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn,
found_consistent_snapshot);
}

ReplicationSlotsComputeRequiredXmin(false);
ReplicationSlotsComputeRequiredLSN();

slot_updated = true;

We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn,
restart_lsn, and catalog_xmin is different between the remote slot and
the local slot. In my test case, while the initial sync was being performed,
only catalog_xmin was different and there was no serialized snapshot
at restart_lsn, so the slotsync worker called
LogicalSlotAdvanceAndCheckSnapState(). However, no slot properties were
changed even after the function call, and it still set slot_updated = true.
So it starts the next slot synchronization after 200 ms.

It seems to me that we can skip calling
LogicalSlotAdvanceAndCheckSnapState() at least when the remote and
local have the same restart_lsn and confirmed_lsn.
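
For illustration, a rough (untested) sketch of such a skip inside
update_local_synced_slot(); whether the xmin may be taken over directly,
without the write-to-disk-first ordering, is a separate question:

/*
 * Sketch only: if the LSNs already match, advancing from restart_lsn to
 * confirmed_lsn would decode nothing, so take the remote catalog_xmin
 * directly instead of calling LogicalSlotAdvanceAndCheckSnapState().
 */
if (remote_slot->restart_lsn == slot->data.restart_lsn &&
	remote_slot->confirmed_lsn == slot->data.confirmed_flush)
{
	SpinLockAcquire(&slot->mutex);
	slot->data.catalog_xmin = remote_slot->catalog_xmin;
	slot->effective_catalog_xmin = remote_slot->catalog_xmin;
	SpinLockRelease(&slot->mutex);
}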

I'm not sure there are other scenarios but is it worth fixing this symptom?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-03 Thread Amit Kapila
On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila  wrote:
>
> On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
>  wrote:
>
> > I quickly looked at v8, and have a nit, rest all looks good.
> >
> > +if (DecodingContextReady(ctx) && found_consistent_snapshot)
> > +*found_consistent_snapshot = true;
> >
> > Can the found_consistent_snapshot be checked first to help avoid the
> > function call DecodingContextReady() for pg_replication_slot_advance
> > callers?
> >
>
> Okay, changed. Additionally, I have updated the comments and commit
> message. I'll push this patch after some more testing.
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-02 Thread Amit Kapila
On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
 wrote:
>
> On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila  wrote:
> >
> > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr
> > > moveto, bool *found_consistent_snapshot) to
> > > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool
> > > *found_consistent_snapshot) and use it. If others don't like this, I'd
> > > at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a
> > > static inline function.
> > >
> > Yeah, we can do that, but it is not a performance-sensitive routine, so
> > I don't know if it is worth it.
>
> Okay for what the patch has right now. No more bikeshedding from me on this.
>
> > > > The slotsync worker needs to advance the slots from different databases 
> > > > in
> > > > fast_forward. So, we need to skip this check in fast_forward mode. The 
> > > > analysis can
> > > > be found in [3].
> > > -if (slot->data.database != MyDatabaseId)
> > > +/*
> > > + * We need to access the system tables during decoding to build the
> > > + * logical changes unless we are in fast_forward mode where no 
> > > changes are
> > > + * generated.
> > > + */
> > > +if (slot->data.database != MyDatabaseId && !fast_forward)
> > >  ereport(ERROR,
> > >
> > > It's not clear from the comment that we need it for a slotsync worker
> > > to advance the slots from different databases. Can this be put into
> > > the comment? Also, specify in the comment, why this is safe?
> > >
> > It is not specific to slot sync worker but specific to fast_forward
> > mode. There is already a comment "We need to access the system tables
> > during decoding to build the logical changes unless we are in
> > fast_forward mode where no changes are generated." telling why it is
> > safe. The point is we need database access to access system tables
> > while generating the logical changes and in fast-forward mode, we
> > don't generate logical changes so this check is not required. Do let
> > me know if you have a different understanding or if my understanding is
> > incorrect.
>
> Understood. Thanks. Just curious, why isn't it a problem for the existing
> fast_forward mode callers pg_replication_slot_advance and
> LogicalReplicationSlotHasPendingWal?
>

We call those after connecting to the database, and the slot also
belongs to that database, whereas during synchronization of slots on the
standby, the slots could be from different databases.

> I quickly looked at v8, and have a nit, rest all looks good.
>
> +if (DecodingContextReady(ctx) && found_consistent_snapshot)
> +*found_consistent_snapshot = true;
>
> Can the found_consistent_snapshot be checked first to help avoid the
> function call DecodingContextReady() for pg_replication_slot_advance
> callers?
>

Okay, changed. Additionally, I have updated the comments and commit
message. I'll push this patch after some more testing.

-- 
With Regards,
Amit Kapila.


v9-0001-Ensure-that-the-sync-slots-reach-a-consistent-sta.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bharath Rupireddy
On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila  wrote:
>
> > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr
> > moveto, bool *found_consistent_snapshot) to
> > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool
> > *found_consistent_snapshot) and use it. If others don't like this, I'd
> > at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a
> > static inline function.
> >
> Yeah, we can do that, but it is not a performance-sensitive routine, so
> I don't know if it is worth it.

Okay for what the patch has right now. No more bikeshedding from me on this.

> > > The slotsync worker needs to advance the slots from different databases in
> > > fast_forward. So, we need to skip this check in fast_forward mode. The 
> > > analysis can
> > > be found in [3].
> > -if (slot->data.database != MyDatabaseId)
> > +/*
> > + * We need to access the system tables during decoding to build the
> > + * logical changes unless we are in fast_forward mode where no changes 
> > are
> > + * generated.
> > + */
> > +if (slot->data.database != MyDatabaseId && !fast_forward)
> >  ereport(ERROR,
> >
> > It's not clear from the comment that we need it for a slotsync worker
> > to advance the slots from different databases. Can this be put into
> > the comment? Also, specify in the comment, why this is safe?
> >
> It is not specific to slot sync worker but specific to fast_forward
> mode. There is already a comment "We need to access the system tables
> during decoding to build the logical changes unless we are in
> fast_forward mode where no changes are generated." telling why it is
> safe. The point is we need database access to access system tables
> while generating the logical changes and in fast-forward mode, we
> don't generate logical changes, so this check is not required. Do let
> me know if you have a different understanding or if my understanding is
> incorrect.

Understood. Thanks. Just curious, why isn't it a problem for the existing
fast_forward mode callers pg_replication_slot_advance and
LogicalReplicationSlotHasPendingWal?

I quickly looked at v8, and have a nit, rest all looks good.

+if (DecodingContextReady(ctx) && found_consistent_snapshot)
+*found_consistent_snapshot = true;

Can the found_consistent_snapshot be checked first to help avoid the
function call DecodingContextReady() for pg_replication_slot_advance
callers?
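
Concretely, the suggested reordering is just an operand swap that relies on
C's short-circuit evaluation, something like the sketch below (using the
names from the v8 patch), so the function call is skipped whenever the
caller passed NULL:

    if (found_consistent_snapshot && DecodingContextReady(ctx))
        *found_consistent_snapshot = true;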

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-02 Thread Amit Kapila
On Tue, Apr 2, 2024 at 7:42 PM Bharath Rupireddy
 wrote:
>
> On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > > 1. Can we just remove pg_logical_replication_slot_advance and use
> > > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the
> > > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to
> > > pg_logical_replication_slot_advance?
> > >
> > > + * Advance our logical replication slot forward. See
> > > + * LogicalSlotAdvanceAndCheckSnapState for details.
> > >   */
> > >  static XLogRecPtr
> > >  pg_logical_replication_slot_advance(XLogRecPtr moveto)
> > >  {
> >
> > It was commented[1] that it's not appropriate for the
> > pg_logical_replication_slot_advance to have an out parameter
> > 'ready_for_decoding' which looks a bit awkward as the functionality doesn't
> > match
> > the name, and is also not consistent with the style of
> > pg_physical_replication_slot_advance(). So, we added a new function.
>
> I disagree here. A new function just for a parameter is not that great
> IMHO.
>

It is not for the parameter but primarily for the functionality it
provides. The additional functionality of whether we reached a
consistent point while advancing the slot doesn't seem to suit the
current function. Also, we want to keep the signature similar to the
existing function pg_physical_replication_slot_advance().

>
> I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr
> moveto, bool *found_consistent_snapshot) to
> pg_logical_replication_slot_advance(XLogRecPtr moveto, bool
> *found_consistent_snapshot) and use it. If others don't like this, I'd
> at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a
> static inline function.
>

Yeah, we can do that but it is not a performance-sensitive routine so
don't know if it is worth it.

> > > 5. As far as the test case for this issue is concerned, I'm fine with
> > > adding one using an INJECTION point because we seem to be having no
> > > consistent way to control postgres writing current snapshot to WAL.
> >
> > Since my colleagues and I can reproduce the issue consistently after
> > applying 0002, and it could be rare for concurrent xl_running_xacts to
> > happen, we discussed[2] considering adding the INJECTION point after
> > pushing the main fix.
>
> Right.
>
> > > 7.
> > > +/*
> > > + * We need to access the system tables during decoding to build the
> > > + * logical changes unless we are in fast-forward mode where no 
> > > changes
> > > are
> > > + * generated.
> > > + */
> > > +if (slot->data.database != MyDatabaseId && !fast_forward)
> > >
> > > May I know if we need this change for this fix?
> >
> > The slotsync worker needs to advance the slots from different databases in
> > fast_forward. So, we need to skip this check in fast_forward mode. The 
> > analysis can
> > be found in [3].
> -if (slot->data.database != MyDatabaseId)
> +/*
> + * We need to access the system tables during decoding to build the
> + * logical changes unless we are in fast_forward mode where no changes 
> are
> + * generated.
> + */
> +if (slot->data.database != MyDatabaseId && !fast_forward)
>  ereport(ERROR,
>
> It's not clear from the comment that we need it for a slotsync worker
> to advance the slots from different databases. Can this be put into
> the comment? Also, specify in the comment, why this is safe?
>

It is not specific to slot sync worker but specific to fast_forward
mode. There is already a comment "We need to access the system tables
during decoding to build the logical changes unless we are in
fast_forward mode where no changes are generated." telling why it is
safe. The point is we need database access to access system tables
while generating the logical changes and in fast-forward mode, we
don't generate logical changes so this check is not required. Do let
me know if you have a different understanding or if my understanding is
incorrect.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bharath Rupireddy
On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu)
 wrote:
>
> > 1. Can we just remove pg_logical_replication_slot_advance and use
> > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the
> > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to
> > pg_logical_replication_slot_advance?
> >
> > + * Advance our logical replication slot forward. See
> > + * LogicalSlotAdvanceAndCheckSnapState for details.
> >   */
> >  static XLogRecPtr
> >  pg_logical_replication_slot_advance(XLogRecPtr moveto)
> >  {
>
> It was commented[1] that it's not appropriate for the
> pg_logical_replication_slot_advance to have an out parameter
> 'ready_for_decoding' which looks a bit awkward as the functionality doesn't
> match
> the name, and is also not consistent with the style of
> pg_physical_replication_slot_advance(). So, we added a new function.

I disagree here. A new function just for a parameter is not that great
IMHO. I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr
moveto, bool *found_consistent_snapshot) to
pg_logical_replication_slot_advance(XLogRecPtr moveto, bool
*found_consistent_snapshot) and use it. If others don't like this, I'd
at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a
static inline function.
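
To illustrate, the static inline variant could be a thin wrapper along
these lines (a sketch only; it assumes the extended function tolerates a
NULL out-parameter, which the "&& found_consistent_snapshot" guard in the
patch suggests it does):

    static inline XLogRecPtr
    pg_logical_replication_slot_advance(XLogRecPtr moveto)
    {
        return LogicalSlotAdvanceAndCheckSnapState(moveto, NULL);
    }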

> > 5. As far as the test case for this issue is concerned, I'm fine with
> > adding one using an INJECTION point because we seem to be having no
> > consistent way to control postgres writing current snapshot to WAL.
>
> Since my colleagues and I can reproduce the issue consistently after applying
> 0002, and it could be rare for concurrent xl_running_xacts to happen, we
> discussed[2] considering adding the INJECTION point after pushing the main fix.

Right.

> > 7.
> > +/*
> > + * We need to access the system tables during decoding to build the
> > + * logical changes unless we are in fast-forward mode where no changes
> > are
> > + * generated.
> > + */
> > +if (slot->data.database != MyDatabaseId && !fast_forward)
> >
> > May I know if we need this change for this fix?
>
> The slotsync worker needs to advance the slots from different databases in
> fast_forward. So, we need to skip this check in fast_forward mode. The 
> analysis can
> be found in [3].
-if (slot->data.database != MyDatabaseId)
+/*
+ * We need to access the system tables during decoding to build the
+ * logical changes unless we are in fast_forward mode where no changes are
+ * generated.
+ */
+if (slot->data.database != MyDatabaseId && !fast_forward)
 ereport(ERROR,

It's not clear from the comment that we need it for a slotsync worker
to advance the slots from different databases. Can this be put into
the comment? Also, specify in the comment, why this is safe?

Also, if this change is needed for only slotsync workers, why not
protect it with IsSyncingReplicationSlots()? Otherwise, it might
impact non-slotsync callers, no?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-04-02 Thread Zhijie Hou (Fujitsu)
On Tuesday, April 2, 2024 8:49 PM Bharath Rupireddy 
 wrote:
> 
> On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > CFbot[1] complained about one query result's order in the tap-test, so I am
> > attaching a V7 patch set which fixed this. There are no changes in 0001.
> >
> > [1] https://cirrus-ci.com/task/6375962162495488
> 
> Thanks. Here are some comments:

Thanks for the comments.

> 
> 1. Can we just remove pg_logical_replication_slot_advance and use
> LogicalSlotAdvanceAndCheckSnapState instead? If worried about the
> function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to
> pg_logical_replication_slot_advance?
> 
> + * Advance our logical replication slot forward. See
> + * LogicalSlotAdvanceAndCheckSnapState for details.
>   */
>  static XLogRecPtr
>  pg_logical_replication_slot_advance(XLogRecPtr moveto)
>  {

It was commented[1] that it's not appropriate for the
pg_logical_replication_slot_advance to have an out parameter
'ready_for_decoding' which looks a bit awkward as the functionality doesn't match
the name, and is also not consistent with the style of
pg_physical_replication_slot_advance(). So, we added a new function.

> 
> 2.
> +if (!ready_for_decoding)
> +{
> +elog(DEBUG1, "could not find consistent point for synced
> slot; restart_lsn = %X/%X",
> + LSN_FORMAT_ARGS(slot->data.restart_lsn));
> 
> Can we specify the slot name in the message?

Added.

> 
> 3. Also, can the "could not find consistent point for synced slot;
> restart_lsn = %X/%X" be emitted at LOG level just like other messages
> in update_and_persist_local_synced_slot. Although, I see "XXX should
> this be changed to elog(DEBUG1) perhaps?", these messages need to be
> at LOG level as they help debug issues if at all they are hit.

Changed to LOG and reworded the message.

> 
> 4. How about using found_consistent_snapshot instead of
> ready_for_decoding? A general understanding is that the synced slots
> are not allowed for decoding (although with this fix, we do that for
> internal purposes), ready_for_decoding looks a bit misleading.

Agreed and renamed.

> 
> 5. As far as the test case for this issue is concerned, I'm fine with
> adding one using an INJECTION point because we seem to be having no
> consistent way to control postgres writing current snapshot to WAL.

Since my colleagues and I can reproduce the issue consistently after applying
0002, and it could be rare for concurrent xl_running_xacts to happen, we
discussed[2] considering adding the INJECTION point after pushing the main fix.

> 
> 6. A nit: can we use "fast_forward mode" instead of "fast-forward
> mode" just to be consistent?
> + * logical changes unless we are in fast-forward mode where no changes
> are
> 
> 7.
> +/*
> + * We need to access the system tables during decoding to build the
> + * logical changes unless we are in fast-forward mode where no changes
> are
> + * generated.
> + */
> +if (slot->data.database != MyDatabaseId && !fast_forward)
> 
> May I know if we need this change for this fix?

The slotsync worker needs to advance the slots from different databases in
fast_forward. So, we need to skip this check in fast_forward mode. The analysis 
can
be found in [3].

Attach the V8 patch which addressed above comments.


[1] 
https://www.postgresql.org/message-id/CAA4eK1%2BwkaRi2BrLLC_0gKbHN68Awc9dRp811G3An6A6fuqdOg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/ZgvI9iAUWCZ17z5V%40ip-10-97-1-34.eu-west-3.compute.internal
[3] 
https://www.postgresql.org/message-id/CAJpy0uCQ2PDCAqcnbdOz6q_ZqmBfMyBpVqKDqL_XZBP%3DeK-1yw%40mail.gmail.com

Best Regards,
Hou zj


v8-0002-test-the-data-loss-case.patch
Description: v8-0002-test-the-data-loss-case.patch


v8-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description:  v8-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch


Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bharath Rupireddy
On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu)
 wrote:
>
> CFbot[1] complained about one query result's order in the tap-test, so I am
> attaching a V7 patch set which fixed this. There are no changes in 0001.
>
> [1] https://cirrus-ci.com/task/6375962162495488

Thanks. Here are some comments:

1. Can we just remove pg_logical_replication_slot_advance and use
LogicalSlotAdvanceAndCheckSnapState instead? If worried about the
function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to
pg_logical_replication_slot_advance?

+ * Advance our logical replication slot forward. See
+ * LogicalSlotAdvanceAndCheckSnapState for details.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
 {

2.
+if (!ready_for_decoding)
+{
+elog(DEBUG1, "could not find consistent point for synced
slot; restart_lsn = %X/%X",
+ LSN_FORMAT_ARGS(slot->data.restart_lsn));

Can we specify the slot name in the message?

3. Also, can the "could not find consistent point for synced slot;
restart_lsn = %X/%X" be emitted at LOG level just like other messages
in update_and_persist_local_synced_slot? Although I see "XXX should
this be changed to elog(DEBUG1) perhaps?", these messages need to be
at LOG level as they help debug issues if at all they are hit.

4. How about using found_consistent_snapshot instead of
ready_for_decoding? A general understanding is that the synced slots
are not allowed for decoding (although with this fix, we do that for
internal purposes), so ready_for_decoding looks a bit misleading.

5. As far as the test case for this issue is concerned, I'm fine with
adding one using an INJECTION point because we seem to be having no
consistent way to control postgres writing current snapshot to WAL.

6. A nit: can we use "fast_forward mode" instead of "fast-forward
mode" just to be consistent?
+ * logical changes unless we are in fast-forward mode where no changes are

7.
+/*
+ * We need to access the system tables during decoding to build the
+ * logical changes unless we are in fast-forward mode where no changes are
+ * generated.
+ */
+if (slot->data.database != MyDatabaseId && !fast_forward)

May I know if we need this change for this fix?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bertrand Drouvot
Hi,

On Tue, Apr 02, 2024 at 02:19:30PM +0530, Amit Kapila wrote:
> On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot
>  wrote:
> > What about adding a "wait" injection point in LogStandbySnapshot() to 
> > prevent
> > checkpointer/bgwriter to log a standby snapshot? Something among those 
> > lines:
> >
> >if (AmCheckpointerProcess() || AmBackgroundWriterProcess())
> >INJECTION_POINT("bgw-log-standby-snapshot");
> >
> > And make use of it in the test, something like:
> >
> >$node_primary->safe_psql('postgres',
> >"SELECT injection_points_attach('bgw-log-standby-snapshot', 
> > 'wait');");
> >
> 
> Sometimes we want the checkpoint to log the standby snapshot as we
> need it at a predictable time, maybe one can use
> pg_log_standby_snapshot() instead of that. Can we add an injection
> point as a separate patch/commit after a bit more discussion?

Sure, let's come back to this injection point discussion after the feature
freeze. BTW, I think it could also be useful to make use of injection point for
the test that has been added in 7f13ac8123.

I'll open a new thread for this at that time.

>. One other
> idea to make such tests predictable is to add a developer-specific GUC
> say debug_bg_log_standby_snapshot or something like that but injection
> point sounds like a better idea.

Agree.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-02 Thread Amit Kapila
On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot
 wrote:
>
> On Tue, Apr 02, 2024 at 07:20:46AM +, Zhijie Hou (Fujitsu) wrote:
> > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which 
> > can
> > reproduce the data loss issue consistently on my machine.
>
> Thanks!
>
> > It may not reproduce
> > in some rare cases if concurrent xl_running_xacts are written by bgwriter, 
> > but
> > I think it's still valuable if it can verify the fix in most cases.
>
> What about adding a "wait" injection point in LogStandbySnapshot() to prevent
> checkpointer/bgwriter to log a standby snapshot? Something among those lines:
>
>if (AmCheckpointerProcess() || AmBackgroundWriterProcess())
>INJECTION_POINT("bgw-log-standby-snapshot");
>
> And make use of it in the test, something like:
>
>$node_primary->safe_psql('postgres',
>"SELECT injection_points_attach('bgw-log-standby-snapshot', 
> 'wait');");
>

Sometimes we want the checkpoint to log the standby snapshot as we
need it at a predictable time; maybe one can use
pg_log_standby_snapshot() instead of that. Can we add an injection
point as a separate patch/commit after a bit more discussion? I want
to discuss this in a separate thread so that later we should not get
an objection to adding an injection_point at this location. One other
idea to make such tests predictable is to add a developer-specific GUC
say debug_bg_log_standby_snapshot or something like that but injection
point sounds like a better idea.

-- 
With Regards,
Amit Kapila.




RE: Synchronizing slots from primary to standby

2024-04-02 Thread Zhijie Hou (Fujitsu)
On Tuesday, April 2, 2024 3:21 PM Zhijie Hou (Fujitsu)  
wrote:
> On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Monday, April 1, 2024 7:30 PM Amit Kapila 
> > wrote:
> > >
> > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu)
> > > 
> > > wrote:
> > > >
> > > > On Friday, March 29, 2024 2:50 PM Amit Kapila
> > > > 
> > > wrote:
> > > > >
> > > >
> > > > >
> > > > >
> > > > > 2.
> > > > > +extern XLogRecPtr
> > > > > +pg_logical_replication_slot_advance(XLogRecPtr
> > > moveto,
> > > > > +   bool *found_consistent_point);
> > > > > +
> > > > >
> > > > > This API looks a bit awkward as the functionality doesn't match
> > > > > the name. How about having a function with name
> > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> > > > > ready_for_decoding) with the same functionality as your patch
> > > > > has for
> > > > > pg_logical_replication_slot_advance() and then invoke it both
> > > > > from pg_logical_replication_slot_advance and slotsync.c. The
> > > > > function name is too big, we can think of a shorter name. Any ideas?
> > > >
> > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just
> > > > LogicalSlotAdvanceAndCheckDecoding()?
> > > >
> > >
> > > It is about snapbuild state, so how about naming the function as
> > > LogicalSlotAdvanceAndCheckSnapState()?
> >
> > It looks better to me, so changed.
> >
> > >
> > > I have made quite a few cosmetic changes in comments and code. See
> > > attached. This is atop your latest patch. Can you please review and
> > > include these changes in the next version?
> >
> > Thanks, I have reviewed and merged them.
> > Attach the V5 patch set which addressed above comments and ran pgindent.
> 
> I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which 
> can
> reproduce the data loss issue consistently on my machine. It may not
> reproduce in some rare cases if concurrent xl_running_xacts are written by
> bgwriter, but I think it's still valuable if it can verify the fix in most 
> cases. The test
> will fail if directly applied on HEAD, and will pass after applying atop of 
> 0001.

CFbot[1] complained about one query result's order in the tap-test, so I am
attaching a V7 patch set which fixed this. There are no changes in 0001.

[1] https://cirrus-ci.com/task/6375962162495488

Best Regards,
Hou zj



v7-0002-test-the-data-loss-case.patch
Description: v7-0002-test-the-data-loss-case.patch


v7-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description:  v7-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch


Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bertrand Drouvot
Hi,

On Tue, Apr 02, 2024 at 07:20:46AM +, Zhijie Hou (Fujitsu) wrote:
> I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which 
> can
> reproduce the data loss issue consistently on my machine.

Thanks!

> It may not reproduce
> in some rare cases if concurrent xl_running_xacts are written by bgwriter, but
> I think it's still valuable if it can verify the fix in most cases.

What about adding a "wait" injection point in LogStandbySnapshot() to prevent
checkpointer/bgwriter to log a standby snapshot? Something among those lines:

   if (AmCheckpointerProcess() || AmBackgroundWriterProcess())
   INJECTION_POINT("bgw-log-standby-snapshot");

And make use of it in the test, something like:

   $node_primary->safe_psql('postgres',
   "SELECT injection_points_attach('bgw-log-standby-snapshot', 
'wait');");

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-04-02 Thread Zhijie Hou (Fujitsu)
On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu)  
wrote:
> 
> On Monday, April 1, 2024 7:30 PM Amit Kapila 
> wrote:
> >
> > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu)
> > 
> > wrote:
> > >
> > > On Friday, March 29, 2024 2:50 PM Amit Kapila
> > > 
> > wrote:
> > > >
> > >
> > > >
> > > >
> > > > 2.
> > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr
> > moveto,
> > > > +   bool *found_consistent_point);
> > > > +
> > > >
> > > > This API looks a bit awkward as the functionality doesn't match
> > > > the name. How about having a function with name
> > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> > > > ready_for_decoding) with the same functionality as your patch has
> > > > for
> > > > pg_logical_replication_slot_advance() and then invoke it both from
> > > > pg_logical_replication_slot_advance and slotsync.c. The function
> > > > name is too big, we can think of a shorter name. Any ideas?
> > >
> > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just
> > > LogicalSlotAdvanceAndCheckDecoding()?
> > >
> >
> > It is about snapbuild state, so how about naming the function as
> > LogicalSlotAdvanceAndCheckSnapState()?
> 
> It looks better to me, so changed.
> 
> >
> > I have made quite a few cosmetic changes in comments and code. See
> > attached. This is atop your latest patch. Can you please review and
> > include these changes in the next version?
> 
> Thanks, I have reviewed and merged them.
> Attach the V5 patch set which addressed above comments and ran pgindent.

I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can
reproduce the data loss issue consistently on my machine. It may not reproduce
in some rare cases if concurrent xl_running_xacts are written by bgwriter, but
I think it's still valuable if it can verify the fix in most cases. The test
will fail if directly applied on HEAD, and will pass after applying atop of
0001.
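
For reference, the essence of the check the new test performs is roughly
the following (a sketch with hypothetical slot and node names; see the
attached 0002 for the real test):

   # 'test_slot' is assumed to be a failover-enabled test_decoding slot
   # that was synced to the standby before the promotion.
   $standby1->promote;
   my $result = $standby1->safe_psql('postgres',
   "SELECT count(*) > 0 FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);");
   is($result, 't', 'synced slot is usable and sees pre-promotion changes');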

Best Regards,
Hou zj


v6-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description:  v6-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch


v6-0002-test-the-data-loss-case.patch
Description: v6-0002-test-the-data-loss-case.patch


Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bertrand Drouvot
Hi,

On Tue, Apr 02, 2024 at 04:24:49AM +, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 1, 2024 9:28 PM Bertrand Drouvot 
>  wrote:
> > 
> > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote:
> > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot
> > 
> > >
> > > > 2 ===
> > > >
> > > > +   {
> > > > +   if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
> > > > +   {
> > > >
> > > > That could call SnapBuildSnapshotExists() multiple times for the
> > > > same "restart_lsn" (for example in case of multiple remote slots to 
> > > > sync).
> > > >
> > > > What if the sync worker records the last lsn it asks for
> > > > serialization (and serialized ? Then we could check that value first
> > > > before deciding to call (or not)
> > > > SnapBuildSnapshotExists() on it?
> > > >
> > > > It's not ideal because it would record "only the last one" but that
> > > > would be simple enough for now (currently there is only one sync
> > > > worker so that scenario is likely to happen).
> > > >
> > >
> > > Yeah, we could do that but I am not sure how much it can help. I guess
> > > we could do some tests to see if it helps.
> > 
> > Yeah not sure either. I just think it can only help and shouldn't make 
> > things
> > worse (but could avoid extra SnapBuildSnapshotExists() calls).
> 
> Thanks for the idea. I tried some tests based on Nisha's setup[1].

Thank you and Nisha and Shveta for the testing!

> I tried to
> advance the slots on the primary to the same restart_lsn before calling
> sync_replication_slots(), and reduced the data generated by pgbench.

Agree that this scenario makes sense to try to see the impact of
SnapBuildSnapshotExists().

> The SnapBuildSnapshotExists is still not noticeable in the profile.

The number of SnapBuildSnapshotExists() calls is probably negligible when
compared to the IO calls generated by the fast-forward logical decoding in
this scenario.

> So, I feel we
> could leave this as a further improvement once we encounter scenarios where
> the duplicate SnapBuildSnapshotExists call becomes noticeable.

Sounds reasonable to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-01 Thread shveta malik
On Mon, Apr 1, 2024 at 5:05 PM Amit Kapila  wrote:
>
> > 2 ===
> >
> > +   {
> > +   if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
> > +   {
> >
> > That could call SnapBuildSnapshotExists() multiple times for the same
> > "restart_lsn" (for example in case of multiple remote slots to sync).
> >
> > What if the sync worker records the last lsn it asks for serialization (and
> > serialized ? Then we could check that value first before deciding to call 
> > (or not)
> > SnapBuildSnapshotExists() on it?
> >
> > It's not ideal because it would record "only the last one" but that would be
> > simple enough for now (currently there is only one sync worker so that 
> > scenario
> > is likely to happen).
> >
>
> Yeah, we could do that but I am not sure how much it can help. I guess
> we could do some tests to see if it helps.

I had a look at the test results produced by Nisha and did not find any
repetitive restart_lsn from the primary being synced to the standby for
that particular 100-slot test. Unless we have some concrete test in mind
(one having repetitive restart_lsn), I do not think that with the given
tests we can establish the benefit of the suggested optimization.
Attached are the log files of the all-slots test for reference.

thanks
Shveta
 slot_name | database |  xmin   | cxmin | restart_lsn |  flushlsn  | wal_status | conf | failover | synced | temp 
-----------+----------+---------+-------+-------------+------------+------------+------+----------+--------+------
 slot_1    | newdb1   |         |   772 | 0/AD0       | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_10   | newdb2   |         |   772 | 0/A0002C8   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_100  | newdb20  |         |   772 | 0/A005F90   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_11   | newdb3   |         |   772 | 0/A000300   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_12   | newdb3   |         |   772 | 0/A000338   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_13   | newdb3   |         |   772 | 0/A000370   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_14   | newdb3   |         |   772 | 0/A0003A8   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_15   | newdb3   |         |   772 | 0/A0003E0   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_16   | newdb4   |         |   772 | 0/A000418   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_17   | newdb4   |         |   772 | 0/A000450   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_18   | newdb4   |         |   772 | 0/A000488   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_19   | newdb4   |         |   772 | 0/A0004C0   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_2    | newdb1   |         |   772 | 0/A000108   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_20   | newdb4   |         |   772 | 0/A0004F8   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_21   | newdb5   |         |   772 | 0/A000530   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_22   | newdb5   |         |   772 | 0/A000568   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_23   | newdb5   |         |   772 | 0/A0005A0   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_24   | newdb5   |         |   772 | 0/A0005D8   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_25   | newdb5   |         |   772 | 0/A000610   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_26   | newdb6   |         |   772 | 0/A000648   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_27   | newdb6   |         |   772 | 0/A000680   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_28   | newdb6   |         |   772 | 0/A0006B8   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_29   | newdb6   |         |   772 | 0/A0006F0   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_3    | newdb1   |         |   772 | 0/A000140   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_30   | newdb6   |         |   772 | 0/A000728   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_31   | newdb7   |         |   772 | 0/A000760   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_32   | newdb7   |         |   772 | 0/A000798   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_33   | newdb7   |         |   772 | 0/A0007D0   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_34   | newdb7   |         |   772 | 0/A000808   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_35   | newdb7   |         |   772 | 0/A000840   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_36   | newdb8   |         |   772 | 0/A000878   | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_37   | 

RE: Synchronizing slots from primary to standby

2024-04-01 Thread Zhijie Hou (Fujitsu)
On Tuesday, April 2, 2024 8:43 AM Bharath Rupireddy 
 wrote:
> 
> On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > Attach the V4 patch which includes the optimization to skip the
> > decoding if the snapshot at the syncing restart_lsn is already
> > serialized. It can avoid most of the duplicate decoding in my test, and I am
> doing some more tests locally.
> 
> Thanks for the patch. I'm thinking if we can reduce the amount of work that we
> do for synced slots in each sync worker cycle. With that context in mind, why 
> do
> we need to create decoding context every time?
> Can't we create it once, store it in an in-memory structure and use it for 
> each
> sync worker cycle? Is there any problem with it? What do you think?

Thanks for the idea. I think the cost of creating the decoding context is
relatively minor when compared to the IO cost. After generating profiles
for the tests shared by Nisha[1], it appears that StartupDecodingContext is
not an issue. While the suggested refactoring is an option, I think we can
consider this a future improvement and address it only if we encounter
scenarios where the creation of the decoding context becomes a bottleneck.

[1] 
https://www.postgresql.org/message-id/CALj2ACUeij5tFzJ1-cuoUh%2Bmhj33v%2BYgqD_gHYUpRdXSCSBbhw%40mail.gmail.com

Best Regards,
Hou zj


Re: Synchronizing slots from primary to standby

2024-04-01 Thread Amit Kapila
On Mon, Apr 1, 2024 at 6:58 PM Bertrand Drouvot
 wrote:
>
> On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote:
> > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot
> >  wrote:
> > > Then there is no need to call WaitForStandbyConfirmation() as it could go 
> > > until
> > > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we 
> > > already
> > > know it).
> > >
> >
> > Won't it will normally return from the first check in
> > WaitForStandbyConfirmation() because standby_slot_names_config is not
> > set on standby?
>
> I think standby_slot_names can be set on a standby. One could want to set it 
> in
> a cascading standby env (though it won't have any real effects until the 
> standby
> is promoted).
>

Yeah, it is possible but doesn't seem worth additional checks for this
micro-optimization.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bharath Rupireddy
On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Attach the V4 patch which includes the optimization to skip the decoding if
> the snapshot at the syncing restart_lsn is already serialized. It can avoid 
> most
> of the duplicate decoding in my test, and I am doing some more tests locally.

Thanks for the patch. I'm thinking if we can reduce the amount of work
that we do for synced slots in each sync worker cycle. With that
context in mind, why do we need to create decoding context every time?
Can't we create it once, store it in an in-memory structure and use it
for each sync worker cycle? Is there any problem with it? What do you
think?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-04-01 Thread Zhijie Hou (Fujitsu)
On Monday, April 1, 2024 7:30 PM Amit Kapila  wrote:
> 
> On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > On Friday, March 29, 2024 2:50 PM Amit Kapila 
> wrote:
> > >
> >
> > >
> > >
> > > 2.
> > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr
> moveto,
> > > +   bool *found_consistent_point);
> > > +
> > >
> > > This API looks a bit awkward as the functionality doesn't match the
> > > name. How about having a function with name
> > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> > > ready_for_decoding) with the same functionality as your patch has
> > > for
> > > pg_logical_replication_slot_advance() and then invoke it both from
> > > pg_logical_replication_slot_advance and slotsync.c. The function
> > > name is too big, we can think of a shorter name. Any ideas?
> >
> > How about LogicalSlotAdvanceAndCheckDecodingState() Or just
> > LogicalSlotAdvanceAndCheckDecoding()?
> >
> 
> It is about snapbuild state, so how about naming the function as
> LogicalSlotAdvanceAndCheckSnapState()?

It looks better to me, so changed.

> 
> I have made quite a few cosmetic changes in comments and code. See
> attached. This is atop your latest patch. Can you please review and include
> these changes in the next version?

Thanks, I have reviewed and merged them.
Attach the V5 patch set which addressed above comments and ran pgindent.

I will think and test the improvement suggested by Bertrand[1] and reply after 
that.

[1] 
https://www.postgresql.org/message-id/Zgp8n9QD5nYSESnM%40ip-10-97-1-34.eu-west-3.compute.internal

Best Regards,
Hou zj


v5-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description:  v5-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch


Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bertrand Drouvot
Hi,

On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote:
> On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot
>  wrote:
> > Then there is no need to call WaitForStandbyConfirmation() as it could go 
> > until
> > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we 
> > already
> > know it).
> >
> 
> Won't it will normally return from the first check in
> WaitForStandbyConfirmation() because standby_slot_names_config is not
> set on standby?

I think standby_slot_names can be set on a standby. One could want to set it in
a cascading standby env (though it won't have any real effects until the standby
is promoted). 

> 
> > 2 ===
> >
> > +   {
> > +   if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
> > +   {
> >
> > That could call SnapBuildSnapshotExists() multiple times for the same
> > "restart_lsn" (for example in case of multiple remote slots to sync).
> >
> > What if the sync worker records the last lsn it asks for serialization (and
> > serialized ? Then we could check that value first before deciding to call 
> > (or not)
> > SnapBuildSnapshotExists() on it?
> >
> > It's not ideal because it would record "only the last one" but that would be
> > simple enough for now (currently there is only one sync worker so that 
> > scenario
> > is likely to happen).
> >
> 
> Yeah, we could do that but I am not sure how much it can help. I guess
> we could do some tests to see if it helps.

Yeah not sure either. I just think it can only help and shouldn't make things
worse (but could avoid extra SnapBuildSnapshotExists() calls).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bharath Rupireddy
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila  wrote:
>
> After this step and before the next, did you ensure that the slot sync
> has synced the latest confirmed_flush/restart LSNs? You can query:
> "select slot_name,restart_lsn, confirmed_flush_lsn from
> pg_replication_slots;" to ensure the same on both the primary and
> standby.

Yes, after ensuring the slot is synced on the standby, the problem is
reproduced for me and the proposed patch fixes it (i.e. able to see
the changes even after the promotion). I'm just thinking if we can add
a TAP test for this issue, but one key aspect of this reproducer is to
prevent anything from writing a RUNNING_XACTS WAL record on the primary
before the standby promotion. Setting bgwriter_delay to its maximum
isn't helping me. I think we can consider using an injection point to
add a delay in LogStandbySnapshot() so that this problem is reproduced
consistently in a TAP test. Perhaps we can think of adding this later,
after the fix is shipped.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-01 Thread Amit Kapila
On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Mon, Apr 01, 2024 at 06:05:34AM +, Zhijie Hou (Fujitsu) wrote:
> > On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) 
> >  wrote:
> > Attach the V4 patch which includes the optimization to skip the decoding if
> > the snapshot at the syncing restart_lsn is already serialized. It can avoid 
> > most
> > of the duplicate decoding in my test, and I am doing some more tests 
> > locally.
> >
>
> Thanks!
>
> 1 ===
>
> Same comment as in [1].
>
> In LogicalSlotAdvanceAndCheckReadynessForDecoding(), if we are synchronizing 
> slots
> then I think that we can skip:
>
> +   /*
> +* Wait for specified streaming replication standby servers 
> (if any)
> +* to confirm receipt of WAL up to moveto lsn.
> +*/
> +   WaitForStandbyConfirmation(moveto);
>
> Indeed if we are dealing with synced slot then we know we're in 
> RecoveryInProgress().
>
> Then there is no need to call WaitForStandbyConfirmation() as it could go 
> until
> the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we 
> already
> know it).
>

Won't it will normally return from the first check in
WaitForStandbyConfirmation() because standby_slot_names_config is not
set on standby?

> 2 ===
>
> +   {
> +   if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
> +   {
>
> That could call SnapBuildSnapshotExists() multiple times for the same
> "restart_lsn" (for example in case of multiple remote slots to sync).
>
> What if the sync worker records the last lsn it asks for serialization (and
> serialized ? Then we could check that value first before deciding to call (or 
> not)
> SnapBuildSnapshotExists() on it?
>
> It's not ideal because it would record "only the last one" but that would be
> simple enough for now (currently there is only one sync worker so that 
> scenario
> is likely to happen).
>

Yeah, we could do that but I am not sure how much it can help. I guess
we could do some tests to see if it helps.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-04-01 Thread Amit Kapila
On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Friday, March 29, 2024 2:50 PM Amit Kapila  wrote:
> >
>
> >
> >
> > 2.
> > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto,
> > +   bool *found_consistent_point);
> > +
> >
> > This API looks a bit awkward as the functionality doesn't match the name. 
> > How
> > about having a function with name
> > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> > ready_for_decoding) with the same functionality as your patch has for
> > pg_logical_replication_slot_advance() and then invoke it both from
> > pg_logical_replication_slot_advance and slotsync.c. The function name is too
> > big, we can think of a shorter name. Any ideas?
>
> How about LogicalSlotAdvanceAndCheckDecodingState() Or just
> LogicalSlotAdvanceAndCheckDecoding()?
>

It is about snapbuild state, so how about naming the function as
LogicalSlotAdvanceAndCheckSnapState()?

I have made quite a few cosmetic changes in comments and code. See
attached. This is atop your latest patch. Can you please review and
include these changes in the next version?

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index bbc7cdaf50..7d5884789c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -518,10 +518,9 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 				 errmsg("cannot use physical replication slot for logical decoding")));
 
 	/*
-	 * Do not allow decoding if the replication slot belongs to a different
-	 * database unless we are in fast-forward mode. In fast-forward mode, we
-	 * ignore storage-level changes and do not need to access the database
-	 * object.
+	 * We need to access the system tables during decoding to build the logical
+	 * changes unless we are in fast-forward mode where no changes are
+	 * generated.
 	 */
 	if (slot->data.database != MyDatabaseId && !fast_forward)
 		ereport(ERROR,
@@ -530,9 +529,9 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 				NameStr(slot->data.name))));
 
 	/*
-	 * Do not allow consumption of a "synchronized" slot until the standby gets
-	 * promoted unless we are syncing replication slots, in which case we need
-	 * to advance the LSN and xmin of the slot during decoding.
+	 * The slots being synced from the primary can't be used for decoding as
+	 * they are used after failover. However, we do allow advancing the LSNs
+	 * during the synchronization of slots. See update_local_synced_slot.
 	 */
 	if (RecoveryInProgress() && slot->data.synced && !IsSyncingReplicationSlots())
 		ereport(ERROR,
@@ -2054,8 +2053,8 @@ LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
  * WAL and removal of old catalog tuples.  As decoding is done in fast_forward
  * mode, no changes are generated anyway.
  *
- * *ready_for_decoding will be set to true if the logical decoding reaches
- * the consistent point; Otherwise, it will be set to false.
+ * *ready_for_decoding will be true if the initial decoding snapshot has
+ * been built; Otherwise, it will be false.
  */
 XLogRecPtr
 LogicalSlotAdvanceAndCheckReadynessForDecoding(XLogRecPtr moveto,
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 886a9fcc7e..9f6b83d486 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -26,10 +26,13 @@
  * pg_sync_replication_slots() periodically to perform the syncs.
  *
  * If synchronized slots fail to build a consistent snapshot from the
- * restart_lsn, they would become unreliable after promotion due to potential
- * data loss from changes before reaching a consistent point. So, we mark such
- * slots as RS_TEMPORARY. Once they successfully reach the consistent point,
- * they will be marked to RS_PERSISTENT.
+ * restart_lsn before reaching confirmed_flush_lsn, they would become
+ * unreliable after promotion due to potential data loss from changes
+ * before reaching a consistent point. This can happen because the slots can
+ * be synced at some random time and we may not reach the consistent point
+ * at the same WAL location as the primary. So, we mark such slots as
+ * RS_TEMPORARY. Once the decoding from corresponding LSNs can reach a
+ * consistent point, they will be marked as RS_PERSISTENT.
  *
  * The slot sync worker waits for some time before the next synchronization,
  * with the duration varying based on whether any slots were updated during
@@ -155,9 +158,9 @@ static void slotsync_failure_callback(int code, Datum arg);
  * If no update was needed (the data of the remote slot is the same as the
  * local slot) return false, otherwise true.
  *
- * If the LSN of the slot is 

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Nisha Moond
I did a performance test on the optimization patch
(v2-0001-optimize-the-slot-advancement.patch). Please find the results
below:

Setup:
- One primary node with 100 failover-enabled logical slots
- 20 DBs, each having 5 failover-enabled logical replication slots
- One physical standby node with 'sync_replication_slots' as off but
other parameters required by slot-sync as enabled.

Node Configurations: please see config.txt

Test Plan:
1) Create 20 databases on the Primary node, each with 5 failover slots
using "pg_create_logical_replication_slot()" (a sketch is shown after
this list). Overall, 100 failover slots.
2) Use pg_sync_replication_slot() to sync them to the standby. Note
the execution time of sync and lsns values.
3) On Primary node, run pgbench for 15 mins on postgres db
4) Advance lsns of all the 100 slots on primary using
pg_replication_slot_advance().
5) Use pg_sync_replication_slot() to sync slots to the standby. Note
the execution time of sync and lsns values.
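
For reference, step 1 can be scripted roughly as below (a sketch only; the
generate_series bounds would be adjusted per database so the 100 slot names
stay unique, and it uses the same failover flag as the reproducer earlier
in this thread):

    SELECT pg_create_logical_replication_slot(('slot_' || g)::name,
                                              'test_decoding',
                                              false, false, true)
    FROM generate_series(1, 5) AS g;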

I executed the above test plan for three cases and compared the elapsed
time of pg_replication_slot_advance():

(1) HEAD
Time taken by pg_sync_replication_slot() on Standby node -
  a) The initial sync (step 2) = 140.208 ms
  b) Sync after pgbench run on primary (step 5) = 66.994 ms

(2) HEAD + v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
  a) The initial sync (step 2) = 163.885 ms
  b) Sync after pgbench run on primary (step 5) = 837901.290 ms (13:57.901)

  >> With the v3 patch, pg_sync_replication_slot() takes a significant
amount of time to sync the slots.

(3) HEAD + v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
+ v2-0001-optimize-the-slot-advancement.patch
  a) The initial sync (step 2) = 165.554 ms
  b) Sync after pgbench run on primary (step 5) = 7991.718 ms (00:07.992)

  >> With the optimization patch, the time taken by
pg_sync_replication_slot() is reduced significantly to ~7 seconds.

We did the same test with a single DB too, by creating all 100 failover
slots in the postgres DB, and the results were very similar.

Attached are the scripts used for the test -
"v3_perf_test_scripts.tar.gz" includes these files:
setup_multidb.sh : setup primary and standby nodes
createdb20.sql : create 20 DBs
createslot20.sql : create total 100 logical slots, 5 on each DB
run_sync.sql : call pg_replication_slot_advance() with timing
advance20.sql : advance lsn of all slots on Primary node to current lsn
advance20_perdb.sql : use on HEAD to advance lsn on Primary node
get_synced_data.sql : get details of the
config.txt : configuration used for nodes


v3_perf_test_scripts.tar.gz
Description: GNU Zip compressed data


Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bertrand Drouvot
Hi,

On Mon, Apr 01, 2024 at 06:05:34AM +, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) 
>  wrote:
> Attach the V4 patch which includes the optimization to skip the decoding if
> the snapshot at the syncing restart_lsn is already serialized. It can avoid 
> most
> of the duplicate decoding in my test, and I am doing some more tests locally.
> 

Thanks!

1 ===

Same comment as in [1].

In LogicalSlotAdvanceAndCheckReadynessForDecoding(), if we are synchronizing 
slots
then I think that we can skip:

+   /*
+* Wait for specified streaming replication standby servers (if 
any)
+* to confirm receipt of WAL up to moveto lsn.
+*/
+   WaitForStandbyConfirmation(moveto);

Indeed if we are dealing with synced slot then we know we're in 
RecoveryInProgress().

Then there is no need to call WaitForStandbyConfirmation() as it could go until
the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we 
already
know it).

2 ===

+   {
+   if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
+   {

That could call SnapBuildSnapshotExists() multiple times for the same
"restart_lsn" (for example in case of multiple remote slots to sync).

What if the sync worker records the last lsn it asks for serialization (and
serialized ? Then we could check that value first before deciding to call (or 
not)
SnapBuildSnapshotExists() on it?

It's not ideal because it would record "only the last one" but that would be
simple enough for now (currently there is only one sync worker so that scenario
is likely to happen).
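
A sketch of that idea (the variable name is invented), which would skip the
on-disk probe whenever consecutive remote slots share the same restart_lsn:

    static XLogRecPtr last_serialized_lsn = InvalidXLogRecPtr;
    bool        snapshot_exists;

    snapshot_exists = (remote_slot->restart_lsn == last_serialized_lsn) ||
        SnapBuildSnapshotExists(remote_slot->restart_lsn);
    if (snapshot_exists)
        last_serialized_lsn = remote_slot->restart_lsn;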

Maybe an idea for future improvement (not for now) could be that
SnapBuildSerialize() maintains a "small list" of "already serialized" snapshots.

[1]: 
https://www.postgresql.org/message-id/ZgayTFIhLfzhpHci%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-04-01 Thread shveta malik
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila  wrote:
>
> On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy
>  wrote:
> >
> > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > [2] The steps to reproduce the data miss issue on a primary->standby 
> > > setup:
> >
> > I'm trying to reproduce the problem with [1], but I can see the
> > changes after the standby is promoted. Am I missing anything here?
> >
> > ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select *
> > from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);"
> > lsn| xid |data
> > ---+-+-
> >  0/3B0 | 738 | BEGIN 738
> >  0/3017FC8 | 738 | COMMIT 738
> >  0/3017FF8 | 739 | BEGIN 739
> >  0/3019A38 | 739 | COMMIT 739
> >  0/3019A38 | 740 | BEGIN 740
> >  0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999
> >  0/3019AA8 | 740 | COMMIT 740
> > (7 rows)
> >
> > [1]
> > -#define LOG_SNAPSHOT_INTERVAL_MS 15000
> > +#define LOG_SNAPSHOT_INTERVAL_MS 150
> >
> > ./initdb -D db17
> > echo "archive_mode = on
> > archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f'
> > wal_level='logical'
> > autovacuum = off
> > checkpoint_timeout='1h'" | tee -a db17/postgresql.conf
> >
> > ./pg_ctl -D db17 -l logfile17 start
> >
> > rm -rf sbdata logfilesbdata
> > ./pg_basebackup -D sbdata
> >
> > ./psql -d postgres -p 5432 -c "SELECT
> > pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding',
> > false, false, true);"
> > ./psql -d postgres -p 5432 -c "SELECT
> > pg_create_physical_replication_slot('phy_repl_slot', true, false);"
> >
> > echo "port=5433
> > primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu'
> > primary_slot_name='phy_repl_slot'
> > restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p'
> > hot_standby_feedback=on
> > sync_replication_slots=on" | tee -a sbdata/postgresql.conf
> >
> > touch sbdata/standby.signal
> >
> > ./pg_ctl -D sbdata -l logfilesbdata start
> > ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"
> >
> > ./psql -d postgres
> >
> > SESSION1, TXN1
> > BEGIN;
> > create table dummy1(a int);
> >
> > SESSION2, TXN2
> > SELECT pg_log_standby_snapshot();
> >
> > SESSION1, TXN1
> > COMMIT;
> >
> > SESSION1, TXN1
> > BEGIN;
> > create table dummy2(a int);
> >
> > SESSION2, TXN2
> > SELECT pg_log_standby_snapshot();
> >
> > SESSION1, TXN1
> > COMMIT;
> >
> > ./psql -d postgres -p 5432 -c "SELECT
> > pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());"
> >
>
> After this step and before the next, did you ensure that the slot sync
> has synced the latest confirmed_flush/restart LSNs? You can query:
> "select slot_name,restart_lsn, confirmed_flush_lsn from
> pg_replication_slots;" to ensure the same on both the primary and
> standby.

+1. To ensure the last sync, one can run this manually on the standby
just before promotion:
SELECT pg_sync_replication_slots();

thanks

Shveta




RE: Synchronizing slots from primary to standby

2024-04-01 Thread Zhijie Hou (Fujitsu)
On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu)  
wrote:
> 
> On Friday, March 29, 2024 2:50 PM Amit Kapila 
> wrote:
> >
> > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu)
> > 
> > wrote:
> > >
> > >
> > > Attach a new version patch which fixed an un-initialized variable
> > > issue and added some comments.
> > >
> >
> > The other approach to fix this issue could be that the slotsync worker
> > get the serialized snapshot using pg_read_binary_file() corresponding
> > to restart_lsn and writes those at standby. But there are cases when
> > we won't have such a file like (a) when we initially create the slot
> > and reach the consistent_point, or (b) also by the time the slotsync
> > worker starts to read the remote snapshot file, the snapshot file
> > could have been removed by the checkpointer on the primary (if the
> > restart_lsn of the remote has been advanced in this window). So, in
> > such cases, we anyway need to advance the slot. I think these could be
> optimizations that we could do in the future.
> >
> > Few comments:
> 
> Thanks for the comments.
> 
> > =
> > 1.
> > - if (slot->data.database != MyDatabaseId)
> > + if (slot->data.database != MyDatabaseId && !fast_forward)
> >   ereport(ERROR,
> >   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >   errmsg("replication slot \"%s\" was not created in this database",
> > @@ -526,7
> > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn,
> >   * Do not allow consumption of a "synchronized" slot until the standby
> >   * gets promoted.
> >   */
> > - if (RecoveryInProgress() && slot->data.synced)
> > + if (RecoveryInProgress() && slot->data.synced &&
> > + !IsSyncingReplicationSlots())
> >
> >
> > Add comments at both of the above places.
> 
> Added.
> 
> >
> >
> > 2.
> > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr
> moveto,
> > +   bool *found_consistent_point);
> > +
> >
> > This API looks a bit awkward as the functionality doesn't match the
> > name. How about having a function with name
> > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> > ready_for_decoding) with the same functionality as your patch has for
> > pg_logical_replication_slot_advance() and then invoke it both from
> > pg_logical_replication_slot_advance and slotsync.c. The function name
> > is too big, we can think of a shorter name. Any ideas?
> 
> How about LogicalSlotAdvanceAndCheckDecodingState() Or just
> LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested
> LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be
> renamed in next version if we agree).
> 
> Attach the V3 patch which addressed above comments and Kuroda-san's
> comments[1]. I also adjusted the tap-test to only check the 
> confirmed_flush_lsn
> after syncing, as the restart_lsn could be different from the remote one due 
> to
> the new slot_advance() call. I am also testing some optimization idea locally 
> and
> will share if ready.

Attach the V4 patch which includes the optimization to skip the decoding if
the snapshot at the syncing restart_lsn is already serialized. It can avoid most
of the duplicate decoding in my test, and I am doing some more tests locally.

Best Regards,
Hou zj


v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description:  v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch


Re: Synchronizing slots from primary to standby

2024-03-31 Thread Amit Kapila
On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy
 wrote:
>
> On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > [2] The steps to reproduce the data miss issue on a primary->standby setup:
>
> I'm trying to reproduce the problem with [1], but I can see the
> changes after the standby is promoted. Am I missing anything here?
>
> ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select *
> from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);"
> lsn| xid |data
> ---+-+-
>  0/3B0 | 738 | BEGIN 738
>  0/3017FC8 | 738 | COMMIT 738
>  0/3017FF8 | 739 | BEGIN 739
>  0/3019A38 | 739 | COMMIT 739
>  0/3019A38 | 740 | BEGIN 740
>  0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999
>  0/3019AA8 | 740 | COMMIT 740
> (7 rows)
>
> [1]
> -#define LOG_SNAPSHOT_INTERVAL_MS 15000
> +#define LOG_SNAPSHOT_INTERVAL_MS 150
>
> ./initdb -D db17
> echo "archive_mode = on
> archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f'
> wal_level='logical'
> autovacuum = off
> checkpoint_timeout='1h'" | tee -a db17/postgresql.conf
>
> ./pg_ctl -D db17 -l logfile17 start
>
> rm -rf sbdata logfilesbdata
> ./pg_basebackup -D sbdata
>
> ./psql -d postgres -p 5432 -c "SELECT
> pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding',
> false, false, true);"
> ./psql -d postgres -p 5432 -c "SELECT
> pg_create_physical_replication_slot('phy_repl_slot', true, false);"
>
> echo "port=5433
> primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu'
> primary_slot_name='phy_repl_slot'
> restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p'
> hot_standby_feedback=on
> sync_replication_slots=on" | tee -a sbdata/postgresql.conf
>
> touch sbdata/standby.signal
>
> ./pg_ctl -D sbdata -l logfilesbdata start
> ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"
>
> ./psql -d postgres
>
> SESSION1, TXN1
> BEGIN;
> create table dummy1(a int);
>
> SESSION2, TXN2
> SELECT pg_log_standby_snapshot();
>
> SESSION1, TXN1
> COMMIT;
>
> SESSION1, TXN1
> BEGIN;
> create table dummy2(a int);
>
> SESSION2, TXN2
> SELECT pg_log_standby_snapshot();
>
> SESSION1, TXN1
> COMMIT;
>
> ./psql -d postgres -p 5432 -c "SELECT
> pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());"
>

After this step and before the next, did you ensure that the slot sync
has synced the latest confirmed_flush/restart LSNs? You can query:
"select slot_name, restart_lsn, confirmed_flush_lsn from
pg_replication_slots;" to verify that they are the same on both the
primary and the standby.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-03-31 Thread Bharath Rupireddy
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
 wrote:
>
> [2] The steps to reproduce the data miss issue on a primary->standby setup:

I'm trying to reproduce the problem with [1], but I can see the
changes after the standby is promoted. Am I missing anything here?

ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select *
from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);"
    lsn    | xid |                     data
-----------+-----+----------------------------------------------
 0/3B0 | 738 | BEGIN 738
 0/3017FC8 | 738 | COMMIT 738
 0/3017FF8 | 739 | BEGIN 739
 0/3019A38 | 739 | COMMIT 739
 0/3019A38 | 740 | BEGIN 740
 0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999
 0/3019AA8 | 740 | COMMIT 740
(7 rows)

[1]
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 150

./initdb -D db17
echo "archive_mode = on
archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f'
wal_level='logical'
autovacuum = off
checkpoint_timeout='1h'" | tee -a db17/postgresql.conf

./pg_ctl -D db17 -l logfile17 start

rm -rf sbdata logfilesbdata
./pg_basebackup -D sbdata

./psql -d postgres -p 5432 -c "SELECT
pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding',
false, false, true);"
./psql -d postgres -p 5432 -c "SELECT
pg_create_physical_replication_slot('phy_repl_slot', true, false);"

echo "port=5433
primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu'
primary_slot_name='phy_repl_slot'
restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p'
hot_standby_feedback=on
sync_replication_slots=on" | tee -a sbdata/postgresql.conf

touch sbdata/standby.signal

./pg_ctl -D sbdata -l logfilesbdata start
./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"

./psql -d postgres

SESSION1, TXN1
BEGIN;
create table dummy1(a int);

SESSION2, TXN2
SELECT pg_log_standby_snapshot();

SESSION1, TXN1
COMMIT;

SESSION1, TXN1
BEGIN;
create table dummy2(a int);

SESSION2, TXN2
SELECT pg_log_standby_snapshot();

SESSION1, TXN1
COMMIT;

./psql -d postgres -p 5432 -c "SELECT
pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());"
./psql -d postgres -p 5432 -c "INSERT INTO dummy1 VALUES(999);"

./psql -d postgres -p 5433 -c "SELECT pg_promote();"
./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"

./psql -d postgres -p 5433 -c "select * from
pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);"

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-03-31 Thread Zhijie Hou (Fujitsu)
On Friday, March 29, 2024 2:50 PM Amit Kapila  wrote:
> 
> On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) 
> wrote:
> >
> >
> > Attach a new version patch which fixed an un-initialized variable
> > issue and added some comments.
> >
> 
> The other approach to fix this issue could be that the slotsync worker gets
> the serialized snapshot using pg_read_binary_file() corresponding to
> restart_lsn and writes those at standby. But there are cases when we won't
> have such a file: (a) when we initially create the slot and reach the
> consistent_point, or (b) by the time the slotsync worker starts to read the
> remote snapshot file, the snapshot file could have been removed by the
> checkpointer on the primary (if the restart_lsn of the remote has been
> advanced in this window). So, in such cases, we anyway need to advance the
> slot. I think these could be optimizations that we could do in the future.
> 
> Few comments:

Thanks for the comments.

> =
> 1.
> - if (slot->data.database != MyDatabaseId)
> + if (slot->data.database != MyDatabaseId && !fast_forward)
>   ereport(ERROR,
>   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>   errmsg("replication slot \"%s\" was not created in this database", @@ -526,7
> +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn,
>   * Do not allow consumption of a "synchronized" slot until the standby
>   * gets promoted.
>   */
> - if (RecoveryInProgress() && slot->data.synced)
> + if (RecoveryInProgress() && slot->data.synced &&
> + !IsSyncingReplicationSlots())
> 
> 
> Add comments at both of the above places.

Added.

> 
> 
> 2.
> +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto,
> +   bool *found_consistent_point);
> +
> 
> This API looks a bit awkward as the functionality doesn't match the name. How
> about having a function with name
> LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> ready_for_decoding) with the same functionality as your patch has for
> pg_logical_replication_slot_advance() and then invoke it both from
> pg_logical_replication_slot_advance and slotsync.c. The function name is too
> big, we can think of a shorter name. Any ideas?

How about LogicalSlotAdvanceAndCheckDecodingState(), or just
LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested
LogicalSlotAdvanceAndCheckReadynessForDecoding in this version; it can be
renamed in the next version if we agree.)

Attached is the V3 patch, which addresses the above comments and Kuroda-san's
comments[1]. I also adjusted the tap-test to only check the confirmed_flush_lsn
after syncing, as the restart_lsn could differ from the remote one due to the
new slot_advance() call. I am also testing an optimization idea locally and
will share it once ready.

[1] 
https://www.postgresql.org/message-id/TYCPR01MB1207757BB2A32B6815CE1CCE7F53A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hou zj


v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description:  v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch


Re: Synchronizing slots from primary to standby

2024-03-29 Thread Bertrand Drouvot
Hi,

On Fri, Mar 29, 2024 at 02:35:22PM +0530, Amit Kapila wrote:
> On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot
>  wrote:
> >
> > On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote:
> > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot 
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote:
> > > > > Attach a new version patch which fixed an un-initialized variable
> > > > > issue and added some comments. Also, temporarily enable DEBUG2 for the
> > > > > 040 tap-test so that we can analyze the possible CFbot failures 
> > > > > easily.
> > > > >
> > > >
> > > > Thanks!
> > > >
> > > > +   if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
> > > > +   {
> > > > +       /*
> > > > +        * By advancing the restart_lsn, confirmed_lsn, and xmin using
> > > > +        * fast-forward logical decoding, we ensure that the required
> > > > +        * snapshots are saved to disk. This enables logical decoding
> > > > +        * to quickly reach a consistent point at the restart_lsn,
> > > > +        * eliminating the risk of missing data during snapshot
> > > > +        * creation.
> > > > +        */
> > > > +       pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
> > > > +                                           found_consistent_point);
> > > > +       ReplicationSlotsComputeRequiredLSN();
> > > > +       updated_lsn = true;
> > > > +   }
> > > >
> > > > Instead of using pg_logical_replication_slot_advance() for each synced
> > > > slot and during sync cycles, what about the following?:
> > > >
> > > > - keep sync slot synchronization as it is currently (not using
> > > > pg_logical_replication_slot_advance())
> > > > - create a "hidden" logical slot if the sync slot feature is on
> > > > - at the time of promotion, use pg_logical_replication_slot_advance()
> > > > on this hidden slot only to advance to the max lsn of the synced slots
> > > >
> > > > I'm not sure that would be enough, just asking your thoughts on this
> > > > (the benefit would be to avoid calling
> > > > pg_logical_replication_slot_advance() on each synced slot and during
> > > > the sync cycles).
> > >
> > > Thanks for the idea!
> > >
> > > I considered this. I think advancing the "hidden" slot at promotion time
> > > may be a bit late, because if we cannot reach the consistent point after
> > > advancing the "hidden" slot, it means we may need to remove all the
> > > synced slots, as we are not sure whether they are usable (will not lose
> > > data) after promotion.
> >
> > What about advancing the hidden slot during the sync cycles then?
> >
> > > The current approach is to mark such not-yet-consistent slots as
> > > temporary and persist them once they reach the consistent point, so that
> > > users can be sure a slot can be used after promotion once it is
> > > persisted.
> >
> > Right, but do we need to do so for all the sync slots? Would a single hidden
> > slot be enough?
> >
> 
> Even if we mark one of the synced slots as persistent without reaching
> a consistent state, it could create a problem after promotion. And how
> would a single hidden slot serve the purpose? Different synced slots
> will have different restart/confirmed_flush LSNs, and we won't be able
> to perform advancing for those using a single slot. For example, say
> the first synced slot has not reached a consistent state; how can we
> then try for the second slot? This sounds quite tricky to make work.
> We should go with something simple where the chances of introducing
> bugs are smaller.

Yeah, better to go with something simple.

+   if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
+   {
+       /*
+        * By advancing the restart_lsn, confirmed_lsn, and xmin using
+        * fast-forward logical decoding, we ensure that the required
+        * snapshots are saved to disk. This enables logical decoding to
+        * quickly reach a consistent point at the restart_lsn, eliminating
+        * the risk of missing data during snapshot creation.
+        */
+       pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
+                                           found_consistent_point);

In our case, what about skipping WaitForStandbyConfirmation() in
pg_logical_replication_slot_advance()? (It could go until the
RecoveryInProgress() check in StandbySlotsHaveCaughtup() if we don't skip it).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-03-29 Thread Amit Kapila
On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot
 wrote:
>
> On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote:
> > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot 
> >  wrote:
> > >
> > > Hi,
> > >
> > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote:
> > > > Attach a new version patch which fixed an un-initialized variable
> > > > issue and added some comments. Also, temporarily enable DEBUG2 for the
> > > > 040 tap-test so that we can analyze the possible CFbot failures easily.
> > > >
> > >
> > > Thanks!
> > >
> > > +   if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
> > > +   {
> > > +       /*
> > > +        * By advancing the restart_lsn, confirmed_lsn, and xmin using
> > > +        * fast-forward logical decoding, we ensure that the required
> > > +        * snapshots are saved to disk. This enables logical decoding
> > > +        * to quickly reach a consistent point at the restart_lsn,
> > > +        * eliminating the risk of missing data during snapshot
> > > +        * creation.
> > > +        */
> > > +       pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
> > > +                                           found_consistent_point);
> > > +       ReplicationSlotsComputeRequiredLSN();
> > > +       updated_lsn = true;
> > > +   }
> > >
> > > Instead of using pg_logical_replication_slot_advance() for each synced
> > > slot and during sync cycles, what about the following?:
> > >
> > > - keep sync slot synchronization as it is currently (not using
> > > pg_logical_replication_slot_advance())
> > > - create a "hidden" logical slot if the sync slot feature is on
> > > - at the time of promotion, use pg_logical_replication_slot_advance()
> > > on this hidden slot only to advance to the max lsn of the synced slots
> > >
> > > I'm not sure that would be enough, just asking your thoughts on this
> > > (the benefit would be to avoid calling
> > > pg_logical_replication_slot_advance() on each synced slot and during
> > > the sync cycles).
> >
> > Thanks for the idea!
> >
> > I considered this. I think advancing the "hidden" slot at promotion time
> > may be a bit late, because if we cannot reach the consistent point after
> > advancing the "hidden" slot, it means we may need to remove all the synced
> > slots, as we are not sure whether they are usable (will not lose data)
> > after promotion.
>
> What about advancing the hidden slot during the sync cycles then?
>
> > The current approach is to mark such not-yet-consistent slots as temporary
> > and persist them once they reach the consistent point, so that users can
> > be sure a slot can be used after promotion once it is persisted.
>
> Right, but do we need to do so for all the sync slots? Would a single hidden
> slot be enough?
>

Even if we mark one of the synced slots as persistent without reaching
a consistent state, it could create a problem after promotion. And how
would a single hidden slot serve the purpose? Different synced slots
will have different restart/confirmed_flush LSNs, and we won't be able
to perform advancing for those using a single slot. For example, say
the first synced slot has not reached a consistent state; how can we
then try for the second slot? This sounds quite tricky to make work.
We should go with something simple where the chances of introducing
bugs are smaller.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-03-29 Thread Amit Kapila
On Fri, Mar 29, 2024 at 9:34 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Thanks for updating the patch! Here is a comment for it.
>
> ```
> +/*
> + * By advancing the restart_lsn, confirmed_lsn, and xmin using
> + * fast-forward logical decoding, we can verify whether a consistent
> + * snapshot can be built. This process also involves saving necessary
> + * snapshots to disk during decoding, ensuring that logical decoding
> + * efficiently reaches a consistent point at the restart_lsn without
> + * the potential loss of data during snapshot creation.
> + */
> +pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
> +found_consistent_point);
> +ReplicationSlotsComputeRequiredLSN();
> +updated_lsn = true;
> ```
>
> You added them like pg_replication_slot_advance(), but the function also
> calls ReplicationSlotsComputeRequiredXmin(false) at that time. According to
> the related commit b48df81 and discussions [1], I know it is needed only for
> physical slots, but it would be more consistent to call requiredXmin() as
> well, per [2]:
>

Yeah, I also think it is okay to call for the sake of consistency with
pg_replication_slot_advance().

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-03-29 Thread Bertrand Drouvot
Hi,

On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote:
> On Friday, March 29, 2024 2:48 PM Bertrand Drouvot 
>  wrote:
> > 
> > Hi,
> > 
> > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote:
> > > Attach a new version patch which fixed an un-initialized variable
> > > issue and added some comments. Also, temporarily enable DEBUG2 for the
> > > 040 tap-test so that we can analyze the possible CFbot failures easily.
> > >
> > 
> > Thanks!
> > 
> > +   if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
> > +   {
> > +       /*
> > +        * By advancing the restart_lsn, confirmed_lsn, and xmin using
> > +        * fast-forward logical decoding, we ensure that the required
> > +        * snapshots are saved to disk. This enables logical decoding to
> > +        * quickly reach a consistent point at the restart_lsn,
> > +        * eliminating the risk of missing data during snapshot creation.
> > +        */
> > +       pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
> > +                                           found_consistent_point);
> > +       ReplicationSlotsComputeRequiredLSN();
> > +       updated_lsn = true;
> > +   }
> > 
> > Instead of using pg_logical_replication_slot_advance() for each synced
> > slot and during sync cycles, what about the following?:
> > 
> > - keep sync slot synchronization as it is currently (not using
> > pg_logical_replication_slot_advance())
> > - create a "hidden" logical slot if the sync slot feature is on
> > - at the time of promotion, use pg_logical_replication_slot_advance() on
> > this hidden slot only to advance to the max lsn of the synced slots
> > 
> > I'm not sure that would be enough, just asking your thoughts on this (the
> > benefit would be to avoid calling pg_logical_replication_slot_advance()
> > on each synced slot and during the sync cycles).
> 
> Thanks for the idea!
> 
> I considered this. I think advancing the "hidden" slot at promotion time may
> be a bit late, because if we cannot reach the consistent point after
> advancing the "hidden" slot, it means we may need to remove all the synced
> slots, as we are not sure whether they are usable (will not lose data) after
> promotion.

What about advancing the hidden slot during the sync cycles then?

> The current approach is to mark such not-yet-consistent slots as temporary
> and persist them once they reach the consistent point, so that users can be
> sure a slot can be used after promotion once it is persisted.

Right, but do we need to do so for all the sync slots? Would a single hidden
slot be enough?

> Another optimization idea is to check for the snapshot file's existence
> before calling slot_advance(). If the file already exists, we skip the
> decoding and directly update the restart_lsn. This way, we could also avoid
> some duplicate decoding work.

Yeah, I think it's a good idea (even better if we can do this check without
performing any I/O).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-03-29 Thread Zhijie Hou (Fujitsu)
On Friday, March 29, 2024 2:48 PM Bertrand Drouvot 
 wrote:
> 
> Hi,
> 
> On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote:
> > Attach a new version patch which fixed an un-initialized variable
> > issue and added some comments. Also, temporarily enable DEBUG2 for the
> > 040 tap-test so that we can analyze the possible CFbot failures easily.
> >
> 
> Thanks!
> 
> +   if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
> +   {
> +       /*
> +        * By advancing the restart_lsn, confirmed_lsn, and xmin using
> +        * fast-forward logical decoding, we ensure that the required
> +        * snapshots are saved to disk. This enables logical decoding to
> +        * quickly reach a consistent point at the restart_lsn, eliminating
> +        * the risk of missing data during snapshot creation.
> +        */
> +       pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
> +                                           found_consistent_point);
> +       ReplicationSlotsComputeRequiredLSN();
> +       updated_lsn = true;
> +   }
> 
> Instead of using pg_logical_replication_slot_advance() for each synced slot
> and during sync cycles, what about the following?:
> 
> - keep sync slot synchronization as it is currently (not using
> pg_logical_replication_slot_advance())
> - create a "hidden" logical slot if the sync slot feature is on
> - at the time of promotion, use pg_logical_replication_slot_advance() on
> this hidden slot only to advance to the max lsn of the synced slots
> 
> I'm not sure that would be enough, just asking your thoughts on this (the
> benefit would be to avoid calling pg_logical_replication_slot_advance() on
> each synced slot and during the sync cycles).

Thanks for the idea!

I considered this. I think advancing the "hidden" slot at promotion time may
be a bit late, because if we cannot reach the consistent point after advancing
the "hidden" slot, it means we may need to remove all the synced slots, as we
are not sure whether they are usable (will not lose data) after promotion. And
it may confuse users a bit, as they will have seen these slots as sync-ready.

The current approach is to mark such not-yet-consistent slots as temporary and
persist them once they reach the consistent point, so that users can be sure a
slot can be used after promotion once it is persisted.


Another optimization idea is to check for the snapshot file's existence before
calling slot_advance(). If the file already exists, we skip the decoding and
directly update the restart_lsn. This way, we could also avoid some duplicate
decoding work.
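
For reference, such a check could be as simple as a stat() on the serialized
snapshot file; a minimal sketch (the helper name is hypothetical, and the path
layout follows how snapbuild.c names serialized snapshots):

```
#include <sys/stat.h>

/*
 * Sketch: return true if a serialized snapshot already exists for the given
 * LSN, in which case the fast-forward decoding could be skipped and the
 * restart_lsn updated directly.
 */
static bool
snapshot_file_exists(XLogRecPtr lsn)
{
	char		path[MAXPGPATH];
	struct stat stat_buf;

	snprintf(path, sizeof(path), "pg_logical/snapshots/%X-%X.snap",
			 LSN_FORMAT_ARGS(lsn));

	return (stat(path, &stat_buf) == 0);
}
```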

Best Regards,
Hou zj




Re: Synchronizing slots from primary to standby

2024-03-29 Thread Amit Kapila
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu)
 wrote:
>
>
> Attach a new version patch which fixed an un-initialized variable issue and
> added some comments.
>

The other approach to fix this issue could be that the slotsync worker
gets the serialized snapshot using pg_read_binary_file() corresponding
to restart_lsn and writes those at standby. But there are cases when
we won't have such a file: (a) when we initially create the slot
and reach the consistent_point, or (b) by the time the slotsync
worker starts to read the remote snapshot file, the snapshot file
could have been removed by the checkpointer on the primary (if the
restart_lsn of the remote has been advanced in this window). So, in
such cases, we anyway need to advance the slot. I think these could be
optimizations that we could do in the future.

Few comments:
=
1.
- if (slot->data.database != MyDatabaseId)
+ if (slot->data.database != MyDatabaseId && !fast_forward)
  ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  errmsg("replication slot \"%s\" was not created in this database",
@@ -526,7 +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn,
  * Do not allow consumption of a "synchronized" slot until the standby
  * gets promoted.
  */
- if (RecoveryInProgress() && slot->data.synced)
+ if (RecoveryInProgress() && slot->data.synced && !IsSyncingReplicationSlots())


Add comments at both of the above places.
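
For instance, the first of those places could carry something along these
lines (a sketch of the intent, not the committed wording):

```
	/*
	 * We need catalog (system table) access of the slot's database while
	 * decoding changes for an output plugin, but in fast_forward mode no
	 * output plugin is invoked, so the database check can be skipped.
	 */
	if (slot->data.database != MyDatabaseId && !fast_forward)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("replication slot \"%s\" was not created in this database",
						NameStr(slot->data.name))));
```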


2.
+extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto,
+   bool *found_consistent_point);
+

This API looks a bit awkward as the functionality doesn't match the
name. How about having a function with name
LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
ready_for_decoding) with the same functionality as your patch has for
pg_logical_replication_slot_advance() and then invoke it both from
pg_logical_replication_slot_advance and slotsync.c. The function name
is too big, we can think of a shorter name. Any ideas?
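
To illustrate, a rough sketch of that refactoring (using the name proposed
above; not meant as the final shape):

```
/*
 * Shared routine: advance the slot using fast-forward logical decoding and
 * report whether decoding reached a consistent point.
 */
extern XLogRecPtr LogicalSlotAdvanceAndCheckReadynessForDecoding(XLogRecPtr moveto,
																 bool *ready_for_decoding);

/* The SQL-callable path then becomes a thin wrapper that ignores the flag. */
static XLogRecPtr
pg_logical_replication_slot_advance(XLogRecPtr moveto)
{
	bool		ready_for_decoding;

	return LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
														   &ready_for_decoding);
}
```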


-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-03-29 Thread Bertrand Drouvot
Hi,

On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote:
> Attach a new version patch which fixed an un-initialized variable issue and
> added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so 
> that
> we can analyze the possible CFbot failures easily.
> 

Thanks!

+   if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
+   {
+       /*
+        * By advancing the restart_lsn, confirmed_lsn, and xmin using
+        * fast-forward logical decoding, we ensure that the required
+        * snapshots are saved to disk. This enables logical decoding to
+        * quickly reach a consistent point at the restart_lsn, eliminating
+        * the risk of missing data during snapshot creation.
+        */
+       pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
+                                           found_consistent_point);
+       ReplicationSlotsComputeRequiredLSN();
+       updated_lsn = true;
+   }

Instead of using pg_logical_replication_slot_advance() for each synced slot
and during sync cycles, what about the following?:

- keep sync slot synchronization as it is currently (not using
pg_logical_replication_slot_advance())
- create a "hidden" logical slot if the sync slot feature is on
- at the time of promotion, use pg_logical_replication_slot_advance() on this
hidden slot only to advance to the max lsn of the synced slots

I'm not sure that would be enough, just asking your thoughts on this (the
benefit would be to avoid calling pg_logical_replication_slot_advance() on
each synced slot and during the sync cycles).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-03-28 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Thanks for updating the patch! Here is a comment for it.

```
+/*
+ * By advancing the restart_lsn, confirmed_lsn, and xmin using
+ * fast-forward logical decoding, we can verify whether a consistent
+ * snapshot can be built. This process also involves saving necessary
+ * snapshots to disk during decoding, ensuring that logical decoding
+ * efficiently reaches a consistent point at the restart_lsn without
+ * the potential loss of data during snapshot creation.
+ */
+pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
+found_consistent_point);
+ReplicationSlotsComputeRequiredLSN();
+updated_lsn = true;
```

You added them like pg_replication_slot_advance(), but the function also calls
ReplicationSlotsComputeRequiredXmin(false) at that time. According to the
related commit b48df81 and discussions [1], I know it is needed only for
physical slots, but it would be more consistent to call requiredXmin() as
well, per [2]:

```
This may be a waste if no advancing is done, but it could also be an
advantage to enforce a recalculation of the thresholds for each
function call.  And that's more consistent with the slot copy, drop
and creation.
```
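
Concretely, that would mean adding the xmin recomputation next to the LSN one
after the advance, e.g. (a sketch, not the actual patch):

```
pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
									found_consistent_point);
ReplicationSlotsComputeRequiredLSN();
ReplicationSlotsComputeRequiredXmin(false);
```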

What do you think?

[1]: 
https://www.postgresql.org/message-id/20200609171904.kpltxxvjzislidks%40alap3.anarazel.de
[2]: https://www.postgresql.org/message-id/20200616072727.GA2361%40paquier.xyz

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Synchronizing slots from primary to standby

2024-03-28 Thread shveta malik
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Attach a new version patch which fixed an un-initialized variable issue and
> added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so 
> that
> we can analyze the possible CFbot failures easily.

As suggested by Amit in [1], for the fix being discussed where we need
to advance the synced slot on standby, we need to skip the dbid check
in fast_forward mode in CreateDecodingContext(). We tried a few tests
to make sure that there was no table access done during fast-forward mode:

1) Initially we tried avoiding database-id check in
CreateDecodingContext() only when called by
pg_logical_replication_slot_advance(). 'make check-world' passed on
HEAD for the same.

2) But the more generic solution was to skip the database check if
"fast_forward" is true. It was tried and 'make check-world' passed on
HEAD for that as well.

3) Another thing tried by Hou-San was to run pgbench after skipping the db
check in the fast_forward logical decoding case.
pgbench was run to generate some changes and then the logical slot was
advanced to the latest position in another database. A LOG was added
in relation_open to catch table access. It was found that there was no
table access in fast-forward logical decoding, i.e., no LOGs for
table open were generated during the test. Steps are given at [2].

[1]: 
https://www.postgresql.org/message-id/CAA4eK1KMiKangJa4NH_K1oFc87Y01n3rnpuwYagT59Y%3DADW8Dw%40mail.gmail.com

[2]:
--
1. apply the DEBUG patch (attached as .txt) which will log the
relation open and table cache access.

2. create a slot:
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot',
'test_decoding', false, false, true);

3. run pgbench to generate some data.
pgbench -i postgres
pgbench --aggregate-interval=5 --time=5 --client=10 --log --rate=1000
--latency-limit=10 --failures-detailed --max-tries=10 postgres

4. start a fresh session in a different db and advance the slot to the
latest position. There should be no relation open or CatCache log
between the LOG "starting logical decoding for slot .." and LOG
"decoding over".
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());
--

thanks
Shveta
From 5386894faa14c0de9854e0eee9679f8eea775f65 Mon Sep 17 00:00:00 2001
From: Hou Zhijie 
Date: Fri, 29 Mar 2024 11:46:36 +0800
Subject: [PATCH] debug log

---
 src/backend/access/common/relation.c | 2 ++
 src/backend/replication/slotfuncs.c  | 1 +
 src/backend/utils/cache/catcache.c   | 1 +
 3 files changed, 4 insertions(+)

diff --git a/src/backend/access/common/relation.c b/src/backend/access/common/relation.c
index d8a313a2c9..40718fc47e 100644
--- a/src/backend/access/common/relation.c
+++ b/src/backend/access/common/relation.c
@@ -50,6 +50,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
 
Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);
 
+   elog(LOG, "relation_open");
/* Get the lock before trying to open the relcache entry */
if (lockmode != NoLock)
LockRelationOid(relationId, lockmode);
@@ -91,6 +92,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
 
Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);
 
+   elog(LOG, "try_relation_open");
/* Get the lock first */
if (lockmode != NoLock)
LockRelationOid(relationId, lockmode);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index ef5081784c..564b36fc45 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -608,6 +608,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto,
 
/* free context, call shutdown callback */
FreeDecodingContext(ctx);
+   elog(LOG, "decoding over");
 
InvalidateSystemCaches();
}
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 569f51cb33..e19c586697 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1328,6 +1328,7 @@ SearchCatCacheInternal(CatCache *cache,
 
Assert(cache->cc_nkeys == nkeys);
 
+   elog(LOG, "SearchCatCacheInternal");
/*
 * one-time startup overhead for each cache
 */
-- 
2.31.1



RE: Synchronizing slots from primary to standby

2024-03-28 Thread Zhijie Hou (Fujitsu)
On Thursday, March 28, 2024 10:02 PM Zhijie Hou (Fujitsu) 
 wrote:
> 
> On Thursday, March 28, 2024 7:32 PM Amit Kapila 
> wrote:
> >
> > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
> > 
> > wrote:
> > >
> > > While analyzing one BF error[1], we found an issue with slotsync: Since
> > > we don't perform logical decoding for the synced slots when syncing
> > > the lsn/xmin of a slot, no logical snapshots will be serialized to
> > > disk. So, when a user starts to use these synced slots after
> > > promotion, the consistent snapshot needs to be rebuilt from the
> > > restart_lsn if the WAL (xl_running_xacts) at the restart_lsn position
> > > indicates that there are running transactions. This, however, could
> > > cause data from before the consistent point to be missed[2].
> > >
> > > This issue doesn't exist on the primary because the snapshot at
> > > restart_lsn should have been serialized to disk
> > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the
> > > logical decoding restarts, it can find consistent snapshot
> > > immediately at
> > restart_lsn.
> > >
> > > To fix this, we could use the fast forward logical decoding to
> > > advance the synced slot's lsn/xmin when syncing these values instead
> > > of directly updating the slot's info. This way, the snapshot will be
> > > serialized to
> > disk when decoding.
> > > If we could not reach to the consistent point at the remote
> > > restart_lsn, the slot is marked as temp and will be persisted once
> > > it reaches the consistent point. I am still analyzing the fix and
> > > will share once
> > ready.
> > >
> >
> > Yes, we can use this but one thing to note is that
> > CreateDecodingContext() will expect that the slot's and current
> > database are the same. I think the reason for that is we need to check
> > system tables of the current database while decoding and sending data
> > to the output_plugin which won't be a requirement for the fast_forward
> > case. So, we need to skip that check in fast_forward mode.
> 
> Agreed.
> 
> >
> > Next, I was thinking about the case of the first time updating the
> > restart and confirmed_flush LSN while syncing the slots. I think we
> > can keep the current logic as it is based on the following analysis.
> >
> > For each logical slot, cases possible on the primary:
> > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet
> > reached the consistent point.
> > 2. The restart_lsn doesn't have a serialized snapshot but has reached
> > a consistent point.
> > 3. The restart_lsn has a serialized snapshot which means it has
> > reached a consistent point as well.
> >
> > Considering we keep the logic to reserve initial WAL positions the
> > same as the current (Reserve WAL for the currently active local slot
> > using the specified WAL location (restart_lsn). If the given WAL
> > location has been removed, reserve WAL using the oldest existing WAL
> > segment.), I could think of the below
> > scenarios:
> > A. For 1, we shouldn't sync the slot as it still wouldn't have been
> > marked persistent on the primary.
> > B. For 2, we would sync the slot
> >B1. If remote_restart_lsn >= local_restart_lsn, then advance the
> > slot by calling pg_logical_replication_slot_advance().
> >B11. If we reach consistent point, then it should be okay
> > because after promotion as well we should reach consistent point.
> > B111. But again is it possible that there is some xact
> > that comes before consistent_point on primary and the same is after
> > consistent_point on standby? This shouldn't matter as we will start
> > decoding transactions after confirmed_flush_lsn which would be the same on
> primary and standby.
> >B22. If we haven't reached consistent_point, then we won't mark
> > the slot as persistent, and at the next sync we will do the same till
> > it reaches consistent_point. At that time, the situation will be similar to 
> > B11.
> >B2. If remote_restart_lsn < local_restart_lsn, then we will wait
> > for the next sync cycle and keep the slot as temporary. Once in the
> > next or some consecutive sync cycle, we reach the condition
> > remote_restart_lsn >= local_restart_lsn, we will proceed to advance
> > the slot and we should have the same behavior as B1.
> > C. For 3, we would sync the slot, but the behavior should be the same as B.
> >
> > Thoughts?
> 
> Looks reasonable to me.
> 
> Here is the patch based on above lines.
> I am also testing and verifying the patch locally.

Attach a new version patch which fixed an uninitialized variable issue and
added some comments. Also, temporarily enabled DEBUG2 for the 040 tap-test so
that we can analyze the possible CFbot failures easily.

Best Regards,
Hou zj


v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description:  v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch


RE: Synchronizing slots from primary to standby

2024-03-28 Thread Zhijie Hou (Fujitsu)
On Thursday, March 28, 2024 7:32 PM Amit Kapila  wrote:
> 
> On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > While analyzing one BF error[1], we found an issue with slotsync: Since we
> > don't perform logical decoding for the synced slots when syncing the
> > lsn/xmin of a slot, no logical snapshots will be serialized to disk. So,
> > when a user starts to use these synced slots after promotion, the
> > consistent snapshot needs to be rebuilt from the restart_lsn if the
> > WAL (xl_running_xacts) at the restart_lsn position indicates that there
> > are running transactions. This, however, could cause data from before the
> > consistent point to be missed[2].
> >
> > This issue doesn't exist on the primary because the snapshot at
> > restart_lsn should have been serialized to disk
> > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the
> > logical decoding restarts, it can find consistent snapshot immediately at
> restart_lsn.
> >
> > To fix this, we could use the fast forward logical decoding to advance
> > the synced slot's lsn/xmin when syncing these values instead of
> > directly updating the slot's info. This way, the snapshot will be 
> > serialized to
> disk when decoding.
> > If we could not reach to the consistent point at the remote
> > restart_lsn, the slot is marked as temp and will be persisted once it
> > reaches the consistent point. I am still analyzing the fix and will share 
> > once
> ready.
> >
> 
> Yes, we can use this but one thing to note is that
> CreateDecodingContext() will expect that the slot's and current database are
> the same. I think the reason for that is we need to check system tables of the
> current database while decoding and sending data to the output_plugin which
> won't be a requirement for the fast_forward case. So, we need to skip that
> check in fast_forward mode.

Agreed.

> 
> Next, I was thinking about the case of the first time updating the restart and
> confirmed_flush LSN while syncing the slots. I think we can keep the current
> logic as it is based on the following analysis.
> 
> For each logical slot, cases possible on the primary:
> 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet reached 
> the
> consistent point.
> 2. The restart_lsn doesn't have a serialized snapshot but has reached a
> consistent point.
> 3. The restart_lsn has a serialized snapshot which means it has reached a
> consistent point as well.
> 
> Considering we keep the logic to reserve initial WAL positions the same as the
> current (Reserve WAL for the currently active local slot using the specified 
> WAL
> location (restart_lsn). If the given WAL location has been removed, reserve
> WAL using the oldest existing WAL segment.), I could think of the below
> scenarios:
> A. For 1, we shouldn't sync the slot as it still wouldn't have been marked
> persistent on the primary.
> B. For 2, we would sync the slot
>B1. If remote_restart_lsn >= local_restart_lsn, then advance the slot by
> calling
> pg_logical_replication_slot_advance().
>B11. If we reach consistent point, then it should be okay because after
> promotion as well we should reach consistent point.
> B111. But again is it possible that there is some xact that comes
> before consistent_point on primary and the same is after consistent_point on
> standby? This shouldn't matter as we will start decoding transactions after
> confirmed_flush_lsn which would be the same on primary and standby.
>B22. If we haven't reached consistent_point, then we won't mark the 
> slot
> as persistent, and at the next sync we will do the same till it reaches
> consistent_point. At that time, the situation will be similar to B11.
>B2. If remote_restart_lsn < local_restart_lsn, then we will wait for the 
> next
> sync cycle and keep the slot as temporary. Once in the next or some
> consecutive sync cycle, we reach the condition remote_restart_lsn >=
> local_restart_lsn, we will proceed to advance the slot and we should have the
> same behavior as B1.
> C. For 3, we would sync the slot, but the behavior should be the same as B.
> 
> Thoughts?

Looks reasonable to me.

Here is the patch based on above lines.
I am also testing and verifying the patch locally.

Best Regards,
Hou zj


0001-advance-the-restart_lsn-of-synced-slots-using-logica.patch
Description:  0001-advance-the-restart_lsn-of-synced-slots-using-logica.patch


Re: Synchronizing slots from primary to standby

2024-03-28 Thread Bertrand Drouvot
Hi,

On Thu, Mar 28, 2024 at 05:05:35PM +0530, Amit Kapila wrote:
> On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot
>  wrote:
> >
> > On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote:
> >
> > > To fix this, we could use the fast forward logical decoding to advance 
> > > the synced
> > > slot's lsn/xmin when syncing these values instead of directly updating the
> > > slot's info. This way, the snapshot will be serialized to disk when 
> > > decoding.
> > > If we could not reach to the consistent point at the remote restart_lsn, 
> > > the
> > > slot is marked as temp and will be persisted once it reaches the 
> > > consistent
> > > point. I am still analyzing the fix and will share once ready.
> >
> > Thanks! I'm wondering about the performance impact (even in fast_forward
> > mode); it might be worth keeping an eye on it.
> >
> 
> True, we can consider performance but correctness should be a
> priority,

Yeah of course.

> and can we think of a better way to fix this issue?

I'll keep you posted if there is one that I can think of.

> > Should we create a 17 open item [1]?
> >
> > [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items
> >
> 
> Yes, we can do that.

done.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-03-28 Thread Amit Kapila
On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot
 wrote:
>
> On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote:
>
> > To fix this, we could use the fast forward logical decoding to advance the 
> > synced
> > slot's lsn/xmin when syncing these values instead of directly updating the
> > slot's info. This way, the snapshot will be serialized to disk when 
> > decoding.
> > If we could not reach to the consistent point at the remote restart_lsn, the
> > slot is marked as temp and will be persisted once it reaches the consistent
> > point. I am still analyzing the fix and will share once ready.
>
> Thanks! I'm wondering about the performance impact (even in fast_forward
> mode); it might be worth keeping an eye on it.
>

True, we can consider performance but correctness should be a
priority, and can we think of a better way to fix this issue?

> Should we create a 17 open item [1]?
>
> [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items
>

Yes, we can do that.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-03-28 Thread Amit Kapila
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
 wrote:
>
> While analyzing one BF error[1], we found an issue with slotsync: Since we
> don't perform logical decoding for the synced slots when syncing the
> lsn/xmin of a slot, no logical snapshots will be serialized to disk. So,
> when a user starts to use these synced slots after promotion, the consistent
> snapshot needs to be rebuilt from the restart_lsn if the WAL
> (xl_running_xacts) at the restart_lsn position indicates that there are
> running transactions. This, however, could cause data from before the
> consistent point to be missed[2].
>
> This issue doesn't exist on the primary because the snapshot at restart_lsn
> should have been serialized to disk (SnapBuildProcessRunningXacts ->
> SnapBuildSerialize), so even if the logical decoding restarts, it can find
> consistent snapshot immediately at restart_lsn.
>
> To fix this, we could use the fast forward logical decoding to advance the 
> synced
> slot's lsn/xmin when syncing these values instead of directly updating the
> slot's info. This way, the snapshot will be serialized to disk when decoding.
> If we could not reach to the consistent point at the remote restart_lsn, the
> slot is marked as temp and will be persisted once it reaches the consistent
> point. I am still analyzing the fix and will share once ready.
>

Yes, we can use this, but one thing to note is that
CreateDecodingContext() will expect the slot's database and the current
database to be the same. I think the reason for that is that we need to
check system tables of the current database while decoding and sending
data to the output_plugin, which won't be a requirement for the
fast_forward case. So, we need to skip that check in fast_forward mode.

Next, I was thinking about the case of the first time updating the
restart and confirmed_flush LSN while syncing the slots. I think we
can keep the current logic as it is based on the following analysis.

For each logical slot, cases possible on the primary:
1. The restart_lsn doesn't have a serialized snapshot and hasn't yet
reached the consistent point.
2. The restart_lsn doesn't have a serialized snapshot but has reached
a consistent point.
3. The restart_lsn has a serialized snapshot which means it has
reached a consistent point as well.

Considering we keep the logic to reserve initial WAL positions the
same as the current (Reserve WAL for the currently active local slot
using the specified WAL location (restart_lsn). If the given WAL
location has been removed, reserve WAL using the oldest existing WAL
segment.), I could think of the below scenarios:
A. For 1, we shouldn't sync the slot as it still wouldn't have been
marked persistent on the primary.
B. For 2, we would sync the slot
   B1. If remote_restart_lsn >= local_restart_lsn, then advance the
slot by calling pg_logical_replication_slot_advance().
   B11. If we reach consistent point, then it should be okay
because after promotion as well we should reach consistent point.
B111. But again is it possible that there is some xact
that comes before consistent_point on primary and the same is after
consistent_point on standby? This shouldn't matter as we will start
decoding transactions after confirmed_flush_lsn which would be the
same on primary and standby.
   B22. If we haven't reached consistent_point, then we won't mark
the slot as persistent, and at the next sync we will do the same till
it reaches consistent_point. At that time, the situation will be
similar to B11.
   B2. If remote_restart_lsn < local_restart_lsn, then we will wait
for the next sync cycle and keep the slot as temporary. Once in the
next or some consecutive sync cycle, we reach the condition
remote_restart_lsn >= local_restart_lsn, we will proceed to advance
the slot and we should have the same behavior as B1.
C. For 3, we would sync the slot, but the behavior should be the same as B.
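
Roughly, the per-slot decision flow in case B above could be sketched like
this (all helper names are hypothetical stand-ins, just to make the control
flow concrete):

```
static void
sync_one_slot(RemoteSlot *remote_slot, ReplicationSlot *slot)
{
	bool		found_consistent_point = false;

	/*
	 * B2: the remote slot is behind the local one; keep the synced slot
	 * temporary and retry in the next sync cycle.
	 */
	if (remote_slot->restart_lsn < slot->data.restart_lsn)
		return;

	/*
	 * B1: advance via fast-forward decoding so that the required logical
	 * snapshots get serialized to disk along the way.
	 */
	pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
										&found_consistent_point);

	if (found_consistent_point)
		ReplicationSlotPersist();	/* B11: safe to use after promotion */
	/* else B22: stay temporary; the next sync cycle will try again */
}
```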

Thoughts?

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-03-28 Thread Bertrand Drouvot
Hi,

On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote:
> Hi,
> 
> While analyzing one BF error[1], we found an issue with slotsync: Since we
> don't perform logical decoding for the synced slots when syncing the
> lsn/xmin of a slot, no logical snapshots will be serialized to disk. So,
> when a user starts to use these synced slots after promotion, the consistent
> snapshot needs to be rebuilt from the restart_lsn if the WAL
> (xl_running_xacts) at the restart_lsn position indicates that there are
> running transactions. This, however, could cause data from before the
> consistent point to be missed[2].

I see, nice catch and explanation, thanks!

> This issue doesn't exist on the primary because the snapshot at restart_lsn
> should have been serialized to disk (SnapBuildProcessRunningXacts ->
> SnapBuildSerialize), so even if the logical decoding restarts, it can find
> consistent snapshot immediately at restart_lsn.

Right.

> To fix this, we could use the fast forward logical decoding to advance the 
> synced
> slot's lsn/xmin when syncing these values instead of directly updating the
> slot's info. This way, the snapshot will be serialized to disk when decoding.
> If we could not reach to the consistent point at the remote restart_lsn, the
> slot is marked as temp and will be persisted once it reaches the consistent
> point. I am still analyzing the fix and will share once ready.

Thanks! I'm wondering about the performance impact (even in fast_forward
mode); it might be worth keeping an eye on it.

Should we create a 17 open item [1]?

[1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-03-27 Thread Zhijie Hou (Fujitsu)
Hi,

While analyzing one BF error[1], we found an issue with slotsync: Since we
don't perform logical decoding for the synced slots when syncing the lsn/xmin
of a slot, no logical snapshots will be serialized to disk. So, when a user
starts to use these synced slots after promotion, the consistent snapshot
needs to be rebuilt from the restart_lsn if the WAL (xl_running_xacts) at the
restart_lsn position indicates that there are running transactions. This,
however, could cause data from before the consistent point to be missed[2].

This issue doesn't exist on the primary because the snapshot at restart_lsn
should have been serialized to disk (SnapBuildProcessRunningXacts ->
SnapBuildSerialize), so even if the logical decoding restarts, it can find
consistent snapshot immediately at restart_lsn.

To fix this, we could use fast-forward logical decoding to advance the synced
slot's lsn/xmin when syncing these values instead of directly updating the
slot's info. This way, the snapshot will be serialized to disk when decoding.
If we cannot reach the consistent point at the remote restart_lsn, the slot is
marked as temporary and will be persisted once it reaches the consistent
point. I am still analyzing the fix and will share it once ready.


[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2024-03-19%2010%3A03%3A06
[2] The steps to reproduce the data miss issue on a primary->standby setup:
 
Note, we need to set LOG_SNAPSHOT_INTERVAL_MS to a bigger number (150) to
prevent a concurrent LogStandbySnapshot() call, and enable
sync_replication_slots on the standby.
 
1. Create a failover logical slot on the primary.
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 
'test_decoding', false, false, true);
 
2. Use the following steps to advance the restart_lsn of the failover slot to a
position where the xl_running_xacts at that position indicates that there is a
running transaction.
 
TXN1
BEGIN;
create table dummy1(a int);
 
TXN2
SELECT pg_log_standby_snapshot();
 
TXN1
COMMIT;

TXN1
BEGIN;
create table dummy2(a int);
 
TXN2
SELECT pg_log_standby_snapshot();
 
TXN1
COMMIT;
 
-- the restart_lsn will be advanced to a position where there was 1 running
transaction. And we need to wait for the restart_lsn to be synced to the
standby.
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());
 
-- insert some data here before calling next pg_log_standby_snapshot().
INSERT INTO reptable VALUES(999);
 
3. Promote the standby and try to consume the change(999) from the synced slot
on the standby. We will find that no change is returned.
 
select * from pg_logical_slot_get_changes('logicalslot', NULL, NULL);

Best Regards,
Hou zj




Re: Synchronizing slots from primary to standby

2024-03-15 Thread Bertrand Drouvot
Hi,

On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> Hi,
> 
> Since the standby_slot_names patch has been committed, I am attaching the last
> doc patch for review.
> 

Thanks!

1 ===

+   continue subscribing to publications now on the new primary server without
+   any data loss.

I think "without any data loss" should be re-worded in this context. Data loss
in the sense "data committed on the primary and not visible on the subscriber in
case of failover" can still occurs (in case synchronous replication is not 
used).

2 ===

+   If the result (failover_ready) of both above steps is
+   true, existing subscriptions will be able to continue without data loss.
+  

I don't think that's true if synchronous replication is not used. Say,

- synchronous replication is not used
- primary is not able to reach the standby anymore and standby_slot_names is set
- new data is inserted into the primary
- then not replicated to subscriber (due to standby_slot_names)

Then I think both of the above steps will return true but data would be lost
in case of failover.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-03-13 Thread Zhijie Hou (Fujitsu)
Hi,

Since the standby_slot_names patch has been committed, I am attaching the last
doc patch for review.

Best Regards,
Hou zj


v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch
Description:  v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch


Re: Synchronizing slots from primary to standby

2024-03-07 Thread shveta malik
On Fri, Mar 8, 2024 at 9:56 AM Ajin Cherian  wrote:
>
>> Pushed with minor modifications. I'll keep an eye on BF.
>>
>> BTW, one thing that we should try to evaluate a bit more is the
>> traversal of slots in StandbySlotsHaveCaughtup() where we verify if
>> all the slots mentioned in standby_slot_names have received the
>> required WAL. Even if the standby_slot_names list is short, the total
>> number of slots can be much larger, which can lead to an increase in
>> CPU usage during traversal. There is an optimization that allows
>> caching ss_oldest_flush_lsn and ensures that we don't need to traverse
>> the slots each time, so it may not hit frequently, but still there is
>> a chance. I see it is possible to further optimize this area by
>> caching the position of each slot mentioned in standby_slot_names in
>> the replication_slots array, but I am not sure whether it is worth it.
>>
>>
>
> I tried to test this by configuring a large number of logical slots while 
> making sure the standby slots are at the end of the array and checking if 
> there was any performance hit in logical replication from these searches.
>

Thanks Ajin and Nisha.

We also plan:
1) Redoing XLogSendLogical time-log related test with
'sync_replication_slots' enabled.
2) pg_recvlogical test to monitor lag in StandbySlotsHaveCaughtup()
for a large number of slots.
3) Profiling to see if StandbySlotsHaveCaughtup() is noticeable in the
report when there are a large number of slots to traverse.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-03-07 Thread Ajin Cherian
On Fri, Mar 8, 2024 at 2:33 PM Amit Kapila  wrote:

> On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> >
> > Attach the V108 patch set which addressed above and Peter's comments.
> > I also removed the check for "*" in guc check hook.
> >
>
>
> Pushed with minor modifications. I'll keep an eye on BF.
>
> BTW, one thing that we should try to evaluate a bit more is the
> traversal of slots in StandbySlotsHaveCaughtup() where we verify if
> all the slots mentioned in standby_slot_names have received the
> required WAL. Even if the standby_slot_names list is short, the total
> number of slots can be much larger, which can lead to an increase in
> CPU usage during traversal. There is an optimization that allows
> caching ss_oldest_flush_lsn and ensures that we don't need to traverse
> the slots each time, so it may not hit frequently, but still there is
> a chance. I see it is possible to further optimize this area by
> caching the position of each slot mentioned in standby_slot_names in
> the replication_slots array, but I am not sure whether it is worth it.
>
>
>
I tried to test this by configuring a large number of logical slots while
making sure the standby slots are at the end of the array and checking if
there was any performance hit in logical replication from these searches.

Setup:
1. 1 primary server configured with 3 servers in the standby_slot_names, 1
extra logical slot (not configured for failover) + 1 logical subscriber
configured as failover + 3 physical standbys (all configured to sync logical
slots)

2. 1 primary server configured with 3 servers in the standby_slot_names,
100 extra logical slots (not configured for failover) + 1 logical subscriber
configured as failover + 3 physical standbys (all configured to sync logical
slots)

3. 1 primary server configured with 3 servers in the standby_slot_names,
500 extra logical slots (not configured for failover) + 1 logical subscriber
configured as failover + 3 physical standbys (all configured to sync logical
slots)

In the three setups, the 3 slots in standby_slot_names are compared against
lists of 2, 101, and 501 slots respectively.
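
To reproduce such a setup, the extra non-failover slots can be created with
something like the following (hypothetical slot names; requires
wal_level=logical and max_replication_slots raised accordingly):

    -- Create 500 plain logical slots that are *not* failover-enabled, so
    -- they only pad the ReplicationSlotCtl->replication_slots array.
    SELECT pg_create_logical_replication_slot('extra_slot_' || g, 'pgoutput')
    FROM generate_series(1, 500) AS g;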

I ran pgbench for 15 minutes for each of the 3 setups:

Case 1: Average TPS - 8.143399 TPS
Case 2: Average TPS - 8.187462 TPS
Case 3: Average TPS - 8.190611 TPS

I see no degradation in performance; the differences are well within the
normal run-to-run variation.
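
The runs above would have used an invocation along these lines (scale factor
and client counts are assumptions; the thread does not state them):

    pgbench -i -s 10 postgres           # initialize, assumed scale factor
    pgbench -c 4 -j 4 -T 900 postgres   # 15-minute load run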


Nisha also ran some performance tests to record the lag introduced by
traversing a large number of slots in StandbySlotsHaveCaughtup(). The tests
logged the time at the start and end of each XLogSendLogical() call (which
eventually calls WalSndWaitForWal() --> StandbySlotsHaveCaughtup()) and
calculated the total time taken by this function during the load run for
different total slot counts.

Setup:
--one primary with 3 standbys and one subscriber with one active
subscription
--hot_standby_feedback=off and sync_replication_slots=false
--made sure the standby slots remain at the end of the
ReplicationSlotCtl->replication_slots array, to measure the worst-case
scenario for the standby-slot search in StandbySlotsHaveCaughtup()

pgbench was run for 15 minutes. Here is the data:

Case1 : with 1 logical slot, standby_slot_names having 3 slots
Run1: 626.141642 secs
Run2: 631.930254 secs

Case2 : with 100 logical slots, standby_slot_names having 3 slots
Run1: 629.38332 secs
Run2: 630.548432 secs

Case3 : with 500 logical slots, standby_slot_names having 3 slots
Run1: 629.910829 secs
Run2: 627.924183 secs

No degradation in performance was seen.
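
The instrumentation itself is not shown in the thread; a minimal sketch of the
kind of timing wrapper described, using PostgreSQL's instr_time facilities,
might look like this (hypothetical wrapper; the real measurement patch is not
part of this thread):

    #include "postgres.h"
    #include "portability/instr_time.h"

    /* Accumulated time spent inside XLogSendLogical(), which reaches
     * StandbySlotsHaveCaughtup() via WalSndWaitForWal(). Zero-initialized
     * as a static. */
    static instr_time total_xlog_send_time;

    static void
    XLogSendLogicalTimed(void)
    {
        instr_time  start;
        instr_time  duration;

        INSTR_TIME_SET_CURRENT(start);
        XLogSendLogical();
        INSTR_TIME_SET_CURRENT(duration);
        INSTR_TIME_SUBTRACT(duration, start);
        INSTR_TIME_ADD(total_xlog_send_time, duration);

        elog(DEBUG1, "XLogSendLogical cumulative: %0.3f s",
             INSTR_TIME_GET_DOUBLE(total_xlog_send_time));
    }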

Thanks Nisha for helping with the testing.

regards,
Ajin Cherian
Fujitsu Australia


Re: Synchronizing slots from primary to standby

2024-03-07 Thread Amit Kapila
On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu)
 wrote:
>
>
> Attached is the V108 patch set, which addresses the above and Peter's comments.
> I also removed the check for "*" in the GUC check hook.
>


Pushed with minor modifications. I'll keep an eye on BF.

BTW, one thing that we should try to evaluate a bit more is the
traversal of slots in StandbySlotsHaveCaughtup(), where we verify that
all the slots mentioned in standby_slot_names have received the
required WAL. Even if the standby_slot_names list is short, the total
number of slots can be much larger, which can lead to an increase in
CPU usage during the traversal. There is an optimization that allows us
to cache ss_oldest_flush_lsn and ensures that we don't need to traverse
the slots each time, so this cost may not be hit frequently, but there
is still a chance. I see it is possible to further optimize this area
by caching the position of each slot mentioned in standby_slot_names in
the replication_slots array, but I am not sure whether it is worth it.
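
To make the cached fast path concrete, the shape of the check is roughly as
follows (a simplified sketch, not the committed function):

    /* Cached minimum confirmed flush LSN across the standby_slot_names
     * slots; reset to InvalidXLogRecPtr whenever the GUC changes. */
    static XLogRecPtr ss_oldest_flush_lsn = InvalidXLogRecPtr;

    static bool
    StandbySlotsHaveCaughtup_sketch(XLogRecPtr wait_for_lsn)
    {
        /* Fast path: the cached oldest flush LSN already covers the
         * requested position, so no slot traversal is needed. */
        if (!XLogRecPtrIsInvalid(ss_oldest_flush_lsn) &&
            ss_oldest_flush_lsn >= wait_for_lsn)
            return true;

        /* Slow path (elided): walk ReplicationSlotCtl->replication_slots,
         * locate each slot named in standby_slot_names, take the minimum
         * of their confirmed flush LSNs, and refresh ss_oldest_flush_lsn. */
        return false;
    }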

-- 
With Regards,
Amit Kapila.




RE: Synchronizing slots from primary to standby

2024-03-06 Thread Zhijie Hou (Fujitsu)
On Thursday, March 7, 2024 12:46 PM Amit Kapila  wrote:
> 
> On Thu, Mar 7, 2024 at 7:35 AM Peter Smith 
> wrote:
> >
> > Here are some review comments for v107-0001
> >
> > ==
> > src/backend/replication/slot.c
> >
> > 1.
> > +/*
> > + * Struct for the configuration of standby_slot_names.
> > + *
> > + * Note: this must be a flat representation that can be held in a single 
> > chunk
> > + * of guc_malloc'd memory, so that it can be stored as the "extra" data for
> the
> > + * standby_slot_names GUC.
> > + */
> > +typedef struct
> > +{
> > + int slot_num;
> > +
> > + /* slot_names contains nmembers consecutive nul-terminated C strings */
> > + char slot_names[FLEXIBLE_ARRAY_MEMBER];
> > +} StandbySlotConfigData;
> > +
> >
> > 1a.
> > To avoid any ambiguity that this 1st field is somehow a slot ID number, I
> > felt a better name would be 'nslotnames' or even just 'n' or 'count'.
> >
> 
> We can probably just add a comment above slot_num and that should be
> sufficient, but I am fine with 'nslotnames' as well; in any case, let's
> add a comment for the same.

Added.

> 
> >
> > 6b.
> > IMO this function would be tidier if written such that the
> > MyReplicationSlot->data.name is passed as a parameter. Then you can
> > name the function more naturally like:
> >
> > IsSlotInStandbySlotNames(const char *slot_name)
> >
> 
> +1. How about naming it SlotExistsinStandbySlotNames(char
> *slot_name) and passing the slot_name from MyReplicationSlot? Otherwise,
> we need an Assert for MyReplicationSlot in this function.

Changed as suggested.

> 
> Also, can we add a comment like below before the loop:
> + /*
> + * XXX: We are not expecting this list to be long so a linear search
> + * shouldn't hurt but if that turns out not to be true then we can cache
> + * this information for each WalSender as well.
> + */

Added.
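
Put together, the helper after these two changes looks roughly like this
(reconstructed from the review exchange; see the committed patch for the
authoritative version):

    static bool
    SlotExistsInStandbySlotNames(const char *slot_name)
    {
        const char *name;

        /* Return false if there is no value in standby_slot_names */
        if (standby_slot_names_config == NULL)
            return false;

        /*
         * XXX: We are not expecting this list to be long so a linear search
         * shouldn't hurt but if that turns out not to be true then we can
         * cache this information for each WalSender as well.
         */
        name = standby_slot_names_config->slot_names;
        for (int i = 0; i < standby_slot_names_config->nslotnames; i++)
        {
            if (strcmp(name, slot_name) == 0)
                return true;

            name += strlen(name) + 1;
        }

        return false;
    }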

Attached is the V108 patch set, which addresses the above and Peter's comments.
I also removed the check for "*" in the GUC check hook.

Best Regards,
Hou zj



v108-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
Description:  v108-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch


RE: Synchronizing slots from primary to standby

2024-03-06 Thread Zhijie Hou (Fujitsu)
On Thursday, March 7, 2024 10:05 AM Peter Smith  wrote:
> 
> Here are some review comments for v107-0001

Thanks for the comments.

> 
> ==
> src/backend/replication/slot.c
> 
> 1.
> +/*
> + * Struct for the configuration of standby_slot_names.
> + *
> + * Note: this must be a flat representation that can be held in a single
> + * chunk of guc_malloc'd memory, so that it can be stored as the "extra"
> + * data for the standby_slot_names GUC.
> + */
> +typedef struct
> +{
> + int slot_num;
> +
> + /* slot_names contains nmembers consecutive nul-terminated C strings */
> + char slot_names[FLEXIBLE_ARRAY_MEMBER];
> +} StandbySlotConfigData;
> +
> 
> 1a.
> To avoid any ambiguity that this 1st field is somehow a slot ID number, I
> felt a better name would be 'nslotnames' or even just 'n' or 'count'.

Changed to 'nslotnames'.

> 
> ~
> 
> 1b.
> (fix typo)
> 
> SUGGESTION for the 2nd field comment
> slot_names is a chunk of 'n' X consecutive null-terminated C strings

Changed.

> 
> ~
> 
> 1c.
> A more explanatory name for this typedef maybe is
> 'StandbySlotNamesConfigData' ?

Changed.
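
After the renames in 1a and 1c, the struct reads roughly as follows
(reconstructed from the quoted review; the committed version may differ in
wording):

    /*
     * Struct for the configuration of standby_slot_names.
     *
     * Note: this must be a flat representation that can be held in a single
     * chunk of guc_malloc'd memory, so that it can be stored as the "extra"
     * data for the standby_slot_names GUC.
     */
    typedef struct
    {
        /* Number of slot names in slot_names[] */
        int     nslotnames;

        /* slot_names contains 'nslotnames' consecutive null-terminated C strings */
        char    slot_names[FLEXIBLE_ARRAY_MEMBER];
    } StandbySlotNamesConfigData;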

> 
> ~~~
> 
> 
> 2.
> +/* This is parsed and cached configuration for standby_slot_names */
> +static StandbySlotConfigData *standby_slot_config;
> 
> 
> 2a.
> /This is parsed and cached configuration for .../This is the parsed and cached
> configuration for .../

Changed.

> 
> ~
> 
> 2b.
> Similar to above -- since this only has name information maybe it is more
> correct to call it 'standby_slot_names_config'?
> 


Changed.

> ~~~
> 
> 3.
> +/*
> + * A helper function to validate slots specified in GUC standby_slot_names.
> + *
> + * The rawname will be parsed, and the parsed result will be saved into
> + * *elemlist.
> + */
> +static bool
> +validate_standby_slots(char *rawname, List **elemlist)
> 
> /and the parsed result/and the result/
> 


Changed.

> ~~~
> 
> 4. check_standby_slot_names
> 
> + /* Need a modifiable copy of string */
> + rawname = pstrdup(*newval);
> 
> /copy of string/copy of the GUC string/
> 


Changed.

> ~~~
> 
> 5.
> +assign_standby_slot_names(const char *newval, void *extra)
> +{
> + /*
> + * The standby slots may have changed, so we must recompute the oldest
> + * LSN.
> + */
> + ss_oldest_flush_lsn = InvalidXLogRecPtr;
> +
> + standby_slot_config = (StandbySlotConfigData *) extra;
> +}
> 
> To avoid leaking, don't we need to take care somewhere to free any memory
> used by a previous value (if any) of this 'standby_slot_config'?
> 

The memory for "extra" is maintained by the GUC mechanism; it is
automatically freed when the associated GUC setting is no longer of interest.
See src/backend/utils/misc/README for details.
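
In other words, this is the standard GUC extra-data protocol: the check hook
allocates the blob with guc_malloc() and hands it over via *extra, and the
assign hook only repoints the cached pointer; guc.c frees old chunks once no
value references them. A simplified sketch of the convention (parsing elided;
not the verbatim patch code):

    static bool
    check_standby_slot_names(char **newval, void **extra, GucSource source)
    {
        StandbySlotNamesConfigData *config;

        /* ... validate and parse *newval, then size and fill 'config'
         * here with guc_malloc(LOG, ...) (elided) ... */
        config = (StandbySlotNamesConfigData *)
            guc_malloc(LOG, offsetof(StandbySlotNamesConfigData, slot_names));
        config->nslotnames = 0;

        *extra = config;        /* guc.c now owns and will free this chunk */
        return true;
    }

    static void
    assign_standby_slot_names(const char *newval, void *extra)
    {
        /* No pfree() of the previous extra here; guc.c handles that. */
        ss_oldest_flush_lsn = InvalidXLogRecPtr;
        standby_slot_names_config = (StandbySlotNamesConfigData *) extra;
    }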

 
> ~~~
> 
> 6. AcquiredStandbySlot
> 
> +/*
> + * Return true if the currently acquired slot is specified in
> + * standby_slot_names GUC; otherwise, return false.
> + */
> +bool
> +AcquiredStandbySlot(void)
> +{
> + const char *name;
> +
> + /* Return false if there is no value in standby_slot_names */
> + if (standby_slot_config == NULL)
> + return false;
> +
> + name = standby_slot_config->slot_names;
> + for (int i = 0; i < standby_slot_config->slot_num; i++)
> + {
> + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0)
> + return true;
> +
> + name += strlen(name) + 1;
> + }
> +
> + return false;
> +}
> 
> 6a.
> Just checking "(standby_slot_config == NULL)" doesn't seem enough to me,
> because IIUIC it is possible that when 'standby_slot_names' has no value,
> standby_slot_config is not NULL but standby_slot_config->slot_num is 0.

The standby_slot_config will always be NULL if there is no value in it.

While checking, I did find a rare case: if standby_slot_names contains only
whitespace, then slot_num would be 0. I have fixed it so that
standby_slot_config is always NULL if there is no meaningful value in the GUC.

> 
> ~
> 
> 6b.
> IMO this function would be tidier if written such that the
> MyReplicationSlot->data.name is passed as a parameter. Then you can
> name the function more naturally like:
> 
> IsSlotInStandbySlotNames(const char *slot_name)

Changed it to SlotExistsInStandbySlotNames.

> 
> ~
> 
> 6c.
> IMO the body of the function will be tidier if written so there are only
> 2 returns instead of 3, like
> 
> SUGGESTION:
> if (...)
> {
>   for (...)
>   {
>  ...
> return true;
>   }
> }
> return false;

I personally prefer the current style.

> 
> ~~~
> 
> 7.
> + /*
> + * Don't need to wait for the standbys to catch up if there is no value
> + * in standby_slot_names.
> + */
> + if (standby_slot_config == NULL)
> + return true;
> 
> (similar to a previous review comment)
> 
> This check doesn't seem enough because IIUIC it is possible that when
> 'standby_slot_names' has no value, standby_slot_config is not NULL but
> standby_slot_config->slot_num is 0.

Same as above.

> 
> ~~~
> 
> 8. WaitForStandbyConfirmation
> 
> + /*
> + * Don't need to wait for the standby to catch up if the current acquired
> + * slot is not a 

Re: Synchronizing slots from primary to standby

2024-03-06 Thread Amit Kapila
On Thu, Mar 7, 2024 at 8:37 AM shveta malik  wrote:
>

I thought about whether we can make standby_slot_names USERSET
instead of SIGHUP, and it doesn't sound like a good idea, as that could
lead to inconsistent standby replicas even after configuring the
correct value of standby_slot_names. One could set a different or ''
(empty) value for a particular session and consume all changes from
the slot without waiting for the standby to acknowledge the change.
Also, it would be difficult for users to ensure that the standby is
always ahead of the subscribers. Does anyone think differently?
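
To make the hazard concrete: were the GUC USERSET, any individual session
could simply opt out of the wait (hypothetical session, for illustration only):

    -- Only possible if standby_slot_names were USERSET; as a SIGHUP
    -- parameter, SET fails with "cannot be changed now".
    SET standby_slot_names = '';
    SELECT * FROM pg_logical_slot_get_changes('failover_slot', NULL, NULL);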

-- 
With Regards,
Amit Kapila.



