On Fri, Feb 16, 2024 at 11:43 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> Thanks for noticing this. I have pushed all your debug patches. Let's
> hope if there is a BF failure next time, we can gather enough
> information to know the reason of the same.
>
There is a new BF failure [1] after adding these LOGs, and I think I know what is going wrong. First, let's look at the standby's LOGs:

2024-02-16 06:18:18.442 UTC [241414][client backend][2/14:0] DEBUG:  segno: 4 of purposed restart_lsn for the synced slot, oldest_segno: 4 available
2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] DEBUG:  xmin required by slots: data 0, catalog 741
2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] LOG:  could not sync slot information as remote slot precedes local slot: remote slot "lsub1_slot": LSN (0/4000168), catalog xmin (739) local slot: LSN (0/4000168), catalog xmin (741)

So, from the above LOG, it is clear that the remote slot's catalog xmin (739) precedes the local catalog xmin (741), which prevents the sync on the standby from completing. Next, let's look at the LOG from the primary around the same time:

2024-02-16 06:18:11.354 UTC [238037][autovacuum worker][5/17:0] DEBUG:  analyzing "pg_catalog.pg_depend"
2024-02-16 06:18:11.360 UTC [238037][autovacuum worker][5/17:0] DEBUG:  "pg_depend": scanned 13 of 13 pages, containing 1709 live rows and 0 dead rows; 1709 rows in sample, 1709 estimated total rows
...
2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/0:0] DEBUG:  Autovacuum VacuumUpdateCosts(db=1, rel=14050, dobalance=yes, cost_limit=200, cost_delay=2 active=yes failsafe=no)
2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/19:0] DEBUG:  analyzing "information_schema.sql_features"
2024-02-16 06:18:11.377 UTC [238037][autovacuum worker][5/19:0] DEBUG:  "sql_features": scanned 8 of 8 pages, containing 756 live rows and 0 dead rows; 756 rows in sample, 756 estimated total rows

This shows that an autovacuum worker has analyzed catalog tables, and to update their statistics in the pg_statistic table, it would have acquired a new transaction id. So, after the slot creation, a new transaction id that updated the catalog was generated on the primary and would have been replicated to the standby.
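As an aside, the divergence described above can be observed directly from the pg_replication_slots view on each node. This is a hypothetical manual check for anyone reproducing the issue, not something the test itself runs:

```sql
-- Run on both the primary and the standby. If autovacuum assigned a new
-- xid on the primary after slot creation, the primary slot's catalog_xmin
-- will lag behind the catalog_xmin of the standby's synced copy.
SELECT slot_name, catalog_xmin, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'lsub1_slot';
```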
Due to this, the catalog_xmin of the primary's slot would precede the standby's catalog_xmin, and we see this failure. As per this theory, we should disable autovacuum on the primary to avoid such updates to the catalog_xmin values.

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-02-16%2006%3A12%3A59

--
With Regards,
Amit Kapila.