On Mon, Apr 8, 2024 at 7:01 PM Zhijie Hou (Fujitsu)
<houzj.f...@fujitsu.com> wrote:
>
> Thanks for pushing.
>
> I checked the BF status, and noticed one BF failure, which I think is
> related to a miss in the test code.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27
>
> From the following log, I can see the sync failed because the standby is
> lagging behind of the failover slot.
>
> -----
> # No postmaster PID for node "cascading_standby"
> error running SQL: 'psql:<stdin>:1: ERROR: skipping slot synchronization as
> the received slot sync LSN 0/4000148 for slot "snap_test_slot" is ahead of
> the standby position 0/4000114'
> while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI
> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT
> pg_sync_replication_slots();' at
> /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> line 2042.
> # Postmaster PID for node "publisher" is 3715298
> -----
>
> I think it's because we missed to call wait_for_replay_catchup before
> syncing slots.
>
> -----
> $primary->safe_psql('postgres',
>     "SELECT pg_create_logical_replication_slot('snap_test_slot',
> 'test_decoding', false, false, true);"
> );
> # ? missed to wait here
> $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
> -----
>
> While testing, I noticed another place where we were calling
> wait_for_replay_catchup before doing pg_replication_slot_advance, which
> also has a small possibility to cause the failover slot to be ahead of the
> standby if some logs are written in between these two steps. So, I
> adjusted them together.
>
> Here is a small patch to improve the test.
>
LGTM. I'll push this tomorrow morning unless there are any more comments
or suggestions.

--
With Regards,
Amit Kapila.