Re: Race conditions in 019_replslot_limit.pl

Andres Freund Wed, 16 Feb 2022 18:15:04 -0800

Hi,

On 2022-02-16 20:22:23 -0500, Tom Lane wrote:
> There's no disconnection log entry for either, which I suppose means
> that somebody didn't bother logging disconnection for walsenders ...


The thing is, we actually *do* log disconnection for walsenders:

2022-02-16 17:53:55.780 PST [2459806][walsender][:0] LOG:  disconnection: 
session time: 0:00:01.321 user=andres database= host=[local]

The only way I see for the disconnection not to be logged is an immediate
shutdown.

It'd be one thing if the missing walsender disconnect wasn't logged because we
shut down just after. But unless I misunderstand something, between the
basebackup and the failing test, we actually start standby_3:

# Running: pg_basebackup -D 
/home/bf/build/buildfarm-serinus/HEAD/pgsql.build/src/test/recovery/tmp_check/t_019_replslot_limit_primary3_data/backup/my_backup
 -h /tmp/hyrtnyd2RU -p 53774 --checkpoint fast --no-sync
# Backup finished

...

# Running: pg_ctl -w -D 
/home/bf/build/buildfarm-serinus/HEAD/pgsql.build/src/test/recovery/tmp_check/t_019_replslot_limit_standby_3_data/pgdata
 -l 
/home/bf/build/buildfarm-serinus/HEAD/pgsql.build/src/test/recovery/tmp_check/log/019_replslot_limit_standby_3.log
 -o --cluster-name=standby_3 start

and also wait till replication has caught up:

Waiting for replication conn standby_3's replay_lsn to pass 0/700000 on primary3
done

and only then we have

not ok 17 - have walsender pid 3682154
# 3682136

#   Failed test 'have walsender pid 3682154
# 3682136'
#   at t/019_replslot_limit.pl line 335.
#                   '3682154
# 3682136'
#     doesn't match '(?^:^[0-9]+$)'


Starting a node in recovery and having it connect to the primary seems like a
mighty long time for a process to exit, unless it's stuck behind something.

Greetings,

Andres Freund

Re: Race conditions in 019_replslot_limit.pl

Reply via email to