Andres Freund <[email protected]> writes:
> I think the test is telling us that something may be broken. We shouldn't
> silence that without at least some understanding what it is.
I looked at the recent failure on komodoensis [1], and I think what is
happening is just that the walsender for the basebackup run (launched
at 019_replslot_limit.pl line 325) hasn't exited yet at the point where
we do a blind
"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walsender'"
and expect that we're only going to see the walsender launched for
the standby at line 331. The two PIDs reported in the failure
correspond to this postmaster log trace:
2022-02-16 23:06:29.596 CET [620d7565.38dd62:1] LOG: connection received:
host=[local]
2022-02-16 23:06:29.596 CET [620d7565.38dd62:2] LOG: replication connection
authorized: user=bf application_name=019_replslot_limit.pl
2022-02-16 23:06:29.596 CET [620d7565.38dd62:3] LOG: received replication
command: SHOW data_directory_mode
2022-02-16 23:06:29.596 CET [620d7565.38dd62:4] STATEMENT: SHOW
data_directory_mode
2022-02-16 23:06:29.596 CET [620d7565.38dd62:5] LOG: received replication
command: CREATE_REPLICATION_SLOT "pg_basebackup_3726690" TEMPORARY PHYSICAL (
RESERVE_WAL)
2022-02-16 23:06:29.596 CET [620d7565.38dd62:6] STATEMENT:
CREATE_REPLICATION_SLOT "pg_basebackup_3726690" TEMPORARY PHYSICAL (
RESERVE_WAL)
2022-02-16 23:06:29.597 CET [620d7565.38dd62:7] LOG: received replication
command: IDENTIFY_SYSTEM
2022-02-16 23:06:29.597 CET [620d7565.38dd62:8] STATEMENT: IDENTIFY_SYSTEM
2022-02-16 23:06:29.597 CET [620d7565.38dd62:9] LOG: received replication
command: START_REPLICATION SLOT "pg_basebackup_3726690" 0/600000 TIMELINE 1
2022-02-16 23:06:29.597 CET [620d7565.38dd62:10] STATEMENT: START_REPLICATION
SLOT "pg_basebackup_3726690" 0/600000 TIMELINE 1
and this one:
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:1] LOG: connection received:
host=[local]
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:2] LOG: replication connection
authorized: user=bf application_name=standby_3
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:3] LOG: received replication
command: IDENTIFY_SYSTEM
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:4] STATEMENT: IDENTIFY_SYSTEM
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:5] LOG: received replication
command: START_REPLICATION SLOT "rep3" 0/700000 TIMELINE 1
2022-02-16 23:06:29.687 CET [620d7565.38dd6f:6] STATEMENT: START_REPLICATION
SLOT "rep3" 0/700000 TIMELINE 1
There's no disconnection log entry for either, which I suppose means
that somebody didn't bother logging disconnection for walsenders ...
shouldn't we fix that? But in any case, I don't see anything
interesting here, just a query that needs to be more selective.
Perhaps we can look for application_name=standby_3?
regards, tom lane
[1]
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2022-02-16%2021%3A16%3A04