Starting 2019-11-17, jacana and bowerbird (different compiler, same machine?)
have failed four times like this:

# poll_query_until timed out executing this query:
# SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM 
pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
# expecting this output:
# t
# last actual query output:
# 
# with stderr:
# Looks like your test exited with 25 just after 8.
[11:06:11] t/001_rep_changes.pl .. 
Dubious, test returned 25 (wstat 6400, 0x1900)
Failed 9/17 subtests 

Every such run:
  sysname  │      snapshot       │ branch │                                     
        bfurl
───────────┼─────────────────────┼────────┼───────────────────────────────────────────────────────────────────────────────────────────────
 bowerbird │ 2019-11-17 15:22:42 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2019-11-17%2015%3A22%3A42
 bowerbird │ 2020-01-10 17:30:49 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2020-01-10%2017%3A30%3A49
 jacana    │ 2020-04-05 00:00:27 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2020-04-05%2000%3A00%3A27
 jacana    │ 2020-04-16 00:00:27 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2020-04-16%2000%3A00%3A27

The dates and branches suggest either a v13 regression (hence my concern) or a
regression in the underlying machine.  In each affected run, other tests
completed at their normal speed, but 001_rep_changes.pl failed after 397s (a
normal run takes <30s).  In the publisher log, the failed run[1] is missing
"replication connection authorized", "IDENTIFY_SYSTEM", etc.  Subscriber logs
do not differ; failed and good runs both have 'logical replication apply
worker for subscription "tap_sub" has started'.  That suggests a subscriber
stuck in ApplyWorkerMain(), somewhere between that log message and the end of
walrcv_connect.  I failed to reproduce this.  Andrew, are you interested in
attempting to reproduce it and/or identify the blockage?

Thanks,
nm


==== [1] master log with delay (failed run)
...
2020-04-15 20:56:40.334 EDT [5e97ad48.247c:3] 001_rep_changes.pl LOG:  
statement: DELETE FROM tab_rep
2020-04-15 20:56:40.334 EDT [5e97ad48.247c:4] 001_rep_changes.pl LOG:  
disconnection: session time: 0:00:00.015 user=pgrunner database=postgres 
host=127.0.0.1 port=65317
2020-04-15 20:56:40.381 EDT [5e97ad39.1964:4] LOG:  received fast shutdown 
request
2020-04-15 20:56:40.381 EDT [5e97ad39.1964:5] LOG:  aborting any active 
transactions
2020-04-15 20:56:40.396 EDT [5e97ad39.1964:6] LOG:  background worker "logical 
replication launcher" (PID 9920) exited with exit code 1
2020-04-15 20:56:40.396 EDT [5e97ad39.1a7c:1] LOG:  shutting down
2020-04-15 20:56:40.538 EDT [5e97ad47.1ce8:9] tap_sub LOG:  disconnection: 
session time: 0:00:00.541 user=pgrunner database=postgres host=127.0.0.1 
port=65313
2020-04-15 20:56:40.569 EDT [5e97ad39.1964:7] LOG:  database system is shut down
2020-04-15 20:56:40.780 EDT [5e97ad48.c0:1] LOG:  starting PostgreSQL 13devel 
on x86_64-w64-mingw32, compiled by x86_64-w64-mingw32-gcc.exe 
(x86_64-win32-sjlj-rev0, Built by MinGW-W64 project) 7.3.0, 64-bit
2020-04-15 20:56:40.780 EDT [5e97ad48.c0:2] LOG:  listening on IPv4 address 
"127.0.0.1", port 58418
2020-04-15 20:56:40.811 EDT [5e97ad48.36c:1] LOG:  database system was shut 
down at 2020-04-15 20:56:40 EDT
2020-04-15 20:56:40.873 EDT [5e97ad48.c0:3] LOG:  database system is ready to 
accept connections
2020-04-15 20:56:41.015 EDT [5e97ad49.1888:1] [unknown] LOG:  connection 
received: host=127.0.0.1 port=65319
2020-04-15 20:56:41.029 EDT [5e97ad49.1888:2] [unknown] LOG:  connection 
authorized: user=pgrunner database=postgres application_name=001_rep_changes.pl
2020-04-15 20:56:41.044 EDT [5e97ad49.1888:3] 001_rep_changes.pl LOG:  
statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' 
FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
... repeated a total of 1800 times ...
2020-04-15 21:02:58.960 EDT [5e97aec2.21f8:1] [unknown] LOG:  connection 
received: host=127.0.0.1 port=50920
2020-04-15 21:02:58.960 EDT [5e97aec2.21f8:2] [unknown] LOG:  connection 
authorized: user=pgrunner database=postgres application_name=001_rep_changes.pl
2020-04-15 21:02:58.975 EDT [5e97aec2.21f8:3] 001_rep_changes.pl LOG:  
statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' 
FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
2020-04-15 21:02:58.975 EDT [5e97aec2.21f8:4] 001_rep_changes.pl LOG:  
disconnection: session time: 0:00:00.015 user=pgrunner database=postgres 
host=127.0.0.1 port=50920
<test script gives up hope>
2020-04-15 21:02:59.148 EDT [5e97ad48.c0:4] LOG:  received immediate shutdown 
request
2020-04-15 21:02:59.148 EDT [5e97ad48.2148:1] WARNING:  terminating connection 
because of crash of another server process
2020-04-15 21:02:59.148 EDT [5e97ad48.2148:2] DETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and exit, 
because another server process exited abnormally and possibly corrupted shared 
memory.
2020-04-15 21:02:59.148 EDT [5e97ad48.2148:3] HINT:  In a moment you should be 
able to reconnect to the database and repeat your command.
2020-04-15 21:02:59.164 EDT [5e97ad48.c0:5] LOG:  database system is shut down

==== master log without delay (good run)
...
2020-04-15 11:17:24.868 EDT [5e972584.19a0:3] 001_rep_changes.pl LOG:  
statement: DELETE FROM tab_rep
2020-04-15 11:17:24.868 EDT [5e972584.19a0:4] 001_rep_changes.pl LOG:  
disconnection: session time: 0:00:00.015 user=pgrunner database=postgres 
host=127.0.0.1 port=60986
2020-04-15 11:17:24.931 EDT [5e972575.1594:4] LOG:  received fast shutdown 
request
2020-04-15 11:17:24.931 EDT [5e972575.1594:5] LOG:  aborting any active 
transactions
2020-04-15 11:17:24.947 EDT [5e972575.1594:6] LOG:  background worker "logical 
replication launcher" (PID 9988) exited with exit code 1
2020-04-15 11:17:24.947 EDT [5e972575.185c:1] LOG:  shutting down
2020-04-15 11:17:25.072 EDT [5e972584.560:9] tap_sub LOG:  disconnection: 
session time: 0:00:00.636 user=pgrunner database=postgres host=127.0.0.1 
port=60983
2020-04-15 11:17:25.104 EDT [5e972575.1594:7] LOG:  database system is shut down
2020-04-15 11:17:25.312 EDT [5e972585.164:1] LOG:  starting PostgreSQL 13devel 
on x86_64-w64-mingw32, compiled by x86_64-w64-mingw32-gcc.exe 
(x86_64-win32-sjlj-rev0, Built by MinGW-W64 project) 7.3.0, 64-bit
2020-04-15 11:17:25.312 EDT [5e972585.164:2] LOG:  listening on IPv4 address 
"127.0.0.1", port 60213
2020-04-15 11:17:25.343 EDT [5e972585.c7c:1] LOG:  database system was shut 
down at 2020-04-15 11:17:25 EDT
2020-04-15 11:17:25.390 EDT [5e972585.164:3] LOG:  database system is ready to 
accept connections
2020-04-15 11:17:25.531 EDT [5e972585.d40:1] [unknown] LOG:  connection 
received: host=127.0.0.1 port=60988
2020-04-15 11:17:25.552 EDT [5e972585.d40:2] [unknown] LOG:  connection 
authorized: user=pgrunner database=postgres application_name=001_rep_changes.pl
2020-04-15 11:17:25.567 EDT [5e972585.d40:3] 001_rep_changes.pl LOG:  
statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' 
FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
2020-04-15 11:17:25.583 EDT [5e972585.d40:4] 001_rep_changes.pl LOG:  
disconnection: session time: 0:00:00.051 user=pgrunner database=postgres 
host=127.0.0.1 port=60988
2020-04-15 11:17:25.646 EDT [5e972585.1044:1] [unknown] LOG:  connection 
received: host=127.0.0.1 port=60987
2020-04-15 11:17:25.646 EDT [5e972585.1044:2] [unknown] LOG:  replication 
connection authorized: user=pgrunner application_name=tap_sub
2020-04-15 11:17:25.661 EDT [5e972585.1044:3] tap_sub LOG:  received 
replication command: IDENTIFY_SYSTEM
2020-04-15 11:17:25.661 EDT [5e972585.1044:4] tap_sub LOG:  received 
replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/15C1DF8 
(proto_version '1', publication_names '"tap_pub_ins_only"')
2020-04-15 11:17:25.661 EDT [5e972585.1044:5] tap_sub LOG:  starting logical 
decoding for slot "tap_sub"
2020-04-15 11:17:25.661 EDT [5e972585.1044:6] tap_sub DETAIL:  Streaming 
transactions committing after 0/15A7678, reading WAL from 0/15A7678.
2020-04-15 11:17:25.661 EDT [5e972585.1044:7] tap_sub LOG:  logical decoding 
found consistent point at 0/15A7678
2020-04-15 11:17:25.661 EDT [5e972585.1044:8] tap_sub DETAIL:  There are no 
running transactions.
2020-04-15 11:17:25.787 EDT [5e972585.1878:1] [unknown] LOG:  connection 
received: host=127.0.0.1 port=60993
2020-04-15 11:17:25.803 EDT [5e972585.1878:2] [unknown] LOG:  connection 
authorized: user=pgrunner database=postgres application_name=001_rep_changes.pl
2020-04-15 11:17:25.803 EDT [5e972585.1878:3] 001_rep_changes.pl LOG:  
statement: SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' 
FROM pg_catalog.pg_stat_replication WHERE application_name = 'tap_sub';
2020-04-15 11:17:25.819 EDT [5e972585.1878:4] 001_rep_changes.pl LOG:  
disconnection: session time: 0:00:00.031 user=pgrunner database=postgres 
host=127.0.0.1 port=60993
2020-04-15 11:17:26.102 EDT [5e972586.fcc:1] [unknown] LOG:  connection 
received: host=127.0.0.1 port=60996
2020-04-15 11:17:26.102 EDT [5e972586.fcc:2] [unknown] LOG:  connection 
authorized: user=pgrunner database=postgres application_name=001_rep_changes.pl
2020-04-15 11:17:26.117 EDT [5e972586.fcc:3] 001_rep_changes.pl LOG:  
statement: ALTER PUBLICATION tap_pub_ins_only SET (publish = 'insert, delete')
...


Reply via email to