Hi,
13.07.2023 23:52, Andres Freund wrote:
> Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst
> part. But also a lot of other conflicts in tests... Took me 5-6 hours or
> so.
>
> But I now finally pushed the fixes. Hope the buildfarm agrees with it...
>
> Thanks for the review!
I've discovered that the test 037_invalid_database, introduced with
c66a7d75e, hangs when the server is built with -DCLOBBER_CACHE_ALWAYS or
run with debug_discard_caches = 1 set via TEMP_CONFIG:
echo "debug_discard_caches = 1" >/tmp/extra.config
echo "debug_discard_caches = 1" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/recovery/
PROVE_TESTS="t/037*"
# +++ tap check in src/test/recovery +++
[09:05:48] t/037_invalid_database.pl .. 6/?
regress_log_037_invalid_database ends with:
[09:05:51.622](0.021s) # issuing query via background psql:
# CREATE DATABASE regression_invalid_interrupt;
# BEGIN;
# LOCK pg_tablespace;
# PREPARE TRANSACTION 'lock_tblspc';
[09:05:51.684](0.062s) ok 8 - blocked DROP DATABASE completion
I see two backends waiting:
law 2420132 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] DROP DATABASE waiting
law 2420135 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] startup waiting
and the latter's stack trace:
#0 0x00007f65c8fd3f9a in epoll_wait (epfd=9, events=0x563c40e15478, maxevents=1, timeout=-1) at
../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x0000563c3fa9a9fa in WaitEventSetWaitBlock (set=0x563c40e15410, cur_timeout=-1, occurred_events=0x7fff579dda80,
nevents=1) at latch.c:1570
#2 0x0000563c3fa9a8e4 in WaitEventSetWait (set=0x563c40e15410, timeout=-1, occurred_events=0x7fff579dda80, nevents=1,
wait_event_info=50331648) at latch.c:1516
#3 0x0000563c3fa99b14 in WaitLatch (latch=0x7f65c5e112e4, wakeEvents=33, timeout=0, wait_event_info=50331648) at
latch.c:538
#4 0x0000563c3fac7dee in ProcSleep (locallock=0x563c40e41e80, lockMethodTable=0x563c4007cba0 <default_lockmethod>) at
proc.c:1339
#5 0x0000563c3fab4160 in WaitOnLock (locallock=0x563c40e41e80,
owner=0x563c40ea5af8) at lock.c:1816
#6 0x0000563c3fab2c80 in LockAcquireExtended (locktag=0x7fff579dde30, lockmode=1, sessionLock=false, dontWait=false,
reportMemoryError=true, locallockp=0x7fff579dde28) at lock.c:1080
#7 0x0000563c3faaf86d in LockRelationOid (relid=1213, lockmode=1) at lmgr.c:116
#8 0x0000563c3f537aff in relation_open (relationId=1213, lockmode=1) at
relation.c:55
#9 0x0000563c3f5efde9 in table_open (relationId=1213, lockmode=1) at table.c:44
#10 0x0000563c3fca2227 in CatalogCacheInitializeCache (cache=0x563c40e8fe80) at
catcache.c:980
#11 0x0000563c3fca255e in InitCatCachePhase2 (cache=0x563c40e8fe80,
touch_index=true) at catcache.c:1083
#12 0x0000563c3fcc0556 in InitCatalogCachePhase2 () at syscache.c:184
#13 0x0000563c3fcb7db3 in RelationCacheInitializePhase3 () at relcache.c:4317
#14 0x0000563c3fce2748 in InitPostgres (in_dbname=0x563c40e54000 "postgres", dboid=5, username=0x563c40e53fe8 "law",
useroid=0, flags=1, out_dbname=0x0) at postinit.c:1177
#15 0x0000563c3fad90a7 in PostgresMain (dbname=0x563c40e54000 "postgres",
username=0x563c40e53fe8 "law") at postgres.c:4229
#16 0x0000563c3f9f01e4 in BackendRun (port=0x563c40e45360) at postmaster.c:4475
It looks like no new backend can be started due to the pg_tablespace lock:
with the caches discarded, a new relcache init file has to be built during
backend initialization, and that requires opening pg_tablespace.
Best regards,
Alexander