Hi,

13.07.2023 23:52, Andres Freund wrote:

> Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst
> part. But also a lot of other conflicts in tests... Took me 5-6 hours or so.
> But I now finally pushed the fixes. Hope the buildfarm agrees with it...
>
> Thanks for the review!

I've discovered that the test 037_invalid_database, introduced with
c66a7d75e, hangs when the server is built with -DCLOBBER_CACHE_ALWAYS or
when debug_discard_caches = 1 is set via TEMP_CONFIG:
echo "debug_discard_caches = 1" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/recovery/ PROVE_TESTS="t/037*"
# +++ tap check in src/test/recovery +++
[09:05:48] t/037_invalid_database.pl .. 6/?

regress_log_037_invalid_database ends with:
[09:05:51.622](0.021s) # issuing query via background psql:
#   CREATE DATABASE regression_invalid_interrupt;
#   BEGIN;
#   LOCK pg_tablespace;
#   PREPARE TRANSACTION 'lock_tblspc';
[09:05:51.684](0.062s) ok 8 - blocked DROP DATABASE completion

I see two backends waiting:
law      2420132 2420108  0 09:05 ?        00:00:00 postgres: node: law postgres [local] DROP DATABASE waiting
law      2420135 2420108  0 09:05 ?        00:00:00 postgres: node: law postgres [local] startup waiting

and the latter's (the starting backend's) stack trace:
#0  0x00007f65c8fd3f9a in epoll_wait (epfd=9, events=0x563c40e15478, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x0000563c3fa9a9fa in WaitEventSetWaitBlock (set=0x563c40e15410, cur_timeout=-1, occurred_events=0x7fff579dda80, nevents=1) at latch.c:1570
#2  0x0000563c3fa9a8e4 in WaitEventSetWait (set=0x563c40e15410, timeout=-1, occurred_events=0x7fff579dda80, nevents=1, wait_event_info=50331648) at latch.c:1516
#3  0x0000563c3fa99b14 in WaitLatch (latch=0x7f65c5e112e4, wakeEvents=33, timeout=0, wait_event_info=50331648) at latch.c:538
#4  0x0000563c3fac7dee in ProcSleep (locallock=0x563c40e41e80, lockMethodTable=0x563c4007cba0 <default_lockmethod>) at proc.c:1339
#5  0x0000563c3fab4160 in WaitOnLock (locallock=0x563c40e41e80, owner=0x563c40ea5af8) at lock.c:1816
#6  0x0000563c3fab2c80 in LockAcquireExtended (locktag=0x7fff579dde30, lockmode=1, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x7fff579dde28) at lock.c:1080
#7  0x0000563c3faaf86d in LockRelationOid (relid=1213, lockmode=1) at lmgr.c:116
#8  0x0000563c3f537aff in relation_open (relationId=1213, lockmode=1) at relation.c:55
#9  0x0000563c3f5efde9 in table_open (relationId=1213, lockmode=1) at table.c:44
#10 0x0000563c3fca2227 in CatalogCacheInitializeCache (cache=0x563c40e8fe80) at catcache.c:980
#11 0x0000563c3fca255e in InitCatCachePhase2 (cache=0x563c40e8fe80, touch_index=true) at catcache.c:1083
#12 0x0000563c3fcc0556 in InitCatalogCachePhase2 () at syscache.c:184
#13 0x0000563c3fcb7db3 in RelationCacheInitializePhase3 () at relcache.c:4317
#14 0x0000563c3fce2748 in InitPostgres (in_dbname=0x563c40e54000 "postgres", dboid=5, username=0x563c40e53fe8 "law", useroid=0, flags=1, out_dbname=0x0) at postinit.c:1177
#15 0x0000563c3fad90a7 in PostgresMain (dbname=0x563c40e54000 "postgres", username=0x563c40e53fe8 "law") at postgres.c:4229
#16 0x0000563c3f9f01e4 in BackendRun (port=0x563c40e45360) at postmaster.c:4475
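(Relid 1213 in frames #7-#9 is pg_tablespace; that can be confirmed on any
handy server with, e.g.:
    SELECT 1213::regclass;   -- pg_tablespace
)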

It looks like no new backend can be started while pg_tablespace is locked,
if a new relcache init file has to be built during backend initialization:
InitCatalogCachePhase2() then opens pg_tablespace to pre-load the catalog
caches and blocks on the lock.
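For what it's worth, the same blockage can be reproduced by hand outside the
TAP test. A rough sketch, assuming a build where debug_discard_caches is
available (assert-enabled) and max_prepared_transactions > 0; the transaction
name is just for illustration:

    -- session 1: hold AccessExclusiveLock on pg_tablespace across a
    -- prepared transaction, as the test does
    BEGIN;
    LOCK pg_tablespace;
    PREPARE TRANSACTION 'lock_tblspc';

    -- session 2: a brand-new connection with debug_discard_caches = 1 has
    -- to rebuild its relcache init file, tries to open pg_tablespace with
    -- AccessShareLock and hangs in startup until the prepared transaction
    -- is resolved, e.g. with:
    -- ROLLBACK PREPARED 'lock_tblspc';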

Best regards,
Alexander

