On Tue, Mar 29, 2022 at 11:55:05AM -0400, Robert Haas wrote:
> On Mon, Mar 28, 2022 at 3:08 PM Robert Haas <[email protected]> wrote:
> > smgrcreate() as we would for most WAL records or whether it should be
> > adopting the new system introduced by
> > 49d9cfc68bf4e0d32a948fe72d5a0ef7f464944e. I wrote about this concern
> > over here:
> >
> > http://postgr.es/m/CA+TgmoYcUPL+WOJL2ZzhH=zmrhj0iOQ=icfm0suyqbbqzea...@mail.gmail.com
> >
> > But apart from that question your adaptations here look reasonable to me.
>
> That commit having been reverted, I committed v6 instead. Let's see
> what breaks...
There's a crash:
2022-07-31 01:22:51.437 CDT client backend[13362] [unknown] PANIC: could not
open critical system index 2662
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007efe27999801 in __GI_abort () at abort.c:79
#2 0x00005583891941dc in errfinish (filename=<optimized out>,
filename@entry=0x558389420437 "relcache.c", lineno=lineno@entry=4328,
funcname=funcname@entry=0x558389421680 <__func__.33178>
"load_critical_index") at elog.c:675
#3 0x00005583891713ef in load_critical_index (indexoid=indexoid@entry=2662,
heapoid=heapoid@entry=1259) at relcache.c:4328
#4 0x0000558389172667 in RelationCacheInitializePhase3 () at relcache.c:4103
#5 0x00005583891b93a4 in InitPostgres
(in_dbname=in_dbname@entry=0x55838a50d468 "a", dboid=dboid@entry=0,
username=username@entry=0x55838a50d448 "pryzbyj", useroid=useroid@entry=0,
load_session_libraries=<optimized out>,
override_allow_connections=override_allow_connections@entry=false,
out_dbname=0x0) at postinit.c:1087
#6 0x0000558388daa7bb in PostgresMain (dbname=0x55838a50d468 "a",
username=username@entry=0x55838a50d448 "pryzbyj") at postgres.c:4081
#7 0x0000558388b9f423 in BackendRun (port=port@entry=0x55838a505dd0) at
postmaster.c:4490
#8 0x0000558388ba6e07 in BackendStartup (port=port@entry=0x55838a505dd0) at
postmaster.c:4218
#9 0x0000558388ba747f in ServerLoop () at postmaster.c:1808
#10 0x0000558388ba8f93 in PostmasterMain (argc=7, argv=<optimized out>) at
postmaster.c:1480
#11 0x0000558388840e1f in main (argc=7, argv=0x55838a4dc000) at main.c:197
while :; do psql -qh /tmp postgres -c "DROP DATABASE a" -c "CREATE DATABASE a
TEMPLATE postgres STRATEGY wal_log"; done
# Run this for a few loops and then ^C or hold down ^C until it stops,
# and then connect to postgres and try to connect to 'a':
postgres=# \c a
2022-07-31 01:22:51.437 CDT client backend[13362] [unknown] PANIC: could not
open critical system index 2662
Unfortunately, that isn't very consistent, and you may have to run it a bunch
of times...
I don't know if it's an issue of any significance that CREATE DATABASE / ^C
leaves behind a broken database, but it is an issue that the cluster crashes.
While struggling to reproduce that problem, I also hit this warning, which may
or may not be the same. I added an abort() after WARNING in aset.c to get a
backtrace.
WARNING: problem in alloc set PortalContext: bogus aset link in block
0x55a63f2f9d60, chunk 0x55a63f2fb138
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f81144f1801 in __GI_abort () at abort.c:79
#2 0x000055a63c834c5d in AllocSetCheck (context=context@entry=0x55a63f26fea0)
at aset.c:1491
#3 0x000055a63c835b09 in AllocSetDelete (context=0x55a63f26fea0) at aset.c:638
#4 0x000055a63c854322 in MemoryContextDelete (context=0x55a63f26fea0) at
mcxt.c:252
#5 0x000055a63c8591d5 in PortalDrop (portal=portal@entry=0x55a63f2bb7a0,
isTopCommit=isTopCommit@entry=false) at portalmem.c:596
#6 0x000055a63c3e4a7b in exec_simple_query
(query_string=query_string@entry=0x55a63f24db90 "CREATE DATABASE a TEMPLATE
postgres STRATEGY wal_log ;") at postgres.c:1253
#7 0x000055a63c3e7fc1 in PostgresMain (dbname=<optimized out>,
username=username@entry=0x55a63f279448 "pryzbyj") at postgres.c:4505
#8 0x000055a63c1dc423 in BackendRun (port=port@entry=0x55a63f271dd0) at
postmaster.c:4490
#9 0x000055a63c1e3e07 in BackendStartup (port=port@entry=0x55a63f271dd0) at
postmaster.c:4218
#10 0x000055a63c1e447f in ServerLoop () at postmaster.c:1808
#11 0x000055a63c1e5f93 in PostmasterMain (argc=7, argv=<optimized out>) at
postmaster.c:1480
#12 0x000055a63be7de1f in main (argc=7, argv=0x55a63f248000) at main.c:197
I reproduced that by running this a couple dozen times in an interactive psql.
It doesn't seem to affect STRATEGY=file_copy.
SET statement_timeout=0; DROP DATABASE a; SET statement_timeout='60ms'; CREATE
DATABASE a TEMPLATE postgres STRATEGY wal_log ; \c a \c postgres
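For anyone who would rather not retype that a couple dozen times, here is a
rough, untested sketch that feeds the same statements to psql on stdin in a
loop. The function names, the PSQL variable, the default of 24 rounds and the
IF EXISTS are my own additions for illustration; the -qh /tmp connection
options are the ones from the shell loop earlier in this mail. Whether a
non-interactive psql hits the problem as readily as an interactive one is an
assumption.

```shell
#!/bin/sh
# Sketch only: scripted version of the statement_timeout reproducer.
# repro_sql, run_repro and PSQL are hypothetical names, not from the report.

# Print one round of the reproducer as psql input.
# $1 = database to drop and recreate, $2 = template database.
repro_sql() {
    db=$1
    template=$2
    printf '%s\n' \
        "SET statement_timeout=0;" \
        "DROP DATABASE IF EXISTS $db;" \
        "SET statement_timeout='60ms';" \
        "CREATE DATABASE $db TEMPLATE $template STRATEGY wal_log;" \
        "\\c $db" \
        "\\c postgres"
}

# Run the reproducer $1 times (default 24, i.e. "a couple dozen").
# Errors from psql are ignored on purpose: the timeout is expected to
# cancel CREATE DATABASE on most rounds.
run_repro() {
    rounds=${1:-24}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        repro_sql a postgres | ${PSQL:-psql} -qh /tmp postgres
        i=$((i + 1))
    done
}
```

Source the file and call "run_repro", or "run_repro 100" for more rounds, then
watch the server log for the WARNING or the PANIC.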
Also, if I understand correctly, this patch seems to assume that nobody is
connected to the source database. But what's actually enforced is just that
nobody *else* is connected. Is it any issue that the current DB can be used as
a source? Anyway, both of the above problems are reproducible using a
different database.
|postgres=# CREATE DATABASE new TEMPLATE postgres STRATEGY wal_log;
|CREATE DATABASE
--
Justin