Hi,

If a process that is not inside a transaction encounters a FATAL error after acquiring a dshash lock but before releasing it, the backend can crash with a segmentation fault while exiting.

The FATAL error causes the backend to exit through proc_exit(), which in turn calls proc_exit_prepare() and shmem_exit().
In the absence of a transaction, LWLockReleaseAll() is deferred until ProcKill(). ProcKill() is an on_shmem_exit callback, and dsm_backend_shutdown() is called before any on_shmem_exit callbacks are invoked. Consequently, if a dshash lock was held when the FATAL error occurred, the lock is only released after dsm_backend_shutdown() has detached the DSM segment containing it, resulting in a segmentation fault.

Please find a reproducer attached. I have modified the test_dsm_registry module to create a background worker that does nothing but throw a FATAL error after acquiring the dshash lock. This must run in a background worker so that it executes outside a transaction.

To trigger the segmentation fault, apply the 0001-Reproducer* patch, run make install in the test_dsm_registry module, add test_dsm_registry to shared_preload_libraries in postgresql.conf, and start the server.

Please find attached a fix that calls LWLockReleaseAll() early in the shmem_exit() routine. This ensures that the dshash lock is released before dsm_backend_shutdown() is called. It also ensures that any callbacks invoked later in shmem_exit() will not fail to acquire a lock that the exiting backend itself still holds.

Please see the backtrace below.

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055a7515af56c in pg_atomic_fetch_sub_u32_impl (ptr=0x7f92c4b334f4, sub_=262144)
    at ../../../../src/include/port/atomics/generic-gcc.h:218
218             return __sync_fetch_and_sub(&ptr->value, sub_);
(gdb) bt
#0  0x000055a7515af56c in pg_atomic_fetch_sub_u32_impl (ptr=0x7f92c4b334f4, sub_=262144)
    at ../../../../src/include/port/atomics/generic-gcc.h:218
#1  0x000055a7515af625 in pg_atomic_sub_fetch_u32_impl (ptr=0x7f92c4b334f4, sub_=262144)
    at ../../../../src/include/port/atomics/generic.h:232
#2  0x000055a7515af709 in pg_atomic_sub_fetch_u32 (ptr=0x7f92c4b334f4, sub_=262144)
    at ../../../../src/include/port/atomics.h:441
#3  0x000055a7515b1583 in LWLockReleaseInternal (lock=0x7f92c4b334f0, mode=LW_EXCLUSIVE) at lwlock.c:1840
#4  0x000055a7515b1638 in LWLockRelease (lock=0x7f92c4b334f0) at lwlock.c:1902
#5  0x000055a7515b16e9 in LWLockReleaseAll () at lwlock.c:1951
#6  0x000055a7515ba63d in ProcKill (code=1, arg=0) at proc.c:953
#7  0x000055a7515913af in shmem_exit (code=1) at ipc.c:276
#8  0x000055a75159119b in proc_exit_prepare (code=1) at ipc.c:198
#9  0x000055a7515910df in proc_exit (code=1) at ipc.c:111
#10 0x000055a7517be71d in errfinish (filename=0x7f92ce41d062 "test_dsm_registry.c", lineno=187,
    funcname=0x7f92ce41d160 <__func__.0> "TestDSMRegistryMain") at elog.c:596
#11 0x00007f92ce41ca62 in TestDSMRegistryMain (main_arg=0) at test_dsm_registry.c:187
#12 0x000055a7514db00c in BackgroundWorkerMain (startup_data=0x55a752dd8028, startup_data_len=1472)
    at bgworker.c:846
#13 0x000055a7514de1e8 in postmaster_child_launch (child_type=B_BG_WORKER, child_slot=239,
    startup_data=0x55a752dd8028, startup_data_len=1472, client_sock=0x0) at launch_backend.c:268
#14 0x000055a7514e530d in StartBackgroundWorker (rw=0x55a752dd8028) at postmaster.c:4168
#15 0x000055a7514e55a4 in maybe_start_bgworkers () at postmaster.c:4334
#16 0x000055a7514e4200 in LaunchMissingBackgroundProcesses () at postmaster.c:3408
#17 0x000055a7514e205b in ServerLoop () at postmaster.c:1728
#18 0x000055a7514e18b0 in PostmasterMain (argc=3, argv=0x55a752dd0e70) at postmaster.c:1403
#19 0x000055a75138eead in main (argc=3, argv=0x55a752dd0e70) at main.c:231
```

Thank you,
Rahila Syed

Attachment: 0001-Reproducer-segmentation-fault-dshash.patch

Attachment: 0001-Fix-the-seg-fault-during-proc-exit.patch
