Hello,

Gabriele just reported a crash when changing synchronized_standby_slots
under SIGHUP and logging collector working.  The problem apparently is
that validate_sync_standby_slots is run for the GUC check routine, and
it requires acquiring an LWLock; but syslogger doesn't have a PGPROC so
that doesn't work.

Apparently we already have a hack in place against the same problem in
postmaster (and elsewhere?), by testing that ReplicationSlotCtl is not
null.  But that hack seems to be incomplete, as the crash here attests.

To reproduce, simply start with no synchronized_standby_slots setting
and logging_collector=on, then change the value in postgresql.conf and
reload.

One option to fix this might be as attached.  The point here is that
processes without a PGPROC don't need or care to have a correct setting,
so we just skip verifying it on those.  AFAICS this is more general than
the test for ReplicationSlotCtl, so I just removed that one.

Here's the backtrace Gabriele showed me.

[New LWP 29]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: freddie: logger                  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  pg_atomic_read_u32_impl (ptr=0x7f0cf21a1f84) at 
./build/../src/include/port/atomics/generic.h:48
Download failed: Invalid argument.  Continuing without source file 
./build/src/backend/storage/lmgr/./build/../src/include/port/atomics/generic.h.
48      ./build/../src/include/port/atomics/generic.h: No such file or 
directory.
#0  pg_atomic_read_u32_impl (ptr=0x7f0cf21a1f84) at 
./build/../src/include/port/atomics/generic.h:48
No locals.
#1  pg_atomic_read_u32 (ptr=0x7f0cf21a1f84) at 
./build/../src/include/port/atomics.h:237
No locals.
#2  LWLockAttemptLock (lock=lock@entry=0x7f0cf21a1f80, 
mode=mode@entry=LW_SHARED) at ./build/../src/backend/storage/lmgr/lwlock.c:796
        old_state = <optimized out>
#3  0x0000557cbe97ba60 in LWLockAcquire (lock=0x7f0cf21a1f80, 
mode=mode@entry=LW_SHARED) at ./build/../src/backend/storage/lmgr/lwlock.c:1235
        mustwait = <optimized out>
        proc = 0x0
        result = true
        extraWaits = 0
        __func__ = "LWLockAcquire"
#4  0x0000557cbe935919 in validate_sync_standby_slots (elemlist=0x7ffda5aa60d0, 
rawname=0x557cfd704c98 "_cnpg_freddie_2") at 
./build/../src/backend/replication/slot.c:2443
        ok = true
        ok = <optimized out>
        name = <optimized out>
        name__outerloop = <optimized out>
        name__state = {l = <optimized out>, i = <optimized out>}
        slot = <optimized out>
#5  check_synchronized_standby_slots (newval=<optimized out>, 
extra=0x7ffda5aa6300, source=<optimized out>) at 
./build/../src/backend/replication/slot.c:2494
        rawname = 0x557cfd704c98 "_cnpg_freddie_2"
        ptr = <optimized out>
        elemlist = 0x557cfd7041c8
        size = <optimized out>
        ok = <optimized out>
        config = <optimized out>
#6  0x0000557cbeae95de in call_string_check_hook 
(conf=conf@entry=0x557cbee2f8b0 <ConfigureNamesString+14000>, 
newval=newval@entry=0x7ffda5aa62f8, extra=extra@entry=0x7ffda5aa6300, 
source=source@entry=PGC_S_FILE, elevel=elevel@entry=12) at 
./build/../src/backend/utils/misc/guc.c:6925
        _save_exception_stack = 0x0
        _save_context_stack = 0x0
        _local_sigjmp_buf = {{__jmpbuf = {93994266851504, -6990900416883569069, 
140727382860536, 12, 93994266851504, 93995316414744, -6990900416982135213, 
-3748856269321122221}, __mask_was_saved = 0, __saved_mask = {__val = 
{140727382860536, 140727382860272, 93995316305072, 139693771467072, 
93994263427300, 4662694561863172096, 93994263387537, 140727382860536, 
93994263505961, 93994266851504, 140727382860536, 140727382860336, 
93994263421816, 140727382860536, 12, 93994266851504}}}}
        _do_rethrow = false
        result = true
        __func__ = "call_string_check_hook"
#7  0x0000557cbeaec1ba in parse_and_validate_value (record=0x557cbee2f8b0 
<ConfigureNamesString+14000>, name=0x557cfd765840 "synchronized_standby_slots", 
value=<optimized out>, source=PGC_S_FILE, elevel=12, newval=0x7ffda5aa62f8, 
newextra=0x7ffda5aa6300) at ./build/../src/backend/utils/misc/guc.c:3260
        conf = 0x557cbee2f8b0 <ConfigureNamesString+14000>
        __func__ = "parse_and_validate_value"
#8  0x0000557cbeaecd6f in set_config_with_handle (name=0x557cfd765840 
"synchronized_standby_slots", handle=<optimized out>, value=0x557cfd765868 
"_cnpg_freddie_2", context=PGC_SIGHUP, source=PGC_S_FILE, srole=10, 
action=GUC_ACTION_SET, changeVal=<optimized out>, elevel=12, is_reload=false) 
at ./build/../src/backend/utils/misc/guc.c:4013
        conf = 0x557cbee2f8b0 <ConfigureNamesString+14000>
        orig_context = PGC_SIGHUP
        orig_source = PGC_S_FILE
        orig_srole = 10
        record = 0x557cbee2f8b0 <ConfigureNamesString+14000>
        newval_union = {boolval = 24, intval = -42858216, realval = 
4.6439856710502817e-310, stringval = 0x557cfd720918 "_cnpg_freddie_2", enumval 
= -42858216}
        newextra = 0x0
        prohibitValueChange = false
        makeDefault = <optimized out>
        __func__ = "set_config_with_handle"
#9  0x0000557cbeaef400 in set_config_option (is_reload=false, elevel=0, 
changeVal=true, action=GUC_ACTION_SET, source=PGC_S_FILE, context=PGC_SIGHUP, 
value=<optimized out>, name=<optimized out>) at 
./build/../src/backend/utils/misc/guc.c:3363
        srole = 10
        srole = <optimized out>
#10 ProcessConfigFileInternal (context=context@entry=PGC_SIGHUP, 
applySettings=applySettings@entry=true, elevel=elevel@entry=13) at 
./build/../src/backend/utils/misc/guc.c:557
        pre_value = 0x0
        scres = <optimized out>
        error = false
        applying = true
        ConfFileWithError = 0x557cfd720100 
"/var/lib/postgresql/data/pgdata/postgresql.conf"
        item = 0x557cfd765880
        head = 0x557cfd704248
        tail = 0x557cfd7664c0
        status = {hashp = 0x557cfd7219c8, curBucket = 512, curEntry = 0x0}
        hentry = <optimized out>
        bail_out = <optimized out>
        __func__ = "ProcessConfigFileInternal"
#11 0x0000557cbeaf2e81 in ProcessConfigFile (context=context@entry=PGC_SIGHUP) 
at ./build/../src/backend/utils/misc/guc-file.l:152
        elevel = 13
        config_cxt = 0x557cfd7040d0
        caller_cxt = 0x557cfd6fd800
#12 0x0000557cbe9001cc in SysLoggerMain (startup_data=<optimized out>, 
startup_data_len=<optimized out>) at 
./build/../src/backend/postmaster/syslogger.c:369
        time_based_rotation = false
        size_rotation_for = 0
        cur_timeout = <optimized out>
        event = {pos = 0, events = 1, fd = -1, user_data = 0x0}
        rc = <optimized out>
        logbuffer = "\000\000\300\000\034\000\000\000!2024-11-28 10:19:42.415 
UTC,,,28,,6748439f.1c,8,,2024-11-28 10:19:11 UTC,,0,LOG,00000,\"parameter 
\"\"synchronous_standby_names\"\" changed to \"\"ANY 1 
(\"\"freddie-2\"\")\"\"\",,,,,,,,,\"\",\"postmaster\",,0"...
        bytes_in_logbuffer = 0
        currentLogDir = 0x557cfd6fe6d8 "/controller/log"
        currentLogFilename = 0x557cfd6fe6f0 "postgres"
        currentLogRotationAge = 0
        now = 1732789151
        wes = 0x557cfd6fe708
        __func__ = "SysLoggerMain"
#13 0x0000557cbe8f908a in postmaster_child_launch 
(child_type=child_type@entry=B_LOGGER, startup_data=startup_data@entry=0x0, 
startup_data_len=startup_data_len@entry=0, client_sock=client_sock@entry=0x0) 
at ./build/../src/backend/postmaster/launch_backend.c:277
        pid = <optimized out>
#14 0x0000557cbe8ff9c2 in SysLogger_Start () at 
./build/../src/backend/postmaster/syslogger.c:706
        sysloggerPid = <optimized out>
        filename = 0x557cfd75a160 "og output will go to log destination 
\"csvlog\".\n"
        __func__ = "SysLogger_Start"
#15 0x0000557cbe8fe42e in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x557cfd6fd7a0) at 
./build/../src/backend/postmaster/postmaster.c:1054
        opt = <optimized out>
        status = <optimized out>
        userDoption = <optimized out>
        listen_addr_saved = false
        output_config_variable = 0x0
        __func__ = "PostmasterMain"
#16 0x0000557cbe6258ec in main (argc=3, argv=0x557cfd6fd7a0) at 
./build/../src/backend/main/main.c:197
        do_check_root = <optimized out>

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Once again, thank you and all of the developers for your hard work on
PostgreSQL.  This is by far the most pleasant management experience of
any database I've worked on."                             (Dan Harris)
http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php
>From 2d38870be86200deefb9979b9dd0a164d5986499 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvhe...@alvh.no-ip.org>
Date: Thu, 28 Nov 2024 13:11:20 +0100
Subject: [PATCH] Fix synchronized_standby_slots GUC check hook

It requires an LWLock, so it cannot run in processes without PGPROC.
Skip it there; it makes no difference.

Reported-by: Gabriele Bartolini <gabriele.bartol...@enterprisedb.com>
---
 src/backend/replication/slot.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 77bae536915..b6e43c1a89e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -2428,18 +2428,16 @@ validate_sync_standby_slots(char *rawname, List **elemlist)
 	{
 		GUC_check_errdetail("List syntax is invalid.");
 	}
-	else if (!ReplicationSlotCtl)
+	else if (MyProc)
 	{
 		/*
-		 * We cannot validate the replication slot if the replication slots'
-		 * data has not been initialized. This is ok as we will anyway
-		 * validate the specified slot when waiting for them to catch up. See
-		 * StandbySlotsHaveCaughtup() for details.
+		 * Check that the specified slots exist and are logical slots.  Note
+		 * that because we need an LWLock, we cannot do this on processes
+		 * without a PGPROC (hello syslogger!) so we skip it there, but it
+		 * doesn't have an effect for those anyway.
+		 *
+		 * See StandbySlotsHaveCaughtup() for details.
 		 */
-	}
-	else
-	{
-		/* Check that the specified slots exist and are logical slots */
 		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 		foreach_ptr(char, name, *elemlist)
-- 
2.39.5

Reply via email to