Hello, Gabriele just reported a crash when changing synchronized_standby_slots under SIGHUP and logging collector working. The problem apparently is that validate_sync_standby_slots is run for the GUC check routine, and it requires acquiring an LWLock; but syslogger doesn't have a PGPROC so that doesn't work.
Apparently we already have a hack in place against the same problem in postmaster (and elsewhere?), by testing that ReplicationSlotCtl is not null. But that hack seems to be incomplete, as the crash here attests. To reproduce, simply start with no synchronized_standby_slots setting and logging_collector=on, then change the value in postgresql.conf and reload. One option to fix this might be as attached. The point here is that processes without a PGPROC don't need or care to have a correct setting, so we just skip verifying it on those. AFAICS this is more general than the test for ReplicationSlotCtl, so I just removed that one. Here's the backtrace Gabriele showed me. [New LWP 29] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `postgres: freddie: logger '. Program terminated with signal SIGSEGV, Segmentation fault. #0 pg_atomic_read_u32_impl (ptr=0x7f0cf21a1f84) at ./build/../src/include/port/atomics/generic.h:48 Download failed: Invalid argument. Continuing without source file ./build/src/backend/storage/lmgr/./build/../src/include/port/atomics/generic.h. 48 ./build/../src/include/port/atomics/generic.h: No such file or directory. #0 pg_atomic_read_u32_impl (ptr=0x7f0cf21a1f84) at ./build/../src/include/port/atomics/generic.h:48 No locals. #1 pg_atomic_read_u32 (ptr=0x7f0cf21a1f84) at ./build/../src/include/port/atomics.h:237 No locals. #2 LWLockAttemptLock (lock=lock@entry=0x7f0cf21a1f80, mode=mode@entry=LW_SHARED) at ./build/../src/backend/storage/lmgr/lwlock.c:796 old_state = <optimized out> #3 0x0000557cbe97ba60 in LWLockAcquire (lock=0x7f0cf21a1f80, mode=mode@entry=LW_SHARED) at ./build/../src/backend/storage/lmgr/lwlock.c:1235 mustwait = <optimized out> proc = 0x0 result = true extraWaits = 0 __func__ = "LWLockAcquire" #4 0x0000557cbe935919 in validate_sync_standby_slots (elemlist=0x7ffda5aa60d0, rawname=0x557cfd704c98 "_cnpg_freddie_2") at ./build/../src/backend/replication/slot.c:2443 ok = true ok = <optimized out> name = <optimized out> name__outerloop = <optimized out> name__state = {l = <optimized out>, i = <optimized out>} slot = <optimized out> #5 check_synchronized_standby_slots (newval=<optimized out>, extra=0x7ffda5aa6300, source=<optimized out>) at ./build/../src/backend/replication/slot.c:2494 rawname = 0x557cfd704c98 "_cnpg_freddie_2" ptr = <optimized out> elemlist = 0x557cfd7041c8 size = <optimized out> ok = <optimized out> config = <optimized out> #6 0x0000557cbeae95de in call_string_check_hook (conf=conf@entry=0x557cbee2f8b0 <ConfigureNamesString+14000>, newval=newval@entry=0x7ffda5aa62f8, extra=extra@entry=0x7ffda5aa6300, source=source@entry=PGC_S_FILE, elevel=elevel@entry=12) at ./build/../src/backend/utils/misc/guc.c:6925 _save_exception_stack = 0x0 _save_context_stack = 0x0 _local_sigjmp_buf = {{__jmpbuf = {93994266851504, -6990900416883569069, 140727382860536, 12, 93994266851504, 93995316414744, -6990900416982135213, -3748856269321122221}, __mask_was_saved = 0, __saved_mask = {__val = {140727382860536, 140727382860272, 93995316305072, 139693771467072, 93994263427300, 4662694561863172096, 93994263387537, 140727382860536, 93994263505961, 93994266851504, 140727382860536, 140727382860336, 93994263421816, 140727382860536, 12, 93994266851504}}}} _do_rethrow = false result = true __func__ = "call_string_check_hook" #7 0x0000557cbeaec1ba in parse_and_validate_value (record=0x557cbee2f8b0 <ConfigureNamesString+14000>, name=0x557cfd765840 "synchronized_standby_slots", value=<optimized out>, source=PGC_S_FILE, elevel=12, newval=0x7ffda5aa62f8, newextra=0x7ffda5aa6300) at ./build/../src/backend/utils/misc/guc.c:3260 conf = 0x557cbee2f8b0 <ConfigureNamesString+14000> __func__ = "parse_and_validate_value" #8 0x0000557cbeaecd6f in set_config_with_handle (name=0x557cfd765840 "synchronized_standby_slots", handle=<optimized out>, value=0x557cfd765868 "_cnpg_freddie_2", context=PGC_SIGHUP, source=PGC_S_FILE, srole=10, action=GUC_ACTION_SET, changeVal=<optimized out>, elevel=12, is_reload=false) at ./build/../src/backend/utils/misc/guc.c:4013 conf = 0x557cbee2f8b0 <ConfigureNamesString+14000> orig_context = PGC_SIGHUP orig_source = PGC_S_FILE orig_srole = 10 record = 0x557cbee2f8b0 <ConfigureNamesString+14000> newval_union = {boolval = 24, intval = -42858216, realval = 4.6439856710502817e-310, stringval = 0x557cfd720918 "_cnpg_freddie_2", enumval = -42858216} newextra = 0x0 prohibitValueChange = false makeDefault = <optimized out> __func__ = "set_config_with_handle" #9 0x0000557cbeaef400 in set_config_option (is_reload=false, elevel=0, changeVal=true, action=GUC_ACTION_SET, source=PGC_S_FILE, context=PGC_SIGHUP, value=<optimized out>, name=<optimized out>) at ./build/../src/backend/utils/misc/guc.c:3363 srole = 10 srole = <optimized out> #10 ProcessConfigFileInternal (context=context@entry=PGC_SIGHUP, applySettings=applySettings@entry=true, elevel=elevel@entry=13) at ./build/../src/backend/utils/misc/guc.c:557 pre_value = 0x0 scres = <optimized out> error = false applying = true ConfFileWithError = 0x557cfd720100 "/var/lib/postgresql/data/pgdata/postgresql.conf" item = 0x557cfd765880 head = 0x557cfd704248 tail = 0x557cfd7664c0 status = {hashp = 0x557cfd7219c8, curBucket = 512, curEntry = 0x0} hentry = <optimized out> bail_out = <optimized out> __func__ = "ProcessConfigFileInternal" #11 0x0000557cbeaf2e81 in ProcessConfigFile (context=context@entry=PGC_SIGHUP) at ./build/../src/backend/utils/misc/guc-file.l:152 elevel = 13 config_cxt = 0x557cfd7040d0 caller_cxt = 0x557cfd6fd800 #12 0x0000557cbe9001cc in SysLoggerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ./build/../src/backend/postmaster/syslogger.c:369 time_based_rotation = false size_rotation_for = 0 cur_timeout = <optimized out> event = {pos = 0, events = 1, fd = -1, user_data = 0x0} rc = <optimized out> logbuffer = "\000\000\300\000\034\000\000\000!2024-11-28 10:19:42.415 UTC,,,28,,6748439f.1c,8,,2024-11-28 10:19:11 UTC,,0,LOG,00000,\"parameter \"\"synchronous_standby_names\"\" changed to \"\"ANY 1 (\"\"freddie-2\"\")\"\"\",,,,,,,,,\"\",\"postmaster\",,0"... bytes_in_logbuffer = 0 currentLogDir = 0x557cfd6fe6d8 "/controller/log" currentLogFilename = 0x557cfd6fe6f0 "postgres" currentLogRotationAge = 0 now = 1732789151 wes = 0x557cfd6fe708 __func__ = "SysLoggerMain" #13 0x0000557cbe8f908a in postmaster_child_launch (child_type=child_type@entry=B_LOGGER, startup_data=startup_data@entry=0x0, startup_data_len=startup_data_len@entry=0, client_sock=client_sock@entry=0x0) at ./build/../src/backend/postmaster/launch_backend.c:277 pid = <optimized out> #14 0x0000557cbe8ff9c2 in SysLogger_Start () at ./build/../src/backend/postmaster/syslogger.c:706 sysloggerPid = <optimized out> filename = 0x557cfd75a160 "og output will go to log destination \"csvlog\".\n" __func__ = "SysLogger_Start" #15 0x0000557cbe8fe42e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x557cfd6fd7a0) at ./build/../src/backend/postmaster/postmaster.c:1054 opt = <optimized out> status = <optimized out> userDoption = <optimized out> listen_addr_saved = false output_config_variable = 0x0 __func__ = "PostmasterMain" #16 0x0000557cbe6258ec in main (argc=3, argv=0x557cfd6fd7a0) at ./build/../src/backend/main/main.c:197 do_check_root = <optimized out> -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Once again, thank you and all of the developers for your hard work on PostgreSQL. This is by far the most pleasant management experience of any database I've worked on." (Dan Harris) http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php
>From 2d38870be86200deefb9979b9dd0a164d5986499 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvhe...@alvh.no-ip.org> Date: Thu, 28 Nov 2024 13:11:20 +0100 Subject: [PATCH] Fix synchronized_standby_slots GUC check hook It requires an LWLock, so it cannot run in processes without PGPROC. Skip it there; it makes no difference. Reported-by: Gabriele Bartolini <gabriele.bartol...@enterprisedb.com> --- src/backend/replication/slot.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 77bae536915..b6e43c1a89e 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -2428,18 +2428,16 @@ validate_sync_standby_slots(char *rawname, List **elemlist) { GUC_check_errdetail("List syntax is invalid."); } - else if (!ReplicationSlotCtl) + else if (MyProc) { /* - * We cannot validate the replication slot if the replication slots' - * data has not been initialized. This is ok as we will anyway - * validate the specified slot when waiting for them to catch up. See - * StandbySlotsHaveCaughtup() for details. + * Check that the specified slots exist and are logical slots. Note + * that because we need an LWLock, we cannot do this on processes + * without a PGPROC (hello syslogger!) so we skip it there, but it + * doesn't have an effect for those anyway. + * + * See StandbySlotsHaveCaughtup() for details. */ - } - else - { - /* Check that the specified slots exist and are logical slots */ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); foreach_ptr(char, name, *elemlist) -- 2.39.5