Hi,

(adding Samuel to CC)
On Sun, Oct 12, 2025 at 01:42:30PM +1300, Thomas Munro wrote:
> On Sun, Oct 12, 2025 at 1:00 AM Alexander Lakhin <[email protected]> wrote:
> > !!!wrapper_handler[1988]| postgres_signal_arg: 30, PG_NSIG: 33
> > !!!wrapper_handler[1989]| postgres_signal_arg: 30, PG_NSIG: 33
> > !!!wrapper_handler[3284]| postgres_signal_arg: 14, PG_NSIG: 33
> > !!!wrapper_handler[3284]| postgres_signal_arg: 28476608, PG_NSIG: 33
> > TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c",
> > Line: 94, PID: 3284

Those issues are really quite rare, at least on fruitcrow (but Alexander
has better introspection into this, I think), but the test_shm_mq hangs
are frequent enough (most/all of the timeouts on
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=fruitcrow&br=master
and other branches) that I was trying to get a handle on them.

What was really irritating was that gdb didn't show anything much for
backends (contrary to the postmaster process), like:

|#0 0x00000001023cdaec in ?? () from /lib/ld-x86-64.so.1
|#1 0x00000001023ce1ae in ?? () from /lib/ld-x86-64.so.1
|#2 0x00000001023f6db8 in ?? ()
|#3 0x0000000000000000 in ?? ()

so debugging this was difficult. I managed to get Samuel Thibault to
look at it at FOSDEM last weekend, and he immediately figured out that
not enough information was copied on fork() and committed a fix to
glibc[1]. With this, the backtrace for the hanging test_shm_mq
background worker looks like this:

(from the Mach debugger:)

|switch_context(...)+0xbb
|thread_invoke(...)+0xe3
|thread_block(...)+0x4b
|gsync_wait(...)+0x1d8
|_Xgsync_wait(...)+0x73
|ipc_kobject_server(...)+0xa5
|mach_msg_trap(...)+0x8b0
|syscall64(...)+0xdd
|>>>>> user space <<<<<

(from gdb:)

|(gdb) bt
|#0 0x00000001023cdc6c in __GI___mach_msg_trap ()
| at ./build-tree/hurd-amd64-libc/mach/mach_msg_trap.S:2
|#1 0x00000001023ce32e in __GI___mach_msg (msg=msg@entry=0x101cce1c0, option=option@entry=3,
| send_size=send_size@entry=112, rcv_size=rcv_size@entry=48, rcv_name=18, timeout=timeout@entry=0,
| notify=0) at ./mach/msg.c:111
|#2 0x0000000102643579 in __gsync_wait (task=<optimized out>, addr=addr@entry=35184372089520,
| val1=val1@entry=2, val2=val2@entry=0, msec=msec@entry=0, flags=flags@entry=0)
| at ./build-tree/hurd-amd64-libc/mach/RPC_gsync_wait.c:187
|#3 0x0000000101d04bee in __GI___pthread_mutex_lock (mtxp=mtxp@entry=0x2000000002b0)
| at ../sysdeps/mach/hurd/htl/pt-mutex-lock.c:37
|#4 0x0000000101d08944 in __pthread_enable_asynccancel () at ./htl/cancellation.c:28
|#5 0x000000010250b065 in __GI___libc_write (fd=4, buf=0x101cce2fb, nbytes=1)
| at ../sysdeps/mach/hurd/write.c:25
|#6 0x000000010063b6c4 in sendSelfPipeByte () at waiteventset.c:1925
|#7 0x000000010063b69c in latch_sigurg_handler (postgres_signal_arg=16) at waiteventset.c:1914
|#8 0x00000001008b9c6c in wrapper_handler (postgres_signal_arg=16) at pqsignal.c:110
|#9 <signal handler called>
|#10 0x0000000101d04795 in __pthread_testcancel () at ./htl/pt-testcancel.c:31
|#11 0x0000000101d0895c in __pthread_enable_asynccancel () at ./htl/cancellation.c:33
|#12 0x0000000102505a65 in __GI___libc_read (fd=3, buf=0x101ccec00, nbytes=1)
|#13 0x000000010063b729 in drain () at waiteventset.c:1974
|#14 0x000000010063b4b4 in WaitEventSetWaitBlock (set=0x200000015418, cur_timeout=-1,
| occurred_events=0x101ccf0f0, nevents=1) at waiteventset.c:1534
|#15 0x000000010063b268 in WaitEventSetWait (set=0x200000015418, timeout=-1,
| occurred_events=0x101ccf0f0, nevents=1, wait_event_info=134217764) at waiteventset.c:1140
|#16 0x000000010062c9e6 in WaitLatch (latch=0x10df221a4, wakeEvents=33, timeout=0,
|#17 0x0000000100635b32 in shm_mq_send_bytes (mqh=0x200000051ab8, nbytes=130, data=0x200000049fd8,
|#18 0x0000000100635021 in shm_mq_sendv (mqh=0x200000051ab8, iov=0x101ccf250, iovcnt=1, nowait=false,
|#19 0x0000000100634c95 in shm_mq_send (mqh=0x200000051ab8, nbytes=130, data=0x200000049fd8,
|#20 0x000000010e8aa72d in copy_messages (inqh=0x200000051b40, outqh=0x200000051ab8) at worker.c:200
|#21 test_shm_mq_main (main_arg=<optimized out>) at worker.c:141
[...]

So there's a signal handler and pthread cancellation involved. Having a
backtrace enabled Samuel to push a fix to glibc[2] that adds a critical
section to the cancel lock.

Before, Postgres would hang within a few tens of thousands of iterations
(or at most a few hundred thousand) when running something like this
(50000 iterations overall in this case), adapted from
src/test/modules/test_shm_mq/sql/test_shm_mq.sql:

|postgres=# CREATE EXTENSION test_shm_mq; -- once
|postgres=# SELECT test_shm_mq(100, (select string_agg(chr(32+(random()*95)::int), '')
|             from generate_series(1,(100+200*random())::int)), 50000, 1);

With the glibc patch from [2] applied, I have just now been running the
above SQL for 5 million iterations without a hang. So those
semi-frequent hangs on fruitcrow should be fixed now, yay!

> Hmm. We only install the handler for real signal numbers, and it
> clearly managed to find the handler, so then how did it corrupt signo
> before calling the function? I wonder if there could be concurrency bugs
> reached by our perhaps unusually large amount of signaling (we have
> found bugs in the signal implementations of several other OSes...).
> This might be the code:
>
> https://github.com/bminor/glibc/blob/master/hurd/hurdsig.c#L639
>
> It appears to suspend the thread selected to handle the signal, mess
> with its stack/context and then resume it, just like traditional
> monokernels, it's just done in user space by code running in a helper
> thread that communicates over Mach ports. So it looks like I
> misunderstood that comment in the docs, it's not the handler itself
> that runs in a different thread, unless I'm looking at the wrong code
> (?).
>
> Some random thoughts after skim-reading that and
> glibc/sysdeps/mach/hurd/x86/trampoline.c:
>
> * I wonder if setting up sigaltstack() and then using SA_ONSTACK in
> pqsignal() would behave differently, though SysV AMD64 calling
> conventions (used by Hurd IIUC) have the first argument in %rdi, not
> the stack, so I don't really expect that to be relevant...

Haven't tried this so far.

> * I wonder about the special code paths for handlers that were already
> running and happened to be in sigreturn(), or something like that,
> which I didn't study at all, but it occurred to me that our pqsignal
> will only block the signal itself while running a handler (since it
> doesn't specify SA_NODEFER)... so what happens if you block all
> signals while running each handler by changing
> sigemptyset(&act.sa_mask) to sigfillset(&act.sa_mask)?

I tried that before, and it seemed to help a bit, but the hangs are
pretty random and certainly still happen from time to time with just
this change.
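For reference, the two variations discussed above (an alternate signal
stack with SA_ONSTACK, and blocking all signals via sa_mask) would look
roughly like this in a pqsignal()-style wrapper. This is only an
untested sketch of the idea, not the actual src/port/pqsignal.c code,
and the names and flag choices are illustrative:

/* Untested sketch, loosely modeled on src/port/pqsignal.c -- not the real code. */
#include <signal.h>
#include <stdlib.h>
#include <string.h>

typedef void (*sig_handler_t) (int);

/* Variation 1: give signal handlers their own stack, so SA_ONSTACK can be used. */
int
setup_alt_signal_stack(void)
{
	stack_t		ss;

	memset(&ss, 0, sizeof(ss));
	ss.ss_size = SIGSTKSZ;
	ss.ss_sp = malloc(ss.ss_size);
	if (ss.ss_sp == NULL)
		return -1;
	return sigaltstack(&ss, NULL);
}

sig_handler_t
sketch_pqsignal(int signo, sig_handler_t func)
{
	struct sigaction act,
				oact;

	act.sa_handler = func;

	/*
	 * Variation 2: block *all* signals while a handler runs.  The stock
	 * wrapper uses sigemptyset(&act.sa_mask), i.e. only the delivered
	 * signal itself is blocked (since SA_NODEFER is not set).
	 */
	sigfillset(&act.sa_mask);

	act.sa_flags = SA_RESTART | SA_ONSTACK;
	if (sigaction(signo, &act, &oact) < 0)
		return SIG_ERR;
	return oact.sa_handler;
}
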
> * I see special code paths for threads that were in (its notion of)
> critical sections, which must be rare, but it looks like that just
> leaves the signal pending, which seems reasonable
>
> * I see special code paths for SIGIO and SIGURG that I didn't try to
> understand, but I wonder what would happen if we s/SIGURG/SIGXCPU/

I did that as well; it does not seem to change much, either.

[...]

> Thinking of other maybe-slightly-unusual things in the signal
> processing area that have been problematic in a couple of other OSes
> (ie systems that added emulations of Linux system calls), I wondered
> about epoll and signalfd, but it doesn't have those either, so it must
> be using plain old poll() with the widely used self-pipe trick for
> latches, and that doesn't seem likely to be new or buggy code.

Yeah, it uses that code path (see the PS below for a sketch of that
pattern).


Michael

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=9513e9e5a45fd1c6165c115f43f103f93e7a7faa
[2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=c4a81e882e607a34d0c26caf279c98398e9c1e4d
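PS: for anyone not familiar with the self-pipe trick mentioned above,
here is a minimal standalone sketch of the pattern (illustration only,
not PostgreSQL's actual waiteventset.c code): the signal handler writes
one byte into a non-blocking pipe, and the main loop poll()s the read
end and drains it. In the hanging backtrace above it is exactly that
kind of write() from inside the handler that ends up in
__pthread_enable_asynccancel() and blocks, while the interrupted read()
was in the middle of the same cancellation machinery.

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int	selfpipe[2];		/* [0] = read end, [1] = write end */

/* Signal handler: wake the poll() loop by writing one byte (async-signal-safe). */
static void
wakeup_handler(int signo)
{
	int			save_errno = errno;
	char		b = 0;

	(void) signo;
	(void) write(selfpipe[1], &b, 1);
	errno = save_errno;
}

int
main(void)
{
	struct sigaction act;
	struct pollfd pfd;
	char		buf[16];

	if (pipe(selfpipe) < 0)
		return 1;
	/* non-blocking, so a full pipe can never wedge the handler */
	fcntl(selfpipe[0], F_SETFL, O_NONBLOCK);
	fcntl(selfpipe[1], F_SETFL, O_NONBLOCK);

	memset(&act, 0, sizeof(act));
	act.sa_handler = wakeup_handler;
	sigemptyset(&act.sa_mask);
	sigaction(SIGUSR1, &act, NULL);

	pfd.fd = selfpipe[0];
	pfd.events = POLLIN;

	/* Wait until the handler has written a byte into the pipe. */
	for (;;)
	{
		int			rc = poll(&pfd, 1, -1);

		if (rc < 0 && errno == EINTR)
			continue;			/* interrupted by the signal itself, retry */
		if (rc > 0 && (pfd.revents & POLLIN))
			break;
	}

	/* Drain the pipe, then go do whatever the wakeup was for. */
	while (read(selfpipe[0], buf, sizeof(buf)) > 0)
		;
	puts("woken up by signal");
	return 0;
}

(Run it and send it SIGUSR1 from another terminal to see the wakeup.)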
