Peter Eisentraut <[email protected]> writes:
> I took this patch for a quick spin on macOS. The result was that the
> test suite hangs in the test src/test/recovery/t/017_shm.pl. I didn't
> see any mentions of this anywhere in the thread, but that test is newer
> than the beginning of this thread. Can anyone confirm or deny this
> issue? Is it specific to macOS perhaps?
Yeah, I reproduced the problem on macOS Catalina (10.15.2), using today's
HEAD. The core regression tests pass, as do the earlier recovery tests
(I didn't try a full check-world though). Somewhere early in 017_shm.pl,
things freeze up with four postmaster-child processes stuck in loops,
each consuming 100% CPU. I captured stack traces:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbb6 libsystem_kernel.dylib`kqueue + 10
    frame #1: 0x0000000105511533 postgres`CreateWaitEventSet(context=<unavailable>, nevents=<unavailable>) at latch.c:622:19 [opt]
    frame #2: 0x0000000105511305 postgres`WaitLatchOrSocket(latch=0x0000000112e02da4, wakeEvents=41, sock=-1, timeout=237000, wait_event_info=83886084) at latch.c:389:22 [opt]
    frame #3: 0x00000001054a7073 postgres`CheckpointerMain at checkpointer.c:514:10 [opt]
    frame #4: 0x00000001052da390 postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:461:4 [opt]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbce libsystem_kernel.dylib`kevent + 10
    frame #1: 0x0000000105511ddc postgres`WaitEventAdjustKqueue(set=0x00007fc8e8805920, event=0x00007fc8e8805958, old_events=<unavailable>) at latch.c:1034:7 [opt]
    frame #2: 0x0000000105511638 postgres`AddWaitEventToSet(set=<unavailable>, events=<unavailable>, fd=<unavailable>, latch=<unavailable>, user_data=<unavailable>) at latch.c:778:2 [opt]
    frame #3: 0x0000000105511342 postgres`WaitLatchOrSocket(latch=0x0000000112e030f4, wakeEvents=41, sock=-1, timeout=200, wait_event_info=83886083) at latch.c:397:3 [opt]
    frame #4: 0x00000001054a6d69 postgres`BackgroundWriterMain at bgwriter.c:304:8 [opt]
    frame #5: 0x00000001052da38b postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:456:4 [opt]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff65549c66 libsystem_kernel.dylib`close + 10
    frame #1: 0x0000000105511466 postgres`WaitLatchOrSocket [inlined] FreeWaitEventSet(set=<unavailable>) at latch.c:660:2 [opt]
    frame #2: 0x000000010551145d postgres`WaitLatchOrSocket(latch=0x0000000112e03444, wakeEvents=<unavailable>, sock=-1, timeout=5000, wait_event_info=83886093) at latch.c:432 [opt]
    frame #3: 0x00000001054b8685 postgres`WalWriterMain at walwriter.c:256:10 [opt]
    frame #4: 0x00000001052da39a postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:467:4 [opt]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff655515be libsystem_kernel.dylib`__select + 10
    frame #1: 0x00000001056a6191 postgres`pg_usleep(microsec=<unavailable>) at pgsleep.c:56:10 [opt]
    frame #2: 0x00000001054abe12 postgres`backend_read_statsfile at pgstat.c:5720:3 [opt]
    frame #3: 0x00000001054adcc0 postgres`pgstat_fetch_stat_dbentry(dbid=<unavailable>) at pgstat.c:2431:2 [opt]
    frame #4: 0x00000001054a320c postgres`do_start_worker at autovacuum.c:1248:20 [opt]
    frame #5: 0x00000001054a2639 postgres`AutoVacLauncherMain [inlined] launch_worker(now=632853327674576) at autovacuum.c:1357:9 [opt]
    frame #6: 0x00000001054a2634 postgres`AutoVacLauncherMain(argc=<unavailable>, argv=<unavailable>) at autovacuum.c:769 [opt]
    frame #7: 0x00000001054a1ea7 postgres`StartAutoVacLauncher at autovacuum.c:415:4 [opt]
I'm not sure how much faith to put in the last couple of those, as
stopping the earlier processes could perhaps have had side-effects.
But evidently 017_shm.pl is doing something that interferes with
our ability to create kqueue-based WaitEventSets.
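For anyone reading the traces, the wakeEvents=41 argument in the
checkpointer and bgwriter frames can be decoded with a few lines of C.
This is only a sketch: the WL_* bit values below are my assumption of
what src/include/storage/latch.h defines at about this point in the
tree, and decode_wake_events() is a hypothetical helper, not anything
in the PostgreSQL source. Check the header in your checkout before
relying on it.

```c
#include <stdio.h>
#include <string.h>

/*
 * ASSUMED flag values, believed to mirror src/include/storage/latch.h
 * around the time of this report. Verify against your own tree.
 */
#define WL_LATCH_SET         (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)
#define WL_TIMEOUT           (1 << 3)
#define WL_POSTMASTER_DEATH  (1 << 4)
#define WL_EXIT_ON_PM_DEATH  (1 << 5)

/*
 * Hypothetical helper: write the names of the bits set in wakeEvents
 * into buf, separated by " | ". Output is truncated if buf is too small.
 */
static void
decode_wake_events(int wakeEvents, char *buf, size_t buflen)
{
    static const struct
    {
        int         bit;
        const char *name;
    } flags[] = {
        {WL_LATCH_SET, "WL_LATCH_SET"},
        {WL_SOCKET_READABLE, "WL_SOCKET_READABLE"},
        {WL_SOCKET_WRITEABLE, "WL_SOCKET_WRITEABLE"},
        {WL_TIMEOUT, "WL_TIMEOUT"},
        {WL_POSTMASTER_DEATH, "WL_POSTMASTER_DEATH"},
        {WL_EXIT_ON_PM_DEATH, "WL_EXIT_ON_PM_DEATH"},
    };
    size_t      i;

    if (buflen == 0)
        return;
    buf[0] = '\0';

    for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++)
    {
        if ((wakeEvents & flags[i].bit) == 0)
            continue;
        /* Add a separator before every name except the first. */
        if (buf[0] != '\0')
            strncat(buf, " | ", buflen - strlen(buf) - 1);
        strncat(buf, flags[i].name, buflen - strlen(buf) - 1);
    }
}
```

Calling decode_wake_events(41, buf, sizeof(buf)) and printing buf
should give "WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH", if the
assumed values are right. That would mean these waits also watch for
postmaster death, which may or may not be relevant to the hang.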
regards, tom lane