Hi,

In two recent investigations into occasional test failures (019_replslot_limit.pl failures, AIO rebase) the problems are somehow tied to the checkpointer.

I don't yet know whether it is actually causally related to precisely those failures, but when running e.g. 027_stream_regress.pl, I see phases in which many backends are looping in RegisterSyncRequest() repeatedly, each time sleeping with pg_usleep(10000L).

Without adding instrumentation this is completely invisible at any log level. There are no log messages, there are no wait events, nothing.

ISTM, we should not have any loops around pg_usleep(). And shorter term, we shouldn't have any loops around pg_usleep() that don't emit log messages / set wait events. Therefore I propose that we "prohibit" such loops without at least a DEBUG2 elog() or so. It's just too hard to debug.

The reason for the sync queue filling up in 027_stream_regress.pl is actually fairly simple:

1) The test runs with shared_buffers = 1MB, leading to a small sync queue of 128 entries.
2) CheckpointWriteDelay() does pg_usleep(100000L)

ForwardSyncRequest() wakes up the checkpointer using SetLatch() if the sync queue is more than half full.

But at least on Linux and FreeBSD that doesn't actually interrupt pg_usleep() anymore (due to using signalfd / kqueue rather than a signal handler). And on all platforms the signal might arrive just before the pg_usleep() rather than during it, in which case the sleep isn't interrupted either.

If I shorten the sleep in CheckpointWriteDelay() the problem goes away. This actually reduces the time for a single run of 027_stream_regress.pl on my workstation noticeably. With the default sleep time it's ~32s, with the shortened time it's ~27s.

I suspect we need to do something about this concrete problem for 14 and master, because it's certainly worse than before on Linux / FreeBSD.

I suspect the easiest is to just convert that usleep to a WaitLatch(). That'd require adding a new enum value to WaitEventTimeout in 14. Which probably is fine?

Greetings,

Andres Freund
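
For illustration only: the difference between a plain timed sleep and a latch-style wait can be sketched outside PostgreSQL with the self-pipe technique. This is not PostgreSQL's latch implementation; DemoLatch and the demo_latch_* functions are hypothetical names made up for this sketch. The point it tries to show is that a wakeup sent before the waiter starts waiting stays pending and ends the wait immediately, whereas a bare sleep would run for its full duration.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a latch; NOT the PostgreSQL implementation. */
typedef struct DemoLatch
{
    int read_fd;   /* polled by the waiter */
    int write_fd;  /* written by whoever wants to wake the waiter */
} DemoLatch;

static void
demo_latch_init(DemoLatch *latch)
{
    int fds[2];
    int rc = pipe(fds);

    assert(rc == 0);
    (void) rc;

    /* Non-blocking on both ends so that set and reset never block. */
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    latch->read_fd = fds[0];
    latch->write_fd = fds[1];
}

/*
 * Wake the waiter. The byte stays in the pipe until it is drained, so a
 * wakeup sent before the wait begins is not lost.
 */
static void
demo_latch_set(DemoLatch *latch)
{
    char    c = 0;
    ssize_t rc = write(latch->write_fd, &c, 1);

    (void) rc;  /* a full pipe already means "set" */
}

/* Drain all pending wakeups. */
static void
demo_latch_reset(DemoLatch *latch)
{
    char buf[64];

    while (read(latch->read_fd, buf, sizeof(buf)) > 0)
        ;
}

/*
 * Wait until the latch is set or timeout_ms elapses. Returns true if woken
 * by demo_latch_set(), false on timeout (or if poll() itself failed).
 */
static bool
demo_latch_wait(DemoLatch *latch, int timeout_ms)
{
    struct pollfd pfd = {.fd = latch->read_fd, .events = POLLIN};
    int           rc = poll(&pfd, 1, timeout_ms);

    return rc > 0 && (pfd.revents & POLLIN) != 0;
}
```

Calling demo_latch_set() and then demo_latch_wait(&latch, 100) returns right away, which is the behaviour the sync queue wakeup needs. In the checkpointer the analogous change would presumably be replacing the pg_usleep(100000L) with a WaitLatch() on MyLatch using WL_LATCH_SET | WL_TIMEOUT and a 100ms timeout, followed by ResetLatch(); the exact flags and the name of the new wait event are left open here.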