Hi,

On 2023-08-11 15:31:43 +0200, Tomas Vondra wrote:
> That's an awful lot of CPU for a cluster doing essentially "nothing"
> (there's no WAL to decode/replicate, etc.). It usually disappears after
> a couple seconds, but sometimes it's a rather persistent state.

Ugh, that's not great.

> The profile from the walsender processes looks like this:
>
>     --99.94%--XLogSendLogical
>               |
>               |--99.23%--XLogReadRecord
>               |          XLogReadAhead
>               |          XLogDecodeNextRecord
>               |          ReadPageInternal
>               |          logical_read_xlog_page
>               |          |
>               |          |--97.80%--WalSndWaitForWal
>               |          |          |
>               |          |          |--68.48%--WalSndWait
>
> It seems to me the issue is in WalSndWait, which was reworked to use
> ConditionVariableCancelSleep() in bc971f4025c. The walsenders end up
> waking each other in a busy loop, until the timing changes just enough
> to break the cycle.

IMO ConditionVariableCancelSleep()'s behaviour of waking up additional
processes can nearly be considered a bug, at least when combined with
ConditionVariableBroadcast(). In that case the wakeup is never needed, and it
can cause situations like this, where condition variables basically
deteriorate to a busy loop.

I hit this with AIO as well. I've been "solving" it by adding a
ConditionVariableCancelSleepEx(), which has an only_broadcasts argument.

I'm inclined to think that any code that needs the forwarding behaviour is
pretty much buggy.

Greetings,

Andres Freund