On Tue, Dec 13, 2022 at 04:41:05PM -0800, Nathan Bossart wrote:
> On Tue, Dec 13, 2022 at 07:20:14PM -0500, Tom Lane wrote:
>> I certainly don't think that "wake the apply launcher every 1ms"
>> is a sane configuration.  Unless I'm missing something basic about
>> its responsibilities, it should seldom need to wake at all in
>> normal operation.
>
> This parameter appears to control how often the apply launcher starts new
> workers.  If it starts new workers in a loop iteration, it updates its
> last_start_time variable, and it won't start any more workers until another
> wal_retrieve_retry_interval has elapsed.  If no new workers need to be
> started, it only wakes up every 3 minutes.

Looking closer, I see that wal_retrieve_retry_interval is used for three
purposes.  Its main purpose seems to be preventing busy-waiting in
WaitForWALToBecomeAvailable(), as that's what's documented.  But it's also
used for logical replication.  The apply launcher uses it as I've described
above, and the apply workers use it when launching sync workers.  Unlike
the apply launcher, the apply workers store the last start time for each
table's sync worker and use that to determine whether to start a new one.

My first thought is that the latter two uses should be moved to a new
parameter, and the apply launcher should store the last start time for each
apply worker like the apply workers do for the table-sync workers.  In any
case, it probably makes sense to lower this parameter's value for testing
so that tests that restart these workers frequently don't have to wait so
long.

I can put a patch together if this seems like a reasonable direction to go.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
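
P.S. To make the per-worker idea a bit more concrete, here is a rough,
standalone sketch.  It is not PostgreSQL source: WorkerSlot,
worker_start_due(), launch_due_workers() and next_wakeup_ms() are
hypothetical names, timestamps are plain milliseconds, and "starting" a
worker just flips a flag.  The only values taken from the thread are the
3-minute idle wakeup and the notion of a retry interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Idle nap when nothing needs starting (3 minutes, per the thread). */
#define IDLE_NAPTIME_MS 180000

/* Hypothetical per-worker bookkeeping; not a PostgreSQL struct. */
typedef struct WorkerSlot
{
    bool    running;
    bool    ever_started;
    int64_t last_start_ms;
} WorkerSlot;

/* A stopped worker may be restarted once its own retry interval has passed. */
bool
worker_start_due(const WorkerSlot *w, int64_t now_ms, int64_t retry_ms)
{
    if (w->running)
        return false;
    return !w->ever_started || now_ms - w->last_start_ms >= retry_ms;
}

/* Start every worker that is due; returns how many were started. */
int
launch_due_workers(WorkerSlot *slots, size_t n, int64_t now_ms, int64_t retry_ms)
{
    int started = 0;

    assert(slots != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (!worker_start_due(&slots[i], now_ms, retry_ms))
            continue;
        /* Stand-in for actually launching a worker. */
        slots[i].running = true;
        slots[i].ever_started = true;
        slots[i].last_start_ms = now_ms;
        started++;
    }
    return started;
}

/* How long the launcher can sleep before some stopped worker becomes due. */
int64_t
next_wakeup_ms(const WorkerSlot *slots, size_t n, int64_t now_ms, int64_t retry_ms)
{
    int64_t nap = IDLE_NAPTIME_MS;

    for (size_t i = 0; i < n; i++)
    {
        int64_t remaining;

        if (slots[i].running)
            continue;
        if (!slots[i].ever_started)
            return 0;
        remaining = retry_ms - (now_ms - slots[i].last_start_ms);
        if (remaining <= 0)
            return 0;
        if (remaining < nap)
            nap = remaining;
    }
    return nap;
}
```

With per-worker timestamps, one worker that keeps failing only throttles
its own restarts, and the launcher can sleep until the earliest pending
restart instead of waking on a fixed short interval.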