On Thu, Dec 15, 2022 at 02:47:21PM -0800, Nathan Bossart wrote:
> I tried setting wal_retrieve_retry_interval to 1ms for all TAP tests
> (similar to what was done in 2710ccd), and I noticed that the recovery
> tests consistently took much longer. Upon further inspection, it looks
> like the same (or a very similar) race condition described in e5d494d's
> commit message [0]. With some added debug logs, I see that all of the
> callers of MaybeStartWalReceiver() complete before SIGCHLD is processed, so
> ServerLoop() waits for a minute before starting the WAL receiver.
>
> A simple fix is to have DetermineSleepTime() take the WalReceiverRequested
> flag into consideration. The attached 0002 patch shortens the sleep time
> to 100ms if it looks like we are waiting on a SIGCHLD. I'm not certain
> this is the best approach, but it seems to fix the tests.
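
To make the quoted idea concrete, here is a rough standalone sketch of the
sleep-time decision. It is not the 0002 patch and not the real postmaster
code: determine_sleep_ms() and both constants are hypothetical names chosen
for illustration, and the only values taken from the text above are the
one-minute default wait and the 100ms shortened wait.

```c
#include <stdbool.h>

/* Hypothetical constants for illustration, in milliseconds. */
#define DEFAULT_SLEEP_MS 60000L /* the one-minute wait described above */
#define SHORT_SLEEP_MS   100L   /* shortened wait while a start is pending */

/*
 * Simplified, hypothetical stand-in for the sleep-time decision.
 *
 * If a WAL receiver start has been requested but not yet acted on, we are
 * probably waiting for SIGCHLD to be processed, so wake up again soon
 * instead of sleeping for the full default interval.
 */
static long
determine_sleep_ms(bool walreceiver_requested)
{
    if (walreceiver_requested)
        return SHORT_SLEEP_MS;

    return DEFAULT_SLEEP_MS;
}
```

The sketch ignores everything else the real function has to consider, so
it only shows the shape of the change.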

This seems to have somehow broken the archiving tests on Windows, so
obviously I owe some better analysis here. I didn't see anything obvious
in the logs, but I will continue to dig.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com