On 17/09/2020 12:48, Thomas Munro wrote:
> Hello,
>
> In commits 9f095299 and f98b8476 we improved recovery performance on
> Linux and FreeBSD but we didn't help other operating systems.  David
> just confirmed for me that commenting out the PostmasterIsAlive() call
> in the main recovery loop speeds up crash recovery considerably on his
> Windows system: 93s -> 70s or 1.32x faster.

Nice speedup!

> So I think we should do
> something like what Heikki originally proposed to lower the frequency
> of checks, on systems where we don't have the ability to skip the
> check completely.  Please see attached.

If you put the counter in HandleStartupProcInterrupts(), it could be a long wait before postmaster death is noticed if the startup process is, e.g., waiting for WAL to arrive in the loop in WaitForWALToBecomeAvailable(), or in recoveryPausesHere(). My original patch only reduced the frequency in the WAL redo loop, when you're actively replaying records.
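
To illustrate the distinction, here is a minimal standalone sketch, not the attached patch. It assumes a hypothetical interval of 1024 records, and it uses a stand-in function in place of the real PostmasterIsAlive() so that the number of checks can be counted. All names below are made up for illustration. The redo-loop path only pays for the check on every Nth call, while the wait-loop path checks every time, because each iteration there may sleep for a long time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interval: run the liveness check on every 1024th record. */
#define POSTMASTER_POLL_RATE_LIMIT 1024

/* Stand-in for PostmasterIsAlive(); counts how often it is called. */
static int  liveness_checks = 0;
static bool postmaster_alive = true;

static bool
fake_postmaster_is_alive(void)
{
    liveness_checks++;
    return postmaster_alive;
}

/*
 * Meant to be called once per replayed record.  Only every Nth call
 * performs the (expensive) check; the other calls assume the postmaster
 * is still alive.
 */
static bool
redo_loop_check_postmaster(void)
{
    static uint32_t count = 0;

    if (++count % POSTMASTER_POLL_RATE_LIMIT != 0)
        return true;            /* check skipped this time */
    return fake_postmaster_is_alive();
}

/*
 * Meant to be called from loops that wait for WAL or for recovery to be
 * resumed.  Always checks, since rate limiting here would multiply an
 * already long sleep by N before postmaster death is noticed.
 */
static bool
wait_loop_check_postmaster(void)
{
    return fake_postmaster_is_alive();
}
```

With this split, replaying 2048 records costs two liveness checks instead of 2048, and the wait loops still react on their next wakeup.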

We could probably do better on Windows. Maybe the signal handler thread could wait on the PostmasterHandle at the same time that it waits on the signal pipe, and set postmaster_possibly_dead. But I'm not going to work on that, and it would only help on Windows, so I'm OK with just adding the counter.

- Heikki

