* Andres Freund (and...@anarazel.de) wrote:
> On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> > WAL replay does do more work, generally speaking (the WAL has to be
> > read, the checksum validated on it, and then the write has to go out,
> > while the checkpointer just writes the page out from memory), but it's
> > also dealing with less contention on the system (there aren't a bunch of
> > backends hammering the disks to pull data in with reads when you're
> > doing crash recovery...).
>
> There's a huge difference though: WAL replay is single threaded, whereas
> generating WAL is not.
I'm aware- but *checkpointing* is still single-threaded, unless, as I
mentioned, you end up with backends pushing out their own changes to the
heap to make room for new pages to come in.  Or is there some other way
the checkpoint ends up being performed with multiple processes?

> Especially if there's synchronous IO required
> (most commonly reading in data, because more data was modified in the
> current checkpoint than fit in shared buffers, so FPIs don't pre-fill
> buffers), you can be significantly slower than generating the WAL.

That is an interesting point, if I'm following what you're saying
correctly- during replay we can end up having more pages modified than
fit in shared_buffers, which means we have to read back in pages that we
pushed out in order to apply the non-FPI WAL changes to them.

I wonder if we should have a way to configure the amount of memory
allowed to be used for WAL replay, independent of shared_buffers?  I
mean, really, during crash recovery on a dedicated database box, you'd
probably want to say "ALL the memory can be used if it makes crash
recovery faster!"

That said, I wonder if our eviction algorithm could be improved/changed
when performing WAL replay, too, to reduce the chances that we'll have
to read a page back in.

Very interesting.  Thanks!

Stephen
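P.S. A back-of-the-envelope way to see the effect being discussed: the
toy simulation below (plain Python, not PostgreSQL code) replays a
stream of page modifications through an LRU buffer pool and counts how
many non-FPI records would force a synchronous read because the page had
already been evicted.  The function name, the uniform-random reference
pattern, and strict LRU are all illustrative assumptions, not a claim
about how shared_buffers actually behaves.

```python
# Toy model: count forced re-reads during single-threaded replay when the
# modified working set exceeds the buffer pool.  Hypothetical sketch only.
from collections import OrderedDict
import random

def simulate_replay(n_records, n_pages, pool_size, seed=0):
    rng = random.Random(seed)
    pool = OrderedDict()   # page id -> None, ordered oldest-first (LRU)
    reads = 0
    for _ in range(n_records):
        page = rng.randrange(n_pages)  # assume uniform page references
        if page in pool:
            pool.move_to_end(page)     # buffer hit: just mark most-recent
        else:
            reads += 1                 # miss: page must be read back in
            pool[page] = None
            if len(pool) > pool_size:
                pool.popitem(last=False)  # evict least-recently-used page
    return reads

# Working set larger than the pool -> replay stalls on many reads.
print(simulate_replay(100_000, n_pages=10_000, pool_size=1_000))
# Pool large enough for the working set -> each page is read at most once.
print(simulate_replay(100_000, n_pages=10_000, pool_size=10_000))
```

With a pool a tenth the size of the working set, the vast majority of
records miss; with a pool that holds the whole working set, reads are
bounded by the number of distinct pages - which is the intuition behind
letting recovery use more memory than shared_buffers.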