Re: [PERFORM] Replaying 48 WAL files takes 80 minutes

Heikki Linnakangas Tue, 30 Oct 2012 03:08:25 -0700

On 30.10.2012 10:50, Albe Laurenz wrote:

> Why does WAL replay read much more than it writes?
> I thought that pretty much every block read during WAL
> replay would also get dirtied and hence written out.

Not necessarily. If a block is modified and written out of the buffer cache before next checkpoint, the latest version of the block is already on disk. On replay, the redo routine reads the block, sees that the change was applied, and does nothing.
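
To make the read-without-write case concrete, here is a toy model in Python. It is only a sketch, not PostgreSQL's actual redo code: the Page and WalRecord classes and the replay() function are made up for illustration, and it assumes the "change was applied" check is a comparison of the LSN stored on the page against the LSN of the WAL record.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical on-disk block: remembers the LSN of the last change applied."""
    lsn: int = 0
    data: str = ""


@dataclass
class WalRecord:
    """Hypothetical WAL record that changes one block."""
    lsn: int
    block: int
    new_data: str


def replay(records, disk):
    """Replay records against disk (dict: block number -> Page).

    Returns (blocks_read, blocks_dirtied) as counts of distinct blocks.
    """
    blocks_read = set()
    blocks_dirtied = set()
    for rec in records:
        page = disk[rec.block]       # the block has to be read to inspect its LSN
        blocks_read.add(rec.block)
        if page.lsn >= rec.lsn:
            continue                 # change is already on disk: nothing to redo
        page.data = rec.new_data     # apply the change; the block is now dirty
        page.lsn = rec.lsn
        blocks_dirtied.add(rec.block)
    return len(blocks_read), len(blocks_dirtied)


# Block 1 was flushed after its last change, block 2 was not.
disk = {1: Page(lsn=20, data="b"), 2: Page(lsn=5, data="old")}
records = [WalRecord(10, 1, "a"), WalRecord(20, 1, "b"), WalRecord(30, 2, "new")]
reads, writes = replay(records, disk)
```

In this example both blocks are read, but only block 2 gets dirtied, so replay reads more than it writes.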

> I wonder why the performance is good in the first few seconds.
> Why should exactly the pages that I need in the beginning
> happen to be in cache?

This is probably because of full_page_writes=on. When replay has a full page image of a block, it doesn't need to read the old contents from disk. It can just blindly write the image to disk. Writing a block to disk also puts that block in the OS cache, so this also efficiently warms the cache from the WAL. Hence in the beginning of replay, you just write a lot of full page images to the OS cache, which is fast, and you only start reading from disk after you've filled up the OS cache. If this theory is true, you should see a pattern in the I/O stats, where in the first seconds there is no I/O, but the CPU is 100% busy while it reads from WAL and writes out the pages to the OS cache. After the OS cache fills up with the dirty pages (up to dirty_ratio, on Linux), you will start to see a lot of writes. As the replay progresses, you will see more and more reads, as you start to get cache misses.
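
The cache-warming effect can be sketched with another toy model in Python. Everything here is an assumption made for illustration: simulate_replay() is a hypothetical function, the OS cache is modelled as a plain LRU of fixed size, and writeback (dirty_ratio) is ignored entirely. It only shows why records with a full page image need no read, and why reads appear once the touched blocks no longer fit in the cache.

```python
from collections import OrderedDict


def simulate_replay(records, cache_size):
    """records: iterable of (block, has_full_page_image) pairs.

    Returns one label per record:
      "fpi"  - full page image available, block is overwritten without a read
      "hit"  - old contents needed and found in the (toy) OS cache
      "miss" - old contents needed and must be read from disk
    """
    cache = OrderedDict()            # toy LRU standing in for the OS page cache
    outcome = []
    for block, has_fpi in records:
        if has_fpi:
            outcome.append("fpi")
        elif block in cache:
            outcome.append("hit")
        else:
            outcome.append("miss")
        # Writing (or reading) the block leaves it in the cache.
        cache[block] = True
        cache.move_to_end(block)
        if len(cache) > cache_size:
            cache.popitem(last=False)    # evict the least recently used block
    return outcome


records = [(1, True), (2, True), (3, True), (1, False), (2, False)]
small_cache = simulate_replay(records, cache_size=2)
large_cache = simulate_replay(records, cache_size=3)
```

With room for three blocks, the later records for blocks 1 and 2 are served from the cache that the full page images warmed up. With room for only two, the same records turn into disk reads.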


- Heikki


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
