On 10/27/2012 2:41 PM, Heikki Linnakangas wrote:
And it's not at all
clear to me that it would perform better than full_page_writes. You're
writing and flushing out roughly the same amount of data AFAICS.

I think this assumption is wrong. full_page_writes=on means we write the full page content to WAL on the first change after a checkpoint. A change after a checkpoint logically means that the same page is now dirty and must also be written out by the next checkpoint at the latest. That means a minimum of 16K written for every page changed after a checkpoint: 8K for the full page image in WAL plus 8K for the page itself.
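
To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes an 8K block size and ignores WAL record headers and any later changes to the same page; the function name is only illustrative and is not anything in PostgreSQL.

```python
PAGE_SIZE = 8192  # assumed block size: 8K, as in the discussion above


def min_bytes_written(pages_changed, page_size=PAGE_SIZE):
    """Lower bound on bytes written per checkpoint cycle with full_page_writes=on.

    Each page first changed after a checkpoint is written once to WAL as a
    full page image, and once more to the data file by the next checkpoint.
    """
    wal_bytes = pages_changed * page_size         # full page images in WAL
    checkpoint_bytes = pages_changed * page_size  # dirty pages flushed later
    return wal_bytes + checkpoint_bytes


print(min_bytes_written(1))     # 16384, i.e. 16K for a single page
print(min_bytes_written(1000))  # 16384000
```

The point is only that the full page image is an additional write, not a replacement for the later write of the dirty page.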

What exactly is the problem with full_page_writes that we're trying to
solve?

Full page writes are meant to guard against torn pages. A torn page occurs when an 8K page is written by the underlying OS/filesystem/HW in smaller chunks (for example, 512-byte sectors) and, in the case of a crash, some of these chunks make it to disk while others don't. Without full_page_writes, crash recovery works only if all 8K made it or nothing made it (i.e., the write was atomic); otherwise it will fail.
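
A toy simulation of a torn page may help illustrate this. It is only a sketch: it assumes 512-byte sectors that are written in order from the start of the page, which real hardware does not guarantee, and the helper name torn_write is made up for this example.

```python
PAGE_SIZE = 8192   # assumed block size (8K)
SECTOR_SIZE = 512  # assumed atomic write unit of the device


def torn_write(old_page, new_page, sectors_written, sector_size=SECTOR_SIZE):
    """Return the on-disk page if a crash hits after `sectors_written` sectors.

    Sectors before the cut hold the new contents; the rest still hold the old.
    """
    assert len(old_page) == len(new_page)
    cut = sectors_written * sector_size
    return new_page[:cut] + old_page[cut:]


old = b"A" * PAGE_SIZE
new = b"B" * PAGE_SIZE

for n in (0, 5, PAGE_SIZE // SECTOR_SIZE):
    page = torn_write(old, new, n)
    if page == old:
        state = "old page intact"
    elif page == new:
        state = "new page complete"
    else:
        state = "torn"
    print(n, state)
```

Only the first and last cases are the "nothing made it" and "all 8K made it" outcomes; anything in between leaves a page that is neither the old nor the new version.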

The amount of WAL generated with full_page_writes=on is quite substantial. For pgbench, for example, the ratio is 20:1, meaning that with full_page_writes=on you write 20x the amount of WAL you would without it.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

