On 10/27/2012 2:41 PM, Heikki Linnakangas wrote:
> And it's not at all clear to me that it would perform better than full_page_writes. You're writing and flushing out roughly the same amount of data AFAICS.
I think this assumption is wrong. With full_page_writes=on, we write the full page content to WAL on the first change after a checkpoint. A change after a checkpoint also means that the page is now dirty and must be written to the data file during the next checkpoint at the latest. That is a minimum of 16K written (8K to WAL plus 8K to the data file) for every page changed after a checkpoint.
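To make the 16K figure concrete, here is a minimal sketch of the per-page arithmetic. It assumes the default 8K block size and ignores WAL record headers, alignment and compression. The function name and the `change_record_bytes` parameter are hypothetical, for illustration only.

```python
BLCKSZ = 8192  # default PostgreSQL page size in bytes (a build-time setting)


def min_bytes_per_dirtied_page(full_page_writes: bool, change_record_bytes: int) -> int:
    """Lower bound on bytes written for a page first changed after a checkpoint.

    Hypothetical model: ignores WAL record headers, alignment and compression.
    `change_record_bytes` is an assumed size for the ordinary WAL record
    describing the change when no full page image is logged.
    """
    # The page is now dirty, so it must reach the data file by the next
    # checkpoint at the latest: one full page write to the data file.
    data_file_bytes = BLCKSZ
    if full_page_writes:
        # First change after a checkpoint: the whole page image goes to WAL.
        # The change record's own payload is ignored (this is a lower bound).
        wal_bytes = BLCKSZ
    else:
        # Only the small change record goes to WAL.
        wal_bytes = change_record_bytes
    return wal_bytes + data_file_bytes


print(min_bytes_per_dirtied_page(True, 100))   # 16384
print(min_bytes_per_dirtied_page(False, 100))  # 8292
```

The data-file write is the same in both cases; the difference is entirely in what goes to WAL.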
> What exactly is the problem with full_page_writes that we're trying to solve?
Full page writes are meant to guard against torn pages. A torn page occurs when the underlying OS, filesystem or hardware writes an 8K page in smaller chunks (for example, 512-byte sectors) and, in the case of a crash, some of these chunks make it to disk while others don't. Without full_page_writes, crash recovery works if either all 8K made it or nothing made it (i.e., the write was atomic), but it fails otherwise.
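A toy model of the torn-page case, as a sketch only: it assumes 512-byte sectors are the device's atomic write unit, treats redo without a full page image as "apply the change on top of a consistent page", and models the logged full page image as equal to the new page contents. `write_interrupted` and `recover` are hypothetical helpers, not PostgreSQL functions.

```python
from typing import Optional

BLCKSZ = 8192                        # PostgreSQL page size (default)
SECTOR = 512                         # assumed atomic write unit of the device
SECTORS_PER_PAGE = BLCKSZ // SECTOR  # 16 sectors per page


def write_interrupted(old_page: bytes, new_page: bytes, sectors_written: int) -> bytes:
    """On-disk page contents if a crash hits after `sectors_written` sectors landed."""
    cut = sectors_written * SECTOR
    return new_page[:cut] + old_page[cut:]


def recover(on_disk: bytes, old_page: bytes, new_page: bytes,
            full_page_image: Optional[bytes]) -> Optional[bytes]:
    """Toy crash recovery for one page; returns the recovered page or None on failure."""
    if full_page_image is not None:
        # The image logged in WAL simply overwrites whatever is on disk.
        return full_page_image
    if on_disk == new_page:
        return new_page  # write completed; nothing left to redo
    if on_disk == old_page:
        return new_page  # write never started; redo the change on the old page
    return None          # torn page: a mix of old and new sectors


old, new = bytes(BLCKSZ), b"\x01" * BLCKSZ
torn = [k for k in range(SECTORS_PER_PAGE + 1)
        if recover(write_interrupted(old, new, k), old, new, None) is None]
print(torn)  # every partial write, i.e. 1 through 15 sectors
```

Passing the page image as `full_page_image` makes every interruption point recoverable, which is the guarantee full_page_writes buys.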
The amount of WAL generated with full_page_writes=on is quite substantial. For pgbench, for example, the ratio is 20:1, meaning that with full_page_writes you write 20x the WAL you write without it.
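A rough sketch of why the ratio gets so large when changes are small and spread over many distinct pages. It is a hypothetical model (one fixed-size record per change, plus one full page image per page on its first change after a checkpoint; headers, alignment and compression ignored), and the numbers in the usage line are made up. This is not a reproduction of the pgbench measurement.

```python
BLCKSZ = 8192  # default PostgreSQL page size in bytes


def wal_bytes(n_changes: int, record_bytes: int, n_pages_first_touched: int,
              full_page_writes: bool) -> int:
    """Rough WAL volume for one checkpoint interval (hypothetical model)."""
    # Every change writes one ordinary record.
    total = n_changes * record_bytes
    if full_page_writes:
        # Each distinct page's first change after the checkpoint also logs
        # a full page image.
        total += n_pages_first_touched * BLCKSZ
    return total


def fpw_ratio(n_changes: int, record_bytes: int, n_pages_first_touched: int) -> float:
    """WAL volume with full_page_writes=on divided by the volume with it off."""
    on = wal_bytes(n_changes, record_bytes, n_pages_first_touched, True)
    off = wal_bytes(n_changes, record_bytes, n_pages_first_touched, False)
    return on / off


# Made-up workload: 10,000 changes of 120 bytes touching 2,000 distinct pages.
print(fpw_ratio(10_000, 120, 2_000))  # about 14.65 with these made-up numbers
```

The ratio grows with the number of distinct pages touched per checkpoint interval and shrinks as checkpoints get further apart, since each page pays for its image only once per interval.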
Jan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)