On 7/26/13 9:14 AM, didier wrote:
> During recovery you have to load the log in cache first before applying WAL.
Checkpoints exist to bound recovery time after a crash. That is their only purpose. What you're suggesting moves a lot of work into the recovery path, which will increase how long recovery takes.
More work at recovery time means someone who uses the default of checkpoint_timeout='5 minutes', expecting that crash recovery won't take very long, will discover it now takes longer. They'll be forced to shrink the value to get the same recovery time they have currently. You might need to make checkpoint_timeout 3 minutes instead, if crash recovery now has all this extra work to deal with. And when the time between checkpoints drops, checkpoint processing becomes fundamentally less efficient: a page that is dirtied repeatedly gets written out again at every checkpoint, so you end up writing more data in total.
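To make the "more data in the end" point concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under loud assumptions: updates hit pages uniformly at random, each checkpoint writes every page dirtied since the previous one exactly once, and the function names and workload numbers are invented for illustration. None of this is PostgreSQL code.

```python
def distinct_dirty_pages(total_pages, updates_per_sec, interval_sec):
    """Expected number of distinct pages dirtied in one checkpoint interval.

    Assumption: every update picks a page uniformly at random, so a page
    stays clean with probability (1 - 1/total_pages) ** updates.
    """
    updates = updates_per_sec * interval_sec
    return total_pages * (1.0 - (1.0 - 1.0 / total_pages) ** updates)


def checkpoint_pages_written(total_pages, updates_per_sec, interval_sec, duration_sec):
    """Expected pages written by checkpoints over duration_sec.

    Assumption: each checkpoint writes each dirty page once, no matter
    how many times it was modified during the interval.
    """
    checkpoints = duration_sec / interval_sec
    return checkpoints * distinct_dirty_pages(total_pages, updates_per_sec, interval_sec)


if __name__ == "__main__":
    # Hypothetical workload: 100,000 pages, 1,000 page updates/sec, one hour.
    for timeout in (180, 300, 3600):
        written = checkpoint_pages_written(100_000, 1_000, timeout, 3600)
        print(f"checkpoint every {timeout:>4}s: ~{written:,.0f} pages written per hour")
```

In this toy model the 3 minute interval writes noticeably more pages per hour than the 5 minute one, because repeated updates to the same page are absorbed less often between checkpoints. Real workloads are not uniform, so treat the shape of the result, not the numbers, as the point.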
Checkpoint I/O, the interval between checkpoints, and recovery time are all related. If you let any one side of the current requirements slip, it makes the rest easier to deal with. Those are all trade-offs though, not improvements. And this particular one is already an option.
If you want less checkpoint I/O per capita and don't care about recovery time, you don't need a code change to get it. Just make checkpoint_timeout huge. A lot of checkpoint I/O issues go away if you only do one checkpoint per hour, because instead of random writes you're getting sequential ones to the WAL. But when you crash, expect to be down for a significant chunk of an hour, as you go back and replay all of the work that was postponed.
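The other side of the trade-off can be estimated just as crudely. The sketch below assumes the WAL to replay after a crash is at most one full checkpoint interval's worth, and that replay runs at a fixed rate. Both are simplifications, and the function name and the rates used are hypothetical, not measurements.

```python
def worst_case_replay_seconds(checkpoint_timeout_sec, wal_bytes_per_sec, replay_bytes_per_sec):
    """Rough upper bound on crash recovery time.

    Assumptions: WAL accumulates at a steady wal_bytes_per_sec, the crash
    happens just before the next checkpoint completes, and recovery replays
    WAL at a steady replay_bytes_per_sec.
    """
    wal_to_replay = checkpoint_timeout_sec * wal_bytes_per_sec
    return wal_to_replay / replay_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical rates: 8 MB/s of WAL generated, replayed at 16 MB/s.
    for timeout in (300, 3600):
        secs = worst_case_replay_seconds(timeout, 8e6, 16e6)
        print(f"checkpoint_timeout={timeout}s -> up to ~{secs / 60:.1f} minutes of replay")
```

With these made-up rates, stretching the interval from 5 minutes to an hour stretches the worst-case replay by the same factor of 12. Any extra work added to the recovery path lowers the effective replay rate and pushes that number up further.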
-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com