On Wed, Oct 5, 2011 at 6:19 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> While the system is idle, we skip duplicate checkpoints for some
> reasons. But when wal_level is set to hot_standby, I found that
> checkpoints are wrongly duplicated even while the system is idle.
> The cause is that XLOG_RUNNING_XACTS WAL record always
> follows CHECKPOINT one when wal_level is set to hot_standby.
> So the subsequent checkpoint wrongly thinks that there is inserted
> record (i.e., XLOG_RUNNING_XACTS record) since the start of the
> last checkpoint, the system is not idle, and this checkpoint cannot
> be skipped. Is this intentional behavior? Or a bug?

I think it is avoidable behaviour, but not a bug.

Thinking some more about this, IMHO it is possible to improve the
situation greatly by returning to look at the true purpose of
checkpoints. Checkpoints exist to minimise the time taken during crash
recovery, and as starting points for backups/archive recoveries.

The current idea is that if there has been no activity then we skip
checkpoint. But all it takes is a single WAL record and off we go with
another checkpoint. If there hasn't been much WAL activity, there is
not much point in having another checkpoint record since there is
little if any time to be saved in recovery.

So why not avoid checkpoints until we have written at least 1 WAL file
worth of data? That way checkpoint records are always in different
files, so we are safer with regard to primary and secondary checkpoint
records.

That would mean in some cases that dirty data would stay in shared
buffers for days or weeks? No, because the bgwriter would clean it -
but even if it did, so what? Recovery will still be incredibly quick,
which is the whole point.

Testing whether we're in a different segment is easy and much simpler
than trying to wriggle around trying to directly fix the problem you
mention. Patch attached.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
spaced_checkpoints.v1.patch
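
The attached patch is not reproduced in this archive. As a rough illustration of the "different segment" test described above, here is a minimal standalone sketch. It is not the patch and not PostgreSQL code: the type wal_pos, the constant WAL_SEG_SIZE and both function names are hypothetical, and a WAL position is assumed to be a plain 64-bit byte offset, with 16 MB (the default segment size) hard-coded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position: a byte offset into the WAL stream. */
typedef uint64_t wal_pos;

/* Assumption: fixed 16 MB segments, the default WAL segment size. */
#define WAL_SEG_SIZE ((wal_pos) 16 * 1024 * 1024)

/* Number of the WAL segment (file) that contains the given position. */
static uint64_t
wal_segment_of(wal_pos pos)
{
    return pos / WAL_SEG_SIZE;
}

/*
 * Return true if a checkpoint may be skipped because the current insert
 * position still lies in the same segment as the previous checkpoint
 * record. Once the insert position has moved into a later segment, the
 * next checkpoint record is guaranteed to land in a different file.
 */
static bool
checkpoint_can_be_skipped(wal_pos last_checkpoint_pos, wal_pos insert_pos)
{
    /* WAL only grows, so the insert position never precedes the checkpoint. */
    assert(insert_pos >= last_checkpoint_pos);

    return wal_segment_of(insert_pos) == wal_segment_of(last_checkpoint_pos);
}
```

Note that this tests for a segment boundary rather than for a full file's worth of WAL since the last checkpoint, so two checkpoint records can be less than one segment apart as long as they fall in different files. That is the property the message relies on for the primary and secondary checkpoint records.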