Hello, this is bug report and a patch for it. The first patch in the attachments is for 9.2dev and next one is for 9.1.3.
On the current 9.2dev, IsCheckpointOnSchedule@checkpointer.c does not check against WAL segments written. This makes checkpointer always run at the speed according to checkpoint_timeout regardless of WAL advancing rate. This leads to unexpected imbalance in the numbers of WAL segment files between the master and the standby(s) for high advance rate of WALs. And what is worse, the master would have much higher chance to remove some WAL segments before the standby receives them. XLogPageRead()@xlog.c triggers checkpoint referring to WAL segment advance. So I think this is a bug of bgwriter in 9.1. The attached patches fix that on 9.2dev and 9.1.3 respctively. In the backported version to 9.1.3, bgwriter.c is modified instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is used as the equivalent of GetStandbyFlushRecPtr() in 9.2. By the way, GetStandbyFlushRecPtr() acquires spin lock within. It might be enough to read XLogCtl->recoveryLastRecPtr without lock to make rough estimation, but I can't tell it is safe or not. Same discussion could be for GetWalRcvWriteRecPtr() on 9.1.3. However, it seems to work fine on a simple test. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index c9473f7..f86e9b9 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -491,8 +491,8 @@ CheckpointerMain(void) * Initialize checkpointer-private variables used during checkpoint. */ ckpt_active = true; - if (!do_restartpoint) - ckpt_start_recptr = GetInsertRecPtr(); + ckpt_start_recptr = + do_restartpoint ? GetStandbyFlushRecPtr() : GetInsertRecPtr(); ckpt_start_time = now; ckpt_cached_elapsed = 0; @@ -731,28 +731,29 @@ IsCheckpointOnSchedule(double progress) return false; /* - * Check progress against WAL segments written and checkpoint_segments. + * Check progress against WAL segments written, or replayed for + * hot standby, and checkpoint_segments. * * We compare the current WAL insert location against the location - * computed before calling CreateCheckPoint. The code in XLogInsert that - * actually triggers a checkpoint when checkpoint_segments is exceeded - * compares against RedoRecptr, so this is not completely accurate. - * However, it's good enough for our purposes, we're only calculating an - * estimate anyway. + * computed before calling CreateCheckPoint. The code in + * XLogInsert that actually triggers a checkpoint when + * checkpoint_segments is exceeded compares against RedoRecptr. + * Similarly, we consult WAL flush location instead on hot + * standbys and XLogPageRead compares it aganst RedoRecptr, too. + * Altough these are not completely accurate, it's good enough for + * our purposes, we're only calculating an estimate anyway. */ - if (!RecoveryInProgress()) + + recptr = RecoveryInProgress() ? GetStandbyFlushRecPtr() : GetInsertRecPtr(); + elapsed_xlogs = + (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile + + ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / + CheckPointSegments; + + if (progress < elapsed_xlogs) { - recptr = GetInsertRecPtr(); - elapsed_xlogs = - (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile + - ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / - CheckPointSegments; - - if (progress < elapsed_xlogs) - { - ckpt_cached_elapsed = elapsed_xlogs; - return false; - } + ckpt_cached_elapsed = elapsed_xlogs; + return false; } /*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c index 5643ec8..0ce9945 100644 --- a/src/backend/postmaster/bgwriter.c +++ b/src/backend/postmaster/bgwriter.c @@ -56,6 +56,7 @@ #include "pgstat.h" #include "postmaster/bgwriter.h" #include "replication/syncrep.h" +#include "replication/walreceiver.h" #include "storage/bufmgr.h" #include "storage/fd.h" #include "storage/ipc.h" @@ -489,8 +490,8 @@ BackgroundWriterMain(void) * Initialize bgwriter-private variables used during checkpoint. */ ckpt_active = true; - if (!do_restartpoint) - ckpt_start_recptr = GetInsertRecPtr(); + ckpt_start_recptr = do_restartpoint ? + GetWalRcvWriteRecPtr(NULL) : GetInsertRecPtr(); ckpt_start_time = now; ckpt_cached_elapsed = 0; @@ -764,30 +765,32 @@ IsCheckpointOnSchedule(double progress) return false; /* - * Check progress against WAL segments written and checkpoint_segments. + * Check progress against WAL segments written, or replayed for + * hot standby, and checkpoint_segments. * * We compare the current WAL insert location against the location - * computed before calling CreateCheckPoint. The code in XLogInsert that - * actually triggers a checkpoint when checkpoint_segments is exceeded - * compares against RedoRecptr, so this is not completely accurate. - * However, it's good enough for our purposes, we're only calculating an - * estimate anyway. + * computed before calling CreateCheckPoint. The code in + * XLogInsert that actually triggers a checkpoint when + * checkpoint_segments is exceeded compares against RedoRecptr. + * Similarly, we consult WAL write location instead on hot + * standbys and XLogPageRead compares it aganst RedoRecptr, too. + * Altough these are not completely accurate, it's good enough for + * our purposes, we're only calculating an estimate anyway. */ - if (!RecoveryInProgress()) + + recptr = + RecoveryInProgress() ? GetWalRcvWriteRecPtr(NULL) : GetInsertRecPtr(); + elapsed_xlogs = + (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile + + ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / + CheckPointSegments; + + if (progress < elapsed_xlogs) { - recptr = GetInsertRecPtr(); - elapsed_xlogs = - (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile + - ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / - CheckPointSegments; - - if (progress < elapsed_xlogs) - { - ckpt_cached_elapsed = elapsed_xlogs; - return false; - } + ckpt_cached_elapsed = elapsed_xlogs; + return false; } - + /* * Check progress against time elapsed and checkpoint_timeout. */
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers