Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, On Sun, May 13, 2012 at 10:38 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: I've overlooked that startup process of the standby reads archives first, and then WAL. But the current patch enables progress governing based on checkpoint_segments during archive recovery on the standby. I forcused on WalRcvInProgress again to achieve this. The problem on using it was its intermittent nature. But using IsStandbyMode to avoid that turned out to lead another problem. Now, the introduced function WalRcvStarted() uses WalRcvInProgress to inform the caller if wal receiver has been started regardless of current status. We can avoid accelarated checkpoints before WAL receiver starts, but checkpoints on WAL streaming will be governed with checkpoint_segments. I have not certain answer for the criticism that checkpoint_semgents should be ignored even when WAL streaming.. Allowing checkpoint_segments be null to signal no check aganst it? Introduce another guc variable in bool to instruct to ignore checkpoint_semgnts on WAL streaming? Or something other? regards, -- Kyotaro Horiguchi NTT Open Source Software Center standby_checkpoint_segments_9.2dev_fix_20120516.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, I've returned from my overseas trip for just over one week. # and I'll leave Japan again after this... restorePtr = replayPtr = receivePtr But XLByteLT(recievePtr, replayPtr) this should not return true under the condition above.. Something wrong in my assumption? When walreceiver is not running, i.e., the startup process reads the WAL files from the archival area, the replay location would get bigger than the receive one. I've overlooked that startup process of the standby reads archives first, and then WAL. But the current patch enables progress governing based on checkpoint_segments during archive recovery on the standby. At Wed, 25 Apr 2012 02:20:37 +0900, Fujii Masao masao.fu...@gmail.com wrote in cahgqgwfso5wfptnalxme-ozrqq6kk24xgtybvjjhdytjf9f...@mail.gmail.com To alleviate the above problem, at least we might have to change the recovery logic so that the archived WAL files are restored with a temporary name, if cascading replication is not enabled (i.e., !standby_mode || !hot_standby || max_wal_senders = 0). Or we might have to remove the restored WAL file after replaying it and before opening the next one, without waiting for a restartpoint to remove the restored WAL files. Thought? I think it is beyond a bug fix. Furthermore, this is not a problem of speed of restartpoint progression, I suppose. If so, this should be cared as another problem. Putting aside the problem of vast amount of copied WALs, I suppose the remaining problem is needless restartpoint acceleration caused during archive restoration on the standby. I will try to resolve this problem. Is that OK? Thinking about the so-many-copied-WAL problem, IMHO, using temporary name only for non-cascading is not a good solution because it leads complication and retrogression to the code and behavior, nevertheless it won't solve the half of the problem. I don't yet understand about cascading replication enough, but I suppose erasing WALs as becoming out of use (by some logic I don't find yet) is hopeful. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, At Wed, 25 Apr 2012 02:31:24 +0900, Fujii Masao masao.fu...@gmail.com wrote in CAHGQGwE1OvB=HLcahLeL5oP66sxsfsGMgwU3MqAAwZ_Vr=x...@mail.gmail.com If we are allowed to be tolerant of the temporary lack of coherence in shared memory there, the spinlock could be removed. But the possibility to read garbage by using XLogCtl itself to access standbyMode does not seem to be tolerable. What do you think about that? I'm not sure if we really need to worry about that for such shared variable that doesn't change since it's been initialized at the start of recovery. Anyway, if we really need to worry about that, we need to protect the shared variable RecoveryTargetTLI and archiveCleanupCommand with the spinlock because they are in the same situation as standbyMode. From I said that the former (spinlock) could be dropped, but the latter (read as volatile) should be needed. From the view of maintenancibility (suspicious-proof expression?), it may be preferable that the manner to read shared memory be uniform whole source code if no particular reasons. Concerning this point, I think I will do 'volatization' and do not spinlock and put comment instead. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Thank you for sugestion. This still makes catching up in standby mode slower, as you get many more restartpoints. The reason for ignoring checkpoint_segments during recovery was to avoid that. I may have a misunderstanding around there, or your intention. I understand that standby creates no WAL archive, and can not recover from WAL archive, and both master and standby keeps WAL segment no longer than about them for about 2 * 1h, spans two maximum checkpoint_timeout intervals and some more. Could you please tell me whether the above is correct? If you meant crash recovery with the word 'recovery', there's WALs no more than for 2+ hours, far less than days, or weeks long. Otherwise, if you meant archive recovery, this patch does not change the behavior of archive recovery as far as I intended. This patch intended to change the behavior of standby under WAL shipping. If it is correct and the patch works correctly, your anxiety below should disappear, I hope. And if not correct, I *MUST* avoid such negative impacts on the functions out of the target - governing checkpoint progress on standby server shipping WALs from its master. Maybe it's still better than what we have currently, I'm not sure, but at least it needs to be discussed. Would be good to do some performance testing of recovery with various checkpoint_segments and _timeout settings, with and without this patch. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Sorry for broken message. From I said that the former (spinlock) could be dropped, but the latter (read as volatile) should be needed. I said that the former (spinlock) could be dropped from the view of functionarity, but the latter (read as volatile) should be needed. From the view of maintenancibility (suspicious-proof expression?), it may be preferable that the manner to read shared memory be uniform whole source code if no particular reasons. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, - xlog.c: Make StandbyMode shared. - checkpointer.c: Use IsStandbyMode() to check if postmaster is under standby mode. IsStandbyMode() looks overkill to me. The standby mode flag is forcibly turned off at the end of recovery, but its change doesn't need to be shared to the checkpointer process, IOW, the shared flag doesn't need to change since startup like XLogCtl-archiveCleanupCommand, I think. So we can simplify the code to share the flag to the checkpointer. See the attached patch (though not tested yet). Hmm. I understood that the aim of the spinlock and volatil'ize of the pointer in reading shared memory is to secure the memory consistency on SMPs with weak memory consistency and to make compiler help from over-optimization for non-volatile pointer respectively. You removed both of them in the patch. If we are allowed to be tolerant of the temporary lack of coherence in shared memory there, the spinlock could be removed. But the possibility to read garbage by using XLogCtl itself to access standbyMode does not seem to be tolerable. What do you think about that? The comments in checkpointer.c seems to need to be revised more. For example, + * XLogInsert that actually triggers a checkpoint when Currently a checkpoint is triggered by XLogWrite (not XLogInsert), the above needs to be corrected. I will be carefull for such outdated description. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On 23.04.2012 02:59, Fujii Masao wrote: On Thu, Apr 19, 2012 at 2:20 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, this is new version of standby checkpoint_segments patch. Thanks for the patch! This still makes catching up in standby mode slower, as you get many more restartpoints. The reason for ignoring checkpoint_segments during recovery was to avoid that. Maybe it's still better than what we have currently, I'm not sure, but at least it needs to be discussed. Would be good to do some performance testing of recovery with various checkpoint_segments and _timeout settings, with and without this patch. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Tue, Apr 24, 2012 at 3:53 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 23.04.2012 02:59, Fujii Masao wrote: On Thu, Apr 19, 2012 at 2:20 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, this is new version of standby checkpoint_segments patch. Thanks for the patch! This still makes catching up in standby mode slower, as you get many more restartpoints. The reason for ignoring checkpoint_segments during recovery was to avoid that. Maybe it's still better than what we have currently, I'm not sure, but at least it needs to be discussed. I see your point. Agreed. Another aspect of this problem is that if we ignore checkpoint_segments during recovery, a restartpoint would take long time, and which prevents WAL files from being removed from pg_xlog for a while. Which might cause the disk to fill up with WAL files. This trouble is unlikely to happen in 9.1 or before because the archived WAL files are always restored with a temporary name. OTOH, in 9.2, cascading replication patch changed the recovery logic so that the archived WAL files are restored with the correct name, so the number of WAL files in pg_xlog keeps increasing until a restartpoint removes them. The disk is more likely to fill up, in 9.2. To alleviate the above problem, at least we might have to change the recovery logic so that the archived WAL files are restored with a temporary name, if cascading replication is not enabled (i.e., !standby_mode || !hot_standby || max_wal_senders = 0). Or we might have to remove the restored WAL file after replaying it and before opening the next one, without waiting for a restartpoint to remove the restored WAL files. Thought? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Tue, Apr 24, 2012 at 3:47 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, - xlog.c: Make StandbyMode shared. - checkpointer.c: Use IsStandbyMode() to check if postmaster is under standby mode. IsStandbyMode() looks overkill to me. The standby mode flag is forcibly turned off at the end of recovery, but its change doesn't need to be shared to the checkpointer process, IOW, the shared flag doesn't need to change since startup like XLogCtl-archiveCleanupCommand, I think. So we can simplify the code to share the flag to the checkpointer. See the attached patch (though not tested yet). Hmm. I understood that the aim of the spinlock and volatil'ize of the pointer in reading shared memory is to secure the memory consistency on SMPs with weak memory consistency and to make compiler help from over-optimization for non-volatile pointer respectively. You removed both of them in the patch. If we are allowed to be tolerant of the temporary lack of coherence in shared memory there, the spinlock could be removed. But the possibility to read garbage by using XLogCtl itself to access standbyMode does not seem to be tolerable. What do you think about that? I'm not sure if we really need to worry about that for such shared variable that doesn't change since it's been initialized at the start of recovery. Anyway, if we really need to worry about that, we need to protect the shared variable RecoveryTargetTLI and archiveCleanupCommand with the spinlock because they are in the same situation as standbyMode. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Thu, Apr 19, 2012 at 2:20 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, this is new version of standby checkpoint_segments patch. Thanks for the patch! - xlog.c: Make StandbyMode shared. - checkpointer.c: Use IsStandbyMode() to check if postmaster is under standby mode. IsStandbyMode() looks overkill to me. The standby mode flag is forcibly turned off at the end of recovery, but its change doesn't need to be shared to the checkpointer process, IOW, the shared flag doesn't need to change since startup like XLogCtl-archiveCleanupCommand, I think. So we can simplify the code to share the flag to the checkpointer. See the attached patch (though not tested yet). The comments in checkpointer.c seems to need to be revised more. For example, +* XLogInsert that actually triggers a checkpoint when Currently a checkpoint is triggered by XLogWrite (not XLogInsert), the above needs to be corrected. Regards, -- Fujii Masao standby_checkpoint_segments_9.2dev_fix_20120423.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Wed, Apr 18, 2012 at 10:22 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: I tried that at first. But I suppose the requirement here is 'if reading segments comes via replication stream, enable throttling by checkpoint_segments.' and WalRcvInProgress() seems fit to check that. If so, what if replication is terminated and restarted repeatedly while a restartpoint is running? It sometimes obeys and sometimes ignores checkpoint_segments. Which seems strange behavior. Plus, adding SharedStartupStandbyMode into XLogCtlData seems accompanied with some annoyances which would not pay. Hmm... what are you worried about? I don't think that sharing the variable via XLogCtl is so difficult. Please see the code to share archiveCleanupCommand from the startup process to the checkpointer. It looks very simple. By the way, do you have some advise about GetStandbyFlushRecPtr() and the order of the locations? I'm embarrassed with that... Sorry. I could not parse this Could you explain this again? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
I convinced that current patch has a problem, and will come up with the new patch later. I tried that at first. But I suppose the requirement here is 'if reading segments comes via replication stream, enable throttling by checkpoint_segments.' and WalRcvInProgress() seems fit to check that. If so, what if replication is terminated and restarted repeatedly while a restartpoint is running? It sometimes obeys and sometimes ignores checkpoint_segments. Which seems strange behavior. I see your point. And agree on that is something not should be. Plus, adding SharedStartupStandbyMode into XLogCtlData seems accompanied with some annoyances which would not pay. Hmm... what are you worried about? I don't think that sharing the variable via XLogCtl is so difficult. Please see the code to share archiveCleanupCommand from the startup process to the checkpointer. It looks very simple. The mechanism has nothing complex. I've been afraid of making so many variables with similar meaning sitting side by side on shared memory. But I convinced that additional shared variable is preferable because it makes the logic and behavior clear and sane. I will come up with updated patch soon. By the way, do you have some advise about GetStandbyFlushRecPtr() and the order of the locations? I'm embarrassed with that... Sorry. I could not parse this Could you explain this again? My point is, - Is the assumption correct that restorePtr = replayPtr = receivePtr? - If correct, what the code in GetStandbyFlushRecPtr() showing below means? if (XLByteLT(receivePtr, replayPtr)) - Or if wrong, what situation would take place to break the expression restorePtr = replayPtr = receivePtr? regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, this is new version of standby checkpoint_segments patch. - xlog.c: Make StandbyMode shared. - checkpointer.c: Use IsStandbyMode() to check if postmaster is under standby mode. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8d0aabf..2457840 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -177,6 +177,12 @@ static bool LocalRecoveryInProgress = true; static bool LocalHotStandbyActive = false; /* + * Local copy of SharedIsStandbyMode variable. True actually means not known, + * need to check the shared state. + */ +static bool LocalIsStandbyMode = true; + +/* * Local state for XLogInsertAllowed(): * 1: unconditionally allowed to insert XLOG * 0: unconditionally not allowed to insert XLOG @@ -206,7 +212,6 @@ static TimestampTz recoveryTargetTime; static char *recoveryTargetName; /* options taken from recovery.conf for XLOG streaming */ -static bool StandbyMode = false; static char *PrimaryConnInfo = NULL; static char *TriggerFile = NULL; @@ -427,6 +432,11 @@ typedef struct XLogCtlData bool SharedHotStandbyActive; /* + * SharedInStandbyMode indicates if we are running in standby mode. + */ + bool SharedIsStandbyMode; + + /* * recoveryWakeupLatch is used to wake up the startup process to continue * WAL replay, if it is waiting for WAL to arrive or failover trigger file * to appear. @@ -619,6 +629,7 @@ static void SetLatestXTime(TimestampTz xtime); static void SetCurrentChunkStartTime(TimestampTz xtime); static void CheckRequiredParameterValues(void); static void XLogReportParameters(void); +static void ExitStandbyMode(void); static void LocalSetXLogInsertAllowed(void); static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags); static void KeepLogSeg(XLogRecPtr recptr, uint32 *logId, uint32 *logSeg); @@ -3115,7 +3126,7 @@ RestoreArchivedFile(char *path, const char *xlogfname, * incorrectly conclude we've reached the end of WAL and we're * done recovering ... */ -if (StandbyMode stat_buf.st_size expectedSize) +if (IsStandbyMode() stat_buf.st_size expectedSize) elevel = DEBUG1; else elevel = FATAL; @@ -4072,7 +4083,7 @@ next_record_is_invalid: } /* In standby-mode, keep trying */ - if (StandbyMode) + if (IsStandbyMode()) goto retry; else return NULL; @@ -5098,6 +5109,7 @@ XLOGShmemInit(void) XLogCtl-XLogCacheBlck = XLOGbuffers - 1; XLogCtl-SharedRecoveryInProgress = true; XLogCtl-SharedHotStandbyActive = false; + XLogCtl-SharedIsStandbyMode = true; XLogCtl-Insert.currpage = (XLogPageHeader) (XLogCtl-pages); SpinLockInit(XLogCtl-info_lck); InitSharedLatch(XLogCtl-recoveryWakeupLatch); @@ -5289,6 +5301,7 @@ readRecoveryCommandFile(void) FILE *fd; TimeLineID rtli = 0; bool rtliGiven = false; + boolstandby_mode = false; ConfigVariable *item, *head = NULL, *tail = NULL; @@ -5439,13 +5452,14 @@ readRecoveryCommandFile(void) } else if (strcmp(item-name, standby_mode) == 0) { - if (!parse_bool(item-value, StandbyMode)) + if (!parse_bool(item-value, standby_mode)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg(parameter \%s\ requires a Boolean value, standby_mode))); ereport(DEBUG2, (errmsg_internal(standby_mode = '%s', item-value))); + } else if (strcmp(item-name, primary_conninfo) == 0) { @@ -5470,7 +5484,7 @@ readRecoveryCommandFile(void) /* * Check for compulsory parameters */ - if (StandbyMode) + if (standby_mode) { if (PrimaryConnInfo == NULL recoveryRestoreCommand == NULL) ereport(WARNING, @@ -5480,6 +5494,7 @@ readRecoveryCommandFile(void) } else { + ExitStandbyMode(); if (recoveryRestoreCommand == NULL) ereport(FATAL, (errmsg(recovery command file \%s\ must specify restore_command when standby mode is not enabled, @@ -6086,7 +6101,7 @@ StartupXLOG(void) if (InArchiveRecovery) { - if (StandbyMode) + if (IsStandbyMode()) ereport(LOG, (errmsg(entering standby mode))); else if (recoveryTarget == RECOVERY_TARGET_XID) @@ -6110,7 +6125,7 @@ StartupXLOG(void) * Take ownership of the wakeup latch if we're going to sleep during * recovery. */ - if (StandbyMode) + if (IsStandbyMode()) OwnLatch(XLogCtl-recoveryWakeupLatch); if (read_backup_label(checkPointLoc, backupEndRequired, @@ -6169,7 +6184,7 @@ StartupXLOG(void) (errmsg(checkpoint record is at %X/%X, checkPointLoc.xlogid, checkPointLoc.xrecoff))); } - else if (StandbyMode) + else if (IsStandbyMode()) { /* * The last valid checkpoint record required for a streaming @@ -6683,7 +6698,7 @@ StartupXLOG(void) * We don't need the latch anymore. It's not strictly necessary to disown * it, but let's do it for the
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
This is new version of the patch. I replaced GetStandbyFlushRecPtr with GetXLogReplayRecPtr to check progress of checkpoint following Fujii's sugestion. The first one is for 9.2dev, and the second is 9.1.3 backported version. === By the way, I took a close look around there, I agree with it basically. But I've get confused to look into GetStandbyFlushRecPtr(). | if (XLByteLT(receivePtr, replayPtr)) | return XLByteLT(replayPtr, restorePtr) ? restorePtr : replayPtr; | else | return XLByteLT(receivePtr, restorePtr) ? restorePtr : receivePtr; - receivePtr seems always updated just after syncing received xlog. - replayPtr is updated just BEFORE xlog_redo operation, and - restorePtr is updated AFTER xlog_redo(). - And, replayPtr seems not exceeds receivePtr. These seems quite reasonable. These conditions make following conditional expression. restorePtr = replayPtr = receivePtr But XLByteLT(recievePtr, replayPtr) this should not return true under the condition above.. Something wrong in my assumption? Anyway, I understand here that you say the location returned by GetXLogReplayRecPtr() is always flushed. -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index c9473f7..c2fafbf 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -491,8 +491,8 @@ CheckpointerMain(void) * Initialize checkpointer-private variables used during checkpoint. */ ckpt_active = true; - if (!do_restartpoint) -ckpt_start_recptr = GetInsertRecPtr(); + ckpt_start_recptr = +do_restartpoint ? GetXLogReplayRecPtr(NULL) : GetInsertRecPtr(); ckpt_start_time = now; ckpt_cached_elapsed = 0; @@ -731,28 +731,30 @@ IsCheckpointOnSchedule(double progress) return false; /* - * Check progress against WAL segments written and checkpoint_segments. + * Check progress against WAL segments written, or replayed for + * hot standby, and checkpoint_segments. * * We compare the current WAL insert location against the location - * computed before calling CreateCheckPoint. The code in XLogInsert that - * actually triggers a checkpoint when checkpoint_segments is exceeded - * compares against RedoRecptr, so this is not completely accurate. - * However, it's good enough for our purposes, we're only calculating an - * estimate anyway. + * computed before calling CreateCheckPoint. The code in + * XLogInsert that actually triggers a checkpoint when + * checkpoint_segments is exceeded compares against RedoRecPtr. + * Similarly, we consult WAL replay location instead on hot + * standbys and XLogPageRead compares it aganst RedoRecPtr, too. + * Altough these are not completely accurate, it's good enough for + * our purposes, we're only calculating an estimate anyway. */ - if (!RecoveryInProgress()) + + recptr = + RecoveryInProgress() ? GetXLogReplayRecPtr(NULL) : GetInsertRecPtr(); + elapsed_xlogs = + (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile + + ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / + CheckPointSegments; + + if (progress elapsed_xlogs) { - recptr = GetInsertRecPtr(); - elapsed_xlogs = - (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile + - ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / - CheckPointSegments; - - if (progress elapsed_xlogs) - { - ckpt_cached_elapsed = elapsed_xlogs; - return false; - } + ckpt_cached_elapsed = elapsed_xlogs; + return false; } /* diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c index 5643ec8..a272866 100644 --- a/src/backend/postmaster/bgwriter.c +++ b/src/backend/postmaster/bgwriter.c @@ -56,6 +56,7 @@ #include pgstat.h #include postmaster/bgwriter.h #include replication/syncrep.h +#include replication/walreceiver.h #include storage/bufmgr.h #include storage/fd.h #include storage/ipc.h @@ -489,8 +490,8 @@ BackgroundWriterMain(void) * Initialize bgwriter-private variables used during checkpoint. */ ckpt_active = true; - if (!do_restartpoint) -ckpt_start_recptr = GetInsertRecPtr(); + ckpt_start_recptr = do_restartpoint ? +GetXLogReplayRecPtr() : GetInsertRecPtr(); ckpt_start_time = now; ckpt_cached_elapsed = 0; @@ -764,30 +765,32 @@ IsCheckpointOnSchedule(double progress) return false; /* - * Check progress against WAL segments written and checkpoint_segments. + * Check progress against WAL segments written, or replayed for + * hot standby, and checkpoint_segments. * * We compare the current WAL insert location against the location - * computed before calling CreateCheckPoint. The code in XLogInsert that - * actually triggers a checkpoint when checkpoint_segments is exceeded -
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On 17.04.2012 09:50, Kyotaro HORIGUCHI wrote: This is new version of the patch. I replaced GetStandbyFlushRecPtr with GetXLogReplayRecPtr to check progress of checkpoint following Fujii's sugestion. The reason we haven't historically obeyed checkpoint_segments during recovery is that it slows down the recovery unnecessarily if you're restoring from a backup and you replay, say, one week's worth of WAL files. If for example you have checkpoint_segments=10 and checkpoint_timeout='15 mins' in the server that generated the WAL, you would be constantly performing a restartpoint if you trigger one every 10 segments. You could argue that you should obey checkpoint_segments in a standby server that's caught up with the master, but AFAICS the patch doesn't try to detect that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, The reason we haven't historically obeyed checkpoint_segments during recovery is that it slows down the recovery unnecessarily if you're restoring from a backup and you replay, The variable StandbyMode is false on archive recovery, so no checkpoint triggerred during then. xlog.c:10026 (in some version around 9.2) | /* | * Request a restartpoint if we've replayed too much | * xlog since the last one. | */ | if (StandbyMode bgwriterLaunched) | { | if (XLogCheckpointNeeded(readId, readSeg)) You could argue that you should obey checkpoint_segments in a standby server that's caught up with the master, but AFAICS the patch doesn't try to detect that. Concerning checkpoint, there seems no need for the standby to know whether it has been caught up with its master or not. Checkpoint has been already kicked by checkpoint_timeout regardless of the sync_state. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Sorry, I've wrote something wrong. The reason we haven't historically obeyed checkpoint_segments during recovery is that it slows down the recovery unnecessarily if you're restoring from a backup and you replay, The variable StandbyMode is false on archive recovery, so no checkpoint triggerred during then. Nevertheless, checkpoints will be triggered by checkpoint_timeout and run at the maybe higher speed governed by checkpoint_segments. This is undesirable behavior from such a point of view. But I think referring checkpoint_segment on such case should be inhibited, and I suppose it is possible using StandbyMode in IsCheckpointOnSchedule(), I suppose. I will correct the patch later. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, this message is attached with the patch which did not tested. That is for show the way. On Tue, Apr 17, 2012 at 9:38 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: But I think referring checkpoint_segment on such case should be inhibited, and I suppose it is possible using StandbyMode in IsCheckpointOnSchedule(), I suppose. I will correct the patch later. Hmm. StandbyMode is a local variable which cannot be accessed in checkpointer. But WalRcvInProgress() which shows if wal receiver is running seems to be usable to ENABLE governing progress by checkpoint_segments. | IsCheckpointOnSchedule(double progress) | { |/* | * Inhibit governing progress by segments in archive recovery. | */ |recovery_in_progress = RecoveryInProgress(); |if (!recovery_in_progress || WalRcvInProgress()) |{ |recptr = recovery_in_progress ? GetXLogReplayRecPtr(NULL) : |GetInsertRecPtr(); How about this? regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. standby_checkpoint_segments_9.2dev_fix_20120417v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Tue, Apr 17, 2012 at 3:50 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: These seems quite reasonable. These conditions make following conditional expression. restorePtr = replayPtr = receivePtr But XLByteLT(recievePtr, replayPtr) this should not return true under the condition above.. Something wrong in my assumption? When walreceiver is not running, i.e., the startup process reads the WAL files from the archival area, the replay location would get bigger than the receive one. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Tue, Apr 17, 2012 at 11:50 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hmm. StandbyMode is a local variable which cannot be accessed in checkpointer. But WalRcvInProgress() which shows if wal receiver is running seems to be usable to ENABLE governing progress by checkpoint_segments. Even when walreceiver is not running and WAL files are read from the archive, checkpoint_segments can trigger a restartpoint. In this case, ISTM a restartpoint should be scheduled according to checkpoint_segments, so I don't think that checking WalRcvInProgress() for that purpose is right thing. Instead, what about sharing StandbyMode flag among processes via shared memory like XLogCtl? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hmm. StandbyMode is a local variable which cannot be accessed in checkpointer. But WalRcvInProgress() which shows if wal receiver is running seems to be usable to ENABLE governing progress by checkpoint_segments. Even when walreceiver is not running and WAL files are read from the archive, checkpoint_segments can trigger a restartpoint. In this case, ISTM a restartpoint should be scheduled according to checkpoint_segments, so I don't think that checking WalRcvInProgress() for that purpose is right thing. Instead, what about sharing StandbyMode flag among processes via shared memory like XLogCtl? I tried that at first. But I suppose the requirement here is 'if reading segments comes via replication stream, enable throttling by checkpoint_segments.' and WalRcvInProgress() seems fit to check that. Plus, adding SharedStartupStandbyMode into XLogCtlData seems accompanied with some annoyances which would not pay. By the way, do you have some advise about GetStandbyFlushRecPtr() and the order of the locations? I'm embarrassed with that... I agree with it basically. But I've get confused to look into GetStandbyFlushRecPtr(). | if (XLByteLT(receivePtr, replayPtr)) | return XLByteLT(replayPtr, restorePtr) ? restorePtr : replayPtr; | else | return XLByteLT(receivePtr, restorePtr) ? restorePtr : receivePtr; - receivePtr seems always updated just after syncing received xlog. - replayPtr is updated just BEFORE xlog_redo operation, and - restorePtr is updated AFTER xlog_redo(). - And, replayPtr seems not exceeds receivePtr. These seems quite reasonable. These conditions make following conditional expression. restorePtr = replayPtr = receivePtr But XLByteLT(recievePtr, replayPtr) this should not return true under the condition above.. Something wrong in my assumption? regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Mon, Apr 16, 2012 at 1:05 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, this is bug report and a patch for it. The first patch in the attachments is for 9.2dev and next one is for 9.1.3. On the current 9.2dev, IsCheckpointOnSchedule@checkpointer.c does not check against WAL segments written. This makes checkpointer always run at the speed according to checkpoint_timeout regardless of WAL advancing rate. Thanks, I'll take a look. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Mon, Apr 16, 2012 at 9:05 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: In the backported version to 9.1.3, bgwriter.c is modified instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is used as the equivalent of GetStandbyFlushRecPtr() in 9.2. In 9,2, GetXLogReplayRecPtr() should be used instead of GetStandbyFlushRecPtr(). A restartpoint is scheduled to finish before next restartpoint is triggered. A restartpoint is triggered if too much WAL files have been replayed since last restartpoint. So a restartpoint should be scheduled according to the replay location not the receive location. -* computed before calling CreateCheckPoint. The code in XLogInsert that -* actually triggers a checkpoint when checkpoint_segments is exceeded -* compares against RedoRecptr, so this is not completely accurate. -* However, it's good enough for our purposes, we're only calculating an -* estimate anyway. +* computed before calling CreateCheckPoint. The code in +* XLogInsert that actually triggers a checkpoint when +* checkpoint_segments is exceeded compares against RedoRecptr. +* Similarly, we consult WAL flush location instead on hot +* standbys and XLogPageRead compares it aganst RedoRecptr, too. +* Altough these are not completely accurate, it's good enough for +* our purposes, we're only calculating an estimate anyway. I think that basically it's better not to change the comments (i.e., not to add the line feed) if their contents are the same as previous ones, to highlight what you actually changed in the patch. Typo: RedoRecptr should be RedoRecPtr? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, thank you for comment. In the backported version to 9.1.3, bgwriter.c is modified instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is used as the equivalent of GetStandbyFlushRecPtr() in 9.2. In 9,2, GetXLogReplayRecPtr() should be used instead of GetStandbyFlushRecPtr(). A restartpoint is scheduled to finish before next restartpoint is triggered. A restartpoint is triggered if too much WAL files have been replayed since last restartpoint. So a restartpoint should be scheduled according to the replay location not the receive location. I agree with it basically. But I've get confused to look into GetStandbyFlushRecPtr(). | if (XLByteLT(receivePtr, replayPtr)) | return XLByteLT(replayPtr, restorePtr) ? restorePtr : replayPtr; | else | return XLByteLT(receivePtr, restorePtr) ? restorePtr : receivePtr; This seems imply receivePtr may be behind replayPtr. I don't understand what condition makes it but anyway the bottom line I think is that a restartpoint should be based on WALs surely synced. So I choosed GetStandbyFlushRecPtr() to read the location. If receivePtr/restorePtr always precede or are equal to replayPtr, I prefer GetXLogReplayRecPtr() as you suggest. (And some comment about the order among these pointers might should be supplied for the part) I think that basically it's better not to change the comments (i.e., not to add the line feed) if their contents are the same as previous ones, to highlight what you actually changed in the patch. Hmm. It is a priority matter between pointing up in or compactness of a patch and consistency in outcome of that. I think the latter takes precedence over the former. Altough, I could have found a description on better balance. But more than that, I've found fill-column for this comment be too short... Typo: RedoRecptr should be RedoRecPtr? I think that's right. I've unconsciously brought that spelling from the orignal comment. regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers