On Wed, Sep 15, 2021 at 9:38 PM Robert Haas <robertmh...@gmail.com> wrote: > > On Wed, Sep 15, 2021 at 10:32 AM Robert Haas <robertmh...@gmail.com> wrote: > > Putting these changes into 0001 seems to make no sense. It seems like > > they should be part of 0003, or maybe a new 0004 patch. > > After looking at this a little bit more, I think it's really necessary > to separate out all of your changes into separate patches at least for > initial review. It's particularly important to separate code movement > changes from other kinds of changes. 0001 was just moving code before, > and so was 0002, but now both are making other changes, which is not > easy to see from looking at the 'git diff' output. For that reason > it's not so easy to understand exactly what you've changed here and > analyze it. >
Ok, understood, I have separated my changes into 0001 and 0002 patch, and the refactoring patches start from 0003. In the 0001 patch, I have copied ArchiveRecoveryRequested to shared memory as said previously. Coping ArchiveRecoveryRequested value to shared memory is not really interesting, and I think somehow we should reuse existing variable, (perhaps, with some modification of the information it can store, e.g. adding one more enum value for SharedRecoveryState or something else), thinking on the same. In addition to that, I tried to turn down the scope of ArchiveRecoveryRequested global variable. Now, this is a static variable, and the scope is limited to xlog.c file like LocalXLogInsertAllowed and can be accessed through the newly added function ArchiveRecoveryIsRequested() (like PromoteIsTriggered()). Let me know what you think about the approach. In 0002 patch is a mixed one where I tried to remove the dependencies on global variables and local variables belonging to StartupXLOG(). I am still worried about the InRecovery value that needs to be deduced afterward inside XLogAcceptWrites(). Currently, relying on ControlFile->state != DB_SHUTDOWNED check but I think that will not be good for ASRO where we plan to skip XLogAcceptWrites() work only and let the StartupXLOG() do the rest of the work as it is where it will going to update ControlFile's DBState to DB_IN_PRODUCTION, then we might need some ugly kludge to call PerformRecoveryXLogAction() in checkpointer irrespective of DBState, which makes me a bit uncomfortable. > I poked around a little bit at these patches, looking for > perhaps-interesting global variables upon which the code called from > XLogAcceptWrites() would depend with your patches applied. The most > interesting ones seem to be (1) ThisTimeLineID, which you mentioned > and which may be fine but I'm not totally convinced yet, (2) > LocalXLogInsertAllowed, which is probably not broken but I'm thinking > we may want to redesign that mechanism somehow to make it cleaner, and Thanks for the off-list detailed explanation on this. For somebody else who might be reading this, the concern here is (not really a concern, it is a good thing to improve) the LocalSetXLogInsertAllowed() function call, is a kind of hack that enables wal writes irrespective of RecoveryInProgress() for a shorter period. E.g. see following code in StartupXLOG: " LocalSetXLogInsertAllowed(); UpdateFullPageWrites(); LocalXLogInsertAllowed = -1; .... .... /* * If any of the critical GUCs have changed, log them before we allow * backends to write WAL. */ LocalSetXLogInsertAllowed(); XLogReportParameters(); " Instead of explicitly enabling wal insert, somehow that implicitly allowed for the startup process and/or the checkpointer doing the first checkpoint and/or wal writes after the recovery. Well, the current LocalSetXLogInsertAllowed() mechanism is not really harming anything or bad and does not necessarily need to change but it would be nice if we were able to come up with something much cleaner, bug-free, and 100% perfect enough design. (Hope I am not missing anything from the discussion). > (3) CheckpointStats, which is called from RemoveXlogFile which is > called from RemoveNonParentXlogFiles which is called from > CleanupAfterArchiveRecovery which is called from XLogAcceptWrites. > This last one is actually pretty weird already in the existing code. > It sort of looks like RemoveXlogFile() only expects to be called from > the checkpointer (or a standalone backend) so that it can update > CheckpointStats and have that just work, but actually it's also called > from the startup process when a timeline switch happens. I don't know > whether the fact that the increments to ckpt_segs_recycled get lost in > that case should be considered an intentional behavior that should be > preserved or an inadvertent mistake. > Maybe I could be wrong, but I think that is intentional. It removes pre-allocated or bogus files of the old timeline which are not supposed to be considered in stats. The comments for CheckpointStatsData might not be clear but comment at the calling RemoveNonParentXlogFiles() place inside StartupXLOG hints the same: " /* * Before we continue on the new timeline, clean up any * (possibly bogus) future WAL segments on the old * timeline. */ RemoveNonParentXlogFiles(EndRecPtr, ThisTimeLineID); .... .... * We switched to a new timeline. Clean up segments on the old * timeline. * * If there are any higher-numbered segments on the old timeline, * remove them. They might contain valid WAL, but they might also be * pre-allocated files containing garbage. In any case, they are not * part of the new timeline's history so we don't need them. */ RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID); " > So I think you've covered most of the necessary things here, with > probably some more discussion needed on whether you've done the right > things... > Thanks, Robert, for your time. Regards, Amul Sul
From 9d876ab6ffe05228457c9c58442d05cefceb1584 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Mon, 20 Sep 2021 07:40:44 -0400 Subject: [PATCH v35 5/5] Create XLogAcceptWrites() function with code from StartupXLOG(). This is just code movement. A future patch will want to defer the call to XLogAcceptWrites() until a later time, rather than doing it as soon as we finish applying WAL, but here we're just grouping related code together into a new function. Robert Haas, with modifications by Amul Sul. --- src/backend/access/transam/xlog.c | 77 +++++++++++++++++++------------ 1 file changed, 47 insertions(+), 30 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e8b1cf4ce20..642608013ae 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -938,6 +938,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force); static XLogRecord *ReadRecord(XLogReaderState *xlogreader, int emode, bool fetching_ckpt); static void CheckRecoveryConsistency(void); +static bool XLogAcceptWrites(void); static bool PerformRecoveryXLogAction(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); @@ -6628,7 +6629,7 @@ StartupXLOG(void) DBState dbstate_at_startup; XLogReaderState *xlogreader; XLogPageReadPrivate private; - bool promoted = false; + bool promoted; struct stat st; /* @@ -8041,38 +8042,14 @@ StartupXLOG(void) XLogReaderFree(xlogreader); /* - * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE - * record before resource manager writes cleanup WAL records or checkpoint - * record is written. + * Update full_page_writes in shared memory, and later whenever wal write + * permitted, write an XLOG_FPW_CHANGE record before resource manager + * writes cleanup WAL records or checkpoint record is written. */ Insert->fullPageWrites = lastFullPageWrites; - LocalSetXLogInsertAllowed(); - UpdateFullPageWrites(); - LocalXLogInsertAllowed = -1; - - /* - * Emit checkpoint or end-of-recovery record in XLOG, if the server wasn't - * shut down cleanly, which been through recovery. - */ - if (ControlFile->state != DB_SHUTDOWNED) - promoted = PerformRecoveryXLogAction(); - - /* If this is archive recovery, perform post-recovery cleanup actions. */ - if (ArchiveRecoveryIsRequested()) - CleanupAfterArchiveRecovery(); - - /* - * If any of the critical GUCs have changed, log them before we allow - * backends to write WAL. - */ - LocalSetXLogInsertAllowed(); - XLogReportParameters(); - /* - * Local WAL inserts enabled, so it's time to finish initialization of - * commit timestamp. - */ - CompleteCommitTsInitialization(); + /* Prepare to accept WAL writes. */ + promoted = XLogAcceptWrites(); /* * All done with end-of-recovery actions. @@ -8116,6 +8093,46 @@ StartupXLOG(void) RequestCheckpoint(CHECKPOINT_FORCE); } +/* + * Prepare to accept WAL writes. + */ +static bool +XLogAcceptWrites(void) +{ + bool promoted = false; + + /* Write an XLOG_FPW_CHANGE record */ + LocalSetXLogInsertAllowed(); + UpdateFullPageWrites(); + LocalXLogInsertAllowed = -1; + + /* + * Emit checkpoint or end-of-recovery record in XLOG, if the server wasn't + * shut down cleanly, which been through recovery. + */ + if (ControlFile->state != DB_SHUTDOWNED) + promoted = PerformRecoveryXLogAction(); + + /* If this is archive recovery, perform post-recovery cleanup actions. */ + if (ArchiveRecoveryIsRequested()) + CleanupAfterArchiveRecovery(); + + /* + * If any of the critical GUCs have changed, log them before we allow + * backends to write WAL. + */ + LocalSetXLogInsertAllowed(); + XLogReportParameters(); + + /* + * Local WAL inserts enabled, so it's time to finish initialization of + * commit timestamp. + */ + CompleteCommitTsInitialization(); + + return promoted; +} + /* * Checks if recovery has reached a consistent state. When consistency is * reached and we have a valid starting standby snapshot, tell postmaster -- 2.18.0
From 30392aca58c4f9c103047eed104917f63e5d9236 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Mon, 20 Sep 2021 06:43:32 -0400 Subject: [PATCH v35 2/5] miscellaneous: remove dependency on global and local variable. Removes dependency on global variables and some local variable in StartupXLOG() function which may be available and/or deduced through the information available into shared memory. Changes enable us to move some of the code from StartupXLOG() into a separate function that can be executable by the other process which are connected to shared memory. --- src/backend/access/transam/xlog.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 4a6ddfb1872..c9d5bf9a72c 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7902,7 +7902,11 @@ StartupXLOG(void) UpdateFullPageWrites(); LocalXLogInsertAllowed = -1; - if (InRecovery) + /* + * Emit checkpoint or end-of-recovery record in XLOG, if the server wasn't + * shut down cleanly, which been through recovery. + */ + if (ControlFile->state != DB_SHUTDOWNED) { /* * Perform a checkpoint to update all our recovery activity to disk. @@ -7919,7 +7923,7 @@ StartupXLOG(void) * queries. */ if (ArchiveRecoveryIsRequested() && IsUnderPostmaster && - LocalPromoteIsTriggered) + PromoteIsTriggered()) { promoted = true; @@ -7945,6 +7949,8 @@ StartupXLOG(void) if (ArchiveRecoveryIsRequested()) { + XLogRecPtr EndOfLog; + /* * And finally, execute the recovery_end_command, if any. */ @@ -7962,6 +7968,7 @@ StartupXLOG(void) * pre-allocated files containing garbage. In any case, they are not * part of the new timeline's history so we don't need them. */ + (void) GetLastSegSwitchData(&EndOfLog); RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID); /* @@ -7998,6 +8005,7 @@ StartupXLOG(void) { char origfname[MAXFNAMELEN]; XLogSegNo endLogSegNo; + TimeLineID EndOfLogTLI = XLogCtl->replayEndTLI; XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size); XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size); -- 2.18.0
From 8062a67ac681940bb239b06e9277a765a546905d Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Mon, 20 Sep 2021 07:40:36 -0400 Subject: [PATCH v35 4/5] Postpone some end-of-recovery operations relating to allowing WAL. Previously, moved the code that performs whether to write a checkpoint or an end-of-recovery record into PerformRecoveryXlogAction(), and code performs post-archive-recovery into CleanupAfterArchiveRecovery(), but called both the functions from the same place. Now postpone that stuff until after we clear InRecovery and shut down the XLogReader. This is preparatory work for a future patch that wants to allow recovery to end at one time and only later start to allow WAL writes. The steps that themselves write WAL clearly shouldn't happen before we're ready to accept WAL writes, and it seems best for now to keep the steps performed by CleanupAfterArchiveRecovery() at the same point relative to the surrounding steps. We assume (hopefully correctly) that the user doesn't want recovery_end_command to run until we're committed to writing WAL on the new timeline. Until then, the machine is still usable as a standby on the old timeline. Aside from the value of this patch as preparatory work, this order of operations actually seems more logical, since it means we don't actually write any WAL until after exiting recovery. Robert Haas, with modifications by Amul Sul --- src/backend/access/transam/xlog.c | 42 +++++++++++++++---------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 12f7e080d3e..e8b1cf4ce20 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7989,27 +7989,6 @@ StartupXLOG(void) XLogCtl->LogwrtRqst.Write = EndOfLog; XLogCtl->LogwrtRqst.Flush = EndOfLog; - /* - * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE - * record before resource manager writes cleanup WAL records or checkpoint - * record is written. - */ - Insert->fullPageWrites = lastFullPageWrites; - LocalSetXLogInsertAllowed(); - UpdateFullPageWrites(); - LocalXLogInsertAllowed = -1; - - /* - * Emit checkpoint or end-of-recovery record in XLOG, if the server wasn't - * shut down cleanly, which been through recovery. - */ - if (ControlFile->state != DB_SHUTDOWNED) - promoted = PerformRecoveryXLogAction(); - - /* If this is archive recovery, perform post-recovery cleanup actions. */ - if (ArchiveRecoveryIsRequested()) - CleanupAfterArchiveRecovery(); - /* * Preallocate additional log files, if wanted. */ @@ -8061,6 +8040,27 @@ StartupXLOG(void) } XLogReaderFree(xlogreader); + /* + * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE + * record before resource manager writes cleanup WAL records or checkpoint + * record is written. + */ + Insert->fullPageWrites = lastFullPageWrites; + LocalSetXLogInsertAllowed(); + UpdateFullPageWrites(); + LocalXLogInsertAllowed = -1; + + /* + * Emit checkpoint or end-of-recovery record in XLOG, if the server wasn't + * shut down cleanly, which been through recovery. + */ + if (ControlFile->state != DB_SHUTDOWNED) + promoted = PerformRecoveryXLogAction(); + + /* If this is archive recovery, perform post-recovery cleanup actions. */ + if (ArchiveRecoveryIsRequested()) + CleanupAfterArchiveRecovery(); + /* * If any of the critical GUCs have changed, log them before we allow * backends to write WAL. -- 2.18.0
From 2f5dc2d9ca71d9bc4ba5dc86a69efca7aa4408f9 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Mon, 20 Sep 2021 05:52:19 -0400 Subject: [PATCH v35 1/5] Store ArchiveRecoveryRequested in shared memory and change its scope. Storing ArchiveRecoveryRequested value in shared memory makes it accessible to other processes as well. As of now no other process does care about that but this will help to move code executed when ArchiveRecoveryRequested set to other processes. Also, the patch does change the scope of ArchiveRecoveryRequested global to local, and type to integer. Now it has three values as -1 for the unknown, 1 for the request is made & 0 for no request or request is completed. --- src/backend/access/transam/timeline.c | 6 +- src/backend/access/transam/xlog.c | 122 ++++++++++++++++------- src/backend/access/transam/xlogarchive.c | 2 +- src/include/access/xlog.h | 1 + src/include/access/xlog_internal.h | 1 - 5 files changed, 93 insertions(+), 39 deletions(-) diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c index 8d0903c1756..0d4951b8255 100644 --- a/src/backend/access/transam/timeline.c +++ b/src/backend/access/transam/timeline.c @@ -93,7 +93,7 @@ readTimeLineHistory(TimeLineID targetTLI) return list_make1(entry); } - if (ArchiveRecoveryRequested) + if (ArchiveRecoveryIsRequested()) { TLHistoryFileName(histfname, targetTLI); fromArchive = @@ -229,7 +229,7 @@ existsTimeLineHistory(TimeLineID probeTLI) if (probeTLI == 1) return false; - if (ArchiveRecoveryRequested) + if (ArchiveRecoveryIsRequested()) { TLHistoryFileName(histfname, probeTLI); RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false); @@ -331,7 +331,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI, /* * If a history file exists for the parent, copy it verbatim */ - if (ArchiveRecoveryRequested) + if (ArchiveRecoveryIsRequested()) { TLHistoryFileName(histfname, parentTLI); RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false); diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e51a7a749da..4a6ddfb1872 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -239,17 +239,23 @@ static bool LocalPromoteIsTriggered = false; static int LocalXLogInsertAllowed = -1; /* - * When ArchiveRecoveryRequested is set, archive recovery was requested, - * ie. signal files were present. When InArchiveRecovery is set, we are - * currently recovering using offline XLOG archives. These variables are only - * valid in the startup process. - * - * When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're - * currently performing crash recovery using only XLOG files in pg_wal, but - * will switch to using offline XLOG archives as soon as we reach the end of - * WAL in pg_wal. -*/ -bool ArchiveRecoveryRequested = false; + * When ArchiveRecoveryRequested is ARCHIVE_RECOVERY_REQUEST_YES, archive + * recovery was requested, ie. signal files were present. When InArchiveRecovery + * is set, we are currently recovering using offline XLOG archives. + * + * When ArchiveRecoveryRequested is ARCHIVE_RECOVERY_REQUEST_YES, but + * InArchiveRecovery is false, we're currently performing crash recovery using + * only XLOG files in pg_wal, but will switch to using offline XLOG archives as + * soon as we reach the end of WAL in pg_wal. + * + * InArchiveRecovery only valid in the startup process. ArchiveRecoveryRequested + * can be acccsed through ArchiveRecoveryIsRequested(). + */ +#define ARCHIVE_RECOVERY_REQUEST_UNKOWN -1 +#define ARCHIVE_RECOVERY_REQUEST_NO 0 +#define ARCHIVE_RECOVERY_REQUEST_YES 1 + +static int ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_UNKOWN; bool InArchiveRecovery = false; static bool standby_signal_file_found = false; @@ -637,6 +643,12 @@ typedef struct XLogCtlData */ RecoveryState SharedRecoveryState; + /* + * SharedArchiveRecoveryRequested indicates whether an archive recovery is + * requested. Protected by info_lck. + */ + bool SharedArchiveRecoveryRequested; + /* * SharedHotStandbyActive indicates if we allow hot standby queries to be * run. Protected by info_lck. @@ -4455,7 +4467,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode, * we'd have no idea how far we'd have to replay to reach * consistency. So err on the safe side and give up. */ - if (!InArchiveRecovery && ArchiveRecoveryRequested && + if (!InArchiveRecovery && ArchiveRecoveryIsRequested() && !fetching_ckpt) { ereport(DEBUG1, @@ -5223,6 +5235,7 @@ XLOGShmemInit(void) */ XLogCtl->XLogCacheBlck = XLOGbuffers - 1; XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH; + XLogCtl->SharedArchiveRecoveryRequested = false; XLogCtl->SharedHotStandbyActive = false; XLogCtl->InstallXLogFileSegmentActive = false; XLogCtl->SharedPromoteIsTriggered = false; @@ -5485,16 +5498,16 @@ readRecoverySignalFile(void) } StandbyModeRequested = false; - ArchiveRecoveryRequested = false; + ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_NO; if (standby_signal_file_found) { StandbyModeRequested = true; - ArchiveRecoveryRequested = true; + ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_YES; } else if (recovery_signal_file_found) { StandbyModeRequested = false; - ArchiveRecoveryRequested = true; + ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_YES; } else return; @@ -5507,12 +5520,18 @@ readRecoverySignalFile(void) ereport(FATAL, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("standby mode is not supported by single-user servers"))); + + /* + * Remember archive recovery request in shared memory state. A lock is not + * needed since we are the only ones who updating this. + */ + XLogCtl->SharedArchiveRecoveryRequested = (bool) ArchiveRecoveryRequested; } static void validateRecoveryParameters(void) { - if (!ArchiveRecoveryRequested) + if (!ArchiveRecoveryIsRequested()) return; /* @@ -5750,7 +5769,7 @@ recoveryStopsBefore(XLogReaderState *record) * Ignore recovery target settings when not in archive recovery (meaning * we are in crash recovery). */ - if (!ArchiveRecoveryRequested) + if (!ArchiveRecoveryIsRequested()) return false; /* Check if we should stop as soon as reaching consistency */ @@ -5897,7 +5916,7 @@ recoveryStopsAfter(XLogReaderState *record) * Ignore recovery target settings when not in archive recovery (meaning * we are in crash recovery). */ - if (!ArchiveRecoveryRequested) + if (!ArchiveRecoveryIsRequested()) return false; info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; @@ -6211,7 +6230,7 @@ recoveryApplyDelay(XLogReaderState *record) return false; /* nothing to do if crash recovery is requested */ - if (!ArchiveRecoveryRequested) + if (!ArchiveRecoveryIsRequested()) return false; /* @@ -6455,7 +6474,7 @@ CheckRequiredParameterValues(void) * For archive recovery, the WAL must be generated with at least 'replica' * wal_level. */ - if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL) + if (ArchiveRecoveryIsRequested() && ControlFile->wal_level == WAL_LEVEL_MINIMAL) { ereport(FATAL, (errmsg("WAL was generated with wal_level=minimal, cannot continue recovering"), @@ -6467,7 +6486,7 @@ CheckRequiredParameterValues(void) * For Hot Standby, the WAL must be generated with 'replica' mode, and we * must have at least as many backend slots as the primary. */ - if (ArchiveRecoveryRequested && EnableHotStandby) + if (ArchiveRecoveryIsRequested() && EnableHotStandby) { /* We ignore autovacuum_max_workers when we make this test. */ RecoveryRequiresIntParameter("max_connections", @@ -6633,7 +6652,7 @@ StartupXLOG(void) readRecoverySignalFile(); validateRecoveryParameters(); - if (ArchiveRecoveryRequested) + if (ArchiveRecoveryIsRequested()) { if (StandbyModeRequested) ereport(LOG, @@ -6666,7 +6685,7 @@ StartupXLOG(void) * Take ownership of the wakeup latch if we're going to sleep during * recovery. */ - if (ArchiveRecoveryRequested) + if (ArchiveRecoveryIsRequested()) OwnLatch(&XLogCtl->recoveryWakeupLatch); /* Set up XLOG reader facility */ @@ -6833,7 +6852,7 @@ StartupXLOG(void) * to minRecoveryPoint, up to backupEndPoint, or until we see an * end-of-backup record), and we can enter archive recovery directly. */ - if (ArchiveRecoveryRequested && + if (ArchiveRecoveryIsRequested() && (ControlFile->minRecoveryPoint != InvalidXLogRecPtr || ControlFile->backupEndRequired || ControlFile->backupEndPoint != InvalidXLogRecPtr || @@ -7063,7 +7082,7 @@ StartupXLOG(void) } else if (ControlFile->state != DB_SHUTDOWNED) InRecovery = true; - else if (ArchiveRecoveryRequested) + else if (ArchiveRecoveryIsRequested()) { /* force recovery due to presence of recovery signal file */ InRecovery = true; @@ -7229,7 +7248,7 @@ StartupXLOG(void) * control file and we've established a recovery snapshot from a * running-xacts WAL record. */ - if (ArchiveRecoveryRequested && EnableHotStandby) + if (ArchiveRecoveryIsRequested() && EnableHotStandby) { TransactionId *xids; int nxids; @@ -7646,7 +7665,7 @@ StartupXLOG(void) * This check is intentionally after the above log messages that * indicate how far recovery went. */ - if (ArchiveRecoveryRequested && + if (ArchiveRecoveryIsRequested() && recoveryTarget != RECOVERY_TARGET_UNSET && !reachedRecoveryTarget) ereport(FATAL, @@ -7674,7 +7693,7 @@ StartupXLOG(void) * We don't need the latch anymore. It's not strictly necessary to disown * it, but let's do it for the sake of tidiness. */ - if (ArchiveRecoveryRequested) + if (ArchiveRecoveryIsRequested()) DisownLatch(&XLogCtl->recoveryWakeupLatch); /* @@ -7725,7 +7744,7 @@ StartupXLOG(void) * crashes while an online backup is in progress. We must not treat * that as an error, or the database will refuse to start up. */ - if (ArchiveRecoveryRequested || ControlFile->backupEndRequired) + if (ArchiveRecoveryIsRequested() || ControlFile->backupEndRequired) { if (ControlFile->backupEndRequired) ereport(FATAL, @@ -7771,7 +7790,7 @@ StartupXLOG(void) * In a normal crash recovery, we can just extend the timeline we were in. */ PrevTimeLineID = ThisTimeLineID; - if (ArchiveRecoveryRequested) + if (ArchiveRecoveryIsRequested()) { char *reason; char recoveryPath[MAXPGPATH]; @@ -7899,7 +7918,7 @@ StartupXLOG(void) * after we're fully out of recovery mode and already accepting * queries. */ - if (ArchiveRecoveryRequested && IsUnderPostmaster && + if (ArchiveRecoveryIsRequested() && IsUnderPostmaster && LocalPromoteIsTriggered) { promoted = true; @@ -7924,7 +7943,7 @@ StartupXLOG(void) } } - if (ArchiveRecoveryRequested) + if (ArchiveRecoveryIsRequested()) { /* * And finally, execute the recovery_end_command, if any. @@ -8003,6 +8022,15 @@ StartupXLOG(void) XLogArchiveNotify(partialfname); } } + + /* + * Done with archive recovery request, clear the shared memory state + * which no longer needed. + */ + SpinLockAcquire(&XLogCtl->info_lck); + XLogCtl->SharedArchiveRecoveryRequested = false; + ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_UNKOWN; + SpinLockRelease(&XLogCtl->info_lck); } /* @@ -8263,6 +8291,32 @@ RecoveryInProgress(void) } } +/* + * Is the archive recovery is requested? + * + * If ArchiveRecoveryRequested is unknown, then it will be updated by checking + * shared memory. Like PromoteIsTriggered(), this works in any process that's + * connected to shared memory. + */ +bool +ArchiveRecoveryIsRequested(void) +{ + /* + * If not UNKNOWN, the ArchiveRecoveryRequested value either + * ARCHIVE_RECOVERY_REQUEST_YES => 1 or ARCHIVE_RECOVERY_REQUEST_NO => 0 + * which can be coerced to boolean true or false respectively. + */ + if (likely(ArchiveRecoveryRequested != ARCHIVE_RECOVERY_REQUEST_UNKOWN)) + return (bool) ArchiveRecoveryRequested; + + SpinLockAcquire(&XLogCtl->info_lck); + ArchiveRecoveryRequested = XLogCtl->SharedArchiveRecoveryRequested ? + ARCHIVE_RECOVERY_REQUEST_YES : ARCHIVE_RECOVERY_REQUEST_NO; + SpinLockRelease(&XLogCtl->info_lck); + + return (bool) ArchiveRecoveryRequested; +} + /* * Returns current recovery state from shared memory. * @@ -10174,7 +10228,7 @@ xlog_redo(XLogReaderState *record) * record, the backup was canceled and the end-of-backup record will * never arrive. */ - if (ArchiveRecoveryRequested && + if (ArchiveRecoveryIsRequested() && !XLogRecPtrIsInvalid(ControlFile->backupStartPoint) && XLogRecPtrIsInvalid(ControlFile->backupEndPoint)) ereport(PANIC, @@ -12176,7 +12230,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen, * Request a restartpoint if we've replayed too much xlog since the * last one. */ - if (ArchiveRecoveryRequested && IsUnderPostmaster) + if (ArchiveRecoveryIsRequested() && IsUnderPostmaster) { if (XLogCheckpointNeeded(readSegNo)) { diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c index 26b023e754b..756d03adb6f 100644 --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -67,7 +67,7 @@ RestoreArchivedFile(char *path, const char *xlogfname, * Ignore restore_command when not in archive recovery (meaning we are in * crash recovery). */ - if (!ArchiveRecoveryRequested) + if (!ArchiveRecoveryIsRequested()) goto not_available; /* In standby mode, restore_command might not be supplied */ diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 0a8ede700de..0a356c98d1f 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -289,6 +289,7 @@ extern RecoveryPauseState GetRecoveryPauseState(void); extern void SetRecoveryPause(bool recoveryPause); extern TimestampTz GetLatestXTime(void); extern TimestampTz GetCurrentChunkReplayStartTime(void); +extern bool ArchiveRecoveryIsRequested(void); extern void UpdateControlFile(void); extern uint64 GetSystemIdentifier(void); diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h index 3b5eceff658..2051953d404 100644 --- a/src/include/access/xlog_internal.h +++ b/src/include/access/xlog_internal.h @@ -319,7 +319,6 @@ extern void GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli); * Exported for the functions in timeline.c and xlogarchive.c. Only valid * in the startup process. */ -extern bool ArchiveRecoveryRequested; extern bool InArchiveRecovery; extern bool StandbyMode; extern char *recoveryRestoreCommand; -- 2.18.0
From b5e6bd6fd572b23a087adfcc448e180e91f6d4ce Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Mon, 20 Sep 2021 07:40:07 -0400 Subject: [PATCH v35 3/5] Refactor some end-of-recovery code out of StartupXLOG(). Moved the code that performs whether to write a checkpoint or an end-of-recovery record into PerformRecoveryXlogAction(). Also create a new function CleanupAfterArchiveRecovery() to perform a few tasks that we want to do after we've actually exited archive recovery but before we start accepting new WAL writes. This is straightforward code movement to make StartupXLOG() a little bit shorter and a little bit easier to understand. Robert Haas, with modifications by Amul Sul. --- src/backend/access/transam/xlog.c | 285 ++++++++++++++++-------------- 1 file changed, 154 insertions(+), 131 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c9d5bf9a72c..12f7e080d3e 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -892,6 +892,7 @@ static MemoryContext walDebugCxt = NULL; static void readRecoverySignalFile(void); static void validateRecoveryParameters(void); static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog); +static void CleanupAfterArchiveRecovery(void); static bool recoveryStopsBefore(XLogReaderState *record); static bool recoveryStopsAfter(XLogReaderState *record); static char *getRecoveryStopReason(void); @@ -937,6 +938,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force); static XLogRecord *ReadRecord(XLogReaderState *xlogreader, int emode, bool fetching_ckpt); static void CheckRecoveryConsistency(void); +static bool PerformRecoveryXLogAction(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); static bool rescanLatestTimeLine(void); @@ -5713,6 +5715,101 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog) (errmsg("archive recovery complete"))); } +/* + * Perform cleanup actions at the conclusion of archive recovery. + */ +static void +CleanupAfterArchiveRecovery(void) +{ + XLogRecPtr EndOfLog; + + /* + * Execute the recovery_end_command, if any. + */ + if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0) + ExecuteRecoveryCommand(recoveryEndCommand, + "recovery_end_command", + true); + + /* + * We switched to a new timeline. Clean up segments on the old timeline. + * + * If there are any higher-numbered segments on the old timeline, remove + * them. They might contain valid WAL, but they might also be pre-allocated + * files containing garbage. In any case, they are not part of the new + * timeline's history so we don't need them. + */ + (void) GetLastSegSwitchData(&EndOfLog); + RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID); + + /* + * If the switch happened in the middle of a segment, what to do with the + * last, partial segment on the old timeline? If we don't archive it, and + * the server that created the WAL never archives it either (e.g. because it + * was hit by a meteor), it will never make it to the archive. That's OK + * from our point of view, because the new segment that we created with the + * new TLI contains all the WAL from the old timeline up to the switch + * point. But if you later try to do PITR to the "missing" WAL on the old + * timeline, recovery won't find it in the archive. It's physically present + * in the new file with new TLI, but recovery won't look there when it's + * recovering to the older timeline. On the other hand, if we archive the + * partial segment, and the original server on that timeline is still + * running and archives the completed version of the same segment later, it + * will fail. (We used to do that in 9.4 and below, and it caused such + * problems). + * + * As a compromise, we rename the last segment with the .partial suffix, and + * archive it. Archive recovery will never try to read .partial segments, so + * they will normally go unused. But in the odd PITR case, the administrator + * can copy them manually to the pg_wal directory (removing the suffix). + * They can be useful in debugging, too. + * + * If a .done or .ready file already exists for the old timeline, however, + * we had already determined that the segment is complete, so we can let it + * be archived normally. (In particular, if it was restored from the archive + * to begin with, it's expected to have a .done file). + */ + if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 && + XLogArchivingActive()) + { + char origfname[MAXFNAMELEN]; + XLogSegNo endLogSegNo; + TimeLineID EndOfLogTLI = XLogCtl->replayEndTLI; + + XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size); + XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size); + + if (!XLogArchiveIsReadyOrDone(origfname)) + { + char origpath[MAXPGPATH]; + char partialfname[MAXFNAMELEN]; + char partialpath[MAXPGPATH]; + + XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size); + snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname); + snprintf(partialpath, MAXPGPATH, "%s.partial", origpath); + + /* + * Make sure there's no .done or .ready file for the .partial + * file. + */ + XLogArchiveCleanup(partialfname); + + durable_rename(origpath, partialpath, ERROR); + XLogArchiveNotify(partialfname); + } + } + + /* + * Done with archive recovery request, clear the shared memory state which + * no longer needed. + */ + SpinLockAcquire(&XLogCtl->info_lck); + XLogCtl->SharedArchiveRecoveryRequested = false; + ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_UNKOWN; + SpinLockRelease(&XLogCtl->info_lck); +} + /* * Extract timestamp from WAL record. * @@ -7907,139 +8004,11 @@ StartupXLOG(void) * shut down cleanly, which been through recovery. */ if (ControlFile->state != DB_SHUTDOWNED) - { - /* - * Perform a checkpoint to update all our recovery activity to disk. - * - * Note that we write a shutdown checkpoint rather than an on-line - * one. This is not particularly critical, but since we may be - * assigning a new TLI, using a shutdown checkpoint allows us to have - * the rule that TLI only changes in shutdown checkpoints, which - * allows some extra error checking in xlog_redo. - * - * In promotion, only create a lightweight end-of-recovery record - * instead of a full checkpoint. A checkpoint is requested later, - * after we're fully out of recovery mode and already accepting - * queries. - */ - if (ArchiveRecoveryIsRequested() && IsUnderPostmaster && - PromoteIsTriggered()) - { - promoted = true; - - /* - * Insert a special WAL record to mark the end of recovery, since - * we aren't doing a checkpoint. That means that the checkpointer - * process may likely be in the middle of a time-smoothed - * restartpoint and could continue to be for minutes after this. - * That sounds strange, but the effect is roughly the same and it - * would be stranger to try to come out of the restartpoint and - * then checkpoint. We request a checkpoint later anyway, just for - * safety. - */ - CreateEndOfRecoveryRecord(); - } - else - { - RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | - CHECKPOINT_IMMEDIATE | - CHECKPOINT_WAIT); - } - } + promoted = PerformRecoveryXLogAction(); + /* If this is archive recovery, perform post-recovery cleanup actions. */ if (ArchiveRecoveryIsRequested()) - { - XLogRecPtr EndOfLog; - - /* - * And finally, execute the recovery_end_command, if any. - */ - if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0) - ExecuteRecoveryCommand(recoveryEndCommand, - "recovery_end_command", - true); - - /* - * We switched to a new timeline. Clean up segments on the old - * timeline. - * - * If there are any higher-numbered segments on the old timeline, - * remove them. They might contain valid WAL, but they might also be - * pre-allocated files containing garbage. In any case, they are not - * part of the new timeline's history so we don't need them. - */ - (void) GetLastSegSwitchData(&EndOfLog); - RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID); - - /* - * If the switch happened in the middle of a segment, what to do with - * the last, partial segment on the old timeline? If we don't archive - * it, and the server that created the WAL never archives it either - * (e.g. because it was hit by a meteor), it will never make it to the - * archive. That's OK from our point of view, because the new segment - * that we created with the new TLI contains all the WAL from the old - * timeline up to the switch point. But if you later try to do PITR to - * the "missing" WAL on the old timeline, recovery won't find it in - * the archive. It's physically present in the new file with new TLI, - * but recovery won't look there when it's recovering to the older - * timeline. On the other hand, if we archive the partial segment, and - * the original server on that timeline is still running and archives - * the completed version of the same segment later, it will fail. (We - * used to do that in 9.4 and below, and it caused such problems). - * - * As a compromise, we rename the last segment with the .partial - * suffix, and archive it. Archive recovery will never try to read - * .partial segments, so they will normally go unused. But in the odd - * PITR case, the administrator can copy them manually to the pg_wal - * directory (removing the suffix). They can be useful in debugging, - * too. - * - * If a .done or .ready file already exists for the old timeline, - * however, we had already determined that the segment is complete, so - * we can let it be archived normally. (In particular, if it was - * restored from the archive to begin with, it's expected to have a - * .done file). - */ - if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 && - XLogArchivingActive()) - { - char origfname[MAXFNAMELEN]; - XLogSegNo endLogSegNo; - TimeLineID EndOfLogTLI = XLogCtl->replayEndTLI; - - XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size); - XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size); - - if (!XLogArchiveIsReadyOrDone(origfname)) - { - char origpath[MAXPGPATH]; - char partialfname[MAXFNAMELEN]; - char partialpath[MAXPGPATH]; - - XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size); - snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname); - snprintf(partialpath, MAXPGPATH, "%s.partial", origpath); - - /* - * Make sure there's no .done or .ready file for the .partial - * file. - */ - XLogArchiveCleanup(partialfname); - - durable_rename(origpath, partialpath, ERROR); - XLogArchiveNotify(partialfname); - } - } - - /* - * Done with archive recovery request, clear the shared memory state - * which no longer needed. - */ - SpinLockAcquire(&XLogCtl->info_lck); - XLogCtl->SharedArchiveRecoveryRequested = false; - ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_UNKOWN; - SpinLockRelease(&XLogCtl->info_lck); - } + CleanupAfterArchiveRecovery(); /* * Preallocate additional log files, if wanted. @@ -8243,6 +8212,60 @@ CheckRecoveryConsistency(void) } } +/* + * Perform whatever XLOG actions are necessary at end of REDO. + * + * The goal here is to make sure that we'll be able to recover properly if + * we crash again. If we choose to write a checkpoint, we'll write a shutdown + * checkpoint rather than an on-line one. This is not particularly critical, + * but since we may be assigning a new TLI, using a shutdown checkpoint allows + * us to have the rule that TLI only changes in shutdown checkpoints, which + * allows some extra error checking in xlog_redo. + */ +static bool +PerformRecoveryXLogAction(void) +{ + bool promoted = false; + + /* + * Perform a checkpoint to update all our recovery activity to disk. + * + * Note that we write a shutdown checkpoint rather than an on-line one. This + * is not particularly critical, but since we may be assigning a new TLI, + * using a shutdown checkpoint allows us to have the rule that TLI only + * changes in shutdown checkpoints, which allows some extra error checking + * in xlog_redo. + * + * In promotion, only create a lightweight end-of-recovery record instead of + * a full checkpoint. A checkpoint is requested later, after we're fully out + * of recovery mode and already accepting queries. + */ + if (ArchiveRecoveryIsRequested() && IsUnderPostmaster && + PromoteIsTriggered()) + { + promoted = true; + + /* + * Insert a special WAL record to mark the end of recovery, since we + * aren't doing a checkpoint. That means that the checkpointer process + * may likely be in the middle of a time-smoothed restartpoint and could + * continue to be for minutes after this. That sounds strange, but the + * effect is roughly the same and it would be stranger to try to come + * out of the restartpoint and then checkpoint. We request a checkpoint + * later anyway, just for safety. + */ + CreateEndOfRecoveryRecord(); + } + else + { + RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | + CHECKPOINT_IMMEDIATE | + CHECKPOINT_WAIT); + } + + return promoted; +} + /* * Is the system still in recovery? * -- 2.18.0