On Mon, Oct 4, 2021 at 1:57 PM Rushabh Lathia <rushabh.lat...@gmail.com> wrote:
>
> On Fri, Oct 1, 2021 at 2:29 AM Robert Haas <robertmh...@gmail.com> wrote:
>>
>> On Thu, Sep 30, 2021 at 7:59 AM Amul Sul <sula...@gmail.com> wrote:
>> > To find the value of InRecovery after we clear it, patch still uses
>> > ControlFile's DBState, but now the check condition changed to a more
>> > specific one which is less confusing.
>> >
>> > In casual off-list discussion, the point was made to check
>> > SharedRecoveryState to find out the InRecovery value afterward, and
>> > check that using RecoveryInProgress(). But we can't depend on
>> > SharedRecoveryState because at the start it gets initialized to
>> > RECOVERY_STATE_CRASH irrespective of InRecovery that happens later.
>> > Therefore, we can't use RecoveryInProgress() which always returns
>> > true if SharedRecoveryState != RECOVERY_STATE_DONE.
>>
>> Uh, this change has crept into 0002, but it should be in 0004 with the
>> rest of the changes to remove dependencies on variables specific to
>> the startup process. Like I said before, we should really be trying to
>> separate code movement from functional changes.

Well, I have to replace the InRecovery flag in that patch since we are
moving code past the point where the InRecovery flag gets cleared. If I
don't, the 0002 patch would be wrong: InRecovery would always be false
there, and the behaviour wouldn't be the same as it was before that patch.

>> Also, 0002 doesn't
>> actually apply for me. Did you generate these patches with 'git
>> format-patch'?
>>
>> [rhaas pgsql]$ patch -p1 <
>> ~/Downloads/v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
>> patching file src/backend/access/transam/xlog.c
>> Hunk #1 succeeded at 889 (offset 9 lines).
>> Hunk #2 succeeded at 939 (offset 12 lines).
>> Hunk #3 succeeded at 5734 (offset 37 lines).
>> Hunk #4 succeeded at 8038 (offset 70 lines).
>> Hunk #5 succeeded at 8248 (offset 70 lines).
>> [rhaas pgsql]$ patch -p1 <
>> ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
>> patching file src/backend/access/transam/xlog.c
>> Reversed (or previously applied) patch detected! Assume -R? [n]
>> Apply anyway? [n] y
>> Hunk #1 FAILED at 7954.
>> Hunk #2 succeeded at 8079 (offset 70 lines).
>> 1 out of 2 hunks FAILED -- saving rejects to file
>> src/backend/access/transam/xlog.c.rej
>> [rhaas pgsql]$ git reset --hard
>> HEAD is now at b484ddf4d2 Treat ETIMEDOUT as indicating a
>> non-recoverable connection failure.
>> [rhaas pgsql]$ patch -p1 <
>> ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
>> patching file src/backend/access/transam/xlog.c
>> Reversed (or previously applied) patch detected! Assume -R? [n]
>> Apply anyway? [n]
>> Skipping patch.
>> 2 out of 2 hunks ignored -- saving rejects to file
>> src/backend/access/transam/xlog.c.rej
>>
>
> I tried to apply the patch on the master branch head and it's failing
> with conflicts.
>

Thanks, Rushabh, for the quick check. I have attached a rebased version for
the latest master head commit # f6b5d05ba9a.

> Later applied patch on below commit and it got applied cleanly:
>
> commit 7d1aa6bf1c27bf7438179db446f7d1e72ae093d0
> Author: Tom Lane <t...@sss.pgh.pa.us>
> Date: Mon Sep 27 18:48:01 2021 -0400
>
> Re-enable contrib/bloom's TAP tests.
>
> rushabh@rushabh:postgresql$ git apply
> v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
> rushabh@rushabh:postgresql$ git apply
> v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
> rushabh@rushabh:postgresql$ git apply
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:34: space
> before tab in indent.
> /*
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:38: space
> before tab in indent.
> */
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:39: space
> before tab in indent.
> Insert->fullPageWrites = lastFullPageWrites;
> warning: 3 lines add whitespace errors.
> rushabh@rushabh:postgresql$ git apply
> v36-0004-Remove-dependencies-on-startup-process-specifica.patch
>
> There are whitespace errors on patch 0003.
>

Fixed.

>>
>> It seems to me that the approach you're pursuing here can't work,
>> because the long-term goal is to get to a place where, if the system
>> starts up read-only, XLogAcceptWrites() might not be called until
>> later, after StartupXLOG() has exited. But in that case the control
>> file state would be DB_IN_PRODUCTION. But my idea of using
>> RecoveryInProgress() won't work either, because we set
>> RECOVERY_STATE_DONE just after we set DB_IN_PRODUCTION. Put
>> differently, the question we want to answer is not "are we in recovery
>> now?" but "did we perform recovery?". After studying the code a bit, I
>> think a good test might be
>> !XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr). If InRecovery
>> gets set to true, then we're certain to enter the if (InRecovery)
>> block that contains the main redo loop. And that block unconditionally
>> does XLogCtl->lastReplayedEndRecPtr = XLogCtl->replayEndRecPtr. I
>> think that replayEndRecPtr can't be 0 because it's supposed to
>> represent the record we're pretending to have last replayed, as
>> explained by the comments. And while lastReplayedEndRecPtr will get
>> updated later as we replay more records, I think it will never be set
>> back to 0. It's only going to increase, as we replay more records. On
>> the other hand if InRecovery = false then we'll never change it, and
>> it seems that it starts out as 0.
>>

Understood. I have used lastReplayedEndRecPtr, but in the 0002 patch, for
the reason given above.

>> I was hoping to have more time today to comment on 0004, but the day
>> seems to have gotten away from me. One quick thought is that it looks
>> a bit strange to be getting EndOfLog from GetLastSegSwitchData() which
>> returns lastSegSwitchLSN while getting EndOfLogTLI from replayEndTLI
>> ... because there's also replayEndRecPtr, which seems to go with
>> replayEndTLI. It feels like we should use a source for the TLI that
>> clearly matches the source for the corresponding LSN, unless there's
>> some super-good reason to do otherwise.

Agreed, that would be the right thing in principle, but on the latest
master head replayEndRecPtr might not be the right value to use, because
commit # ff9f111bce24 introduced the following code, which can make
EndOfLog differ from replayEndRecPtr:

    /*
     * Actually, if WAL ended in an incomplete record, skip the parts that
     * made it through and start writing after the portion that persisted.
     * (It's critical to first write an OVERWRITE_CONTRECORD message, which
     * we'll do as soon as we're open for writing new WAL.)
     */
    if (!XLogRecPtrIsInvalid(missingContrecPtr))
    {
        Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
        EndOfLog = missingContrecPtr;
    }

That commit added two new global variables. The first, missingContrecPtr,
is used as EndOfLog, which already gets stored in shared memory in a few
places. The other, abortedRecPtr, is needed in XLogAcceptWrites(), so I
have exported it into shared memory.

Regards,
Amul
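
For readers following the "did we perform recovery?" argument quoted above, here is a minimal stand-alone sketch of the idea. It is not code from xlog.c: ToyXLogCtl and the toy_* functions are hypothetical names invented for illustration, and XLogRecPtr is reduced to a plain 64-bit integer. It only models the three properties the argument relies on: the pointer starts at 0, it is set to a non-zero value when the redo block is entered, and it only moves forward afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's XLogRecPtr / InvalidXLogRecPtr. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical, cut-down model of the shared state under discussion. */
typedef struct ToyXLogCtl
{
	XLogRecPtr	replayEndRecPtr;
	XLogRecPtr	lastReplayedEndRecPtr;	/* assumed to start out as 0 */
} ToyXLogCtl;

/*
 * Models entering the "if (InRecovery)" block: lastReplayedEndRecPtr is
 * unconditionally set from replayEndRecPtr, which is assumed to be non-zero.
 */
static void
toy_enter_redo(ToyXLogCtl *ctl, XLogRecPtr start_lsn)
{
	assert(start_lsn != InvalidXLogRecPtr);
	ctl->replayEndRecPtr = start_lsn;
	ctl->lastReplayedEndRecPtr = ctl->replayEndRecPtr;
}

/* Models replaying one more record: the pointer never moves backwards. */
static void
toy_replay_record(ToyXLogCtl *ctl, XLogRecPtr end_lsn)
{
	assert(end_lsn >= ctl->lastReplayedEndRecPtr);
	ctl->replayEndRecPtr = end_lsn;
	ctl->lastReplayedEndRecPtr = end_lsn;
}

/*
 * Answers "did we perform recovery?" rather than "are we in recovery now?".
 * The answer stays valid after any in-recovery flag has been cleared.
 */
static bool
toy_recovery_was_performed(const ToyXLogCtl *ctl)
{
	return ctl->lastReplayedEndRecPtr != InvalidXLogRecPtr;
}
```

The point of the design is that the check reads a value that is written once recovery starts and never reset, so it can be evaluated later, and possibly by a different process, without depending on InRecovery or on the control file state.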
From de79f7f46d101768269afa360f7183302eee9551 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 30 Sep 2021 06:29:06 -0400
Subject: [PATCH v37 4/4] Remove dependencies on startup-process specific variables.

XLogAcceptWrites() depends on a few global and local variables that are
specific to the startup process; this patch removes those dependencies.

The global variables are abortedRecPtr, ArchiveRecoveryRequested and
LocalPromoteIsTriggered. LocalPromoteIsTriggered can already be accessed
from any other process using the existing PromoteIsTriggered().
ArchiveRecoveryRequested and abortedRecPtr are made accessible by copying
them into shared memory.

XLogAcceptWrites() accepts two arguments, EndOfLogTLI and EndOfLog, which
are local to StartupXLOG(). Instead of passing them as arguments,
XLogCtl->replayEndTLI and XLogCtl->lastSegSwitchLSN from shared memory can
be used as replacements for EndOfLogTLI and EndOfLog respectively.

XLogCtl->lastSegSwitchLSN is not going to change before we use it. It
changes only when the current WAL segment gets full, which is never going
to happen here, for two reasons. First, WAL writes are disabled for other
processes until XLogAcceptWrites() finishes. Second, before it uses
lastSegSwitchLSN, XLogAcceptWrites() writes only fixed-size WAL records
(the full_page_writes change record and a record for either end of
recovery or a checkpoint), which are not going to fill up the 16MB WAL
segment.

EndOfLogTLI in StartupXLOG() is the timeline ID of the last record that
xlogreader reads, but this xlogreader was simply re-fetching the last
record that we replayed in the redo loop, if we were in recovery. If we
were not in recovery, we don't need to worry, since this value is needed
only when ArchiveRecoveryRequested is true, which implicitly forces redo
and sets XLogCtl->replayEndTLI.
--- src/backend/access/transam/xlog.c | 63 +++++++++++++++++++++++-------- 1 file changed, 48 insertions(+), 15 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 5abb7c5e542..2dd81af8ca9 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -668,6 +668,13 @@ typedef struct XLogCtlData */ bool SharedPromoteIsTriggered; + /* + * SharedArchiveRecoveryRequested exports the value of the + * ArchiveRecoveryRequested flag to be share which is otherwise valid only + * in the startup process. + */ + bool SharedArchiveRecoveryRequested; + /* * WalWriterSleeping indicates whether the WAL writer is currently in * low-power mode (and hence should be nudged if an async commit occurs). @@ -717,6 +724,13 @@ typedef struct XLogCtlData /* timestamp of last COMMIT/ABORT record replayed (or being replayed) */ TimestampTz recoveryLastXTime; + /* + * SharedAbortedRecPtr exports abortedRecPtr to be shared with another + * process to write OVERWRITE_CONTRECORD message, if WAL writes are not + * permitted in the current process which reads that. 
+ */ + XLogRecPtr SharedAbortedRecPtr; + /* * timestamp of when we started replaying the current chunk of WAL data, * only relevant for replication or archive recovery @@ -889,8 +903,7 @@ static MemoryContext walDebugCxt = NULL; static void readRecoverySignalFile(void); static void validateRecoveryParameters(void); static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog); -static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, - XLogRecPtr EndOfLog); +static void CleanupAfterArchiveRecovery(void); static bool recoveryStopsBefore(XLogReaderState *record); static bool recoveryStopsAfter(XLogReaderState *record); static char *getRecoveryStopReason(void); @@ -939,7 +952,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force); static XLogRecord *ReadRecord(XLogReaderState *xlogreader, int emode, bool fetching_ckpt); static void CheckRecoveryConsistency(void); -static bool XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog); +static bool XLogAcceptWrites(void); static bool PerformRecoveryXLogAction(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); @@ -5267,6 +5280,7 @@ XLOGShmemInit(void) XLogCtl->SharedHotStandbyActive = false; XLogCtl->InstallXLogFileSegmentActive = false; XLogCtl->SharedPromoteIsTriggered = false; + XLogCtl->SharedArchiveRecoveryRequested = false; XLogCtl->WalWriterSleeping = false; SpinLockInit(&XLogCtl->Insert.insertpos_lck); @@ -5548,6 +5562,11 @@ readRecoverySignalFile(void) ereport(FATAL, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("standby mode is not supported by single-user servers"))); + + /* + * Remember archive recovery request in shared memory state. + */ + XLogCtl->SharedArchiveRecoveryRequested = ArchiveRecoveryRequested; } static void @@ -5739,8 +5758,10 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog) * Perform cleanup actions at the conclusion of archive recovery. 
*/ static void -CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog) +CleanupAfterArchiveRecovery(void) { + XLogRecPtr EndOfLog; + /* * Execute the recovery_end_command, if any. */ @@ -5757,6 +5778,7 @@ CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog) * files containing garbage. In any case, they are not part of the new * timeline's history so we don't need them. */ + (void) GetLastSegSwitchData(&EndOfLog); RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID); /* @@ -5791,6 +5813,7 @@ CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog) { char origfname[MAXFNAMELEN]; XLogSegNo endLogSegNo; + TimeLineID EndOfLogTLI = XLogCtl->replayEndTLI; XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size); XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size); @@ -7965,6 +7988,18 @@ StartupXLOG(void) { Assert(!XLogRecPtrIsInvalid(abortedRecPtr)); EndOfLog = missingContrecPtr; + + /* + * Remember broken record pointer in shared memory state. This process + * might unable to write an OVERWRITE_CONTRECORD message because of WAL + * write restriction. Storing in shared memory helps that get written + * later by another process when WAL writes enabled. + */ + XLogCtl->SharedAbortedRecPtr = abortedRecPtr; + + /* Shared memory value will be used further */ + abortedRecPtr = InvalidXLogRecPtr; + missingContrecPtr = InvalidXLogRecPtr; } /* @@ -8071,7 +8106,7 @@ StartupXLOG(void) Insert->fullPageWrites = lastFullPageWrites; /* Prepare to accept WAL writes. */ - promoted = XLogAcceptWrites(EndOfLogTLI, EndOfLog); + promoted = XLogAcceptWrites(); /* * All done with end-of-recovery actions. @@ -8131,19 +8166,17 @@ StartupXLOG(void) * Prepare to accept WAL writes. 
*/ static bool -XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog) +XLogAcceptWrites(void) { bool promoted = false; LocalSetXLogInsertAllowed(); /* If necessary, write overwrite-contrecord before doing anything else */ - if (!XLogRecPtrIsInvalid(abortedRecPtr)) + if (!XLogRecPtrIsInvalid(XLogCtl->SharedAbortedRecPtr)) { - Assert(!XLogRecPtrIsInvalid(missingContrecPtr)); - CreateOverwriteContrecordRecord(abortedRecPtr); - abortedRecPtr = InvalidXLogRecPtr; - missingContrecPtr = InvalidXLogRecPtr; + CreateOverwriteContrecordRecord(XLogCtl->SharedAbortedRecPtr); + XLogCtl->SharedAbortedRecPtr = InvalidXLogRecPtr; } /* Write an XLOG_FPW_CHANGE record */ @@ -8161,8 +8194,8 @@ XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog) promoted = PerformRecoveryXLogAction(); /* If this is archive recovery, perform post-recovery cleanup actions. */ - if (ArchiveRecoveryRequested) - CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog); + if (XLogCtl->SharedArchiveRecoveryRequested) + CleanupAfterArchiveRecovery(); /* * If any of the critical GUCs have changed, log them before we allow @@ -8304,8 +8337,8 @@ PerformRecoveryXLogAction(void) * a full checkpoint. A checkpoint is requested later, after we're fully out * of recovery mode and already accepting queries. */ - if (ArchiveRecoveryRequested && IsUnderPostmaster && - LocalPromoteIsTriggered) + if (XLogCtl->SharedArchiveRecoveryRequested && IsUnderPostmaster && + PromoteIsTriggered()) { promoted = true; -- 2.18.0
From 3208b3379eb21f97157022d524c1df2d75ab5230 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Jul 2021 14:27:51 -0400
Subject: [PATCH v37 2/4] Postpone some end-of-recovery operations relating to allowing WAL.

A previous patch moved the code that decides whether to write a checkpoint
or an end-of-recovery record into PerformRecoveryXLogAction(), and the
post-archive-recovery cleanup code into CleanupAfterArchiveRecovery(), but
still called both functions from the same place. Now postpone those calls
until after we clear InRecovery and shut down the XLogReader. Because
InRecovery has already been cleared at that point, we find out whether
recovery was performed by looking at XLogCtl->lastReplayedEndRecPtr, which
only gets set inside the REDO loop.

This is preparatory work for a future patch that wants to allow recovery
to end at one time and only later start to allow WAL writes. The steps
that themselves write WAL clearly shouldn't happen before we're ready to
accept WAL writes, and it seems best for now to keep the steps performed
by CleanupAfterArchiveRecovery() at the same point relative to the
surrounding steps. We assume (hopefully correctly) that the user doesn't
want recovery_end_command to run until we're committed to writing WAL on
the new timeline. Until then, the machine is still usable as a standby on
the old timeline.

Aside from the value of this patch as preparatory work, this order of
operations actually seems more logical, since it means we don't actually
write any WAL until after exiting recovery.

Robert Haas, with modifications by Amul Sul.
--- src/backend/access/transam/xlog.c | 62 +++++++++++++++++-------------- 1 file changed, 34 insertions(+), 28 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 7c258465780..cc08d8a475c 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8018,34 +8018,6 @@ StartupXLOG(void) XLogCtl->LogwrtRqst.Write = EndOfLog; XLogCtl->LogwrtRqst.Flush = EndOfLog; - LocalSetXLogInsertAllowed(); - - /* If necessary, write overwrite-contrecord before doing anything else */ - if (!XLogRecPtrIsInvalid(abortedRecPtr)) - { - Assert(!XLogRecPtrIsInvalid(missingContrecPtr)); - CreateOverwriteContrecordRecord(abortedRecPtr); - abortedRecPtr = InvalidXLogRecPtr; - missingContrecPtr = InvalidXLogRecPtr; - } - - /* - * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE - * record before resource manager writes cleanup WAL records or checkpoint - * record is written. - */ - Insert->fullPageWrites = lastFullPageWrites; - UpdateFullPageWrites(); - LocalXLogInsertAllowed = -1; - - /* Emit checkpoint or end-of-recovery record in XLOG, if required. */ - if (InRecovery) - promoted = PerformRecoveryXLogAction(); - - /* If this is archive recovery, perform post-recovery cleanup actions. */ - if (ArchiveRecoveryRequested) - CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog); - /* * Preallocate additional log files, if wanted. 
*/ @@ -8090,6 +8062,40 @@ StartupXLOG(void) } XLogReaderFree(xlogreader); + LocalSetXLogInsertAllowed(); + + /* If necessary, write overwrite-contrecord before doing anything else */ + if (!XLogRecPtrIsInvalid(abortedRecPtr)) + { + Assert(!XLogRecPtrIsInvalid(missingContrecPtr)); + CreateOverwriteContrecordRecord(abortedRecPtr); + abortedRecPtr = InvalidXLogRecPtr; + missingContrecPtr = InvalidXLogRecPtr; + } + + /* + * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE + * record before resource manager writes cleanup WAL records or checkpoint + * record is written. + */ + Insert->fullPageWrites = lastFullPageWrites; + UpdateFullPageWrites(); + LocalXLogInsertAllowed = -1; + + /* + * Emit checkpoint or end-of-recovery record in XLOG, if the server has been + * through the archive or the crash recovery. + * + * If the recovery is performed lastReplayedEndRecPtr will always be a valid + * record pointer that never changes after REDO loop. + */ + if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr)) + promoted = PerformRecoveryXLogAction(); + + /* If this is archive recovery, perform post-recovery cleanup actions. */ + if (ArchiveRecoveryRequested) + CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog); + /* * If any of the critical GUCs have changed, log them before we allow * backends to write WAL. -- 2.18.0
From 01731a5b955535b619f9fec887d25049e3137174 Mon Sep 17 00:00:00 2001 From: Robert Haas <rhaas@postgresql.org> Date: Fri, 23 Jul 2021 13:07:56 -0400 Subject: [PATCH v37 1/4] Refactor some end-of-recovery code out of StartupXLOG(). Moved the code that performs whether to write a checkpoint or an end-of-recovery record into PerformRecoveryXlogAction(). Also create a new function CleanupAfterArchiveRecovery() to perform a few tasks that we want to do after we've actually exited archive recovery but before we start accepting new WAL writes. This is straightforward code movement to make StartupXLOG() a little bit shorter and a little bit easier to understand. Robert Haas, with modifications by Amul Sul. --- src/backend/access/transam/xlog.c | 261 ++++++++++++++++-------------- 1 file changed, 143 insertions(+), 118 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index eddb13d13a7..7c258465780 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -889,6 +889,8 @@ static MemoryContext walDebugCxt = NULL; static void readRecoverySignalFile(void); static void validateRecoveryParameters(void); static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog); +static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, + XLogRecPtr EndOfLog); static bool recoveryStopsBefore(XLogReaderState *record); static bool recoveryStopsAfter(XLogReaderState *record); static char *getRecoveryStopReason(void); @@ -937,6 +939,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force); static XLogRecord *ReadRecord(XLogReaderState *xlogreader, int emode, bool fetching_ckpt); static void CheckRecoveryConsistency(void); +static bool PerformRecoveryXLogAction(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); static bool rescanLatestTimeLine(void); @@ -5731,6 +5734,88 @@ exitArchiveRecovery(TimeLineID endTLI, 
XLogRecPtr endOfLog) (errmsg("archive recovery complete"))); } +/* + * Perform cleanup actions at the conclusion of archive recovery. + */ +static void +CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog) +{ + /* + * Execute the recovery_end_command, if any. + */ + if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0) + ExecuteRecoveryCommand(recoveryEndCommand, + "recovery_end_command", + true); + + /* + * We switched to a new timeline. Clean up segments on the old timeline. + * + * If there are any higher-numbered segments on the old timeline, remove + * them. They might contain valid WAL, but they might also be pre-allocated + * files containing garbage. In any case, they are not part of the new + * timeline's history so we don't need them. + */ + RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID); + + /* + * If the switch happened in the middle of a segment, what to do with the + * last, partial segment on the old timeline? If we don't archive it, and + * the server that created the WAL never archives it either (e.g. because it + * was hit by a meteor), it will never make it to the archive. That's OK + * from our point of view, because the new segment that we created with the + * new TLI contains all the WAL from the old timeline up to the switch + * point. But if you later try to do PITR to the "missing" WAL on the old + * timeline, recovery won't find it in the archive. It's physically present + * in the new file with new TLI, but recovery won't look there when it's + * recovering to the older timeline. On the other hand, if we archive the + * partial segment, and the original server on that timeline is still + * running and archives the completed version of the same segment later, it + * will fail. (We used to do that in 9.4 and below, and it caused such + * problems). + * + * As a compromise, we rename the last segment with the .partial suffix, and + * archive it. 
Archive recovery will never try to read .partial segments, so + * they will normally go unused. But in the odd PITR case, the administrator + * can copy them manually to the pg_wal directory (removing the suffix). + * They can be useful in debugging, too. + * + * If a .done or .ready file already exists for the old timeline, however, + * we had already determined that the segment is complete, so we can let it + * be archived normally. (In particular, if it was restored from the archive + * to begin with, it's expected to have a .done file). + */ + if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 && + XLogArchivingActive()) + { + char origfname[MAXFNAMELEN]; + XLogSegNo endLogSegNo; + + XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size); + XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size); + + if (!XLogArchiveIsReadyOrDone(origfname)) + { + char origpath[MAXPGPATH]; + char partialfname[MAXFNAMELEN]; + char partialpath[MAXPGPATH]; + + XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size); + snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname); + snprintf(partialpath, MAXPGPATH, "%s.partial", origpath); + + /* + * Make sure there's no .done or .ready file for the .partial + * file. + */ + XLogArchiveCleanup(partialfname); + + durable_rename(origpath, partialpath, ERROR); + XLogArchiveNotify(partialfname); + } + } +} + /* * Extract timestamp from WAL record. * @@ -7953,127 +8038,13 @@ StartupXLOG(void) UpdateFullPageWrites(); LocalXLogInsertAllowed = -1; + /* Emit checkpoint or end-of-recovery record in XLOG, if required. */ if (InRecovery) - { - /* - * Perform a checkpoint to update all our recovery activity to disk. - * - * Note that we write a shutdown checkpoint rather than an on-line - * one. 
This is not particularly critical, but since we may be - * assigning a new TLI, using a shutdown checkpoint allows us to have - * the rule that TLI only changes in shutdown checkpoints, which - * allows some extra error checking in xlog_redo. - * - * In promotion, only create a lightweight end-of-recovery record - * instead of a full checkpoint. A checkpoint is requested later, - * after we're fully out of recovery mode and already accepting - * queries. - */ - if (ArchiveRecoveryRequested && IsUnderPostmaster && - LocalPromoteIsTriggered) - { - promoted = true; - - /* - * Insert a special WAL record to mark the end of recovery, since - * we aren't doing a checkpoint. That means that the checkpointer - * process may likely be in the middle of a time-smoothed - * restartpoint and could continue to be for minutes after this. - * That sounds strange, but the effect is roughly the same and it - * would be stranger to try to come out of the restartpoint and - * then checkpoint. We request a checkpoint later anyway, just for - * safety. - */ - CreateEndOfRecoveryRecord(); - } - else - { - RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | - CHECKPOINT_IMMEDIATE | - CHECKPOINT_WAIT); - } - } + promoted = PerformRecoveryXLogAction(); + /* If this is archive recovery, perform post-recovery cleanup actions. */ if (ArchiveRecoveryRequested) - { - /* - * And finally, execute the recovery_end_command, if any. - */ - if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0) - ExecuteRecoveryCommand(recoveryEndCommand, - "recovery_end_command", - true); - - /* - * We switched to a new timeline. Clean up segments on the old - * timeline. - * - * If there are any higher-numbered segments on the old timeline, - * remove them. They might contain valid WAL, but they might also be - * pre-allocated files containing garbage. In any case, they are not - * part of the new timeline's history so we don't need them. 
- */ - RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID); - - /* - * If the switch happened in the middle of a segment, what to do with - * the last, partial segment on the old timeline? If we don't archive - * it, and the server that created the WAL never archives it either - * (e.g. because it was hit by a meteor), it will never make it to the - * archive. That's OK from our point of view, because the new segment - * that we created with the new TLI contains all the WAL from the old - * timeline up to the switch point. But if you later try to do PITR to - * the "missing" WAL on the old timeline, recovery won't find it in - * the archive. It's physically present in the new file with new TLI, - * but recovery won't look there when it's recovering to the older - * timeline. On the other hand, if we archive the partial segment, and - * the original server on that timeline is still running and archives - * the completed version of the same segment later, it will fail. (We - * used to do that in 9.4 and below, and it caused such problems). - * - * As a compromise, we rename the last segment with the .partial - * suffix, and archive it. Archive recovery will never try to read - * .partial segments, so they will normally go unused. But in the odd - * PITR case, the administrator can copy them manually to the pg_wal - * directory (removing the suffix). They can be useful in debugging, - * too. - * - * If a .done or .ready file already exists for the old timeline, - * however, we had already determined that the segment is complete, so - * we can let it be archived normally. (In particular, if it was - * restored from the archive to begin with, it's expected to have a - * .done file). 
- */ - if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 && - XLogArchivingActive()) - { - char origfname[MAXFNAMELEN]; - XLogSegNo endLogSegNo; - - XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size); - XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size); - - if (!XLogArchiveIsReadyOrDone(origfname)) - { - char origpath[MAXPGPATH]; - char partialfname[MAXFNAMELEN]; - char partialpath[MAXPGPATH]; - - XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size); - snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname); - snprintf(partialpath, MAXPGPATH, "%s.partial", origpath); - - /* - * Make sure there's no .done or .ready file for the .partial - * file. - */ - XLogArchiveCleanup(partialfname); - - durable_rename(origpath, partialpath, ERROR); - XLogArchiveNotify(partialfname); - } - } - } + CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog); /* * Preallocate additional log files, if wanted. @@ -8282,6 +8253,60 @@ CheckRecoveryConsistency(void) } } +/* + * Perform whatever XLOG actions are necessary at end of REDO. + * + * The goal here is to make sure that we'll be able to recover properly if + * we crash again. If we choose to write a checkpoint, we'll write a shutdown + * checkpoint rather than an on-line one. This is not particularly critical, + * but since we may be assigning a new TLI, using a shutdown checkpoint allows + * us to have the rule that TLI only changes in shutdown checkpoints, which + * allows some extra error checking in xlog_redo. + */ +static bool +PerformRecoveryXLogAction(void) +{ + bool promoted = false; + + /* + * Perform a checkpoint to update all our recovery activity to disk. + * + * Note that we write a shutdown checkpoint rather than an on-line one. This + * is not particularly critical, but since we may be assigning a new TLI, + * using a shutdown checkpoint allows us to have the rule that TLI only + * changes in shutdown checkpoints, which allows some extra error checking + * in xlog_redo. 
+ * + * In promotion, only create a lightweight end-of-recovery record instead of + * a full checkpoint. A checkpoint is requested later, after we're fully out + * of recovery mode and already accepting queries. + */ + if (ArchiveRecoveryRequested && IsUnderPostmaster && + LocalPromoteIsTriggered) + { + promoted = true; + + /* + * Insert a special WAL record to mark the end of recovery, since we + * aren't doing a checkpoint. That means that the checkpointer process + * may likely be in the middle of a time-smoothed restartpoint and could + * continue to be for minutes after this. That sounds strange, but the + * effect is roughly the same and it would be stranger to try to come + * out of the restartpoint and then checkpoint. We request a checkpoint + * later anyway, just for safety. + */ + CreateEndOfRecoveryRecord(); + } + else + { + RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | + CHECKPOINT_IMMEDIATE | + CHECKPOINT_WAIT); + } + + return promoted; +} + /* * Is the system still in recovery? * -- 2.18.0
From 5d65798aef2aa6cae6b933b9d805aadf71a71b49 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Mon, 4 Oct 2021 00:44:31 -0400 Subject: [PATCH v37 3/4] Create XLogAcceptWrites() function with code from StartupXLOG(). This is just code movement. A future patch will want to defer the call to XLogAcceptWrites() until a later time, rather than doing it as soon as we finish applying WAL, but here we're just grouping related code together into a new function. Robert Haas, with modifications by Amul Sul. --- src/backend/access/transam/xlog.c | 101 +++++++++++++++++------------- 1 file changed, 59 insertions(+), 42 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index cc08d8a475c..5abb7c5e542 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -939,6 +939,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force); static XLogRecord *ReadRecord(XLogReaderState *xlogreader, int emode, bool fetching_ckpt); static void CheckRecoveryConsistency(void); +static bool XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog); static bool PerformRecoveryXLogAction(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); @@ -8062,52 +8063,15 @@ StartupXLOG(void) } XLogReaderFree(xlogreader); - LocalSetXLogInsertAllowed(); - - /* If necessary, write overwrite-contrecord before doing anything else */ - if (!XLogRecPtrIsInvalid(abortedRecPtr)) - { - Assert(!XLogRecPtrIsInvalid(missingContrecPtr)); - CreateOverwriteContrecordRecord(abortedRecPtr); - abortedRecPtr = InvalidXLogRecPtr; - missingContrecPtr = InvalidXLogRecPtr; - } - /* - * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE - * record before resource manager writes cleanup WAL records or checkpoint - * record is written. 
+	 * Update full_page_writes in shared memory, and later whenever wal write
+	 * permitted, write an XLOG_FPW_CHANGE record before resource manager
+	 * writes cleanup WAL records or checkpoint record is written.
 	 */
 	Insert->fullPageWrites = lastFullPageWrites;
-	UpdateFullPageWrites();
-	LocalXLogInsertAllowed = -1;
 
-	/*
-	 * Emit checkpoint or end-of-recovery record in XLOG, if the server has been
-	 * through the archive or the crash recovery.
-	 *
-	 * If the recovery is performed lastReplayedEndRecPtr will always be a valid
-	 * record pointer that never changes after REDO loop.
-	 */
-	if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr))
-		promoted = PerformRecoveryXLogAction();
-
-	/* If this is archive recovery, perform post-recovery cleanup actions. */
-	if (ArchiveRecoveryRequested)
-		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
-
-	/*
-	 * If any of the critical GUCs have changed, log them before we allow
-	 * backends to write WAL.
-	 */
-	LocalSetXLogInsertAllowed();
-	XLogReportParameters();
-
-	/*
-	 * Local WAL inserts enabled, so it's time to finish initialization of
-	 * commit timestamp.
-	 */
-	CompleteCommitTsInitialization();
+	/* Prepare to accept WAL writes. */
+	promoted = XLogAcceptWrites(EndOfLogTLI, EndOfLog);
 
 	/*
 	 * All done with end-of-recovery actions.
@@ -8163,6 +8127,59 @@ StartupXLOG(void)
 		RequestCheckpoint(CHECKPOINT_FORCE);
 }
 
+/*
+ * Prepare to accept WAL writes.
+ */
+static bool
+XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+{
+	bool		promoted = false;
+
+	LocalSetXLogInsertAllowed();
+
+	/* If necessary, write overwrite-contrecord before doing anything else */
+	if (!XLogRecPtrIsInvalid(abortedRecPtr))
+	{
+		Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
+		CreateOverwriteContrecordRecord(abortedRecPtr);
+		abortedRecPtr = InvalidXLogRecPtr;
+		missingContrecPtr = InvalidXLogRecPtr;
+	}
+
+	/* Write an XLOG_FPW_CHANGE record */
+	UpdateFullPageWrites();
+	LocalXLogInsertAllowed = -1;
+
+	/*
+	 * Emit checkpoint or end-of-recovery record in XLOG, if the server has been
+	 * through the archive or the crash recovery.
+	 *
+	 * If the recovery is performed lastReplayedEndRecPtr will always be a valid
+	 * record pointer that never changes after REDO loop.
+	 */
+	if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr))
+		promoted = PerformRecoveryXLogAction();
+
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
+	if (ArchiveRecoveryRequested)
+		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
+
+	/*
+	 * If any of the critical GUCs have changed, log them before we allow
+	 * backends to write WAL.
+	 */
+	LocalSetXLogInsertAllowed();
+	XLogReportParameters();
+
+	/*
+	 * Local WAL inserts enabled, so it's time to finish initialization of
+	 * commit timestamp.
+	 */
+	CompleteCommitTsInitialization();
+
+	return promoted;
+}
+
 /*
  * Checks if recovery has reached a consistent state. When consistency is
  * reached and we have a valid starting standby snapshot, tell postmaster
-- 
2.18.0
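Since XLogAcceptWrites() flips the local insert permission more than once, a small standalone model of that sequence may help when reviewing the ordering. This is only a sketch under stated assumptions: it treats the flag as a tri-state (1 = allowed, 0 = not allowed, -1 = defer to the recovery state), which is my reading of how LocalXLogInsertAllowed is used in xlog.c, and every name below is a hypothetical stand-in. It does not model what PerformRecoveryXLogAction() or its callees do internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: 1 = allowed, 0 = not allowed, -1 = defer to recovery state. */
static int	local_insert_allowed = -1;

/* Hypothetical stand-in for RecoveryInProgress(); held constant here. */
static bool in_recovery = true;

static bool
insert_allowed(void)
{
	if (local_insert_allowed >= 0)
		return local_insert_allowed != 0;
	return !in_recovery;
}

/* Snapshot of the permission as seen at three points of the sequence. */
typedef struct
{
	bool		during_fpw_record;
	bool		before_recovery_action;
	bool		during_parameter_report;
} AllowedTrace;

static AllowedTrace
trace_accept_writes(void)
{
	AllowedTrace t;

	/* Models the first LocalSetXLogInsertAllowed() call. */
	local_insert_allowed = 1;
	t.during_fpw_record = insert_allowed();

	/* Models "LocalXLogInsertAllowed = -1" after UpdateFullPageWrites(). */
	local_insert_allowed = -1;
	t.before_recovery_action = insert_allowed();

	/* Models the second LocalSetXLogInsertAllowed() call. */
	local_insert_allowed = 1;
	t.during_parameter_report = insert_allowed();

	return t;
}
```

In this model the permission is on for the overwrite-contrecord and XLOG_FPW_CHANGE records, falls back to the recovery state just before the checkpoint or end-of-recovery step, and is on again for XLogReportParameters() and CompleteCommitTsInitialization().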