On Thu, Oct 7, 2021 at 6:21 PM Amul Sul <sula...@gmail.com> wrote:
>
> On Thu, Oct 7, 2021 at 5:56 AM Jaime Casanova
> <jcasa...@systemguards.com.ec> wrote:
> >
> > On Tue, Oct 05, 2021 at 04:11:58PM +0530, Amul Sul wrote:
> > > On Mon, Oct 4, 2021 at 1:57 PM Rushabh Lathia
> > > <rushabh.lat...@gmail.com> wrote:
> > > >
> > > > I tried to apply the patch on the master branch head and it's failing
> > > > with conflicts.
> > > >
> > >
> > > Thanks, Rushabh, for the quick check, I have attached a rebased version
> > > for the
> > > latest master head commit # f6b5d05ba9a.
> > >
> >
> > Hi,
> >
> > I got this error while executing "make check" on src/test/recovery:
> > """
> > t/026_overwrite_contrecord.pl ........ 1/3 # poll_query_until timed out
> > executing this query:
> > # SELECT '0/201D4D8'::pg_lsn <= pg_last_wal_replay_lsn()
> > # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > # Looks like your test exited with 29 just after 1.
> > t/026_overwrite_contrecord.pl ........ Dubious, test returned 29 (wstat
> > 7424, 0x1d00)
> > Failed 2/3 subtests
> >
> > Test Summary Report
> > -------------------
> > t/026_overwrite_contrecord.pl (Wstat: 7424 Tests: 1 Failed: 0)
> > Non-zero exit status: 29
> > Parse errors: Bad plan. You planned 3 tests but ran 1.
> > Files=26, Tests=279, 400 wallclock secs ( 0.27 usr 0.10 sys + 73.78 cusr
> > 59.66 csys = 133.81 CPU)
> > Result: FAIL
> > make: *** [Makefile:23: check] Error 1
> > """
> >
>
> Thanks for the reporting problem, I am working on it. The cause of
> failure is that v37_0004 patch clearing the missingContrecPtr global
> variable before CreateOverwriteContrecordRecord() execution, which it
> shouldn't.
>
In the attached version, I have fixed this issue by restoring missingContrecPtr.

To handle abortedRecPtr and missingContrecPtr, the global variables newly added by commit ff9f111bce24, we don't need to store them separately in shared memory. Instead, we need a flag indicating that a broken record was found at the end of recovery, so that the overwrite-contrecord can be written later.

EndOfLog is set to missingContrecPtr, and the 0004 patch already handles EndOfLog; abortedRecPtr is the same as lastReplayedEndRecPtr, AFAICS. I have added an assert to check that lastReplayedEndRecPtr equals abortedRecPtr, but I don't think it is needed: we can go ahead and write the overwrite-contrecord starting at lastReplayedEndRecPtr.

Regards,
Amul
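A minimal, single-process model of the proposed handoff may help. It assumes a simplified stand-in for XLogCtlData: during REDO the startup process keeps lastReplayedEndRecPtr current, at the end of recovery it records only a flag (EndOfLog is already published in shared memory, e.g. as lastSegSwitchLSN), and whichever process later enables WAL writes derives abortedRecPtr and missingContrecPtr from that shared state. SharedState, replay_record(), mark_broken_record() and take_overwrite_contrecord() are hypothetical names for illustration; locking and the actual record insertion are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* same width as PostgreSQL's XLogRecPtr */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical, simplified subset of the shared-memory XLogCtlData fields. */
typedef struct SharedState
{
    XLogRecPtr  lastReplayedEndRecPtr;  /* end+1 of the last replayed record */
    XLogRecPtr  lastSegSwitchLSN;       /* set to EndOfLog at end of recovery */
    bool        overwriteContrecord;    /* broken record found at end of WAL */
} SharedState;

/* Startup process, REDO loop: publish the end of each replayed record. */
static void
replay_record(SharedState *shared, XLogRecPtr endRecPtr)
{
    shared->lastReplayedEndRecPtr = endRecPtr;
}

/* Startup process, end of recovery: keep only a flag, not the pointers. */
static void
mark_broken_record(SharedState *shared, XLogRecPtr endOfLog)
{
    shared->lastSegSwitchLSN = endOfLog;    /* EndOfLog == missingContrecPtr */
    shared->overwriteContrecord = true;
}

/*
 * Any process, once WAL writes are allowed: if the flag is set, rebuild both
 * pointers from shared state, clear the flag, and return true so that the
 * caller writes the overwrite-contrecord exactly once.
 */
static bool
take_overwrite_contrecord(SharedState *shared, XLogRecPtr *abortedRecPtr,
                          XLogRecPtr *missingContrecPtr)
{
    if (!shared->overwriteContrecord)
        return false;
    *abortedRecPtr = shared->lastReplayedEndRecPtr;
    *missingContrecPtr = shared->lastSegSwitchLSN;
    shared->overwriteContrecord = false;
    return true;
}
```

Clearing the flag in the consumer mirrors the patch setting `XLogCtl->overwriteContrecord = false` right after CreateOverwriteContrecordRecord().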
From 5bf021226d9742a6fefbcb33e54f7ef044d8fbcc Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 30 Sep 2021 06:29:06 -0400
Subject: [PATCH v38 4/4] Remove dependencies on startup-process-specific variables.

To make XLogAcceptWrites() callable outside the startup process, remove its dependency on a few global and local variables that are specific to the startup process.

The global variables are abortedRecPtr, missingContrecPtr, ArchiveRecoveryRequested and LocalPromoteIsTriggered. LocalPromoteIsTriggered can be accessed from any other process through the existing PromoteIsTriggered(). ArchiveRecoveryRequested is made accessible by copying it into shared memory. abortedRecPtr and missingContrecPtr can be derived from existing shared memory values, but for that we need a flag indicating that a broken record was found and that an OVERWRITE_CONTRECORD message needs to be written once WAL writes are permitted; add such a flag.

XLogAcceptWrites() takes two arguments, EndOfLogTLI and EndOfLog, which are local to StartupXLOG(). Instead of passing them as arguments, XLogCtl->replayEndTLI and XLogCtl->lastSegSwitchLSN from shared memory can be used as replacements for EndOfLogTLI and EndOfLog, respectively.

XLogCtl->lastSegSwitchLSN does not change before we use it. It changes only when the current WAL segment gets full, which cannot happen here for two reasons: first, WAL writes are disabled for other processes until XLogAcceptWrites() finishes; second, before lastSegSwitchLSN is used, XLogAcceptWrites() writes only fixed-size WAL records (the full-page-writes change record and either an end-of-recovery or a checkpoint record), which will not fill up a 16MB WAL segment.

EndOfLogTLI in StartupXLOG() is the timeline ID of the last record read by the xlogreader, but if we were in recovery, that xlogreader simply re-fetches the last record replayed in the redo loop. If we were not in recovery, there is nothing to worry about, since this value is needed only when ArchiveRecoveryRequested = true, which implicitly forces redo and sets XLogCtl->replayEndTLI.
---
 src/backend/access/transam/xlog.c | 90 ++++++++++++++++++++++++-------
 1 file changed, 72 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cdfec248081..b9596ca005c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -668,6 +668,13 @@ typedef struct XLogCtlData
 	 */
 	bool		SharedPromoteIsTriggered;
 
+	/*
+	 * SharedArchiveRecoveryRequested exports the value of the
+	 * ArchiveRecoveryRequested flag, which is otherwise valid only in the
+	 * startup process, so that other processes can read it.
+	 */
+	bool		SharedArchiveRecoveryRequested;
+
 	/*
 	 * WalWriterSleeping indicates whether the WAL writer is currently in
 	 * low-power mode (and hence should be nudged if an async commit occurs).
@@ -706,9 +713,10 @@ typedef struct XLogCtlData
 
 	/*
 	 * lastReplayedEndRecPtr points to end+1 of the last record successfully
-	 * replayed. When we're currently replaying a record, ie. in a redo
-	 * function, replayEndRecPtr points to the end+1 of the record being
-	 * replayed, otherwise it's equal to lastReplayedEndRecPtr.
+	 * replayed, which is also the point where a broken record starts (if any).
+	 * When we're currently replaying a record, ie. in a redo function,
+	 * replayEndRecPtr points to the end+1 of the record being replayed,
+	 * otherwise it's equal to lastReplayedEndRecPtr.
 	 */
 	XLogRecPtr	lastReplayedEndRecPtr;
 	TimeLineID	lastReplayedTLI;
@@ -717,6 +725,12 @@ typedef struct XLogCtlData
 	/* timestamp of last COMMIT/ABORT record replayed (or being replayed) */
 	TimestampTz recoveryLastXTime;
 
+	/*
+	 * overwriteContrecord indicates that a broken record was found at the end
+	 * of recovery and an OVERWRITE_CONTRECORD message needs to be written.
+	 */
+	bool		overwriteContrecord;
+
 	/*
 	 * timestamp of when we started replaying the current chunk of WAL data,
 	 * only relevant for replication or archive recovery
@@ -889,8 +903,7 @@ static MemoryContext walDebugCxt = NULL;
 static void readRecoverySignalFile(void);
 static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
-static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
-										XLogRecPtr EndOfLog);
+static void CleanupAfterArchiveRecovery(void);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static char *getRecoveryStopReason(void);
@@ -939,7 +952,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
-static bool XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog);
+static bool XLogAcceptWrites(void);
 static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
@@ -5267,6 +5280,7 @@ XLOGShmemInit(void)
 	XLogCtl->SharedHotStandbyActive = false;
 	XLogCtl->InstallXLogFileSegmentActive = false;
 	XLogCtl->SharedPromoteIsTriggered = false;
+	XLogCtl->SharedArchiveRecoveryRequested = false;
 	XLogCtl->WalWriterSleeping = false;
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
@@ -5548,6 +5562,11 @@ readRecoverySignalFile(void)
 		ereport(FATAL,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("standby mode is not supported by single-user servers")));
+
+	/*
+	 * Remember the archive recovery request in shared memory.
+	 */
+	XLogCtl->SharedArchiveRecoveryRequested = ArchiveRecoveryRequested;
 }
 
 static void
@@ -5739,8 +5758,10 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
  * Perform cleanup actions at the conclusion of archive recovery.
  */
 static void
-CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+CleanupAfterArchiveRecovery(void)
 {
+	XLogRecPtr	EndOfLog;
+
 	/*
 	 * Execute the recovery_end_command, if any.
 	 */
@@ -5757,6 +5778,7 @@ CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
 	 * files containing garbage. In any case, they are not part of the new
 	 * timeline's history so we don't need them.
 	 */
+	(void) GetLastSegSwitchData(&EndOfLog);
 	RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
 
 	/*
@@ -5791,6 +5813,7 @@ CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
 	{
 		char		origfname[MAXFNAMELEN];
 		XLogSegNo	endLogSegNo;
+		TimeLineID	EndOfLogTLI = XLogCtl->replayEndTLI;
 
 		XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
 		XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
@@ -7965,6 +7988,27 @@ StartupXLOG(void)
 	{
 		Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
 		EndOfLog = missingContrecPtr;
+
+		/*
+		 * Set the broken-record-found flag in shared memory. This process
+		 * might be unable to write an OVERWRITE_CONTRECORD message because of
+		 * the WAL write restriction. Storing the flag in shared memory lets
+		 * another process write it later, once WAL writes are enabled.
+		 */
+		XLogCtl->overwriteContrecord = true;
+
+		/*
+		 * When the OVERWRITE_CONTRECORD message is written, the abortedRecPtr
+		 * and missingContrecPtr values need to be restored, and they can be
+		 * fetched from shared memory: lastReplayedEndRecPtr is abortedRecPtr,
+		 * and missingContrecPtr is EndOfLog, which is going to be stored in a
+		 * bunch of places in shared memory (e.g. lastSegSwitchLSN, which is not
+		 * going to change before the point where the OVERWRITE_CONTRECORD
+		 * message gets written).
+		 */
+		Assert(abortedRecPtr == XLogCtl->lastReplayedEndRecPtr);
+		abortedRecPtr = InvalidXLogRecPtr;
+		missingContrecPtr = InvalidXLogRecPtr;
 	}
 
 	/*
@@ -8071,7 +8115,7 @@ StartupXLOG(void)
 	Insert->fullPageWrites = lastFullPageWrites;
 
 	/* Prepare to accept WAL writes. */
-	promoted = XLogAcceptWrites(EndOfLogTLI, EndOfLog);
+	promoted = XLogAcceptWrites();
 
 	/*
 	 * All done with end-of-recovery actions.
@@ -8131,19 +8175,29 @@ StartupXLOG(void)
  * Prepare to accept WAL writes.
  */
 static bool
-XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+XLogAcceptWrites(void)
 {
 	bool		promoted = false;
 
 	LocalSetXLogInsertAllowed();
 
 	/* If necessary, write overwrite-contrecord before doing anything else */
-	if (!XLogRecPtrIsInvalid(abortedRecPtr))
+	if (XLogCtl->overwriteContrecord)
 	{
-		Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
-		CreateOverwriteContrecordRecord(abortedRecPtr);
-		abortedRecPtr = InvalidXLogRecPtr;
-		missingContrecPtr = InvalidXLogRecPtr;
+		/*
+		 * Restore missingContrecPtr, which is needed to set the
+		 * XLP_FIRST_IS_OVERWRITE_CONTRECORD flag on the page header where
+		 * the overwrite-contrecord gets written. See AdvanceXLInsertBuffer().
+		 */
+		GetLastSegSwitchData(&missingContrecPtr);
+
+		/*
+		 * Start writing the overwrite-contrecord at the point where the last
+		 * valid replayed record ended.
+		 */
+		CreateOverwriteContrecordRecord(XLogCtl->lastReplayedEndRecPtr);
+
+		XLogCtl->overwriteContrecord = false;
 	}
 
 	/* Write an XLOG_FPW_CHANGE record */
@@ -8161,8 +8215,8 @@ XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
 		promoted = PerformRecoveryXLogAction();
 
 	/* If this is archive recovery, perform post-recovery cleanup actions. */
-	if (ArchiveRecoveryRequested)
-		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
+	if (XLogCtl->SharedArchiveRecoveryRequested)
+		CleanupAfterArchiveRecovery();
 
 	/*
 	 * If any of the critical GUCs have changed, log them before we allow
@@ -8304,8 +8358,8 @@ PerformRecoveryXLogAction(void)
 	 * a full checkpoint. A checkpoint is requested later, after we're fully out
 	 * of recovery mode and already accepting queries.
 	 */
-	if (ArchiveRecoveryRequested && IsUnderPostmaster &&
-		LocalPromoteIsTriggered)
+	if (XLogCtl->SharedArchiveRecoveryRequested && IsUnderPostmaster &&
+		PromoteIsTriggered())
 	{
 		promoted = true;
 
-- 
2.18.0
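The lastSegSwitchLSN argument in the 4/4 commit message rests on the records written by XLogAcceptWrites() not completing the segment that contains EndOfLog. A minimal sketch of that condition, assuming the default 16MB wal_segment_size and ignoring page headers and record alignment; seg_of() and write_stays_in_segment() are hypothetical helpers that use the same division as PostgreSQL's XLByteToSeg() macro.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Default wal_segment_size (16MB); an assumption of this sketch. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Segment number containing byte position ptr, as XLByteToSeg() computes. */
static uint64_t
seg_of(XLogRecPtr ptr, uint64_t seg_size)
{
    return ptr / seg_size;
}

/*
 * True if writing len bytes starting at start leaves the insert position
 * strictly inside start's segment, i.e. that segment is not completed.
 * Simplified model: page headers and record alignment are not counted.
 */
static bool
write_stays_in_segment(XLogRecPtr start, uint64_t len, uint64_t seg_size)
{
    return seg_of(start + len, seg_size) == seg_of(start, seg_size);
}
```

With EndOfLog at 0/201D4D8 (the LSN from the failing test above), an 8kB write stays in segment 2, whereas the same write starting at 0/2FFF000 would complete that segment.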
From 733035e06d5dedd2142a9f126332c056c8a4d42d Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 4 Oct 2021 00:44:31 -0400
Subject: [PATCH v38 3/4] Create XLogAcceptWrites() function with code from StartupXLOG().

This is just code movement. A future patch will want to defer the call to XLogAcceptWrites() until a later time, rather than doing it as soon as we finish applying WAL, but here we're just grouping related code together into a new function.

Robert Haas, with modifications by Amul Sul.
---
 src/backend/access/transam/xlog.c | 101 +++++++++++++++++-------------
 1 file changed, 59 insertions(+), 42 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6612b81e4b9..cdfec248081 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -939,6 +939,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
+static bool XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog);
 static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
@@ -8062,52 +8063,15 @@ StartupXLOG(void)
 	}
 	XLogReaderFree(xlogreader);
 
-	LocalSetXLogInsertAllowed();
-
-	/* If necessary, write overwrite-contrecord before doing anything else */
-	if (!XLogRecPtrIsInvalid(abortedRecPtr))
-	{
-		Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
-		CreateOverwriteContrecordRecord(abortedRecPtr);
-		abortedRecPtr = InvalidXLogRecPtr;
-		missingContrecPtr = InvalidXLogRecPtr;
-	}
-
 	/*
-	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
-	 * record before resource manager writes cleanup WAL records or checkpoint
-	 * record is written.
+	 * Update full_page_writes in shared memory; later, once WAL writes are
+	 * permitted, write an XLOG_FPW_CHANGE record before resource managers
+	 * write cleanup WAL records or the checkpoint record is written.
 	 */
 	Insert->fullPageWrites = lastFullPageWrites;
-	UpdateFullPageWrites();
-	LocalXLogInsertAllowed = -1;
 
-	/*
-	 * Emit checkpoint or end-of-recovery record in XLOG, if the server has been
-	 * through the archive or the crash recovery.
-	 *
-	 * If the recovery is performed lastReplayedEndRecPtr will always be a valid
-	 * record pointer that never changes after REDO loop.
-	 */
-	if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr))
-		promoted = PerformRecoveryXLogAction();
-
-	/* If this is archive recovery, perform post-recovery cleanup actions. */
-	if (ArchiveRecoveryRequested)
-		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
-
-	/*
-	 * If any of the critical GUCs have changed, log them before we allow
-	 * backends to write WAL.
-	 */
-	LocalSetXLogInsertAllowed();
-	XLogReportParameters();
-
-	/*
-	 * Local WAL inserts enabled, so it's time to finish initialization of
-	 * commit timestamp.
-	 */
-	CompleteCommitTsInitialization();
+	/* Prepare to accept WAL writes. */
+	promoted = XLogAcceptWrites(EndOfLogTLI, EndOfLog);
 
 	/*
 	 * All done with end-of-recovery actions.
@@ -8163,6 +8127,59 @@ StartupXLOG(void)
 		RequestCheckpoint(CHECKPOINT_FORCE);
 }
 
+/*
+ * Prepare to accept WAL writes.
+ */
+static bool
+XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+{
+	bool		promoted = false;
+
+	LocalSetXLogInsertAllowed();
+
+	/* If necessary, write overwrite-contrecord before doing anything else */
+	if (!XLogRecPtrIsInvalid(abortedRecPtr))
+	{
+		Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
+		CreateOverwriteContrecordRecord(abortedRecPtr);
+		abortedRecPtr = InvalidXLogRecPtr;
+		missingContrecPtr = InvalidXLogRecPtr;
+	}
+
+	/* Write an XLOG_FPW_CHANGE record */
+	UpdateFullPageWrites();
+	LocalXLogInsertAllowed = -1;
+
+	/*
+	 * Emit checkpoint or end-of-recovery record in XLOG, if the server has been
+	 * through the archive or the crash recovery.
+	 *
+	 * If the recovery is performed lastReplayedEndRecPtr will always be a valid
+	 * record pointer that never changes after REDO loop.
+	 */
+	if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr))
+		promoted = PerformRecoveryXLogAction();
+
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
+	if (ArchiveRecoveryRequested)
+		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
+
+	/*
+	 * If any of the critical GUCs have changed, log them before we allow
+	 * backends to write WAL.
+	 */
+	LocalSetXLogInsertAllowed();
+	XLogReportParameters();
+
+	/*
+	 * Local WAL inserts enabled, so it's time to finish initialization of
+	 * commit timestamp.
+	 */
+	CompleteCommitTsInitialization();
+
+	return promoted;
+}
+
 /*
  * Checks if recovery has reached a consistent state. When consistency is
  * reached and we have a valid starting standby snapshot, tell postmaster
-- 
2.18.0
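XLogAcceptWrites() brackets its WAL-writing steps with LocalSetXLogInsertAllowed() and `LocalXLogInsertAllowed = -1`. A simplified model of that tri-state flag as it works in xlog.c: 1 means this process may insert WAL unconditionally, 0 means it may not, and -1 means defer to RecoveryInProgress(). The recovery_in_progress variable and the *Model function names are stand-ins introduced for this sketch; the real LocalSetXLogInsertAllowed() also does some extra setup that is omitted here.

```c
#include <stdbool.h>

/* Stand-in for RecoveryInProgress(); hypothetical for this sketch. */
static bool recovery_in_progress = true;

/* 1 = allowed, 0 = not allowed, -1 = check recovery_in_progress. */
static int LocalXLogInsertAllowed = -1;

static bool
XLogInsertAllowedModel(void)
{
    /* An explicit 1 or 0 wins without consulting the recovery state. */
    if (LocalXLogInsertAllowed >= 0)
        return (bool) LocalXLogInsertAllowed;

    if (recovery_in_progress)
        return false;

    /* Recovery is over: cache the answer so later calls stop checking. */
    LocalXLogInsertAllowed = 1;
    return true;
}

/* Let this process insert WAL even though recovery is not yet finished. */
static void
LocalSetXLogInsertAllowedModel(void)
{
    LocalXLogInsertAllowed = 1;
}
```

Resetting the flag to -1 after UpdateFullPageWrites() therefore returns the process to "not allowed while still in recovery" until the next explicit LocalSetXLogInsertAllowed().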
From 19ac27a62187753eaef168785b6222bb9497de26 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Jul 2021 13:07:56 -0400
Subject: [PATCH v38 1/4] Refactor some end-of-recovery code out of StartupXLOG().

Move the code that decides whether to write a checkpoint or an end-of-recovery record into PerformRecoveryXLogAction().

Also create a new function CleanupAfterArchiveRecovery() to perform a few tasks that we want to do after we've actually exited archive recovery but before we start accepting new WAL writes. This is straightforward code movement to make StartupXLOG() a little bit shorter and a little bit easier to understand.

Robert Haas, with modifications by Amul Sul.
---
 src/backend/access/transam/xlog.c | 261 ++++++++++++++++--------------
 1 file changed, 143 insertions(+), 118 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 26dcc00ac01..44e5a0610ef 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -889,6 +889,8 @@ static MemoryContext walDebugCxt = NULL;
 static void readRecoverySignalFile(void);
 static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
+static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
+										XLogRecPtr EndOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static char *getRecoveryStopReason(void);
@@ -937,6 +939,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
+static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
@@ -5731,6 +5734,88 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 			(errmsg("archive recovery complete")));
 }
 
+/*
+ * Perform cleanup actions at the conclusion of archive recovery.
+ */
+static void
+CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+{
+	/*
+	 * Execute the recovery_end_command, if any.
+	 */
+	if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
+		ExecuteRecoveryCommand(recoveryEndCommand,
+							   "recovery_end_command",
+							   true);
+
+	/*
+	 * We switched to a new timeline. Clean up segments on the old timeline.
+	 *
+	 * If there are any higher-numbered segments on the old timeline, remove
+	 * them. They might contain valid WAL, but they might also be pre-allocated
+	 * files containing garbage. In any case, they are not part of the new
+	 * timeline's history so we don't need them.
+	 */
+	RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
+
+	/*
+	 * If the switch happened in the middle of a segment, what to do with the
+	 * last, partial segment on the old timeline? If we don't archive it, and
+	 * the server that created the WAL never archives it either (e.g. because it
+	 * was hit by a meteor), it will never make it to the archive. That's OK
+	 * from our point of view, because the new segment that we created with the
+	 * new TLI contains all the WAL from the old timeline up to the switch
+	 * point. But if you later try to do PITR to the "missing" WAL on the old
+	 * timeline, recovery won't find it in the archive. It's physically present
+	 * in the new file with new TLI, but recovery won't look there when it's
+	 * recovering to the older timeline. On the other hand, if we archive the
+	 * partial segment, and the original server on that timeline is still
+	 * running and archives the completed version of the same segment later, it
+	 * will fail. (We used to do that in 9.4 and below, and it caused such
+	 * problems).
+	 *
+	 * As a compromise, we rename the last segment with the .partial suffix, and
+	 * archive it. Archive recovery will never try to read .partial segments, so
+	 * they will normally go unused. But in the odd PITR case, the administrator
+	 * can copy them manually to the pg_wal directory (removing the suffix).
+	 * They can be useful in debugging, too.
+	 *
+	 * If a .done or .ready file already exists for the old timeline, however,
+	 * we had already determined that the segment is complete, so we can let it
+	 * be archived normally. (In particular, if it was restored from the archive
+	 * to begin with, it's expected to have a .done file).
+	 */
+	if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
+		XLogArchivingActive())
+	{
+		char		origfname[MAXFNAMELEN];
+		XLogSegNo	endLogSegNo;
+
+		XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
+		XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
+
+		if (!XLogArchiveIsReadyOrDone(origfname))
+		{
+			char		origpath[MAXPGPATH];
+			char		partialfname[MAXFNAMELEN];
+			char		partialpath[MAXPGPATH];
+
+			XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
+			snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
+			snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
+
+			/*
+			 * Make sure there's no .done or .ready file for the .partial
+			 * file.
+			 */
+			XLogArchiveCleanup(partialfname);
+
+			durable_rename(origpath, partialpath, ERROR);
+			XLogArchiveNotify(partialfname);
+		}
+	}
+}
+
 /*
  * Extract timestamp from WAL record.
  *
@@ -7953,127 +8038,13 @@ StartupXLOG(void)
 	UpdateFullPageWrites();
 	LocalXLogInsertAllowed = -1;
 
+	/* Emit checkpoint or end-of-recovery record in XLOG, if required. */
 	if (InRecovery)
-	{
-		/*
-		 * Perform a checkpoint to update all our recovery activity to disk.
-		 *
-		 * Note that we write a shutdown checkpoint rather than an on-line
-		 * one. This is not particularly critical, but since we may be
-		 * assigning a new TLI, using a shutdown checkpoint allows us to have
-		 * the rule that TLI only changes in shutdown checkpoints, which
-		 * allows some extra error checking in xlog_redo.
-		 *
-		 * In promotion, only create a lightweight end-of-recovery record
-		 * instead of a full checkpoint. A checkpoint is requested later,
-		 * after we're fully out of recovery mode and already accepting
-		 * queries.
-		 */
-		if (ArchiveRecoveryRequested && IsUnderPostmaster &&
-			LocalPromoteIsTriggered)
-		{
-			promoted = true;
-
-			/*
-			 * Insert a special WAL record to mark the end of recovery, since
-			 * we aren't doing a checkpoint. That means that the checkpointer
-			 * process may likely be in the middle of a time-smoothed
-			 * restartpoint and could continue to be for minutes after this.
-			 * That sounds strange, but the effect is roughly the same and it
-			 * would be stranger to try to come out of the restartpoint and
-			 * then checkpoint. We request a checkpoint later anyway, just for
-			 * safety.
-			 */
-			CreateEndOfRecoveryRecord();
-		}
-		else
-		{
-			RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-							  CHECKPOINT_IMMEDIATE |
-							  CHECKPOINT_WAIT);
-		}
-	}
+		promoted = PerformRecoveryXLogAction();
 
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
 	if (ArchiveRecoveryRequested)
-	{
-		/*
-		 * And finally, execute the recovery_end_command, if any.
-		 */
-		if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
-			ExecuteRecoveryCommand(recoveryEndCommand,
-								   "recovery_end_command",
-								   true);
-
-		/*
-		 * We switched to a new timeline. Clean up segments on the old
-		 * timeline.
-		 *
-		 * If there are any higher-numbered segments on the old timeline,
-		 * remove them. They might contain valid WAL, but they might also be
-		 * pre-allocated files containing garbage. In any case, they are not
-		 * part of the new timeline's history so we don't need them.
-		 */
-		RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
-
-		/*
-		 * If the switch happened in the middle of a segment, what to do with
-		 * the last, partial segment on the old timeline? If we don't archive
-		 * it, and the server that created the WAL never archives it either
-		 * (e.g. because it was hit by a meteor), it will never make it to the
-		 * archive. That's OK from our point of view, because the new segment
-		 * that we created with the new TLI contains all the WAL from the old
-		 * timeline up to the switch point. But if you later try to do PITR to
-		 * the "missing" WAL on the old timeline, recovery won't find it in
-		 * the archive. It's physically present in the new file with new TLI,
-		 * but recovery won't look there when it's recovering to the older
-		 * timeline. On the other hand, if we archive the partial segment, and
-		 * the original server on that timeline is still running and archives
-		 * the completed version of the same segment later, it will fail. (We
-		 * used to do that in 9.4 and below, and it caused such problems).
-		 *
-		 * As a compromise, we rename the last segment with the .partial
-		 * suffix, and archive it. Archive recovery will never try to read
-		 * .partial segments, so they will normally go unused. But in the odd
-		 * PITR case, the administrator can copy them manually to the pg_wal
-		 * directory (removing the suffix). They can be useful in debugging,
-		 * too.
-		 *
-		 * If a .done or .ready file already exists for the old timeline,
-		 * however, we had already determined that the segment is complete, so
-		 * we can let it be archived normally. (In particular, if it was
-		 * restored from the archive to begin with, it's expected to have a
-		 * .done file).
-		 */
-		if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
-			XLogArchivingActive())
-		{
-			char		origfname[MAXFNAMELEN];
-			XLogSegNo	endLogSegNo;
-
-			XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
-			XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
-
-			if (!XLogArchiveIsReadyOrDone(origfname))
-			{
-				char		origpath[MAXPGPATH];
-				char		partialfname[MAXFNAMELEN];
-				char		partialpath[MAXPGPATH];
-
-				XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
-				snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
-				snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
-
-				/*
-				 * Make sure there's no .done or .ready file for the .partial
-				 * file.
-				 */
-				XLogArchiveCleanup(partialfname);
-
-				durable_rename(origpath, partialpath, ERROR);
-				XLogArchiveNotify(partialfname);
-			}
-		}
-	}
+		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
 
 	/*
 	 * Preallocate additional log files, if wanted.
@@ -8282,6 +8253,60 @@ CheckRecoveryConsistency(void)
 	}
 }
 
+/*
+ * Perform whatever XLOG actions are necessary at end of REDO.
+ *
+ * The goal here is to make sure that we'll be able to recover properly if
+ * we crash again. If we choose to write a checkpoint, we'll write a shutdown
+ * checkpoint rather than an on-line one. This is not particularly critical,
+ * but since we may be assigning a new TLI, using a shutdown checkpoint allows
+ * us to have the rule that TLI only changes in shutdown checkpoints, which
+ * allows some extra error checking in xlog_redo.
+ */
+static bool
+PerformRecoveryXLogAction(void)
+{
+	bool		promoted = false;
+
+	/*
+	 * Perform a checkpoint to update all our recovery activity to disk.
+	 *
+	 * Note that we write a shutdown checkpoint rather than an on-line one. This
+	 * is not particularly critical, but since we may be assigning a new TLI,
+	 * using a shutdown checkpoint allows us to have the rule that TLI only
+	 * changes in shutdown checkpoints, which allows some extra error checking
+	 * in xlog_redo.
+	 *
+	 * In promotion, only create a lightweight end-of-recovery record instead of
+	 * a full checkpoint. A checkpoint is requested later, after we're fully out
+	 * of recovery mode and already accepting queries.
+	 */
+	if (ArchiveRecoveryRequested && IsUnderPostmaster &&
+		LocalPromoteIsTriggered)
+	{
+		promoted = true;
+
+		/*
+		 * Insert a special WAL record to mark the end of recovery, since we
+		 * aren't doing a checkpoint. That means that the checkpointer process
+		 * may likely be in the middle of a time-smoothed restartpoint and could
+		 * continue to be for minutes after this. That sounds strange, but the
+		 * effect is roughly the same and it would be stranger to try to come
+		 * out of the restartpoint and then checkpoint. We request a checkpoint
+		 * later anyway, just for safety.
+		 */
+		CreateEndOfRecoveryRecord();
+	}
+	else
+	{
+		RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
+						  CHECKPOINT_IMMEDIATE |
+						  CHECKPOINT_WAIT);
+	}
+
+	return promoted;
+}
+
 /*
  * Is the system still in recovery?
  *
-- 
2.18.0
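CleanupAfterArchiveRecovery() renames the last segment on the old timeline to `.partial` only when EndOfLog falls in the middle of a segment. A self-contained sketch of that naming arithmetic, assuming it mirrors the XLogSegmentOffset(), XLByteToPrevSeg() and XLogFileName() macros (the file name is the TLI followed by the segment number split into "log" and "seg" parts, eight hex digits each). The helper names are hypothetical, and the archive-status checks and durable_rename() call are left out.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;
typedef uint32_t TimeLineID;

#define MAXFNAMELEN 64

/* Byte offset of ptr within its segment, like XLogSegmentOffset(). */
static uint32_t
segment_offset(XLogRecPtr ptr, uint32_t seg_size)
{
    return (uint32_t) (ptr & (seg_size - 1));
}

/* Segment holding the byte just before ptr, like XLByteToPrevSeg(). */
static XLogSegNo
prev_seg(XLogRecPtr ptr, uint32_t seg_size)
{
    return (ptr - 1) / seg_size;
}

/* WAL file name "TTTTTTTTLLLLLLLLSSSSSSSS", like XLogFileName(). */
static void
wal_file_name(char *buf, TimeLineID tli, XLogSegNo segno, uint32_t seg_size)
{
    uint64_t    segs_per_id = UINT64_C(0x100000000) / seg_size;

    snprintf(buf, MAXFNAMELEN, "%08X%08X%08X", (unsigned int) tli,
             (unsigned int) (segno / segs_per_id),
             (unsigned int) (segno % segs_per_id));
}

/* Fill buf and return 1 if EndOfLog is mid-segment; otherwise return 0. */
static int
partial_segment_name(char *buf, TimeLineID tli, XLogRecPtr end_of_log,
                     uint32_t seg_size)
{
    char        origfname[MAXFNAMELEN];

    if (segment_offset(end_of_log, seg_size) == 0)
        return 0;               /* switch happened on a segment boundary */
    wal_file_name(origfname, tli, prev_seg(end_of_log, seg_size), seg_size);
    snprintf(buf, MAXFNAMELEN, "%s.partial", origfname);
    return 1;
}
```

For example, with 16MB segments, timeline 1 and EndOfLog 0/201D4D8, this yields `000000010000000000000002.partial`.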
From b0d1790e5aa95a02217efb4e635398d3086a7493 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Jul 2021 14:27:51 -0400
Subject: [PATCH v38 2/4] Postpone some end-of-recovery operations relating to allowing WAL.

The previous patch moved the code that decides whether to write a checkpoint or an end-of-recovery record into PerformRecoveryXLogAction(), and the post-archive-recovery code into CleanupAfterArchiveRecovery(), but still called both functions from the same place. Now postpone that work until after we clear InRecovery and shut down the XLogReader. Since InRecovery has been cleared by then, we find out whether recovery was performed by checking XLogCtl->lastReplayedEndRecPtr, which is set only when REDO is performed.

This is preparatory work for a future patch that wants to allow recovery to end at one time and only later start to allow WAL writes. The steps that themselves write WAL clearly shouldn't happen before we're ready to accept WAL writes, and it seems best for now to keep the steps performed by CleanupAfterArchiveRecovery() at the same point relative to the surrounding steps. We assume (hopefully correctly) that the user doesn't want recovery_end_command to run until we're committed to writing WAL on the new timeline. Until then, the machine is still usable as a standby on the old timeline.

Aside from the value of this patch as preparatory work, this order of operations actually seems more logical, since it means we don't actually write any WAL until after exiting recovery.

Robert Haas, with modifications by Amul Sul.
---
 src/backend/access/transam/xlog.c | 62 +++++++++++++++++--------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 44e5a0610ef..6612b81e4b9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8018,34 +8018,6 @@ StartupXLOG(void)
 	XLogCtl->LogwrtRqst.Write = EndOfLog;
 	XLogCtl->LogwrtRqst.Flush = EndOfLog;
 
-	LocalSetXLogInsertAllowed();
-
-	/* If necessary, write overwrite-contrecord before doing anything else */
-	if (!XLogRecPtrIsInvalid(abortedRecPtr))
-	{
-		Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
-		CreateOverwriteContrecordRecord(abortedRecPtr);
-		abortedRecPtr = InvalidXLogRecPtr;
-		missingContrecPtr = InvalidXLogRecPtr;
-	}
-
-	/*
-	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
-	 * record before resource manager writes cleanup WAL records or checkpoint
-	 * record is written.
-	 */
-	Insert->fullPageWrites = lastFullPageWrites;
-	UpdateFullPageWrites();
-	LocalXLogInsertAllowed = -1;
-
-	/* Emit checkpoint or end-of-recovery record in XLOG, if required. */
-	if (InRecovery)
-		promoted = PerformRecoveryXLogAction();
-
-	/* If this is archive recovery, perform post-recovery cleanup actions. */
-	if (ArchiveRecoveryRequested)
-		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
-
 	/*
 	 * Preallocate additional log files, if wanted.
 	 */
@@ -8090,6 +8062,40 @@ StartupXLOG(void)
 	}
 	XLogReaderFree(xlogreader);
 
+	LocalSetXLogInsertAllowed();
+
+	/* If necessary, write overwrite-contrecord before doing anything else */
+	if (!XLogRecPtrIsInvalid(abortedRecPtr))
+	{
+		Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
+		CreateOverwriteContrecordRecord(abortedRecPtr);
+		abortedRecPtr = InvalidXLogRecPtr;
+		missingContrecPtr = InvalidXLogRecPtr;
+	}
+
+	/*
+	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
+	 * record before resource manager writes cleanup WAL records or checkpoint
+	 * record is written.
+	 */
+	Insert->fullPageWrites = lastFullPageWrites;
+	UpdateFullPageWrites();
+	LocalXLogInsertAllowed = -1;
+
+	/*
+	 * Emit checkpoint or end-of-recovery record in XLOG, if the server has been
+	 * through the archive or the crash recovery.
+	 *
+	 * If the recovery is performed lastReplayedEndRecPtr will always be a valid
+	 * record pointer that never changes after REDO loop.
+	 */
+	if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr))
+		promoted = PerformRecoveryXLogAction();
+
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
+	if (ArchiveRecoveryRequested)
+		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
+
 	/*
 	 * If any of the critical GUCs have changed, log them before we allow
 	 * backends to write WAL.
-- 
2.18.0
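Because InRecovery has already been cleared when this code runs, patch 2/4 uses a valid XLogCtl->lastReplayedEndRecPtr as the sign that recovery happened, and PerformRecoveryXLogAction() then picks between an end-of-recovery record and a full checkpoint. A minimal sketch of that decision as a pure function; the RecoveryInputs struct, the EndOfRecoveryAction enum and choose_end_of_recovery_action() are hypothetical, bundling inputs that the real code reads from globals and shared memory.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef enum EndOfRecoveryAction
{
    EOR_NONE,                   /* no recovery performed: nothing to emit */
    EOR_END_OF_RECOVERY_RECORD, /* promotion: CreateEndOfRecoveryRecord() */
    EOR_CHECKPOINT              /* RequestCheckpoint(END_OF_RECOVERY | ...) */
} EndOfRecoveryAction;

/* Hypothetical bundle of the inputs the real code reads from globals. */
typedef struct RecoveryInputs
{
    XLogRecPtr  lastReplayedEndRecPtr;
    bool        archiveRecoveryRequested;
    bool        isUnderPostmaster;
    bool        promoteIsTriggered;
} RecoveryInputs;

static EndOfRecoveryAction
choose_end_of_recovery_action(const RecoveryInputs *in)
{
    /* Per the patch, this is valid only if recovery (REDO) was performed. */
    if (in->lastReplayedEndRecPtr == InvalidXLogRecPtr)
        return EOR_NONE;

    /* Promotion: a lightweight record now, a checkpoint requested later. */
    if (in->archiveRecoveryRequested && in->isUnderPostmaster &&
        in->promoteIsTriggered)
        return EOR_END_OF_RECOVERY_RECORD;

    return EOR_CHECKPOINT;
}
```

Keeping the decision free of side effects like this makes it easy to see that only the promotion path returns `promoted = true`.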