At Mon, 20 Jun 2022 11:57:20 -0400, Robert Haas <[email protected]> wrote
in
> It seems to me that what we want to do is: if we're about to start
> allowing WAL writes, then consider whether to insert an aborted
> contrecord record. Now, if we are about to start allowing WAL write,
> we must determine the LSN at which the first such record will be
> written. Then there are two possibilities: either that LSN is on an
> existing record boundary, or it isn't. In the former case, no aborted
> contrecord record is required. In the latter case, we're writing at
> LSN which would have been in the middle of a previous record, except
> that the record in question was never finished or at least we don't
> have all of it. So the first WAL we insert at our chosen starting LSN
> must be an aborted contrecord record.
Right.
> I'm not quite sure I understand the nature of the remaining bug here,
> but this logic seems quite sketchy to me:
>
> /*
> * When not in standby mode we find that WAL ends in an incomplete
> * record, keep track of that record. After recovery is done,
> * we'll write a record to indicate to downstream WAL readers that
> * that portion is to be ignored.
> */
> if (!StandbyMode &&
> !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
> {
> abortedRecPtr = xlogreader->abortedRecPtr;
> missingContrecPtr = xlogreader->missingContrecPtr;
> }
>
> I don't see what StandbyMode has to do with anything here. The
> question is whether the place where we're going to begin writing WAL
> is on an existing record boundary or not. If it so happens that when
> StandbyMode is turned on we can never decide to promote in the middle
> of a record, then maybe there's no need to track this information when
> StandbyMode = true, but I don't see a good reason why we should make
> that assumption. What if in the future somebody added a command that
Right. I came to the same conclusion before reading this section. It
is rearer than other cases but surely possible.
> says "don't keep trying to read more WAL, just promote RIGHT HERE?". I
> think this logic would surely be incorrect in that case. It feels to
> me like the right thing to do is to always keep track if WAL ends in
> an incomplete record, and then when we promote, we write an aborted
> contrecord record if WAL ended in an incomplete record, full stop.
Agreed. Actually, with the second patch applied, removing !StandbyMode
from the condition makes no difference in behavior.
> Now, we do need to keep in mind that, while in StandbyMode, we might
> reach the end of WAL more than once, because we have a polling loop
> that looks for more WAL. So it does not work to just set the values
> once and then assume that's the whole truth forever. But why not
> handle that by storing the abortedRecPtr and missingContrecPtr
> unconditionally after every call to XLogPrefetcherReadRecord()? They
> will go back and forth between XLogRecPtrIsInvalid and other values
> many times but the last value should -- I think -- be however things
> ended up at the point where we decided to stop reading WAL.
Unfortunately it doesn't work because we read a record already known
to be complete again at the end of recovery. It is the reason of
"abortedRecPtr < xlogreader->EndRecPtr" in my PoC patch. Without it,
abrotedRecPtr is erased when it is actually needed. I don't like that
expedient-looking condition, but the strict condition for resetting
abortedRecPtr is iff "we have read a complete record at the same or
grater LSN ofabortedRecPtr"...
Come to think of this, I noticed that we can get rid of the
file-global abortedRecPtr by letting FinishWalRecovery copy
abortedRecPtr *before* reading the last record. This also allows us to
remove the "expedient" condition.
> I haven't really dived into this issue much so I might be totally off
> base, but it just doesn't feel right to me that this should depend on
> whether we're in standby mode. That seems at best incidentally related
> to whether an aborted contrecord record is required.
>
> P.S. "aborted contrecord record" is really quite an awkward turn of phrase!
Thats true! Always open for better phrasings:(
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 6eba626420..c1be15039d 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -360,15 +360,6 @@ typedef struct XLogRecoveryCtlData
static XLogRecoveryCtlData *XLogRecoveryCtl = NULL;
-/*
- * abortedRecPtr is the start pointer of a broken record at end of WAL when
- * recovery completes; missingContrecPtr is the location of the first
- * contrecord that went missing. See CreateOverwriteContrecordRecord for
- * details.
- */
-static XLogRecPtr abortedRecPtr;
-static XLogRecPtr missingContrecPtr;
-
/*
* if recoveryStopsBefore/After returns true, it saves information of the stop
* point here
@@ -944,12 +935,6 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
minRecoveryPointTLI = 0;
}
- /*
- * Start recovery assuming that the final record isn't lost.
- */
- abortedRecPtr = InvalidXLogRecPtr;
- missingContrecPtr = InvalidXLogRecPtr;
-
*wasShutdown_ptr = wasShutdown;
*haveBackupLabel_ptr = haveBackupLabel;
*haveTblspcMap_ptr = haveTblspcMap;
@@ -1430,6 +1415,14 @@ FinishWalRecovery(void)
lastRec = XLogRecoveryCtl->lastReplayedReadRecPtr;
lastRecTLI = XLogRecoveryCtl->lastReplayedTLI;
}
+
+ /*
+ * record aborted contrecord status before the next ReadRecord erases the
+ * state
+ */
+ result->abortedRecPtr = xlogreader->abortedRecPtr;
+ result->missingContrecPtr = xlogreader->missingContrecPtr;
+
XLogPrefetcherBeginRead(xlogprefetcher, lastRec);
(void) ReadRecord(xlogprefetcher, PANIC, false, lastRecTLI);
endOfLog = xlogreader->EndRecPtr;
@@ -1503,9 +1496,6 @@ FinishWalRecovery(void)
result->lastRecTLI = lastRecTLI;
result->endOfLog = endOfLog;
- result->abortedRecPtr = abortedRecPtr;
- result->missingContrecPtr = missingContrecPtr;
-
result->standby_signal_file_found = standby_signal_file_found;
result->recovery_signal_file_found = recovery_signal_file_found;
@@ -1970,10 +1960,6 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
LSN_FORMAT_ARGS(record->overwrittenRecPtr));
- /* We have safely skipped the aborted record */
- abortedRecPtr = InvalidXLogRecPtr;
- missingContrecPtr = InvalidXLogRecPtr;
-
ereport(LOG,
(errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s",
LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
@@ -2971,19 +2957,6 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
record = XLogPrefetcherReadRecord(xlogprefetcher, &errormsg);
if (record == NULL)
{
- /*
- * When not in standby mode we find that WAL ends in an incomplete
- * record, keep track of that record. After recovery is done,
- * we'll write a record to indicate to downstream WAL readers that
- * that portion is to be ignored.
- */
- if (!StandbyMode &&
- !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
- {
- abortedRecPtr = xlogreader->abortedRecPtr;
- missingContrecPtr = xlogreader->missingContrecPtr;
- }
-
if (readFile >= 0)
{
close(readFile);