Re: [Patch] ALTER SYSTEM READ ONLY

Robert Haas Fri, 12 Nov 2021 12:48:53 -0800

On Mon, Nov 8, 2021 at 8:20 AM Amul Sul <sula...@gmail.com> wrote:
> Attached is the rebased version of refactoring as well as the
> pg_prohibit_wal feature patches for the latest master head (commit #
> 39a3105678a).


I spent a lot of time today studying 0002, and specifically the
question of whether EndOfLog must be the same as
XLogCtl->replayEndRecPtr and whether EndOfLogTLI must be the same as
XLogCtl->replayEndTLI.

The answer to the former question is "no" because, if we don't enter
redo, XLogCtl->replayEndRecPtr won't be initialized at all. If we do
enter redo, then I think it has to be the same unless something very
weird happens. EndOfLog gets set like this:

    XLogBeginRead(xlogreader, LastRec);
    record = ReadRecord(xlogreader, PANIC, false, replayTLI);
    EndOfLog = EndRecPtr;

In every case that exists in our regression tests, EndRecPtr is the
same before these three lines of code as it is afterward. However, if
you test with recovery_target=immediate, you can get it to be
different, because in that case we drop out of the redo loop after
calling recoveryStopsBefore() rather than after calling
recoveryStopsAfter(). Similarly I'm fairly sure that if you use
recovery_target_inclusive=off you can likewise get it to be different
(though I discovered the hard way that recovery_target_inclusive=off
is ignored when you use recovery_target_name). It seems like a really
bad thing that neither recovery_target=immediate nor
recovery_target_inclusive=off have any tests, and I think we ought to
add some.

Anyway, in effect, these three lines of code have the effect of
backing up the xlogreader by one record when we stop before rather
than after a record that we're replaying. What that means is that
EndOfLog is going to be the end+1 of the last record that we actually
replayed. There might be one more record that we read but did not
replay, and that record won't impact the value we end up with in
EndOfLog. Now, XLogCtl->replayEndRecPtr is also that end+1 of the last
record that we actually replayed. To put that another way, there's no
way to exit the main redo loop after we set XLogCtl->replayEndRecPtr
and before we change LastRec. So in the cases where
XLogCtl->replayEndRecPtr gets initialized at all, it can only be
different from EndOfLog if something different happens when we re-read
the last-replayed WAL record than what happened when we read it the
first time. That seems unlikely, and would be unfortunate it if it did
happen. I am inclined to think that it might be better not to reread
the record at all, though. As far as this patch goes, I think we need
a solution that doesn't involve fetching EndOfLog from a variable
that's only sometimes initialized and then not doing anything with it
except in the cases where it was initialized.

As for EndOfLogTLI, I'm afraid I don't think that's the same thing as
XLogCtl->replayEndTLI. Now, it's hard to be sure, because I don't
think the regression tests contain any scenarios where we run recovery
and the values end up different. However, I think that the code sets
EndOfLogTLI to the TLI of the last WAL file that we read, and I think
XLogCtl->replayEndTLI gets set to the timeline from which that WAL
record originated. So imagine that we are looking for WAL that ought
to be in 000000010000000000000003 but we don't find it; instead we
find 000000020000000000000003 because our recovery target timeline is
2, or something that has 2 in its history. We will read the WAL for
timeline 1 from this file which has timeline 2 in the file name. I
think if recovery ends in this file before the timeline switch, these
values will be different. I did not try to construct a test case for
this today due to not having enough time, so it's possible that I'm
wrong about this, but that's how it looks to me from the code.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: [Patch] ALTER SYSTEM READ ONLY

Reply via email to