Re: Various bugs if segment containing redo pointer does not exist

2023-10-24 Thread Robert Haas
On Mon, Oct 23, 2023 at 8:43 PM Andres Freund  wrote:
> The source of the emode=13=DEBUG2 is that that's hardcoded in
> WaitForWALToBecomeAvailable(). I guess the error ought to come from
> XLogPageRead(), but all that happens is this:
>
> case XLREAD_FAIL:
>     if (readFile >= 0)
>         close(readFile);
>     readFile = -1;
>     readLen = 0;
>     readSource = XLOG_FROM_ANY;
>     return XLREAD_FAIL;

I've been under the impression that the guiding principle here is that
we shouldn't error out upon hitting a condition that should only cause
us to switch sources. I think WaitForWALToBecomeAvailable() is
supposed to set things up so that XLogPageRead()'s call to pg_pread()
will succeed. If it says it can't, then XLogPageRead() is only obliged
to pass that information up to the caller, who can decide to wait
longer for the data to show up, or give up, or whatever it wants to
do. On the other hand, if WaitForWALToBecomeAvailable() says that it's
fine to go ahead and call pg_pread() and pg_pread() then fails, then
that means that we've got a problem with the WAL file other than it
just not being available yet, like it's the wrong length or there was
an I/O error, and those are reportable errors. Said differently, in
the former case, the WAL is not broken, merely not currently
available; in the latter case, it's broken.
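To make that division of labor concrete, here is a rough sketch of the
relevant shape of XLogPageRead() (paraphrased from memory, so the details
are approximate, not the verbatim source):

/* Paraphrased sketch, not the actual code: */
if (readFile < 0)   /* no usable segment at hand yet */
{
    switch (WaitForWALToBecomeAvailable(...))
    {
        case XLREAD_FAIL:
            /* "Not available (yet)" is not a reportable error here;
             * the caller decides whether to retry or give up. */
            return XLREAD_FAIL;
        case XLREAD_SUCCESS:
            break;
    }
}

/* By now the bytes were promised to exist, so a short read or an I/O
 * error means the WAL itself is damaged, and *is* reportable. */
if (pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff) != XLOG_BLCKSZ)
    ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
            (errcode_for_file_access(),
             errmsg("could not read from WAL segment ...")));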

The legibility and maintainability of this code are certainly not
great. The xlogreader mechanism is extremely useful, but maybe we
should have done more cleanup of the underlying mechanism first. It's
a giant ball of spaghetti code that is challenging to understand and
almost impossible to validate thoroughly (as you just discovered).

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Various bugs if segment containing redo pointer does not exist

2023-10-23 Thread Andres Freund
Hi,

On 2023-10-23 16:21:45 -0700, Andres Freund wrote:
> 1) For some reason I haven't yet debugged, the ReadRecord(PANIC) in
>PerformWalRecovery() doesn't PANIC, but instead just returns NULL
>
>We *do* output a DEBUG message, but well, that's insufficient.

The debug is from this backtrace:

#0  XLogFileReadAnyTLI (segno=6476, emode=13, source=XLOG_FROM_PG_WAL)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:4291
#1  0x55d7b3949db0 in WaitForWALToBecomeAvailable (RecPtr=108649259008, randAccess=true, fetching_ckpt=false, tliRecPtr=108665421680, replayTLI=1, replayLSN=108665421680, nonblocking=false)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3697
#2  0x55d7b39494ff in XLogPageRead (xlogreader=0x55d7b472c470, targetPagePtr=108649250816, reqLen=8192, targetRecPtr=108665421680, readBuf=0x55d7b47ba5d8 "\024\321\005")
    at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3278
#3  0x55d7b3941bb1 in ReadPageInternal (state=0x55d7b472c470, pageptr=108665413632, reqLen=8072)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:1014
#4  0x55d7b3940f43 in XLogDecodeNextRecord (state=0x55d7b472c470, nonblocking=false)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:571
#5  0x55d7b3941a41 in XLogReadAhead (state=0x55d7b472c470, nonblocking=false)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:947
#6  0x55d7b393f5fa in XLogPrefetcherNextBlock (pgsr_private=94384934340072, lsn=0x55d7b47cfeb8)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:496
#7  0x55d7b393efcd in lrq_prefetch (lrq=0x55d7b47cfe88)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:256
#8  0x55d7b393f190 in lrq_complete_lsn (lrq=0x55d7b47cfe88, lsn=0)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:294
#9  0x55d7b39401ba in XLogPrefetcherReadRecord (prefetcher=0x55d7b47bc5e8, errmsg=0x7ffc23505920)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:1041
#10 0x55d7b3948ff8 in ReadRecord (xlogprefetcher=0x55d7b47bc5e8, emode=23, fetching_ckpt=false, replayTLI=1)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3078
#11 0x55d7b3946749 in PerformWalRecovery ()
    at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:1640


The source of the emode=13=DEBUG2 is that that's hardcoded in
WaitForWALToBecomeAvailable(). I guess the error ought to come from
XLogPageRead(), but all that happens is this:

case XLREAD_FAIL:
    if (readFile >= 0)
        close(readFile);
    readFile = -1;
    readLen = 0;
    readSource = XLOG_FROM_ANY;
    return XLREAD_FAIL;

which *does* error out for some other failures:
errno = save_errno;
ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
        (errcode_for_file_access(),
         errmsg("could not read from WAL segment %s, LSN %X/%X, offset %u: %m",
                fname, LSN_FORMAT_ARGS(targetPagePtr), readOff)));

but not for a file that couldn't be opened. Which wouldn't have to be due to
ENOENT; it could also be EACCES...
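Just to make it concrete, reporting in that branch might look roughly like
the following (whether XLogPageRead() is even the right layer, and the
message wording, are just assumptions; errno from the failed open would
also have to be preserved for %m to mean anything):

case XLREAD_FAIL:
    if (readFile >= 0)
        close(readFile);
    readFile = -1;
    readLen = 0;
    readSource = XLOG_FROM_ANY;
    /* hypothetical: surface the failure instead of staying silent */
    ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
            (errcode_for_file_access(),
             errmsg("could not open WAL segment containing LSN %X/%X: %m",
                    LSN_FORMAT_ARGS(targetPagePtr))));
    return XLREAD_FAIL;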


xlogreader has undergone a fair bit of change in the last few releases. As of
now, I can't really understand who is responsible for reporting what kind of
error.

/*
 * Data input callback
 *
 * This callback shall read at least reqLen valid bytes of the xlog page
 * starting at targetPagePtr, and store them in readBuf.  The callback
 * shall return the number of bytes read (never more than XLOG_BLCKSZ),
 * or -1 on failure.  The callback shall sleep, if necessary, to wait
 * for the requested bytes to become available.  The callback will not
 * be invoked again for the same page unless more than the returned
 * number of bytes are needed.
 *
 * targetRecPtr is the position of the WAL record we're reading.  Usually
 * it is equal to targetPagePtr + reqLen, but sometimes xlogreader needs
 * to read and verify the page or segment header, before it reads the
 * actual WAL record it's interested in.  In that case, targetRecPtr can
 * be used to determine which timeline to read the page from.
 *
 * The callback shall set ->seg.ws_tli to the TLI of the file the page
 * was read from.
 */
XLogPageReadCB page_read;
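For what it's worth, a hypothetical minimal callback satisfying that
contract might look like the following (my_page_read and the my_* helpers
are made-up names, purely for illustration):

static int
my_page_read(XLogReaderState *state, XLogRecPtr targetPagePtr,
             int reqLen, XLogRecPtr targetRecPtr, char *readBuf)
{
    int         count;

    /* Sleep, if necessary, until reqLen bytes past targetPagePtr
     * exist; per the contract, waiting is the callback's job. */
    if (!my_wait_for_wal(targetPagePtr + reqLen))
        return -1;              /* failure; but who reports it? */

    /* Read as much of the page as is available, never more than
     * XLOG_BLCKSZ bytes. */
    count = my_wal_read(readBuf, targetPagePtr, XLOG_BLCKSZ);
    if (count < reqLen)
        return -1;

    /* Contract: report which timeline the bytes came from. */
    state->seg.ws_tli = my_current_tli();
    return count;
}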

/*
 * Callback to 

Various bugs if segment containing redo pointer does not exist

2023-10-23 Thread Andres Freund
Hi,

I investigated a crashing postgres instance. Turns out the original issue was
operator error. But in the process I found a few postgres issues. The scenario
is basically that the redo LSN and checkpoint LSN are in separate segments, and
that for whatever reason, the file containing the redo LSN is missing.

I'd expect a PANIC in that situation. But, turns out, no such luck.

I've looked at 15 and HEAD so far.

1) For some reason I haven't yet debugged, the ReadRecord(PANIC) in
   PerformWalRecovery() doesn't PANIC, but instead just returns NULL

   We *do* output a DEBUG message, but well, that's insufficient.


2) On HEAD, we then segfault, because the cross check for XLOG_CHECKPOINT_REDO
   causes a null pointer dereference. Which I guess is good, we shouldn't have
   gotten there without a record.


3) On 15, with or without assertions, we decide that "redo is not
   required" (see the sketch below item 4). Gulp.


4) On 15, with assertions enabled, we fail with an assertion in the startup
   process, in FinishWalRecovery()->XLogPrefetcherBeginRead()->XLogBeginRead()
   Assert(!XLogRecPtrIsInvalid(RecPtr))
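
Regarding 3), the mechanism is presumably the tail of PerformWalRecovery()
(paraphrased from 15's xlogrecovery.c): because ReadRecord() returns NULL
instead of PANICing, we fall into the branch that is meant for a checkpoint
with no WAL following it:

record = ReadRecord(xlogprefetcher, PANIC, false, replayTLI);
if (record != NULL)
{
    /* ... the normal redo loop ... */
}
else
{
    /* there are no WAL records following the checkpoint */
    ereport(LOG,
            (errmsg("redo is not required")));
}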


5) On 15, with optimizations enabled, we don't just crash, it gets scarier.

   First, the startup process actually creates a bogus WAL segment:

#1  0x55f53b2725ff in XLogFileInitInternal (path=0x7ffccb7b3360 "pg_wal/0001", added=0x7ffccb7b335f, logtli=1, logsegno=1099511627776)
    at /home/andres/src/postgresql-15/src/backend/access/transam/xlog.c:2936
#2  PreallocXlogFiles (endptr=endptr@entry=0, tli=tli@entry=1)
    at /home/andres/src/postgresql-15/src/backend/access/transam/xlog.c:3433
#3  0x55f53b277e00 in PreallocXlogFiles (tli=1, endptr=0)
    at /home/andres/src/postgresql-15/src/backend/access/transam/xlog.c:3425
#4  StartupXLOG ()
    at /home/andres/src/postgresql-15/src/backend/access/transam/xlog.c:5517
#5  0x55f53b4a385e in StartupProcessMain ()
    at /home/andres/src/postgresql-15/src/backend/postmaster/startup.c:282
#6  0x55f53b49860b in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess)
    at /home/andres/src/postgresql-15/src/backend/postmaster/auxprocess.c:141
#7  0x55f53b4a2eed in StartChildProcess (type=StartupProcess)
    at /home/andres/src/postgresql-15/src/backend/postmaster/postmaster.c:5432
#8  PostmasterMain (argc=argc@entry=39, argv=argv@entry=0x55f53d5095d0)
    at /home/andres/src/postgresql-15/src/backend/postmaster/postmaster.c:1473
#9  0x55f53b1d1bff in main (argc=39, argv=0x55f53d5095d0)
    at /home/andres/src/postgresql-15/src/backend/main/main.c:202

Note the endptr=0 and pg_wal/0001 path.

At the normal log level, one wouldn't learn anything about this.


Then the *checkpointer* segfaults, trying to write the end-of-recovery
checkpoint:

#0  __memcpy_evex_unaligned_erms ()
    at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:344
#1  0x56102ebe84a2 in memcpy (__len=26, __src=0x56102fbe7b48, __dest=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29
#2  CopyXLogRecordToWAL (tli=1, EndPos=160, StartPos=40, rdata=0x56102f31ffb0 <hdr_rdt>, isLogSwitch=false, write_len=114)
    at /home/andres/src/postgresql-15/src/backend/access/transam/xlog.c:1229
#3  XLogInsertRecord (rdata=<optimized out>, fpw_lsn=<optimized out>, flags=<optimized out>, num_fpi=0, topxid_included=<optimized out>)
    at /home/andres/src/postgresql-15/src/backend/access/transam/xlog.c:861
#4  0x56102ebf12cb in XLogInsert (rmid=rmid@entry=0 '\000', info=info@entry=0 '\000')
    at /home/andres/src/postgresql-15/src/backend/access/transam/xloginsert.c:492
#5  0x56102ebea92e in CreateCheckPoint (flags=102)
    at /home/andres/src/postgresql-15/src/backend/access/transam/xlog.c:6583
#6  0x56102ee0e552 in CheckpointerMain ()
    at /home/andres/src/postgresql-15/src/backend/postmaster/checkpointer.c:455
#7  0x56102ee0c5f9 in AuxiliaryProcessMain (auxtype=auxtype@entry=CheckpointerProcess)
    at /home/andres/src/postgresql-15/src/backend/postmaster/auxprocess.c:153
#8  0x56102ee12c38 in StartChildProcess (type=CheckpointerProcess)
    at /home/andres/src/postgresql-15/src/backend/postmaster/postmaster.c:5432
#9  0x56102ee16d54 in PostmasterMain (argc=argc@entry=35, argv=argv@entry=0x56102fb6e5d0)
    at /home/andres/src/postgresql-15/src/backend/postmaster/postmaster.c:1466
#10 0x56102eb45bff in main (argc=35, argv=0x56102fb6e5d0)
    at /home/andres/src/postgresql-15/src/backend/main/main.c:202

The immediate cause of the crash is that GetXLogBuffer() is called with
ptr=40, which makes GetXLogBuffer() take the cached-page fast path:
ptr=40 lies on page zero, and cachedPage is still zero, so the check
matches and we return an offset from the NULL cachedPos.
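
The fast path in question looks roughly like this (paraphrased from 15's
GetXLogBuffer(); in an assert-enabled build the Asserts on the cached page
header would presumably trip before the memcpy):

static uint64 cachedPage = 0;
static char  *cachedPos = NULL;

/*
 * Fast path for accessing the same page as last time.  With ptr = 40,
 * ptr / XLOG_BLCKSZ is 0 and matches the never-set cachedPage, so this
 * hands back an offset from cachedPos == NULL.
 */
if (ptr / XLOG_BLCKSZ == cachedPage)
    return cachedPos + ptr % XLOG_BLCKSZ;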

Which in turn is because the startup process happily initialized
XLogCtl->Insert.{CurrBytePos,PrevBytePos} with 0s, even though it
initialized RedoRecPtr with the valid redo pointer.

The checkpointer then actually overwrites the valid RedoRecPtr with
bogus content as part of CreateCheckPoint(), due to the bogus CurrBytePos.
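
For context, StartupXLOG() sets those fields up from the end of recovered
WAL, roughly like this (paraphrased from 15's xlog.c); with recovery having
bailed out the way it did, EndOfLog is 0 here (see the endptr=0 above), so
both byte positions end up as 0:

Insert = &XLogCtl->Insert;
Insert->PrevBytePos = XLogRecPtrToBytePos(endOfRecoveryInfo->lastRec);
Insert->CurrBytePos = XLogRecPtrToBytePos(EndOfLog);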


6) On 15, with optimizations enabled, we