Thanks!
At Wed, 16 Jun 2021 16:52:11 +0900, Masahiko Sawada <[email protected]>
wrote in
> On Fri, Jun 4, 2021 at 5:35 PM Kyotaro Horiguchi
> <[email protected]> wrote:
> >
> > In a very common operation of accidentally specifying a recycled
> > segment, pg_waldump often returns the following obscure message.
> >
> > $ pg_waldump 00000001000000000000002D
> > pg_waldump: fatal: could not find a valid record after 0/2D000000
> >
> > The more detailed message is generated internally and we can use it.
> > That looks like the following.
> >
> > $ pg_waldump 00000001000000000000002D
> > pg_waldump: fatal: unexpected pageaddr 0/24000000 in log segment
> > 00000001000000000000002D, offset 0
> >
> > Is it work doing?
>
> Perhaps we need both? The current message describes where the error
> happened and the message internally generated describes the details.
> It seems to me that both are useful. For example, if we find an error
> during XLogReadRecord(), we show both as follows:
>
> if (errormsg)
> fatal_error("error in WAL record at %X/%X: %s",
> LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr),
> errormsg);
Yeah, I thought that it might be a bit vervous and lengty but actually
we have another place where doing that. One more point is whether we
have a case where first_record is invalid but errormsg is NULL
there. WALDumpReadPage immediately exits so we should always have a
message in that case according to the comment in ReadRecord.
> * We only end up here without a message when XLogPageRead()
> * failed - in that case we already logged something. In
> * StandbyMode that only happens if we have been triggered, so we
> * shouldn't loop anymore in that case.
So that can be an assertion.
Now the messages looks like this.
$ pg_waldump /home/horiguti/data/data_work/pg_wal/000000020000000000000010
pg_waldump: fatal: could not find a valid record after 0/0: unexpected pageaddr
0/9000000 in log segment 000000020000000000000010, offset 0
reagards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 42738eb940..5a98c523c6 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -932,17 +932,28 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
*
* This positions the reader, like XLogBeginRead(), so that the next call to
* XLogReadRecord() will read the next valid record.
+ *
+ * If the page_read callback fails to read the requested data,
+ * InvalidXLogRecPtr is returned. The callback is expected to have reported
+ * the error; errormsg is set to NULL.
+ *
+ * If the reading fails for some other reason, InvalidXLogRecPtr is also
+ * returned, and *errormsg is set to a string with details of the failure.
+ *
+ * The returned pointer (or *errormsg) points to an internal buffer that's
+ * valid until the next call to XLogFindNextRecord or XLogReadRecord.
*/
XLogRecPtr
-XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr)
+XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
{
XLogRecPtr tmpRecPtr;
XLogRecPtr found = InvalidXLogRecPtr;
XLogPageHeader header;
- char *errormsg;
Assert(!XLogRecPtrIsInvalid(RecPtr));
+ *errormsg = NULL;
+
/*
* skip over potential continuation data, keeping in mind that it may span
* multiple pages
@@ -1021,7 +1032,7 @@ XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr)
* or we just jumped over the remaining data of a continuation.
*/
XLogBeginRead(state, tmpRecPtr);
- while (XLogReadRecord(state, &errormsg) != NULL)
+ while (XLogReadRecord(state, errormsg) != NULL)
{
/* past the record we've found, break out */
if (RecPtr <= state->ReadRecPtr)
@@ -1036,6 +1047,9 @@ XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr)
err:
XLogReaderInvalReadState(state);
+ if (state->errormsg_buf[0] != '\0')
+ *errormsg = state->errormsg_buf;
+
return InvalidXLogRecPtr;
}
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f8b8afe4a7..0dae1ebea7 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -1051,11 +1051,17 @@ main(int argc, char **argv)
fatal_error("out of memory");
/* first find a valid recptr to start from */
- first_record = XLogFindNextRecord(xlogreader_state, private.startptr);
+ first_record = XLogFindNextRecord(xlogreader_state, private.startptr,
+ &errormsg);
if (first_record == InvalidXLogRecPtr)
- fatal_error("could not find a valid record after %X/%X",
- LSN_FORMAT_ARGS(private.startptr));
+ {
+ /* XLogFindNextRecord must return errormsg on error */
+ Assert (errormsg);
+ fatal_error("could not find a valid record after %X/%X: %s",
+ LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr),
+ errormsg);
+ }
/*
* Display a message that we're skipping data if `from` wasn't a pointer
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 21d200d3df..0f6d920fed 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -267,7 +267,8 @@ extern void XLogReaderFree(XLogReaderState *state);
/* Position the XLogReader to given record */
extern void XLogBeginRead(XLogReaderState *state, XLogRecPtr RecPtr);
#ifdef FRONTEND
-extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
+extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr,
+ char **errormsg);
#endif /* FRONTEND */
/* Read the next XLog record. Returns NULL on end-of-WAL or failure */