Hi Amul,

I reviewed the patch and got some comments:

> On Nov 25, 2025, at 14:37, Amul Sul <[email protected]> wrote:
> 
> 
> Regards,
> Amul
> <v8-0001-Refactor-pg_waldump-Move-some-declarations-to-new.patch><v8-0002-Refactor-pg_waldump-Separate-logic-used-to-calcul.patch><v8-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch><v8-0004-pg_waldump-Add-support-for-archived-WAL-decoding.patch><v8-0005-pg_waldump-Remove-the-restriction-on-the-order-of.patch><v8-0006-pg_verifybackup-Delay-default-WAL-directory-prepa.patch><v8-0007-pg_verifybackup-Rename-the-wal-directory-switch-t.patch><v8-0008-pg_verifybackup-enabled-WAL-parsing-for-tar-forma.patch>

1 - 0001 - pg_waldump.h
```
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2013-2025, PostgreSQL Global Development Group
```

This header file is brand new, so copyright year should be only 2025.

2 - 0001 - pg_waldump.c
```
-static int     WalSegSz;
+int                    WalSegSz = DEFAULT_XLOG_SEG_SIZE;
```

0001 claims a refactoring, but if you initialize WalSegSz with 
DEFAULT_XLOG_SEG_SIZE, then the behavior is changing, this change is no longer 
a pure refactor.

I would suggest leave WalSegSz uninitiated (compiler will set 0 to it), then no 
behavior change, so that 0001 stays a self-contained pure refactor.

The other nit thing is that, as “static” is removed, now “WalSegSz” is placed 
in middle of two static variables, which looks not good. If I were making the 
code change, I would have moved WalSegSz to after all static variables.

3 - 0002
```
@@ -383,21 +406,11 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr 
targetPagePtr, int reqLen,
                                XLogRecPtr targetPtr, char *readBuff)
 {
        XLogDumpPrivate *private = state->private_data;
-       int                     count = XLOG_BLCKSZ;
+       int                     count = required_read_len(private, 
targetPagePtr, reqLen);
        WALReadError errinfo;
 
-       if (XLogRecPtrIsValid(private->endptr))
-       {
-               if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
-                       count = XLOG_BLCKSZ;
-               else if (targetPagePtr + reqLen <= private->endptr)
-                       count = private->endptr - targetPagePtr;
-               else
-               {
-                       private->endptr_reached = true;
-                       return -1;
-               }
-       }
+       if (private->endptr_reached)
+               return -1;
```

This change introduces a logic hole. In old code, it sets 
private->endptr_reached = true; and return -1. In the code code, count and 
private->endptr_reached assignments are wrapped into required_read_len(). 
However, required_read_len() doesn’t check if private->endptr_reached has 
already been true, so that the logic hole is that, if private->endptr_reached 
is already true when calling required_read_len(), and required_read_len() 
returns a positive count, if (private->endptr_reached) will also be satisfied 
and return -1 from the function.

So, to be safe, we should check “if (count < 0) return -1”.

4 - 0002
```
+/* Returns the size in bytes of the data to be read. */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+                                 int reqLen)
+{
```

The function comment is too simple. It doesn’t cover the case where -1 is 
returned.

5 - 0003
```
+my @scenario = (
+       {
+               'path' => $node->data_dir
+       });
 
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+for my $scenario (@scenario)
+{
```

"my @scenario” should be "my @scenarios”, so that for line become "for my 
$scenario (@scenarios)”, a little bit clearer.

6 - 0003
```
+       SKIP:
+       {
```

Why SKIP label is defined here? A SKIP label usually follows a skip statement, 
for example: in bin/pg_ctl/t/001_start_stop.pl
```
SKIP:
{
skip "unix-style permissions not supported on Windows", 2
if ($windows_os);

ok(-f $logFileName);
ok(check_mode_recursive("$tempdir/data", 0700, 0600));
}
```

7 - 0004 - Makefile
```
        $(WIN32RES) \
        compat.o \
        pg_waldump.o \
+       archive_waldump.o \
        rmgrdesc.o \
        xlogreader.o \
        xlogstats.o
```

Obviously the list was in alphabetical order, so archive_waldump.o should be 
placed before compat.o.

8 - 0004
```
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int 
reqLen,
+                                  XLogRecPtr targetPtr, char *readBuff)
+{
+       XLogDumpPrivate *private = state->private_data;
+       int                     count = required_read_len(private, 
targetPagePtr, reqLen);
```

Looking the page_read’s spec:
```
        /*
         * Data input callback
         *
         * This callback shall read at least reqLen valid bytes of the xlog page
         * starting at targetPagePtr, and store them in readBuf.  The callback
         * shall return the number of bytes read (never more than XLOG_BLCKSZ), 
or
         * -1 on failure.  The callback shall sleep, if necessary, to wait for 
the
         * requested bytes to become available.  The callback will not be 
invoked
         * again for the same page unless more than the returned number of bytes
         * are needed.
         *
         * targetRecPtr is the position of the WAL record we're reading.  
Usually
         * it is equal to targetPagePtr + reqLen, but sometimes xlogreader needs
         * to read and verify the page or segment header, before it reads the
         * actual WAL record it's interested in.  In that case, targetRecPtr can
         * be used to determine which timeline to read the page from.
         *
         * The callback shall set ->seg.ws_tli to the TLI of the file the page 
was
         * read from.
         */
        XLogPageReadCB page_read;
```

It says that page_read must read reqLen bytes, otherwise it should wait for 
more bytes.

However, TarWALDumpReadPage just calculate how many bytes can read and only 
read that long, which breaks the protocol. Is it a problem?

9 - 0004
```
+/*
+ * Create an astreamer that can read WAL from tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+       astreamer_waldump *streamer;
+
+       streamer = palloc0(sizeof(astreamer_waldump));
+       *((const astreamer_ops **) &streamer->base.bbs_ops) =
+               &astreamer_waldump_ops;
+
+       streamer->privateInfo = privateInfo;
+
+       return &streamer->base;
+}
```

This function allocates memory for streamer but only returns &streamer->base, 
so memory of streamer is leaked.

Also, in the function comment, “from tar file” => “from a tar file”.

10 - 0004
```
+ * End-of-stream processing for a astreamer_waldump stream.
```

Nit typo: a => an

11 - 0004
```
+       if (!IsValidWalSegSize(WalSegSz))
+       {
+               pg_log_error(ngettext("invalid WAL segment size in WAL file 
from archive \"%s\" (%d byte)",
+                                                         "invalid WAL segment 
size in WAL file from archive \"%s\" (%d bytes)",
+                                                         WalSegSz),
+                                        privateInfo->archive_name, WalSegSz);
+               pg_log_error_detail("The WAL segment size must be a power of 
two between 1 MB and 1 GB.");
+               exit(1);
+       }
```

Why don’t pg_fatal()?

12 - 0005
```
+               /* Create a temporary file if one does not already exist */
+               if (!entry->tmpseg_exists)
+               {
+                       write_fp = prepare_tmp_write(entry->segno);
+                       entry->tmpseg_exists = true;
+               }
+
+               /* Flush data from the buffer to the file */
+               perform_tmp_write(entry->segno, &entry->buf, write_fp);
+               resetStringInfo(&entry->buf);
+
+               /*
+                * The change in the current segment entry indicates that the 
reading
+                * of this file has ended.
+                */
+               if (entry != privateInfo->cur_wal && write_fp != NULL)
+               {
+                       fclose(write_fp);
+                       write_fp = NULL;
+               }
```

When entry->tmpseg_exists is true, then write_fp will not be initialized, but 
there should be a check to make sure write_fp is not NULL before 
perform_tmp_write(). 

Also, if write_fp != NULL, should we anyway close the file without considering 
entry != privateInfo->cur_wal? Otherwise write_fp may be left open.

13 - 0005
```
+        * Use the directory specified by the TEMDIR environment variable. If 
it’s
```

Typo: TEMDIR => TMPDIR

14 - 0005
```
+ * Set up a temporary directory to temporarily store WAL segments.
```

temporary and temporarily are redundant.

No comment for 0007.

15 - 0007

I wonder why we need to manually po files? This is the first time I see a patch 
including po file changes.

16 - 0008
```
+               {
+                       pg_log_error("wal archive not found");
+                       pg_log_error_hint("Specify the correct path using the 
option -w/--wal-path."
+                                                         "Or you must use 
-n/--no-parse-wal when verifying a tar-format backup.");
+                       exit(1);
+               }
```

“wal” should be “WAL”.

In the hint message, there should be a white space between the two sentences.

Again, why not pg_fatal().

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Reply via email to