Hi,

On Fri, Oct 17, 2025 at 4:22 AM John H <[email protected]> wrote:

> Hi,
>
> On Wed, Oct 15, 2025 at 7:27 AM Srinath Reddy Sadipiralla
> <[email protected]> wrote:
>
> >
> > TBH first it seemed to me a good coding practice for future proofing,
> > then i checked if there's any place in postgres we are doing
> > something similar ,then found 1 in src/include/access/gist.h
> >
> > typedef struct GISTDeletedPageContents
> > {
> > /* last xid which could see the page in a scan */
> > FullTransactionId deleteXid;
> > } GISTDeletedPageContents;
> >
>
> It makes more sense to me if we expect the struct or same set of arguments
> to be reused in different places / want a stable API reference. I don't
> think
> we want everything to be wrapped around a struct for functions just in
> case.
>
> > ..
> > thanks for updating, i have attached a diff patch on
> > top of v9-0001 patch , where i tried to add more tests
> > ...
>
> Thank you for the diff! Your changes are a lot cleaner and I've included
> it in
> the latest patch. One difference I've made is instead of additionally
> logging in
> decide_wal_file_action
>
> > + pg_log_debug("WAL segment \"%s\" is copied to target", fname);
>
> I realized we are already logging the WAL segments that are copied over in
> print_filemap so I've updated the test to check for that instead
>
> > qr/pg_wal\/$corrupt_wal_seg \(COPY\)/
>
> I also updated the error message for the last three checks.
>

Thanks for updating the patch, I have reviewed it,
except these below concerns, patch LGTM.

1) regarding the parsing the WAL filename

On Fri, Oct 17, 2025 at 4:25 AM John H <[email protected]> wrote:

> On Thu, Oct 16, 2025 at 12:00 PM Robert Haas <[email protected]>
> wrote:
> >
> > On Wed, Oct 15, 2025 at 10:27 AM Srinath Reddy Sadipiralla
> > > ,the main problem is when if someone manually places an invalid WAL
> file
> > > in pg_wal like 00000001FFFFFFFFFFFFFF10, IsXLogFileName will
> > > consider it as valid ,so with the approach as i mentioned earlier we
> can
> > > catch such cases.
> >
> > I think that parsing the file name may be a good idea so that we can
> > do appropriate sanity checks on the values (e.g. checking that we're
> > only skipping copying prior to last_common_segno), but I do not think
> > we should worry too much about the user manually injecting invalid WAL
> > files. I mean, I would prefer that if that does happen, it either
> > works anyway or fails with a sensible error message, rather than
> > emitting an incomprehensible error message or dumping core. But, it is
> > in general true that if manual modifications are made to the data
> > directory, things may go terribly wrong, and this code is not obliged
> > to provide any more protection against such scenarios than we do in
> > other cases. Ultimately, such modifications are user error.
> >
>
> It feels like there's a lot of things we could attempt to ensure
> "correctness" if we are concerned about scenarios when the user manually
> puts
> or modifies content unexpectedly in the pg_wal directory.
>
> For instance, one could make the argument that when considering to skip
> copying the common WAL segments, even though they are of the same
> size, it's possible the user has manipulated them directly. I don't
> think we need to
> run checksums on every WAL segment that is a valid candidate to ensure they
> match.


I agree with Robert here, and i think if we found
that the WAL filename is not valid after parsing
we can return something like FILE_CONTENT_INVALID_TYPE
in getFileContentType so that it won't end up going
through decide_wal_file_action,then XLogFromFileName
may assign file_segno invalid values greater than
last_common_segno which means we lead to copying it,even
though it's an invalid file, thoughts?

2) Please fix the indentation by running pgindent,
also add file_content_type_t typedef to typedefs.list
file.
3) maybe we can improve the error messages for the
last 2 checks in tap test, that because of this
reason hence proven that the copy from source to
target has been done.
4) We need to update the pg_rewind docs regarding
this new behaviour.

-- 
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

Reply via email to