On Tue, Aug 02, 2022 at 07:37:27AM -0700, Noah Misch wrote:
> On Tue, Aug 02, 2022 at 10:14:22AM -0400, David Steele wrote:
> > On 7/31/22 02:17, Noah Misch wrote:
> > >On Tue, Jul 26, 2022 at 07:21:29AM -0400, David Steele wrote:
> > >>On 6/19/21 16:39, Noah Misch wrote:
> > >>>On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote:
> > >>>>Recycling and preallocation are wasteful during archive recovery, 
> > >>>>because
> > >>>>KeepFileRestoredFromArchive() unlinks every entry in its path.  I 
> > >>>>propose to
> > >>>>fix the race by adding an XLogCtl flag indicating which regime 
> > >>>>currently owns
> > >>>>the right to add long-term pg_wal directory entries.  In the archive 
> > >>>>recovery
> > >>>>regime, the checkpointer will not preallocate and will unlink old 
> > >>>>segments
> > >>>>instead of recycling them (like wal_recycle=off).  XLogFileInit() will 
> > >>>>fail.
> > >>>
> > >>>Here's the implementation.  Patches 1-4 suffice to stop the user-visible
> > >>>ERROR.  Patch 5 avoids a spurious LOG-level message and wasted filesystem
> > >>>writes, and it provides some future-proofing.
> > >>>
> > >>>I was tempted to (but did not) just remove preallocation.  Creating one 
> > >>>file
> > >>>per checkpoint seems tiny relative to the max_wal_size=1GB default, so I
> > >>>expect it's hard to isolate any benefit.  Under the old 
> > >>>checkpoint_segments=3
> > >>>default, a preallocated segment covered a respectable third of the next
> > >>>checkpoint.  Before commit 63653f7 (2002), preallocation created more 
> > >>>files.
> > >>
> > >>This also seems like it would fix the link issues we are seeing in [1].
> > >>
> > >>I wonder if that would make it worth a back patch?
> > >
> > >Perhaps.  It's sad to have multiple people deep-diving into something 
> > >fixed on
> > >HEAD.  On the other hand, I'm not eager to spend risk-of-backpatch points 
> > >on
> > >this.  One alternative would be adding an errhint like "This is known to
> > >happen occasionally during archive recovery, where it is harmless."  That 
> > >has
> > >an unpolished look, but it's low-risk and may avoid deep-dive efforts.
> > 
> > I think in this case a HINT might be sufficient to at least keep people from
> > wasting time tracking down a problem that has already been fixed.

Here's a patch to add that HINT.  I figure it's better to do this before next
week's minor releases.  In the absence of objections, I'll push this around
2022-08-05 14:00 UTC.

> > However, there is another issue [1] that might argue for a back patch if
> > this patch (as I believe) would fix the issue.
> 
> > [1] 
> > https://www.postgresql.org/message-id/CAHJZqBDxWfcd53jm0bFttuqpK3jV2YKWx%3D4W7KxNB4zzt%2B%2BqFg%40mail.gmail.com
> 
> That makes sense.  Each iteration of the restartpoint recycle loop has a 1/N
> chance of failing.  Recovery adds >N files between restartpoints.  Hence, the
> WAL directory grows without bound.  Is that roughly the theory in mind?

On further reflection, I don't expect it to happen that way.  Each failure
message is LOG-level, so the remaining recycles still happen.  (The race
condition can yield an ERROR under PreallocXlogFiles(), but recycling is
already done at that point.)
Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Add HINT for restartpoint race with KeepFileRestoredFromArchive().
    
    The five commits ending at cc2c7d65fc27e877c9f407587b0b92d46cd6dd16
    closed this race condition for v15+.  For v14 through v10, add a HINT to
    discourage studying the cosmetic problem.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20220731061747.ga3692...@rfd.leadboat.com

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 5cdd01f..9c845b7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3480,7 +3480,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, 
bool use_lock)
        if (fd < 0)
                ereport(ERROR,
                                (errcode_for_file_access(),
-                                errmsg("could not open file \"%s\": %m", 
path)));
+                                errmsg("could not open file \"%s\": %m", path),
+                                (AmCheckpointerProcess() ?
+                                 errhint("This is known to fail occasionally 
during archive recovery, where it is harmless.") :
+                                 0)));
 
        elog(DEBUG2, "done creating and filling new WAL file");
 
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 6688dee..e76daff 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -834,7 +834,10 @@ durable_rename_excl(const char *oldfile, const char 
*newfile, int elevel)
                ereport(elevel,
                                (errcode_for_file_access(),
                                 errmsg("could not link file \"%s\" to \"%s\": 
%m",
-                                               oldfile, newfile)));
+                                               oldfile, newfile),
+                                (AmCheckpointerProcess() ?
+                                 errhint("This is known to fail occasionally 
during archive recovery, where it is harmless.") :
+                                 0)));
                return -1;
        }
        unlink(oldfile);
@@ -844,7 +847,10 @@ durable_rename_excl(const char *oldfile, const char 
*newfile, int elevel)
                ereport(elevel,
                                (errcode_for_file_access(),
                                 errmsg("could not rename file \"%s\" to 
\"%s\": %m",
-                                               oldfile, newfile)));
+                                               oldfile, newfile),
+                                (AmCheckpointerProcess() ?
+                                 errhint("This is known to fail occasionally 
during archive recovery, where it is harmless.") :
+                                 0)));
                return -1;
        }
 #endif

Reply via email to