On Tue, Aug 02, 2022 at 07:37:27AM -0700, Noah Misch wrote: > On Tue, Aug 02, 2022 at 10:14:22AM -0400, David Steele wrote: > > On 7/31/22 02:17, Noah Misch wrote: > > >On Tue, Jul 26, 2022 at 07:21:29AM -0400, David Steele wrote: > > >>On 6/19/21 16:39, Noah Misch wrote: > > >>>On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote: > > >>>>Recycling and preallocation are wasteful during archive recovery, > > >>>>because > > >>>>KeepFileRestoredFromArchive() unlinks every entry in its path. I > > >>>>propose to > > >>>>fix the race by adding an XLogCtl flag indicating which regime > > >>>>currently owns > > >>>>the right to add long-term pg_wal directory entries. In the archive > > >>>>recovery > > >>>>regime, the checkpointer will not preallocate and will unlink old > > >>>>segments > > >>>>instead of recycling them (like wal_recycle=off). XLogFileInit() will > > >>>>fail. > > >>> > > >>>Here's the implementation. Patches 1-4 suffice to stop the user-visible > > >>>ERROR. Patch 5 avoids a spurious LOG-level message and wasted filesystem > > >>>writes, and it provides some future-proofing. > > >>> > > >>>I was tempted to (but did not) just remove preallocation. Creating one > > >>>file > > >>>per checkpoint seems tiny relative to the max_wal_size=1GB default, so I > > >>>expect it's hard to isolate any benefit. Under the old > > >>>checkpoint_segments=3 > > >>>default, a preallocated segment covered a respectable third of the next > > >>>checkpoint. Before commit 63653f7 (2002), preallocation created more > > >>>files. > > >> > > >>This also seems like it would fix the link issues we are seeing in [1]. > > >> > > >>I wonder if that would make it worth a back patch? > > > > > >Perhaps. It's sad to have multiple people deep-diving into something > > >fixed on > > >HEAD. On the other hand, I'm not eager to spend risk-of-backpatch points > > >on > > >this. One alternative would be adding an errhint like "This is known to > > >happen occasionally during archive recovery, where it is harmless." That > > >has > > >an unpolished look, but it's low-risk and may avoid deep-dive efforts. > > > > I think in this case a HINT might be sufficient to at least keep people from > > wasting time tracking down a problem that has already been fixed.
Here's a patch to add that HINT. I figure it's better to do this before next week's minor releases. In the absence of objections, I'll push this around 2022-08-05 14:00 UTC. > > However, there is another issue [1] that might argue for a back patch if > > this patch (as I believe) would fix the issue. > > > [1] > > https://www.postgresql.org/message-id/CAHJZqBDxWfcd53jm0bFttuqpK3jV2YKWx%3D4W7KxNB4zzt%2B%2BqFg%40mail.gmail.com > > That makes sense. Each iteration of the restartpoint recycle loop has a 1/N > chance of failing. Recovery adds >N files between restartpoints. Hence, the > WAL directory grows without bound. Is that roughly the theory in mind? On further reflection, I don't expect it to happen that way. Each failure message is LOG-level, so the remaining recycles still happen. (The race condition can yield an ERROR under PreallocXlogFiles(), but recycling is already done at that point.)
Author: Noah Misch <n...@leadboat.com> Commit: Noah Misch <n...@leadboat.com> Add HINT for restartpoint race with KeepFileRestoredFromArchive(). The five commits ending at cc2c7d65fc27e877c9f407587b0b92d46cd6dd16 closed this race condition for v15+. For v14 through v10, add a HINT to discourage studying the cosmetic problem. Reviewed by FIXME. Discussion: https://postgr.es/m/20220731061747.ga3692...@rfd.leadboat.com diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 5cdd01f..9c845b7 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -3480,7 +3480,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock) if (fd < 0) ereport(ERROR, (errcode_for_file_access(), - errmsg("could not open file \"%s\": %m", path))); + errmsg("could not open file \"%s\": %m", path), + (AmCheckpointerProcess() ? + errhint("This is known to fail occasionally during archive recovery, where it is harmless.") : + 0))); elog(DEBUG2, "done creating and filling new WAL file"); diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index 6688dee..e76daff 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -834,7 +834,10 @@ durable_rename_excl(const char *oldfile, const char *newfile, int elevel) ereport(elevel, (errcode_for_file_access(), errmsg("could not link file \"%s\" to \"%s\": %m", - oldfile, newfile))); + oldfile, newfile), + (AmCheckpointerProcess() ? + errhint("This is known to fail occasionally during archive recovery, where it is harmless.") : + 0))); return -1; } unlink(oldfile); @@ -844,7 +847,10 @@ durable_rename_excl(const char *oldfile, const char *newfile, int elevel) ereport(elevel, (errcode_for_file_access(), errmsg("could not rename file \"%s\" to \"%s\": %m", - oldfile, newfile))); + oldfile, newfile), + (AmCheckpointerProcess() ? + errhint("This is known to fail occasionally during archive recovery, where it is harmless.") : + 0))); return -1; } #endif