On 8/2/22 10:37, Noah Misch wrote:
On Tue, Aug 02, 2022 at 10:14:22AM -0400, David Steele wrote:
On 7/31/22 02:17, Noah Misch wrote:
On Tue, Jul 26, 2022 at 07:21:29AM -0400, David Steele wrote:
On 6/19/21 16:39, Noah Misch wrote:
On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote:
Recycling and preallocation are wasteful during archive recovery, because
KeepFileRestoredFromArchive() unlinks every entry in its path.  I propose to
fix the race by adding an XLogCtl flag indicating which regime currently owns
the right to add long-term pg_wal directory entries.  In the archive recovery
regime, the checkpointer will not preallocate and will unlink old segments
instead of recycling them (like wal_recycle=off).  XLogFileInit() will fail.

Here's the implementation.  Patches 1-4 suffice to stop the user-visible
ERROR.  Patch 5 avoids a spurious LOG-level message and wasted filesystem
writes, and it provides some future-proofing.

I was tempted to (but did not) just remove preallocation.  Creating one file
per checkpoint seems tiny relative to the max_wal_size=1GB default, so I
expect it's hard to isolate any benefit.  Under the old checkpoint_segments=3
default, a preallocated segment covered a respectable third of the next
checkpoint.  Before commit 63653f7 (2002), preallocation created more files.

This also seems like it would fix the link issues we are seeing in [1].

I wonder if that would make it worth a back patch?

Perhaps.  It's sad to have multiple people deep-diving into something fixed on
HEAD.  On the other hand, I'm not eager to spend risk-of-backpatch points on
this.  One alternative would be adding an errhint like "This is known to
happen occasionally during archive recovery, where it is harmless."  That has
an unpolished look, but it's low-risk and may avoid deep-dive efforts.

I think in this case a HINT might be sufficient to at least keep people from
wasting time tracking down a problem that has already been fixed.

However, there is another issue [1] that might argue for a back patch if
this patch (as I believe) would fix the issue.

[1] 
https://www.postgresql.org/message-id/CAHJZqBDxWfcd53jm0bFttuqpK3jV2YKWx%3D4W7KxNB4zzt%2B%2BqFg%40mail.gmail.com

That makes sense.  Each iteration of the restartpoint recycle loop has a 1/N
chance of failing.  Recovery adds >N files between restartpoints.  Hence, the
WAL directory grows without bound.  Is that roughly the theory in mind?

Yes, though you have formulated it better than I had in my mind.

Let's see if Don can confirm that he is seeing the "could not link file" messages.

Regards,
-David


Reply via email to