On Wed, Jan 24, 2024 at 12:46:16PM -0500, Robert Haas wrote:
> The "examining summary" line is generated based on the output of
> pg_available_wal_summaries(). The way that works is that the server
> calls readdir(), disassembles the filename into a TLI and two LSNs,
> and returns the result. Then, a fraction of a second later, the test
> script reassembles those components into a filename and finds the file
> missing. If the logic to translate between filenames and TLIs & LSNs
> were incorrect, the test would fail consistently. So the only
> explanation that seems to fit the facts is the file disappearing out
> from under us. But that really shouldn't happen. We do have code to
> remove such files in MaybeRemoveOldWalSummaries(), but it's only
> supposed to be nuking files more than 10 days old.
>
> So I don't really have a theory here as to what could be happening. :-(

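For anyone following along, the filename round trip described above can be sketched roughly as below. This is only an illustration, not the server's code: the SummaryName struct and both helper names are made up, and the layout (five 8-digit uppercase hex fields for the TLI and the high/low halves of the two LSNs, followed by ".summary") is my assumption about how the summary files are named.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the (TLI, start LSN, end LSN) triple. */
typedef struct
{
	uint32_t	tli;
	uint64_t	start_lsn;
	uint64_t	end_lsn;
} SummaryName;

/*
 * Assumed layout: 40 hex digits (TLI, then high and low halves of the start
 * and end LSNs) followed by ".summary".  Returns 1 on success, 0 otherwise.
 */
static int
parse_summary_name(const char *name, SummaryName *out)
{
	unsigned int f[5];
	size_t		i;

	if (strlen(name) != 40 + strlen(".summary"))
		return 0;
	for (i = 0; i < 40; i++)
	{
		if (!isxdigit((unsigned char) name[i]))
			return 0;
	}
	if (strcmp(name + 40, ".summary") != 0)
		return 0;
	if (sscanf(name, "%8X%8X%8X%8X%8X",
			   &f[0], &f[1], &f[2], &f[3], &f[4]) != 5)
		return 0;

	out->tli = f[0];
	out->start_lsn = ((uint64_t) f[1] << 32) | f[2];
	out->end_lsn = ((uint64_t) f[3] << 32) | f[4];
	return 1;
}

/* Inverse of parse_summary_name(); returns what snprintf() returns. */
static int
format_summary_name(char *buf, size_t buflen, const SummaryName *in)
{
	return snprintf(buf, buflen, "%08X%08X%08X%08X%08X.summary",
					(unsigned int) in->tli,
					(unsigned int) (in->start_lsn >> 32),
					(unsigned int) (in->start_lsn & 0xFFFFFFFFu),
					(unsigned int) (in->end_lsn >> 32),
					(unsigned int) (in->end_lsn & 0xFFFFFFFFu));
}
```

If the two directions disagreed for some input, that name would fail every time, which is the point made above about the failures being intermittent rather than consistent.
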
There might be an overflow risk in the cutoff time calculation, but I doubt
that's the root cause of these failures:

	/*
	 * Files should only be removed if the last modification time precedes the
	 * cutoff time we compute here.
	 */
	cutoff_time = time(NULL) - 60 * wal_summary_keep_time;

Otherwise, I think we'll probably need to add some additional logging to
figure out what is happening...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
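
To make the overflow concern concrete, here is a rough sketch, assuming wal_summary_keep_time is a plain int holding minutes (which the "60 *" in the snippet suggests). In that case the multiplication is done in int arithmetic and can overflow for very large settings before the subtraction ever happens. Both helpers below are hypothetical and are not taken from the tree; they only show a range check and a widened computation.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical check: does converting keep_minutes to seconds still fit in
 * an int?  Anything above INT_MAX / 60 would overflow "60 * keep_minutes".
 */
static int
keep_time_fits_in_int(int keep_minutes)
{
	return keep_minutes <= INT_MAX / 60;
}

/*
 * Hypothetical alternative: widen to 64 bits before multiplying, so the
 * product cannot overflow for any int value of keep_minutes.  "now" is
 * passed in as seconds since the epoch to keep the example deterministic.
 */
static int64_t
compute_cutoff(int64_t now, int keep_minutes)
{
	return now - INT64_C(60) * keep_minutes;
}
```

With a 10-day setting (14400 minutes) the product is only 864000 seconds, nowhere near INT_MAX, which is why I doubt this explains the failures.
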