On Mon, Jun 24, 2024 at 04:12:38PM +0200, Tomas Vondra wrote:
> The important observation is that this only happens if a database is
> created while the backup is running, and that it only happens with the
> FILE_COPY strategy - I've never seen this with WAL_LOG (which is the
> default since PG15).

My first thought is that this sounds related to the large comment in
CreateDatabaseUsingFileCopy():

	/*
	 * We force a checkpoint before committing.  This effectively means that
	 * committed XLOG_DBASE_CREATE_FILE_COPY operations will never need to be
	 * replayed (at least not in ordinary crash recovery; we still have to
	 * make the XLOG entry for the benefit of PITR operations).  This avoids
	 * two nasty scenarios:
	 *
	 * #1: When PITR is off, we don't XLOG the contents of newly created
	 * indexes; therefore the drop-and-recreate-whole-directory behavior of
	 * DBASE_CREATE replay would lose such indexes.
	 *
	 * #2: Since we have to recopy the source database during DBASE_CREATE
	 * replay, we run the risk of copying changes in it that were committed
	 * after the original CREATE DATABASE command but before the system crash
	 * that led to the replay.  This is at least unexpected and at worst could
	 * lead to inconsistencies, eg duplicate table names.
	 *
	 * (Both of these were real bugs in releases 8.0 through 8.0.3.)
	 *
	 * In PITR replay, the first of these isn't an issue, and the second is
	 * only a risk if the CREATE DATABASE and subsequent template database
	 * change both occur while a base backup is being taken.  There doesn't
	 * seem to be much we can do about that except document it as a
	 * limitation.
	 *
	 * See CreateDatabaseUsingWalLog() for a less cheesy CREATE DATABASE
	 * strategy that avoids these problems.
	 */

> I don't recall any reports of similar issues from pre-15 releases, where
> FILE_COPY was the only available option - I'm not sure why is that.
> Either it didn't have this issue back then, or maybe people happen to
> not create databases concurrently with a backup very often. It's a race
> condition / timing issue, essentially.

If it requires concurrent activity on the template database, I wouldn't be
surprised at all that this is rare.

> I see there have been a couple threads proposing various improvements to
> FILE_COPY, that might make it more efficient/faster, namely using the
> filesystem cloning [1] or switching pg_upgrade to use it [2]. But having
> something that's (maybe) faster but not quite correct does not seem like
> a winning strategy to me ...
>
> Alternatively, if we don't have clear desire to fix it, maybe the right
> solution would be get rid of it?

It would be unfortunate if we couldn't use this for pg_upgrade, especially
if it is unaffected by these problems.

-- 
nathan