A customer's upgrade failed, and it took me a while to figure out that the problem was that they had set "vacuum_defer_cleanup_age=10000" on the new cluster.
The consequence was that the "vacuumdb --freeze" that takes place before copying commit log files failed to freeze "pg_database". That caused later updates to the table to fail with "Could not open file "pg_xact/0000": No such file or directory." I think it would increase the robustness of pg_upgrade to force "vacuum_defer_cleanup_age" to 0 on the new cluster. Suggested patch attached. Yours, Laurenz Albe
From e8f84eaadb132b8647406bc28028862d6d4d8717 Mon Sep 17 00:00:00 2001 From: Laurenz Albe <laurenz.a...@cybertec.at> Date: Wed, 10 Jun 2020 15:57:36 +0200 Subject: [PATCH] Force vacuum_defer_cleanup_age=0 during pg_upgrade This is important on the new cluster, because tuples must be frozen before the commit logs can be safely copied from the old server. Author: Laurenz Albe Backpatch-through: 9.5 --- src/bin/pg_upgrade/server.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c index 7d17280ecb..79ec3f04c0 100644 --- a/src/bin/pg_upgrade/server.c +++ b/src/bin/pg_upgrade/server.c @@ -239,6 +239,9 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error) * we only modify the new cluster, so only use it there. If there is a * crash, the new cluster has to be recreated anyway. fsync=off is a big * win on ext4. + * + * Force vacuum_defer_cleanup_age to 0 on the new cluster, so that + * vacuumdb --freeze actually freezes the tuples. */ snprintf(cmd, sizeof(cmd), "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start", @@ -247,7 +250,7 @@ start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error) BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" : " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000", (cluster == &new_cluster) ? - " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "", + " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c vacuum_defer_cleanup_age=0" : "", cluster->pgopts ? cluster->pgopts : "", socket_string); /* -- 2.21.3