On Tue, Apr 23, 2019 at 08:03:37PM -0400, Tom Lane wrote:
> Oh! One gets you ten it "works" as long as the pg_class update is a
> HOT update, so that we don't actually end up touching the indexes.
> This explains why the crash is less likely to happen in a database
> where one's done some work (and, probably, created some dead space in
> pg_class). On the other hand, it doesn't quite fit the observation
> that a VACUUM FULL masked the problem ... wouldn't that have ended up
> with densely packed pg_class? Maybe not, if it rebuilt everything
> else after pg_class...

I have been able to spend a bit more time testing and looking at the
root of the problem, and I have found two things:
1) The problem is reproducible with REL9_5_STABLE.
2) Bisecting between the merge base points of REL9_4_STABLE/master and
REL9_5_STABLE/master, I am being pointed to the introduction of
replication origins:
commit: 5aa2350426c4fdb3d04568b65aadac397012bbcb
author: Andres Freund <and...@anarazel.de>
date: Wed, 29 Apr 2015 19:30:53 +0200
Introduce replication progress tracking infrastructure.

To see the problem, one also needs to patch initdb.c so that the
final VACUUM FULL on pg_database is replaced by a plain VACUUM, as on
9.6~.

The culprit is actually surprising, but manually testing commit
5aa2350 and 5aa2350~1 shows the difference, and the issue is easily
reproducible here.
--
Michael