On Tue, Apr 23, 2019 at 08:03:37PM -0400, Tom Lane wrote:
> Oh!  One gets you ten it "works" as long as the pg_class update is a
> HOT update, so that we don't actually end up touching the indexes.
> This explains why the crash is less likely to happen in a database
> where one's done some work (and, probably, created some dead space in
> pg_class).  On the other hand, it doesn't quite fit the observation
> that a VACUUM FULL masked the problem ... wouldn't that have ended up
> with densely packed pg_class?  Maybe not, if it rebuilt everything
> else after pg_class...

I have been able to spend a bit more time testing and looking at the
root of the problem, and I have found two things:
1) The problem is reproducible with REL9_5_STABLE.
2) Bisecting between the merge base points of REL9_4_STABLE/master and
REL9_5_STABLE/master points to the commit that introduced replication
origins:
commit: 5aa2350426c4fdb3d04568b65aadac397012bbcb
author: Andres Freund <and...@anarazel.de>
date: Wed, 29 Apr 2015 19:30:53 +0200
Introduce replication progress tracking infrastructure.

To see the problem, one also needs to patch initdb.c so that the
final VACUUM FULL on pg_database is replaced by a plain VACUUM, as in
9.6 and later.  The root of the problem is actually surprising, but
manual testing on commit 5aa2350 and on 5aa2350~1 shows the
difference, as the issue is easily reproducible here.
--
Michael
