On Fri, May 25, 2018 at 3:37 PM, Andres Freund <and...@anarazel.de> wrote:
> Hi, > > Moving discussion to -hackers. Tom, I think you worked most with this > code, your input would be appreciated. > > Original discussion is around: > http://archives.postgresql.org/message-id/20180524211311. > tnswfnjwnii54htx%40alvherre.pgsql > > On 2018-05-24 17:13:11 -0400, Alvaro Herrera wrote: > > On 2018-May-24, Andres Freund wrote: > > > Then there's also: > > > http://archives.postgresql.org/message-id/1527193504642. > 36340%40amazon.com > > > > ah, so deleting the relcache file makes the problem to go away? That's > > definitely pretty strange. I see no reason for the value in relcache to > > become out of step with the catalogued value in the same database ... I > > don't think we transmit in any way values of one database to another. > > I can reproduce the issue. As far as I can tell we just don't ever > actually update nailed relcache entries in the normal course, leaving > the "physical address" aside. VACUUM will, via > vac_update_relstats() -> heap_inplace_update() -> > CacheInvalidateHeapTuple(), > send out an invalidation. But invalidation, in my case another session, > will essentially ignore most of that due to: > > static void > RelationClearRelation(Relation relation, bool rebuild) > ... > /* > * Never, never ever blow away a nailed-in system relation, > because we'd > * be unable to recover. However, we must redo > RelationInitPhysicalAddr > * in case it is a mapped relation whose mapping changed. > * > * If it's a nailed-but-not-mapped index, then we need to re-read > the > * pg_class row to see if its relfilenode changed. We do that > immediately > * if we're inside a valid transaction and the relation is open > (not > * counting the nailed refcount). Otherwise just mark the entry as > * possibly invalid, and it'll be fixed when next opened. > */ > if (relation->rd_isnailed) > { > RelationInitPhysicalAddr(relation); > > if (relation->rd_rel->relkind == RELKIND_INDEX || > relation->rd_rel->relkind == > RELKIND_PARTITIONED_INDEX) > { > relation->rd_isvalid = false; /* needs to be > revalidated */ > if (relation->rd_refcnt > 1 && > IsTransactionState()) > RelationReloadIndexInfo(relation); > } > return; > } > > Which basically means that once running we'll never update the relcache > data for nailed entries. That's unproblematic for most relcache fields, > but not for things like RelationData->rd_rel->relfrozenxid / relminmxid. > > This'll e.g. lead to lazy_vacuum_rel() wrongly not using aggressive > vacuums despite being required. And it'll lead, triggering this thread, > to wrong errors being raised during vacuum because relfrozenxid just is > some random value from the past. I suspect this might also be > co-responsible for a bunch of planning issues for queries involving the > catalog, because the planner will use wrong relcache data until the next > time the init file is thrown away? > > This looks like a very longstanding bug to me. I'm not yet quite sure > what the best way to deal with this is. I suspect we might get away > with just looking up a new version of the pg_class tuple and copying > rd_rel over? > > Greetings, > > Andres Freund > I have a question related to this - and specifically, preventing the error until we have a patch :). We are encountering this error every few weeks on one very high transaction db, and have to restart to fix it. If I read you correctly, the cache may never be invalidated for these catalogs even if I manually VACUUM them? I was thinking if I routinely run VACUUM FREEZE on these tables in every database I might avoid the issue. But given the cause of the issue, would that just make no difference and I will still hit the error eventually? Thanks, Jeremy