On Fri, May 25, 2018 at 3:37 PM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> Moving discussion to -hackers.  Tom, I think you worked most with this
> code, your input would be appreciated.
>
> Original discussion is around:
> http://archives.postgresql.org/message-id/20180524211311.tnswfnjwnii54htx%40alvherre.pgsql
>
> On 2018-05-24 17:13:11 -0400, Alvaro Herrera wrote:
> > On 2018-May-24, Andres Freund wrote:
> > > Then there's also:
> > > http://archives.postgresql.org/message-id/1527193504642.36340%40amazon.com
> >
> > ah, so deleting the relcache file makes the problem to go away?  That's
> > definitely pretty strange.  I see no reason for the value in relcache to
> > become out of step with the catalogued value in the same database ... I
> > don't think we transmit in any way values of one database to another.
>
> I can reproduce the issue. As far as I can tell we just don't ever
> actually update nailed relcache entries in the normal course, leaving
> the "physical address" aside.  VACUUM will, via
> vac_update_relstats() -> heap_inplace_update() ->
> CacheInvalidateHeapTuple(),
> send out an invalidation. But invalidation, in my case another session,
> will essentially ignore most of that due to:
>
> static void
> RelationClearRelation(Relation relation, bool rebuild)
> ...
>         /*
>          * Never, never ever blow away a nailed-in system relation, because we'd
>          * be unable to recover.  However, we must redo RelationInitPhysicalAddr
>          * in case it is a mapped relation whose mapping changed.
>          *
>          * If it's a nailed-but-not-mapped index, then we need to re-read the
>          * pg_class row to see if its relfilenode changed. We do that immediately
>          * if we're inside a valid transaction and the relation is open (not
>          * counting the nailed refcount).  Otherwise just mark the entry as
>          * possibly invalid, and it'll be fixed when next opened.
>          */
>         if (relation->rd_isnailed)
>         {
>                 RelationInitPhysicalAddr(relation);
>
>                 if (relation->rd_rel->relkind == RELKIND_INDEX ||
>                         relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
>                 {
>                         relation->rd_isvalid = false;   /* needs to be revalidated */
>                         if (relation->rd_refcnt > 1 && IsTransactionState())
>                                 RelationReloadIndexInfo(relation);
>                 }
>                 return;
>         }
>
> Which basically means that once running we'll never update the relcache
> data for nailed entries.  That's unproblematic for most relcache fields,
> but not for things like RelationData->rd_rel->relfrozenxid / relminmxid.
>
> This'll e.g. lead to lazy_vacuum_rel() wrongly not using aggressive
> vacuums despite being required. And it'll lead, triggering this thread,
> to wrong errors being raised during vacuum because relfrozenxid just is
> some random value from the past.  I suspect this might also be
> co-responsible for a bunch of planning issues for queries involving the
> catalog, because the planner will use wrong relcache data until the next
> time the init file is thrown away?
>
> This looks like a very longstanding bug to me.  I'm not yet quite sure
> what the best way to deal with this is.  I suspect we might get away
> with just looking up a new version of the pg_class tuple and copying
> rd_rel over?
>
> Greetings,
>
> Andres Freund
>
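
If I'm picturing the fix you're hinting at correctly, it would be something along these lines: when invalidating a nailed, non-index relation, re-read its pg_class row and copy the result over the cached one instead of skipping the rebuild entirely.  This is only my sketch of the idea - the function name is made up, and I'm going from memory on the relcache-internal helpers (ScanPgRelation(), CLASS_TUPLE_SIZE) - so please treat it as illustration rather than a patch:

        /* Hypothetical helper, called from the nailed branch of RelationClearRelation() */
        static void
        RelationRefreshNailed(Relation relation)
        {
                HeapTuple       pg_class_tuple;
                Form_pg_class   relp;

                /* Re-read the current pg_class row for this relation. */
                pg_class_tuple = ScanPgRelation(RelationGetRelid(relation), true, false);
                if (!HeapTupleIsValid(pg_class_tuple))
                        elog(ERROR, "could not find pg_class tuple for relation %u",
                                 RelationGetRelid(relation));
                relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);

                /*
                 * Overwrite the cached row wholesale, so fields such as relfrozenxid
                 * and relminmxid pick up the values written by VACUUM's inplace update.
                 */
                memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);

                heap_freetuple(pg_class_tuple);
        }

Presumably the catalog scan would need the usual care about being inside a valid transaction, like the existing nailed-index path has, but hopefully the shape of it is right.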

I have a question related to this - specifically, about preventing the error
until we have a patch :).  We are encountering this error every few weeks on
one very high-transaction db, and we have to restart to fix it.

If I read you correctly, the cached entries for these catalogs may never
actually be refreshed even if I manually VACUUM them?  I was thinking that if
I routinely ran VACUUM FREEZE on these tables in every database I might avoid
the issue.  But given the cause of the issue, would that make no difference,
meaning I will still hit the error eventually?
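
Part of why I suspect it wouldn't help is that, if I'm reading things right, vacuum bases its decisions on the cached pg_class fields rather than on anything it re-reads - roughly like this, paraphrasing lazy_vacuum_rel() from memory rather than quoting it exactly:

        /* whether to do an aggressive (freezing) scan, judged from the cached values */
        aggressive = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
                                                   xidFullScanLimit);
        aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
                                                  mxactFullScanLimit);

So if a long-lived session never refreshes rd_rel->relfrozenxid, a VACUUM FREEZE issued from another session wouldn't change what that session's checks see, which is what makes me think the workaround may not buy anything.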

Thanks,
Jeremy
