Andres Freund <and...@anarazel.de> writes:
> It's not OK to rebuild relcache entries in the middle of
> ReceiveSharedInvalidMessages() - a later entry in the invalidation queue
> might be relmapper invalidation, and thus immediately processing a
> relcache invalidation might attempt to scan a relation that does not
> exist anymore.

Check, but the bug is actually more subtle than that.  I've reproduced the
problem using your scripts, changed the ERROR to PANIC, and studied the
stack traces of a few core dumps; here's the scenario I find:

* Code tries to open some relation or other.  After acquiring lock, it
goes off to do AcceptInvalidationMessages (cf. LockRelationOid).  That
reads a relcache inval message for "t", and because "t"'s relcache entry
is already open, it decides to rebuild it immediately.

* Therefore it calls ScanPgRelation, which needs to do a search using
pg_class_oid_index, so it opens and locks that.  Again, having acquired
the lock in LockRelationOid, it goes off to do AcceptInvalidationMessages
(recursively).

* Now we read another relcache inval (for "t_pkey", in the cases I've
traced through in detail).  Again, that's open and we decide we'd better
do RelationReloadIndexInfo for it.

* We now recursively enter ScanPgRelation, which (again) needs to do a
search using pg_class_oid_index, so it (again) opens and locks that.
BUT: LockRelationOid sees that *this process already has share lock on
pg_class_oid_index*, so it figures it can skip AcceptInvalidationMessages.

Had we done AcceptInvalidationMessages at this point, we'd have found
the relcache inval message that's waiting for us about pg_class_oid_index,
and we'd have processed it before attempting to read pg_class_oid_index,
and all would be well.  It's the missed AcceptInvalidationMessages call,
due to the recursive entry to ScanPgRelation, that causes us to fail to
read all available inval messages before believing that our info about
pg_class_oid_index is up to date.
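
To make the skip concrete, here is a stripped-down sketch (not the actual
lmgr.c source; the locktag setup is simplified and shared catalogs are
ignored) of the lock-then-absorb-invals pattern in LockRelationOid.  The
conditional at the bottom is the step that gets skipped on the recursive
entry:

#include "postgres.h"
#include "miscadmin.h"          /* MyDatabaseId */
#include "storage/lock.h"       /* LOCKTAG, LockAcquire */
#include "utils/inval.h"        /* AcceptInvalidationMessages */

void
LockRelationOidSketch(Oid relid, LOCKMODE lockmode)
{
    LOCKTAG     tag;
    LockAcquireResult res;

    SET_LOCKTAG_RELATION(tag, MyDatabaseId, relid);

    res = LockAcquire(&tag, lockmode, false, false);

    /*
     * Absorb pending inval messages now that we hold the lock, unless we
     * already held a lock of this strength, in which case the current code
     * assumes our caches must already be in sync.  That assumption is what
     * fails when ScanPgRelation recurses while the outer call is still
     * partway through absorbing the queue.
     */
    if (res != LOCKACQUIRE_ALREADY_HELD)
        AcceptInvalidationMessages();
}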

> Instead, receiving a relcache invalidation now just queues an entry onto
> a new "pending rebuilds" list, that is then processed in a second stage,
> after emptying the entire sinval queue.  At that stage we're guaranteed
> that the relmapper is up2date.

More to the point, we're guaranteed that we've seen all inval messages
that were sent before we acquired whatever lock we just acquired.
(In the cases I've looked at, the local relmapper *is* up to date,
it's the relcache entry for pg_class_oid_index that is not.)
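
For anyone following along, the two-phase structure would presumably look
something like the sketch below.  This is only my rough rendering of the
idea, with invented names (QueueRelcacheRebuildSketch,
ProcessPendingRebuildsSketch), not the actual patch:

#include "postgres.h"
#include "nodes/pg_list.h"      /* List, lappend_oid, foreach */
#include "utils/relcache.h"     /* RelationCacheInvalidateEntry */

/* Phase 1: the sinval handler only records which entries need rebuilding. */
static List *pendingRebuilds = NIL;

static void
QueueRelcacheRebuildSketch(Oid relid)
{
    /* No catalog access here; just remember the OID. */
    pendingRebuilds = lappend_oid(pendingRebuilds, relid);
}

/*
 * Phase 2: run only after ReceiveSharedInvalidMessages() has emptied the
 * queue, so relmapper updates and every other inval sent before our lock
 * was granted have already been applied.
 */
static void
ProcessPendingRebuildsSketch(void)
{
    ListCell   *lc;

    foreach(lc, pendingRebuilds)
        RelationCacheInvalidateEntry(lfirst_oid(lc)); /* flush/rebuild now */

    list_free(pendingRebuilds);
    pendingRebuilds = NIL;
}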

My first gut feeling after understanding the recursion aspect was that
what you're suggesting would not be enough to be bulletproof.  We will
still have the hazard that while we're rebuilding relcache entries,
we may end up doing recursive ScanPgRelation calls and the inner ones
will decide they needn't call AcceptInvalidationMessages.  So we might
well still be rebuilding stuff without having seen every available
inval message.  But it should be OK, because any inval messages we could
possibly need to worry about will have been scanned and queued by the
outer call.

We could perhaps fix this with a less invasive change than what you
suggest here, by attacking the missed-call-due-to-recursion aspect
rather than monkeying with how relcache rebuild itself works.  But I'm
thinking this way might be a good idea anyhow, in order to reduce the
depth of recursion that occurs in scenarios like this.
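
One possible shape of that narrower fix, purely as a hypothetical sketch
(the two helpers below are invented, standing in for a per-LOCALLOCK flag
recording whether invals have been fully absorbed since the lock was
taken), would be to stop treating "already held" as sufficient reason to
skip the inval scan:

/*
 * Invented helpers for this sketch; they'd live with the LOCALLOCK
 * bookkeeping and track whether AcceptInvalidationMessages() has run to
 * completion since this lock was acquired.
 */
extern bool LocalLockInvalsAbsorbedSketch(const LOCKTAG *tag, LOCKMODE mode);
extern void MarkLocalLockInvalsAbsorbedSketch(const LOCKTAG *tag, LOCKMODE mode);

void
LockRelationOidFixedSketch(Oid relid, LOCKMODE lockmode)
{
    LOCKTAG     tag;
    LockAcquireResult res;

    SET_LOCKTAG_RELATION(tag, MyDatabaseId, relid);
    res = LockAcquire(&tag, lockmode, false, false);

    /*
     * Skip the inval scan only if we already held the lock *and* had
     * finished absorbing invals for it; a recursive entry that interrupted
     * an outer AcceptInvalidationMessages() no longer gets to skip.
     */
    if (res != LOCKACQUIRE_ALREADY_HELD ||
        !LocalLockInvalsAbsorbedSketch(&tag, lockmode))
    {
        AcceptInvalidationMessages();
        MarkLocalLockInvalsAbsorbedSketch(&tag, lockmode);
    }
}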

I've not read your patch yet, so no comment on specifics.

                        regards, tom lane