2662": read only 0 of 8192 bytes

Andres Freund Wed, 29 Aug 2018 12:46:30 -0700

Hi,

On 2018-08-29 12:56:07 -0400, Tom Lane wrote:
> I wrote:
> > * We now recursively enter ScanPgRelation, which (again) needs to do a
> > search using pg_class_oid_index, so it (again) opens and locks that.
> > BUT: LockRelationOid sees that *this process already has share lock on
> > pg_class_oid_index*, so it figures it can skip AcceptInvalidationMessages.
> 
> BTW, I now have a theory for why we suddenly started seeing this problem
> in mid-June: commits a54e1f158 et al added a ScanPgRelation call where
> there had been none before (in RelationReloadNailed, for non-index rels).
> That didn't create the problem, but it probably increased the odds of
> seeing it happen.


Yea.  Doesn't explain why it's only really visible on the BF in
11/master though :/


> Also ... isn't the last "relation->rd_isvalid = true" in
> RelationReloadNailed wrong?  If it got cleared during ScanPgRelation,
> I do not think we want to believe that we got an up-to-date row.

I don't really think so - note how a normal relcache inval essentially
does the same. RelationClearRelation() first marks the entry as invalid,
then goes and builds a new entry that's *not* hooked into the hashtable
(therefore doesn't receive new invals), and then moves the contents
over. That overwrites rd_isvalid to true, as that's guaranteed to be set
by RelationBuildDesc(). During the move no new invalidations are
accepted.   So this really is just behaving equivalently.
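
To make the sequence above concrete, here is a minimal toy model of it in C. It is only a sketch under stated assumptions, not the actual relcache code: ToyEntry, toy_catalog_value, toy_build_desc() and toy_clear_relation() are hypothetical names invented for illustration, and the real RelationClearRelation() does considerably more (hash table handling, preserving some fields across the swap, error handling).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a relcache entry; NOT PostgreSQL's RelationData. */
typedef struct ToyEntry
{
    unsigned    oid;
    bool        is_valid;   /* plays the role of rd_isvalid */
    int         payload;    /* stands in for the cached catalog contents */
} ToyEntry;

/* Hypothetical "catalog" state that a rebuild reads from. */
static int  toy_catalog_value = 1;

/*
 * Build a fresh entry that is not hooked into any lookup table.
 * It is always marked valid, mirroring what the text says about
 * RelationBuildDesc().
 */
static ToyEntry
toy_build_desc(unsigned oid)
{
    ToyEntry    fresh;

    fresh.oid = oid;
    fresh.is_valid = true;
    fresh.payload = toy_catalog_value;
    return fresh;
}

/* Model of the rebuild path: invalidate, build separately, move over. */
static void
toy_clear_relation(ToyEntry *entry)
{
    ToyEntry    fresh;

    entry->is_valid = false;                /* 1: mark the old entry invalid */
    fresh = toy_build_desc(entry->oid);     /* 2: build an unhooked copy */
    assert(fresh.is_valid);                 /* the builder always sets this */
    *entry = fresh;                         /* 3: move contents; is_valid is true again */
}
```

In this model, changing toy_catalog_value and then calling toy_clear_relation() on an existing entry leaves the entry valid and carrying the new payload. The point is that validity is restored by the move in step 3, not by a separate check of whether anything changed during the rebuild.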

The harder question is why that's safe. I think I convinced myself that
it is a couple times over the years, but I don't think we've properly
documented it. As the header says:
 *              The following code contains many undocumented hacks.  Please be
 *              careful....

We definitely relied on RelationClearRelation() always returning a valid
record for a while, cf. RelationIdGetRelation()'s rd_isvalid assertion,
and the lack of a loop in that function.

(There's no coffee in this hotel at 4am. Shame.)

Ah, yes.  This assumption is currently safe because the locking on
relations being looked up had better guarantee that there are no
critical changes to relcache entries while the entry is being rebuilt.

I think we'd also run into trouble with CLOBBER_CACHE_RECURSIVELY etc.
without it.

Greetings,

Andres Freund

Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
