On Thu, Apr 2, 2015 at 12:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Wed, Apr 1, 2015 at 7:05 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I've not fully tracked it down, but I think that the blame falls on the
>>> MVCC-snapshots-for-catalog-scans patch; it appears that it's trying to
>>> read pg_am's pg_class entry with a snapshot that's too old, possibly
>>> because it assumes that sinval signaling is alive which I think ain't so.
>
>> Hmm, sorry for the bug.  It looks to me like sinval signaling gets
>> started up at the beginning of InitPostgres().  Perhaps something like
>> this?
>
> Yeah, it does, so you'd think this would work.  I've now tracked down
> exactly what's happening, and it's this: while we're reading pg_authid
> during PerformAuthentication, we establish a catalog snapshot.  Later on,
> we read pg_database to find out what DB we're connecting to.  If any
> sinval messages for unshared system catalogs arrive at that point, they'll
> effectively be ignored, *because our MyDatabaseId is still zero and so
> doesn't match the incoming message*.  That's okay as far as the catcaches
> are concerned, cause they're empty, and the relcache only knows about
> shared catalogs so far.  But InvalidateCatalogSnapshot doesn't get called
> by LocalExecuteInvalidationMessage, and so we think we can plow ahead with
> the old snapshot.
>
> It looks to me like an appropriate fix would be as attached; thoughts?
> (I've tested this with a delay during relation lock acquisition and
> concurrent whole-database VACUUM FULLs, and so far been unable to break
> it.)

Hmm, that fix doesn't reach as far as what I did.  My proposal would
regard a catalog snapshot as immediately stale, so if we're asked for
a catalog snapshot multiple times before InitPostgres() is called,
we'll take a new one every time.  Your proposal invalidates them just
once, after setting up MyDatabaseId.  Assuming yours is safe, it's
better, because it invalidates less aggressively.

The only thing I'm worried about is that I think
PerformAuthentication() runs before InitPostgres(); sinval isn't
working at all at that point, even for shared catalogs.  If we were to
lock a relation, build a relcache entry, unlock it, and then do that
again, all before SharedInvalBackendInit(false) is called, what would
keep us from failing to notice an intervening invalidation?  Maybe
there is something, because otherwise this was probably broken even
before the MVCC-catalog-snapshots patch, but I don't know what it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to