On 2014-12-15 11:30:40 -0500, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
> > On 2014-12-15 10:15:30 -0500, Tom Lane wrote:
> >> The CLOBBER_CACHE_ALWAYS buildfarm members occasionally fail in
> >> contrib/test_decoding with
> >> TRAP: FailedAssertion("!(((bool)((relation)->rd_refcnt == 0)))", File: 
> >> "relcache.c", Line: 1981)
> 
> > Without catchup invalidations and/or an outside reference to a relation
> > that's normally not a problem because it won't get reloaded from the
> > caches at an inappropriate time while invisible. Until a few weeks ago
> > there was no regression test covering that case which is why these
> > crashes are only there now.
> 
> I've always thought that this whole idea of allowing the caches to contain
> stale information was a Rube Goldberg plan that was going to bite you on
> the ass eventually.

I guess we'll see. I don't think this specific case is that dramatic.
The complication here is that there's a reference to a relation from
outside logical decoding - that's not something actually likely to be
used at least by replication solutions. It's also impossible to hit from
the walsender interface.

We could just prohibit referencing relations like that... The SQL
interface primarily is used for regression tests and discovery of the
feature. At least as long there's no way to do streaming from SQL which
I can't really see how to do.

> This case isn't doing anything to increase my faith
> in it, and the proposed patch seems like just another kluge on top of
> a rickety stack.

Another possibility would be to replace
else if (!IsTransactionState())
by
else if (!IsTransactionState() || HistoricSnapshotActive())

as that pretty much what we need - invalidations during decoding happen
either outside of a valid transaction state or when entries aren't
referenced anyway.

Btw, if ever hit that code path would Assert() out without logical
decoding as well - it's not allowed to Destroy() a relation that's still
referenced as that assert shows. It's an especially bad idea if the
reference is actually from outside a subxact and the error is caught. I
think we should just remove the Destroy() call - the normal refcount
and/or resowners will take care of deleting the entry later.

> I think the safest fix would be to defer catchup interrupt processing
> while you're in this mode.  You don't really want to be processing any
> remote sinval messages at all, I'd think.

Well, we need to do relmap, smgr and similar things. So I think that'd
be more complicated than we want.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to