On Tue, Aug 8, 2017 at 11:36 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> By looking at the stack-trace, and as discussed it with my team members; >> what we have observed that in SearchCatCacheList(), we are incrementing >> refcount and then decrementing it at the end. However for some reason, if >> we are in TRY() block (where we increment the refcount), and hit with any >> interrupt, we failed to decrement the refcount due to which later we get >> assertion failure. > > Hm. So SearchCatCacheList has a PG_TRY block that is meant to release > those refcounts, but if you hit the backend with a SIGTERM while it's > in that function, control goes out through elog(FATAL) which doesn't > execute the PG_CATCH cleanup. But it does do AbortTransaction which > calls AtEOXact_CatCache, and that is expecting that all the cache > refcounts have reached zero. > > We could respond to this by using PG_ENSURE_ERROR_CLEANUP there instead > of plain PG_TRY. But I have an itchy feeling that there may be a lot > of places with similar issues. Should we be revisiting the basic way > that elog(FATAL) works, to make it less unlike elog(ERROR)?
I'm not sure what the technically best fix here is. It strikes me that the charter of FATAL ought to be to exit the backend cleanly, safely, and expeditiously. PG_ENSURE_ERROR_CLEANUP, or some similar mechanism, is necessary when we need to correct the contents of shared memory so that other backends can continue to function, but there's no such problem here. You're proposing to do more work in the FATAL pathway so that the debugging cross-checks in the FATAL pathway will pass, which is not an entirely palatable idea. It's definitely frustrating that the ERROR and FATAL pathways are so different - that generated a surprising amount of the work around DSM - but I'm still skeptical about a solution that involves doing more cleanup of what are essentially irrelevant backend-local data structures in the FATAL path. Arguably, this assertion is merely overzealous; we could skip this processing when proc_exit_inprogress, for example. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers