Jeevan Chalke <jeevan.cha...@enterprisedb.com> writes:
> We have observed a random server crash (FailedAssertion), while running few
> tests at our end. Stack-trace is attached.
>
> By looking at the stack-trace, and as discussed it with my team members;
> what we have observed that in SearchCatCacheList(), we are incrementing
> refcount and then decrementing it at the end. However for some reason, if
> we are in TRY() block (where we increment the refcount), and hit with any
> interrupt, we failed to decrement the refcount due to which later we get
> assertion failure.

Hm. So SearchCatCacheList has a PG_TRY block that is meant to release
those refcounts, but if you hit the backend with a SIGTERM while it's in
that function, control goes out through elog(FATAL) which doesn't execute
the PG_CATCH cleanup. But it does do AbortTransaction which calls
AtEOXact_CatCache, and that is expecting that all the cache refcounts have
reached zero.

We could respond to this by using PG_ENSURE_ERROR_CLEANUP there instead of
plain PG_TRY. But I have an itchy feeling that there may be a lot of
places with similar issues. Should we be revisiting the basic way that
elog(FATAL) works, to make it less unlike elog(ERROR)?

			regards, tom lane
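
For illustration only, the control flow described above can be mimicked in a small standalone C program. This is a sketch built on plain setjmp/longjmp, not PostgreSQL source: every name prefixed with sim_ is hypothetical, the single counter stands in for the catcache refcounts, and the exit callback is only a loose model of what PG_ENSURE_ERROR_CLEANUP arranges. It assumes that an ERROR jumps to the innermost handler while a FATAL bypasses handlers and goes straight to the exit path.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of this is PostgreSQL code. */
typedef enum { SIM_NONE, SIM_ERROR, SIM_FATAL } SimLevel;

static jmp_buf  sim_exit_point;                  /* models the exit/abort path */
static jmp_buf *sim_handler = NULL;              /* innermost "PG_TRY" handler */
static void   (*sim_exit_callback)(void) = NULL; /* models an exit-time cleanup */
static int      sim_refcount = 0;                /* models a cache refcount */

static void sim_release(void) { sim_refcount--; }

/* ERROR unwinds to the innermost handler; FATAL skips all handlers. */
static void sim_elog(SimLevel level)
{
    if (level == SIM_ERROR && sim_handler != NULL)
        longjmp(*sim_handler, 1);
    longjmp(sim_exit_point, 1);
}

/* Takes a temporary refcount, then releases it at the end or in the handler. */
static void sim_search_list(SimLevel level, bool ensure_cleanup)
{
    jmp_buf  local;
    jmp_buf *saved = sim_handler;

    sim_refcount++;
    if (ensure_cleanup)
        sim_exit_callback = sim_release;    /* also clean up on the exit path */

    if (setjmp(local) == 0)
    {
        sim_handler = &local;
        if (level != SIM_NONE)
            sim_elog(level);                /* the "interrupt" arrives here */
    }
    else
    {
        /* "PG_CATCH": reached for ERROR only, never for FATAL */
        sim_handler = saved;
        sim_exit_callback = NULL;
        sim_release();
        sim_elog(SIM_ERROR);                /* re-throw to the outer level */
    }

    /* Normal completion: drop the handler and the temporary refcount. */
    sim_handler = saved;
    sim_exit_callback = NULL;
    sim_release();
}

/* Returns the refcount that an end-of-transaction check would see. */
static int sim_run(SimLevel level, bool ensure_cleanup)
{
    sim_refcount = 0;
    sim_handler = NULL;
    sim_exit_callback = NULL;

    if (setjmp(sim_exit_point) == 0)
        sim_search_list(level, ensure_cleanup);
    else if (sim_exit_callback != NULL)
        sim_exit_callback();                /* exit path runs pending cleanup */

    return sim_refcount;
}
```

In this model, sim_run(SIM_FATAL, false) leaves the counter at 1, which corresponds to the leaked refcount that trips the assertion, while sim_run(SIM_ERROR, false) and sim_run(SIM_FATAL, true) both leave it at 0. Whether the real fix should take the exit-callback form is exactly the open question raised above.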