Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
On Sun, Aug 13, 2017 at 3:05 PM, Tom Lane wrote:
> Not having heard anyone arguing against that, I'll go make it so,
> ie AtEOXact_CatCache is toast in all branches.

Great, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
Robert Haas writes:
> On Tue, Aug 8, 2017 at 10:48 PM, Tom Lane wrote:
>> Robert Haas writes:
>>> On Tue, Aug 8, 2017 at 4:34 PM, Tom Lane wrote:
>>>> In the meantime, I think my vote would be to remove AtEOXact_CatCache.

>>> In all supported branches?

>> Whatever we do about this issue, I don't feel a need to do it further
>> back than HEAD.  It's a non-problem except in an assert-enabled build,
>> and we don't recommend running those for production, only development.

> Sure, but people still do testing and development against older
> branches - bug fixes, for example.  It doesn't make much sense to me
> to leave code that we know does the wrong thing in the back branches.

Not having heard anyone arguing against that, I'll go make it so,
ie AtEOXact_CatCache is toast in all branches.

			regards, tom lane
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
Andreas Seltenreich writes:
> Tom Lane writes:
>> I wonder if Andreas would be interested in trying the randomly-timed-
>> SIGTERM thing with sqlsmith.

> So far, most of the core dumps generated are Jeevan's assertion failing
> with backtraces through SearchCatCacheList.  The rest is failing this
> assertion:
> TRAP: FailedAssertion("!(portal->cleanup == ((void *)0))", File: "portalmem.c", Line: 846)
> Example backtrace below.  They all happened during a rollback statement.
> Testing was done on master at 2336f84284.

Interesting.  I can reproduce this by forcing a FATAL exit right there,
eg by adding

    if (pstmt->utilityStmt &&
        IsA(pstmt->utilityStmt, TransactionStmt) &&
        ((TransactionStmt *) pstmt->utilityStmt)->kind == TRANS_STMT_ROLLBACK)
        InterruptPending = ProcDiePending = true;

before PortalRunMulti's CHECK_FOR_INTERRUPTS.  But it only fails if the
rollback is trying to exit a transaction that's already suffered an error.
The explanation seems to be:

1. Because we already aborted the transaction, xact.c is in state
blockState == TBLOCK_ABORT.

2. This causes AbortOutOfAnyTransaction to think that it can skip doing
AbortTransaction() and go straight to CleanupTransaction().

3. AtCleanup_Portals() is expecting that we already ran, and cleared,
the portal cleanup hook (PortalCleanup) for every live portal.  But we
have not done so for the active portal that we were in the midst of
running ROLLBACK in.  So it fails the mentioned assertion.

Thus, the basic problem is that we get to CleanupTransaction() without
having done PortalDrop on the portal we're running ROLLBACK in.  We could
take this as an argument for simply removing that assertion, which would
mean that when AtCleanup_Portals calls PortalDrop, the cleanup hook would
get run, the same as it would have if exec_simple_query had had a chance
to drop the portal.  However, I'm still pretty afraid of allowing
arbitrary code to execute during CleanupTransaction().
What seems like a better idea is to make AtCleanup_Portals just summarily
clear the cleanup hook, perhaps while emitting a warning.  I tried that
(see portalmem.c changes in attached patch), and what I got was this:

regression=# rollback;
WARNING:  skipping cleanup for portal ""
FATAL:  terminating connection due to administrator command
FATAL:  cannot drop active portal ""
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

or in the server log:

2017-08-13 13:49:50.312 EDT [8737] FATAL:  terminating connection due to administrator command
2017-08-13 13:49:50.312 EDT [8737] STATEMENT:  rollback;
2017-08-13 13:49:50.312 EDT [8737] WARNING:  skipping cleanup for portal ""
2017-08-13 13:49:50.312 EDT [8737] FATAL:  cannot drop active portal ""

Well, we could tolerate that maybe, but it doesn't seem like it's
actually acting nicely; the active portal is still creating problems
for us.

After some further thought, I decided to try making
AbortOutOfAnyTransaction call AtAbort_Portals() in this situation, thus
basically doing the minimum part of AbortTransaction() needed to clean
up the active portal.  That almost worked --- my initial try got an
assertion failure in mcxt.c, because somebody was trying to drop the
CurrentMemoryContext.  So the minimum part of AbortTransaction that we
need here is really AtAbort_Memory + AtAbort_Portals.  After further
thought I decided it'd be a good idea to phrase that as an unconditional
AtAbort_Memory at the top of AbortOutOfAnyTransaction, thus making sure
we are in some valid context to start with; and then, in case the loop
itself doesn't have anything to do, we need AtCleanup_Memory() at the
bottom of the function to revert CurrentMemoryContext to
TopMemoryContext.

In short, the attached patch seems to fix it nicely.
We definitely need the xact.c changes, but the ones in portalmem.c are
perhaps optional, as in theory now the assertion will never be violated.
But I'm inclined to keep the portalmem changes anyway, as that will make
it more robust if the situation ever happens in a non-assert build.

Note: there are a couple of comments elsewhere in portalmem.c that need
adjustment if we keep the portalmem changes.  I have not bothered to do
that yet, as it's just cosmetic.

Comments?

			regards, tom lane

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 50c3c3b..1eabc67 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*** AbortOutOfAnyTransaction(void)
*** 4233,4238 ****
--- 4233,4241 ----
  {
  	TransactionState s = CurrentTransactionState;
  
+ 	/* Ensure we're not running in a doomed memory context */
+ 	AtAbort_Memory();
+ 
  	/*
  	 * Get out of any transaction or nested transaction
  	 */
***
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
Tom Lane writes:
> I wonder if Andreas would be interested in trying the randomly-timed-
> SIGTERM thing with sqlsmith.

So far, most of the core dumps generated are Jeevan's assertion failing
with backtraces through SearchCatCacheList.  The rest is failing this
assertion:

TRAP: FailedAssertion("!(portal->cleanup == ((void *)0))", File: "portalmem.c", Line: 846)

Example backtrace below.  They all happened during a rollback statement.
Testing was done on master at 2336f84284.

regards,
Andreas

Core was generated by `postgres: smith regression [local] ROLLBACK '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x7f4c26d3240a in __GI_abort () at abort.c:89
#2  0x559d18897a73 in ExceptionalCondition (conditionName=conditionName@entry=0x559d18a81370 "!(portal->cleanup == ((void *)0))", errorType=errorType@entry=0x559d188e3f7d "FailedAssertion", fileName=fileName@entry=0x559d18a81013 "portalmem.c", lineNumber=lineNumber@entry=846) at assert.c:54
#3  0x559d188c42f1 in AtCleanup_Portals () at portalmem.c:846
#4  0x559d18536cb7 in CleanupTransaction () at xact.c:2652
#5  0x559d1853b825 in AbortOutOfAnyTransaction () at xact.c:4278
#6  0x559d188a7799 in ShutdownPostgres (code=<optimized out>, arg=<optimized out>) at postinit.c:1146
#7  0x559d1876b4e9 in shmem_exit (code=code@entry=1) at ipc.c:228
#8  0x559d1876b5fa in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#9  0x559d1876b688 in proc_exit (code=code@entry=1) at ipc.c:102
#10 0x559d188999b1 in errfinish (dummy=<optimized out>) at elog.c:543
#11 0x559d1878fefa in ProcessInterrupts () at postgres.c:2841
#12 0x559d18790829 in ProcessInterrupts () at postgres.c:2828
#13 0x559d18795395 in PortalRunMulti (portal=portal@entry=0x559d197f2bf0, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x559d19850c40, altdest=altdest@entry=0x559d19850c40, completionTag=completionTag@entry=0x7ffc04f1b560 "") at pquery.c:1239
#14 0x559d18796069 in PortalRun (portal=portal@entry=0x559d197f2bf0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0x559d19850c40, altdest=altdest@entry=0x559d19850c40, completionTag=0x7ffc04f1b560 "") at pquery.c:799
#15 0x559d18791dca in exec_simple_query (query_string=0x559d1984fe00 "ROLLBACK;") at postgres.c:1099
#16 0x559d18793af1 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x559d197fa078, dbname=<optimized out>, username=<optimized out>) at postgres.c:4090
#17 0x559d184a3428 in BackendRun (port=0x559d197e8f00) at postmaster.c:4357
#18 BackendStartup (port=0x559d197e8f00) at postmaster.c:4029
#19 ServerLoop () at postmaster.c:1753
#20 0x559d1871ad65 in PostmasterMain (argc=3, argv=0x559d197be5a0) at postmaster.c:1361
#21 0x559d184a4a6d in main (argc=3, argv=0x559d197be5a0) at main.c:228
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
On Thu, Aug 10, 2017 at 9:00 PM, Robert Haas wrote:
> On Thu, Aug 10, 2017 at 2:50 AM, Andreas Seltenreich wrote:
>> Will do.  Won't miss this chance to try out discostu's extension
>> pg_rage_terminator[1] :-)
>> [1] https://github.com/disco-stu/pg_rage_terminator
>
> Oh, that's *awesome*.

You can do that at query level using the planner hook:
https://github.com/michaelpq/pg_plugins/tree/master/pg_panic
The PANIC should be lowered to a FATAL though.

--
Michael
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
On Tue, Aug 8, 2017 at 10:48 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Aug 8, 2017 at 4:34 PM, Tom Lane wrote:
>>> In the meantime, I think my vote would be to remove AtEOXact_CatCache.
>
>> In all supported branches?
>
> Whatever we do about this issue, I don't feel a need to do it further
> back than HEAD.  It's a non-problem except in an assert-enabled build,
> and we don't recommend running those for production, only development.

Sure, but people still do testing and development against older
branches - bug fixes, for example.  It doesn't make much sense to me
to leave code that we know does the wrong thing in the back branches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
On Thu, Aug 10, 2017 at 2:50 AM, Andreas Seltenreich wrote:
> Will do.  Won't miss this chance to try out discostu's extension
> pg_rage_terminator[1] :-)
> [1] https://github.com/disco-stu/pg_rage_terminator

Oh, that's *awesome*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
Tom Lane writes:
> I wonder if Andreas would be interested in trying the randomly-timed-
> SIGTERM thing with sqlsmith.

Will do.  Won't miss this chance to try out discostu's extension
pg_rage_terminator[1] :-)

regards,
Andreas

Footnotes:
[1] https://github.com/disco-stu/pg_rage_terminator
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
Robert Haas writes:
> On Tue, Aug 8, 2017 at 4:34 PM, Tom Lane wrote:
>> In the meantime, I think my vote would be to remove AtEOXact_CatCache.

> In all supported branches?

Whatever we do about this issue, I don't feel a need to do it further
back than HEAD.  It's a non-problem except in an assert-enabled build,
and we don't recommend running those for production, only development.

			regards, tom lane
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
On Tue, Aug 8, 2017 at 4:34 PM, Tom Lane wrote:
> In the meantime, I think my vote would be to remove AtEOXact_CatCache.

In all supported branches?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
Robert Haas writes:
> On Tue, Aug 8, 2017 at 12:26 PM, Tom Lane wrote:
>> Yeah, I thought about weakening the assertions too, but I couldn't
>> see a fix of that kind that didn't seem mighty ad-hoc.

> More concretely, the present example seems no different than the
> ResourceOwner stuff emitting warnings on commit and remaining silent
> on abort.  We could make it complain on both commit and abort, but
> then it would fail spuriously because there's no other mechanism to
> release resources in the abort path, so we don't.  Similarly here, we
> have every reason to expect the check to pass during ERROR recovery
> but there is no reason to expect it to pass during FATAL recovery, yet
> as coded we will do the test anyway.  That's wrong.

I think that's arguing from expedience not principle.  We had every
reason to think it would pass during FATAL errors too, until we noticed
this specific misbehavior; and there is at least as much of an argument
that this is a bug in SearchCatCacheList as there is that the check is
too strong.

>> Now, there is some room to argue that AtEOXact_CatCache() is just
>> obsolete and we should remove it altogether; I don't think it's
>> caught a real bug since we invented resowners in 8.1.

> Yeah, the same thought crossed my mind.  IIUC, we'd crash if a
> catcache reference were acquired without CurrentResourceOwner being
> valid, so this is really just a belt-and-suspenders check.

Right.  It was worth keeping it around till we were sure all the bugs
were shaken out of ResourceOwners, but surely we crossed the point of
diminishing returns for that long ago.

> ... I'm also not deadly opposed to
> redesigning the whole mechanism either, but I think that should be
> approached with a measure of caution: it might end up reducing
> reliability rather than increasing it.  I suggest in any case that we
> start with a surgical fix.

In the absence of clear evidence that there are similar bugs elsewhere,
I agree that redesigning FATAL exits would likely cause more problems
than it solves.  But I feel like it would be a good thing to test for.
I wonder if Andreas would be interested in trying the randomly-timed-
SIGTERM thing with sqlsmith.

In the meantime, I think my vote would be to remove AtEOXact_CatCache.

			regards, tom lane
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
On Tue, Aug 8, 2017 at 12:26 PM, Tom Lane wrote:
> Yeah, I thought about weakening the assertions too, but I couldn't
> see a fix of that kind that didn't seem mighty ad-hoc.

I don't really see what's ad-hoc about skipping it in the case where a
FATAL is in progress.  I mean, skipping a sanity check only in the
cases where we know it might pass - and are OK with the fact that it
might not pass - seems to me to be an extremely difficult policy to
argue against on rational grounds.  That's just the definition of
writing correct sanity checks.

More concretely, the present example seems no different than the
ResourceOwner stuff emitting warnings on commit and remaining silent
on abort.  We could make it complain on both commit and abort, but
then it would fail spuriously because there's no other mechanism to
release resources in the abort path, so we don't.  Similarly here, we
have every reason to expect the check to pass during ERROR recovery
but there is no reason to expect it to pass during FATAL recovery, yet
as coded we will do the test anyway.  That's wrong.

> Now, there is some room to argue that AtEOXact_CatCache() is just
> obsolete and we should remove it altogether; I don't think it's
> caught a real bug since we invented resowners in 8.1.

Yeah, the same thought crossed my mind.  IIUC, we'd crash if a
catcache reference were acquired without CurrentResourceOwner being
valid, so this is really just a belt-and-suspenders check.

> But, again, the larger question to my mind is where else we may have
> similar issues.  There's certainly never been any methodical attempt
> to see whether elog(FATAL) will work everywhere.

IIRC, you raised a similar objection back when Bruce added
pg_terminate_backend(), which caused that change to be briefly reverted
before ultimately being put back.  Despite the hardening you did back
then, I think it's highly likely that bugs remain in that path, and I
am of course not opposed to trying to improve the situation.

That having been said, the bugs that remain are probably mostly quite
low-probability, because otherwise we'd have found them by now.  I
think parallel query is likely to flush out a few of the ones that
remain by creating a new class of backends that terminate after a
single query and may get killed by the leader at any time; that's how
we discovered this issue.  Fuzz testing could be done, too, e.g. run
something like sqlsmith simultaneously in many sessions while killing
off backends at random.  I'm also not deadly opposed to redesigning the
whole mechanism either, but I think that should be approached with a
measure of caution: it might end up reducing reliability rather than
increasing it.  I suggest in any case that we start with a surgical
fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
Robert Haas writes:
> On Tue, Aug 8, 2017 at 11:36 AM, Tom Lane wrote:
>> We could respond to this by using PG_ENSURE_ERROR_CLEANUP there instead
>> of plain PG_TRY.  But I have an itchy feeling that there may be a lot
>> of places with similar issues.  Should we be revisiting the basic way
>> that elog(FATAL) works, to make it less unlike elog(ERROR)?

> ... Arguably, this assertion is merely overzealous; we could skip
> this processing when proc_exit_inprogress, for example.

Yeah, I thought about weakening the assertions too, but I couldn't
see a fix of that kind that didn't seem mighty ad-hoc.

Now, there is some room to argue that AtEOXact_CatCache() is just
obsolete and we should remove it altogether; I don't think it's
caught a real bug since we invented resowners in 8.1.  Short of that,
though, I'm not really happy with just arbitrarily weakening the
checks.

But, again, the larger question to my mind is where else we may have
similar issues.  There's certainly never been any methodical attempt
to see whether elog(FATAL) will work everywhere.

			regards, tom lane
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
On Tue, Aug 8, 2017 at 11:36 AM, Tom Lane wrote:
>> By looking at the stack-trace, and as discussed it with my team members;
>> what we have observed that in SearchCatCacheList(), we are incrementing
>> refcount and then decrementing it at the end.  However for some reason,
>> if we are in TRY() block (where we increment the refcount), and hit with
>> any interrupt, we failed to decrement the refcount due to which later we
>> get assertion failure.
>
> Hm.  So SearchCatCacheList has a PG_TRY block that is meant to release
> those refcounts, but if you hit the backend with a SIGTERM while it's
> in that function, control goes out through elog(FATAL) which doesn't
> execute the PG_CATCH cleanup.  But it does do AbortTransaction which
> calls AtEOXact_CatCache, and that is expecting that all the cache
> refcounts have reached zero.
>
> We could respond to this by using PG_ENSURE_ERROR_CLEANUP there instead
> of plain PG_TRY.  But I have an itchy feeling that there may be a lot
> of places with similar issues.  Should we be revisiting the basic way
> that elog(FATAL) works, to make it less unlike elog(ERROR)?

I'm not sure what the technically best fix here is.  It strikes me that
the charter of FATAL ought to be to exit the backend cleanly, safely,
and expeditiously.  PG_ENSURE_ERROR_CLEANUP, or some similar mechanism,
is necessary when we need to correct the contents of shared memory so
that other backends can continue to function, but there's no such
problem here.  You're proposing to do more work in the FATAL pathway so
that the debugging cross-checks in the FATAL pathway will pass, which
is not an entirely palatable idea.  It's definitely frustrating that
the ERROR and FATAL pathways are so different - that generated a
surprising amount of the work around DSM - but I'm still skeptical
about a solution that involves doing more cleanup of what are
essentially irrelevant backend-local data structures in the FATAL path.

Arguably, this assertion is merely overzealous; we could skip this
processing when proc_exit_inprogress, for example.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
Jeevan Chalke writes:
> We have observed a random server crash (FailedAssertion), while running
> few tests at our end.  Stack-trace is attached.

> By looking at the stack-trace, and as discussed it with my team members;
> what we have observed that in SearchCatCacheList(), we are incrementing
> refcount and then decrementing it at the end.  However for some reason,
> if we are in TRY() block (where we increment the refcount), and hit with
> any interrupt, we failed to decrement the refcount due to which later we
> get assertion failure.

Hm.  So SearchCatCacheList has a PG_TRY block that is meant to release
those refcounts, but if you hit the backend with a SIGTERM while it's
in that function, control goes out through elog(FATAL) which doesn't
execute the PG_CATCH cleanup.  But it does do AbortTransaction which
calls AtEOXact_CatCache, and that is expecting that all the cache
refcounts have reached zero.

We could respond to this by using PG_ENSURE_ERROR_CLEANUP there instead
of plain PG_TRY.  But I have an itchy feeling that there may be a lot
of places with similar issues.  Should we be revisiting the basic way
that elog(FATAL) works, to make it less unlike elog(ERROR)?

			regards, tom lane
[HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling
Hi,

We have observed a random server crash (FailedAssertion), while running
few tests at our end.  Stack-trace is attached.

By looking at the stack-trace, and as discussed it with my team members;
what we have observed that in SearchCatCacheList(), we are incrementing
refcount and then decrementing it at the end.  However for some reason,
if we are in TRY() block (where we increment the refcount), and hit with
any interrupt, we failed to decrement the refcount due to which later we
get assertion failure.

To mimic the scenario, I have added a sleep in SearchCatCacheList() as
given below:

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index e7e8e3b..eb6d4b5 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1520,6 +1520,9 @@ SearchCatCacheList(CatCache *cache,
 		hashValue = CatalogCacheComputeTupleHashValue(cache, ntp);
 		hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets);
 
+		elog(INFO, "Sleeping for 0.1 seconds.");
+		pg_usleep(100000L);		/* 0.1 seconds */
+
 		bucket = &cache->cc_bucket[hashIndex];
 		dlist_foreach(iter, bucket)
 		{

And then followed these steps to get a server crash:

-- Terminal 1
DROP TYPE typ;
DROP FUNCTION func(x int);
CREATE TYPE typ AS (X VARCHAR(50), Y INT);
CREATE OR REPLACE FUNCTION func(x int) RETURNS int AS $$
DECLARE
  rec typ;
  var2 numeric;
BEGIN
  RAISE NOTICE 'Function Called.';
  REC.X := 'Hello';
  REC.Y := 0;
  IF (rec.Y + var2) = 0 THEN
    RAISE NOTICE 'Check Pass';
  END IF;
  RETURN 1;
END;
$$ LANGUAGE plpgsql;

SELECT pg_backend_pid();
SELECT func(1);

-- Terminal 2, should be run in parallel when SELECT func(1) is in
-- progress in terminal 1.
SELECT pg_terminate_backend(<pid>);

I thought it worth posting here to get others attention.

I have observed this on the master branch, but can also be reproducible
on back-branches.

Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Program terminated with signal 6, Aborted.
#0  0x7fdff78951d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install audit-libs-2.6.5-3.el7_3.1.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libcap-ng-0.7.5-4.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libicu-50.1.2-15.el7.x86_64 libselinux-2.5-6.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 nspr-4.13.1-1.0.el7_3.x86_64 nss-3.28.2-1.6.el7_3.x86_64 nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 nss-util-3.28.2-1.1.el7_3.x86_64 openldap-2.4.40-13.el7.x86_64 openssl-libs-1.0.1e-60.el7_3.1.x86_64 pam-1.1.8-18.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x7fdff78951d7 in raise () from /lib64/libc.so.6
#1  0x7fdff78968c8 in abort () from /lib64/libc.so.6
#2  0x009d43ff in ExceptionalCondition (conditionName=0xc02255 "!(ct->refcount == 0)", errorType=0xc020b4 "FailedAssertion", fileName=0xc02060 "catcache.c", lineNumber=567) at assert.c:54
#3  0x009b59e0 in AtEOXact_CatCache (isCommit=0 '\000') at catcache.c:567
#4  0x00538a8d in AbortTransaction () at xact.c:2613
#5  0x0053a95e in AbortOutOfAnyTransaction () at xact.c:4271
#6  0x009e8e99 in ShutdownPostgres (code=1, arg=0) at postinit.c:1146
#7  0x008456e1 in shmem_exit (code=1) at ipc.c:228
#8  0x008455d5 in proc_exit_prepare (code=1) at ipc.c:185
#9  0x00845543 in proc_exit (code=1) at ipc.c:102
#10 0x009d4bae in errfinish (dummy=0) at elog.c:543
#11 0x008761b4 in ProcessInterrupts () at postgres.c:2882
#12 0x009d4bea in errfinish (dummy=0) at elog.c:565
#13 0x009d7097 in elog_finish (elevel=17, fmt=0xc02678 "Sleeping for 0.1 seconds.") at elog.c:1378
#14 0x009b765b in SearchCatCacheList (cache=0x27636d0, nkeys=1, v1=11604184, v2=0, v3=0, v4=0) at catcache.c:1523
#15 0x009cbd20 in SearchSysCacheList (cacheId=37, nkeys=1, key1=11604184, key2=0, key3=0, key4=0) at syscache.c:1338
#16 0x0056e22e in OpernameGetCandidates (names=0x27f3b08, oprkind=98 'b', missing_schema_ok=0 '\000') at namespace.c:1576
#17 0x005f09a6 in oper (pstate=0x27f3ec0, opname=0x27f3b08, ltypeId=23, rtypeId=1700, noError=0 '\000', location=14) at parse_oper.c:414
#18 0x005f132d in make_op (pstate=0x27f3ec0, opname=0x27f3b08, ltree=0x27f40f0, rtree=0x27f4128, last_srf=0x0, location=14) at parse_oper.c:778
#19 0x005e5e8d in transformAExprOp (pstate=0x27f3ec0, a=0x27f3a60) at parse_expr.c:933
#20 0x005e454e in transformExprRecurse (pstate=0x27f3ec0, expr=0x27f3a60) at parse_expr.c:217
#21 0x005e5e44 in transformAExprOp (pstate=0x27f3ec0, a=0x27f3b78) at parse_expr.c:930
#22 0x005e454e in transformExprRecurse (pstate=0x27f3ec0,