Re: [HACKERS] exec_execute_message crash
I tried this but didn't have any luck crashing the backend. libpq gets tremendously confused by the extra ReadyForQuery responses, which is unsurprising. The postmaster log shows LOG: could not send data to client: Broken pipe ERROR: relation foo does not exist at character 15 STATEMENT: SELECT * FROM foo ERROR: unnamed prepared statement does not exist ERROR: current transaction is aborted, commands ignored until end of transaction block ERROR: current transaction is aborted, commands ignored until end of transaction block STATEMENT: SELECT NULL , n.nspname, ct.relname, a.attname, a.attnum, ci.relname FROM pg_catalog.pg_namespace n, pg_catalog.pg_class ct, pg_catalog.pg_class ci, pg_catalog.pg_attribute a, pg_catalog.pg_index i WHERE ct.oid=i.indrelid AND ci.oid=i.indexrelid AND a.attrelid=ci.oid AND i.indisprimary AND ct.relname = 'mst_Ucompany_feature_setting' AND ct.relnamespace = n.oid AND n.nspname = 'foo' ORDER BY 1, 2, 3 So the unnamed prepared statement does not exist bit seems to be related to what you are talking about, but it doesn't actually fail. I have put some debugging codes to make sure that portal-cplan and portal-stmts belong to the same memory context by calling GetMemoryChunkContext and surely they did. It appears that the memory was surely deleted by MemeoryContextDelete in ReleaseCachedPlan. Also I defined CLOBBER_FREED_MEMORY in aset.c to fill with 0x7f the freed memory. Strange thing was portal-smts was not clobbered by 0x7f. It seems I have missed something here... -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exec_execute_message crash
Tatsuo Ishii is...@postgresql.org writes: Done. Inclded are C test program along with modified fe-exec.c. The modification made to fe-exec.c is sending Sync after Parse, Bind and Describe. Pgpool-II does this in order to get current transaction status. I tried this but didn't have any luck crashing the backend. libpq gets tremendously confused by the extra ReadyForQuery responses, which is unsurprising. The postmaster log shows LOG: could not send data to client: Broken pipe ERROR: relation foo does not exist at character 15 STATEMENT: SELECT * FROM foo ERROR: unnamed prepared statement does not exist ERROR: current transaction is aborted, commands ignored until end of transaction block ERROR: current transaction is aborted, commands ignored until end of transaction block STATEMENT: SELECT NULL , n.nspname, ct.relname, a.attname, a.attnum, ci.relname FROM pg_catalog.pg_namespace n, pg_catalog.pg_class ct, pg_catalog.pg_class ci, pg_catalog.pg_attribute a, pg_catalog.pg_index i WHERE ct.oid=i.indrelid AND ci.oid=i.indexrelid AND a.attrelid=ci.oid AND i.indisprimary AND ct.relname = 'mst_Ucompany_feature_setting' AND ct.relnamespace = n.oid AND n.nspname = 'foo' ORDER BY 1, 2, 3 So the unnamed prepared statement does not exist bit seems to be related to what you are talking about, but it doesn't actually fail. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exec_execute_message crash
While inspecting a complain from a pgpool user, I found that PostgreSQL crushes with following statck trace: #0 0x0826436a in list_length (l=0xaabe4e28) at ../../../src/include/nodes/pg_list.h:94 #1 0x08262168 in IsTransactionStmtList (parseTrees=0xaabe4e28) at postgres.c:2429 #2 0x0826132e in exec_execute_message (portal_name=0x857bab0 , max_rows=0) at postgres.c:1824 #3 0x08263b2a in PostgresMain (argc=4, argv=0x84f6c28, username=0x84f6b08 t-ishii) at postgres.c:3671 #4 0x0823299e in BackendRun (port=0x8511e68) at postmaster.c:3449 #5 0x08231f78 in BackendStartup (port=0x8511e68) at postmaster.c:3063 #6 0x0822f90a in ServerLoop () at postmaster.c:1387 #7 0x0822f131 in PostmasterMain (argc=3, argv=0x84f4bf8) at postmaster.c:1040 #8 0x081c6217 in main (argc=3, argv=0x84f4bf8) at main.c:188 Ok, I think I understand what's going on. parse bind describe execute This sequence of commands create cached plan in unnamed portal. $5 = {name = 0x8574de4 , prepStmtName = 0x0, heap = 0x8598400, resowner = 0x8598488, cleanup = 0x81632ca PortalCleanup, createSubid = 1, sourceText = 0x85ab818 SELECT omitted..., commandTag = 0x84682ca SELECT, stmts = 0xaabf43b0, cplan = 0xaabf4950, portalParams = 0x0, strategy = PORTAL_ONE_SELECT, cursorOptions = 4, status = PORTAL_READY, queryDesc = 0x85abc20, tupDesc = 0x85ddcb0, formats = 0x85abc68, holdStore = 0x0, holdContext = 0x0, atStart = 1 '\001', atEnd = 1 '\001', posOverflow = 0 '\0', portalPos = 0, creation_time = 315487957498169, visible = 1 '\001'} The cached plan(portal-cplan) and statements(portal-stmts) are created by exec_bind_message(): /* * Revalidate the cached plan; this may result in replanning. Any * cruft will be generated in MessageContext. The plan refcount will * be assigned to the Portal, so it will be released at portal * destruction. */ cplan = RevalidateCachedPlan(psrc, false); plan_list = cplan-stmt_list; Please note that cplan and stmts belong to the same memory context. Then following commands are coming: parse invalid SQL thus abort a transaction bind (error) describe (error) execute (crash) parse causes transaction to abort, which causes call to AbortCurrentTransaction-AbortTransaction-AtAbort_portals-ReleaseCachedPlan. It calls ReleaseCachePlan(portal-cplan). ReleaseCachePlan calls MemoryContextDelete(plan-context) which destroys both portal-cplan and portal-stmts. That was the reason why I had segfault by accessing portal-stmts. To fix this I think exec_execute_message should throw an error if portal-cleanup is NULL, since portal-cleanup is NULLed by AtAbort_Portals at transaction abort (or portal is dropped). Here is a suggested fix: diff -c postgres.c~ postgres.c *** postgres.c~ 2009-06-18 19:08:08.0 +0900 --- postgres.c 2009-12-30 21:34:49.0 +0900 *** *** 1804,1810 dest = DestRemoteExecute; portal = GetPortalByName(portal_name); ! if (!PortalIsValid(portal)) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_CURSOR), errmsg(portal \%s\ does not exist, portal_name))); --- 1804,1810 dest = DestRemoteExecute; portal = GetPortalByName(portal_name); ! if (!PortalIsValid(portal) || (PortalIsValid(portal) portal-cleanup == NULL)) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_CURSOR), errmsg(portal \%s\ does not exist, portal_name))); -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exec_execute_message crash
Tatsuo Ishii wrote: ! if (!PortalIsValid(portal) || (PortalIsValid(portal) portal-cleanup == NULL)) Surely the second call to PortalIsValid() is redundant. if (( !PortalIsValid(portal)) || portal-cleanup == NULL) should do it, no? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exec_execute_message crash
Tatsuo Ishii wrote: ! if (!PortalIsValid(portal) || (PortalIsValid(portal) portal-cleanup == NULL)) Surely the second call to PortalIsValid() is redundant. if (( !PortalIsValid(portal)) || portal-cleanup == NULL) should do it, no? Oops. You are right. -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exec_execute_message crash
Tatsuo Ishii is...@postgresql.org writes: parse causes transaction to abort, which causes call to AbortCurrentTransaction-AbortTransaction-AtAbort_portals-ReleaseCachedPlan. It calls ReleaseCachePlan(portal-cplan). ReleaseCachePlan calls MemoryContextDelete(plan-context) which destroys both portal-cplan and portal-stmts. That was the reason why I had segfault by accessing portal-stmts. To fix this I think exec_execute_message should throw an error if portal-cleanup is NULL, since portal-cleanup is NULLed by AtAbort_Portals at transaction abort (or portal is dropped). This is just a kluge, and a rather bad one I think. The real problem here is that AtAbort_Portals destroys the portal contents and doesn't do anything to record the fact. It should probably be putting the portal into PORTAL_FAILED state, and what exec_execute_message ought to be doing is checking for that. It might be a good idea to explicitly zero out the now-dangling pointers in the Portal struct, too. It'd be nice to have a test case for this, hint hint ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exec_execute_message crash
This is just a kluge, and a rather bad one I think. The real problem here is that AtAbort_Portals destroys the portal contents and doesn't do anything to record the fact. It should probably be putting the portal into PORTAL_FAILED state, and what exec_execute_message ought to be doing is checking for that. Yeah I thought about that too. in AtAbort_Portals: -- /* * Abort processing for portals. * * At this point we reset active status and run the cleanup hook if * present, but we can't release the portal's memory until the cleanup call. * * The reason we need to reset active is so that we can replace the unnamed * portal, else we'll fail to execute ROLLBACK when it arrives. */ void AtAbort_Portals(void) { HASH_SEQ_STATUS status; PortalHashEnt *hentry; hash_seq_init(status, PortalHashTable); while ((hentry = (PortalHashEnt *) hash_seq_search(status)) != NULL) { Portal portal = hentry-portal; if (portal-status == PORTAL_ACTIVE) portal-status = PORTAL_FAILED; -- Should I change the last if clause to? if (portal-status == PORTAL_ACTIVE || portal-status == PORTAL_READY) portal-status = PORTAL_FAILED; zero out the now-dangling pointers in the Portal struct, too. portal-cplan is already zero out by PortalReleaseCachedPlan. Problem is, portal-stmts may belong to PortalContext or others (in this particluar case). So if we want to zero out portal-stmts, we need to memorize the memory context which it belongs to and we need add a new struct member to portal. I'm afraid this is an overkill... It'd be nice to have a test case for this, hint hint ... Still working on... -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers