Re: [HACKERS] exec_execute_message crash

2010-01-03 Thread Tatsuo Ishii
 I tried this but didn't have any luck crashing the backend.  libpq gets
 tremendously confused by the extra ReadyForQuery responses, which is
 unsurprising.  The postmaster log shows
 
 LOG:  could not send data to client: Broken pipe
 ERROR:  relation foo does not exist at character 15
 STATEMENT:  SELECT * FROM foo
 ERROR:  unnamed prepared statement does not exist
 ERROR:  current transaction is aborted, commands ignored until end of 
 transaction block
 ERROR:  current transaction is aborted, commands ignored until end of 
 transaction block
 STATEMENT:  SELECT NULL , n.nspname,  ct.relname,  a.attname,  a.attnum,  
 ci.relname FROM pg_catalog.pg_namespace n, pg_catalog.pg_class ct, 
 pg_catalog.pg_class ci, pg_catalog.pg_attribute a, pg_catalog.pg_index i  
 WHERE ct.oid=i.indrelid AND ci.oid=i.indexrelid  AND a.attrelid=ci.oid AND 
 i.indisprimary  AND ct.relname = 'mst_Ucompany_feature_setting'  AND 
 ct.relnamespace = n.oid  AND n.nspname = 'foo' ORDER BY 1, 2, 3
 
 So the unnamed prepared statement does not exist bit seems to be
 related to what you are talking about, but it doesn't actually fail.

I have put some debugging codes to make sure that portal-cplan and
portal-stmts belong to the same memory context by calling
GetMemoryChunkContext and surely they did. It appears that the memory
was surely deleted by MemeoryContextDelete in ReleaseCachedPlan. Also
I defined CLOBBER_FREED_MEMORY in aset.c to fill with 0x7f the freed
memory. Strange thing was portal-smts was not clobbered by 0x7f.
It seems I have missed something here...
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-31 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 Done. Inclded are C test program along with modified fe-exec.c.

 The modification made to fe-exec.c is sending Sync after Parse, Bind
 and Describe. Pgpool-II does this in order to get current transaction
 status.

I tried this but didn't have any luck crashing the backend.  libpq gets
tremendously confused by the extra ReadyForQuery responses, which is
unsurprising.  The postmaster log shows

LOG:  could not send data to client: Broken pipe
ERROR:  relation foo does not exist at character 15
STATEMENT:  SELECT * FROM foo
ERROR:  unnamed prepared statement does not exist
ERROR:  current transaction is aborted, commands ignored until end of 
transaction block
ERROR:  current transaction is aborted, commands ignored until end of 
transaction block
STATEMENT:  SELECT NULL , n.nspname,  ct.relname,  a.attname,  a.attnum,  
ci.relname FROM pg_catalog.pg_namespace n, pg_catalog.pg_class ct, 
pg_catalog.pg_class ci, pg_catalog.pg_attribute a, pg_catalog.pg_index i  WHERE 
ct.oid=i.indrelid AND ci.oid=i.indexrelid  AND a.attrelid=ci.oid AND 
i.indisprimary  AND ct.relname = 'mst_Ucompany_feature_setting'  AND 
ct.relnamespace = n.oid  AND n.nspname = 'foo' ORDER BY 1, 2, 3

So the unnamed prepared statement does not exist bit seems to be
related to what you are talking about, but it doesn't actually fail.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-30 Thread Tatsuo Ishii
 While inspecting a complain from a pgpool user, I found that
 PostgreSQL crushes with following statck trace:
 
 #0  0x0826436a in list_length (l=0xaabe4e28)
 at ../../../src/include/nodes/pg_list.h:94
 #1  0x08262168 in IsTransactionStmtList (parseTrees=0xaabe4e28)
 at postgres.c:2429
 #2  0x0826132e in exec_execute_message (portal_name=0x857bab0 , max_rows=0)
 at postgres.c:1824
 #3  0x08263b2a in PostgresMain (argc=4, argv=0x84f6c28,
 username=0x84f6b08 t-ishii) at postgres.c:3671
 #4  0x0823299e in BackendRun (port=0x8511e68) at postmaster.c:3449
 #5  0x08231f78 in BackendStartup (port=0x8511e68) at postmaster.c:3063
 #6  0x0822f90a in ServerLoop () at postmaster.c:1387
 #7  0x0822f131 in PostmasterMain (argc=3, argv=0x84f4bf8) at postmaster.c:1040
 #8  0x081c6217 in main (argc=3, argv=0x84f4bf8) at main.c:188

Ok, I think I understand what's going on.

parse
bind
describe
execute

This sequence of commands create cached plan in unnamed portal.

$5 = {name = 0x8574de4 , prepStmtName = 0x0, heap = 0x8598400,
  resowner = 0x8598488, cleanup = 0x81632ca PortalCleanup, createSubid = 1,
  sourceText = 0x85ab818  SELECT omitted...,
  commandTag = 0x84682ca SELECT, stmts = 0xaabf43b0, cplan = 0xaabf4950,
  portalParams = 0x0, strategy = PORTAL_ONE_SELECT, cursorOptions = 4,
  status = PORTAL_READY, queryDesc = 0x85abc20, tupDesc = 0x85ddcb0,
  formats = 0x85abc68, holdStore = 0x0, holdContext = 0x0, atStart = 1 '\001',
  atEnd = 1 '\001', posOverflow = 0 '\0', portalPos = 0,
  creation_time = 315487957498169, visible = 1 '\001'}

The cached plan(portal-cplan) and statements(portal-stmts) are
created by exec_bind_message():

/*
 * Revalidate the cached plan; this may result in replanning.  
Any
 * cruft will be generated in MessageContext.  The plan 
refcount will
 * be assigned to the Portal, so it will be released at portal
 * destruction.
 */
cplan = RevalidateCachedPlan(psrc, false);
plan_list = cplan-stmt_list;

Please note that cplan and stmts belong to the same memory context.

Then following commands are coming:

parse invalid SQL thus abort a transaction
bind (error)
describe (error)
execute (crash)

parse causes transaction to abort, which causes call to
AbortCurrentTransaction-AbortTransaction-AtAbort_portals-ReleaseCachedPlan. 
It
calls ReleaseCachePlan(portal-cplan). ReleaseCachePlan calls
MemoryContextDelete(plan-context) which destroys both portal-cplan
and portal-stmts.

That was the reason why I had segfault by accessing portal-stmts.

To fix this I think exec_execute_message should throw an error if
portal-cleanup is NULL, since portal-cleanup is NULLed by
AtAbort_Portals at transaction abort (or portal is dropped).

Here is a suggested fix:

diff -c postgres.c~ postgres.c
*** postgres.c~ 2009-06-18 19:08:08.0 +0900
--- postgres.c  2009-12-30 21:34:49.0 +0900
***
*** 1804,1810 
dest = DestRemoteExecute;
  
portal = GetPortalByName(portal_name);
!   if (!PortalIsValid(portal))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_CURSOR),
 errmsg(portal \%s\ does not exist, 
portal_name)));
--- 1804,1810 
dest = DestRemoteExecute;
  
portal = GetPortalByName(portal_name);
!   if (!PortalIsValid(portal) || (PortalIsValid(portal)  portal-cleanup 
== NULL))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_CURSOR),
 errmsg(portal \%s\ does not exist, 
portal_name)));

--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-30 Thread Andrew Dunstan



Tatsuo Ishii wrote:

!   if (!PortalIsValid(portal) || (PortalIsValid(portal)  portal-cleanup 
== NULL))
 
  



Surely the second call to PortalIsValid() is redundant.
 
   if (( !PortalIsValid(portal)) || portal-cleanup == NULL)


should do it, no?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-30 Thread Tatsuo Ishii
 Tatsuo Ishii wrote:
  !   if (!PortalIsValid(portal) || (PortalIsValid(portal)  portal-cleanup 
  == NULL))
   

 
 
 Surely the second call to PortalIsValid() is redundant.
   
 if (( !PortalIsValid(portal)) || portal-cleanup == NULL)
 
 should do it, no?

Oops. You are right.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-30 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 parse causes transaction to abort, which causes call to
 AbortCurrentTransaction-AbortTransaction-AtAbort_portals-ReleaseCachedPlan.
  It
 calls ReleaseCachePlan(portal-cplan). ReleaseCachePlan calls
 MemoryContextDelete(plan-context) which destroys both portal-cplan
 and portal-stmts.

 That was the reason why I had segfault by accessing portal-stmts.

 To fix this I think exec_execute_message should throw an error if
 portal-cleanup is NULL, since portal-cleanup is NULLed by
 AtAbort_Portals at transaction abort (or portal is dropped).

This is just a kluge, and a rather bad one I think.  The real problem
here is that AtAbort_Portals destroys the portal contents and doesn't
do anything to record the fact.  It should probably be putting the
portal into PORTAL_FAILED state, and what exec_execute_message ought
to be doing is checking for that.  It might be a good idea to explicitly
zero out the now-dangling pointers in the Portal struct, too.

It'd be nice to have a test case for this, hint hint ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-30 Thread Tatsuo Ishii
 This is just a kluge, and a rather bad one I think.  The real problem
 here is that AtAbort_Portals destroys the portal contents and doesn't
 do anything to record the fact.  It should probably be putting the
 portal into PORTAL_FAILED state, and what exec_execute_message ought
 to be doing is checking for that.

Yeah I thought about that too. in AtAbort_Portals:

--
/*
 * Abort processing for portals.
 *
 * At this point we reset active status and run the cleanup hook if
 * present, but we can't release the portal's memory until the cleanup call.
 *
 * The reason we need to reset active is so that we can replace the unnamed
 * portal, else we'll fail to execute ROLLBACK when it arrives.
 */
void
AtAbort_Portals(void)
{
HASH_SEQ_STATUS status;
PortalHashEnt *hentry;

hash_seq_init(status, PortalHashTable);

while ((hentry = (PortalHashEnt *) hash_seq_search(status)) != NULL)
{
Portal  portal = hentry-portal;

if (portal-status == PORTAL_ACTIVE)
portal-status = PORTAL_FAILED;
--

Should I change the last if clause to?

if (portal-status == PORTAL_ACTIVE || portal-status == 
PORTAL_READY)
portal-status = PORTAL_FAILED;

 zero out the now-dangling pointers in the Portal struct, too.

portal-cplan is already zero out by PortalReleaseCachedPlan. Problem
is, portal-stmts may belong to PortalContext or others (in this
particluar case). So if we want to zero out portal-stmts, we need to
memorize the memory context which it belongs to and we need add a new
struct member to portal. I'm afraid this is an overkill...

 It'd be nice to have a test case for this, hint hint ...

Still working on...
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers