Re: [HACKERS] crash in plancache with subtransactions
Jim Nasby writes: > (looking at original case)... the original bug wasn't actually > recursive. No, there are two different cases being dealt with here. If the first call of an expression results in an error, and then we come back and try to re-use the expression state tree, we can have trouble because the state tree contains partially-updated internal state for the called function. This doesn't require any recursion but it leads to pretty much the same problems as the recursive case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
On Oct 29, 2010, at 10:54 AM, Tom Lane wrote: > Alvaro Herrera writes: >> Excerpts from Tom Lane's message of mié oct 27 18:18:06 -0300 2010: >>> I spent quite a bit of time trying to deal with the memory-leakage >>> problem without adding still more bookkeeping overhead. It wasn't >>> looking good, and then I had a sudden insight: if we see that the in-use >>> flag is set, we can simply return FALSE from exec_eval_simple_expr. > >> I tried the original test cases that were handed to me (quite different >> from what I submitted here) and they are fixed also. Thanks. > > It'd be interesting to know if there's any noticeable slowdown on > affected real-world cases. (Of course, if they invariably crashed > before, there might not be a way to measure their previous speed...) I should be able to get Alvaro something he can use to test the performance. Our patch framework uses a recursive function to follow patch dependencies (of course that can go away in 8.4 thanks to WITH). I know we've got some other recursive calls but I don't think any are critical (it'd be nice if there was a way to find out if a function was recursive, I guess theoretically that could be discovered during compilation but I don't know how hairy it would be). One question: What happens if you have multiple paths to the same function within another function? For example, we have an assert function that's used all over the place; it will definitely be called from multiple places in a call stack. FWIW, I definitely run into cases where recursion makes for cleaner code than looping, so it'd be great to avoid making it slower than it needs to be. But I've always assumed that recursion is slower than looping so I avoid it for anything I know could be performance sensitive. (looking at original case)... the original bug wasn't actually recursive. It's not clear to me how it actually got into this case. The original error report is: psql:sql/code.lookup_table_dynamic.sql:23: ERROR: buffer 2682 is not owned by resource owner Portal CONTEXT: SQL function "table_schema_and_name" statement 1 SQL function "table_full_name" statement 1 PL/pgSQL function "getsert" line 9 during statement block local variable initialization server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. Line 23 is: SELECT code.getsert( 'test.stuffs', 'stuff' ); The functions are below. The duplicity of full_name_table and table_full_name is because the function was originally called full_name_table, but I decided to rename it after creating other table functions. In any case, I don't see any obvious recursion or re-entry, unless perhaps tools.table_schema_and_name ends up getting called twice by tools.table_full_name? -[ RECORD 1 ]---+- Schema | code Name| getsert Result data type| void Argument data types | p_table_name text, p_lookup text Volatility | volatile Owner | cnuadmin Language| plpgsql Source code | : DECLARE : v_object_class text := 'getsert'; : v_function_name text := p_table_name || '__' || v_object_class; : : v_table_full text := tools.full_name_table( p_table_name ); : v_schema text; : v_table text; : : BEGIN : SELECT INTO v_schema, v_table * FROM tools.split_schema( v_table_full ); : : PERFORM code_internal.create_object( v_function_name, 'FUNCTION', v_object_class, array[ ['schema', v_schema], ['table', v_table], ['lookup', p_lookup] ] ); : END; : Description | Creates a function that will lookup an ID based on a text lookup value (p_lookup). If no record exists, one will be created. : : Parameters: : p_table_name Name of the table to lookup the value in : p_lookup Name of the field to use for the lookup value : : Results: : Creates function %p_table_name%__getsert( %p_lookup% with a type matching the p_lookup field in p_table_name ). The function returns an ID as an int. : Revokes all on the function from public and grants execute to cnuapp_role. : test...@workbook.local=# \df+ tools.full_name_table List of functions -[ RECORD 1 ]---+--- Schema | tools Name| full_name_table Result data type| text Argument data types | p_tab
Re: [HACKERS] crash in plancache with subtransactions
Excerpts from Tom Lane's message of vie oct 29 12:54:52 -0300 2010: > Alvaro Herrera writes: > > I tried the original test cases that were handed to me (quite different > > from what I submitted here) and they are fixed also. Thanks. > > It'd be interesting to know if there's any noticeable slowdown on > affected real-world cases. (Of course, if they invariably crashed > before, there might not be a way to measure their previous speed...) Yeah, the cases that were reported failed with one of ERROR: invalid memory alloc request size 18446744073482534916 ERROR: could not open relation with OID 0 ERROR: buffer 228 is not owned by resource owner Portal -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Alvaro Herrera writes: > Excerpts from Tom Lane's message of mié oct 27 18:18:06 -0300 2010: >> I spent quite a bit of time trying to deal with the memory-leakage >> problem without adding still more bookkeeping overhead. It wasn't >> looking good, and then I had a sudden insight: if we see that the in-use >> flag is set, we can simply return FALSE from exec_eval_simple_expr. > I tried the original test cases that were handed to me (quite different > from what I submitted here) and they are fixed also. Thanks. It'd be interesting to know if there's any noticeable slowdown on affected real-world cases. (Of course, if they invariably crashed before, there might not be a way to measure their previous speed...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Excerpts from Tom Lane's message of mié oct 27 18:18:06 -0300 2010: > I wrote: > >> Heikki Linnakangas writes: > >>> One simple idea is to keep a flag along with the executor state to > >>> indicate that the executor state is currently in use. Set it just before > >>> calling ExecEvalExpr, and reset afterwards. If the flag is already set > >>> in the beginning of exec_eval_simple_expr, we have recursed, and must > >>> create a new executor state. > > >> Yeah, the same thought occurred to me in the shower this morning. > >> I'm concerned about possible memory leakage during repeated recursion, > >> but maybe that can be dealt with. > > I spent quite a bit of time trying to deal with the memory-leakage > problem without adding still more bookkeeping overhead. It wasn't > looking good, and then I had a sudden insight: if we see that the in-use > flag is set, we can simply return FALSE from exec_eval_simple_expr. I tried the original test cases that were handed to me (quite different from what I submitted here) and they are fixed also. Thanks. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
I wrote: >> Heikki Linnakangas writes: >>> One simple idea is to keep a flag along with the executor state to >>> indicate that the executor state is currently in use. Set it just before >>> calling ExecEvalExpr, and reset afterwards. If the flag is already set >>> in the beginning of exec_eval_simple_expr, we have recursed, and must >>> create a new executor state. >> Yeah, the same thought occurred to me in the shower this morning. >> I'm concerned about possible memory leakage during repeated recursion, >> but maybe that can be dealt with. I spent quite a bit of time trying to deal with the memory-leakage problem without adding still more bookkeeping overhead. It wasn't looking good, and then I had a sudden insight: if we see that the in-use flag is set, we can simply return FALSE from exec_eval_simple_expr. That causes exec_eval_expr to run the expression using the "non simple" code path, which is perfectly safe because it isn't trying to reuse state that might be dirty. Thus the attached patch, which fixes both of the failure cases discussed in this thread. Advantages: 1. The slowdown for "normal" cases (non-recursive, non-error-inducing) is negligible. 2. It's simple enough to back-patch without fear. Disadvantages: 1. Recursive cases get noticeably slower, about 4X slower according to tests with this example: create or replace function factorial(n int) returns float8 as $$ begin if (n > 1) then return n * factorial(n - 1); end if; return 1; end $$ language plpgsql strict immutable; The slowdown is only for the particular expression that actually has invoked a recursive call, so the above is probably much the worst case in practice. I doubt many people really use plpgsql this way, but ... 2. Cases where we're re-executing an expression that failed earlier in the same transaction likewise get noticeably slower. This is only a problem if you're using subtransactions to catch errors, and the overhead of the subtransaction is going to be large enough to partially hide the extra eval cost anyway. So I didn't bother to make a timing test case --- it's not going to be as bad as the example above. I currently think that we don't have much choice except to use this patch for the back branches: any better-performing alternative is going to require enough surgery that back-patching would be dangerous. Maybe somebody can come up with a better answer for HEAD, but I don't have one. Comments? regards, tom lane diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index 9929e04e57bbc7d08938765b7f506c05c07b772b..1f40f3cf69234ff650ece2ccb09791f8e075ec1a 100644 *** a/src/pl/plpgsql/src/pl_exec.c --- b/src/pl/plpgsql/src/pl_exec.c *** loop_exit: *** 4491,4497 *a Datum by directly calling ExecEvalExpr(). * * If successful, store results into *result, *isNull, *rettype and return ! * TRUE. If the expression is not simple (any more), return FALSE. * * It is possible though unlikely for a simple expression to become non-simple * (consider for example redefining a trivial view). We must handle that for --- 4491,4508 *a Datum by directly calling ExecEvalExpr(). * * If successful, store results into *result, *isNull, *rettype and return ! * TRUE. If the expression cannot be handled by simple evaluation, ! * return FALSE. ! * ! * Because we only store one execution tree for a simple expression, we ! * can't handle recursion cases. So, if we see the tree is already busy ! * with an evaluation in the current xact, we just return FALSE and let the ! * caller run the expression the hard way. (Other alternatives such as ! * creating a new tree for a recursive call either introduce memory leaks, ! * or add enough bookkeeping to be doubtful wins anyway.) Another case that ! * is covered by the expr_simple_in_use test is where a previous execution ! * of the tree was aborted by an error: the tree may contain bogus state ! * so we dare not re-use it. * * It is possible though unlikely for a simple expression to become non-simple * (consider for example redefining a trivial view). We must handle that for *** exec_eval_simple_expr(PLpgSQL_execstate *** 4528,4533 --- 4539,4550 return false; /* + * If expression is in use in current xact, don't touch it. + */ + if (expr->expr_simple_in_use && expr->expr_simple_lxid == curlxid) + return false; + + /* * Revalidate cached plan, so that we will notice if it became stale. (We * also need to hold a refcount while using the plan.) Note that even if * replanning occurs, the length of plancache_list can't change, since it *** exec_eval_simple_expr(PLpgSQL_execstate *** 4562,4567 --- 4579,4585 { expr->expr_simple_state = ExecPrepareExpr(expr->expr_simple_expr, simple_eval_e
Re: [HACKERS] crash in plancache with subtransactions
I wrote: > Heikki Linnakangas writes: >> One simple idea is to keep a flag along with the executor state to >> indicate that the executor state is currently in use. Set it just before >> calling ExecEvalExpr, and reset afterwards. If the flag is already set >> in the beginning of exec_eval_simple_expr, we have recursed, and must >> create a new executor state. > Yeah, the same thought occurred to me in the shower this morning. > I'm concerned about possible memory leakage during repeated recursion, > but maybe that can be dealt with. I spent some time poking at this today, and developed the attached patch, which gets rid of all the weird assumptions associated with "simple expressions" in plpgsql, in favor of just doing another ExecInitExpr per expression in each call of the function. While this is a whole lot cleaner than what we have, I'm afraid that it's unacceptably slow. I'm seeing an overall slowdown of 2X to 3X on function execution with examples like: create or replace function speedtest10(x float8) returns float8 as $$ declare z float8 := x; begin z := z * 2 + 1; z := z * 2 + 1; z := z * 2 + 1; z := z * 2 + 1; z := z * 2 + 1; z := z * 2 + 1; z := z * 2 + 1; z := z * 2 + 1; z := z * 2 + 1; z := z * 2 + 1; return z; end $$ language plpgsql immutable; Now, this is about the worst case for the patch. This function's runtime depends almost entirely on the speed of simple expressions, and because there's no internal looping, we only get to use the result of each ExecInitExpr once per function call. So probably "typical" use cases wouldn't be quite so bad; but still it seems like we can't go this route. We need to be able to use the ExecInitExpr results across successive calls one way or another. I'll look into creating an in-use flag next. regards, tom lane diff --git a/src/pl/plpgsql/src/gram.y b/src/pl/plpgsql/src/gram.y index a28c6707e4670be17ed1c947d383f1d11b9c90c5..0963efe1b5b5e7e974a682d2a7f71d6427180e9b 100644 *** a/src/pl/plpgsql/src/gram.y --- b/src/pl/plpgsql/src/gram.y *** decl_statement : decl_varname decl_const *** 490,495 --- 490,496 curname_def = palloc0(sizeof(PLpgSQL_expr)); curname_def->dtype = PLPGSQL_DTYPE_EXPR; + curname_def->expr_num = plpgsql_nExprs++; strcpy(buf, "SELECT "); cp1 = new->refname; cp2 = buf + strlen(buf); *** read_sql_construct(int until, *** 2277,2282 --- 2278,2284 expr->query = pstrdup(ds.data); expr->plan = NULL; expr->paramnos = NULL; + expr->expr_num = plpgsql_nExprs++; expr->ns= plpgsql_ns_top(); pfree(ds.data); *** make_execsql_stmt(int firsttoken, int lo *** 2476,2481 --- 2478,2484 expr->query = pstrdup(ds.data); expr->plan = NULL; expr->paramnos = NULL; + expr->expr_num = plpgsql_nExprs++; expr->ns= plpgsql_ns_top(); pfree(ds.data); diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c index 3ddcf3b5a595e0847627cc10c4084258f44cc352..4feb14d2bec87de5d8a66071ab3a66da3c2a2d76 100644 *** a/src/pl/plpgsql/src/pl_comp.c --- b/src/pl/plpgsql/src/pl_comp.c *** *** 37,42 --- 37,47 /* -- * Our own local and global variables + * + * Ideally these would live in some dynamically-allocated structure, but + * it's unpleasant to pass such a thing around in a Bison parser. As long + * as plpgsql function compilation isn't re-entrant, it's okay to use + * static storage for these. * -- */ PLpgSQL_stmt_block *plpgsql_parse_result; *** int plpgsql_nDatums; *** 46,51 --- 51,58 PLpgSQL_datum **plpgsql_Datums; static int datums_last = 0; + int plpgsql_nExprs; + char *plpgsql_error_funcname; bool plpgsql_DumpExecTree = false; bool plpgsql_check_syntax = false; *** do_compile(FunctionCallInfo fcinfo, *** 367,372 --- 374,381 sizeof(PLpgSQL_datum *) * datums_alloc); datums_last = 0; + plpgsql_nExprs = 0; + switch (is_trigger) { case false: *** do_compile(FunctionCallInfo fcinfo, *** 693,698 --- 702,708 function->datums = palloc(sizeof(PLpgSQL_datum *) * plpgsql_nDatums); for (i = 0; i < plpgsql_nDatums; i++) function->datums[i] = plpgsql_Datums[i]; + function->nexprs = plpgsql_nExprs; /* Debug dump for completed functions */ if (plpgsql_DumpExecTree) *** plpgsql_compile_inline(char *proc_source *** 788,793 --- 798,804 plpgsql_nDatums = 0; plpgsql_Datums = palloc(sizeof(PLpgSQL_datum *) * datums_alloc); datums_last = 0; + plpgsql_nExprs = 0; /* Set up as though in a function returning VOID */ function->fn_rettype = VOIDOID; *** plpgsql_compile_inline(char *proc_source *** 838,843 --- 849,855 function->datums = palloc(sizeof(PLpgSQL_datum *) * plpgsq
Re: [HACKERS] crash in plancache with subtransactions
Heikki Linnakangas writes: > On 22.10.2010 06:10, Tom Lane wrote: >> (But on the third >> hand, what are we gonna do for back-patching to versions without the >> plancache?) > One simple idea is to keep a flag along with the executor state to > indicate that the executor state is currently in use. Set it just before > calling ExecEvalExpr, and reset afterwards. If the flag is already set > in the beginning of exec_eval_simple_expr, we have recursed, and must > create a new executor state. Yeah, the same thought occurred to me in the shower this morning. I'm concerned about possible memory leakage during repeated recursion, but maybe that can be dealt with. I'll look into this issue soon, though probably not today (Red Hat work calls :-(). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
On 22.10.2010 06:10, Tom Lane wrote: Right at the moment I'm not seeing any way that the present exec_eval_simple_expr approach can be fixed to work safely in the presence of recursion. What I think we might have to do is give up on the idea of caching execution state trees across calls, instead using them just for the duration of a single plpgsql function call. I'm not sure what sort of runtime penalty might ensue. The whole design predates the plancache, and I think it was mostly intended to prevent having to re-parse-and-plan simple expressions every time. So a lot of that overhead has gone away anyway given the plancache, and maybe we shouldn't sweat too much about paying what remains. We should test and measure that. (But on the third hand, what are we gonna do for back-patching to versions without the plancache?) One simple idea is to keep a flag along with the executor state to indicate that the executor state is currently in use. Set it just before calling ExecEvalExpr, and reset afterwards. If the flag is already set in the beginning of exec_eval_simple_expr, we have recursed, and must create a new executor state. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Alvaro Herrera writes: > Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010: >> I don't believe that it's plancache's fault; the real problem is that >> plpgsql is keeping "simple expression" execution trees around longer >> than it should. Your patch masks the problem by forcing those trees to >> be rebuilt, but it's the execution trees not the plan trees that contain >> stale data. > Ahh, this probably explains why I wasn't been able to reproduce the > problem without involving subxacts, or prepared plans, that seemed to > follow mostly the same paths around plancache cleanup. > It's also the likely cause that this hasn't ben reported earlier. I traced through the details and found that the proximate cause of the crash is that this bit in fmgr_sql() gets confused: /* * Convert params to appropriate format if starting a fresh execution. (If * continuing execution, we can re-use prior params.) */ if (es && es->status == F_EXEC_START) postquel_sub_params(fcache, fcinfo); After the error in the first subtransaction, the execution state tree for the "public.dummy(p_name_table)" expression has a fn_extra link that is pointing to a SQLFunctionCache that's in F_EXEC_RUN state. So when the second call of broken() tries to re-use the state tree, fmgr_sql() thinks it's continuing the execution of a set-returning function, and doesn't bother to re-initialize its ParamListInfo struct. So it merrily tries to execute using a text datum that's pointing at long-since-pfree'd storage for the original 'nonexistant.stuffs' argument string. I had always felt a tad uncomfortable with the way that plpgsql re-uses execution state trees for simple expressions; it seemed to me that it was entirely unsafe in recursive usage. But I'd never been able to prove that it was broken. Now that I've seen this example, I know how to break it: recurse indirectly through a SQL function. For instance, this will dump core immediately: create or replace function recurse(float8) returns float8 as $$ begin raise notice 'recurse(%)', $1; if ($1 < 10) then return sql_recurse($1 + 1); else return $1; end if; end $$ language plpgsql; -- "limit" is to prevent this from being inlined create or replace function sql_recurse(float8) returns float8 as $$ select recurse($1) limit 1; $$ language sql; select recurse(0); Notice the lack of any subtransaction or error condition. The reason this fails is *not* plancache misfeasance or failure to clean up after error. Rather, it's that the inner execution of recurse() is trying to re-use an execution state tree that is pointing at an already-active execution of sql_recurse(). In general, what plpgsql is doing is entirely unsafe whenever a called function tries to keep changeable execution state in storage pointed to by fn_extra. We've managed to miss the problem because plpgsql doesn't try to use this technique on functions returning set (see exec_simple_check_node), and the vast majority of non-SRF functions that use fn_extra at all use it to cache catalog lookup results, which don't change from call to call. But there's no convention that says a function can't keep execution status data in fn_extra --- in fact, there isn't anyplace else for it to keep such data. Right at the moment I'm not seeing any way that the present exec_eval_simple_expr approach can be fixed to work safely in the presence of recursion. What I think we might have to do is give up on the idea of caching execution state trees across calls, instead using them just for the duration of a single plpgsql function call. I'm not sure what sort of runtime penalty might ensue. The whole design predates the plancache, and I think it was mostly intended to prevent having to re-parse-and-plan simple expressions every time. So a lot of that overhead has gone away anyway given the plancache, and maybe we shouldn't sweat too much about paying what remains. (But on the third hand, what are we gonna do for back-patching to versions without the plancache?) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010: > I'm not immediately sure why plpgsql_subxact_cb is failing to clean up > correctly in this example, but that seems to be where to look. I think the reason is that one econtext is pushed for function execution, and another one for blocks that contain exceptions. The example function does not contain exceptions -- the savepoints are handled by the external SQL code. I'll have a closer look tomorrow. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010: > Alvaro Herrera writes: > > A customer was hitting some misbehavior in one of their internal tests and > > I tracked it down to plancache not behaving properly with > > subtransactions: in particular, a plan is not being marked "dead" when > > the subtransaction on which it is planned rolls back. > > I don't believe that it's plancache's fault; the real problem is that > plpgsql is keeping "simple expression" execution trees around longer > than it should. Your patch masks the problem by forcing those trees to > be rebuilt, but it's the execution trees not the plan trees that contain > stale data. Ahh, this probably explains why I wasn't been able to reproduce the problem without involving subxacts, or prepared plans, that seemed to follow mostly the same paths around plancache cleanup. It's also the likely cause that this hasn't ben reported earlier. > I'm not immediately sure why plpgsql_subxact_cb is failing to clean up > correctly in this example, but that seems to be where to look. Will take a look ... if the girls let me ... -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Alvaro Herrera writes: > A customer was hitting some misbehavior in one of their internal tests and > I tracked it down to plancache not behaving properly with > subtransactions: in particular, a plan is not being marked "dead" when > the subtransaction on which it is planned rolls back. I don't believe that it's plancache's fault; the real problem is that plpgsql is keeping "simple expression" execution trees around longer than it should. Your patch masks the problem by forcing those trees to be rebuilt, but it's the execution trees not the plan trees that contain stale data. I'm not immediately sure why plpgsql_subxact_cb is failing to clean up correctly in this example, but that seems to be where to look. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] crash in plancache with subtransactions
Hi, A customer was hitting some misbehavior in one of their internal tests and I tracked it down to plancache not behaving properly with subtransactions: in particular, a plan is not being marked "dead" when the subtransaction on which it is planned rolls back. It was reported in 8.4, but I can reproduce the problem on 9.0 too with this small script: drop schema alvherre cascade; drop schema test cascade; create schema test; create schema alvherre; set search_path = 'alvherre'; create or replace function dummy(text) returns text language sql as $$ SELECT relname::text FROM pg_class c WHERE c.oid = $1::regclass $$; create or replace function broken(p_name_table text) returns void language plpgsql as $$ declare v_table_full text := alvherre.dummy(p_name_table); begin return; end; $$; BEGIN; create table test.stuffs (stuff text); SAVEPOINT a; select broken('nonexistant.stuffs'); ROLLBACK TO a; select broken('test.stuffs'); rollback; The symptom is that the second call to broken() fails with this error message: ERROR: relation "" does not exist CONTEXT: SQL function "dummy" statement 1 PL/pgSQL function "broken" line 3 during statement block local variable initialization Note that this is totally bogus, because the relation being referenced does indeed exist. In fact, if you commit the transaction and call the function again, it works. Also, the state after the first call is a bit bogus: if you repeat the whole sequence starting at the BEGIN line, it causes a crash on 8.4. I hacked up plancache a bit so that it marks plans as dead when the subtransaction resource owner releases it. It adds a new arg to ReleaseCachedPlan(); if true, the plan is marked dead. All current callers, except the one in ResourceOwnerReleaseInternal(), use false thus preserving the current behavior. resowner sets this as true when aborting a (sub)transaction. I have to admit that it seems somewhat the wrong API, but I don't see a better way. (I thought above relcache or syscache inval, but as far as I can't tell there isn't any here). I'm open to suggestions. Patch attached. -- Álvaro Herrera 0001-Mark-a-cache-plan-as-dead-when-aborting-its-creating.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers