Re: [HACKERS] crash in plancache with subtransactions

2010-11-01 Thread Jim Nasby
On Oct 29, 2010, at 10:54 AM, Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of mié oct 27 18:18:06 -0300 2010:
 I spent quite a bit of time trying to deal with the memory-leakage
 problem without adding still more bookkeeping overhead.  It wasn't
 looking good, and then I had a sudden insight: if we see that the in-use
 flag is set, we can simply return FALSE from exec_eval_simple_expr.
 
 I tried the original test cases that were handed to me (quite different
 from what I submitted here) and they are fixed also.  Thanks.
 
 It'd be interesting to know if there's any noticeable slowdown on
 affected real-world cases.  (Of course, if they invariably crashed
 before, there might not be a way to measure their previous speed...)

I should be able to get Alvaro something he can use to test the performance. 
Our patch framework uses a recursive function to follow patch dependencies (of 
course that can go away in 8.4 thanks to WITH). I know we've got some other 
recursive calls but I don't think any are critical (it'd be nice if there was a 
way to find out if a function was recursive, I guess theoretically that could 
be discovered during compilation but I don't know how hairy it would be).

One question: What happens if you have multiple paths to the same function 
within another function? For example, we have an assert function that's used 
all over the place; it will definitely be called from multiple places in a call 
stack.

FWIW, I definitely run into cases where recursion makes for cleaner code than 
looping, so it'd be great to avoid making it slower than it needs to be. But 
I've always assumed that recursion is slower than looping so I avoid it for 
anything I know could be performance sensitive.

(looking at original case)... the original bug wasn't actually recursive. It's 
not clear to me how it actually got into this case. The original error report 
is:

psql:sql/code.lookup_table_dynamic.sql:23: ERROR:  buffer 2682 is not owned by 
resource owner Portal
CONTEXT:  SQL function table_schema_and_name statement 1
SQL function table_full_name statement 1
PL/pgSQL function getsert line 9 during statement block local variable 
initialization
server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.

Line 23 is:

SELECT code.getsert( 'test.stuffs', 'stuff' );

The functions are below. The duplicity of full_name_table and table_full_name 
is because the function was originally called full_name_table, but I decided to 
rename it after creating other table functions. In any case, I don't see any 
obvious recursion or re-entry, unless perhaps tools.table_schema_and_name ends 
up getting called twice by tools.table_full_name?

-[ RECORD 1 
]---+-
Schema  | code
Name| getsert
Result data type| void
Argument data types | p_table_name text, p_lookup text
Volatility  | volatile
Owner   | cnuadmin
Language| plpgsql
Source code | 
: DECLARE
: v_object_class text := 'getsert';
: v_function_name text := p_table_name || '__' || 
v_object_class;
: 
: v_table_full text := tools.full_name_table( 
p_table_name );
: v_schema text;
: v_table text;
: 
: BEGIN
: SELECT INTO v_schema, v_table * FROM 
tools.split_schema( v_table_full );
: 
: PERFORM code_internal.create_object( v_function_name, 
'FUNCTION', v_object_class, array[ ['schema', v_schema], ['table', v_table], 
['lookup', p_lookup] ] );
: END;
: 
Description | Creates a function that will lookup an ID based on a text 
lookup value (p_lookup). If no record exists, one will be created.
: 
: Parameters:
: p_table_name Name of the table to lookup the value in
: p_lookup Name of the field to use for the lookup value
: 
: Results:
: Creates function %p_table_name%__getsert( %p_lookup% with 
a type matching the p_lookup field in p_table_name ). The function returns an 
ID as an int.
: Revokes all on the function from public and grants 
execute to cnuapp_role.
: 

test...@workbook.local=# \df+ tools.full_name_table 
List of functions
-[ RECORD 1 ]---+---
Schema  | tools
Name| full_name_table
Result data type| text
Argument data types | 

Re: [HACKERS] crash in plancache with subtransactions

2010-11-01 Thread Tom Lane
Jim Nasby j...@nasby.net writes:
 (looking at original case)... the original bug wasn't actually
 recursive.

No, there are two different cases being dealt with here.  If the first
call of an expression results in an error, and then we come back and try
to re-use the expression state tree, we can have trouble because the
state tree contains partially-updated internal state for the called
function.  This doesn't require any recursion but it leads to pretty
much the same problems as the recursive case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-29 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mié oct 27 18:18:06 -0300 2010:
 I wrote:
  Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  One simple idea is to keep a flag along with the executor state to 
  indicate that the executor state is currently in use. Set it just before 
  calling ExecEvalExpr, and reset afterwards. If the flag is already set 
  in the beginning of exec_eval_simple_expr, we have recursed, and must 
  create a new executor state.
 
  Yeah, the same thought occurred to me in the shower this morning.
  I'm concerned about possible memory leakage during repeated recursion,
  but maybe that can be dealt with.
 
 I spent quite a bit of time trying to deal with the memory-leakage
 problem without adding still more bookkeeping overhead.  It wasn't
 looking good, and then I had a sudden insight: if we see that the in-use
 flag is set, we can simply return FALSE from exec_eval_simple_expr.

I tried the original test cases that were handed to me (quite different
from what I submitted here) and they are fixed also.  Thanks.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-29 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of mié oct 27 18:18:06 -0300 2010:
 I spent quite a bit of time trying to deal with the memory-leakage
 problem without adding still more bookkeeping overhead.  It wasn't
 looking good, and then I had a sudden insight: if we see that the in-use
 flag is set, we can simply return FALSE from exec_eval_simple_expr.

 I tried the original test cases that were handed to me (quite different
 from what I submitted here) and they are fixed also.  Thanks.

It'd be interesting to know if there's any noticeable slowdown on
affected real-world cases.  (Of course, if they invariably crashed
before, there might not be a way to measure their previous speed...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-29 Thread Alvaro Herrera
Excerpts from Tom Lane's message of vie oct 29 12:54:52 -0300 2010:
 Alvaro Herrera alvhe...@commandprompt.com writes:

  I tried the original test cases that were handed to me (quite different
  from what I submitted here) and they are fixed also.  Thanks.
 
 It'd be interesting to know if there's any noticeable slowdown on
 affected real-world cases.  (Of course, if they invariably crashed
 before, there might not be a way to measure their previous speed...)

Yeah, the cases that were reported failed with one of

ERROR: invalid memory alloc request size 18446744073482534916
ERROR: could not open relation with OID 0
ERROR: buffer 228 is not owned by resource owner Portal

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-27 Thread Tom Lane
I wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 One simple idea is to keep a flag along with the executor state to 
 indicate that the executor state is currently in use. Set it just before 
 calling ExecEvalExpr, and reset afterwards. If the flag is already set 
 in the beginning of exec_eval_simple_expr, we have recursed, and must 
 create a new executor state.

 Yeah, the same thought occurred to me in the shower this morning.
 I'm concerned about possible memory leakage during repeated recursion,
 but maybe that can be dealt with.

I spent quite a bit of time trying to deal with the memory-leakage
problem without adding still more bookkeeping overhead.  It wasn't
looking good, and then I had a sudden insight: if we see that the in-use
flag is set, we can simply return FALSE from exec_eval_simple_expr.
That causes exec_eval_expr to run the expression using the non simple
code path, which is perfectly safe because it isn't trying to reuse
state that might be dirty.  Thus the attached patch, which fixes
both of the failure cases discussed in this thread.

Advantages:
1. The slowdown for normal cases (non-recursive, non-error-inducing)
is negligible.
2. It's simple enough to back-patch without fear.

Disadvantages:
1. Recursive cases get noticeably slower, about 4X slower according
to tests with this example:

create or replace function factorial(n int) returns float8 as $$
begin
  if (n  1) then
return n * factorial(n - 1);
  end if;
  return 1;
end
$$ language plpgsql strict immutable;

The slowdown is only for the particular expression that actually has
invoked a recursive call, so the above is probably much the worst case
in practice.  I doubt many people really use plpgsql this way, but ...

2. Cases where we're re-executing an expression that failed earlier in
the same transaction likewise get noticeably slower.  This is only a
problem if you're using subtransactions to catch errors, and the
overhead of the subtransaction is going to be large enough to partially
hide the extra eval cost anyway.  So I didn't bother to make a timing
test case --- it's not going to be as bad as the example above.

I currently think that we don't have much choice except to use this
patch for the back branches: any better-performing alternative is
going to require enough surgery that back-patching would be dangerous.
Maybe somebody can come up with a better answer for HEAD, but I don't
have one.

Comments?

regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 9929e04e57bbc7d08938765b7f506c05c07b772b..1f40f3cf69234ff650ece2ccb09791f8e075ec1a 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*** loop_exit:
*** 4491,4497 
   *a Datum by directly calling ExecEvalExpr().
   *
   * If successful, store results into *result, *isNull, *rettype and return
!  * TRUE.  If the expression is not simple (any more), return FALSE.
   *
   * It is possible though unlikely for a simple expression to become non-simple
   * (consider for example redefining a trivial view).  We must handle that for
--- 4491,4508 
   *a Datum by directly calling ExecEvalExpr().
   *
   * If successful, store results into *result, *isNull, *rettype and return
!  * TRUE.  If the expression cannot be handled by simple evaluation,
!  * return FALSE.
!  *
!  * Because we only store one execution tree for a simple expression, we
!  * can't handle recursion cases.  So, if we see the tree is already busy
!  * with an evaluation in the current xact, we just return FALSE and let the
!  * caller run the expression the hard way.  (Other alternatives such as
!  * creating a new tree for a recursive call either introduce memory leaks,
!  * or add enough bookkeeping to be doubtful wins anyway.)  Another case that
!  * is covered by the expr_simple_in_use test is where a previous execution
!  * of the tree was aborted by an error: the tree may contain bogus state
!  * so we dare not re-use it.
   *
   * It is possible though unlikely for a simple expression to become non-simple
   * (consider for example redefining a trivial view).  We must handle that for
*** exec_eval_simple_expr(PLpgSQL_execstate 
*** 4528,4533 
--- 4539,4550 
  		return false;
  
  	/*
+ 	 * If expression is in use in current xact, don't touch it.
+ 	 */
+ 	if (expr-expr_simple_in_use  expr-expr_simple_lxid == curlxid)
+ 		return false;
+ 
+ 	/*
  	 * Revalidate cached plan, so that we will notice if it became stale. (We
  	 * also need to hold a refcount while using the plan.)	Note that even if
  	 * replanning occurs, the length of plancache_list can't change, since it
*** exec_eval_simple_expr(PLpgSQL_execstate 
*** 4562,4567 
--- 4579,4585 
  	{
  		expr-expr_simple_state = ExecPrepareExpr(expr-expr_simple_expr,
    

Re: [HACKERS] crash in plancache with subtransactions

2010-10-26 Thread Tom Lane
I wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 One simple idea is to keep a flag along with the executor state to 
 indicate that the executor state is currently in use. Set it just before 
 calling ExecEvalExpr, and reset afterwards. If the flag is already set 
 in the beginning of exec_eval_simple_expr, we have recursed, and must 
 create a new executor state.

 Yeah, the same thought occurred to me in the shower this morning.
 I'm concerned about possible memory leakage during repeated recursion,
 but maybe that can be dealt with.

I spent some time poking at this today, and developed the attached
patch, which gets rid of all the weird assumptions associated with
simple expressions in plpgsql, in favor of just doing another
ExecInitExpr per expression in each call of the function.  While this is
a whole lot cleaner than what we have, I'm afraid that it's unacceptably
slow.  I'm seeing an overall slowdown of 2X to 3X on function execution
with examples like:

create or replace function speedtest10(x float8) returns float8 as $$
declare
  z float8 := x;
begin
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  return z;
end
$$
language plpgsql immutable;

Now, this is about the worst case for the patch.  This function's
runtime depends almost entirely on the speed of simple expressions,
and because there's no internal looping, we only get to use the result
of each ExecInitExpr once per function call.  So probably typical use
cases wouldn't be quite so bad; but still it seems like we can't go this
route.  We need to be able to use the ExecInitExpr results across
successive calls one way or another.

I'll look into creating an in-use flag next.

regards, tom lane

diff --git a/src/pl/plpgsql/src/gram.y b/src/pl/plpgsql/src/gram.y
index a28c6707e4670be17ed1c947d383f1d11b9c90c5..0963efe1b5b5e7e974a682d2a7f71d6427180e9b 100644
*** a/src/pl/plpgsql/src/gram.y
--- b/src/pl/plpgsql/src/gram.y
*** decl_statement	: decl_varname decl_const
*** 490,495 
--- 490,496 
  		curname_def = palloc0(sizeof(PLpgSQL_expr));
  
  		curname_def-dtype = PLPGSQL_DTYPE_EXPR;
+ 		curname_def-expr_num = plpgsql_nExprs++;
  		strcpy(buf, SELECT );
  		cp1 = new-refname;
  		cp2 = buf + strlen(buf);
*** read_sql_construct(int until,
*** 2277,2282 
--- 2278,2284 
  	expr-query			= pstrdup(ds.data);
  	expr-plan			= NULL;
  	expr-paramnos		= NULL;
+ 	expr-expr_num  = plpgsql_nExprs++;
  	expr-ns= plpgsql_ns_top();
  	pfree(ds.data);
  
*** make_execsql_stmt(int firsttoken, int lo
*** 2476,2481 
--- 2478,2484 
  	expr-query			= pstrdup(ds.data);
  	expr-plan			= NULL;
  	expr-paramnos		= NULL;
+ 	expr-expr_num  = plpgsql_nExprs++;
  	expr-ns= plpgsql_ns_top();
  	pfree(ds.data);
  
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 3ddcf3b5a595e0847627cc10c4084258f44cc352..4feb14d2bec87de5d8a66071ab3a66da3c2a2d76 100644
*** a/src/pl/plpgsql/src/pl_comp.c
--- b/src/pl/plpgsql/src/pl_comp.c
***
*** 37,42 
--- 37,47 
  
  /* --
   * Our own local and global variables
+  *
+  * Ideally these would live in some dynamically-allocated structure, but
+  * it's unpleasant to pass such a thing around in a Bison parser.  As long
+  * as plpgsql function compilation isn't re-entrant, it's okay to use
+  * static storage for these.
   * --
   */
  PLpgSQL_stmt_block *plpgsql_parse_result;
*** int			plpgsql_nDatums;
*** 46,51 
--- 51,58 
  PLpgSQL_datum **plpgsql_Datums;
  static int	datums_last = 0;
  
+ int			plpgsql_nExprs;
+ 
  char	   *plpgsql_error_funcname;
  bool		plpgsql_DumpExecTree = false;
  bool		plpgsql_check_syntax = false;
*** do_compile(FunctionCallInfo fcinfo,
*** 367,372 
--- 374,381 
  	 sizeof(PLpgSQL_datum *) * datums_alloc);
  	datums_last = 0;
  
+ 	plpgsql_nExprs = 0;
+ 
  	switch (is_trigger)
  	{
  		case false:
*** do_compile(FunctionCallInfo fcinfo,
*** 693,698 
--- 702,708 
  	function-datums = palloc(sizeof(PLpgSQL_datum *) * plpgsql_nDatums);
  	for (i = 0; i  plpgsql_nDatums; i++)
  		function-datums[i] = plpgsql_Datums[i];
+ 	function-nexprs = plpgsql_nExprs;
  
  	/* Debug dump for completed functions */
  	if (plpgsql_DumpExecTree)
*** plpgsql_compile_inline(char *proc_source
*** 788,793 
--- 798,804 
  	plpgsql_nDatums = 0;
  	plpgsql_Datums = palloc(sizeof(PLpgSQL_datum *) * datums_alloc);
  	datums_last = 0;
+ 	plpgsql_nExprs = 0;
  
  	/* Set up as though in a function returning VOID */
  	function-fn_rettype = VOIDOID;
*** plpgsql_compile_inline(char *proc_source
*** 838,843 
--- 849,855 
  	function-datums = palloc(sizeof(PLpgSQL_datum *) * 

Re: [HACKERS] crash in plancache with subtransactions

2010-10-22 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 22.10.2010 06:10, Tom Lane wrote:
 (But on the third
 hand, what are we gonna do for back-patching to versions without the
 plancache?)

 One simple idea is to keep a flag along with the executor state to 
 indicate that the executor state is currently in use. Set it just before 
 calling ExecEvalExpr, and reset afterwards. If the flag is already set 
 in the beginning of exec_eval_simple_expr, we have recursed, and must 
 create a new executor state.

Yeah, the same thought occurred to me in the shower this morning.
I'm concerned about possible memory leakage during repeated recursion,
but maybe that can be dealt with.  I'll look into this issue soon,
though probably not today (Red Hat work calls :-().

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Alvaro Herrera
Hi,

A customer was hitting some misbehavior in one of their internal tests and
I tracked it down to plancache not behaving properly with
subtransactions: in particular, a plan is not being marked dead when
the subtransaction on which it is planned rolls back.  It was reported
in 8.4, but I can reproduce the problem on 9.0 too with this small
script:

drop schema alvherre cascade;
drop schema test cascade;
create schema test;
create schema alvherre;
set search_path = 'alvherre';

create or replace function dummy(text) returns text language sql
as $$ SELECT relname::text FROM pg_class c WHERE c.oid = $1::regclass 
$$;

create or replace function broken(p_name_table text) returns void
language plpgsql as $$
declare
v_table_full text := alvherre.dummy(p_name_table);
begin
return;
end;
$$;

BEGIN;
 create table test.stuffs (stuff text);
 SAVEPOINT a;
 select broken('nonexistant.stuffs');

 ROLLBACK TO a;
 select broken('test.stuffs');

rollback;


The symptom is that the second call to broken() fails with this error
message:

ERROR:  relation  does not exist
CONTEXT:  SQL function dummy statement 1
PL/pgSQL function broken line 3 during statement block local variable 
initialization

Note that this is totally bogus, because the relation being referenced
does indeed exist.  In fact, if you commit the transaction and call the
function again, it works.

Also, the state after the first call is a bit bogus: if you repeat the
whole sequence starting at the BEGIN line, it causes a crash on 8.4.

I hacked up plancache a bit so that it marks plans as dead when the
subtransaction resource owner releases it.  It adds a new arg to
ReleaseCachedPlan(); if true, the plan is marked dead.  All current
callers, except the one in ResourceOwnerReleaseInternal(), use false
thus preserving the current behavior.  resowner sets this as true when
aborting a (sub)transaction.

I have to admit that it seems somewhat the wrong API, but I don't see a
better way.  (I thought above relcache or syscache inval, but as far as
I can't tell there isn't any here).  I'm open to suggestions.

Patch attached.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org


0001-Mark-a-cache-plan-as-dead-when-aborting-its-creating.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Tom Lane
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 A customer was hitting some misbehavior in one of their internal tests and
 I tracked it down to plancache not behaving properly with
 subtransactions: in particular, a plan is not being marked dead when
 the subtransaction on which it is planned rolls back.

I don't believe that it's plancache's fault; the real problem is that
plpgsql is keeping simple expression execution trees around longer
than it should.  Your patch masks the problem by forcing those trees to
be rebuilt, but it's the execution trees not the plan trees that contain
stale data.

I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
correctly in this example, but that seems to be where to look.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Alvaro Herrera
Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:
  A customer was hitting some misbehavior in one of their internal tests and
  I tracked it down to plancache not behaving properly with
  subtransactions: in particular, a plan is not being marked dead when
  the subtransaction on which it is planned rolls back.
 
 I don't believe that it's plancache's fault; the real problem is that
 plpgsql is keeping simple expression execution trees around longer
 than it should.  Your patch masks the problem by forcing those trees to
 be rebuilt, but it's the execution trees not the plan trees that contain
 stale data.

Ahh, this probably explains why I wasn't been able to reproduce the
problem without involving subxacts, or prepared plans, that seemed to
follow mostly the same paths around plancache cleanup.

It's also the likely cause that this hasn't ben reported earlier.

 I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
 correctly in this example, but that seems to be where to look.

Will take a look ... if the girls let me ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Alvaro Herrera
Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010:

 I'm not immediately sure why plpgsql_subxact_cb is failing to clean up
 correctly in this example, but that seems to be where to look.

I think the reason is that one econtext is pushed for function
execution, and another one for blocks that contain exceptions.  The
example function does not contain exceptions -- the savepoints are
handled by the external SQL code.

I'll have a closer look tomorrow.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010:
 I don't believe that it's plancache's fault; the real problem is that
 plpgsql is keeping simple expression execution trees around longer
 than it should.  Your patch masks the problem by forcing those trees to
 be rebuilt, but it's the execution trees not the plan trees that contain
 stale data.

 Ahh, this probably explains why I wasn't been able to reproduce the
 problem without involving subxacts, or prepared plans, that seemed to
 follow mostly the same paths around plancache cleanup.

 It's also the likely cause that this hasn't ben reported earlier.

I traced through the details and found that the proximate cause of the
crash is that this bit in fmgr_sql() gets confused:

/*
 * Convert params to appropriate format if starting a fresh execution. (If
 * continuing execution, we can re-use prior params.)
 */
if (es  es-status == F_EXEC_START)
postquel_sub_params(fcache, fcinfo);

After the error in the first subtransaction, the execution state tree
for the public.dummy(p_name_table) expression has a fn_extra link
that is pointing to a SQLFunctionCache that's in F_EXEC_RUN state.
So when the second call of broken() tries to re-use the state tree,
fmgr_sql() thinks it's continuing the execution of a set-returning
function, and doesn't bother to re-initialize its ParamListInfo
struct.  So it merrily tries to execute using a text datum that's
pointing at long-since-pfree'd storage for the original
'nonexistant.stuffs' argument string.

I had always felt a tad uncomfortable with the way that plpgsql re-uses
execution state trees for simple expressions; it seemed to me that it
was entirely unsafe in recursive usage.  But I'd never been able to
prove that it was broken.  Now that I've seen this example, I know how
to break it: recurse indirectly through a SQL function.  For instance,
this will dump core immediately:

create or replace function recurse(float8) returns float8 as
$$
begin
  raise notice 'recurse(%)', $1;
  if ($1  10) then
return sql_recurse($1 + 1);
  else
return $1;
  end if;
end
$$ language plpgsql;

-- limit is to prevent this from being inlined
create or replace function sql_recurse(float8) returns float8 as
$$ select recurse($1) limit 1; $$ language sql;

select recurse(0);

Notice the lack of any subtransaction or error condition.  The reason
this fails is *not* plancache misfeasance or failure to clean up after
error.  Rather, it's that the inner execution of recurse() is trying to
re-use an execution state tree that is pointing at an already-active
execution of sql_recurse().  In general, what plpgsql is doing is
entirely unsafe whenever a called function tries to keep changeable
execution state in storage pointed to by fn_extra.  We've managed to
miss the problem because plpgsql doesn't try to use this technique on
functions returning set (see exec_simple_check_node), and the vast
majority of non-SRF functions that use fn_extra at all use it to cache
catalog lookup results, which don't change from call to call.  But
there's no convention that says a function can't keep execution status
data in fn_extra --- in fact, there isn't anyplace else for it to keep
such data.

Right at the moment I'm not seeing any way that the present
exec_eval_simple_expr approach can be fixed to work safely in the
presence of recursion.  What I think we might have to do is give up
on the idea of caching execution state trees across calls, instead
using them just for the duration of a single plpgsql function call.
I'm not sure what sort of runtime penalty might ensue.  The whole design
predates the plancache, and I think it was mostly intended to prevent
having to re-parse-and-plan simple expressions every time.  So a lot of
that overhead has gone away anyway given the plancache, and maybe we
shouldn't sweat too much about paying what remains.  (But on the third
hand, what are we gonna do for back-patching to versions without the
plancache?)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-21 Thread Heikki Linnakangas

On 22.10.2010 06:10, Tom Lane wrote:

Right at the moment I'm not seeing any way that the present
exec_eval_simple_expr approach can be fixed to work safely in the
presence of recursion.  What I think we might have to do is give up
on the idea of caching execution state trees across calls, instead
using them just for the duration of a single plpgsql function call.
I'm not sure what sort of runtime penalty might ensue.  The whole design
predates the plancache, and I think it was mostly intended to prevent
having to re-parse-and-plan simple expressions every time.  So a lot of
that overhead has gone away anyway given the plancache, and maybe we
shouldn't sweat too much about paying what remains.


We should test and measure that.


 (But on the third
hand, what are we gonna do for back-patching to versions without the
plancache?)


One simple idea is to keep a flag along with the executor state to 
indicate that the executor state is currently in use. Set it just before 
calling ExecEvalExpr, and reset afterwards. If the flag is already set 
in the beginning of exec_eval_simple_expr, we have recursed, and must 
create a new executor state.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers