On 15/02/11 20:39, Peter Eisentraut wrote: > On tis, 2011-02-15 at 09:58 +0100, Jan Urbański wrote: >> Because the invocation that actually recurses sets up the scene for >> failure. > > That's what we're observing, but I can't figure out why it is. If you > can, could you explain it? > > It actually makes sense to me that the arguments should be deleted at > the end of the call. The data belongs to that call only, and > PLy_procedure_delete() that would otherwise clean it up is only called > rarely. > > Apparently, the recursive call ends up deleting the wrong arguments, but > it's not clear to me why that would affect the next top-level call, > because that would set up its own arguments again anyway. In any case, > perhaps the right fix is to fix PLy_function_delete_args() to delete the > args correctly.
Aaah, ok, I got it (again). Let me write this in full before I forget and spend another hour chasing that bug (and boy, bugs that disappear because you're doing things in the debugger are so annoying). And actually, my patch doesn't fix it fully :| Let me demonstrate: CREATE FUNCTION rec(n integer) RETURNS integer AS $$ if n == 0: return plpy.notice("before n is %d" % n) plpy.execute("select rec(0)") plpy.notice("after n is %d" % n) $$ LANGUAGE plpythonu; Without the patch the second plpy.notice raises a NameError. With the patch the output is: NOTICE: before n is 4 CONTEXT: PL/Python function "rec" NOTICE: after n is 0 CONTEXT: PL/Python function "rec" What happens? In PLy_function_handler, PLy_function_build_args is called, and proc->globals is set. After that PLy_procedure_call is called, which starts executing Python code. The Python code does a call into C with plpy.execute, and PLy_function_handler gets called (a reentrant call). Then PLy_function_build_args is called again. It overwrites the "n" entry in proc->globals and then PLy_procedure_call gets called, which drops us back into Python (on the stack there's now C, Python, C, Python). This second invocation exits quickly because n == 0, and we're back in C. Now without my patch, the next thing to happen was deleting the arguments, which removed "n" from the proc->globals dict. The rest of C code runs and finally plpy.execute returns and we;re back in Python (the stack is C, Python). The second plpy.notice is run, which fetches "n" from the globals, and not finding it, raises a NameError. With the patch it simply fetches the overwritten value, namely 0. The KeyError was a red herring - that's how Python reacted when evaluating "n in (0, 1)", and if you look in the server log you'll see a RuntimeWarning complaining about something internal, that doesn't matter. The bottom line is that PLy_procedure_call is not reentrant because of proc->globals, and it has to be. Now when fixing this bug I tries copying the globals dict and restoring it, but ran into issues (I think the problem was that the function didn't like running with different globals then the one it has been compiled with). Not sure what to do with this :( Document it as a caveat (with or without my patch) and carry on? That sucks quite badly... Jan -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers