On 15/02/11 20:39, Peter Eisentraut wrote:
> On tis, 2011-02-15 at 09:58 +0100, Jan Urbański wrote:
>> Because the invocation that actually recurses sets up the scene for
>> failure.
> That's what we're observing, but I can't figure out why it is.  If you
> can, could you explain it?
> It actually makes sense to me that the arguments should be deleted at
> the end of the call.  The data belongs to that call only, and
> PLy_procedure_delete() that would otherwise clean it up is only called
> rarely.
> Apparently, the recursive call ends up deleting the wrong arguments, but
> it's not clear to me why that would affect the next top-level call,
> because that would set up its own arguments again anyway.  In any case,
> perhaps the right fix is to fix PLy_function_delete_args() to delete the
> args correctly.

Aaah, ok, I got it (again). Let me write this in full before I forget
and spend another hour chasing that bug (and boy, bugs that disappear
because you're doing things in the debugger are so annoying). And
actually, my patch doesn't fix it fully :| Let me demonstrate:

CREATE FUNCTION rec(n integer) RETURNS integer AS $$
if n == 0:
plpy.notice("before n is %d" % n)
plpy.execute("select rec(0)")
plpy.notice("after n is %d" % n)
$$ LANGUAGE plpythonu;

Without the patch the second plpy.notice raises a NameError. With the
patch the output is:

NOTICE:  before n is 4
CONTEXT:  PL/Python function "rec"
NOTICE:  after n is 0
CONTEXT:  PL/Python function "rec"

What happens? In PLy_function_handler, PLy_function_build_args is
called, and proc->globals is set. After that PLy_procedure_call is
called, which starts executing Python code. The Python code does a call
into C with plpy.execute, and PLy_function_handler gets called (a
reentrant call).

Then PLy_function_build_args is called again. It overwrites the "n"
entry in proc->globals and then PLy_procedure_call gets called, which
drops us back into Python (on the stack there's now C, Python, C,
Python). This second invocation exits quickly because n == 0, and we're
back in C.

Now without my patch, the next thing to happen was deleting the
arguments, which removed "n" from the proc->globals dict. The rest of C
code runs and finally plpy.execute returns and we;re back in Python (the
stack is C, Python).

The second plpy.notice is run, which fetches "n" from the globals, and
not finding it, raises a NameError. With the patch it simply fetches the
overwritten value, namely 0.

The KeyError was a red herring - that's how Python reacted when
evaluating "n in (0, 1)", and if you look in the server log you'll see a
RuntimeWarning complaining about something internal, that doesn't
matter. The bottom line is that PLy_procedure_call is not reentrant
because of proc->globals, and it has to be.

Now when fixing this bug I tries copying the globals dict and restoring
it, but ran into issues (I think the problem was that the function
didn't like running with different globals then the one it has been
compiled with). Not sure what to do with this :( Document it as a caveat
(with or without my patch) and carry on? That sucks quite badly...


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to