Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-04-05 Thread Tom Lane
Alexey Grishchenko  writes:
> Any comments on this patch?

I felt that this was more nearly a bug fix than a new feature, so I picked
it up even though it's nominally in the next commitfest not the current
one.  I did not like the code too much as it stood: you were not being
paranoid enough about ensuring that the callstack data structure stayed
in sync with the actual control flow.  Also, it didn't work for functions
that modify their argument values (cf the committed regression tests);
you have to explicitly save named arguments not only the "args" version,
and you have to do it for SRF suspend/resume not just recursion cases.
But I cleaned all that up and committed it.
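To illustrate the two points above (saving the named arguments as well as "args", and doing so on every SRF suspend/resume), here is a minimal Python model. It is a sketch under assumptions, not the committed C code: `SrfCall`, `resume`, and `countdown` are hypothetical, a plain dict stands in for the procedure's shared globals, and a generator stands in for an SRF body that modifies its own argument.

```python
# Hypothetical model: one globals dict shared by every call of a procedure,
# plus per-call saved copies of the named arguments and "args".

class SrfCall:
    def __init__(self, globals_dict, argnames, argvalues, make_iter):
        self.g = globals_dict
        self.argnames = list(argnames)
        self.saved = dict(zip(argnames, argvalues))   # this call's named args
        self.saved["args"] = list(argvalues)          # and its "args" list
        self.make_iter = make_iter
        self.it = None
        self.done = False

    def resume(self):
        """Advance this call by one row; return None once it is exhausted."""
        if self.done:
            return None
        self.g.update(self.saved)                     # restore before resuming
        try:
            if self.it is None:
                self.it = iter(self.make_iter(self.g))
            return next(self.it)
        except StopIteration:
            self.done = True
            return None
        finally:
            # Save again on suspend: the body may have modified its arguments.
            for name in self.argnames + ["args"]:
                self.saved[name] = self.g[name]

def countdown(g):
    # Hypothetical SRF body that reads and modifies its argument "n".
    while g["n"] > 0:
        yield g["n"]
        g["n"] -= 1

shared = {}
a = SrfCall(shared, ["n"], [3], countdown)
b = SrfCall(shared, ["n"], [2], countdown)
out_a, out_b = [], []
while not (a.done and b.done):                # interleave the two calls
    for call, out in ((a, out_a), (b, out_b)):
        value = call.resume()
        if value is not None:
            out.append(value)
print(out_a, out_b)  # [3, 2, 1] [2, 1]
```

Without the restore/save around each resume, the second call's `n` would overwrite the first's in the shared dictionary.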

> triggers are a bit different - they depend on modifying the global "TD"
> dictionary inside the Python function, and they return only the status
> string. For them, there is no option of modifying the code to avoid global
> input parameters without breaking the backward compatibility with the old
> enduser code.

Yeah.  It might be worth the trouble to include triggers in the
save/restore logic, since at least in principle they can be invoked
recursively; but there's not that much practical use for such cases.
I didn't bother with that in the patch as-committed, but if you want
to follow up with an adjustment for it, I'd take a look.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-22 Thread Alexey Grishchenko
Hi

Any comments on this patch?

Regarding passing parameters to the Python function through globals: this
was part of the initial design of PL/Python (see the code and the
documentation). Originally you had to work with the global "args" list of
input parameters and could not access the named parameters directly, and you
can still use "args" in the latest release. Moving away from global input
parameters would require switching to PyObject_CallFunctionObjArgs, which
should be possible by changing the function declaration to take the input
parameters plus "args" (for backward compatibility). Triggers, however, are
different: they depend on modifying the global "TD" dictionary inside the
Python function, and they return only a status string. For them, there is no
way to stop using global input parameters without breaking backward
compatibility with existing end-user code.

-- 
Best regards,
Alexey Grishchenko


Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-11 Thread Alexey Grishchenko
I have improved the code using the proposed approach. The second version of
the patch is attached.

It works as follows: the procedure object PLyProcedure stores information
about the call stack depth (calldepth field) and the stack itself (argstack
field). When the call stack depth is zero, no additional processing is done,
so there is no performance impact on existing end-user functions. Stack
manipulation happens only when calldepth is greater than zero, which occurs
either when the function is called recursively via SPI or when the same
set-returning function is called two or more times in a single query.

Example of multiple calls to an SRF within a single query:

CREATE OR REPLACE FUNCTION func(iter int) RETURNS SETOF int AS $$
return xrange(iter)
$$ LANGUAGE plpythonu;

select func(3), func(4);


Before the patch, this query caused an endless loop ending in OOM. Now it
works as it should.

Example of recursion with SPI:

CREATE OR REPLACE FUNCTION test(a int) RETURNS int AS $BODY$
r = 0
if a > 1:
    r = plpy.execute("SELECT test(%d) as a" % (a-1))[0]['a']
return a + r
$BODY$ LANGUAGE plpythonu;

select test(10);


Before the patch, the query failed with "NameError: global name 'a' is not
defined". Now it works correctly and returns 55.

-- 
Best regards,
Alexey Grishchenko


0002-Fix-endless-loop-in-plpython-set-returning-function.patch
Description: Binary data



Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-10 Thread Alexey Grishchenko
Tom Lane  wrote:

> Well, if you think that works, why not undo the global-dictionary changes
> at the end of the first call, rather than later?  Then there's certainly
> no overlap in their lifespan.
>
> regards, tom lane
>

Could you elaborate on this? In general, a stack-like solution would work:
if, before the function call, there is already a global variable whose name
matches an input parameter name, push its value onto a stack, and pop it
back after the function finishes. I will implement it tomorrow and see how
it works.


-- 

Sent from handheld device


Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-10 Thread Tom Lane
Alexey Grishchenko  writes:
> No, my fix handles this well.
> In fact, with the first function call you allocate global variables
> representing Python function input parameters, call the function and
> receive iterator over the function results. Then in a series of Postgres
> calls to PL/Python handler you just fetch next value from the iterator, you
> are not calling the Python function anymore. When the iterator reaches the
> end, PL/Python call handler deallocates the global variable representing
> function input parameter.

> Regardless of the number of parallel invocations of the same function, each
> of them in my patch would set its own input parameters to the Python
> function, call the function and receive separate iterators. When the first
> function's result iterator would reach its end, it would deallocate the
> input global variable. But it won't affect other functions as they no
> longer need to invoke any Python code.

Well, if you think that works, why not undo the global-dictionary changes
at the end of the first call, rather than later?  Then there's certainly
no overlap in their lifespan.

regards, tom lane




Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-10 Thread Alexey Grishchenko
No, my fix handles this well.

In fact, on the first function call you allocate global variables
representing the Python function's input parameters, call the function, and
receive an iterator over its results. Then, in the series of subsequent
Postgres calls to the PL/Python handler, you just fetch the next value from
the iterator; the Python function itself is not called again. When the
iterator reaches the end, the PL/Python call handler deallocates the global
variable representing the function's input parameter.
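A simplified Python model of that per-row lifecycle. `SrfHandler`, `next_row`, and `func_body` are hypothetical stand-ins for the C call handler and the compiled function, and the sketch deliberately ignores the shared-globals issues discussed in this thread.

```python
class SrfHandler:
    """Hypothetical model of the per-row calls to a set-returning function."""

    def __init__(self, globals_dict, argname, func):
        self.g = globals_dict
        self.argname = argname
        self.func = func      # returns an iterable; reads its arg from globals
        self.it = None

    def next_row(self, argvalue):
        """Return (done, value); called repeatedly by the executor."""
        if self.it is None:
            # First call: set the input global and call the function once.
            self.g[self.argname] = argvalue
            self.it = iter(self.func(self.g))
        try:
            # Later calls only advance the iterator; no Python function call.
            return False, next(self.it)
        except StopIteration:
            # End of set: remove the input global.
            del self.g[self.argname]
            self.it = None
            return True, None

def func_body(g):
    return range(g["iter"])   # like "return xrange(iter)"

g = {}
h = SrfHandler(g, "iter", func_body)
rows = []
while True:
    done, value = h.next_row(3)
    if done:
        break
    rows.append(value)
print(rows)  # [0, 1, 2]
```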

Regardless of the number of parallel invocations of the same function, with
my patch each of them sets its own input parameters for the Python function,
calls the function, and receives a separate iterator. When the first
function's result iterator reaches its end, it deallocates the input global
variable. But this does not affect the other invocations, as they no longer
need to run any Python code. Even if they did, they would set the global
variable again (it is set before each Python function invocation). The issue
here was that they tried to deallocate the global input variable multiple
times independently, which caused the error that I fixed.
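A small sketch of that double-deletion failure and of the membership check described for PLy_function_delete_args. The two Python helpers are hypothetical; the real function is C code operating on the procedure's globals dictionary.

```python
def delete_args_unguarded(globals_dict, argnames):
    for name in argnames:
        del globals_dict[name]              # KeyError if already removed

def delete_args_guarded(globals_dict, argnames):
    for name in argnames:
        if name in globals_dict:            # only delete what is still there
            del globals_dict[name]

# One globals dict shared by two calls of the same SRF; each call
# removes the input global when its iterator is exhausted.
shared = {"iter": 3}
delete_args_unguarded(shared, ["iter"])     # first call finishes: fine
try:
    delete_args_unguarded(shared, ["iter"]) # second call finishes
    second_failed = False
except KeyError:
    second_failed = True                    # the original failure mode

shared = {"iter": 3}
delete_args_guarded(shared, ["iter"])
delete_args_guarded(shared, ["iter"])       # no error the second time
print(second_failed, shared)                # True {}
```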

Regarding the patch for the second case, recursion: not caching the
"globals" between function calls would have a performance impact, as you
would have to construct the "globals" object before each function call. And
you need globals, as it contains references to the "plpy" module, global
variables, and the global dictionary ("GD"). I will think about this; maybe
there is a better design for this scenario. But I still think the second
scenario requires a separate patch.
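For comparison, a sketch of the alternative of not sharing one dictionary across calls: build a fresh globals dictionary per call from a cached template. A shallow copy keeps references to shared objects such as `plpy` and `GD`, so `GD` still persists across calls, but the copy itself is the per-call cost mentioned above. `make_call_globals`, `body`, and the template contents are hypothetical, and the namespace object only stands in for the real `plpy` module.

```python
import types

plpy = types.SimpleNamespace(name="plpy")   # stand-in for the plpy module
template = {"plpy": plpy, "GD": {}}          # cached once per procedure

def make_call_globals(argnames, argvalues):
    g = dict(template)                       # shallow copy on every call
    g.update(zip(argnames, argvalues))
    return g

def body(g):
    # GD is the same dict object in every copy, so updates persist.
    g["GD"]["calls"] = g["GD"].get("calls", 0) + 1
    return g["a"] * 2

g1 = make_call_globals(["a"], [1])
g2 = make_call_globals(["a"], [5])           # does not disturb g1["a"]
print(body(g1), body(g2), template["GD"])    # 2 10 {'calls': 2}
```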

On Thu, Mar 10, 2016 at 4:33 PM, Tom Lane  wrote:

> Alexey Grishchenko  writes:
> > One scenario when the problem occurs, is when you are calling the same
> > set-returning function in a single query twice. This way they share the
> > same "globals" which is not a bad thing, but when one function finishes
> > execution and deallocates input parameter's global, the second will fail
> > trying to do the same. I included the fix for this problem in my patch
>
> > The second scenario when the problem occurs is when you want to call the
> > same PL/Python function in recursion. For example, this code will not work:
>
> Right, the recursion case is what's not being covered by this patch.
> I would rather have a single patch that deals with both of those cases,
> perhaps by *not* sharing the same dictionary across calls.  I think
> what you've done here is not so much a fix as a band-aid.  In fact,
> it doesn't even really fix the problem for the two-calls-per-query
> case does it?  It'll work if the first execution of the SRF is run to
> completion before starting the second one, but not if the two executions
> are interleaved.  I believe you can test that type of scenario with
> something like
>
>   select set_returning_function_1(...), set_returning_function_2(...);
>
> regards, tom lane
>



-- 
Best regards,
Alexey Grishchenko


Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-10 Thread Tom Lane
Alexey Grishchenko  writes:
> One scenario when the problem occurs, is when you are calling the same
> set-returning function in a single query twice. This way they share the
> same "globals" which is not a bad thing, but when one function finishes
> execution and deallocates input parameter's global, the second will fail
> trying to do the same. I included the fix for this problem in my patch

> The second scenario when the problem occurs is when you want to call the
> same PL/Python function in recursion. For example, this code will not work:

Right, the recursion case is what's not being covered by this patch.
I would rather have a single patch that deals with both of those cases,
perhaps by *not* sharing the same dictionary across calls.  I think
what you've done here is not so much a fix as a band-aid.  In fact,
it doesn't even really fix the problem for the two-calls-per-query
case does it?  It'll work if the first execution of the SRF is run to
completion before starting the second one, but not if the two executions
are interleaved.  I believe you can test that type of scenario with
something like

  select set_returning_function_1(...), set_returning_function_2(...);

regards, tom lane




Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-10 Thread Alexey Grishchenko
I agree that passing function parameters through globals is not the best
solution.

It works as follows: custom code (in our case, the Python function
invocation) is executed in Python with PyEval_EvalCode. As input to this C
function you specify the dictionary of globals that will be available to
the code. The PLyProcedure structure stores "PyObject *globals;", which is
the dictionary of globals for a specific function. So SPI works fine, as
each function has a separate dictionary of globals and they don't conflict
with each other.
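In pure Python, the closest analogue of PyEval_EvalCode with an explicit globals dictionary is `exec()` on a compiled code object. A minimal sketch with two hypothetical procedures, `f` and `g`, each with its own dictionary; `g` calls `f` through a hypothetical `call_f` stand-in for SPI, and its own `a` survives the nested call.

```python
def run_procedure(code, globals_dict, **args):
    # Inject the arguments as globals, then execute the compiled body
    # against this procedure's own dictionary.
    globals_dict.update(args)
    exec(code, globals_dict)
    return globals_dict["result"]

proc_f = compile("result = a + 1", "<f>", "exec")
proc_g = compile("inner = call_f(a * 10)\nresult = a + inner", "<g>", "exec")

f_globals = {}                                   # dictionary of procedure f
g_globals = {                                    # dictionary of procedure g
    # stand-in for g calling f through SPI
    "call_f": lambda x: run_procedure(proc_f, f_globals, a=x),
}

out = run_procedure(proc_g, g_globals, a=2)
print(out, f_globals["a"], g_globals["a"])       # 23 20 2
```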

One scenario in which the problem occurs is calling the same set-returning
function twice in a single query. Both calls then share the same "globals",
which is not a bad thing in itself, but when one call finishes execution and
deletes the input parameter's global, the second fails when it tries to do
the same. My patch includes a fix for this problem.
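A sketch of why a shared globals dictionary is fragile when two executions of the same set-returning function are interleaved: if the body reads its argument lazily (here a generator), the second call overwrites the first call's argument. `srf_body` and the dictionary are hypothetical stand-ins. A body like `return xrange(iter)` reads its argument only once, when it is called, so it would not show this particular symptom.

```python
shared = {}                       # one globals dict for the procedure

def srf_body(g):
    # Generator: the argument "n" is looked up on every step.
    i = 0
    while i < g["n"]:
        yield i
        i += 1

shared["n"] = 3
first = srf_body(shared)          # like func(3)
first_head = next(first)          # 0
shared["n"] = 1                   # the second call sets its own argument
second = srf_body(shared)         # like func(1)
second_head = next(second)        # 0
first_rest = list(first)          # [] instead of [1, 2]: first now sees n == 1
print(first_head, second_head, first_rest)
```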

The second scenario in which the problem occurs is calling the same
PL/Python function recursively. For example, this code will not work:


create or replace function test(a int) returns int as $BODY$
r = 0
if a > 1:
    r = plpy.execute("SELECT test(%d) as a" % (a-1))[0]['a']
return a + r
$BODY$ language plpythonu;

select test(10);


The function "test" has a single PLyProcedure object allocated to handle
it, and thus a single "globals" dictionary. When the inner function call
finishes, it removes the key "a" from the dictionary, and the outer call
fails with "NameError: global name 'a' is not defined" when it tries to
execute "return a + r".
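The failure can be reproduced in plain Python (3) with a hypothetical model that follows the description above: one shared globals dictionary, the argument stored in it before the call and removed afterwards (even with a guarded delete), and the recursive SPI call replaced by a direct call to a hypothetical `handler`. It raises the same kind of NameError, though Python 3 words the message slightly differently.

```python
g = {}                               # the single "globals" of procedure test

body = compile(
    "def _proc():\n"
    "    r = 0\n"
    "    if a > 1:\n"
    "        r = handler(a - 1)\n"
    "    return a + r\n",
    "<test>", "exec")
exec(body, g)

def handler(a):
    g["a"] = a                       # store the argument as a global
    try:
        return g["_proc"]()
    finally:
        g.pop("a", None)             # remove it when the call finishes

g["handler"] = handler

error = None
try:
    handler(10)
except NameError as exc:
    error = exc                      # raised by the outer "return a + r"
print(error)                         # name 'a' is not defined
```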

But the second issue is a separate story, and I think it deserves a
separate patch.


On Thu, Mar 10, 2016 at 3:35 PM, Tom Lane  wrote:

> Alexey Grishchenko  writes:
> > There is a bug in implementation of set-returning functions in PL/Python.
> > When you call the same set-returning function twice in a single query, the
> > executor falls to infinite loop which causes OOM.
>
> Ugh.
>
> > Another issue with calling the same set-returning function twice in the
> > same query, is that it would delete the input parameter of the function
> > from the global variables dictionary at the end of execution. With calling
> > the function twice, this code attempts to delete the same entry from global
> > variables dict twice, thus causing KeyError. This is why the
> > function PLy_function_delete_args is modified as well to check whether the
> > key we intend to delete is in the globals dictionary.
>
> That whole business with putting a function's parameters into a global
> dictionary makes me itch.  Doesn't it mean problems if one plpython
> function calls another (presumably via SPI)?
>
> regards, tom lane
>



-- 
Best regards,
Alexey Grishchenko


Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-10 Thread Tom Lane
Alexey Grishchenko  writes:
> There is a bug in implementation of set-returning functions in PL/Python.
> When you call the same set-returning function twice in a single query, the
> executor falls to infinite loop which causes OOM.

Ugh.

> Another issue with calling the same set-returning function twice in the
> same query, is that it would delete the input parameter of the function
> from the global variables dictionary at the end of execution. With calling
> the function twice, this code attempts to delete the same entry from global
> variables dict twice, thus causing KeyError. This is why the
> function PLy_function_delete_args is modified as well to check whether the
> key we intend to delete is in the globals dictionary.

That whole business with putting a function's parameters into a global
dictionary makes me itch.  Doesn't it mean problems if one plpython
function calls another (presumably via SPI)?

regards, tom lane

