Re: [HACKERS] Fix for PL/Python slow input arrays traversal issue

2016-09-19 Thread Dave Cramer
Yes, this should be closed as it is contained in 
https://commitfest.postgresql.org/10/697/
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for PL/Python slow input arrays traversal issue

2016-09-18 Thread Dave Cramer
Pavel,

I will pick these up.



Re: [HACKERS] Fix for PL/Python slow input arrays traversal issue

2016-09-09 Thread Pavel Stehule
This entry should be closed because this patch is part of another patch.

The new status of this patch is: Waiting on Author



[HACKERS] Fix for PL/Python slow input arrays traversal issue

2016-07-28 Thread Alexey Grishchenko
Hi

The following issue exists in PL/Python: when a function takes an array as
an input parameter, processing an array of fixed-size elements that contains
null values is many times slower than processing the same array without
nulls. Here is an example:

-- Function

create or replace function test(a int8[]) returns int8 as $BODY$
return sum([x for x in a if x is not None])
$BODY$ language plpythonu volatile;

pl_regression=# select test(array_agg(a)::int8[])
pl_regression-# from (
pl_regression(# select generate_series(1,10) as a
pl_regression(# ) as q;
 test
------
   55
(1 row)

Time: 22.248 ms
pl_regression=# select test(array_agg(a)::int8[])
pl_regression-# from (
pl_regression(# select generate_series(1,10) as a
pl_regression(# union all
pl_regression(# select null::int8 as a
pl_regression(# ) as q;
 test
------
   55
(1 row)

Time: 7179.921 ms


As you can see, a single null in the array introduces a 320x slowdown. The
reason for this is the following:
The original implementation calls array_ref for each element of the array.
Each call to array_ref results in a call to array_seek. Function array_seek
has a shortcut for arrays of fixed-size elements with no nulls. But if the
array's elements are not fixed-size, or if the array contains nulls, each
call to array_seek computes the offset of the Kth element by walking from
the first element. Fetching all N elements this way is an O(N^2) algorithm,
resulting in high processing time for arrays of variable-size elements and
arrays with nulls.
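To make the quadratic cost concrete, here is a toy Python model, not the
PostgreSQL C code: non-null elements are packed back to back with a
hypothetical 1-byte length header, and a separate list flags NULLs, which
take no storage. The names build_array, seek and read_all_naive are
illustrative assumptions, not PostgreSQL APIs; seek stands in for
array_seek when its fixed-size/no-nulls shortcut does not apply.

```python
from typing import List, Optional, Tuple


def build_array(values: List[Optional[bytes]]) -> Tuple[bytes, List[bool]]:
    """Pack non-null values back to back; NULLs take no storage (toy layout)."""
    data = bytearray()
    isnull: List[bool] = []
    for v in values:
        isnull.append(v is None)
        if v is not None:
            data.append(len(v))  # hypothetical 1-byte length header, len(v) < 256
            data.extend(v)
    return bytes(data), isnull


def seek(data: bytes, isnull: List[bool], k: int) -> Tuple[int, int]:
    """Return (byte offset of element k, number of elements stepped over).

    Element sizes vary and NULLs shift later elements, so the offset cannot be
    computed directly: every call walks from the first element.
    """
    offset = 0
    for i in range(k):
        if not isnull[i]:
            offset += 1 + data[offset]  # skip length header + payload
    return offset, k


def read_all_naive(data: bytes, isnull: List[bool]) -> Tuple[List[Optional[bytes]], int]:
    """Fetch each element by index, re-seeking from the start every time."""
    out: List[Optional[bytes]] = []
    total_steps = 0
    for k in range(len(isnull)):
        if isnull[k]:
            out.append(None)  # in this model the null check needs only the flag list
            continue
        offset, steps = seek(data, isnull, k)
        total_steps += steps
        n = data[offset]
        out.append(data[offset + 1:offset + 1 + n])
    return out, total_steps
```

For 1,000 non-null elements followed by one NULL, read_all_naive steps over
0 + 1 + ... + 999 = 499,500 elements in total, while a single pass would
visit each element once.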

The fix I propose applies the same logic that the array_out function uses
for efficient array traversal: it keeps a pointer to the offset of the last
fetched element, which results in a dramatic performance improvement for the
affected cases. With this implementation, arrays of fixed-size elements
without nulls, arrays of fixed-size elements with nulls, and arrays of
variable-size elements are all processed at the same speed. Here is the test
after this fix is applied:

pl_regression=# select test(array_agg(a)::int8[])
pl_regression-# from (
pl_regression(# select generate_series(1,10) as a
pl_regression(# ) as q;
 test
------
   55
(1 row)

Time: 21.056 ms
pl_regression=# select test(array_agg(a)::int8[])
pl_regression-# from (
pl_regression(# select generate_series(1,10) as a
pl_regression(# union all
pl_regression(# select null::int8 as a
pl_regression(# ) as q;
 test
------
   55
(1 row)

Time: 22.839 ms
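The pointer-keeping traversal described above can be sketched with a toy
Python model. This is a hypothetical illustration, not the patch itself:
the byte layout (1-byte length header, NULLs taking no storage) and the
names build_array and read_all_cursor are assumptions. The loop only
mirrors the idea of remembering where the previous element ended, as
array_out does, instead of re-seeking from the first element.

```python
from typing import List, Optional, Tuple


def build_array(values: List[Optional[bytes]]) -> Tuple[bytes, List[bool]]:
    """Pack non-null values back to back; NULLs take no storage (toy layout)."""
    data = bytearray()
    isnull: List[bool] = []
    for v in values:
        isnull.append(v is None)
        if v is not None:
            data.append(len(v))  # hypothetical 1-byte length header, len(v) < 256
            data.extend(v)
    return bytes(data), isnull


def read_all_cursor(data: bytes, isnull: List[bool]) -> Tuple[List[Optional[bytes]], int]:
    """Read every element in one pass, keeping the offset of the next element.

    Each position is visited exactly once, so the traversal is O(N)
    regardless of NULLs or variable element sizes.
    """
    out: List[Optional[bytes]] = []
    offset = 0  # byte offset of the next non-null element
    steps = 0   # positions visited
    for null in isnull:
        steps += 1
        if null:
            out.append(None)  # NULLs occupy no storage, so the offset does not move
            continue
        n = data[offset]
        out.append(data[offset + 1:offset + 1 + n])
        offset += 1 + n  # advance past length header + payload
    return out, steps
```

With 1,000 non-null elements followed by one NULL, read_all_cursor visits
1,001 positions, one per element, instead of re-walking the array for
every index.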


-- 
Best regards,
Alexey Grishchenko


0001-Fix-for-PL-Python-slow-input-arrays-traversal-issue.patch
