Hi all!
The best way to describe my question is to show the code as first:
create table tst(
id int primary key,
j1 jsonb,
j2 jsonb
);
insert into tst
select
ser,
jsonb_build_object(
floor(random() * 10 + 1), floor(random() * 1000 + 1),
floor(random() * 10 + 1), floor(random() * 1000 + 1),
floor(random() * 10 + 1), floor(random() * 1000 + 1)
),
jsonb_build_object(
floor(random() * 10 + 1), floor(random() * 1000 + 1),
floor(random() * 10 + 1), floor(random() * 1000 + 1),
floor(random() * 10 + 1), floor(random() * 1000 + 1)
)
from generate_series(1, 500000) ser;
analyze tst;
-- original func is a bit complicated. But it doesn't matter here
create or replace function tst_func(a jsonb, b jsonb) returns bigint
stable
language sql
as $$
select
sum(
((_a.value::text)::int - (coalesce(b->>_a.key, '0'))::int)::int
)
from
jsonb_each(a) _a
$$;
-- get plain data
explain analyze select
id,
j1,
j2
from
tst;
-- use subquery (the same code as in function)
explain analyze select
id,
j1,
j2,
(
select
sum(
((_a.value::text)::int - (coalesce(j2->>_a.key, '0'))::int)::int
)
from
jsonb_each(j1) _a
)
from
tst;
-- use function
explain analyze select
id,
j1,
j2,
tst_func(j1, j2)
from
tst;
select version();
And after run it I got following results:
CREATE TABLE
INSERT 0 500000
ANALYZE
CREATE FUNCTION
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Seq Scan on tst (cost=0.00..13558.00 rows=500000 width=108) (actual
time=0.009..40.348 rows=500000 loops=1)
Planning time: 0.189 ms
Execution time: 56.356 ms
(3 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on tst (cost=0.00..1644808.00 rows=500000 width=116) (actual
time=0.021..1966.190 rows=500000 loops=1)
SubPlan 1
-> Aggregate (cost=3.25..3.26 rows=1 width=8) (actual time=0.004..0.004
rows=1 loops=500000)
-> Function Scan on jsonb_each _a (cost=0.00..1.00 rows=100
width=64) (actual time=0.002..0.002 rows=3 loops=500000)
Planning time: 0.072 ms
Execution time: 1982.192 ms
(6 rows)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on tst (cost=0.00..138558.00 rows=500000 width=116) (actual
time=0.072..5308.897 rows=500000 loops=1)
Planning time: 0.067 ms
Execution time: 5328.196 ms
(3 rows)
version
-------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.13 on x86_64-pc-linux-gnu (Ubuntu 9.6.13-1.pgdg16.04+1),
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
(1 row)
As you see, subquery version needs 2 seconds when func version needs more than
5. And it's sad to see. I love functions and actually this func (its production
version) is widely used across our project.
I tried to alter function as immutable and even parallel safe but it doesn't
help.
I understand that functions execution is not free, but why so much?
So, my question is: is there any way to make this function works faster?
Something similar to prepared statement or maybe rules (create rule) or
whatever else is available.
PS current situation maybe solved by denormalization with precalculations of
function and storing results along with data, but it's not the way i would like
to use, because it leads to more issues to solve (invalidation, for instance)