Re: [HACKERS] Parallel sec scan in plpgsql

Alex Ignatov Tue, 20 Sep 2016 04:16:44 -0700


On 18.09.2016 06:54, Amit Kapila wrote:

On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov <[email protected]> wrote:


On 16.09.2016 16:50, Amit Kapila wrote:



Can you try by setting force_parallel_mode = off;?  I think it is
sending the whole function execution to worker due to
force_parallel_mode.


No changes:


Okay, it just skipped from my mind that we don't support parallel
queries for SQL statement execution (or statements executed via
exec_stmt_execsql) from plpgsql.  For detailed explanation of why that
is not feasible you can refer one of my earlier e-mails [1] on similar
topic.  I think if we can somehow get the results via Perform
statement, then it could be possible to use parallelism via plpgsql.

However, you can use it via SQL functions, an example is below:

set min_parallel_relation_size =0;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;

Load 'auto_explain';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
set auto_explain.log_nested_statements = true;

create table test_plpgsql(c1 int, c2 char(1000));
insert into test_plpgsql values(generate_series(1,100000),'aaa');

create or replace function parallel_test_set_sql() returns
setof bigint as $$
select count(*) from test_plpgsql;
$$language sql PARALLEL SAFE STRICT STABLE;

Then execute function as: select * from parallel_test_set_sql();  You
can see below plan if auto_explain module is loaded.

        Finalize Aggregate  (cost=14806.85..14806.86 rows=1 width=8) (actual tim
e=1094.966..1094.967 rows=1 loops=1)
          ->  Gather  (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.
216..1094.943 rows=3 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Partial Aggregate  (cost=14806.83..14806.84 rows=1 width=8)
(actual time=177.867..177.868 rows=1 loops=3)
                      ->  Parallel Seq Scan on test_plpgsql  (cost=0.00..14702.6
7 rows=41667 width=0) (actual time=0.384..142.565 rows=33333 loops=3)
CONTEXT:  SQL function "parallel_test_set_sql" statement 1
LOG:  duration: 2965.040 ms  plan:
        Query Text: select * from parallel_test_set_sql();
        Function Scan on parallel_test_set_sql  (cost=0.25..10.25 rows=1000 widt
h=8) (actual time=2538.620..2776.955 rows=1 loops=1)


In general, I think we should support the cases as required (or
written) by you from plpgsql or sql functions.  We need more work to
support such cases. There are probably two ways of supporting such
cases, we can build some intelligence in plpgsql execution such that
it can recognise such queries and allow to use parallelism or we need
to think of enabling parallelism for cases where we don't run the plan
to completion.  Most of the use cases from plpgsql or sql function
fall into later category as they don't generally run the plan to
completion.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com


Thank you for you sugestion! That works.

But what  we can do with this function:
create or replace function parallel_test_sql(t int) returns setof bigint as
$$
   select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 
group by a,b,c,d,e)t;
$$ language sql PARALLEL SAFE STRICT STABLE;

explain (analyze,buffers) select  * from  parallel_test_sql(2);

"Function Scan on parallel_test_sql  (cost=0.25..10.25 rows=1000 width=8) (actual 
time=2410.789..2410.790 rows=1 loops=1)"
"  Buffers: shared hit=63696"
"Planning time: 0.082 ms"
"Execution time: 2410.841 ms"

2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - 
Query Tool,client=127.0.0.1 LOG:  duration: 2410.135 ms  plan:
        Query Text:
           select count(*) from (select a,b,c,d,e,sum(bytes) from test where 
a>= $1 group by a,b,c,d,e)t;

        Aggregate  (cost=230701.42..230701.43 rows=1 width=8)
          ->  HashAggregate  (cost=230363.59..230513.74 rows=15015 width=28)
                Group Key: test.a, test.b, test.c, test.d, test.e
                ->  Seq Scan on test  (cost=0.00..188696.44 rows=3333372 
width=20)
                      Filter: (a >= $1)


No parallelism again. Looks like that Filter: (a >= $1)  breaks parallelism



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel sec scan in plpgsql

Reply via email to