Hi Bharath,
I'm trying to take some performance measurements on your patch v23.
But when I started, I found an issue: in a specific case the tuples are
distributed very unevenly among the workers (99% of the tuples are read by one
worker), which means the "parallel select" part brings no performance gain.
I then found that it is not introduced by your patch, because it also happens
on master (HEAD). But I don't know how to deal with it, so I'm posting it here
to see if anybody knows what's going wrong or has a good idea for this issue.
Here are the conditions to reproduce the issue:
1. A machine with many processors (say more than 20). With fewer CPUs it also
happens, but less obviously (40% of the tuples on one worker in my tests).
2. A query plan of the form "serial insert + parallel select". I have
reproduced this behavior with CTAS, SELECT INTO, and INSERT INTO ... SELECT.
3. The select part has to read a large amount of data (e.g. 100 million rows
out of 200 million).
Given the above, IMHO, my guess is that the leader's write rate can't keep up
with the workers' read rate, so one worker's tuples pile up in its queue,
more and more.
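To visualize that guess, here is a small Python toy model (my own sketch, not
PostgreSQL code): each "worker" claims pages from a shared counter (like a
parallel seq scan) and pushes rows into its own bounded queue, while a slow
"leader" drains whichever queue it is currently attached to and switches only
when that queue is empty, which is roughly how Gather consumes tuple queues as
far as I understand it. Because the leader is slower than the scan, the queue
it is attached to rarely empties, so that one worker keeps being drained and
claims nearly all the pages while the others sit blocked on full queues. All
names and constants below are made up for the illustration.

```python
# Toy model of the hypothesized imbalance: slow leader + bounded per-worker
# queues. NOT PostgreSQL's shm_mq code, just a simplified sketch.
import queue
import threading
import time

PAGES = 2000        # total pages to scan
QUEUE_SIZE = 64     # per-worker tuple queue capacity
NWORKERS = 4

next_page = 0
page_lock = threading.Lock()
queues = [queue.Queue(maxsize=QUEUE_SIZE) for _ in range(NWORKERS)]
rows_sent = [0] * NWORKERS   # rows each worker managed to hand to the leader

def claim_page():
    """Hand out the next page of the shared scan (parallel seq scan analogue)."""
    global next_page
    with page_lock:
        if next_page >= PAGES:
            return None
        page = next_page
        next_page += 1
        return page

def worker(i):
    while True:
        page = claim_page()
        if page is None:
            queues[i].put(None)      # "no more tuples" marker
            return
        queues[i].put(page)          # blocks while this worker's queue is full
        rows_sent[i] += 1

def leader():
    live = set(range(NWORKERS))
    cur = 0
    while live:
        if cur not in live:
            cur = (cur + 1) % NWORKERS
            continue
        try:
            item = queues[cur].get(timeout=0.001)
        except queue.Empty:
            cur = (cur + 1) % NWORKERS   # switch queues only when one is empty
            continue
        if item is None:
            live.discard(cur)
            continue
        time.sleep(0.0001)               # the serial insert: slower than the scan

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NWORKERS)]
for t in threads:
    t.start()
leader()
for t in threads:
    t.join()
print(rows_sent)   # typically one worker sends the vast majority of the rows
```

In this model the queue the leader happens to drain first never empties, so
only that one worker makes scan progress, which matches the 99%-on-one-worker
shape of the EXPLAIN output below, but of course this is only a guess at the
mechanism.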
Below is my test info:
1. Test environment
CentOS 8.2, 128GB RAM, 40 processors, SAS disk
2. Test data preparation
create table x(a int, b int, c int);
create index on x(a);
insert into x select
generate_series(1,200000000),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);
3. Test results
*Patched CTAS*: please look at worker 2, which reads 99% of the tuples.
explain analyze verbose create table test(a,b,c) as select
a,floor(random()*(10001-1)+1),c from x where b%2=0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1942082.77 rows=1000001 width=16) (actual
time=0.203..24023.686 rows=100006268 loops=1)
Output: a, floor(((random() * '10000'::double precision) + '1'::double
precision)), c
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on public.x (cost=0.00..1831082.66 rows=250000
width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
Output: a, c
Filter: ((x.b % 2) = 0)
Rows Removed by Filter: 19998746
Worker 0: actual time=0.016..19.265 rows=94592 loops=1
Worker 1: actual time=0.027..31.422 rows=94574 loops=1
Worker 2: actual time=0.014..21744.549 rows=99627749 loops=1
Worker 3: actual time=0.015..19.347 rows=94586 loops=1
Planning Time: 0.098 ms
Execution Time: 91054.828 ms
*Non-patched CTAS*: please look at worker 0; again, 99% of the tuples are read by it.
explain analyze verbose create table test(a,b,c) as select
a,floor(random()*(10001-1)+1),c from x where b%2=0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1942082.77 rows=1000001 width=16) (actual
time=0.283..19216.157 rows=100003148 loops=1)
Output: a, floor(((random() * '10000'::double precision) + '1'::double
precision)), c
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on public.x (cost=0.00..1831082.66 rows=250000
width=8) (actual time=0.020..4380.360 rows=20000630 loops=5)
Output: a, c
Filter: ((x.b % 2) = 0)
Rows Removed by Filter: 19999370
Worker 0: actual time=0.013..21805.647 rows=99624833 loops=1
Worker 1: actual time=0.016..19.790 rows=94398 loops=1
Worker 2: actual time=0.013..35.340 rows=94423 loops=1
Worker 3: actual time=0.035..19.849 rows=94679 loops=1
Planning Time: 0.083 ms
Execution Time: 91151.097 ms
I'm still working on the performance tests for your patch; if I make some
progress, I will post the results here.
Regards,
Tang