> > > It did have performance gain, but I think it's not huge enough to
> > > ignore the extra's index cost.
> > > What do you think ?
> >
> > Yes... as you suspect, I'm afraid the benefit from parallel bitmap
> > scan may not compensate for the loss of the parallel insert operation.
> >
> > The loss is probably due to 1) more index page splits, 2) more buffer
> > writes (table and index), and 3) internal locks for things such as
> > relation extension and page content protection. To investigate 3), we
> > should want something like [1], which tells us the wait event
> > statistics (wait count and time for each wait event) per session or
> > across the instance like Oracke, MySQL and EDB provides. I want to
> continue this in the near future.
>
> What would the result look like if you turn off
> parallel_leader_participation? If the leader is freed from
> reading/writing the table and index, the index page splits and internal
> lock contention may decrease enough to recover part of the loss.
>
> https://www.postgresql.org/docs/devel/parallel-plans.html
>
> "In a parallel bitmap heap scan, one process is chosen as the leader. That
> process performs a scan of one or more indexes and builds a bitmap indicating
> which table blocks need to be visited. These blocks are then divided among
> the cooperating processes as in a parallel sequential scan. In other words,
> the heap scan is performed in parallel, but the underlying index scan is
> not."
If I disable parallel_leader_participation.
For max_parallel_workers_per_gather = 4, It still have performance degradation.
For max_parallel_workers_per_gather = 2, the performance degradation will not
happen in most of the case.
There is sometimes a noise(performance degradation), but most of result(about
80%) is good.
Best regards,
houzj
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan
select a from x where a<80000 or (a%2=0 and a>199900000);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=4272.20..1269111.15 rows=79918 width=0) (actual
time=381.764..382.715 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
WAL: records=260498 bytes=17019359
-> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 width=0)
(actual time=378.227..378.229 rows=0 loops=5)
Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
WAL: records=260498 bytes=17019359
Worker 0: actual time=376.638..376.640 rows=0 loops=1
Buffers: shared hit=88281 dirtied=236 written=337
WAL: records=56167 bytes=3674994
Worker 1: actual time=377.889..377.892 rows=0 loops=1
Buffers: shared hit=81231 dirtied=213 written=99
WAL: records=52175 bytes=3386885
Worker 2: actual time=377.388..377.389 rows=0 loops=1
Buffers: shared hit=82544 dirtied=232 written=279
WAL: records=52469 bytes=3443843
Worker 3: actual time=377.733..377.734 rows=0 loops=1
Buffers: shared hit=87728 dirtied=225 written=182
WAL: records=56307 bytes=3660758
-> Parallel Bitmap Heap Scan on public.x (cost=3272.20..1260119.35
rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)
Output: x.a, NULL::integer
Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
Rows Removed by Filter: 10000
Heap Blocks: exact=167
Buffers: shared hit=1475
Worker 0: actual time=5.203..14.921 rows=28028 loops=1
Buffers: shared hit=209
Worker 1: actual time=5.165..14.403 rows=26039 loops=1
Buffers: shared hit=196
Worker 2: actual time=5.188..14.284 rows=26177 loops=1
Buffers: shared hit=195
Worker 3: actual time=5.151..14.760 rows=28104 loops=1
Buffers: shared hit=208
-> BitmapOr (cost=3272.20..3272.20 rows=174813 width=0)
(actual time=8.288..8.290 rows=0 loops=1)
Buffers: shared hit=500
-> Bitmap Index Scan on x_a_idx (cost=0.00..1468.38
rows=79441 width=0) (actual time=3.681..3.681 rows=79999 loops=1)
Index Cond: (x.a < 80000)
Buffers: shared hit=222
-> Bitmap Index Scan on x_a_idx (cost=0.00..1763.86
rows=95372 width=0) (actual time=4.605..4.605 rows=100000 loops=1)
Index Cond: (x.a > 199900000)
Buffers: shared hit=278
Planning:
Buffers: shared hit=19
Planning Time: 0.173 ms
Execution Time: 382.776 ms
(47 rows)
Execuse me, tsunakawa san, sorry to bother you,
May I ask you some questions ?
>[1]
>Add accumulated statistics for wait event
>https://commitfest.postgresql.org/28/2332/
I remembered FEP has such feature, is this the same as FEP.
Ok. thanks a lot for the explanation.
BTW, I am sorry the tablename I tested today is different from yesterday,
It may confused you, I have check that the result of both is the same.
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan
select a from x where a<80000 or (a%2=0 and a>199900000);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
-
Gather (cost=4258.59..2063373.72 rows=81338 width=0) (actual
time=0.203..310.184 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=393777 read=4 dirtied=939 written=953
WAL: records=260356 bytes=16281081
-> Insert on public.testscan (cost=3258.59..2054239.92 rows=0 width=0)
(actual time=305.474..305.476 rows=0 loops=2)
Buffers: shared hit=393777 read=4 dirtied=939 written=953
WAL: records=260356 bytes=16281081
Worker 0: actual time=305.378..305.379 rows=0 loops=1
Buffers: shared hit=196680 read=3 dirtied=462 written=478
WAL: records=130155 bytes=8129880
Worker 1: actual time=305.570..305.572 rows=0 loops=1
Buffers: shared hit=197097 read=1 dirtied=477 written=475
WAL: records=130201 bytes=8151201
-> Parallel Bitmap Heap Scan on public.x (cost=3258.59..2054239.92
rows=40669 width=8) (actual time=8.883..27.998 rows=65000 loops=2)
Output: x.a, NULL::integer
Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
Rows Removed by Filter: 25000
Buffers: shared hit=1476
Worker 0: actual time=8.796..27.982 rows=64992 loops=1
Buffers: shared hit=487
Worker 1: actual time=8.971..28.014 rows=65007 loops=1
Buffers: shared hit=989
-> BitmapOr (cost=3258.59..3258.59 rows=173971 width=0)
(actual time=8.810..8.811 rows=0 loops=1)
Buffers: shared hit=501
Worker 1: actual time=8.810..8.811 rows=0 loops=1
Buffers: shared hit=501
-> Bitmap Index Scan on x_a_idx (cost=0.00..1495.11
rows=80872 width=0) (actual time=3.832..3.832 rows=79999 loops=1)
Index Cond: (x.a < 80000)
Buffers: shared hit=223
Worker 1: actual time=3.832..3.832 rows=79999
loops=1
Buffers: shared hit=223
-> Bitmap Index Scan on x_a_idx (cost=0.00..1722.81
rows=93099 width=0) (actual time=4.976..4.976 rows=100000 loops=1)
Index Cond: (x.a > 199900000)
Buffers: shared hit=278
Worker 1: actual time=4.976..4.976 rows=100000
loops=1
Buffers: shared hit=278
Planning:
Buffers: shared hit=19
Planning Time: 0.146 ms
Execution Time: 310.257 ms
(42 rows)
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan
select a from x where a<80000 or (a%2=0 and a>199900000);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Insert on public.testscan (cost=3272.20..3652841.26 rows=0 width=0) (actual
time=360.474..360.476 rows=0 loops=1)
Buffers: shared hit=392569 read=3 dirtied=934 written=933
WAL: records=260354 bytes=16259841
-> Bitmap Heap Scan on public.x (cost=3272.20..3652841.26 rows=79918
width=8) (actual time=8.096..41.005 rows=129999 loops=1)
Output: x.a, NULL::integer
Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
Rows Removed by Filter: 50000
Heap Blocks: exact=975
Buffers: shared hit=1475
-> BitmapOr (cost=3272.20..3272.20 rows=174813 width=0) (actual
time=7.975..7.976 rows=0 loops=1)
Buffers: shared hit=500
-> Bitmap Index Scan on x_a_idx (cost=0.00..1468.38 rows=79441
width=0) (actual time=3.469..3.470 rows=79999 loops=1)
Index Cond: (x.a < 80000)
Buffers: shared hit=222
-> Bitmap Index Scan on x_a_idx (cost=0.00..1763.86 rows=95372
width=0) (actual time=4.499..4.499 rows=100000 loops=1)
Index Cond: (x.a > 199900000)
Buffers: shared hit=278
Planning:
Buffers: shared hit=10
Planning Time: 0.126 ms
Execution Time: 360.547 ms
(22 rows)