Performance regressions found using sqlfuzz

Jinho Jung Wed, 06 Mar 2019 13:42:49 -0800

Hello,

Thanks for the feedback so far! Continue with the previous report, we
sharing another four interesting cases that SQLFuzz discovered.
(previous discussion:
https://www.postgresql.org/message-id/flat/BN6PR07MB3409EE6CAAF8CCF43820AFB9EE670%40BN6PR07MB3409.namprd07.prod.outlook.com#acc68f0fbd8f0b207e162d2dd401d3e8
)


Here’s the time taken to execute four SQL queries on old (v9.5.16) and
newer version (v11.2) of PostgreSQL (in milliseconds):

+----------------------+--------+---------+---------+---------+
|                      | scale1 | scale10 | scale50 | scale300|
+----------------------+--------+---------+---------+---------+
| Case-5 (v9.5.16)     |     39 |     183 |    1459 |   11125 |
| Case-5 (v11.2)       |     73 |     620 |    4818 |   16956 |
+----------------------+--------+---------+---------+---------+
| Case-6 (v9.5.16)     |     46 |     329 |   15096 |   10721 |
| Case-6 (v11.2)       |     81 |     671 |   64808 |   26919 |
+----------------------+--------+---------+---------+---------+
| Case-7 (v9.5.16)     |     19 |       X |       X |       X |
| Case-7 (v11.2)       |     46 |       X |       X |       X |
+----------------------+--------+---------+---------+---------+
| Case-8 (v9.5.16)     |    215 |    2108 |   10460 |   64959 |
| Case-8 (v11.2)       |    449 |    3997 |   20246 |  130595 |
+----------------------+--------+---------+---------+---------+

For each regression, we share:
1) the associated query,
2) the commit that activated it,
3) our high-level analysis, and
4) query execution plans in old and new versions of PostgreSQL.

All these regressions are observed on the latest version. (v11.2 and
v9.5.16)

* You can download the queries at:
https://gts3.org/~/jjung/tpcc/case2.tar.gz

* You can reproduce the result by using the same setup that we described
before:
https://www.postgresql.org/message-id/BN6PR07MB3409922471073F2B619A8CA4EE640%40BN6PR07MB3409.namprd07.prod.outlook.com

 - As Andrew mentioned before, we increased default work_mem to 128MB

Best regards,
Jinho Jung

#### QUERY 5

EXPLAIN ANALYZE
select
  ref_0.c_zip as c0
from
  public.customer as ref_0
where EXISTS (
  select
    ref_1.ol_d_id as c10
  from
    public.order_line as ref_1
  where (ref_1.ol_o_id <> ref_0.c_d_id)
)

- Commit : 7ca25b7

- Our analysis: We believe newer version is slow when the number of rows in
the filter is small.

- Query execution plans:

[Old version]
Nested Loop Semi Join  (cost=0.00..15317063263550.52 rows=1 width=10)
(actual time=0.019..10888.266 rows=9000000 loops=1)
  Join Filter: (ref_1.ol_o_id <> ref_0.c_d_id)
  Rows Removed by Join Filter: 11700000
  ->  Seq Scan on customer ref_0  (cost=0.00..770327.00 rows=9000000
width=14) (actual time=0.008..7541.944 rows=9000000 loops=1)
  ->  Materialize  (cost=0.00..2813223.52 rows=90017568 width=4) (actual
time=0.000..0.000 rows=2 loops=9000000)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.68
rows=90017568 width=4) (actual time=0.005..0.007 rows=14 loops=1)
Planning time: 0.401 ms
Execution time: 11125.538 ms


[New version]
Nested Loop Semi Join  (cost=0.00..3409260.89 rows=9000000 width=10)
(actual time=0.033..16732.988 rows=9000000 loops=1)
  Join Filter: (ref_1.ol_o_id <> ref_0.c_d_id)
  Rows Removed by Join Filter: 11700000
  ->  Seq Scan on customer ref_0  (cost=0.00..770327.00 rows=9000000
width=14) (actual time=0.017..2113.336 rows=9000000 loops=1)
  ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.68 rows=90017568
width=4) (actual time=0.001..0.001 rows=2 loops=9000000)
Planning Time: 0.615 ms
Execution Time: 16956.115 ms


##### QUERY 6

select distinct
  ref_0.h_data as c0,
  ref_0.h_c_id as c1
from
  public.history as ref_0
    left join public.item as ref_1
    on (ref_1.i_im_id < -1)
where ref_1.i_price is NULL

- Our analysis: We think that the 'merge sort' makes slow execution. We are
wondering why newer version applies external merge sort in this case.

- Commit: 3fc6e2d (big patch)

- Query execution plans:

[Old version]
HashAggregate  (cost=1312274.26..1312274.27 rows=1 width=21) (actual
time=7288.727..10443.586 rows=9000000 loops=1)
  Group Key: ref_0.h_data, ref_0.h_c_id
  ->  Nested Loop Left Join  (cost=0.00..1312274.26 rows=1 width=21)
(actual time=26.965..2463.231 rows=9000000 loops=1)
        Filter: (ref_1.i_price IS NULL)
        ->  Seq Scan on history ref_0  (cost=0.00..184795.61 rows=8999661
width=21) (actual time=0.347..795.936 rows=9000000 loops=1)
        ->  Materialize  (cost=0.00..2521.05 rows=10 width=6) (actual
time=0.000..0.000 rows=0 loops=9000000)
              ->  Seq Scan on item ref_1  (cost=0.00..2521.00 rows=10
width=6) (actual time=26.610..26.610 rows=0 loops=1)
                    Filter: (i_im_id < '-1'::integer)
                    Rows Removed by Filter: 100000
Planning time: 1.538 ms
Execution time: 10721.259 ms


[New version]
Unique  (cost=1312334.34..1312334.35 rows=1 width=21) (actual
time=21444.459..26651.868 rows=9000000 loops=1)
  ->  Sort  (cost=1312334.34..1312334.35 rows=1 width=21) (actual
time=21444.457..25629.389 rows=9000000 loops=1)
        Sort Key: ref_0.h_data, ref_0.h_c_id
        Sort Method: external merge  Disk: 285384kB
        ->  Nested Loop Left Join  (cost=0.00..1312334.33 rows=1 width=21)
(actual time=21.328..2409.302 rows=9000000 loops=1)
              Filter: (ref_1.i_price IS NULL)
              ->  Seq Scan on history ref_0  (cost=0.00..184800.06
rows=9000106 width=21) (actual time=0.320..734.376 rows=9000000 loops=1)
              ->  Materialize  (cost=0.00..2521.05 rows=10 width=6) (actual
time=0.000..0.000 rows=0 loops=9000000)
                    ->  Seq Scan on item ref_1  (cost=0.00..2521.00 rows=10
width=6) (actual time=21.000..21.001 rows=0 loops=1)
                          Filter: (i_im_id < '-1'::integer)
                          Rows Removed by Filter: 100000
Planning Time: 1.426 ms
Execution Time: 26919.635 ms


##### QUERY 7

select
  ref_0.c_id as c0
from
  public.customer as ref_0
where EXISTS (
  select
      ref_0.c_city as c0
  from
    public.order_line as ref_1
    left join public.new_order as ref_2
    on (ref_1.ol_supply_w_id = ref_2.no_w_id)
  where (ref_1.ol_delivery_d > ref_0.c_since)
)

- Our analysis : Parallel execution seems a problem. We also want to ask
whether only one worker is intended behavior of Postgres because we think
parallel execution with less than two workers is not parallel.

- Another interesting finding: this query cannot be finished within one day
if we incrase the size of DB (e.g., from scale factor 1 to scale factor
10/50/300).

- Commit: 16be2fd

- Query execution plans:

[Old version]
Nested Loop Semi Join  (cost=224.43..910152608046.08 rows=10000 width=4)
(actual time=2.619..18.042 rows=30000 loops=1)
  Join Filter: (ref_1.ol_delivery_d > ref_0.c_since)
  ->  Seq Scan on customer ref_0  (cost=0.00..2569.00 rows=30000 width=12)
(actual time=0.003..4.530 rows=30000 loops=1)
  ->  Materialize  (cost=224.43..48521498.97 rows=2406886812 width=8)
(actual time=0.000..0.000 rows=1 loops=30000)
        ->  Hash Left Join  (cost=224.43..27085162.91 rows=2406886812
width=8) (actual time=2.612..2.612 rows=1 loops=1)
              Hash Cond: (ref_1.ol_supply_w_id = ref_2.no_w_id)
              ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48
rows=300148 width=12) (actual time=0.002..0.002 rows=1 loops=1)
              ->  Hash  (cost=124.19..124.19 rows=8019 width=4) (actual
time=2.571..2.571 rows=8019 loops=1)
                    Buckets: 8192  Batches: 1  Memory Usage: 346kB
                    ->  Seq Scan on new_order ref_2  (cost=0.00..124.19
rows=8019 width=4) (actual time=0.174..1.539 rows=8019 loops=1)
Planning time: 0.605 ms
Execution time: 19.045 ms

[New version]
Gather  (cost=1224.43..672617098015.10 rows=10000 width=4) (actual
time=3.099..45.077 rows=30000 loops=1)
  Workers Planned: 1
  Workers Launched: 1
  ->  Nested Loop Semi Join  (cost=224.43..672617096015.10 rows=5882
width=4) (actual time=2.307..38.258 rows=15000 loops=2)
        Join Filter: (ref_1.ol_delivery_d > ref_0.c_since)
        ->  Parallel Seq Scan on customer ref_0  (cost=0.00..2445.47
rows=17647 width=12) (actual time=0.003..2.668 rows=15000 loops=2)
        ->  Hash Left Join  (cost=224.43..27085162.91 rows=2406886812
width=8) (actual time=0.002..0.002 rows=1 loops=30000)
              Hash Cond: (ref_1.ol_supply_w_id = ref_2.no_w_id)
              ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48
rows=300148 width=12) (actual time=0.001..0.001 rows=1 loops=30000)
              ->  Hash  (cost=124.19..124.19 rows=8019 width=4) (actual
time=2.211..2.211 rows=8019 loops=2)
                    Buckets: 8192  Batches: 1  Memory Usage: 346kB
                    ->  Seq Scan on new_order ref_2  (cost=0.00..124.19
rows=8019 width=4) (actual time=0.086..1.179 rows=8019 loops=2)
Planning Time: 0.611 ms
Execution Time: 46.195 ms


##### QUERY 8

select
  ref_0.ol_d_id as c0
from
  public.order_line as ref_0
  left join (
    select
      ref_1.ol_supply_w_id as c0,
      ref_1.ol_d_id as c2
    from
      public.order_line as ref_1
    where ref_1.ol_o_id < 1
  ) as subq_0
  on (subq_0.c2 = ref_0.ol_o_id)
where EXISTS (
  select
    ref_2.o_ol_cnt as c0
  from
    public.oorder as ref_2
  where
    nullif(ref_2.o_d_id, subq_0.c0) is not NULL
);

- Commit: 0c2070c

- Our analysis : We are not sure about the root cause of this regression.
This might have to do with parallel execution.

- Query execution plans:

[Old version]
Nested Loop Semi Join  (cost=3488355.85..210034755085.67 rows=245615461
width=4) (actual time=32830.372..62745.966 rows=90017507 loops=1)
  Join Filter: (NULLIF(ref_2.o_d_id, ref_1.ol_supply_w_id) IS NOT NULL)
  ->  Hash Right Join  (cost=3488355.85..9115461.16 rows=246849710 width=8)
(actual time=32829.980..48232.002 rows=90017507 loops=1)
        Hash Cond: (ref_1.ol_d_id = ref_0.ol_o_id)
        ->  Index Scan using order_line_pkey on order_line ref_1
(cost=0.57..2076988.89 rows=8461 width=8) (actual time=5346.547..5346.547
rows=0 loops=1)
              Index Cond: (ol_o_id < 1)
        ->  Hash  (cost=2011503.68..2011503.68 rows=90017568 width=8)
(actual time=27466.438..27466.438 rows=90017507 loops=1)
              Buckets: 4194304  Batches: 64  Memory Usage: 80798kB
              ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.68
rows=90017568 width=8) (actual time=0.006..16821.719 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..245157.00 rows=9000000 width=4) (actual
time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on oorder ref_2  (cost=0.00..165000.00 rows=9000000
width=4) (actual time=0.377..0.377 rows=1 loops=1)
Planning time: 3.933 ms
Execution time: 64959.231 ms

[New version]
Gather  (cost=8797140.37..153823591859.11 rows=264667229 width=4) (actual
time=27917.270..128050.059 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=8796140.37..153797124136.21
rows=110278012 width=4) (actual time=27911.016..109588.362 rows=30005836
loops=3)
        Join Filter: (NULLIF(ref_2.o_d_id, ref_1.ol_supply_w_id) IS NOT
NULL)
        ->  Merge Left Join  (cost=8796140.37..10646159.56 rows=110832173
width=8) (actual time=27910.620..33777.444 rows=30005836 loops=3)
              Merge Cond: (ref_0.ol_o_id = ref_1.ol_d_id)
              ->  Sort  (cost=6717742.99..6811511.29 rows=37507320 width=8)
(actual time=22427.383..25862.373 rows=30005836 loops=3)
                    Sort Key: ref_0.ol_o_id
                    Sort Method: external merge  Disk: 534352kB
                    Worker 0:  Sort Method: external merge  Disk: 509888kB
                    Worker 1:  Sort Method: external merge  Disk: 541512kB
                    ->  Parallel Seq Scan on order_line ref_0
(cost=0.00..1486401.20 rows=37507320 width=8) (actual time=0.025..14349.178
rows=30005836 loops=3)
              ->  Sort  (cost=2078397.38..2078419.66 rows=8912 width=8)
(actual time=5483.221..5483.221 rows=0 loops=3)
                    Sort Key: ref_1.ol_d_id
                    Sort Method: quicksort  Memory: 25kB
                    Worker 0:  Sort Method: quicksort  Memory: 25kB
                    Worker 1:  Sort Method: quicksort  Memory: 25kB
                    ->  Index Scan using order_line_pkey on order_line
ref_1  (cost=0.57..2077812.68 rows=8912 width=8) (actual
time=5483.203..5483.203 rows=0 loops=3)
                          Index Cond: (ol_o_id < 1)
        ->  Seq Scan on oorder ref_2  (cost=0.00..165000.00 rows=9000000
width=4) (actual time=0.002..0.002 rows=1 loops=90017507)
Planning Time: 3.952 ms
Execution Time: 130595.467 ms


===================================
Following up with previous question
===================================

on-going discussion:
https://www.postgresql.org/message-id/flat/BN6PR07MB3409EE6CAAF8CCF43820AFB9EE670%40BN6PR07MB3409.namprd07.prod.outlook.com#acc68f0fbd8f0b207e162d2dd401d3e8


Hello Andres,

Could you please share your thoughts on QUERY 3?

The performance impact of this regression increases *linearly* on larger
databases. We concur with Andrew in that this is related to the lack of a
Materialize node and mis-costing of the Nested Loop Anti-Join.

We found more than 20 regressions related to this commit. We have shared
two illustrative examples (QUERIES 3A and 3B) below.

- Commit: 77cd477 (Enable parallel query by default.)

- Summary: Execution Time (milliseconds)

When we increased the scale-factor of TPC-C to 300 (~30 GB), this query ran
three times slower on v11 (24 seconds) in comparison to v9.5 (7 seconds).
We also found more than 15 regressions related to the same commit and share
a couple of them below.

+-----------------------+--------+---------+---------+-----------+
|                       | scale1 | scale10 | scale50 | scale 300 |
+-----------------------+--------+---------+---------+-----------+
| Query 3 (v9.5)        |     28 |     248 |    1231 |      7265 |
| Query 3 (v11)         |     74 |     677 |    3345 |     24581 |
+-----------------------+--------+---------+---------+-----------+
| Query 3A (v9.5)       |     88 |     937 |    4721 |     27241 |
| Query 3A (v11)        |    288 |    2822 |   13838 |     85081 |
+-----------------------+--------+---------+---------+-----------+
| Query 3B (v9.5)       |    101 |     934 |    4824 |     29363 |
| Query 3B (v11)        |    200 |    2331 |   12327 |     74110 |
+-----------------------+--------+---------+---------+-----------+


###### QUERY 3:

select
     cast(ref_1.ol_i_id as int4) as c0
from
     public.stock as ref_0
       left join public.order_line as ref_1
       on (ref_1.ol_number is not null)
where ref_1.ol_number is null


###### QUERY 3A:

select
  ref_0.ol_delivery_d as c1
from
  public.order_line as ref_0
where EXISTS (
  select
    ref_1.i_im_id as c0
    from
    public.item as ref_1
    where ref_0.ol_d_id <=  ref_1.i_im_id
  )

 Execution plan:

[OLD version]
Nested Loop Semi Join  (cost=0.00..90020417940.08 rows=30005835 width=8)
(actual time=0.034..24981.895 rows=90017507 loops=1)
  Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
  ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.04 rows=90017504
width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..2771.00 rows=100000 width=4) (actual
time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000
width=4) (actual time=0.006..0.006 rows=1 loops=1)

Planning time: 0.290 ms
Execution time: 27241.239 ms

[NEW version]
Gather  (cost=1000.00..88047487498.82 rows=30005835 width=8) (actual
time=0.265..82355.289 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=0.00..88044485915.32 rows=12502431
width=8) (actual time=0.033..68529.259 rows=30005836 loops=3)
        Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
        ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486400.93
rows=37507293 width=12) (actual time=0.023..2789.901 rows=30005836 loops=3)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000
width=4) (actual time=0.001..0.001 rows=1 loops=90017507)

Planning Time: 0.319 ms
Execution Time: 85081.158 ms


###### QUERY 3B:


select
  ref_0.ol_i_id as c0
from
  public.order_line as ref_0
where EXISTS (
  select
    ref_0.ol_delivery_d as c0
    from
    public.order_line as ref_1
    where ref_1.ol_d_id <= cast(nullif(ref_1.ol_o_id, ref_0.ol_i_id) as
int4))

Execution plan:

[OLD version]
Nested Loop Semi Join  (cost=0.00..115638730740936.53 rows=30005835
width=4) (actual time=0.017..27009.302 rows=90017507 loops=1)
  Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
  Rows Removed by Join Filter: 11557
  ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.04 rows=90017504
width=4) (actual time=0.009..7199.540 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..2813221.56 rows=90017504 width=8) (actual
time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.04
rows=90017504 width=8) (actual time=0.001..0.002 rows=14 loops=1)

Planning time: 0.252 ms
Execution time: 29363.737 ms

[NEW version]
Gather  (cost=1000.00..84060490326155.39 rows=30005835 width=4) (actual
time=0.272..71712.491 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=0.00..84060487324571.89 rows=12502431
width=4) (actual time=0.046..60153.472 rows=30005836 loops=3)
        Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
        Rows Removed by Join Filter: 1717
        ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486400.93
rows=37507293 width=4) (actual time=0.023..2819.361 rows=30005836 loops=3)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.04
rows=90017504 width=8) (actual time=0.001..0.001 rows=1 loops=90017507)

Planning Time: 0.334 ms
Execution Time: 74110.942 ms

Performance regressions found using sqlfuzz

Reply via email to