Hello,

We are developing a tool called sqlfuzz for automatically finding performance 
regressions in PostgreSQL. sqlfuzz performs mutational fuzzing to generate SQL 
queries that take more time to execute on the latest version of PostgreSQL 
compared to prior versions. We hope that these queries would help further 
increase the utility of the regression test suite.

We would greatly appreciate feedback from the community regarding the queries 
found by the tool so far. We have already incorporated prior feedback from the 
community in the latest version of sqlfuzz.

We are sharing four SQL queries that exhibit regressions in this report. These 
queries have an average size of 245 bytes. Here’s an illustrative query:

EXAMPLE:

select distinct
  ref_0.i_im_id as c0,
  ref_1.ol_dist_info as c1
from
  public.item as ref_0
    right join public.order_line as ref_1
    on (cast(null as "numeric") <> 1)

Time taken on PostgreSQL v9.5: 15.1 (seconds)
Time taken on PostgreSQL v11: 64.8 (seconds)

Here are the steps for reproducing our observations:

[Test environment]
* Ubuntu 16.04 machine "Linux sludge 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 
12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux"
* Postgres installed via APT package manager
* Database: TPC-C benchmark (with three scale factors)

[Setup Test Environment]

1. Install PostgreSQL v11 and v9.5

    $ wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | 
sudo apt-key add -
    $ sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ 
xenial-pgdg main" > /etc/apt/sources.list.d/pgdg_xenial.list'
    $ sudo apt update
    $ sudo sudo apt-get install postgresql-11
    $ sudo sudo apt-get install postgresql-9.5

* set password of postgres user (with your desirable one)
    $ sudo passwd postgres

* change port number of two version of DBs
    $ sudo vim /etc/postgresql/11/main/postgresql.conf
     => change "port = ????" ==> "port = 5435"
    $ sudo vim /etc/postgresql/9.5/main/postgresql.conf
     => change "port = ????" ==> "port = 5432"

* restart DB
    $ sudo pg_ctlcluster 9.5 main restart
    $ sudo pg_ctlcluster 11 main restart

    => check you have opened ports at 5432 and 5435

* setup privilege
    $ sudo -i -u postgres
    $ psql -p 5432 (then copy and run the below query to setup password)
        # ALTER USER postgres PASSWORD 'mysecretpassword';
        # \q

    $ psql -p 5435 (then copy and run the below query to setup password)
        # ALTER USER postgres PASSWORD 'mysecretpassword';
        # \q

    $ exit (to original user)
    $ sudo -u postgres createuser -s $(whoami); createdb $(whoami)

* test your setting by showing DB version
    < old version >
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -c 
"select version();"

    < new version >
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5435 -U postgres -c 
"select version();"


2. Set up TPC-C test benchmark

* Download TPC-C (scale factor of 1) and extract it
    $ wget https://gts3.org/~/jjung/tpcc/tpcc1.tar.gz
    $ wget https://gts3.org/~/jjung/tpcc/tpcc10.tar.gz
    $ wget https://gts3.org/~/jjung/tpcc/tpcc50.tar.gz

    $ tar xzvf tpcc1.tar.gz
    $ tar xzvf tpcc10.tar.gz
    $ tar xzvf tpcc50.tar.gz

* Create DB (example of TPC-C scale factor 1)
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -c 
"create database test_bd;"
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5435 -U postgres -c 
"create database test_bd;"

* Import benchmark (example of TPC-C scale factor 1)
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -d 
test_bd -f ./tpcc_host.pgsql
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5435 -U postgres -d 
test_bd -f ./tpcc_host.pgsql

* (Optional) Deleting databases
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -c 
"drop database test_bd;"
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5435 -U postgres -c 
"drop database test_bd;"

3. Test SQL queries that exhibit performance regressions

We are sharing four queries in this report. We vary the scale-factor of the 
TPC-C benchmark from 1 through 50 to demonstrate that the performance 
regressions are more prominent on larger databases.

* Download queries
    $ wget https://gts3.org/~/jjung/tpcc/case.tar.gz
    $ tar xzvf case.tar.gz

* Execute the queries
    $ PGPASSWORD=mysecretpassword psql -t -A -F"," -h 127.0.0.1 -p 5432 -U 
postgres -d test_bd -f case-1.sql
    $ PGPASSWORD=mysecretpassword psql -t -A -F"," -h 127.0.0.1 -p 5435 -U 
postgres -d test_bd -f case-1.sql

Here’s the time taken to execute four SQL queries on old (v9.5) and newer 
version (v11) of PostgreSQL (in milliseconds):

+----------------------+--------+---------+---------+
|                      | scale1 | scale10 | scale50 |
+----------------------+--------+---------+---------+
| Case-1 (v9.5)        |     28 |     273 |    1459 |
| Case-1 (v11)         |     90 |     854 |    4818 |
+----------------------+--------+---------+---------+
| Case-2 (v9.5)        |    229 |    2793 |   15096 |
| Case-2 (v11)         |    838 |   11276 |   64808 |
+----------------------+--------+---------+---------+
| Case-3 (v9.5)        |     28 |     248 |    1231 |
| Case-3 (v11)         |     74 |     677 |    3345 |
+----------------------+--------+---------+---------+
| Case-4 (v9.5)        |   0.03 |    0.03 |    0.04 |
| Case-4 (v11)         |   0.04 |    0.04 |     632 |
+----------------------+--------+---------+---------+

1) CASE-1 shares same plan but shows different execution time. Execution time 
increases on larger databases.

2) CASE-2 shows different cost estimation and it causes performance regression. 
Execution time increases on larger databases.

3) CASE-3 uses different executor. Newer version (PG11.1) uses parallel seq 
scan but shows slower execution time. Execution time increases on larger 
databases.

4) CASE-4 shows performance regression only in TPC-C with scale factor 50. 
Instead of using index scan, newer version (PG11.1) applies filter, thereby 
increasing the time taken to execute the query.

We would greatly appreciate feedback from the community regarding these queries 
and are looking forward to improving the tool based on the community’s feedback.

Thanks.


Jinho Jung

Reply via email to