Potential performance issues

2021-03-01 Thread Jung, Jinho
# Performance issues discovered from differential test

Hello. We are studying DBMSs at Georgia Tech and are reporting interesting 
queries that potentially show performance problems.

To discover such cases, we used the following procedure:

* Install the latest version of four DBMSs (PostgreSQL, SQLite, MySQL, 
CockroachDB)
* Import the TPC-C benchmark for each DBMS
* Generate random queries (and translate each query to handle the different 
dialects)
* Run each query and measure its execution time
   * Remove `LIMIT` to prevent non-deterministic behavior
   * Discard the test case if any DBMS returns an error
   * Some DBMSs do not report the actual query execution time. In that case, 
we query the current time before and after the actual query and calculate 
the elapsed time (see the sketch after this list).
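
A minimal sketch of that timing approach in PostgreSQL (illustrative only; 
`clock_timestamp()` advances during a transaction, unlike `now()`):

```sql
-- Client-side timing as described above (sketch).
select clock_timestamp() as t0;
select ref_0.ol_amount from order_line as ref_0;  -- query under test
select clock_timestamp() as t1;
-- elapsed time = t1 - t0, computed by the test harness
```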

In this report, we attach a few queries. We believe there are many duplicated 
or false-positive cases among them, so it would be great if we could get 
feedback about the reported queries. Once we know the root cause of a problem, 
or that a report is a false positive, we will remove such cases and send a 
follow-up report.

For example, the query below runs about 1000x slower on PostgreSQL than on the 
other DBMSs.

select ref_0.ol_amount as c0
from order_line as ref_0
left join stock as ref_1
  on (ref_0.ol_o_id = ref_1.s_w_id)
inner join warehouse as ref_2
  on (ref_1.s_dist_09 is NULL)
where ref_2.w_tax is NULL;


* Query files link:

wget https://gts3.org/~jjung/report1/pg.tar.gz

* Execution results (execution time in seconds; Ratio = Postgres time divided 
by the fastest other DBMS)

| Filename | Postgres | MySQL | CockroachDB | SQLite |   Ratio |
|---------:|---------:|------:|------------:|-------:|--------:|
|    34065 |  1.31911 | 0.013 |     0.02493 |  1.025 |  101.47 |
|    36399 |  3.60298 | 0.015 |     1.05593 |  3.487 |  240.20 |
|    35767 |  4.01327 | 0.032 |     0.00727 |  2.311 |  552.19 |
|    11132 |  4.3518  | 0.022 |     0.00635 |  3.617 |  684.88 |
|    29658 |  4.6783  | 0.034 |     0.00778 |  2.63  |  601.10 |
|    19522 |  1.06943 | 0.014 |     0.00569 | 0.0009 | 1188.26 |
|    38388 |  3.21383 | 0.013 |     0.00913 |  2.462 |  352.09 |
|     7187 |  1.20267 | 0.015 |     0.00316 | 0.0009 | 1336.30 |
|    24121 |  2.80611 | 0.014 |     0.03083 |  0.005 |  561.21 |
|    25800 |  3.95163 | 0.024 |     0.73027 |  3.876 |  164.65 |
|     2030 |  1.91181 | 0.013 |     0.04123 |  1.634 |  147.06 |
|    17383 |  3.28785 | 0.014 |     0.00611 |  2.4   |  538.45 |
|    19551 |  4.70967 | 0.014 |     0.00329 | 0.0009 | 5232.97 |
|    26595 |  3.70423 | 0.014 |     0.00601 |  2.747 |  615.92 |
|      469 |  4.18906 | 0.013 |     0.12343 |  0.016 |  322.23 |


# Reproduce: install DBMSs, import the TPC-C benchmark, run query

### Cockroach (from binary)

```sh
# install DBMS
wget https://binaries.cockroachdb.com/cockroach-v20.2.5.linux-amd64.tgz
tar xzvf cockroach-v20.2.5.linux-amd64.tgz
sudo cp -i cockroach-v20.2.5.linux-amd64/cockroach /usr/local/bin/cockroach20

sudo mkdir -p /usr/local/lib/cockroach
sudo cp -i cockroach-v20.2.5.linux-amd64/lib/libgeos.so /usr/local/lib/cockroach/
sudo cp -i cockroach-v20.2.5.linux-amd64/lib/libgeos_c.so /usr/local/lib/cockroach/

# test
which cockroach20
cockroach20 demo

# start the DBMS (to make initial node files)
cd ~
cockroach20 start-single-node --insecure --store=node20 \
  --listen-addr=localhost:26259 --http-port=28080 --max-sql-memory=1GB \
  --background
# quit
cockroach20 quit --insecure --host=localhost:26259

# import DB
mkdir -p node20/extern
wget https://gts3.org/~jjung/tpcc-perf/tpcc_cr.tar.gz
tar xzvf tpcc_cr.tar.gz
cp tpcc_cr.sql node20/tpcc.sql

# start the DBMS again and createdb
cockroach20 sql --insecure --host=localhost:26259 \
  --execute="CREATE DATABASE IF NOT EXISTS cockroachdb;"
# to start over, drop it first:
# cockroach20 sql --insecure --host=localhost:26259 --execute="DROP DATABASE cockroachdb;"

cockroach20 sql --insecure --host=localhost:26259 --database=cockroachdb \
  --execute="IMPORT PGDUMP 'nodelocal://self/tpcc.sql';"

# test
cockroach20 sql --insecure --host=localhost:26259 --database=cockroachdb \
  --execute="explain analyze select count(*) from order_line;"

# run query
cockroach20 sql --insecure --host=localhost --port=26259 --database=cockroachdb < query.sql
```


### Postgres (from source)

```sh
# remove any previous postgres installation (if any)
sudo apt-get --purge remove postgresql postgresql-doc postgresql-common

# build latest postgres
git clone https://github.com/postgres/postgres.git
cd postgres
mkdir bld
cd bld
../configure
make -j 20

# install DBMS
sudo su
make install
adduser postgres
rm -rf /usr/local/pgsql/data
mkdir /usr/local/pgsql/data
chown -R postgres /usr/local/pgsql/data
su - postgres
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start
/usr/local/pgsql/bin/createdb jjung
#/usr/local/pgsql/bin/psql postgresdb

/usr/local/pgsql/bin/createuser -s {username}
```

Re: Potential performance issues

2021-03-01 Thread Andrew Dunstan


On 2/28/21 10:04 AM, Jung, Jinho wrote:
> # install DBMS
> sudo su
> make install
> adduser postgres
> rm -rf /usr/local/pgsql/data
> mkdir /usr/local/pgsql/data
> chown -R postgres /usr/local/pgsql/data
> su - postgres
> /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
> /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start
> /usr/local/pgsql/bin/createdb jjung


Using an untuned Postgres is fairly useless for a performance test. Out
of the box, shared_buffers and work_mem are too low for almost all
situations, and many other settings can also usually be improved. The
default settings are deliberately very conservative.
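
For illustration, a first pass at the two settings named here might look like 
this (example values only, not a recommendation from this message):

```sql
-- Raise the two settings above from their very conservative defaults
-- (shared_buffers defaults to 128MB, work_mem to 4MB).
ALTER SYSTEM SET shared_buffers = '2GB';  -- needs a server restart
ALTER SYSTEM SET work_mem = '64MB';       -- applies to new sessions after reload
-- e.g. /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data restart
```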


cheers


andrew



-- Andrew Dunstan EDB: https://www.enterprisedb.com




Re: Potential performance issues

2021-03-01 Thread MichaelDBA

Hi,

It is worthy work trying to compare performance across multiple database 
vendors, but unfortunately, it does not really come across as comparing 
apples to apples.


For instance, configuration parameters: I do not see where you are doing 
any modification of the configuration at all. Since DB vendors differ in 
their "out of the box" configuration, this alone can severely affect your 
comparison tests even though you are using a standard benchmark, TPC-C. 
Postgres is especially conservative in its "out of the box" configuration. 
For instance, "work_mem" is set to an incredibly low value of 4MB, which 
has a big impact on many types of queries. Oracle has something called 
SGA_TARGET, which, if enabled, self-regulates where memory is utilized, 
thus not limiting query memory the way Postgres does. This is just one 
example among a bazillion others where differences in "out of the box" 
configuration make these tests more like comparing apples to oranges. 
There are many other areas of configuration related to memory, disk, 
parallel execution, IO concurrency, etc.
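
As a quick illustration (a sketch, not from this message), the effect of 
work_mem can be tested per session without touching postgresql.conf:

```sql
SHOW work_mem;           -- '4MB' out of the box
SET work_mem = '256MB';  -- session-local; each sort/hash node may use up to this
```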


In sum, when comparing performance across different database vendors, 
there are many other factors that must be taken into account when trying 
to do an impartial comparison.  I just showed one: how configuration 
differences can skew the results.


Regards,
Michael Vitale




Jung, Jinho wrote on 2/28/2021 10:04 AM:

[original report quoted in full; snipped]

Re: Potential performance issues

2021-03-01 Thread MichaelDBA

Ha, Andrew beat me to the punch!

Andrew Dunstan wrote on 3/1/2021 7:59 AM:

[Andrew's reply quoted in full above; snipped]


Re: Potential performance issues

2021-03-01 Thread Bob Jolliffe
Was just about to reply similarly. Mind you, it perhaps does raise the
question: are the default postgresql settings too conservative, or too
static? For example, in the absence of other explicit configuration,
might it make more sense for many use cases for postgres to assess the
physical memory available and make some half-sensible allocations based
on that? I know there are downsides to assuming that postgresql has free
rein over all that it sees, but there are clearly also some downsides in
assuming it has next to nothing. This could also more correctly be part
of a package installation procedure, but just floating the idea ... some
kind of auto-tuning vs ultra-conservative defaults.
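
A rough sketch of the kind of heuristic being floated here (purely 
hypothetical; PostgreSQL itself does nothing like this today):

```sh
# Hypothetical auto-tuning sketch: derive shared_buffers from physical RAM
# using the common 25%-of-RAM rule of thumb (Linux-only example).
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
echo "shared_buffers = $((total_kb / 4))kB" >> "$PGDATA/postgresql.conf"
```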

On Mon, 1 Mar 2021 at 13:05, MichaelDBA  wrote:
> [MichaelDBA's reply, quoting Andrew Dunstan, snipped]




Re: Potential performance issues

2021-03-01 Thread Thomas Kellerer
Jung, Jinho wrote on 28.02.2021 at 16:04:
> # Performance issues discovered from differential test
>
> For example, the below query runs x1000 slower than other DBMSs from 
> PostgreSQL.
>
>     select ref_0.ol_amount as c0
>     from order_line as ref_0
>         left join stock as ref_1
>           on (ref_0.ol_o_id = ref_1.s_w_id )
>         inner join warehouse as ref_2
>         on (ref_1.s_dist_09 is NULL)
>     where ref_2.w_tax is NULL;

I find this query extremely weird, to be honest.

There is no join condition between warehouse and the other two tables, which 
results in a cross join. That cross join is then "reduced" somehow by applying 
the IS NULL conditions - but still, to me this makes no sense.

Maybe the Postgres optimizer doesn't handle this ugly "join condition" the same 
way the others do.

I would rather expect a NOT EXISTS against the warehouse table.
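
Spelled out: the inner join's predicate only references ref_1, so the query 
is effectively a filtered cross join, and an EXISTS-style rewrite would avoid 
the row multiplication. Both forms below are untested sketches:

```sql
-- Equivalent filtered-cross-join reading of the reported query:
select ref_0.ol_amount as c0
from order_line as ref_0
left join stock as ref_1 on (ref_0.ol_o_id = ref_1.s_w_id)
cross join warehouse as ref_2
where ref_1.s_dist_09 is null
  and ref_2.w_tax is null;

-- EXISTS rewrite (only equivalent if the warehouse join is meant as an
-- existence check; it returns each qualifying row once instead of once
-- per NULL-w_tax warehouse row):
select ref_0.ol_amount as c0
from order_line as ref_0
left join stock as ref_1 on (ref_0.ol_o_id = ref_1.s_w_id)
where ref_1.s_dist_09 is null
  and exists (select 1 from warehouse w where w.w_tax is null);
```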

Thomas




Re: Potential performance issues

2021-03-01 Thread Rick Otten
On Mon, Mar 1, 2021 at 8:44 AM Bob Jolliffe  wrote:

> [Bob's message quoted in full above; snipped]

When you spin up an Aurora or RDS instance in AWS, their default parameter
group values are mostly set by formulas that derive values from the
instance size.  Of course they can assume free rein over the entire system,
but the values they choose are still somewhat interesting.

For example, they set `maintenance_work_mem` like this:
"GREATEST({DBInstanceClassMemory/63963136*1024},65536)"

It doesn't completely remove the need for a human to optimize the parameter
group based on your use case, but it does seem to give you a better novice
starting point to work from.  And there are definitely some formulas that I
disagree with in the general case.  However, it is something that adapts
for those times when you bump up the server size but don't want to have to
revisit and update every parameter to support the change.

I've been thinking a lot about running PG in containers for dev
environments lately, and trying to tune to get reasonable dev performance
out of a container without crushing the other services and containers on
the laptop.  Most developers that I've worked with over the past few years
only have exposure to running PG in a container.  They've simply never run
it on a server or even barebones on their laptop.  I think any modern
approach to a default set of tuning parameters would probably also need to
be "container aware", which is for all practical purposes the new default
"minimal configuration" on multi-purpose systems.


Re: Potential performance issues

2021-03-01 Thread Jung, Jinho
Andrew, Bob, Michael

Thanks for the valuable feedback! Even with the default settings, PostgreSQL 
mostly showed better performance than the other DBMSs. The reported queries 
are a very tiny fraction of all executed queries (e.g., <0.001%).

As you suggested, we will send a follow-up report after we test again with a 
performance-tuned PostgreSQL.

Hope we can contribute to improving PostgreSQL.

Thanks,
Jinho Jung


From: MichaelDBA 
Sent: Monday, March 1, 2021 8:04 AM
To: Jung, Jinho 
Cc: pgsql-performa...@postgresql.org 
Subject: Re: Potential performance issues

[MichaelDBA's reply and the original report quoted in full; snipped]

Re: Potential performance issues

2021-03-01 Thread Pavel Stehule
Hi

On Mon, 1 Mar 2021 at 15:59, Jung, Jinho wrote:

> [Jung's reply quoted in full above; snipped]
>

One important thing - attach the execution plan of the slow query:

https://explain.depesz.com/

https://wiki.postgresql.org/wiki/Slow_Query_Questions
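
For example, for the query in the original report (sketch):

```sql
EXPLAIN (ANALYZE, BUFFERS)
select ref_0.ol_amount as c0
from order_line as ref_0
left join stock as ref_1 on (ref_0.ol_o_id = ref_1.s_w_id)
inner join warehouse as ref_2 on (ref_1.s_dist_09 is NULL)
where ref_2.w_tax is NULL;
-- paste the output into https://explain.depesz.com/
```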

Regards

Pavel


> [remainder of the quoted thread snipped]

Re: Postgres performance comparing GCP and AWS

2021-03-01 Thread Hannu Krosing
Have you tried setting the instance running on GCP to have a similar
shared_buffers value as the AWS database?

What you described has a much lower cache hit rate on GCP and 2x the
shared buffers on AWS, which could well explain much of the difference
in execution times.

DETAILS:
Query explain for Postgres on GCP VM:
Buffers: shared hit=423 read=4821

Query explain for Postgres on AWS RDS:
Buffers: shared hit=3290 read=1948
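
Working out the hit rates from those two lines:

```
GCP: 423 / (423 + 4821)   ≈  8% of requested buffers found in shared_buffers
AWS: 3290 / (3290 + 1948) ≈ 63% of requested buffers found in shared_buffers
```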

and the configuration:

Instance on VM on GCP (2 vCPUs, 2 GB memory, 800 GB disk):
• shared_buffers: 510920kB (close to 499MB)

Instance managed by RDS (2 vCPUs, 2 GiB RAM, 250GB disk, 750 IOPS):
• shared_buffers: 943896kB (close to 922MB)
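
A minimal way to equalize that one setting before re-testing (sketch; 
shared_buffers only takes effect after a restart):

```sql
ALTER SYSTEM SET shared_buffers = '943896kB';  -- match the RDS value on the GCP VM
-- then restart the GCP instance, e.g.: pg_ctl -D "$PGDATA" restart
```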


Cheers
Hannu

On Fri, Feb 26, 2021 at 9:16 AM Justin Pitts  wrote:
>
> Since this is a comparison to RDS, and the goal presumably is to make the 
> test as even as possible, you will want to pay attention to the network IO 
> capacity for the client and the server in both tests.
>
> For RDS, you will be unable to run the client software locally on the server 
> hardware, so you should plan to do the same for the GCP comparison.
>
> What is the machine size you are using for your RDS instance? Each machine 
> size will specify CPU and RAM along with disk and network IO capacity.
>
> Is your GCP VM where you are running PG ( a GCP VM is the equivalent of an 
> EC2 instance, by the way ) roughly equivalent to that RDS instance?
>
> Finally, is the network topology roughly equivalent? Are you performing these 
> tests with the same region and/or availability zone?
>
>
>
> On Thu, Feb 25, 2021 at 3:32 PM Philip Semanchuk 
>  wrote:
>>
>>
>>
>> > On Feb 25, 2021, at 4:04 PM, Igor Gois  wrote:
>> >
>> > Philip,
>> >
>> > The results in first email in this thread were using explain analyze.
>> >
>> > I thought that you asked to run using only 'explain'. My bad.
>> >
>> > The point is, the execution time with explain analyze is less than 1 
>> > second. But the actual execution time (calculated from the python client) 
>> > is 24 seconds (aws) and 300+ seconds in gcp
>>
>> Oh OK, sorry, I wasn’t following. Yes, network speed sounds like the source 
>> of the problem.
>>
>> Under AWS sometimes we log into an EC2 instance if we have to run a query 
>> that generates a lot of data so that both server and client are inside AWS. 
>> If GCP has something similar to EC2, it might be an interesting experiment 
>> to run your query from there and see how much, if any, that changes the time 
>> it takes to get results.
>>
>> Hope this helps
>> Philip
>>
>>
>>
>> >
>> > On Thu, 25 Feb 2021 at 17:53, Philip Semanchuk 
>> >  wrote:
>> >
>> >
>> > > On Feb 25, 2021, at 3:46 PM, Igor Gois  wrote:
>> > >
>> > > Hi, Philip
>> > >
>> > > We ran: EXPLAIN (FORMAT JSON) SELECT "Id", "DateTime", 
>> > > "SignalRegisterId", "Raw" FROM "SignalRecordsBlobs" WHERE 
>> > > "SignalSettingId" = 103 AND "DateTime" BETWEEN 
>> > > '2019-11-28T14:00:12.54020' AND '2020-07-23T21:12:32.24900';
>> > >
>> > > but it was really fast. I think the results were discarded.
>> >
>> > EXPLAIN and EXPLAIN ANALYZE are different in an important way. EXPLAIN 
>> > merely plans the query, EXPLAIN ANALYZE plans *and executes* the query. 
>> > From the doc —
>> >
>> > "The ANALYZE option causes the statement to be actually executed, not only 
>> > plannedKeep in mind that the statement is actually executed when the 
>> > ANALYZE option is used. Although EXPLAIN will discard any output that a 
>> > SELECT would return, other side effects of the statement will happen as 
>> > usual. “
>> >
>> > https://www.postgresql.org/docs/12/sql-explain.html
>> >
>> >
>> > >
>> > > AWS Execution time select without explain: 24.96505s (calculated in 
>> > > python client)
>> > > AWS Execution time select with explain but without analyze: 0.03876s 
>> > > (calculated in python client)
>> > >
>> > > https://explain.depesz.com/s/5HRO
>> > >
>> > > Thanks in advance
>> > >
>> > >
>> > > On Thu, 25 Feb 2021 at 15:13, Philip Semanchuk 
>> > >  wrote:
>> > >
>> > >
>> > > > On Feb 24, 2021, at 10:11 AM, Igor Gois  
>> > > > wrote:
>> > > >
>> > > > Hi, Julien
>> > > >
>> > > > Your hypothesis about network transfer makes sense. The query returns 
>> > > > big byte-array blobs.
>> > > >
>> > > > Is there a way to test the network speed against the instances? I have 
>> > > > access to the network speed in gcp (5 Mb/s), but don't have access in 
>> > > > aws rds.
>> > >
>> > > Perhaps what you should run is EXPLAIN ANALYZE SELECT...? My 
>> > > understanding is that EXPLAIN ANALYZE executes the query but discards 
>> > > the results. That doesn’t tell you the network speed of  your AWS 
>> > > instance, but it does isolate the query execution speed (which is what I 
>> > > think you’re trying to measure) from the network speed.
>> > >
>> > > Hope this is useful.
>> > >
>> > > Cheers
>> > > Philip
>> > >
>> > > >
>> > > >
>> > > > On Wed, 24 Feb 2021 at 10:35, Julien Rouhaud 
>> >

Re: Potential performance issues

2021-03-01 Thread Hannu Krosing
...

* Remove `LIMIT` to prevent any non-deterministic behaviors

This seems counterproductive, as for example PostgreSQL has special
handling of "fast start" queries which is triggered by presence of
LIMIT or OFFSET, so this will miss some optimisations.

Also, it is not like removing LIMIT is some magic bullet which
guarantees there are no non-deterministic behaviors - cost-based
optimisers can see lots of plan changes due to many things, like when
analyze and/or vacuum was run last time, what is and is not in shared
buffers, how much of the table fits in the disk cache (and which parts), etc.
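
A quick way to see the fast-start effect on the TPC-C schema used in the 
report (sketch; the actual plans depend on available indexes and statistics):

```sql
EXPLAIN select ol_amount from order_line order by ol_amount;           -- typically a full scan + sort
EXPLAIN select ol_amount from order_line order by ol_amount limit 10;  -- may switch to an index scan or top-N heapsort
```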

Cheers
Hannu




On Mon, Mar 1, 2021 at 4:07 PM Pavel Stehule  wrote:
> [earlier messages quoted in full; snipped]