Hi Julian,

Here's my complete Python script:

import psycopg2
import random
import math
import time

connection = psycopg2.connect('host=localhost dbname=pgtest user=pgtest password=pgtest')
cursor = connection.cursor()

while True:
    id = random.randrange(1, 1000 * 1000)
    cursor.execute('select offers.id, offers.name, categories.id, categories.name from offers left join categories on categories.id = offers.category_id where offers.id = %s', (id,))
    print cursor.fetchall()

So I assume that each of the 20 instances opens and uses a single transaction. I've just tested the option with connection.autocommit = True set at the beginning, but it does not change anything. Also, in production (where we first noticed the problem) we use a new transaction for every select.

I start 20 instances of this Python script (I use pv to measure performance):

parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null

Then, in a second console, I open psql and type:

pgtest=> begin; insert into categories (name) select 'category_' || x from generate_series(1,1000) as x;

The QPS displayed by the first command immediately drops 20-30 times and stays low for as long as the transaction with the insert is open.

Here's the script that I use to initialize the database. You should be able to replicate the situation easily yourself if you want:

drop database pgtest;
drop user pgtest;

create user pgtest with password 'pgtest';
create database pgtest;
grant all privileges on database pgtest to pgtest;

\connect pgtest

CREATE TABLE categories
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  CONSTRAINT categories_pkey PRIMARY KEY (id)
);

CREATE TABLE offers
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  category_id bigint NOT NULL,
  CONSTRAINT offers_pkey PRIMARY KEY (id)
);

insert into categories (name) select 'category_' || x from generate_series(1,1000) as x;
insert into offers (name, category_id) select 'offer_' || x, floor(random() * 1000) + 1 from generate_series(1,1000*1000) as x;

alter table categories owner to pgtest;
alter table offers owner to pgtest;
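By the way, in case it helps, here's a rough sketch of the two transaction-handling variants I was referring to above (just a sketch, not the exact production code; the query and connection string are the same as in the test script):

import psycopg2
import random

connection = psycopg2.connect('host=localhost dbname=pgtest user=pgtest password=pgtest')

# Variant 1: disable implicit transactions entirely, so every select
# runs on its own outside a transaction block.
# connection.autocommit = True

cursor = connection.cursor()

while True:
    id = random.randrange(1, 1000 * 1000)
    cursor.execute('select offers.id, offers.name, categories.id, categories.name from offers left join categories on categories.id = offers.category_id where offers.id = %s', (id,))
    print cursor.fetchall()
    # Variant 2: end the implicit transaction after every select, so
    # psycopg2 starts a fresh one for the next query (this is closest
    # to what we do in production).
    connection.commit()

Either way we see the same slowdown as soon as the insert transaction is opened in the other session.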
Thanks,
Bartek

On Fri, Sep 20, 2013 at 3:42 AM, Julian <temp...@internode.on.net> wrote:
> Hi Bart,
>
> You are doing heavy random reads in addition to a 1000k row insert
> within a single transaction.
>
> Also it is not clear whether your query within the python loop is itself
> within a transaction. (i.e. setting psycopg2.connection.autocommit to True
> disables transactional queries).
>
> Depending on your pg adapter, it may open a transaction by default even
> though it may not be required.
>
> So please clarify whether this is the case.
>
> Regards,
>
> Julian.
>
> On 20/09/13 10:49, Bartłomiej Romański wrote:
> > Hi all
> >
> > We're experiencing a very strange performance issue. Our setup is a
> > bit more complicated, but we've managed to isolate and replicate the
> > core problem. Here's what we observe:
> >
> > We took a strong machine (128 GB RAM, 8-core CPU, SSD drives...) and
> > installed a fresh copy of PostgreSQL 9.2 (Ubuntu Server 12.04 LTS,
> > default configuration).
> >
> > Then, we created a test database with the following schema:
> >
> > CREATE TABLE offers
> > (
> >   id bigserial NOT NULL,
> >   name character varying NOT NULL,
> >   category_id bigint NOT NULL,
> >   CONSTRAINT offers_pkey PRIMARY KEY (id)
> > );
> >
> > CREATE TABLE categories
> > (
> >   id bigserial NOT NULL,
> >   name character varying NOT NULL,
> >   CONSTRAINT categories_pkey PRIMARY KEY (id)
> > );
> >
> > and populated it in the following way:
> >
> > insert into categories (name) select 'category_' || x from
> > generate_series(1,1000) as x;
> > insert into offers (name, category_id) select 'offer_' || x,
> > floor(random() * 1000) + 1 from generate_series(1,1000*1000) as x;
> >
> > Finally, we created a python script to make simple queries in a loop:
> >
> > while True:
> >     id = random.randrange(1, 1000 * 1000)
> >     db.execute('select offers.id, offers.name, categories.id, categories.name from offers left join categories on categories.id = offers.category_id where offers.id = %s', (id,))
> >     print db.fetchall()
> >
> > We start 20 instances simultaneously and measure performance:
> >
> > parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null
> >
> > Normally we observe about 30k QPS, which is a satisfying result (without
> > any tuning at all).
> >
> > The problem occurs when we open a second console, start psql and type:
> >
> > pgtest=> begin; insert into categories (name) select 'category_' || x
> > from generate_series(1,1000) as x;
> >
> > We start a transaction and insert 1k records into the 'categories'
> > table. After that, performance of the benchmark described above
> > immediately drops to about 1-2k QPS. That's 20-30 times slower! After
> > closing the transaction (committing or aborting - doesn't matter) it
> > immediately jumps back to 30k QPS.
> >
> > Restarting the running script and other simple tricks do not change
> > anything. The hanging, open transaction is causing a huge slowdown.
> > What's more, when doing a similar insert (even a much larger one) into
> > the 'offers' table we do not observe this effect.
> >
> > We analyzed the problem a bit deeper and found out that the actual
> > query execution times are not affected that much. They stay constantly
> > close to 0.5 ms. This can be observed in the server log (after enabling
> > the appropriate option) and in the 'explain analyze...' output. Also,
> > the query plan returned does not change and looks optimal
> > (pkey scan for 'offers' + pkey scan for 'categories').
> >
> > After a few random thoughts we finally turned on the
> > 'log_planner_stats' option and found out that the planner execution
> > times are highly affected by the hanging transaction. Here's the
> > typical output in the initial situation:
> >
> > 2013-09-17 21:54:59 UTC LOG: PLANNER STATISTICS
> > 2013-09-17 21:54:59 UTC DETAIL: ! system usage stats:
> > ! 0.000137 elapsed 0.000000 user 0.000000 system sec
> > ! [2.169670 user 0.383941 sys total]
> > ! 0/0 [0/11520] filesystem blocks in/out
> > ! 0/0 [0/7408] page faults/reclaims, 0 [0] swaps
> > ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
> > ! 0/0 [1362/7648] voluntary/involuntary context switches
> >
> > And here's a typical output with a hanging transaction:
> >
> > 2013-09-17 21:56:12 UTC LOG: PLANNER STATISTICS
> > 2013-09-17 21:56:12 UTC DETAIL: ! system usage stats:
> > ! 0.027251 elapsed 0.008999 user 0.001000 system sec
> > ! [32.722025 user 3.550460 sys total]
> > ! 0/0 [0/115128] filesystem blocks in/out
> > ! 0/0 [0/7482] page faults/reclaims, 0 [0] swaps
> > ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
> > ! 1/6 [12817/80202] voluntary/involuntary context switches
> >
> > As you can see, it takes over 100 times longer when the extra
> > transaction is open!
> >
> > Any ideas why that is? It severely affects total query time.
> >
> > I know that using prepared statements solves the problem
> > completely, so we can deal with it, but we've already spent a lot of
> > time getting this far and I'm personally a bit curious what the
> > fundamental reason for such behavior is.
> >
> > I'll be very thankful for any explanation of what's going on here!
> >
> > Thanks,
> > BR
> >
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>