On 07/12/2011 01:04 PM, lars wrote:
On 07/12/2011 12:08 PM, Kevin Grittner wrote:
lars<lhofha...@yahoo.com>  wrote:

select count(*) from test where tenant = $1 and created_date = $2

Ah, that might be a major clue -- prepared statements.

What sort of a plan do you get for that as a prepared statement?
(Note, it is very likely *not* to be the same plan as you get if you
run with literal values!)  It is not at all unlikely that it could
resort to a table scan if you have one tenant which is five or ten
percent of the table, which would likely trigger the pruning as it
passed over the modified pages.

-Kevin
So a read of a row *will* trigger dead-tuple pruning, which requires WAL logging, and this is known/expected behavior? That is actually the only answer I am looking for. :) I have not seen it documented anywhere.
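
One way to see this from psql is to sample the WAL insert position around the read; a minimal sketch (pg_current_xlog_location() is the 9.1 name for this function, and the values match my test setup below):

select pg_current_xlog_location();  -- note the position before the read
select count(*) from test where tenant = '000000000000001' and created_date = '2011-6-30';
select pg_current_xlog_location();  -- if this advanced, the read itself generated WAL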

I know that Postgres will generate generic plans for prepared statements (how could it do otherwise?),
I also know that it sometimes chooses a sequential scan.
This can always be tweaked to touch fewer rows and/or use a different plan. That's not my objective, though!
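
For comparison, the plan the planner picks with literal values (as opposed to the generic plan shown further down) can be checked with:

explain select count(*) from test where tenant = '000000000000001' and created_date = '2011-6-30';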

The fact that a select (maybe a big analytical query we'll run) touching many rows will write to the WAL and (apparently) wait for that IO to complete makes a fully cached database far less useful.
I just artificially created this scenario.

... Just dropped the table to test something, so I can't get the plan right now. Will send an update as soon as I have it set up again.

Thanks again.

-- Lars


Ok... Slightly changed the indexes:
\d test
            Table "lars.test"
    Column    |     Type      | Modifiers
--------------+---------------+-----------
 tenant       | character(15) |
 created_by   | character(15) |
 created_date | date          |
Indexes:
    "i1" btree (tenant)

So just a simple index on tenant.

prepare x as select count(*) from test where tenant = $1 and created_date = $2;
PREPARE
explain execute x('000000000000001','2011-6-30');
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0)
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0)
               Index Cond: (tenant = $1)
(6 rows)

-- this is when the WAL records are written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=191.150..191.151 rows=1 loops=1)
   Buffers: shared hit=3716
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.966..188.221 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3716
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=1.265..1.265 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=20
 Total runtime: 191.243 ms
(10 rows)

-- this is when no WAL is written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=11.529..11.530 rows=1 loops=1)
   Buffers: shared hit=3715
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.341..9.187 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3715
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=0.756..0.756 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=19
 Total runtime: 11.580 ms
(10 rows)
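
If the extra work in the first execution really is one-time cleanup, a manual vacuum after the update should absorb it (autovacuum is off in this setup); an untested sketch:

vacuum test;  -- performs the pruning/cleanup up front instead of in the first select
execute x('000000000000001','2011-6-30');  -- should now show the fast (~11 ms) timing right away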

In case you want to recreate this scenario, here is a simple script to create the table:

create table test(tenant char(15), created_by char(15), created_date date);
insert into test values('x', 'y','2011-6-30');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test; -- 256k rows
update test set tenant = lpad((random()*10000)::int::text,15,'0'), created_by = lpad((random()*10000)::int::text,15,'0');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test; -- 32m rows
create index i1 on test(tenant);
vacuum analyze;
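
Roughly the same table can also be built in a single statement with generate_series (the exact tenant distribution differs a little from the doubling approach, so treat this as a convenience sketch):

create table test(tenant char(15), created_by char(15), created_date date);
insert into test
select lpad((random()*10000)::int::text, 15, '0'),
       lpad((random()*10000)::int::text, 15, '0'),
       date '2011-6-30'
from generate_series(1, 33554432);  -- 32m rows
create index i1 on test(tenant);
vacuum analyze;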

I use JDBC to perform the updates and selects, but this will do too:

prepare x as select count(*) from test where tenant = $1 and created_date = $2;
prepare y as update test set created_by = $1 where tenant = $2 and created_date = $3;

execute y('000000000000001', '000000000000001','2011-6-30');
execute x('000000000000001','2011-6-30');

I'll probably compile Postgres with WAL debugging enabled next, and try the pageinspect module. On the face of it, though, this looks like Postgres would not be that useful as a database that resides (mostly) in the cache.
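
For the pageinspect route, something like this should show the line-pointer state of a heap page before and after the select (a sketch, assuming the contrib extension is installed):

create extension pageinspect;
-- lp_flags: 0 = unused, 1 = normal, 2 = redirect (HOT chain root), 3 = dead
select lp, lp_off, lp_flags, t_ctid
from heap_page_items(get_raw_page('test', 0));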

Here is what the query mentioned on the page below returns (note that shared_buffers is very large, but I observed the same with smaller settings):

http://wiki.postgresql.org/wiki/Server_Configuration


             name             |                                                 current_setting
------------------------------+------------------------------------------------------------------------------------------------------------------
 version                      | PostgreSQL 9.1beta2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.4 20100726 (Red Hat 4.4.4-13), 64-bit
 autovacuum                   | off
 bgwriter_delay               | 10ms
 bgwriter_lru_maxpages        | 1000
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 128
 client_encoding              | UTF8
 effective_cache_size         | 64GB
 lc_collate                   | en_US.UTF-8
 lc_ctype                     | en_US.UTF-8
 log_checkpoints              | on
 log_line_prefix              | %m
 maintenance_work_mem         | 2GB
 max_connections              | 100
 max_stack_depth              | 2MB
 server_encoding              | UTF8
 shared_buffers               | 20GB
 TimeZone                     | UTC
 wal_buffers                  | 16MB
 work_mem                     | 1GB
(20 rows)
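
For reference, the query that produced this output (from the wiki page above) is, I believe, essentially:

select name, current_setting(name)
from pg_settings
where source not in ('default', 'override');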

Thanks.

-- Lars

