On 07/12/2011 01:04 PM, lars wrote:
On 07/12/2011 12:08 PM, Kevin Grittner wrote:
lars<lhofha...@yahoo.com>  wrote:

select count(*) from test where tenant = $1 and created_date = $2

Ah, that might be a major clue -- prepared statements.

What sort of a plan do you get for that as a prepared statement?
(Note, it is very likely *not* to be the same plan as you get if you
run with literal values!)  It is not at all unlikely that it could
resort to a table scan if you have one tenant which is five or ten
percent of the table, which would likely trigger the pruning as it
passed over the modified pages.

-Kevin
So a read of a row *will* trigger dead-tuple pruning, which requires WAL logging, and this is known/expected behavior? That is actually the only answer I am looking for. :) I have not seen it documented anywhere.
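
One way to see this from psql is to sample the WAL insert position around the read; a minimal sketch (pg_current_xlog_location() is the 9.1 name for this function, and the values match my test setup below):

select pg_current_xlog_location();  -- note the position before the read
select count(*) from test where tenant = '000000000000001' and created_date = '2011-6-30';
select pg_current_xlog_location();  -- if this advanced, the read itself generated WAL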

I know that Postgres will generate generic plans for prepared statements (how could it do otherwise?),
I also know that it sometimes chooses a sequential scan.
This can always be tweaked to touch fewer rows and/or use a different plan. That's not my objective, though!
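
For comparison, the plan the planner picks with literal values (as opposed to the generic plan shown further down) can be checked with:

explain select count(*) from test where tenant = '000000000000001' and created_date = '2011-6-30';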

The fact that a select (maybe a big analytical query we'll run) touching many rows will write to the WAL and (apparently) wait for that IO to complete makes a fully cached database far less useful.
I just artificially created this scenario.

... Just dropped the table to test something, so I can't get the plan right now. Will send an update as soon as I have it set up again.

Thanks again.

-- Lars


Ok... Slightly changed the indexes:
\d test
            Table "lars.test"
    Column    |     Type      | Modifiers
--------------+---------------+-----------
 tenant       | character(15) |
 created_by   | character(15) |
 created_date | date          |
Indexes:
    "i1" btree (tenant)

So just a simple index on tenant.

prepare x as select count(*) from test where tenant = $1 and created_date = $2;
PREPARE
explain execute x('000000000000001','2011-6-30');
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0)
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0)
               Index Cond: (tenant = $1)
(6 rows)

-- this is when the WAL records are written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=191.150..191.151 rows=1 loops=1)
   Buffers: shared hit=3716
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.966..188.221 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3716
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=1.265..1.265 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=20
 Total runtime: 191.243 ms
(10 rows)

-- this is when no WAL is written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=11.529..11.530 rows=1 loops=1)
   Buffers: shared hit=3715
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.341..9.187 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3715
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=0.756..0.756 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=19
 Total runtime: 11.580 ms
(10 rows)
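
If the extra work in the first execution really is one-time cleanup, a manual vacuum after the update should absorb it (autovacuum is off in this setup); an untested sketch:

vacuum test;  -- performs the pruning/cleanup up front instead of in the first select
execute x('000000000000001','2011-6-30');  -- should now show the fast (~11 ms) timing right away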

In case you want to recreate this scenario, here is a simple script to create the table:

create table test(tenant char(15), created_by char(15), created_date date);
insert into test values('x', 'y','2011-6-30');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test; -- 256k rows
update test set tenant = lpad((random()*10000)::int::text,15,'0'), created_by = lpad((random()*10000)::int::text,15,'0');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test; -- 32m rows
create index i1 on test(tenant);
vacuum analyze;
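
Roughly the same table can also be built in a single statement with generate_series (the exact tenant distribution differs a little from the doubling approach, so treat this as a convenience sketch):

create table test(tenant char(15), created_by char(15), created_date date);
insert into test
select lpad((random()*10000)::int::text, 15, '0'),
       lpad((random()*10000)::int::text, 15, '0'),
       date '2011-6-30'
from generate_series(1, 33554432);  -- 32m rows
create index i1 on test(tenant);
vacuum analyze;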

I use JDBC to perform the updates and selects, but this will do too:

prepare x as select count(*) from test where tenant = $1 and created_date = $2;
prepare y as update test set created_by = $1 where tenant = $2 and created_date = $3;

execute y('000000000000001', '000000000000001','2011-6-30');
execute x('000000000000001','2011-6-30');

I'll probably compile Postgres with WAL debugging enabled next, and try the pageinspect module. On the face of it, though, this looks like Postgres would not be that useful as a database that resides (mostly) in the cache.
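
For the pageinspect route, something like this should show the line-pointer state of a heap page before and after the select (a sketch, assuming the contrib extension is installed):

create extension pageinspect;
-- lp_flags: 0 = unused, 1 = normal, 2 = redirect (HOT chain root), 3 = dead
select lp, lp_off, lp_flags, t_ctid
from heap_page_items(get_raw_page('test', 0));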

Here is what the query mentioned on the page below returns (note that shared_buffers is very large, but I observed the same with smaller settings):

http://wiki.postgresql.org/wiki/Server_Configuration


             name             |                                                 current_setting
------------------------------+------------------------------------------------------------------------------------------------------------------
 version                      | PostgreSQL 9.1beta2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.4 20100726 (Red Hat 4.4.4-13), 64-bit
 autovacuum                   | off
 bgwriter_delay               | 10ms
 bgwriter_lru_maxpages        | 1000
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 128
 client_encoding              | UTF8
 effective_cache_size         | 64GB
 lc_collate                   | en_US.UTF-8
 lc_ctype                     | en_US.UTF-8
 log_checkpoints              | on
 log_line_prefix              | %m
 maintenance_work_mem         | 2GB
 max_connections              | 100
 max_stack_depth              | 2MB
 server_encoding              | UTF8
 shared_buffers               | 20GB
 TimeZone                     | UTC
 wal_buffers                  | 16MB
 work_mem                     | 1GB
(20 rows)
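
For reference, the query that produced this output (from the wiki page above) is, I believe, essentially:

select name, current_setting(name)
from pg_settings
where source not in ('default', 'override');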

Thanks.

-- Lars

