[PERFORM] Different query plans on same servers

2011-12-06 Thread Mario Splivalo
I have a fairly simple query:

SELECT some columns
FROM tubesite_image
INNER JOIN tubesite_object
ON (tubesite_image.object_ptr_id = tubesite_object.id)
WHERE
tubesite_object.site_id = 8
ORDER BY
tubesite_object.pub_date ASC LIMIT 21;



That query is having a bad query plan on production server:

 Limit  (cost=0.00..1938.67 rows=21 width=275) (actual
time=3270.000..3270.000 rows=0 loops=1)
   -  Nested Loop  (cost=0.00..792824.51 rows=8588 width=275) (actual
time=3269.997..3269.997 rows=0 loops=1)
 -  Index Scan using tubesite_object_pub_date_idx on
tubesite_object  (cost=0.00..789495.13 rows=9711 width=271) (actual
time=0.011..3243.629 rows=9905 loops=1)
   Filter: (site_id = 8)
 -  Index Scan using tubesite_image_pkey on tubesite_image
(cost=0.00..0.33 rows=1 width=4) (actual time=0.002..0.002 rows=0
loops=9905)
   Index Cond: (tubesite_image.object_ptr_id =
tubesite_object.id)
 Total runtime: 3270.071 ms

But, when I turn off nested loops, the query flies:


QUERY PLAN

--
 Limit  (cost=31384.35..31384.40 rows=21 width=275) (actual
time=37.988..37.988 rows=0 loops=1)
   -  Sort  (cost=31384.35..31405.82 rows=8588 width=275) (actual
time=37.986..37.986 rows=0 loops=1)
 Sort Key: tubesite_object.pub_date
 Sort Method:  quicksort  Memory: 25kB
 -  Hash Join  (cost=857.00..31152.80 rows=8588 width=275)
(actual time=37.968..37.968 rows=0 loops=1)
   Hash Cond: (tubesite_object.id =
tubesite_image.object_ptr_id)
   -  Bitmap Heap Scan on tubesite_object
(cost=596.77..30685.30 rows=9711 width=271) (actual time=7.414..25.132
rows=9905 loops=1)
 Recheck Cond: (site_id = 8)
 -  Bitmap Index Scan on tubesite_object_site_id
(cost=0.00..594.34 rows=9711 width=0) (actual time=4.943..4.943
rows=9905 loops=1)
   Index Cond: (site_id = 8)
   -  Hash  (cost=152.88..152.88 rows=8588 width=4) (actual
time=4.620..4.620 rows=8588 loops=1)
 -  Seq Scan on tubesite_image  (cost=0.00..152.88
rows=8588 width=4) (actual time=0.005..2.082 rows=8588 loops=1)
 Total runtime: 38.071 ms


I have rsynced the database from the prod server to the test server,
that has same configuration (shared buffers, work mem, estimated cache
size, and so on), and there it chooses bitmap heap scan with hash join
without disabling the nested loops.

I have 8.4.8 on producion and 8.4.9 on test, could that explain the
difference in plans chosen?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Different query plans on same servers

2011-12-06 Thread Tom Lane
Mario Splivalo mario.spliv...@megafon.hr writes:
 I have 8.4.8 on producion and 8.4.9 on test, could that explain the
 difference in plans chosen?

I'd wonder first if you have the same statistics settings on both.
The big problem here is that the estimation of the join size is bad
(8588 versus 0).

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Different query plans on same servers

2011-12-06 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 I'd wonder first if you have the same statistics settings on both.
 The big problem here is that the estimation of the join size is
 bad (8588 versus 0).
 
But both servers develop that estimate for the join size.  I was
wondering more about whether the costing factors were really the
same:
 
slow:
 
   -  Nested Loop
  (cost=0.00..792824.51 rows=8588 width=275)
  (actual time=3269.997..3269.997 rows=0 loops=1)
 
versus fast:
 
 -  Hash Join
(cost=857.00..31152.80 rows=8588 width=275)
(actual time=37.968..37.968 rows=0 loops=1)
 
The hash join path must look more expensive on the first machine,
for some reason.
 
Mario, could you post the result of running this query from both
servers?:
 
http://wiki.postgresql.org/wiki/Server_Configuration
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Different query plans on same servers

2011-12-06 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 But both servers develop that estimate for the join size.
 
[sigh]  Those *were* both from the production server.  Please show
us the EXPLAIN ANALYZE from the other server.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Response time increases over time

2011-12-06 Thread Havasvölgyi Ottó
Hi all,


I am running a load simulation on Debian  with PostgreSQL 8.4.9 (standard
Debian package).

Certain number of clients do the following stepsin a transaction (read
commited level) periodically (about 1.1 transaction per second / client)
and concurrently:

-reads a record of table Machine and State (they each have about 300
records, read size is about 1.4 KB)
-reads a record of table Card (it has about 1200 records)
-reads some other records from other tables, all these are straightforward,
single line queries (here there are even less records in the tables)
-updates Machine (1 record, updates 2-3 integer values)
-updates State (1 record, updates a bytea field, about 1,3KB)
-updates Card (1 record, updates an integer)
-inserts 1-1 record into 2 log tables

Its important, that each client updates different records, so there is no
update conflict. There are no triggers or rules. Tables have simple
indexes, 2 at most.

As I run the simulation with more and more clients, I can observe, that at
the beginning of the simulation the transaction times are quite acceptable
(20-30 ms) and quite uniform/smooth, but as the simultion progresses, it
becomes higher (30-40-50-60 ms) and more and more non-uniform, but the tps
interestingly remains the same during the simulation. With 100 clients this
kind of behaviour can be seen very well. The simulation's duration is 500
sec.
I wonder why this happens on this server, and how I can keep the response
time as low as at the beginning.

Just for comparison, I ran the same simulation on a Windows 7 notebook
machine but with PostgreSQL 9.1.2 (downloaded from EnterpriseDB's site, not
PostgreSQL Plus), and it did not show this problem even with 120 clients.
It's transaction times were surprisingly smooth and consistent. The client
code was the same in the 2 cases.
Actually I ran first the test on the Windows machine, and after that on the
better Debian. I expected that it would be even better there. Network
latency is quite minimal, because the clients and the database server run
on VMs on a server machine in the Linux case.

Here is some important config variables from the 8.4 (9.1.2 is configured
similarly):



ssl=false

shared_buffers=24MB (OS max currently, but should not be a problem because
9.1.2 performed quite well on Windows with 24 MB)

work_mem=1MB

maintainance_work_mem=16MB



fsync=on

sync_commit=on

wal_sync_method=fsync

full_page_writes=on

wal_buffers=1MB

commit_delay=0

checkpoint segments=8



effective_cache_size=256MB



vacuum: default
bgwriter: default


I suspected that due to the lot of update, the tables get bloated with dead
rows, but vacuum analyze verbose did not show that.
It seems that something cannot keep up with the load, but tps does not
change, just the response time gets higher.
Could you please help me with what can cause this kind of behaviour on
Linux?
What setting should I change perhaps?
Is there so much difference between 8.4 and 9.1, or is this something else?
Please tell me if any other info is needed.

Thanks in advance,
Otto


Re: [PERFORM] Different query plans on same servers

2011-12-06 Thread Mario Splivalo
On 12/06/2011 09:00 PM, Tom Lane wrote:
 Mario Splivalo mario.spliv...@megafon.hr writes:
 I have 8.4.8 on producion and 8.4.9 on test, could that explain the
 difference in plans chosen?
 
 I'd wonder first if you have the same statistics settings on both.
 The big problem here is that the estimation of the join size is bad
 (8588 versus 0).

They do, I guess. I did rsync postgres datadir from the prod server to
the test server. The only difference is that prod server was a bit more
loaded than the test server.

Mario

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Different query plans on same servers

2011-12-06 Thread Mario Splivalo
On 12/06/2011 09:17 PM, Kevin Grittner wrote:
  
 The hash join path must look more expensive on the first machine,
 for some reason.
  
 Mario, could you post the result of running this query from both
 servers?:
  
 http://wiki.postgresql.org/wiki/Server_Configuration

Sure. Here is from the prod server:

name |
  current_setting
-+
 version | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu,
compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2, 64-bit
 checkpoint_segments | 64
 default_statistics_target   | 2000
 effective_cache_size| 36GB
 external_pid_file   | /var/run/postgresql/8.4-main.pid
 lc_collate  | en_US.UTF-8
 lc_ctype| en_US.UTF-8
 listen_addresses| *
 log_autovacuum_min_duration | 0
 log_checkpoints | on
 log_line_prefix | %t [%p]: [%l-1] [%d]
 log_min_duration_statement  | 1s
 maintenance_work_mem| 256MB
 max_connections | 1500
 max_stack_depth | 3MB
 port| 5432
 server_encoding | UTF8
 shared_buffers  | 4GB
 statement_timeout   | 30min
 temp_buffers| 4096
 TimeZone| localtime
 track_activity_query_size   | 2048
 unix_socket_directory   | /var/run/postgresql
 wal_buffers | 128MB
 work_mem| 64MB


And here is from the test server:
name|
current_setting
+--
 version| PostgreSQL 8.4.9 on x86_64-pc-linux-gnu,
compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
 checkpoint_segments| 64
 default_statistics_target  | 2000
 effective_cache_size   | 36GB
 external_pid_file  | /var/run/postgresql/8.4-main.pid
 lc_collate | en_US.UTF-8
 lc_ctype   | en_US.UTF-8
 listen_addresses   | *
 log_connections| on
 log_disconnections | on
 log_line_prefix| %t [%p]: [%l-1] [%d]
 log_min_duration_statement | 0
 maintenance_work_mem   | 256MB
 max_connections| 40
 max_stack_depth| 3MB
 port   | 5432
 server_encoding| UTF8
 shared_buffers | 4GB
 ssl| on
 temp_buffers   | 4096
 TimeZone   | localtime
 unix_socket_directory  | /var/run/postgresql
 wal_buffers| 128MB
 work_mem   | 64MB
(24 rows)

At the time of doing 'explain analyze' on the prod server there were cca
80 connections on the server.

Mario

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Different query plans on same servers

2011-12-06 Thread Mario Splivalo
On 12/06/2011 09:29 PM, Kevin Grittner wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
  
 But both servers develop that estimate for the join size.
  
 [sigh]  Those *were* both from the production server.  Please show
 us the EXPLAIN ANALYZE from the other server.

Huh, right... missed that one. Here is the 'explain analyze' from the
other server:



QUERY PLAN

--
 Limit  (cost=31531.75..31531.80 rows=21 width=275) (actual
time=45.584..45.584 rows=0 loops=1)
   -  Sort  (cost=31531.75..31531.84 rows=36 width=275) (actual
time=45.579..45.579 rows=0 loops=1)
 Sort Key: tubesite_object.pub_date
 Sort Method:  quicksort  Memory: 25kB
 -  Hash Join  (cost=866.34..31530.82 rows=36 width=275)
(actual time=45.544..45.544 rows=0 loops=1)
   Hash Cond: (tubesite_object.id =
tubesite_image.object_ptr_id)
   -  Bitmap Heap Scan on tubesite_object
(cost=606.11..31146.68 rows=9884 width=271) (actual time=6.861..37.497
rows=9905 loops=1)
 Recheck Cond: (site_id = 8)
 -  Bitmap Index Scan on tubesite_object_site_id
(cost=0.00..603.64 rows=9884 width=0) (actual time=4.792..4.792
rows=9905 loops=1)
   Index Cond: (site_id = 8)
   -  Hash  (cost=152.88..152.88 rows=8588 width=4) (actual
time=3.816..3.816 rows=8588 loops=1)
 -  Seq Scan on tubesite_image  (cost=0.00..152.88
rows=8588 width=4) (actual time=0.003..1.740 rows=8588 loops=1)
 Total runtime: 45.798 ms




This is also a query from the prod server, but without LIMIT:


 Sort  (cost=31713.95..31735.42 rows=8588 width=275) (actual
time=60.311..60.311 rows=0 loops=1)
   Sort Key: tubesite_object.pub_date
   Sort Method:  quicksort  Memory: 25kB
   -  Hash Join  (cost=857.00..31152.80 rows=8588 width=275) (actual
time=60.255..60.255 rows=0 loops=1)
 Hash Cond: (tubesite_object.id = tubesite_image.object_ptr_id)
 -  Bitmap Heap Scan on tubesite_object  (cost=596.77..30685.30
rows=9711 width=271) (actual time=8.682..49.721 rows=9905 loops=1)
   Recheck Cond: (site_id = 8)
   -  Bitmap Index Scan on tubesite_object_site_id
(cost=0.00..594.34 rows=9711 width=0) (actual time=5.705..5.705
rows=9905 loops=1)
 Index Cond: (site_id = 8)
 -  Hash  (cost=152.88..152.88 rows=8588 width=4) (actual
time=4.281..4.281 rows=8588 loops=1)
   -  Seq Scan on tubesite_image  (cost=0.00..152.88
rows=8588 width=4) (actual time=0.005..1.437 rows=8588 loops=1)
 Total runtime: 60.483 ms
(12 rows)


I will try to rsync prod database to 8.4.8 on test server tomorrow, and
see what happens. Hopefully upgrade to 8.4.9 (or even 8.4.10 if Debian
packages is by tomorrow) will solve the issue...

Mario

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Response time increases over time

2011-12-06 Thread Josh Berkus
On 12/6/11 4:30 PM, Havasvölgyi Ottó wrote:
 Is there so much difference between 8.4 and 9.1, or is this something else?
 Please tell me if any other info is needed.

It is fairly likely that the difference you're seeing here is due to
improvements made in checkpointing and other operations made between 8.4
and 9.1.

Is there some reason you didn't test 9.1 on Linux to compare the two?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Question about VACUUM

2011-12-06 Thread Josh Berkus
On 12/5/11 1:36 PM, Kevin Grittner wrote:
 I understand the impulse to run autovacuum less frequently or less
 aggressively.  When we first started running PostgreSQL the default
 configuration was very cautious.

The default settings are deliberately cautious, as default settings
should be.

But yes, anyone with a really large/high-traffic database will often
want to make autovac more aggressive.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance