[PERFORM] Sudden drop in DB performance
Dear list,

we are encountering serious performance problems with our database. Queries which took around 100ms or less last week now take several seconds.

The database runs on Ubuntu Server 10.04.3 (kernel: 2.6.32-33) on hardware as follows:

  8-core Intel Xeon CPU at 2.83GHz
  48 GB RAM
  RAID 5 with 8 SAS disks

PostgreSQL 8.4.8 (installed from the Ubuntu repository). In addition to the DB, the machine also hosts a few virtual machines.

In the past everything worked very well, and the described problem occurred just out of the blue. We don't know of any PostgreSQL config changes or anything else which might explain the performance reduction. We have a number of DBs running in the cluster, and the problem seems to affect all of them.

We checked the performance of the RAID, which is reasonable, e.g. for hdparm -tT. Memory is well used, but not swapping. vmstat shows that the machine isn't using swap, and the load shouldn't be too high either:

root@host:~# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9

Bonnie++ results are given below; I am no expert at interpreting those :-)

Activating log_min_duration shows, for instance, this query --- there are now constantly queries which take absurdly long:

2011-09-02 22:38:18 CEST LOG:  duration: 25520.374 ms  statement: SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'

db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten';
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
   Index Cond: ((keyword)::text = 'diplomaten'::text)
 Total runtime: 0.087 ms
(3 rows)

db=# \d keywords.table_x
                               Table "keywords.table_x"
   Column   |       Type        |                            Modifiers
------------+-------------------+------------------------------------------------------------------------
 keyword_id | integer           | not null default nextval('keywords.table_x_keyword_id_seq'::regclass)
 keyword    | character varying |
 so         | double precision  |
Indexes:
    "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
    "idx_table_x_keyword" btree (keyword)
Referenced by:
    TABLE "keywords.table_x_has" CONSTRAINT "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE

Could you be so kind and give us any advice on how to track down the problem, or comment on possible reasons? Thank you very much in advance!

Regards,
heinz + gerhard

             name              |                                              current_setting
-------------------------------+-------------------------------------------------------------------------------------------------------------
 version                       | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
 archive_command               | /usr/local/sbin/weblyzard-wal-archiver.sh %p %f
 archive_mode                  | on
 checkpoint_segments           | 192
 effective_cache_size          | 25000MB
 external_pid_file             | /var/run/postgresql/8.4-main.pid
 full_page_writes              | on
 geqo                          | on
 lc_collate                    | de_AT.UTF-8
 lc_ctype                      | de_AT.UTF-8
 listen_addresses              | *
 log_line_prefix               | %t
 log_min_duration_statement    | 3s
 maintenance_work_mem          | 500MB
 max_connections               | 250
 max_stack_depth               | 2MB
 port                          | 5432
 server_encoding               | UTF8
 shared_buffers                | 7000MB
 ssl                           | on
 TimeZone                      | localtime
 unix_socket_directory         | /var/run/postgresql
 work_mem                      | 256MB

Results of Bonnie++:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3 374.9   4
Latency              7576us      233s      164s   15647us   13120ms    3302ms

Version  1.96       ------Sequential Create------ --------Random Create--------
voyager             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   141   0 +++++ +++   146   0   157   0 +++++ +++   172   0
Latency             1020ms     128us    9148ms     598ms      37us     485ms
[PERFORM] Embedded VACUUM
I'm running a labour-intensive series of queries on a medium-sized dataset (~100,000 rows) with geometry objects and both gist and btree indices. The queries are embedded in plpgsql, and have multiple updates, inserts and deletes to the tables, as well as multiple selects which require the indices to function correctly for any kind of performance.

My problem is that I can't embed a vacuum analyze to reset the indices and speed up processing, and the queries get slower and slower as the un-freed space builds up. From my understanding, transaction commits within batches are not allowed (so no vacuum embedded within queries). Are there plans to change this? Is there a way to reclaim dead space for tables that have repeated inserts, updates and deletes on them?

I have tried a simple analyze, and this does not quite cut it. I'm getting seq-scans after the first round of processing instead of hitting the index correctly.

My apologies if this is directed at the wrong forum, and thank you for your help.

-cris pond
Re: [PERFORM] Sudden drop in DB performance
Hi.

The auto_explain module allows logging the plans and statistics of live queries. Try it.

2011/9/3, Gerhard Wohlgenannt wo...@ai.wu.ac.at:
> Dear list,
>
> we are encountering serious performance problems with our database.
> Queries which took around 100ms or less last week now take several
> seconds.
>
> [rest of the original message quoted in full; snipped]
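[Editor's note: a minimal sketch of enabling auto_explain on 8.4. The 3s threshold simply mirrors the log_min_duration_statement already configured; adjust to taste. log_analyze adds per-query timing overhead, so enable it with care on a loaded server.]

  -- per session (as superuser), no restart required:
  LOAD 'auto_explain';
  SET auto_explain.log_min_duration = '3s';  -- log plans of statements slower than this
  SET auto_explain.log_analyze = on;         -- include actual row counts and timings

  # or globally in postgresql.conf (requires a restart):
  # shared_preload_libraries = 'auto_explain'
  # auto_explain.log_min_duration = '3s'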
Re: [PERFORM] Sudden drop in DB performance
On 3 September 2011, 9:26, Gerhard Wohlgenannt wrote:
> Dear list,
>
> we are encountering serious performance problems with our database.
> Queries which took around 100ms or less last week now take several
> seconds. The database runs on Ubuntu Server 10.04.3 (kernel: 2.6.32-33)
> on hardware as follows: 8-core Intel Xeon CPU at 2.83GHz, 48 GB RAM,
> RAID 5 with 8 SAS disks. PostgreSQL 8.4.8 (installed from the Ubuntu
> repository). In addition to the DB, the machine also hosts a few
> virtual machines.
>
> In the past everything worked very well, and the described problem
> occurred just out of the blue. We don't know of any PostgreSQL config
> changes or anything else which might explain the performance reduction.
> We have a number of DBs running in the cluster, and the problem seems
> to affect all of them.

What are the virtual machines doing? Are you sure they are not doing a lot of IO?

> We checked the performance of the RAID, which is reasonable, e.g. for
> hdparm -tT. Memory is well used, but not swapping.
>
> [vmstat and Bonnie++ output snipped]
>
> Activating log_min_duration shows, for instance, this query --- there
> are now constantly queries which take absurdly long:
>
> 2011-09-02 22:38:18 CEST LOG:  duration: 25520.374 ms  statement: SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'
>
> db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten';
>  Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
>    Index Cond: ((keyword)::text = 'diplomaten'::text)
>  Total runtime: 0.087 ms
>
> [table definition snipped]

But in this explain analyze, the query finished in well under a millisecond. Use the auto_explain contrib module to see the explain plan of the slow execution.

> Could you be so kind and give us any advice on how to track down the
> problem, or comment on possible reasons? Thank you very much in
> advance!
>
> [postgresql.conf settings and Bonnie++ results snipped]
Re: [PERFORM] Slow performance
Hi all,

Thanks for the replies, and sorry for the late response; I have been away for a few days.

Concerning the performance: 1 ms per row seems slow knowing that the entire database is less than 64MB and therefore should easily fit into memory, and the client (pgAdmin III) runs on the server. I am going to test another database to check the performance of the hardware.

-Kai

-----Original Message-----
From: Kevin Grittner [mailto:kevin.gritt...@wicourts.gov]
Sent: Wednesday, August 31, 2011 8:59 PM
To: Kai Otto; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow performance

Kai Otto ko...@medis.nl wrote:

> Time taken: 35.833 ms (i.e. roughly 35 seconds)

Which is it? 35 ms or 35 seconds?

> Number of rows: 121830
> Number of columns: 38
>
> This is extremely slow for a database server.
> Can anyone help me in finding the problem?

> Seq Scan on Frame (cost=0.00..9537.30 rows=121830 width=541) (actual time=0.047..93.318 rows=121830 loops=1)
> Total runtime: 100.686 ms

Assuming 35 seconds for the 121 K rows, it would seem that you're taking less than 1 ms per row on the database server, which may not be too bad, depending on how many of them are read from disk. The rest of the time would seem to be in the network and the client. That's where you need to fix something if you want it to be faster. With only a fraction of 1% of the run time being on the database server, any attempt to tune things there can't improve performance by more than that fraction of a percent.

-Kevin
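[Editor's note: one quick way to confirm Kevin's split between server time and client/network time is to compare a server-only EXPLAIN ANALYZE against a timed full fetch in psql. A sketch, with the table name taken from the plan above:]

  -- server-side execution only; rows are produced and discarded on the server
  EXPLAIN ANALYZE SELECT * FROM Frame;

  -- full round trip, including network transfer and client-side rendering
  \timing on
  SELECT * FROM Frame;

If the second number dwarfs the first, the bottleneck is outside the database, exactly as suggested.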
Re: [PERFORM] Embedded VACUUM
On 3/09/2011 8:25 AM, C Pond wrote:
> I'm running a labour-intensive series of queries on a medium-sized
> dataset (~100,000 rows) with geometry objects and both gist and btree
> indices. [snip]
>
> Is there a way to reclaim dead space for tables that have repeated
> inserts, updates and deletes on them?

Not, AFAIK, until the transaction doing the deletes/updates commits, and so do any older SERIALIZABLE transactions as well as any older running READ COMMITTED statements.

This is one of the areas where Pg's lack of true stored procedures bites you. You'll need to do the work via an out-of-process helper over a regular connection, or do your work via dblink to achieve the same effect.

--
Craig Ringer
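[Editor's note: a minimal sketch of the dblink approach, assuming the dblink contrib module is installed; the connection string and table name are placeholders. Because the VACUUM runs on its own connection, it is legal to call mid-procedure, but it still cannot reclaim rows that your own open transaction can see.]

  SELECT dblink_connect('maint', 'dbname=mydb');
  SELECT dblink_exec('maint', 'VACUUM ANALYZE my_table');  -- runs outside our transaction
  SELECT dblink_disconnect('maint');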
[PERFORM] Rather large LA
Hi,

I have a database server that's part of a web stack and is experiencing prolonged load average spikes of up to 400+ when the db is restarted and first accessed by the other parts of the stack, and it has generally poor performance on even simple select queries.

There are 30 DBs in total on the server, coming in at 226GB. The one that's used the most is 67GB, and the other 29 come to 159GB.

I'd really appreciate it if you could review my configurations below and make any suggestions that might help alleviate the performance issues. I've been looking more into the shared buffers, to the point of installing the contrib module to check what they're doing; possibly installing more RAM, as the most used db @ 67GB might appreciate it; or moving the most used DB onto another set of disks, possibly SSD.

PostgreSQL 9.0.4
Pgbouncer 1.4.1

Linux 2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:13 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.6 (Final)

4 x Intel Xeon-NehalemEX E7540-HexCore [2GHz] (24 physical cores)
32GB DDR3 RAM
1 x Adaptec 5805 Z SATA/SAS RAID with battery backup
4 x Seagate Cheetah ST3300657SS 300GB 15K RPM SAS drives in RAID 10
1 x 500GB 7200RPM SATA disk

Postgres and the OS reside on the same ext3 filesystem, whilst query and archive logging go onto the SATA disk, which is also ext3.

              name               |                                        current_setting
---------------------------------+------------------------------------------------------------------------------------------------
 version                         | PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
 archive_command                 | tar jcf /disk1/db-wal/%f.tar.bz2 %p
 archive_mode                    | on
 autovacuum                      | off
 checkpoint_completion_target    | 0.9
 checkpoint_segments             | 10
 client_min_messages             | notice
 effective_cache_size            | 17192MB
 external_pid_file               | /var/run/postgresql/9-main.pid
 fsync                           | off
 full_page_writes                | on
 lc_collate                      | en_US.UTF-8
 lc_ctype                        | en_US.UTF-8
 listen_addresses                |
 log_checkpoints                 | on
 log_destination                 | stderr
 log_directory                   | /disk1/pg_log
 log_error_verbosity             | verbose
 log_filename                    | postgresql-%Y-%m-%d_%H%M%S.log
 log_line_prefix                 | %m %u %h
 log_min_duration_statement      | 250ms
 log_min_error_statement         | error
 log_min_messages                | notice
 log_rotation_age                | 1d
 logging_collector               | on
 maintenance_work_mem            | 32MB
 max_connections                 | 1000
 max_prepared_transactions       | 25
 max_stack_depth                 | 4MB
 port                            | 6432
 server_encoding                 | UTF8
 shared_buffers                  | 8GB
 superuser_reserved_connections  | 3
 synchronous_commit              | on
 temp_buffers                    | 5120
 TimeZone                        | UTC
 unix_socket_directory           | /var/run/postgresql
 wal_buffers                     | 10MB
 wal_level                       | archive
 wal_sync_method                 | fsync
 work_mem                        | 16MB

Pgbouncer config:

[databases]
* = port=6432

[pgbouncer]
user = postgres
pidfile = /tmp/pgbouncer.pid
listen_addr =
listen_port = 5432
unix_socket_dir = /var/run/postgresql
auth_type = trust
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
stats_users = postgres
pool_mode = session
server_reset_query = DISCARD ALL;
server_check_query = select 1
server_check_delay = 10
server_idle_timeout = 5
server_lifetime = 0
max_client_conn = 4096
default_pool_size = 100
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
client_idle_timeout = 30
reserve_pool_size = 800

Thanks in advance

Richard
Re: [PERFORM] Rather large LA
On 5/09/2011 6:28 PM, Richard Shaw wrote:
> max_connections                 | 1000

Woah! No wonder you have stampeding herd problems after a DB or server restart, and are having performance issues.

When you have 1000 clients trying to do work at once, they'll all be fighting over memory, disk I/O bandwidth, and CPU power, which is nowhere near sufficient to allow them all to actually achieve something at once. You'll have a lot of overhead as the OS tries to be fair and allow each to make progress - at the expense of overall throughput.

If most of those connections are idle most of the time - say, they're persistent connections from some webapp that requires one connection per webserver thread - then the situation isn't so bad. They're still costing you backend RAM and various housekeeping overhead (including task switching) related to lock management and shared memory, though.

Consider using a connection pooler like PgPool-II or PgBouncer if your application is suitable. Most apps will be quite happy using pooled connections; only a few things like advisory locking and HOLD cursors work poorly with pooled connections. Using a pool allows you to reduce the number of actively working and busy connections to the real Pg backend to something your hardware can cope with, which should dramatically increase performance and reduce startup load spikes.

The general very rough rule of thumb for the number of active connections is "number of CPU cores + number of HDDs", but of course this is only incredibly rough and depends a lot on your workload and DB.

Ideally PostgreSQL would take care of this pooling inside the server, breaking the "one connection = one worker backend" equivalence. Unfortunately the server's process-based design makes that harder than it could be. There's also a lot of debate about whether pooling is even the core DB server's job and, if it is, which of the several possible approaches is the most appropriate. Then there's the issue of whether in-server connection pooling is even appropriate without admission control - which brings up the "admission control is insanely hard" problem. So for now, pooling lives outside the server in projects like PgPool-II and PgBouncer.

--
Craig Ringer
Re: [PERFORM] Rather large LA
Hi Craig,

Apologies, I should have made that clearer: I am using PgBouncer 1.4.1 in front of Postgres, and included the config at the bottom of my original mail.

Regards

Richard

On 5 Sep 2011, at 11:49, Craig Ringer wrote:
> On 5/09/2011 6:28 PM, Richard Shaw wrote:
>> max_connections                 | 1000
>
> Woah! No wonder you have stampeding herd problems after a DB or server restart, and are having performance issues.
>
> [rest of reply snipped]
Re: [PERFORM] Rather large LA
On 5/09/2011 6:55 PM, Richard Shaw wrote:
> Hi Craig,
>
> Apologies, I should have made that clearer: I am using PgBouncer 1.4.1 in front of Postgres, and included the config at the bottom of my original mail.

Ah, I see. The point still stands: your hardware can *not* efficiently do work for 1000 concurrent backend workers. Reduce the maximum number of workers by setting a lower cap on the pool size and a lower max_connections.

This won't work (you'll run out of pooler connections) unless you also set PgBouncer to transaction pooling mode instead of the default session pooling mode, which you're currently using. It is *important* to read the documentation on this before doing it, as there are implications for apps that use extra-transactional features like HOLD cursors and advisory locks. See:

http://pgbouncer.projects.postgresql.org/doc/usage.html

It may also be necessary to set PgBouncer to block (wait) rather than report an error when there is no pooled connection available to start a new transaction on. I'm not sure what PgBouncer's default behaviour for that is, and didn't see anything immediately clear in the pgbouncer(5) ini file man page.

--
Craig Ringer
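[Editor's note: a sketch of what that change might look like in pgbouncer.ini. The pool sizes are illustrative only, picked to be in the neighbourhood of Craig's "cores + spindles" rule of thumb rather than anything tuned for this workload.]

  [pgbouncer]
  pool_mode = transaction
  ; 24 cores + 4 data spindles suggests a pool in the tens, not the hundreds
  default_pool_size = 32
  reserve_pool_size = 8
  max_client_conn = 4096   ; client side can stay high; only backend slots are scarce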
Re: [PERFORM] Rather large LA
On 09/05/2011 05:28 AM, Richard Shaw wrote:
> Hi,
>
> I have a database server that's part of a web stack and is experiencing prolonged load average spikes of up to 400+ when the db is restarted and first accessed by the other parts of the stack, and it has generally poor performance on even simple select queries.

Is the slowness new? Or has it always been a bit slow? Have you checked for bloat on your tables/indexes? When you start up, does it peg a cpu or sit around doing IO? Have you reviewed the server logs?

> autovacuum                      | off

Why? I assume that's a problem.

> fsync                           | off

Seriously?

-Andy

> [rest of the original message, settings and PgBouncer config snipped]
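[Editor's note: on the bloat question, a quick way to check is sketched below, assuming the pgstattuple contrib module is installed; the table name is a placeholder.]

  -- dead_tuple_percent and free_percent show how much of the table is reclaimable
  SELECT * FROM pgstattuple('public.some_big_table');

  -- physical footprint, including indexes and TOAST
  SELECT pg_size_pretty(pg_total_relation_size('public.some_big_table'));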
Re: [PERFORM] Sudden drop in DB performance
On 09/05/2011 02:48 AM, Tomas Vondra wrote:
> On 3 September 2011, 9:26, Gerhard Wohlgenannt wrote:
>> Dear list,
>>
>> we are encountering serious performance problems with our database.
>> Queries which took around 100ms or less last week now take several
>> seconds.
>>
>> Results of Bonnie++:
>>
>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3 374.9   4
>> Latency              7576us      233s      164s   15647us   13120ms    3302ms
>>
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> voyager             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                  16   141   0 +++++ +++   146   0   157   0 +++++ +++   172   0
>> Latency             1020ms     128us    9148ms     598ms      37us     485ms
>
> That seems a bit slow ... 27MB/s for writes and 41MB/s for reads is a bit slow with 8 drives.
>
> Tomas

Agreed, that's really slow. A single SATA drive will get 60 MB/s. Did you run Bonnie while the VMs were up and running?

>> root@host:~# vmstat
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id wa
>>  0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9

Only one line? That does not help much. Can you run it as 'vmstat 2' and let it run while a few slow queries are performed? Then paste all the lines?

-Andy
Re: [PERFORM] Rather large LA
Hi Andy,

It's not a new issue, no. It's a legacy system that is in no way ideal but is also not in a position to be overhauled. Indexes are correct, tables are up to 25 million rows.

On startup, it hits CPU more than IO. I'll provide some additional stats after I restart it tonight. Server logs have been reviewed and, where possible, slow queries have been fixed.

Autovacuum has been disabled and set to run manually via cron during a quiet period, and fsync has recently been turned off to gauge any real-world performance increase; there is battery backup on the RAID card providing some level of resilience.

Thanks

Richard

On 5 Sep 2011, at 14:39, Andy Colson wrote:
> Is the slowness new? Or has it always been a bit slow? Have you checked for bloat on your tables/indexes? When you start up, does it peg a cpu or sit around doing IO? Have you reviewed the server logs?
>
>> autovacuum                      | off
>
> Why? I assume that's a problem.
>
>> fsync                           | off
>
> Seriously?
>
> -Andy
>
> [rest of the quoted message and configs snipped]
Re: [PERFORM] Sudden drop in DB performance
On 5 September 2011, 15:51, Andy Colson wrote:
> On 09/05/2011 02:48 AM, Tomas Vondra wrote:
>> That seems a bit slow ... 27MB/s for writes and 41MB/s for reads is a bit slow with 8 drives.
>>
>> Tomas
>
> Agreed, that's really slow. A single SATA drive will get 60 MB/s. Did you run Bonnie while the VMs were up and running?
>
>> root@host:~# vmstat
>> [snip]
>
> Only one line? That does not help much. Can you run it as 'vmstat 2' and let it run while a few slow queries are performed? Then paste all the lines?

And maybe a few lines from "iostat -x 2" too.

BTW, what kind of RAID is it? Is it hw or sw based? Have you checked the health of the drives? Are you sure there's nothing else using the drives (e.g. one of the VMs, a rebuild of the array or something like that)?

Tomas
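[Editor's note: concretely, the checks Tomas suggests might look like the sketch below. The device name is a placeholder, and drives behind a hardware RAID controller may need smartctl's -d option (or the controller's own CLI) to be reachable.]

  # per-device utilization and latency, sampled every 2 seconds
  iostat -x 2

  # quick SMART health check of one drive (device name is an assumption)
  smartctl -H /dev/sda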
Re: [PERFORM] Sudden drop in DB performance
On 5 September 2011, 16:08, Gerhard Wohlgenannt wrote:
> Below please find the results of vmstat 2 over some period of time, with normal database / system load.

What does a "normal load" mean? Does that mean a time when the queries are slow?

Are you sure the machine really has 48GB of RAM? Because from the vmstat output it seems like there's just 32GB:

> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b    swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id wa
>  0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426  2  1 93  4

1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB

BTW there's 1.3GB of swap, although it's not used heavily (according to the vmstat output). Otherwise I don't see anything wrong in the output.

What is the size of the database (use pg_database_size to get it)? Did it grow significantly recently?

Tomas
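[Editor's note: for completeness, a query along the lines Tomas suggests, using the stock pg_database_size() and pg_size_pretty() functions:]

  -- size of every database in the cluster, largest first
  SELECT datname,
         pg_size_pretty(pg_database_size(datname)) AS size
  FROM pg_database
  ORDER BY pg_database_size(datname) DESC;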
Re: [PERFORM] Rather large LA
I think checkpoint_segments is too low; try 30.

2011/9/5, Andy Colson a...@squeakycode.net:
> [Andy's questions and the quoted configuration snipped]

--
pasman
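[Editor's note: a sketch of that change in postgresql.conf; the value 30 is the suggestion above, not a tuned figure. The other two settings are already in Richard's config and are shown only because they help verify whether checkpoints are actually the bottleneck.]

  checkpoint_segments = 30             # was 10; raises the distance between forced checkpoints
  checkpoint_completion_target = 0.9   # already set; spreads checkpoint I/O over time
  log_checkpoints = on                 # already set; shows if checkpoints come too often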
Re: [PERFORM] Sudden drop in DB performance
On 09/05/2011 09:39 AM, Tomas Vondra wrote:
> On 5 September 2011, 16:08, Gerhard Wohlgenannt wrote:
>> Below please find the results of vmstat 2 over some period of time, with normal database / system load.
>
> What does a "normal load" mean? Does that mean a time when the queries are slow?
>
> Are you sure the machine really has 48GB of RAM? Because from the vmstat output it seems like there's just 32GB:
>
> 1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB
>
> BTW there's 1.3GB of swap, although it's not used heavily (according to the vmstat output). Otherwise I don't see anything wrong in the output.
>
> What is the size of the database (use pg_database_size to get it)? Did it grow significantly recently?
>
> Tomas

Yeah, it's interesting that it swapped in memory, but never out. Looking at this vmstat, it does not look like a hardware problem (assuming "normal load" means slow queries).

> Did it grow significantly recently?

That's a good thought; maybe the stats are old and you have bad plans? It could also be major updates to the data (as opposed to growth).

Gerhard, have you done an 'explain analyze' on any of your slow queries? Have you done an analyze lately?

-Andy
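[Editor's note: a quick way to act on Andy's two questions — check when statistics were last refreshed, then refresh them:]

  -- when were the planner stats last updated, per table?
  SELECT relname, last_analyze, last_autoanalyze, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
  ORDER BY relname;

  -- refresh statistics for the whole database, reporting progress
  ANALYZE VERBOSE;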
Re: [PERFORM] Sudden drop in DB performance
On 09/05/2011 03:51 PM, Andy Colson wrote:
> [discussion of the Bonnie++ results snipped]
>
> Only one line? That does not help much. Can you run it as 'vmstat 2' and let it run while a few slow queries are performed? Then paste all the lines?
>
> -Andy

Hi Andy,

thanks a lot for your help. Below please find the results of vmstat 2 over some period of time, with normal database / system load:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b    swpd   free   buff    cache  si  so    bi    bo    in    cs us sy id wa
 0  0 1344332 237196 104140 31468412   0   0   330   102  4322  7130  4  2 90  4
 1  1 1344332 236708 104144 31469000   0   0   322   105  2096  3723  1  2 92  5
 2  1 1344204 240924 104156 31462484 350   0  1906   234  3687  4512 12  3 77  9
 0  0 1344200 238372 104168 31462452   0   0     8   109  4050  8376  8  3 86  3
 0  0 1344200 232668 104168 31462468   0   0    12   158  2036  3633  2  2 92  3
 0  3 1344196 282784 104180 31413384   4   0  1768   343  2490  4391  1  2 84 13
 1  1 1344196 278188 104192 31416080   0   0  1392   341  2215  3850  1  2 82 15
 0  0 1344120 276964 104608 31416904  90   0   634   304  2390  3949  4  2 86  8
 1  1 1344120 277096 104628 31417752   0   0   492   378  2394  3866  2  1 87 10
 0  1 1344120 274476 104628 31418620   0   0   260   233  1997  3255  2  1 91  6
 1  1 1344120 276584 104628 31418808   0   0   128   208  2015  3266  2  1 91  6
 0  0 1343672 272352 106288 31418788 694   0  1346   344  2170  3660  3  1 89  6
 0  1 1343632 270220 107648 31419152  48   0   468   490  2356  3622  4  2 88  5
 0  0 1343624 270708 107660 31419344  20   0   228   138  2086  3518  2  3 91  4
 0  1 1343612 268732 107660 31419584  12   0   168   112  2100  3585  3  2 91  3
 0  0 1343544 266616 107660 31420112  14   0   154    73  2059  3719  3  2 93  3
 0  1 1343540 267368 107684 31420168   0   0    78   260  2256  3970  3  2 90  6
 0  1 1343540 268352 107692 31420356   0   0    94   284  2239  4086  2  2 89  6
 0  0 1343540 274064 107692 31423584   0   0  1622   301  2322  4258  2  3 83 13
 0  2 1343440 273064 107704 31423696  96   0   106   180  2158  3795  3  2 90  5
 0  0 1342184 262888 107708 31426040 840   0  2014   146  2309  3713  5  3 83  9
 0  0 1342184 261904 107732 31426128   0   0    60   158  1893  3510  1  3 91  5
 2  0 1342184 258680 107732 31427436   0   0   794   114  2160  3647  2  3 90  5
 0  2 1342176 258184 107744 31428308  24   0   310   116  1943  3335  2  2 91  4
 1  0 1342172 259068 107756 31428700   2   0   138   143  1976  3468  1  1 93  5
 0  0 1342172 258084 107756 31429948   0   0   620    88  2117  3565  3  1 90  6
 0  0 1342172 258456 107952 31430028   0   0    62   305  2174  3827  1  2 91  6
 1  0 1342172 257480 107952 31430636   0   0   300   256  2316  3959  3  2 86  8
 0  0 1342172 257720 107952 31430772   0   0    46   133  2411  4047  3  2 91  3
 1  2 1342172 257844 107976 31430776   0   0   136   184  2111  3841  1  1 92  6
 1  2 1342172 338376 107576 31349412   0   0   462  8615  2655  5508  5  3 79 13
 1  2 1342172 340772 107580 31351080   0   0   682   377  2503  4022  2  1 87 10
 1  2 1342172 335688 107596 31351992   0   0   548   306  2480  3867  4  1 86  9
 0  2 1342168 337432 107608 31352704   0   0   224   188  1919  3158  1  1 93  6
 0  0 1342168 337804 107608 31353020   0   0   154   249  1933  3175  1  1 92  6
 0  1 1342168 335944 107636 31353464   0   0   212   173  1912  3280  4  2 89  5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b    swpd   free   buff    cache  si  so    bi    bo    in    cs us sy id wa
 0  0 1342168 336936 107636 31353512   0   0    14   183  1911  3426  2  1 93  4
 0  1 1342168 334440 107656 31353736   0   0   264   372  2119  3400  6  2 84  8
 0  0 1342164 334084 107680 31354468   0   0   302   413  2361  3613  2  1 87 10
 2  0 1342160 342764 107680 31354916   8   0   184   332  2142  3117  1  1 90  7
 0  1 1342160 343788 107680 31355808   0   0   360   211  2247  3249  1  2 91  5
 2  1 1342156 340804 107704 31355904   0   0    88   280  2287  3448  2  2 90  6
 0  1 1342156 344276 107704 31356464   0   0   316   276  2050  3298  1  2 90  7
 0  0 1342156 344160 107712 31356576   0   0     4   225  1884  3194  1  3 90  6
 0  0 1342152 342548 107724 31356688   0   0    52   231  1963  3232  1  3 89  6
 2  1 1342152 343664 107724 31356764   0   0   104   348  2643  3614  3  2 88  8
 1  1 1342144 341060 107760 31357080  16   0   120   307  2511  3474  4  3 87  7
 1  0 1342140 342332 107780 31357500   8   0   206   193  2243  3448  4  2 89  5
 1  0 1342136 339472 107780 31357508   0   0    32   142  4290  3799  6  3 87  4
 0  0 1342136 341160 107780 31357992   0   0   216   171  2613  3995  4  2 88  5
 0  0 1342136 342168 107820 31357988   0   0    26   140  2347  3753  3  4 89  4
 0  0 1342136 342532 107820 31358128   0   0    36   155  2119  3653  2  1 91  5
 2  0 1342136 341564 107828 31358144   0   0     0   151  1973  3486  4  2 90  4
 1  1 1342136 342076 107852 31358416   0   0   148   284  2251  3857  6  2 84  8
 0  1 1342136 339944 107852 31359284   0   0   482   478  2902  5210  4  2 84 10
 0  1 1342136 342184 107852 31359836   0   0   238   372  2292  4063  2  1 88  9
Re: [PERFORM] Sudden drop in DB performance
hi,

thanks a lot for your help!

>> [original problem description snipped]
>
> What are the virtual machines doing? Are you sure they are not doing a lot of IO?

We also have an SSD disk in the machine, and the virtual machines do most of their IO on that. But there surely is also some amount of IO onto the system's RAID array coming from the virtual machines. Maybe we should consider having a dedicated database server.

>> [vmstat, Bonnie++, log and explain analyze output snipped]
>
> But in this explain analyze, the query finished in well under a millisecond. Use the auto_explain contrib module to see the explain plan of the slow execution.

Thanks. We will use auto_explain as soon as some long-running updates are finished (we don't want to kill them).

cheers

gerhard
Re: [PERFORM] Rather large LA
On 09/05/2011 08:57 AM, Richard Shaw wrote:
> Hi Andy,
>
> It's not a new issue, no. It's a legacy system that is in no way ideal but is also not in a position to be overhauled. Indexes are correct, tables are up to 25 million rows.
>
> On startup, it hits CPU more than IO. I'll provide some additional stats after I restart it tonight. Server logs have been reviewed and, where possible, slow queries have been fixed.
>
> Autovacuum has been disabled and set to run manually via cron during a quiet period, and fsync has recently been turned off to gauge any real-world performance increase; there is battery backup on the RAID card providing some level of resilience.
>
> Thanks
>
> Richard

So I'm guessing that setting fsync off did not help your performance problems. And you say CPU is high, so I think we can rule out disk IO problems.

> possibly installing more RAM as the most used db @ 67GB might appreciate it

That would only help if every row of that 67 gig is being used. If it's history stuff that never gets looked up, then adding more RAM won't help, because none of that data is being loaded anyway. Out of that 67 gig, what is the working size? (Not really a number you can look up; I'm looking for more of an empirical little/some/lots/most.)

From the PgBouncer config:

> max_client_conn = 4096
> reserve_pool_size = 800

I've not used PgBouncer much, but these seem really high. Does that mean it will create 4K connections to the backend? Or does it mean it'll allow 4K connections to the pooler but only 800 connections to the backend?

I wonder... when you start up, if you watch vmstat for a bit, do you have tons of context switches? If it's not IO, and you don't say "OMG, CPU is pegged!" so I assume it's not CPU bound, I wonder if there are so many processes fighting for resources that they are stepping on each other.

When you get up and running (and it's slow), what does this display:

ps ax | grep postgr | wc --lines

That, and a minute of 'vmstat 2', would be neat to see as well.

-Andy
Re: [PERFORM] Rather large LA
On Monday, September 05, 2011 14:57:43 Richard Shaw wrote:
> Autovacuum has been disabled and set to run manually via cron during a quiet period, and fsync has recently been turned off to gauge any real-world performance increase; there is battery backup on the RAID card providing some level of resilience.

That doesn't help you against a failure due to fsync() being off, as the BBU can only protect data that has actually been written to disk. Without fsync=on, no guarantee about that exists.

Andres
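[Editor's note: a commonly suggested alternative that keeps crash safety, sketched below. The assumption is that losing the last few committed transactions after a crash is acceptable; unlike fsync=off, the database itself stays consistent.]

  # postgresql.conf
  fsync = on
  synchronous_commit = off   # commits return before WAL hits disk; recent
                             # commits may be lost on crash, but no corruption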
Re: [PERFORM] Sudden drop in DB performance
On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt wo...@ai.wu.ac.at wrote:
> Below please find the results of vmstat 2 over some period of time, with normal database / system load:
>
> [the full vmstat 2 listing is quoted in the original message above; snipped]

Your IO Wait is actually pretty high. On an 8 core machine, 12.5% means one core is doing nothing but waiting for IO.
Re: [PERFORM] Sudden drop in DBb performance
On 09/05/2011 01:45 PM, Scott Marlowe wrote:
> On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt <wo...@ai.wu.ac.at> wrote:
>> Below please find the results of vmstat 2 over some period of time, with normal database / system load.
>>
>>  2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512 12  3 77  9
>
> Your IO wait is actually pretty high. On an 8-core machine, 12.5% means one core is doing nothing but waiting for IO.

My server is 2-core, so these numbers looked fine to me; I need to remember core count when I look at these. The line above would not worry me a bit on 2 cores, but on 8 cores it pretty much means one core was pegged (with 9% wait? Or is it that one core was pegged, and another was 72% IO wait?). I have always loved the vmstat output, but it starts to get confusing when you have to take cores into account. (And my math was never strong in the first place :-))

Good catch, thanks Scott.

-Andy
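For anyone else doing this arithmetic: vmstat's cpu columns are percentages averaged over all cores, so "how many cores' worth of IO wait" is simply wa * cores / 100. A minimal sketch of that conversion, assuming 8 cores and the 16-column vmstat output quoted above (a vmstat that also prints an 'st' column would shift the field number):

    # Convert vmstat's aggregate wa% into core-equivalents of IO wait.
    # The numeric-first-field test skips vmstat's repeating header lines;
    # wa is field 16 in the output format shown in this thread.
    vmstat 2 | awk -v cores=8 '$1 ~ /^[0-9]+$/ { printf "wa=%2d%%  ~ %.2f cores waiting on IO\n", $16, $16 * cores / 100 }'

For the quoted line (wa=9) this gives 9 * 8 / 100 = 0.72 of a core.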
Re: [PERFORM] Rather large LA
On September 5, 2011, Richard Shaw <rich...@aggress.net> wrote:
> Hi Andy,
>
> It's not a new issue, no. It's a legacy system that is in no way ideal, but it is also not in a position to be overhauled. Indexes are correct; tables are up to 25 million rows. On startup it hits CPU more than IO. I'll provide some additional stats after I restart it tonight.

I bet it's I/O bound until a good chunk of the active data gets cached. Run a vmstat 1 while it's that busy; I bet most of the CPU time is really in io_wait.
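A sketch of how to capture that window, using vmstat's standard interval and count arguments:

    # Sample once per second for five minutes while the cache warms up,
    # keeping the output to compare against a quiet period later.
    vmstat 1 300 > vmstat-after-restart.log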
Re: [PERFORM] Rather large LA
On Mon, Sep 5, 2011 at 11:24 AM, Andres Freund <and...@anarazel.de> wrote:
> On Monday, September 05, 2011 14:57:43 Richard Shaw wrote:
>> Autovacuum has been disabled and set to run manually via cron during a quiet period, and fsync has recently been turned off to gauge any real-world performance increase; there is battery backup on the RAID card providing some level of resilience.
>
> That doesn't help you against a failure due to fsync() being off, as the BBU can only protect data that has actually been written to disk. Without fsync=on, no guarantee about that exists.

Further, if you've got a BBU cache on the RAID card, the gains from fsync=off will be low / nonexistent.
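In postgresql.conf terms, the advice above amounts to something like the following sketch. Note that synchronous_commit (available since 8.3) is the usual alternative offered in such cases, not something proposed explicitly in this thread: with a BBU, the flush cost that fsync=off avoids is largely absorbed by the write cache anyway, and synchronous_commit = off recovers most of the remaining commit latency at the risk of losing a few seconds of recent commits after a crash, but never a corrupt cluster:

    fsync = on                 # crash safety; the BBU absorbs most of the flush cost
    synchronous_commit = off   # optional: bounded data loss instead of corruption
    full_page_writes = on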
Re: [PERFORM] Sudden drop in DBb performance
On 5 September 2011, 21:07, Andy Colson wrote:
> On 09/05/2011 01:45 PM, Scott Marlowe wrote:
>> On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt <wo...@ai.wu.ac.at> wrote:
>>> Below please find the results of vmstat 2 over some period of time, with normal database / system load.
>>>
>>>  2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512 12  3 77  9
>>
>> Your IO wait is actually pretty high. On an 8-core machine, 12.5% means one core is doing nothing but waiting for IO.
>
> My server is 2-core, so these numbers looked fine to me; I need to remember core count when I look at these. The line above would not worry me a bit on 2 cores, but on 8 cores it pretty much means one core was pegged (with 9% wait? Or is it that one core was pegged, and another was 72% IO wait?)

AFAIK it's as if one core was 72% IO wait. Anyway, that's exactly why I was asking for iostat -x: its %util column gives a better idea of what's going on.

> I have always loved the vmstat output, but it starts to get confusing when you have to take cores into account. (And my math was never strong in the first place :-))

That's why I love dstat; just do this

$ dstat -C 0,1,2,3,4,5,6,7

and you know all you need.

> Good catch, thanks Scott.

Yes, good catch. Still, this does not explain why the queries were running fast before, and why the RAID array is so sluggish. Not to mention that we don't know what the conditions were when those numbers were collected (were the VMs off or running?).

Tomas
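Spelling out the invocation Tomas keeps asking for (a sketch; the -x flag and these columns are standard sysstat iostat):

    # Extended per-device statistics, refreshed every 2 seconds.
    # Watch %util (near 100 means the device is saturated) and await
    # (average ms a request spends queued plus being serviced).
    iostat -x 2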
Re: [PERFORM] Rather large LA
vmstat 1 and iostat -x output

Normal:

procs -----------memory---------- ---swap-- -----io---- --system--- -----cpu------
 r  b swpd    free   buff    cache   si   so    bi    bo   in    cs us sy id wa st
 3  0 2332  442428  73904 31287344    0    0   894     2    0     0  7  5 85  3  0
 4  1 2332  428428  73904 31288288    0    0  1440     0 6553 29066  5  2 91  1  0
 4  1 2332  422688  73904 31288688    0    0   856     0 4480 18860  3  1 95  1  0
 0  0 2332  476072  73920 31289444    0    0   544  1452 4478 19103  3  1 95  0  0
 3  0 2332  422288  73920 31290572    0    0  1268   496 5565 23410  5  3 91  1  0

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           5.11   0.01    2.58    2.56   0.00  89.74

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm  %util
sda        1.00  143.00 523.50 108.00 8364.00 2008.00    16.42     2.78   4.41  1.56  98.35
sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sda2       1.00  143.00 523.50 108.00 8364.00 2008.00    16.42     2.78   4.41  1.56  98.35
sda3       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdb        0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdb1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           4.89   0.00    2.94    3.14   0.00  89.04

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm  %util
sda        1.00    0.00 285.00   0.00 4808.00    0.00    16.87     2.46   8.29  3.02  86.20
sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sda2       1.00    0.00 285.00   0.00 4808.00    0.00    16.87     2.46   8.29  3.02  86.20
sda3       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdb        0.00  161.50   0.00   6.50    0.00 1344.00   206.77     0.00   0.69  0.15   0.10
sdb1       0.00  161.50   0.00   6.50    0.00 1344.00   206.77     0.00   0.69  0.15   0.10

After restart:

procs -----------memory---------- ---swap-- -----io---- --system--- -----cpu------
 r  b swpd    free   buff    cache   si   so    bi    bo   in    cs us sy id wa st
 2 34 2332 5819012  75632 25855368    0    0   894     2    0     0  7  5 85  3  0
 4 39 2332 5813344  75628 25833588    0    0  5104   324 5480 27047  3  1 84 11  0
 2 47 2332 5815212  75336 25812064    0    0  4356  1664 5627 28695  3  1 84 12  0
 2 40 2332 5852452  75340 25817496    0    0  5632   828 5817 28832  3  1 84 11  0
 1 45 2332 5835704  75348 25817072    0    0  4960  1004 5111 25782  2  1 88  9  0
 2 42 2332 5840320  75356 25811632    0    0  3884   492 5405 27858  3  1 88  8  0
 0 47 2332 5826648  75348 25805296    0    0  4432  1268 5888 29556  3  1 83 13  0

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           3.26   0.00    1.69   25.21   0.00  69.84

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm  %util
sda        0.50   45.00 520.00   2.50 8316.00  380.00    16.64    71.70 118.28  1.92 100.10
sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sda2       0.50   45.00 520.00   2.50 8316.00  380.00    16.64    71.70 118.28  1.92 100.10
sda3       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdb        0.00  196.50   0.00  10.50    0.00 1656.00   157.71     0.01   0.67  0.52   0.55
sdb1       0.00  196.50   0.00  10.50    0.00 1656.00   157.71     0.01   0.67  0.52   0.55

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           3.97   0.00    1.71   20.88   0.00  73.44

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm  %util
sda        1.00    0.00 532.00   0.00 8568.00    0.00    16.11    73.73 148.44  1.88 100.05
sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sda2       1.00    0.00 532.00   0.00 8568.00    0.00    16.11    73.73 148.44  1.88 100.05
sda3       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdb        0.00  106.50   0.00  11.50    0.00  944.00    82.09     0.00   0.39  0.30   0.35
sdb1       0.00  106.50   0.00  11.50    0.00  944.00    82.09     0.00   0.39  0.30   0.35

Regards,

Richard

On 5 Sep 2011, at 21:05, Alan Hodgson wrote:
> On September 5, 2011, Richard Shaw <rich...@aggress.net> wrote:
>> Hi Andy,
>> It's not a new issue, no. It's a legacy system that is in no way ideal but is
Re: [PERFORM] Rather large LA
On Mon, Sep 5, 2011 at 4:36 PM, Richard Shaw <rich...@aggress.net> wrote:
> Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm  %util
> sda        1.00  143.00 523.50 108.00 8364.00 2008.00    16.42     2.78   4.41  1.56  98.35
> sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
> sda2       1.00  143.00 523.50 108.00 8364.00 2008.00    16.42     2.78   4.41  1.56  98.35

So what is /dev/sda2 mounted as?
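A sketch of how to answer that on the box in question:

    # Show the filesystem and mount point behind /dev/sda2.
    mount | grep sda2
    df -h | grep sda2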
Re: [PERFORM] Query performance issue
Based on my initial hunch that something resulting from all the ALTERs was making the PostgreSQL planner end up with bad plans, I tried a pg_dump and pg_restore. Now the 'bad' query comes back in 70 seconds (compared to 20 minutes earlier), and the rewritten query still comes back in 2 seconds. So we will stick with the rewritten query.
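For reference, the dump-and-reload cycle described above looks roughly like this sketch (the database names are hypothetical); a dump/restore rewrites every table and index from scratch, which is also why it clears out any bloat the ALTERs may have left behind:

    # Custom-format dump, then restore into a freshly created database.
    pg_dump -Fc mydb -f mydb.dump
    createdb mydb_rebuilt
    pg_restore -d mydb_rebuilt mydb.dump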