Re: [GENERAL] General performance/load issue
A better view of iotop :

  TID PRIO USER      DISK READ  DISK WRITE  SWAPIN     IO  COMMAND
16974 be/4 postgres  46.33 K/s    0.00 B/s  0.00 %  7.21 % postgres: database database 46.105.111.92(54930) idle
  383 be/4 postgres   7.72 K/s    0.00 B/s  0.00 %  1.56 % postgres: database database 46.105.104.205(51706) idle in transaction
15934 be/4 postgres   0.00 B/s   38.61 K/s  0.00 %  0.04 % postgres: wal writer process
31487 be/4 postgres   0.00 B/s   11.32 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(48616) SELECT
29541 be/4 postgres   0.00 B/s   11.52 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(40675) SELECT
31419 be/4 postgres   0.00 B/s    5.99 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(48476) SELECT
31875 be/4 postgres   0.00 B/s   15.23 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(50228) SELECT
30985 be/4 postgres   0.00 B/s   10.55 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(47672) SELECT
30902 be/4 postgres   0.00 B/s    6.04 M/s  0.00 %  0.00 % postgres: database database 176.31.228.6(43295) SELECT
30903 be/4 postgres   0.00 B/s    4.79 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(47565) SELECT
15933 be/4 postgres   0.00 B/s   77.22 K/s  0.00 %  0.00 % postgres: writer process

As you can see, the SELECTs are writing a great deal to disk. At this moment, pg_stat_activity shows many, many queries running at the same time. It looks like all queries are suddenly writing to disk, not one particular query, which makes me think of a buffer issue or something similar.

This is an example of a query writing at 10 MB/s :

SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id, table.flux_id AS flux_id, table.locale_id AS locale_id, table.url_article AS url_article, table.original_url AS original_url, table.name AS name, table.description AS description, table.content AS content, table.permis AS permis, table.reviewed AS reviewed, table.author_id AS author_id, table.poster_id AS poster_id, table.post_date AS post_date, table.edit_date AS edit_date, table.add_date AS add_date, table.comments_open AS comments_open, table.site_id AS site_id, table.is_local AS is_local, table.status AS status, table.visits AS visits, table.votes AS votes, table.score AS score, arti

The checkpoint logs still show very long write times :

LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s, total=63.625 s

Couldn't this be a hardware issue ?

Best regards,

Gaëtan

Le 26/11/11 04:42, « Tomas Vondra » t...@fuzzy.cz a écrit :

Dne 26.11.2011 00:17, Cédric Villemain napsal(a):

Le 25 novembre 2011 23:47, Gaëtan Allart gae...@nexylan.com a écrit :

Hello Tomas and Cédric,

Right now, the server is not all right. Load is above 30 and queries are slow like hell. Here's the complete iotop. Note the 71 MB/s writes (apparently on SELECT queries).

Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
  TID PRIO USER      DISK READ  DISK WRITE  SWAPIN     IO  COMMAND
27352 be/4 postgres  15.64 K/s   86.01 K/s  0.00 % 99.99 % postgres: database database 176.31.228.6(38816) SELECT
20226 be/4 postgres   7.82 K/s    0.00 B/s  0.00 % 99.99 % postgres: database database 176.31.228.6(34166) SELECT
26950 be/4 postgres  23.46 K/s    0.00 B/s  0.00 % 82.14 % postgres: database database 46.105.104.205(40820) SELECT
23160 be/4 postgres   3.91 K/s    0.00 B/s  0.00 % 81.14 % postgres: database database 46.105.104.205(58091) SELECT
29184 be/4 postgres   7.82 K/s    0.00 B/s  0.00 % 79.17 % postgres: database database 46.105.104.205(51047) SELECT

Gaetan, you keep deleting the interesting rows for some reason.
None of the rows you've posted writes more than a few KB/s - what about the rows that actually write those 71 MB/s?

Here is dirty_expire_centisecs :

cat /proc/sys/vm/dirty_expire_centisecs
3000

OK. That's a default value, and it's usually too high (just like the ratios), but don't change it until we find out what's wrong.

Bgwriter configuration is default :

#bgwriter_delay = 200ms            # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100       # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0     # 0-10.0 multiplier on buffers scanned/round

OK.

Is there anything I can provide to help you ?

The checkpoint logs and the output of pg_stat_bgwriter (as asked by Tomas). It is probable that shared_buffers are too small for your workload (expected) - do you still have an issue with checkpoint sync duration ?

I don't think that's the case. Too small shared buffers usually cause a lot of reads (especially when all the queries are SELECTs, as here), but he has problems with writes. And according to the pg_stat_bgwriter output Gaetan posted, the checkpoints wrote about 54 MB, the bgwriter about 370 MB, and the backends about 80 MB (during the 5 minutes between snapshots). So I'm really wondering where those 70 MB/s of writes come from.
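(For reference, those per-component numbers fall straight out of diffing two pg_stat_bgwriter snapshots - a minimal sketch, assuming the default 8 kB block size; the counters are cumulative since the stats were last reset:)

-- take one snapshot, wait a few minutes, take another
SELECT buffers_checkpoint, buffers_clean, buffers_backend, now()
  FROM pg_stat_bgwriter;
-- each buffer is one 8 kB block, so for the checkpointer e.g.:
--   (501802 - 494948) * 8192 bytes ~= 54 MB written between the snapshots
-- the same arithmetic on buffers_clean gives the bgwriter's share, and on
-- buffers_backend the backends' share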
Re: [GENERAL] General performance/load issue
On 26 Listopad 2011, 10:45, Gaëtan Allart wrote:

A better view of iotop :

  TID PRIO USER      DISK READ  DISK WRITE  SWAPIN     IO  COMMAND
31875 be/4 postgres   0.00 B/s   15.23 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(50228) SELECT
30985 be/4 postgres   0.00 B/s   10.55 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(47672) SELECT

As you can see, the SELECTs are writing a great deal to disk. At this moment, pg_stat_activity shows many, many queries running at the same time. It looks like all queries are suddenly writing to disk, not one particular query, which makes me think of a buffer issue or something similar.

No, I don't think this is an issue with the size of shared buffers. That would influence pg_stat_bgwriter - the buffers_backend counter would grow much faster, and that's not the case. So I'm guessing it's either hint bits or sorting. If I had to choose, I'd probably guess hint bits - see for example this thread:

http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php

It more or less behaves like what you've described so far.

This is an example of a query writing at 10 MB/s :

SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id, table.flux_id AS flux_id, table.locale_id AS locale_id, table.url_article AS url_article, table.original_url AS original_url, table.name AS name, table.description AS description, table.content AS content, table.permis AS permis, table.reviewed AS reviewed, table.author_id AS author_id, table.poster_id AS poster_id, table.post_date AS post_date, table.edit_date AS edit_date, table.add_date AS add_date, table.comments_open AS comments_open, table.site_id AS site_id, table.is_local AS is_local, table.status AS status, table.visits AS visits, table.votes AS votes, table.score AS score, arti

Post the EXPLAIN ANALYZE output of this query (use explain.depesz.com to post it).

Checkpoint logs still show very long write times :

LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s, total=63.625 s

No, that's fine - that's what spread checkpoints do. Once checkpoint_timeout expires, the system decides to perform a checkpoint, i.e. it has to write all dirty (modified) shared buffers to disk. But it knows the next checkpoint will happen in checkpoint_timeout, so it has about 5 minutes to write all the data. So it says something like "I need to write 540 MB and I want to write it in 270 seconds (5 minutes * completion_target), so I'll write at 2 MB/s." The write phase is therefore expected to take long. But it has to sync the data at the end, and that's where the problems usually happen - so the important number is the 'sync' time, and that has improved significantly. Six seconds is not great, but it's not a big issue, I guess.

Couldn't this be a hardware issue ?

I don't think so. The problem is that the queries write a lot of data, and the SSD can't help in this case. SSDs provide much better random performance than spinning drives, but their sequential performance is not much better. So pushing 70 MB/s to the SSD may fully utilize it.

Have you tested the basic performance (with bonnie++, dd, etc.) before going into production? There are some common SSD-related mistakes (e.g. not aligning the partitions properly) that can easily cause a 30% performance loss. That won't fix this problem, but you could get a bit better performance.

Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
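(A quick way to reproduce the hint-bit effect Tomas suspects - a sketch against a scratch table, not anything in Gaëtan's schema; the point is that the first read-only scan after a bulk load dirties every page it touches:)

-- bulk-load a table and flush the dirty buffers
CREATE TABLE hint_demo AS SELECT generate_series(1, 1000000) AS i;
CHECKPOINT;
-- this plain SELECT still writes: it sets the per-row hint bits recording
-- that the inserting transaction committed, dirtying every page it reads
SELECT count(*) FROM hint_demo;
-- vacuuming right after the load sets the hint bits up front, so later
-- SELECTs don't turn into a write storm
VACUUM hint_demo;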
Re: [GENERAL] General performance/load issue
On Fri, Nov 25, 2011 at 6:48 PM, Gaëtan Allart gae...@nexylan.com wrote:

Here are the latest checkpoint logs :

LOG: checkpoint complete: wrote 842 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=168.970 s, sync=0.005 s, total=168.977 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 318 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=63.818 s, sync=0.006 s, total=63.825 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 744 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=147.035 s, sync=0.006 s, total=147.043 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 108 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=35.410 s, sync=14.921 s, total=54.811 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 393 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=88.835 s, sync=43.210 s, total=135.728 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 914 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=148.162 s, sync=14.249 s, total=170.481 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 202 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=53.152 s, sync=0.004 s, total=53.159 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 897 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=42.414 s, sync=1.175 s, total=58.957 s
LOG: checkpoint starting: shutdown immediate
LOG: checkpoint complete: wrote 666 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.027 s, sync=1.600 s, total=1.630 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 627 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=125.856 s, sync=0.006 s, total=125.864 s
LOG: checkpoint starting: time

And pg_stat_bgwriter :

postgres=# select *,now() from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |              now
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-------------------------------
               388 |              13 |             494948 |       4306591 |            13555 |         7458743 |    7835244602 | 2011-11-26 00:43:47.232924+01
(1 row)

postgres=# select *,now() from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |              now
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-------------------------------
               389 |              13 |             501802 |       4352198 |            13809 |         7469220 |    7839778941 | 2011-11-26 00:49:00.680779+01
(1 row)

Processes that were writing were SELECT queries against the database.

What filesystem is involved? What is the underlying disk layout? You said it's SSDs, but what type, how many, and in what configuration? Also, how is Postgres set up on top of the disks (all of $PGDATA and the OS on one volume, or split up)? And how many active connections do you typically have?

Can you reduce your sort mem to something like 4MB, and set log_temp_files to 0?

Robert Treat
conjecture: xzilla.net
consulting: omniti.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
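(In postgresql.conf terms, Robert's suggestion would look something like the following - a sketch; 4MB is his starting point rather than a tuned value, and both settings take effect on reload, no restart needed:)

work_mem = 4MB          # down from 96MB: with ~30 active connections, each
                        # sort/hash can grab this much, so the per-operation
                        # limit has to be modest
log_temp_files = 0      # log every temporary file created, with its size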
Re: [GENERAL] General performance/load issue
Hello Robert,

I'm having around 30 active connections (average).

As far as disk layout is concerned, everything's on the same disk (RAID 1 with 2 SSDs).

Gaëtan

Le 26/11/11 15:25, « Robert Treat » r...@xzilla.net a écrit :

On Fri, Nov 25, 2011 at 6:48 PM, Gaëtan Allart gae...@nexylan.com wrote:

Here are the latest checkpoint logs :

LOG: checkpoint complete: wrote 842 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=168.970 s, sync=0.005 s, total=168.977 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 318 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=63.818 s, sync=0.006 s, total=63.825 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 744 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=147.035 s, sync=0.006 s, total=147.043 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 108 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=35.410 s, sync=14.921 s, total=54.811 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 393 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=88.835 s, sync=43.210 s, total=135.728 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 914 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=148.162 s, sync=14.249 s, total=170.481 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 202 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=53.152 s, sync=0.004 s, total=53.159 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 897 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=42.414 s, sync=1.175 s, total=58.957 s
LOG: checkpoint starting: shutdown immediate
LOG: checkpoint complete: wrote 666 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.027 s, sync=1.600 s, total=1.630 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 627 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=125.856 s, sync=0.006 s, total=125.864 s
LOG: checkpoint starting: time

And pg_stat_bgwriter :

postgres=# select *,now() from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |              now
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-------------------------------
               388 |              13 |             494948 |       4306591 |            13555 |         7458743 |    7835244602 | 2011-11-26 00:43:47.232924+01
(1 row)

postgres=# select *,now() from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |              now
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-------------------------------
               389 |              13 |             501802 |       4352198 |            13809 |         7469220 |    7839778941 | 2011-11-26 00:49:00.680779+01
(1 row)

Processes that were writing were SELECT queries against the database.

What filesystem is involved? What is the underlying disk layout? You said it's SSDs, but what type, how many, and in what configuration? Also, how is Postgres set up on top of the disks (all of $PGDATA and the OS on one volume, or split up)? And how many active connections do you typically have?

Can you reduce your sort mem to something like 4MB, and set log_temp_files to 0?

Robert Treat
conjecture: xzilla.net
consulting: omniti.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] General performance/load issue
Rahh :/ It's getting worse and worse :/ The database has to be restarted every 2 hours, causing much traffic loss :/

As far as the server is concerned, it was running great 7 days ago and had been running like this for months. I really don't get why it suddenly went I/O-ing…

Here's the current postgresql.conf :

shared_buffers = 6GB                    # min 128kB (change requires restart)
temp_buffers = 40MB                     # min 800kB
work_mem = 96MB                         # min 64kB
maintenance_work_mem = 256MB            # min 1MB
max_stack_depth = 2MB                   # min 100kB
effective_cache_size = 32GB
max_files_per_process = 8192            # min 25 (change requires restart)
fsync = on                              # turns forced synchronization on or off
synchronous_commit = off                # immediate fsync at commit
wal_buffers = 16MB                      # min 32kB
checkpoint_segments = 40                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min               # range 30s-1h
checkpoint_completion_target = 0.9      # checkpoint target duration, 0.0 - 1.0
seq_page_cost = 1.0                     # measured on an arbitrary scale
random_page_cost = 2.0                  # same scale as above

Isn't there anything I can do to keep my database up and running, even with bad performance?

Filesystem is ext3, running over a hardware RAID-1 config.

Gaëtan

Le 26/11/11 15:12, « Tomas Vondra » t...@fuzzy.cz a écrit :

On 26 Listopad 2011, 10:45, Gaëtan Allart wrote:

A better view of iotop :

  TID PRIO USER      DISK READ  DISK WRITE  SWAPIN     IO  COMMAND
31875 be/4 postgres   0.00 B/s   15.23 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(50228) SELECT
30985 be/4 postgres   0.00 B/s   10.55 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(47672) SELECT

As you can see, the SELECTs are writing a great deal to disk. At this moment, pg_stat_activity shows many, many queries running at the same time. It looks like all queries are suddenly writing to disk, not one particular query, which makes me think of a buffer issue or something similar.

No, I don't think this is an issue with the size of shared buffers. That would influence pg_stat_bgwriter - the buffers_backend counter would grow much faster, and that's not the case. So I'm guessing it's either hint bits or sorting. If I had to choose, I'd probably guess hint bits - see for example this thread:

http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php

It more or less behaves like what you've described so far.

This is an example of a query writing at 10 MB/s :

SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id, table.flux_id AS flux_id, table.locale_id AS locale_id, table.url_article AS url_article, table.original_url AS original_url, table.name AS name, table.description AS description, table.content AS content, table.permis AS permis, table.reviewed AS reviewed, table.author_id AS author_id, table.poster_id AS poster_id, table.post_date AS post_date, table.edit_date AS edit_date, table.add_date AS add_date, table.comments_open AS comments_open, table.site_id AS site_id, table.is_local AS is_local, table.status AS status, table.visits AS visits, table.votes AS votes, table.score AS score, arti

Post the EXPLAIN ANALYZE output of this query (use explain.depesz.com to post it).

Checkpoint logs still show very long write times :

LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s, total=63.625 s

No, that's fine - that's what spread checkpoints do. Once checkpoint_timeout expires, the system decides to perform a checkpoint, i.e. it has to write all dirty (modified) shared buffers to disk. But it knows the next checkpoint will happen in checkpoint_timeout, so it has about 5 minutes to write all the data. So it says something like "I need to write 540 MB and I want to write it in 270 seconds (5 minutes * completion_target), so I'll write at 2 MB/s." The write phase is therefore expected to take long. But it has to sync the data at the end, and that's where the problems usually happen - so the important number is the 'sync' time, and that has improved significantly. Six seconds is not great, but it's not a big issue, I guess.

Couldn't this be a hardware issue ?

I don't think so. The problem is that the queries write a lot of data, and the SSD can't help in this case. SSDs provide much better random performance than spinning drives, but their sequential performance is not much better. So pushing 70 MB/s to the SSD may fully utilize it.

Have you tested the basic performance (with bonnie++, dd, etc.) before going into production? There are some common SSD-related mistakes (e.g. not aligning the partitions properly) that can easily cause a 30% performance loss. That won't fix this problem, but you could get a bit better performance.

Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] General performance/load issue
Uhm… I'm seeing dozens and dozens of temporary file creations in the logs :

LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1425, size 25340
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.195, size 2720340
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.3495, size 24724
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.2674, size 2712452
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1850, size 25284
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.822, size 2717464
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1050, size 25060
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.501, size 2700248
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.3768, size 23156
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.2132, size 2713204
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1886, size 23744
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.311, size 2829600
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.3849, size 25088
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.2352, size 2770352
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1267, size 25592
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.765, size 2803744
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.3775, size 2
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.2686, size 2858836
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1094, size 24948
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.106, size 2800140
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1399, size 23912
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.211, size 2761788
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.3747, size 27188
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.2639, size 2749672
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1053, size 24276
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.452, size 2948712

Does this help ?

Gaëtan

Le 26/11/11 17:47, « Gaëtan Allart » gae...@nexylan.com a écrit :

Rahh :/ It's getting worse and worse :/ The database has to be restarted every 2 hours, causing much traffic loss :/

As far as the server is concerned, it was running great 7 days ago and had been running like this for months. I really don't get why it suddenly went I/O-ing…

Here's the current postgresql.conf :

shared_buffers = 6GB                    # min 128kB (change requires restart)
temp_buffers = 40MB                     # min 800kB
work_mem = 96MB                         # min 64kB
maintenance_work_mem = 256MB            # min 1MB
max_stack_depth = 2MB                   # min 100kB
effective_cache_size = 32GB
max_files_per_process = 8192            # min 25 (change requires restart)
fsync = on                              # turns forced synchronization on or off
synchronous_commit = off                # immediate fsync at commit
wal_buffers = 16MB                      # min 32kB
checkpoint_segments = 40                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min               # range 30s-1h
checkpoint_completion_target = 0.9      # checkpoint target duration, 0.0 - 1.0
seq_page_cost = 1.0                     # measured on an arbitrary scale
random_page_cost = 2.0                  # same scale as above

Isn't there anything I can do to keep my database up and running, even with bad performance?

Filesystem is ext3, running over a hardware RAID-1 config.

Gaëtan

Le 26/11/11 15:12, « Tomas Vondra » t...@fuzzy.cz a écrit :

On 26 Listopad 2011, 10:45, Gaëtan Allart wrote:

A better view of iotop :

  TID PRIO USER      DISK READ  DISK WRITE  SWAPIN     IO  COMMAND
31875 be/4 postgres   0.00 B/s   15.23 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(50228) SELECT
30985 be/4 postgres   0.00 B/s   10.55 M/s  0.00 %  0.00 % postgres: database database 46.105.104.205(47672) SELECT

As you can see, the SELECTs are writing a great deal to disk. At this moment, pg_stat_activity shows many, many queries running at the same time. It looks like all queries are suddenly writing to disk, not one particular query, which makes me think of a buffer issue or something similar.

No, I don't think this is an issue with the size of shared buffers. That would influence pg_stat_bgwriter - the buffers_backend counter would grow much faster, and that's not the case. So I'm guessing it's either hint bits or sorting. If I had to choose, I'd probably guess hint bits - see for example this thread:

http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php

It more or less behaves like what you've described so far.

This is an example of a query writing at 10 MB/s :

SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id, table.flux_id AS flux_id, table.locale_id AS locale_id, table.url_article AS url_article, table.original_url AS original_url, table.name AS name, table.description AS description, table.content AS content, table.permis AS permis, table.reviewed AS reviewed, table.author_id AS author_id, table.poster_id AS poster_id,
Re: [GENERAL] General performance/load issue
Dne 26.11.2011 18:08, Gaëtan Allart napsal(a):

Uhm… I'm seeing dozens and dozens of temporary file creations in logs :

LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1399, size 23912
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.211, size 2761788
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.3747, size 27188
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp13772.2639, size 2749672
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.1053, size 24276
LOG: temporary file: path base/pgsql_tmp/pgsql_tmp15388.452, size 2948712

Please add a timestamp to log_line_prefix so that we know how frequently that happens.

Anyway, the output is a bit strange. It's writing a lot of temp files that are significantly smaller (about 3MB) than work_mem (128MB).

Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
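(Concretely, the logging change Tomas asks for is a one-line edit to postgresql.conf, picked up on reload - a sketch; '%t' is the timestamp escape:)

log_line_prefix = '%t '    # prefix every server log line with a timestamp

-- then reload without a restart:
SELECT pg_reload_conf();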
Re: [GENERAL] General performance/load issue
Tomas Vondra t...@fuzzy.cz writes:

Anyway the output is a bit strange. It's writing a lot of temp files that are significantly smaller (about 3MB) than work_mem (128MB).

The most obvious guess about what's doing that is a hash join that has a drastic overestimate of how many rows it has to hash, so that it's batching the join unnecessarily. AFAIR, sorts don't spill to disk until they've actually reached work_mem, but hash join works differently.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
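(The batching Tom describes shows up directly in EXPLAIN ANALYZE - a hypothetical sketch with made-up tables and numbers, not Gaëtan's actual plan; the tell-tale sign is a Batches count above 1 on the Hash node paired with an estimated row count far above the actual one:)

EXPLAIN ANALYZE
SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;
--  Hash Join  (cost=... rows=9500000 ...) (actual time=... rows=12000 ...)
--    Hash Cond: (o.customer_id = c.id)
--    ->  Seq Scan on orders o ...
--    ->  Hash  (actual ... rows=12000 ...)
--          Buckets: 4096  Batches: 16  Memory Usage: 3072kB
-- rows=9500000 estimated vs rows=12000 actual: the overestimate makes the
-- executor split the hash into 16 batches, each written out as a temp file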
Re: [GENERAL] General performance/load issue
Dne 26.11.2011 17:47, Gaëtan Allart napsal(a):

Rahh :/ It's getting worse and worse :/ The database has to be restarted every 2 hours, causing much traffic loss :/

As far as the server is concerned, it was running great 7 days ago and had been running like this for months. I really don't get why it suddenly went I/O-ing…

Given the info provided so far, we can rule out checkpoint issues. These sudden changes in performance tend to happen when the stats are off or vacuum is not able to maintain all the tables.

1) Post EXPLAIN ANALYZE of the queries that cause a lot of writes.

2) How much did the database grow recently?

3) Is autovacuum running fine? Check the logs, and maybe set log_autovacuum_min_duration = 100 or a different value. Set the log_line_prefix (add '%t').

4) Check pg_stat_all_tables (see the query sketch below) for:
   - large n_dead_tup values (especially compared to n_live_tup)
   - tables with a lot of changes (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) that were not vacuumed / analyzed recently

Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
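(A starting point for the check in item 4 - a sketch; every column is a standard pg_stat_all_tables column, and the LIMIT is arbitrary:)

SELECT relname, n_live_tup, n_dead_tup,
       n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd AS total_changes,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
  FROM pg_stat_all_tables
 ORDER BY n_dead_tup DESC
 LIMIT 20;
-- tables with high n_dead_tup relative to n_live_tup, or with many changes
-- and old (or NULL) vacuum/analyze timestamps, are the ones to look at first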
[GENERAL] CPU move
Hi,

We're planning to move our PostgreSQL database from one CPU box to another. I'm considering an alternative procedure for the move, as the standard one (pg_dump on the old box, copy the dump to the new box, psql to restore on the new one) will take about 10 hours to complete.

The idea is to install the same Linux and PostgreSQL versions on the new box and copy the entire database cluster directory from the old box to the new one using the scp Linux command. I checked the locale files: they were not customized and are the same on both boxes.

Which risks am I exposed to with this approach?

Thank you!

--
Reimer
47-3347-1724
47-9183-0547
msn: carlos.rei...@opendb.com.br
Re: [GENERAL] CPU move
On Saturday, November 26, 2011 10:18:56 AM Carlos Henrique Reimer wrote:

Hi,

We're planning to move our PostgreSQL database from one CPU box to another. I'm considering an alternative procedure for the move, as the standard one (pg_dump on the old box, copy the dump to the new box, psql to restore on the new one) will take about 10 hours to complete.

The idea is to install the same Linux and PostgreSQL versions on the new box and copy the entire database cluster directory from the old box to the new one using the scp Linux command. I checked the locale files: they were not customized and are the same on both boxes.

Which risks am I exposed to with this approach?

This will work, if the versions are the same and the source database is stopped before you start the copy. Make sure you fix up the permissions afterwards.

It would be faster to use rsync - you can run a first pass with the source database running, then stop it and run a last rsync -ar --delete to pick up any changes (see the sketch below). Your only downtime need be during the last pass.

Make sure you've copied over any sysctl.conf settings.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
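(Spelled out, the two-pass procedure would look roughly like this - a sketch with hypothetical paths and host name; substitute the real PGDATA and destination:)

# first pass while the old cluster is still running (the copy will be
# inconsistent, but it seeds the bulk of the data)
rsync -ar /var/lib/pgsql/data/ newbox:/var/lib/pgsql/data/

# stop PostgreSQL on the old box...
pg_ctl -D /var/lib/pgsql/data stop -m fast

# ...then a final pass; --delete removes files that vanished since pass one
rsync -ar --delete /var/lib/pgsql/data/ newbox:/var/lib/pgsql/data/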
Re: [GENERAL] How to add conversion between LATIN1 and WIN1251 ?
On Friday, November 25, 2011 11:28:06 pm Condor wrote:

No, the charset of the databases is the same. I use the same ENV when I upgrade sql servers and recreate the psql database directory. As for client encoding, I have never before had a configured postgresql on the work station I connect to the servers from. Even the postgres user and config file did not exist, and this worked fine with psql versions below 9.1.

That is why I included a link to the Release Notes. There has been a change in behavior in 9.1. I am assuming that you are using psql to connect. If you want the details, here is the commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=02e14562a806a96f38120c96421d39dfa7394192

--
Adrian Klaver
adrian.kla...@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
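(If the 9.1 psql behavior change is indeed the cause, explicitly pinning the client encoding should restore the old behavior - a sketch; LATIN1 here is an assumption, substitute whatever encoding the client actually expects:)

-- inside a psql session
SET client_encoding = 'LATIN1';

-- or set the libpq environment variable before launching psql:
--   export PGCLIENTENCODING=LATIN1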
Re: [GENERAL] General performance/load issue
On 26.11.2011 19:08, Tom Lane wrote:

Tomas Vondra t...@fuzzy.cz writes:

Anyway the output is a bit strange. It's writing a lot of temp files that are significantly smaller (about 3MB) than work_mem (128MB).

The most obvious guess about what's doing that is a hash join that has a drastic overestimate of how many rows it has to hash, so that it's batching the join unnecessarily. AFAIR, sorts don't spill to disk until they've actually reached work_mem, but hash join works differently.

OK, so it's probably not sorting. What about other nodes that use a hash table - aggregates etc.? Would those spill to disk in case of overestimation?

Gaetan did not post the whole query, but it starts like this:

SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id, table.flux_id AS flux_id, table.locale_id AS locale_id, ...

so it obviously uses a DISTINCT clause, probably evaluated using a hash aggregate. And this is allegedly one of the queries that write a lot.

Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
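(A quick way to see how the DISTINCT is actually being evaluated - a sketch; "table" and the column list stand in for the real names from Gaëtan's truncated query:)

EXPLAIN
SELECT COUNT(*)
  FROM (SELECT DISTINCT id, flux_id, locale_id FROM "table") AS sub;
-- a HashAggregate node under the Aggregate means the DISTINCT is hashed;
-- a Sort feeding a Unique means it is sorted instead - which of the two is
-- producing the small temp files is exactly the open question here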
[GENERAL] text search synonym dictionary anomaly with numbers
I am working with street address data in which 'first st' has been entered as '1 st' and so on. So I have created a text search synonym dictionary with the entries:

first 1
1st 1

And initially it seems to be working properly:

SELECT ts_lexize('rwg_synonym','first');
 ts_lexize
-----------
 {1}

SELECT ts_lexize('rwg_synonym','1st');
 ts_lexize
-----------
 {1}

But my queries on '1st' are not returning the expected results:

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1');
 count
-------
   403      <- this is what I want

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('first');
 count
-------
   403      <- this is also good

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1st');
 count
-------
     4      <- this is not good

There are 4 records that do have '1st', but why am I not getting 403 records?

Thanks for reading,
Rich

--
Richard Greenwood
richard.greenw...@gmail.com
www.greenwoodmap.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
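(A useful first diagnostic here - a sketch; 'english' is a stand-in for whatever text search configuration the txtsrch column was built with, which the post doesn't name:)

SELECT alias, token, dictionaries, lexemes
  FROM ts_debug('english', '1st');
-- ts_lexize tests the dictionary in isolation; ts_debug shows which token
-- type the parser assigns to '1st' and which dictionaries the configuration
-- actually consults for that token type - if the synonym dictionary isn't
-- mapped for that type, documents and queries index '1st' as-is, not as '1'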