Re: [GENERAL] General performance/load issue

2011-11-27 Thread Gaëtan Allart
Actually, this is because I changed sort_mem to 4MB, as Robert asked.

I removed this setting.

Gaëtan

On 26/11/11 18:58, « Tomas Vondra » t...@fuzzy.cz wrote:

On 26.11.2011 18:08, Gaëtan Allart wrote:
 Uhm…
 
 I'm seeing dozens and dozens of temporary file creations in logs :
 
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1399, size
23912
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.211, size
 2761788
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.3747, size
27188
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.2639, size
 2749672
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1053, size
24276
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.452, size
 2948712

Please, add timestamp to log_line_prefix so that we know how frequently
that happens.

Anyway the output is a bit strange. It's writing a lot of temp files
that are significantly smaller (about 3MB) than work_mem (128MB).

Tomas



Re: [GENERAL] General performance/load issue

2011-11-27 Thread Tomas Vondra
On 27 November 2011, 10:45, Gaëtan Allart wrote:
 Actually, this is because I changed sort_mem to 4 Mb as asked by Robert.

 I removed this setting..

This is exactly why it's recommended not to make any changes until it's
reasonably certain what the cause is.

Did increasing work_mem again solve the issues? Is the database
running fine?

If not, post the iotop output again so we can see what the impact was, and
post the EXPLAIN ANALYZE of the query so we can see whether the estimates are
OK.

Anyway, Robert was right - work_mem=128MB is rather large and should be
decreased. Use something like a binary search to find the right value -
start with 64MB and watch the log for temp file messages. If there are a lot
of messages, move to the upper interval and try (64+128)/2 = 96MB; otherwise
try 32MB, and so on. That should lead you to the minimal sufficient work_mem
value.
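
If you want to test a candidate value before touching postgresql.conf, a
per-session sketch like this works (re-run one of the problematic queries in
between and watch the log for temp file lines):

SET work_mem = '64MB';
-- run one of the offending queries here and check the log for temp files
RESET work_mem;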

Tomas




Re: [GENERAL] General performance/load issue

2011-11-26 Thread Gaëtan Allart
A better view of iotop :

  TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
16974 be/4 postgres   46.33 K/s0.00 B/s  0.00 %  7.21 % postgres:
database database 46.105.111.92(54930) idle
  383 be/4 postgres7.72 K/s0.00 B/s  0.00 %  1.56 % postgres:
database database 46.105.104.205(51706) idle in transaction
15934 be/4 postgres0.00 B/s   38.61 K/s  0.00 %  0.04 % postgres: wal
writer process
31487 be/4 postgres0.00 B/s   11.32 M/s  0.00 %  0.00 % postgres:
database database 46.105.104.205(48616) SELECT
29541 be/4 postgres0.00 B/s   11.52 M/s  0.00 %  0.00 % postgres:
database database 46.105.104.205(40675) SELECT
31419 be/4 postgres0.00 B/s5.99 M/s  0.00 %  0.00 % postgres:
database database 46.105.104.205(48476) SELECT
31875 be/4 postgres0.00 B/s   15.23 M/s  0.00 %  0.00 % postgres:
database database 46.105.104.205(50228) SELECT
30985 be/4 postgres0.00 B/s   10.55 M/s  0.00 %  0.00 % postgres:
database database 46.105.104.205(47672) SELECT
30902 be/4 postgres0.00 B/s6.04 M/s  0.00 %  0.00 % postgres:
database database 176.31.228.6(43295) SELECT
30903 be/4 postgres0.00 B/s4.79 M/s  0.00 %  0.00 % postgres:
database database 46.105.104.205(47565) SELECT
15933 be/4 postgres0.00 B/s   77.22 K/s  0.00 %  0.00 % postgres:
writer process



As you can see, SELECTs are writing heavily to the disk.

At this moment, pg_stat_activity shows many, many queries running at the
same time.
It looks like all the queries are suddenly writing to disk, not just one in
particular, which makes me think of a buffer issue or something.

This is an example of a query writing 10 MB/s:

SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id,
table.flux_id AS flux_id, table.locale_id AS locale_id,
table.url_article AS url_article, table.original_url AS
original_url, table.name AS name, table.description AS
description, table.content AS content, table.permis AS
permis, table.reviewed AS reviewed, table.author_id AS
author_id, table.poster_id AS poster_id, table.post_date AS
post_date, table.edit_date AS edit_date, table.add_date AS
add_date, table.comments_open AS comments_open, table.site_id
AS site_id, table.is_local AS is_local, table.status AS
status, table.visits AS visits, table.votes AS votes,
table.score AS score, arti



Checkpoint logs still show very long write times:

LOG:  checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
total=63.625 s



Couldn't this be a hardware issue ?

Best regards,

Gaëtan

On 26/11/11 04:42, « Tomas Vondra » t...@fuzzy.cz wrote:

On 26.11.2011 00:17, Cédric Villemain wrote:
 On 25 November 2011 23:47, Gaëtan Allart gae...@nexylan.com wrote:
 Hello Tomas and Cédric,

 Right now, the server is not all right. Load is above 30 and queries
are
 slow like hell.


 Here's the complete iotop. Note the 71 MB/s writes (apparently on
SELECT
 queries).

 Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
  TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN IOCOMMAND


 27352 be/4 postgres   15.64 K/s   86.01 K/s  0.00 % 99.99 % postgres:
 database database 176.31.228.6(38816) SELECT
 20226 be/4 postgres7.82 K/s0.00 B/s  0.00 % 99.99 % postgres:
 database database 176.31.228.6(34166) SELECT
 26950 be/4 postgres   23.46 K/s0.00 B/s  0.00 % 82.14 % postgres:
 database database 46.105.104.205(40820) SELECT
 23160 be/4 postgres3.91 K/s0.00 B/s  0.00 % 81.14 % postgres:
 database database 46.105.104.205(58091) SELECT
 29184 be/4 postgres7.82 K/s0.00 B/s  0.00 % 79.17 % postgres:
 database database 46.105.104.205(51047) SELECT

Gaetan, you keep deleting the interesting rows for some reason. None of
the rows you've posted writes more than a few KB/s - what about the rows
that actually write those 71MB/s?

 Here is dirty_expire_centisecs :

 cat /proc/sys/vm/dirty_expire_centisecs
 3000

OK. That's a default value and it's usually too high (just like the
ratios), but don't change it until we find out what's wrong.

 Bgwriter configuration is default :

 #bgwriter_delay = 200ms   # 10-10000ms between rounds
 #bgwriter_lru_maxpages = 100  # 0-1000 max buffers written/round
 #bgwriter_lru_multiplier = 2.0  # 0-10.0 multiplier on buffers
scanned/round

OK.

 Is there anything I can provide to help you ?
 
 the checkpoints logs and the output of pg_stat_bgwriter (as asked by
Tomas).
 
 It is probable that shared_buffers are too small for your workload
 (expected), do you still have issue with checkpoint sync duration ?

I don't think that's the case. Too small shared buffers usually cause a
lot of reads (especially when all the queries are SELECTs as here), but
he has problems with writes.

And according to the pg_stat_bgwriter Gaetan posted, the checkpoints
wrote about 54MB, bgwriter about 370MB and backends about 80MB (during
the 5 minutes between snapshots).

So I'm really wondering where those 70MB/s of writes come 

Re: [GENERAL] General performance/load issue

2011-11-26 Thread Tomas Vondra
On 26 November 2011, 10:45, Gaëtan Allart wrote:
 A better view of iotop :

   TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
 31875 be/4 postgres0.00 B/s   15.23 M/s  0.00 %  0.00 % postgres:
 database database 46.105.104.205(50228) SELECT
 30985 be/4 postgres0.00 B/s   10.55 M/s  0.00 %  0.00 % postgres:
 database database 46.105.104.205(47672) SELECT

 As you can see, SELECTS are writing very much on the disk.

 At this moment, pg_stat_activity show many many queries running at the
 same time.
 It looks like that all queries are suddenly writing on disk and not a
 particular one, making me think of a buffer issue or something.

No, I don't think this is an issue with the size of shared buffers. That'd
influence the pg_stat_bgwriter - the buffers_backend would grow much
faster, and it's not the case.

So I'm guessing it's either hint bits or sorting. If I had to choose, I'd
probably bet on the hint bits - see for example this thread

http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php

it more or less behaves like what you described so far.
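
If it does turn out to be hint bits (typical after a big load or a big
UPDATE), one rough workaround is to vacuum the recently modified tables so
the bits get set in one pass instead of by every subsequent SELECT - just a
sketch, the table name is made up:

VACUUM ANALYZE recently_loaded_table;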

 This is a example of 10 MB/s writing query :

 SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id,
 table.flux_id AS flux_id, table.locale_id AS locale_id,
 table.url_article AS url_article, table.original_url AS
 original_url, table.name AS name, table.description AS
 description, table.content AS content, table.permis AS
 permis, table.reviewed AS reviewed, table.author_id AS
 author_id, table.poster_id AS poster_id, table.post_date AS
 post_date, table.edit_date AS edit_date, table.add_date AS
 add_date, table.comments_open AS comments_open, table.site_id
 AS site_id, table.is_local AS is_local, table.status AS
 status, table.visits AS visits, table.votes AS votes,
 table.score AS score, arti

Post EXPLAIN ANALYZE output of this query (use explain.depesz.com to post
it).
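
Something like this from psql will capture the plan into a file you can paste
there (the \o redirection is optional, and the query body is the full
statement you posted above):

\o plan.txt
EXPLAIN ANALYZE SELECT COUNT(*) FROM ( ... the full DISTINCT subquery ... ) AS x;
\o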

 Checkpoints logs still show very long write times :

 LOG:  checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
 total=63.625 s

No, that's fine - that's what spread checkpoints do. Once the
checkpoint_timeout expires, the system decides to perform a checkpoint,
i.e. it has to write all dirty (modified) shared buffers to the disk. But
it knows the next checkpoint will happen in checkpoint_timeout, so it has
about 5 minutes to write all the data.

So it says something like 'I need to write 540MB and I want to do that in 270
seconds (5 minutes * completion_target), so I'll write at 2MB/s'. So the
write phase is expected to take a long time. But it has to sync the data at
the end, and that's where the problems usually happen - so the important
number is 'sync', and that improved significantly. 6 seconds is not great,
but it's not a big issue I guess.

 Couldn't this be a hardware issue ?

I don't think so. The problem is that the queries write a lot of data, and
the SSD can't help in this case. SSDs provide much better random
performance (compared to spinning drives), but their sequential performance
is not much better, so pushing 70MB/s may fully utilize the SSD.

Have you tested the basic performance (like bonnie++, dd etc) before going
into production? There are some common SSD-related mistakes (e.g. not
aligning the partitions properly) that may easily cause 30% performance
loss. But this won't fix the problem, although you could get a bit better
performance.
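
For a quick sequential-write sanity check you can run something like this on
the data partition when the load allows it (it writes ~4GB; the path is just
an example, remove the file afterwards):

dd if=/dev/zero of=/path/on/the/ssd/ddtest bs=1M count=4096 conv=fdatasync
rm /path/on/the/ssd/ddtest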

Tomas




Re: [GENERAL] General performance/load issue

2011-11-26 Thread Robert Treat
On Fri, Nov 25, 2011 at 6:48 PM, Gaëtan Allart gae...@nexylan.com wrote:
 Here are the latest checkpoint logs :

 LOG:  checkpoint complete: wrote 842 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=168.970 s, sync=0.005 s,
 total=168.977 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 318 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 2 recycled; write=63.818 s, sync=0.006 s,
 total=63.825 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 744 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=147.035 s, sync=0.006 s,
 total=147.043 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 108 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 3 recycled; write=35.410 s, sync=14.921 s,
 total=54.811 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 393 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=88.835 s, sync=43.210 s,
 total=135.728 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 914 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=148.162 s, sync=14.249 s,
 total=170.481 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 202 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=53.152 s, sync=0.004 s,
 total=53.159 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 897 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=42.414 s, sync=1.175 s,
 total=58.957 s
 LOG:  checkpoint starting: shutdown immediate
 LOG:  checkpoint complete: wrote 666 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=0.027 s, sync=1.600 s,
 total=1.630 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 627 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=125.856 s, sync=0.006 s,
 total=125.864 s
 LOG:  checkpoint starting: time


 And pg_stat_bgwriter :

 postgres=# select *,now() from pg_stat_bgwriter;
  checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean
 | maxwritten_clean | buffers_backend | buffers_alloc |              now

 ---+-++---+
 --+-+---+--
 -
               388 |              13 |             494948 |       4306591
 |            13555 |         7458743 |    7835244602 | 2011-11-26
 00:43:47.232924+01
 (1 row)

 postgres=# select *,now() from pg_stat_bgwriter;
  checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean
 | maxwritten_clean | buffers_backend | buffers_alloc |              now

 ---+-++---+
 --+-+---+--
 -
               389 |              13 |             501802 |       4352198
 |            13809 |         7469220 |    7839778941 | 2011-11-26
 00:49:00.680779+01
 (1 row)

 Processes that were writing were SELECT queries against database.


What is the filesystem involved? What is the underlying disk layout
(you said it's SSDs, but what type, how many, and in what
configuration)? Also, how is Postgres set up on top of the disks (all of
$PGDATA and the OS on one volume? Split up?)

Also, how many active connections do you typically have? Can you
reduce your sort mem to something like 4MB, and set log_temp_files to
0?
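
In postgresql.conf that would be something like this, followed by a reload:

work_mem = 4MB        # the "sort mem" I mentioned above
log_temp_files = 0    # 0 = log every temporary file created, with its size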


Robert Treat
conjecture: xzilla.net
consulting: omniti.com



Re: [GENERAL] General performance/load issue

2011-11-26 Thread Gaëtan Allart
Hello Robert,

I'm having around 30 active connections (average).
As far as disk layout is concerned, everything's on the same disk (RAID 1
with 2 SSDs).

Gaëtan

On 26/11/11 15:25, « Robert Treat » r...@xzilla.net wrote:

On Fri, Nov 25, 2011 at 6:48 PM, Gaëtan Allart gae...@nexylan.com wrote:
 Here are the latest checkpoint logs :

 LOG:  checkpoint complete: wrote 842 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=168.970 s, sync=0.005 s,
 total=168.977 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 318 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 2 recycled; write=63.818 s, sync=0.006 s,
 total=63.825 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 744 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=147.035 s, sync=0.006 s,
 total=147.043 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 108 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 3 recycled; write=35.410 s, sync=14.921 s,
 total=54.811 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 393 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=88.835 s, sync=43.210 s,
 total=135.728 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 914 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=148.162 s, sync=14.249 s,
 total=170.481 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 202 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=53.152 s, sync=0.004 s,
 total=53.159 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 897 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=42.414 s, sync=1.175 s,
 total=58.957 s
 LOG:  checkpoint starting: shutdown immediate
 LOG:  checkpoint complete: wrote 666 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=0.027 s, sync=1.600 s,
 total=1.630 s
 LOG:  checkpoint starting: time
 LOG:  checkpoint complete: wrote 627 buffers (0.1%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=125.856 s, sync=0.006 s,
 total=125.864 s
 LOG:  checkpoint starting: time


 And pg_stat_bgwriter :

 postgres=# select *,now() from pg_stat_bgwriter;
  checkpoints_timed | checkpoints_req | buffers_checkpoint |
buffers_clean
 | maxwritten_clean | buffers_backend | buffers_alloc |  now

 
---+-++--
-+
 
--+-+---+
--
 -
   388 |  13 | 494948 |   4306591
 |13555 | 7458743 |7835244602 | 2011-11-26
 00:43:47.232924+01
 (1 row)

 postgres=# select *,now() from pg_stat_bgwriter;
  checkpoints_timed | checkpoints_req | buffers_checkpoint |
buffers_clean
 | maxwritten_clean | buffers_backend | buffers_alloc |  now

 
---+-++--
-+
 
--+-+---+
--
 -
   389 |  13 | 501802 |   4352198
 |13809 | 7469220 |7839778941 | 2011-11-26
 00:49:00.680779+01
 (1 row)

 Processes that were writing were SELECT queries against database.


What was the filesystem involved? What is the underlying disk layout
(you said it's SSD's, but how what type, how many, and in what
configuration? Also how is Postgres set up on top of the disks (all of
$PGDATA and OS on one volume? Split up?)

Also, how many active connections do you typically have? Can you
reduce your sort mem to something like 4MB, and set log_temp_files to
0?


Robert Treat
conjecture: xzilla.net
consulting: omniti.com




Re: [GENERAL] General performance/load issue

2011-11-26 Thread Gaëtan Allart
Rahh :/

It's getting worse and worse :/ The database has to be restarted every 2
hours, causing a lot of traffic loss :/

As far as the server is concerned, it was running great 7 days ago and had
been running like this for months. I really don't get why it suddenly went
I/O-ing…


Here's the current postgresql.conf :

shared_buffers = 6GB   # min 128kB
 # (change requires restart)
temp_buffers = 40MB   # min 800kB
 # (change requires restart)
work_mem = 96MB   # min 64kB
maintenance_work_mem = 256MB  # min 1MB
max_stack_depth = 2MB   # min 100kB

effective_cache_size = 32GB


max_files_per_process = 8192  # min 25
 # (change requires restart)

fsync = on   # turns forced synchronization on or off
synchronous_commit = off  # immediate fsync at commit
 # supported by the operating system:
wal_buffers = 16MB   # min 32kB


checkpoint_segments = 40  # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min  # range 30s-1h
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0




seq_page_cost = 1.0   # measured on an arbitrary scale
random_page_cost = 2.0   # same scale as above





Isn't there anything I can do to keep my database up and running even with bad
performance?

Filesystem is ext3. Running over a hardware RAID-1 config.


Gaëtan

On 26/11/11 15:12, « Tomas Vondra » t...@fuzzy.cz wrote:

On 26 November 2011, 10:45, Gaëtan Allart wrote:
 A better view of iotop :

   TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
 31875 be/4 postgres0.00 B/s   15.23 M/s  0.00 %  0.00 % postgres:
 database database 46.105.104.205(50228) SELECT
 30985 be/4 postgres0.00 B/s   10.55 M/s  0.00 %  0.00 % postgres:
 database database 46.105.104.205(47672) SELECT

 As you can see, SELECTS are writing very much on the disk.

 At this moment, pg_stat_activity show many many queries running at the
 same time.
 It looks like that all queries are suddenly writing on disk and not a
 particular one, making me think of a buffer issue or something.

No, I don't think this is an issue with the size of shared buffers. That'd
influence the pg_stat_bgwriter - the buffers_backend would grow much
faster, and it's not the case.

So I'm guessing it's either hint bits or sorting. If I had to, I'd
probably guess about the hint bits - see for example this thread

http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php

it more or less behaves like what you described so far.

 This is a example of 10 MB/s writing query :

 SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id,
 table.flux_id AS flux_id, table.locale_id AS locale_id,
 table.url_article AS url_article, table.original_url AS
 original_url, table.name AS name, table.description AS
 description, table.content AS content, table.permis AS
 permis, table.reviewed AS reviewed, table.author_id AS
 author_id, table.poster_id AS poster_id, table.post_date AS
 post_date, table.edit_date AS edit_date, table.add_date AS
 add_date, table.comments_open AS comments_open,
table.site_id
 AS site_id, table.is_local AS is_local, table.status AS
 status, table.visits AS visits, table.votes AS votes,
 table.score AS score, arti

Post EXPLAIN ANALYZE output of this query (use explain.depesz.com to post
it).

 Checkpoints logs still show very long write times :

 LOG:  checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
 total=63.625 s

No, that's fine - that's what spread checkpoints do. Once the
checkpoint_timeout expires, the system decides to perform a checkpoint,
i.e. it has to write all dirty (modified) shared buffers to the disk. But
it knows the next checkpoint will happen in checkpoint_timeout, so it has
about 5 minutes to write all the data.

So it says something like 'I need to write 540MB want to write that in 270
seconds (5 minutes * completion_target), so I'll write at 2MB/s'. So the
write phase is expected to take long. But it has to sync the data at the
end, and that's where the problems usually happen - so the important thing
is 'sync' and that improved significantly. 6 seconds is not great but it's
not a big issue I guess.

 Couldn't this be a hardware issue ?

I don't think so. The problem is that the queries write a lot of data, and
the SSD can't help in this case. The SSDs provide much better random
performance (compared to spinners), but the sequential performance is not
much better. So pushing 70MB/s to the SSD may be fully utilized.

Have you tested the basic performance (like bonnie++, dd etc) before going
into production? There are some common SSD-related mistakes (e.g. not
aligning the partitions properly) that may easily cause 30% performance
loss. But this won't fix the problem, although you could get a bit better
performance.

Tomas




Re: [GENERAL] General performance/load issue

2011-11-26 Thread Gaëtan Allart
Uhm…

I'm seeing dozens and dozens of temporary file creations in logs :

LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1425, size 25340
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.195, size
2720340
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.3495, size 24724
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.2674, size
2712452
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1850, size 25284
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.822, size
2717464
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1050, size 25060
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.501, size
2700248
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.3768, size 23156
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.2132, size
2713204
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1886, size 23744
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.311, size
2829600
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.3849, size 25088
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.2352, size
2770352
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1267, size 25592
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.765, size
2803744
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.3775, size 2
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.2686, size
2858836
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1094, size 24948
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.106, size
2800140
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1399, size 23912
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.211, size
2761788
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.3747, size 27188
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.2639, size
2749672
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1053, size 24276
LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.452, size
2948712




Does this help ?

Gaëtan

On 26/11/11 17:47, « Gaëtan Allart » gae...@nexylan.com wrote:

Rahh :/

It's getting worse and worse :/ Database has to be restarted every 2 hours
causing much traffic loss :/

As far as the server is concerned, it was running great 7 days ago and had
been running like this for months. I really don't get why it suddenly went
I/O-ing…


Here's the current postgresql.conf :

shared_buffers = 6GB   # min 128kB
 # (change requires restart)
temp_buffers = 40MB   # min 800kB
 # (change requires restart)
 work_mem = 96MB   # min 64kB
maintenance_work_mem = 256MB  # min 1MB
max_stack_depth = 2MB   # min 100kB

effective_cache_size = 32GB


max_files_per_process = 8192  # min 25
 # (change requires restart)

 fsync = on   # turns forced synchronization on or off
synchronous_commit = off  # immediate fsync at commit
 # supported by the operating system:
wal_buffers = 16MB   # min 32kB


 checkpoint_segments = 40  # in logfile segments, min 1, 16MB each
 checkpoint_timeout = 5min  # range 30s-1h
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0




seq_page_cost = 1.0   # measured on an arbitrary scale
random_page_cost = 2.0   # same scale as above





 Isn't there anything I can do to keep my database up and running even with bad
performance?

Filesystem is ext3. Running over a hardware RAID-1 config.


Gaëtan

On 26/11/11 15:12, « Tomas Vondra » t...@fuzzy.cz wrote:

On 26 November 2011, 10:45, Gaëtan Allart wrote:
 A better view of iotop :

   TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
 31875 be/4 postgres0.00 B/s   15.23 M/s  0.00 %  0.00 % postgres:
 database database 46.105.104.205(50228) SELECT
 30985 be/4 postgres0.00 B/s   10.55 M/s  0.00 %  0.00 % postgres:
 database database 46.105.104.205(47672) SELECT

 As you can see, SELECTS are writing very much on the disk.

 At this moment, pg_stat_activity show many many queries running at the
 same time.
 It looks like that all queries are suddenly writing on disk and not a
 particular one, making me think of a buffer issue or something.

No, I don't think this is an issue with the size of shared buffers.
That'd
influence the pg_stat_bgwriter - the buffers_backend would grow much
faster, and it's not the case.

So I'm guessing it's either hint bits or sorting. If I had to, I'd
probably guess about the hint bits - see for example this thread

http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php

it more or less behaves like what you described so far.

 This is a example of 10 MB/s writing query :

 SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id,
 table.flux_id AS flux_id, table.locale_id AS locale_id,
 table.url_article AS url_article, table.original_url AS
 original_url, table.name AS name, table.description AS
 description, table.content AS content, table.permis AS
 permis, table.reviewed AS reviewed, table.author_id AS
 author_id, table.poster_id AS poster_id, 

Re: [GENERAL] General performance/load issue

2011-11-26 Thread Tomas Vondra
On 26.11.2011 18:08, Gaëtan Allart wrote:
 Uhm…
 
 I'm seeing dozens and dozens of temporary file creations in logs :
 
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1399, size 23912
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.211, size
 2761788
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.3747, size 27188
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp13772.2639, size
 2749672
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.1053, size 24276
 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp15388.452, size
 2948712

Please add a timestamp to log_line_prefix so that we know how frequently
that happens.
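
For example, something along these lines in postgresql.conf, followed by a
reload, is enough (the exact format is up to you):

log_line_prefix = '%t [%p] '   # %t = timestamp, %p = backend PID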

Anyway the output is a bit strange. It's writing a lot of temp files
that are significantly smaller (about 3MB) than work_mem (128MB).

Tomas



Re: [GENERAL] General performance/load issue

2011-11-26 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 Anyway the output is a bit strange. It's writing a lot of temp files
 that are significantly smaller (about 3MB) than work_mem (128MB).

The most obvious guess about what's doing that is a hash join that has
a drastic overestimate of how many rows it has to hash, so that it's
batching the join unnecessarily.  AFAIR, sorts don't spill to disk until
they've actually reached work_mem, but hash join works differently.
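
In the EXPLAIN ANALYZE output that shows up as a Hash node whose actual row
count is far below the planner's estimate and whose "Batches" count is
greater than 1 - roughly like this, with the numbers invented purely for
illustration:

   ->  Hash  (cost=... rows=2000000 width=...) (actual time=... rows=35000 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 3012kB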

regards, tom lane



Re: [GENERAL] General performance/load issue

2011-11-26 Thread Tomas Vondra
On 26.11.2011 17:47, Gaëtan Allart wrote:
 Rahh :/
 
 It's getting worse and worse :/ Database has to be restarted every 2 hours
 causing much traffic loss :/
 
 As far as the server is concerned, it was running great 7 days ago and had
 been running like this for months. I really don't get why it suddenly went
 I/O-ing…

Given the info provided so far we can rule out checkpoint issues.

These sudden changes in performance usually happen when the stats are
off or vacuum is not able to keep up with all the tables.

1) Post EXPLAIN ANALYZE of the queries that cause a lot of writes.

2) How much did the database grow recently?

3) Is the autovacuum running fine? Check the logs, and maybe set

 log_autovacuum_min_duration = 100

   or a different value. Set the log_line_prefix (add '%t').

4) Check the pg_stat_all_tables:

   - large n_dead_tup values (especially compared to n_live_tup)

   - tables with a lot of changed (n_tup_ins + n_tup_upd + n_tup_del +
 n_tup_hot_upd) that were not vacuumed / analyzed recently
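
A rough query for point 4, sorted so the most suspect tables come first
(column names as in 9.0):

SELECT relname, n_live_tup, n_dead_tup,
       n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd AS changed_rows,
       last_autovacuum, last_autoanalyze
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 20;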

Tomas



Re: [GENERAL] General performance/load issue

2011-11-26 Thread Tomas Vondra
On 26.11.2011 19:08, Tom Lane wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 Anyway the output is a bit strange. It's writing a lot of temp files
 that are significantly smaller (about 3MB) than work_mem (128MB).
 
 The most obvious guess about what's doing that is a hash join that has
 a drastic overestimate of how many rows it has to hash, so that it's
 batching the join unnecessarily.  AFAIR, sorts don't spill to disk until
 they've actually reached work_mem, but hash join works differently.

OK, so it's probably not sorting. What about other nodes
using a hash table - aggregates etc.? Would those spill to the disk in
case of overestimation?

Gaetan did not post the whole query, but it starts like this:

SELECT COUNT(*) FROM (SELECT DISTINCT table.id AS id,
table.flux_id AS flux_id, table.locale_id AS locale_id,
...

so it obviously uses a DISTINCT clause, probably evaluated using a hash
aggregate. And this is allegedly one of the queries that write a lot.

Tomas



Re: [GENERAL] General performance/load issue

2011-11-25 Thread Tomas Vondra
On 24 November 2011, 23:19, Cédric Villemain wrote:
 On 24 November 2011 20:38, Gaëtan Allart gae...@nexylan.com wrote:
 Finally, it crashed again… :-(

 Here's the output of iotop while databased was inaccessible :

 32361 be/4 postgres    0.00 B/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(34847) idle
 32244 be/4 postgres  163.48 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(34660) SELECT
 32045 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(33765) SELECT
 32158 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(34112) SELECT
 32242 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(34632) SELECT
 32372 be/4 postgres    0.00 B/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(38858) idle in transaction
 32231 be/4 postgres   15.57 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(38602) SELECT
 28811 be/4 postgres    3.89 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(40594) SELECT
 32190 be/4 postgres    3.89 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(38497) SELECT

This iotop output is rather incomplete - it does not show which of the
processes actually did the I/O. The fact that a process does 99% of the
I/O and yet performs no reads or writes is a bit strange.

Use iotop -o and post the result.

 Does this help?

 yes.
 It seem you have an issue with your checkpoint syncing time, it is
 fixed in 9.1 and backported in 9.0 here :
 http://projects.2ndquadrant.com/backports

People generally don't want to apply backports on their own, especially
when it's a production server and when it's unclear whether it actually fixes
the issue they have. I'm not sure about that.

 It is possible you have other problems that explains the issue you
 have. An immediate solution before trying a patch is to reduce your
 shared_buffer setting to something very low, like 1GB.

Well, a low shared_buffers setting was used especially before 8.3, when
spread checkpoints were not available. It prevents the I/O overload when
the database suddenly decides to write all of the dirty buffers. But he's
on 9.0 (so he already has spread checkpoints).

Plus the number of buffers he's writing is negligible - usually about 700
buffers (6MB), 3192 buffers (25MB) at most. That surely should not be a
problem for the SSD he's using.

Tomas




Re: [GENERAL] General performance/load issue

2011-11-25 Thread Cédric Villemain
On 25 November 2011 11:25, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 November 2011, 23:19, Cédric Villemain wrote:
 On 24 November 2011 20:38, Gaëtan Allart gae...@nexylan.com wrote:
 Finally, it crashed again… :-(

 Here's the output of iotop while databased was inaccessible :

 32361 be/4 postgres    0.00 B/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(34847) idle
 32244 be/4 postgres  163.48 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(34660) SELECT
 32045 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(33765) SELECT
 32158 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(34112) SELECT
 32242 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(34632) SELECT
 32372 be/4 postgres    0.00 B/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(38858) idle in transaction
 32231 be/4 postgres   15.57 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(38602) SELECT
 28811 be/4 postgres    3.89 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(40594) SELECT
 32190 be/4 postgres    3.89 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 mydb
 mydb host(38497) SELECT

 This iotop output is rather incomplete - it does not show which of the
 processes actually did the I/O. The fact that a process does 99% of the
 I/O and yet performs no reads or writes is a bit strange.

 Use iotop -o and post the result.

 Does this help?

 yes.
 It seem you have an issue with your checkpoint syncing time, it is
 fixed in 9.1 and backported in 9.0 here :
 http://projects.2ndquadrant.com/backports

 People generally don't want to apply backports on their own, especially
 when it's a production server and when it's unclear it actually fixes the
 issue they have. I'm not sure about that.

I agree that most people don't want to do that themselves, but if it
happens to be the solution they can proceed or ask someone to do it.
People want to see their production system back to a normal situation;
here the limited information is not enough to be sure, but the
checkpoint sync times are clear: those sync times are not right.
It is very probable that compacting the fsyncs will help, but it is not
yet certain that it is required.


 It is possible you have other problems that explains the issue you
 have. An immediate solution before trying a patch is to reduce your
 shared_buffer setting to something very low, like 1GB.

 Well, using low shared_buffers was used especially before 8.3, when the
 spread checkpoints were not available. It prevents the I/O overload when
 the database suddenly decides to write all of the dirty buffers. But he's
 on 9.0 (so he already has spread checkpoints).

It is a different animal here.


 Plus the number of buffers he's writing is negligible - usually about 700
 buffers (6MB), 3192 buffers (25MB) at most. That surely should not be a
 problem for the SSD he's using.

See the blog entry from Greg Smith:
 http://blog.2ndquadrant.com/en/2011/06/backporting-and-checkpoint-tro.html

And the slides of his talk at PGCon 2011:
 
http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/WriteStuff-PGCon2011.pdf

I was just pointing out that there are known issues in this area, with
known solutions.

Getting more information on vacuum and bgwriter activity should help too.


Gaëtan, do you still have a critical performance issue, or is it back to a
normal/loaded (but not overloaded) situation?

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation



Re: [GENERAL] General performance/load issue

2011-11-25 Thread Tomas Vondra
On 25 November 2011, 12:43, Cédric Villemain wrote:
 On 25 November 2011 11:25, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 November 2011, 23:19, Cédric Villemain wrote:

 It seem you have an issue with your checkpoint syncing time, it is
 fixed in 9.1 and backported in 9.0 here :
 http://projects.2ndquadrant.com/backports

 People generally don't want to apply backports on their own, especially
 when it's a production server and when it's unclear it actually fixes
 the
 issue they have. I'm not sure about that.

 I agree that most people don't want to do that themselves, but if it
 happens to be the solution they can proceed or ask someone to do it.
 People want to see their production system back to a normal situation,
 here the limited information are not enought to be sure, but the
 checkpoint sync time are clear: sync time are not correct.
 It is very probable that compacting the fsync will help, but it is not
 sure it is required yet.

Yes, the sync times are quite crazy. Especially given the tiny number of
buffers to be written and the fact that the SSD should handle random I/O
quite well.

 It is possible you have other problems that explains the issue you
 have. An immediate solution before trying a patch is to reduce your
 shared_buffer setting to something very low, like 1GB.

 Well, using low shared_buffers was used especially before 8.3, when the
 spread checkpoints were not available. It prevents the I/O overload when
 the database suddenly decides to write all of the dirty buffers. But
 he's
 on 9.0 (so he already has spread checkpoints).

 It is a different animal here.


 Plus the number of buffers he's writing is negligible - usually about
 700
 buffers (6MB), 3192 buffers (25MB) at most. That surely should not be a
 problem for the SSD he's using.

 See the blog entry from Greg Smith:
  http://blog.2ndquadrant.com/en/2011/06/backporting-and-checkpoint-tro.html

 And the slides of his talk at pgconf2011:
  
 http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/WriteStuff-PGCon2011.pdf

 I was just pointing that there are known issues in this area, with
 known solutions.

Thanks for the links, interesting stuff. Still, my impression is that the
SSD is stressed by something else, and the high fsync values during a
checkpoint are merely a symptom. So fixing a checkpoint (using the
backpatch) won't actually fix the issue. But I'm just guessing here.

 Getting more information on vacuum activity, bgwriter activity should help
 too.

Yes, that'd be nice. Gaëtan, can you post bgwriter-related options from
postgresql.conf and two snapshots from pg_stat_bgwriter (say 5 minutes
apart, collected when the db is slow)? A complete 'iotop -o' output would
be nice too.
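
For example, something like this run while the problem is visible (once now
and once roughly 5 minutes later for pg_stat_bgwriter):

psql -U postgres -d postgres -c "SELECT now(), * FROM pg_stat_bgwriter;"
iotop -o -b -n 3 -d 5   # only processes actually doing I/O, 3 samples, 5s apart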

BTW what filesystem are we talking about? What mount options have you used?

What about /proc/sys/vm/dirty_expire_centisecs?

Tomas




Re: [GENERAL] General performance/load issue

2011-11-25 Thread Gaëtan Allart
Hello Tomas and Cédric,

Right now, the server is not all right. Load is above 30 and queries are
slow like hell.


Here's the complete iotop. Note the 71 MB/s writes (apparently on SELECT
queries).

Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
  TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN IOCOMMAND
   
   
27352 be/4 postgres   15.64 K/s   86.01 K/s  0.00 % 99.99 % postgres:
database database 176.31.228.6(38816) SELECT
20226 be/4 postgres7.82 K/s0.00 B/s  0.00 % 99.99 % postgres:
database database 176.31.228.6(34166) SELECT
26950 be/4 postgres   23.46 K/s0.00 B/s  0.00 % 82.14 % postgres:
database database 46.105.104.205(40820) SELECT
23160 be/4 postgres3.91 K/s0.00 B/s  0.00 % 81.14 % postgres:
database database 46.105.104.205(58091) SELECT
29184 be/4 postgres7.82 K/s0.00 B/s  0.00 % 79.17 % postgres:
database database 46.105.104.205(51047) SELECT
27271 be/4 postgres   23.46 K/s  234.58 K/s  0.00 % 77.15 % postgres:
database database 46.105.104.205(42315) SELECT
28224 be/4 postgres0.00 B/s0.00 B/s  0.00 % 74.09 % postgres:
database database 46.105.104.205(49871) SELECT
27450 be/4 postgres   23.46 K/s0.00 B/s  0.00 % 70.12 % postgres:
database database 46.105.104.205(49096) SELECT
29206 be/4 postgres  121.20 K/s0.00 B/s  0.00 % 62.99 % postgres:
database database 176.31.228.6(39787) SELECT
 8202 be/4 postgres  129.02 K/s0.00 B/s  0.00 % 60.59 % postgres:
database database 91.121.89.14(57291) SELECT
29209 be/4 postgres3.91 K/s0.00 B/s  0.00 % 55.75 % postgres:
database database 176.31.228.6(39807) idle
27046 be/4 postgres3.91 K/s0.00 B/s  0.00 % 52.35 % postgres:
database database 46.105.104.205(41285) SELECT
29217 be/4 postgres   54.73 K/s0.00 B/s  0.00 % 50.18 % postgres:
database database 46.105.104.205(51178) SELECT
 3249 be/4 root3.91 K/s  320.59 K/s  0.00 % 40.31 % [kjournald]
28918 be/4 postgres7.82 K/s0.00 B/s  0.00 % 34.83 % postgres:
database database 46.105.104.205(50164) SELECT
26856 be/4 postgres7.82 K/s  234.58 K/s  0.00 % 34.30 % postgres:
database database 46.105.104.205(40589) SELECT
29205 be/4 postgres7.82 K/s0.00 B/s  0.00 % 34.17 % postgres:
database database 176.31.228.6(39784) SELECT
 4175 be/4 root0.00 B/s0.00 B/s  0.00 %  1.84 % [flush-8:0]
28905 be/4 postgres0.00 B/s0.00 B/s  0.00 %  0.02 % postgres:
database database 46.105.104.205(50125) SELECT
28919 be/4 postgres0.00 B/s0.00 B/s  0.00 %  0.01 % postgres:
database database 46.105.104.205(50167) SELECT
19807 be/4 postgres3.91 K/s0.00 B/s  0.00 %  0.00 % postgres:
database database 176.31.228.6(33598) SELECT
28703 be/4 postgres0.00 B/s   86.01 K/s  0.00 %  0.00 % postgres:
database database 176.31.228.6(39260) SELECT
29183 be/4 postgres7.82 K/s0.00 B/s  0.00 %  0.00 % postgres:
database database 46.105.104.205(51044) SELECT


Here is dirty_expire_centisecs :

cat /proc/sys/vm/dirty_expire_centisecs
3000


Bgwriter configuration is default :

#bgwriter_delay = 200ms   # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100  # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0  # 0-10.0 multiplier on buffers scanned/round




Is there anything I can provide to help you ?



Thanks again,

Gaëtan



On 25/11/11 14:12, « Tomas Vondra » t...@fuzzy.cz wrote:

On 25 November 2011, 12:43, Cédric Villemain wrote:
 On 25 November 2011 11:25, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 November 2011, 23:19, Cédric Villemain wrote:

 It seem you have an issue with your checkpoint syncing time, it is
 fixed in 9.1 and backported in 9.0 here :
 http://projects.2ndquadrant.com/backports

 People generally don't want to apply backports on their own, especially
 when it's a production server and when it's unclear it actually fixes
 the
 issue they have. I'm not sure about that.

 I agree that most people don't want to do that themselves, but if it
 happens to be the solution they can proceed or ask someone to do it.
 People want to see their production system back to a normal situation,
 here the limited information are not enought to be sure, but the
 checkpoint sync time are clear: sync time are not correct.
 It is very probable that compacting the fsync will help, but it is not
 sure it is required yet.

Yes, the sync times are quite crazy. Especially given the tiny number of
buffers to be written and the fact that the SSD should handle random I/O
quite well.

 It is possible you have other problems that explains the issue you
 have. An immediate solution before trying a patch is to reduce your
 shared_buffer setting to something very low, like 1GB.

 Well, using low shared_buffers was used especially before 8.3, when the
 spread checkpoints were not available. It prevents the I/O overload
when
 the database suddenly decides to write all of the dirty buffers. But
 he's
 on 9.0 (so he already has spread checkpoints).

 It is a 

Re: [GENERAL] General performance/load issue

2011-11-25 Thread Cédric Villemain
On 25 November 2011 23:47, Gaëtan Allart gae...@nexylan.com wrote:
 Hello Tomas and Cédric,

 Right now, the server is not all right. Load is above 30 and queries are
 slow like hell.


 Here's the complete iotop. Note the 71 MB/s writes (apparently on SELECT
 queries).

 Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO    COMMAND


 27352 be/4 postgres   15.64 K/s   86.01 K/s  0.00 % 99.99 % postgres:
 database database 176.31.228.6(38816) SELECT
 20226 be/4 postgres    7.82 K/s    0.00 B/s  0.00 % 99.99 % postgres:
 database database 176.31.228.6(34166) SELECT
 26950 be/4 postgres   23.46 K/s    0.00 B/s  0.00 % 82.14 % postgres:
 database database 46.105.104.205(40820) SELECT
 23160 be/4 postgres    3.91 K/s    0.00 B/s  0.00 % 81.14 % postgres:
 database database 46.105.104.205(58091) SELECT
 29184 be/4 postgres    7.82 K/s    0.00 B/s  0.00 % 79.17 % postgres:
 database database 46.105.104.205(51047) SELECT
 27271 be/4 postgres   23.46 K/s  234.58 K/s  0.00 % 77.15 % postgres:
 database database 46.105.104.205(42315) SELECT
 28224 be/4 postgres    0.00 B/s    0.00 B/s  0.00 % 74.09 % postgres:
 database database 46.105.104.205(49871) SELECT
 27450 be/4 postgres   23.46 K/s    0.00 B/s  0.00 % 70.12 % postgres:
 database database 46.105.104.205(49096) SELECT
 29206 be/4 postgres  121.20 K/s    0.00 B/s  0.00 % 62.99 % postgres:
 database database 176.31.228.6(39787) SELECT
  8202 be/4 postgres  129.02 K/s    0.00 B/s  0.00 % 60.59 % postgres:
 database database 91.121.89.14(57291) SELECT
 29209 be/4 postgres    3.91 K/s    0.00 B/s  0.00 % 55.75 % postgres:
 database database 176.31.228.6(39807) idle
 27046 be/4 postgres    3.91 K/s    0.00 B/s  0.00 % 52.35 % postgres:
 database database 46.105.104.205(41285) SELECT
 29217 be/4 postgres   54.73 K/s    0.00 B/s  0.00 % 50.18 % postgres:
 database database 46.105.104.205(51178) SELECT
  3249 be/4 root        3.91 K/s  320.59 K/s  0.00 % 40.31 % [kjournald]
 28918 be/4 postgres    7.82 K/s    0.00 B/s  0.00 % 34.83 % postgres:
 database database 46.105.104.205(50164) SELECT
 26856 be/4 postgres    7.82 K/s  234.58 K/s  0.00 % 34.30 % postgres:
 database database 46.105.104.205(40589) SELECT
 29205 be/4 postgres    7.82 K/s    0.00 B/s  0.00 % 34.17 % postgres:
 database database 176.31.228.6(39784) SELECT
  4175 be/4 root        0.00 B/s    0.00 B/s  0.00 %  1.84 % [flush-8:0]
 28905 be/4 postgres    0.00 B/s    0.00 B/s  0.00 %  0.02 % postgres:
 database database 46.105.104.205(50125) SELECT
 28919 be/4 postgres    0.00 B/s    0.00 B/s  0.00 %  0.01 % postgres:
 database database 46.105.104.205(50167) SELECT
 19807 be/4 postgres    3.91 K/s    0.00 B/s  0.00 %  0.00 % postgres:
 database database 176.31.228.6(33598) SELECT
 28703 be/4 postgres    0.00 B/s   86.01 K/s  0.00 %  0.00 % postgres:
 database database 176.31.228.6(39260) SELECT
 29183 be/4 postgres    7.82 K/s    0.00 B/s  0.00 %  0.00 % postgres:
 database database 46.105.104.205(51044) SELECT


 Here is dirty_expire_centisecs :

 cat /proc/sys/vm/dirty_expire_centisecs
 3000


 Bgwriter configuration is default :

 #bgwriter_delay = 200ms   # 10-10000ms between rounds
 #bgwriter_lru_maxpages = 100  # 0-1000 max buffers written/round
 #bgwriter_lru_multiplier = 2.0  # 0-10.0 multiplier on buffers scanned/round




 Is there anything I can provide to help you ?

the checkpoint logs and the output of pg_stat_bgwriter (as asked by Tomas).

It is probable that shared_buffers is too small for your workload
(expected). Do you still have issues with the checkpoint sync duration?

You should increase shared_buffers (you can iterate) *and* try to not
have the long checkpoint sync. (keep shared_buffers = 8GB, as already
mentioned upthread)

The output you provided does not show everything, and the SELECTs do not
look like they are writing that much. Which processes are writing?




 Thanks again,

 Gaëtan



 On 25/11/11 14:12, « Tomas Vondra » t...@fuzzy.cz wrote:

On 25 November 2011, 12:43, Cédric Villemain wrote:
 On 25 November 2011 11:25, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 November 2011, 23:19, Cédric Villemain wrote:

 It seem you have an issue with your checkpoint syncing time, it is
 fixed in 9.1 and backported in 9.0 here :
 http://projects.2ndquadrant.com/backports

 People generally don't want to apply backports on their own, especially
 when it's a production server and when it's unclear it actually fixes
 the
 issue they have. I'm not sure about that.

 I agree that most people don't want to do that themselves, but if it
 happens to be the solution they can proceed or ask someone to do it.
 People want to see their production system back to a normal situation,
 here the limited information are not enought to be sure, but the
 checkpoint sync time are clear: sync time are not correct.
 It is very probable that compacting the fsync will help, but it is not
 sure it is required yet.

Yes, the sync times are quite crazy. 

Re: [GENERAL] General performance/load issue

2011-11-25 Thread Gaëtan Allart
Here are the latest checkpoint logs :

LOG:  checkpoint complete: wrote 842 buffers (0.1%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=168.970 s, sync=0.005 s,
total=168.977 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 318 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 2 recycled; write=63.818 s, sync=0.006 s,
total=63.825 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 744 buffers (0.1%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=147.035 s, sync=0.006 s,
total=147.043 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 108 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 3 recycled; write=35.410 s, sync=14.921 s,
total=54.811 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 393 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=88.835 s, sync=43.210 s,
total=135.728 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 914 buffers (0.1%); 0 transaction log
file(s) added, 0 removed, 1 recycled; write=148.162 s, sync=14.249 s,
total=170.481 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 202 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 1 recycled; write=53.152 s, sync=0.004 s,
total=53.159 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 897 buffers (0.1%); 0 transaction log
file(s) added, 0 removed, 1 recycled; write=42.414 s, sync=1.175 s,
total=58.957 s
LOG:  checkpoint starting: shutdown immediate
LOG:  checkpoint complete: wrote 666 buffers (0.1%); 0 transaction log
file(s) added, 0 removed, 1 recycled; write=0.027 s, sync=1.600 s,
total=1.630 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 627 buffers (0.1%); 0 transaction log
file(s) added, 0 removed, 1 recycled; write=125.856 s, sync=0.006 s,
total=125.864 s
LOG:  checkpoint starting: time


And pg_stat_bgwriter :

postgres=# select *,now() from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean
| maxwritten_clean | buffers_backend | buffers_alloc |  now
  
---+-++---+
--+-+---+--
-
   388 |  13 | 494948 |   4306591
|13555 | 7458743 |7835244602 | 2011-11-26
00:43:47.232924+01
(1 row)

postgres=# select *,now() from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean
| maxwritten_clean | buffers_backend | buffers_alloc |  now
  
---+-++---+
--+-+---+--
-
   389 |  13 | 501802 |   4352198
|13809 | 7469220 |7839778941 | 2011-11-26
00:49:00.680779+01
(1 row)






The processes that were writing were SELECT queries against the database.


Gaëtan


On 26/11/11 00:17, « Cédric Villemain »
cedric.villemain.deb...@gmail.com wrote:

On 25 November 2011 23:47, Gaëtan Allart gae...@nexylan.com wrote:
 Hello Tomas and Cédric,

 Right now, the server is not all right. Load is above 30 and queries are
 slow like hell.


 Here's the complete iotop. Note the 71 MB/s writes (apparently on SELECT
 queries).

 Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
  TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN IOCOMMAND


 27352 be/4 postgres   15.64 K/s   86.01 K/s  0.00 % 99.99 % postgres:
 database database 176.31.228.6(38816) SELECT
 20226 be/4 postgres7.82 K/s0.00 B/s  0.00 % 99.99 % postgres:
 database database 176.31.228.6(34166) SELECT
 26950 be/4 postgres   23.46 K/s0.00 B/s  0.00 % 82.14 % postgres:
 database database 46.105.104.205(40820) SELECT
 23160 be/4 postgres3.91 K/s0.00 B/s  0.00 % 81.14 % postgres:
 database database 46.105.104.205(58091) SELECT
 29184 be/4 postgres7.82 K/s0.00 B/s  0.00 % 79.17 % postgres:
 database database 46.105.104.205(51047) SELECT
 27271 be/4 postgres   23.46 K/s  234.58 K/s  0.00 % 77.15 % postgres:
 database database 46.105.104.205(42315) SELECT
 28224 be/4 postgres0.00 B/s0.00 B/s  0.00 % 74.09 % postgres:
 database database 46.105.104.205(49871) SELECT
 27450 be/4 postgres   23.46 K/s0.00 B/s  0.00 % 70.12 % postgres:
 database database 46.105.104.205(49096) SELECT
 29206 be/4 postgres  121.20 K/s0.00 B/s  0.00 % 62.99 % postgres:
 database database 176.31.228.6(39787) SELECT
  8202 be/4 postgres  129.02 K/s0.00 B/s  0.00 % 60.59 % postgres:
 database database 91.121.89.14(57291) SELECT
 29209 be/4 postgres3.91 K/s0.00 B/s  0.00 % 55.75 % postgres:
 database database 176.31.228.6(39807) idle
 27046 be/4 postgres3.91 K/s0.00 B/s  0.00 % 52.35 % postgres:
 database database 

Re: [GENERAL] General performance/load issue

2011-11-25 Thread Tomas Vondra
Dne 26.11.2011 00:17, Cédric Villemain napsal(a):
 Le 25 novembre 2011 23:47, Gaëtan Allart gae...@nexylan.com a écrit :
 Hello Tomas and Cédric,

 Right now, the server is not all right. Load is above 30 and queries are
 slow like hell.


 Here's the complete iotop. Note the 71 MB/s writes (apparently on SELECT
 queries).

 Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
  TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN IOCOMMAND


 27352 be/4 postgres   15.64 K/s   86.01 K/s  0.00 % 99.99 % postgres:
 database database 176.31.228.6(38816) SELECT
 20226 be/4 postgres7.82 K/s0.00 B/s  0.00 % 99.99 % postgres:
 database database 176.31.228.6(34166) SELECT
 26950 be/4 postgres   23.46 K/s0.00 B/s  0.00 % 82.14 % postgres:
 database database 46.105.104.205(40820) SELECT
 23160 be/4 postgres3.91 K/s0.00 B/s  0.00 % 81.14 % postgres:
 database database 46.105.104.205(58091) SELECT
 29184 be/4 postgres7.82 K/s0.00 B/s  0.00 % 79.17 % postgres:
 database database 46.105.104.205(51047) SELECT

Gaetan, you keep deleting the interesting rows for some reason. None of
the rows you've posted writes more than a few KB/s - what about the rows
that actually write those 71MB/s?

 Here is dirty_expire_centisecs :

 cat /proc/sys/vm/dirty_expire_centisecs
 3000

OK. That's a default value and it's usually too high (just like the
ratios), but don't change it until we find out what's wrong.

 Bgwriter configuration is default :

 #bgwriter_delay = 200ms   # 10-10000ms between rounds
 #bgwriter_lru_maxpages = 100  # 0-1000 max buffers written/round
 #bgwriter_lru_multiplier = 2.0  # 0-10.0 multiplier on buffers scanned/round

OK.

 Is there anything I can provide to help you ?
 
 the checkpoint logs and the output of pg_stat_bgwriter (as asked by Tomas).
 
 It is probable that shared_buffers is too small for your workload
 (expected); do you still have issues with checkpoint sync duration?

I don't think that's the case. Too small shared buffers usually cause a
lot of reads (especially when all the queries are SELECTs as here), but
he has problems with writes.

And according to the pg_stat_bgwriter Gaetan posted, the checkpoints
wrote about 54MB, bgwriter about 370MB and backends about 80MB (during
the 5 minutes between snapshots).
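
For reference, those figures are just the deltas between the two snapshots
above multiplied by the 8kB block size - a minimal sketch of the arithmetic,
assuming the default BLCKSZ:

-- deltas between the two pg_stat_bgwriter snapshots, converted to bytes
SELECT pg_size_pretty((501802 - 494948)::bigint * 8192)   AS checkpoint_writes,
       pg_size_pretty((4352198 - 4306591)::bigint * 8192) AS bgwriter_writes,
       pg_size_pretty((7469220 - 7458743)::bigint * 8192) AS backend_writes;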

So I'm really wondering where those 70MB/s of writes come from.

Two things just crossed my mind. The first one is hint bits - these may
cause a SELECT to write a lot of data. But I guess this is included in
the pg_stat_bgwriter stats.

The second one is on-disk sorting - this happens when a query needs to
sort so much data that it can't be done in work_mem, so the data are pushed
to disk, and AFAIK that's not included in the pg_stat_bgwriter stats.
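
One way to make such on-disk sorts visible is the log_temp_files setting - a
sketch only; it needs superuser, and putting the same value in postgresql.conf
(plus a reload) captures all backends instead of just the current session:

-- log every temporary file (sort/hash spilling to disk) together with its size
SET log_temp_files = 0;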

But he has work_mem set to 128MB, so that's rather unlikely. Gaetan, can
you verify that the queries writing the most data to the disk are not
performing any huge sorts etc.? (Use iotop to see which postgres process
is writing a lot of data, then look up its PID in pg_stat_activity to find
out which query it's executing.)
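
Something along these lines should do it - on 9.0 the columns are called
procpid and current_query, and the PID below is just one taken from the
iotop output above:

-- map a PID reported by iotop to the query that backend is running
SELECT procpid, usename, waiting, query_start, current_query
FROM pg_stat_activity
WHERE procpid = 27352;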

Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] General performance/load issue

2011-11-24 Thread Gaëtan Allart
Hello everyone,

I'm having some trouble with a PostgreSQL server.
We're using PG as the database backend for a very big website (lots of data
and much traffic).

The issue : server suddenly (1H after restart) becomes slow (queries not
responding), load rises (20 instead of 1), iowait rises (20 to 70%)
 
Version : 9.0.5
Server : Dual Xeon X5650 (24  cores total)
Memory : 48 GB
Disks : SSD


Top when overloaded :

21537 postgres  20   0 6420m 899m 892m D   22  1.9   0:01.86 postgres
21534 postgres  20   0 6554m 1.3g 1.1g D   14  2.7   0:01.72 postgres
21518 postgres  20   0 6419m 2.1g 2.1g D   10  4.6   0:02.10 postgres
21038 postgres  20   0 6716m 3.8g 3.5g D   10  8.0   0:45.46 postgres
21103 postgres  20   0 6571m 3.7g 3.6g D    9  7.8   0:32.19 postgres
21079 postgres  20   0 6575m 3.8g 3.7g D    8  8.1   0:36.39 postgres
21359 postgres  20   0 6563m 1.9g 1.8g D    4  4.1   0:09.10 postgres
21422 postgres  20   0 6563m 1.9g 1.7g D    4  3.9   0:08.34 postgres
19656 postgres  20   0 6727m 5.3g 5.0g D    3 11.1   1:58.25 postgres
21418 postgres  20   0 6685m 2.1g 1.9g D    2  4.5   0:12.42 postgres
21413 postgres  20   0 6693m 2.1g 1.8g D    2  4.4   0:11.06 postgres
21541 postgres  20   0 6421m 719m 711m D    1  1.5   0:00.48 postgres
14044 postgres  20   0 6418m 755m 750m D    1  1.6   0:04.71 postgres
21326 postgres  20   0 6685m 2.2g 2.0g D    1  4.7   0:15.82 postgres
21031 postgres  20   0 6688m 3.6g 3.4g D    1  7.7   0:44.18 postgres
21055 postgres  20   0 6575m 4.1g 3.9g D    1  8.6   0:39.11 postgres
21357 postgres  20   0 6693m 2.3g 2.0g D    1  4.9   0:13.74 postgres


Memory when overloaded :

 ~ # free -m
              total       used       free     shared    buffers     cached
 Mem:         48339      47087       1251          0        248      38720
 -/+ buffers/cache:       8118      40220
 Swap:         8190          1       8189



Postgresql.conf :

max_connections = 50
shared_buffers = 12G
temp_buffers = 40MB
work_mem = 128MB
maintenance_work_mem = 256MB
effective_cache_size = 12GB
max_files_per_process = 8192
fsync = off
checkpoint_segments = 256
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
seq_page_cost = 2.0
random_page_cost = 2.0


Did I do anything wrong? Any idea?

Regards,

Gaëtan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Tomas Vondra
On 24 Listopad 2011, 14:51, Gaëtan Allart wrote:
 Hello everyone,

 I'm having some troubles with a Postgresql server.
 We're using PG has a database backend for a very big website (lots of data
 and much traffic).

 The issue : server suddenly (1H after restart) becomes slow (queries not
 responding), load rises (20 instead of 1), iowait rises (20 to 70%)

 Version : 9.0.5
 Server : Dual Xeon X5650 (24  cores total)
 Memory : 48 GB
 Disks : SSD


 Top when overloaded :

Top is not the most useful tool here, I guess. Use iotop (will show you
which processes are doing the I/O) and tools like vmstat / iostat.

 Postgresql.conf :

 max_connections = 50
 shared_buffers = 12G
 temp_buffers = 40MB
 work_mem = 128MB
 maintenance_work_mem = 256MB
 max_files_per_process = 8192
 checkpoint_segments = 256
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.9

Fine. Let's see the options that look suspicious.

 effective_cache_size = 12GB

Why have you set it like this? According to the free output you've
posted the cache has about 38G, so why just 12G here? There are possible
reasons, but I don't think this is the case.

 fsync = off

A really bad idea. I guess your data are worthless to you, right?

 seq_page_cost = 2.0
 random_page_cost = 2.0

Eh? First of all, what really matters is the ratio of those two values,
and it's a good habit to leave seq_page_cost = 1.0 and adjust just the
other one.

Plus the random I/O is not as cheap as sequential I/O even on SSD drives,
so I'd recommend something like this:

seq_page_cost = 1.0
random_page_cost = 2.0 (or maybe 1.5)

Anyway this needs to be tested properly - watch the performance and tune
if needed.

 Did I do anything wrong? Any idea?

Not sure. My guess is you're getting bitten by a checkpoint. We need to
know a few more details.

1) What is dirty_background_ratio / dirty_ratio (see /proc/sys/vm/ directory)

2) enable log_checkpoints in postgresql.conf and see how it correlates with
the bad performance (see the query after this list for checking the current value)

3) check which processes are responsible for the I/O (use iotop)
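
For item 2, a quick way to check the current value from psql (a sketch;
enabling log_checkpoints itself still means editing postgresql.conf and
reloading, since it is a sighup-level parameter):

-- context = 'sighup' means a reload (pg_ctl reload / SIGHUP) is enough, no restart
SELECT name, setting, context
FROM pg_settings
WHERE name = 'log_checkpoints';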

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Gaëtan Allart
Hi Thomas,

I will be using iotop ;)
Right now, most I/O comes from the postgres: wal writer process.

- effective_cache_size

Okay, I'll raise it to 32GB.

- fsync : changed to on ;)

- seq_pages : I'll run tests. Thanks.

- dirty :

cat /proc/sys/vm/dirty_ratio
20

cat /proc/sys/vm/dirty_background_ratio
10



Thanks a lot Tomas. You're really helpful!

Gaëtan



Le 24/11/11 15:09, « Tomas Vondra » t...@fuzzy.cz a écrit :

On 24 Listopad 2011, 14:51, Gaëtan Allart wrote:
 Hello everyone,

 I'm having some troubles with a Postgresql server.
 We're using PG has a database backend for a very big website (lots of
data
 and much traffic).

 The issue : server suddenly (1H after restart) becomes slow (queries not
 responding), load rises (20 instead of 1), iowait rises (20 to 70%)

 Version : 9.0.5
 Server : Dual Xeon X5650 (24  cores total)
 Memory : 48 GB
 Disks : SSD


 Top when overloaded :

Top is not the most useful tool here, I guess. Use iotop (will show you
which processes are doing the I/O) and tools like vmstat / iostat.

 Postgresql.conf :

 max_connections = 50
 shared_buffers = 12G
 temp_buffers = 40MB
 work_mem = 128MB
 maintenance_work_mem = 256MB
 max_files_per_process = 8192
 checkpoint_segments = 256
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.9

Fine. Let's see the options that look suspicious.

 effective_cache_size = 12GB

Why have you set it like this? According to the free output you've
posted the cache has about 38G, so why just 12G here? There are possible
reasons, but I don't think this is the case.

 fsync = off

A really bad idea. I guess your data are worthless to you, right?

 seq_page_cost = 2.0
 random_page_cost = 2.0

Eh? First of all, what really matters is the relative value of those two
values, and it's good habit to leave seq_page_cost = 1.0 and change just
the other values.

Plus the random I/O is not as cheap as sequential I/O even on SSD drives,
so I't recommend something like this:

seq_page_cost = 1.0
random_page_cost = 2.0 (or maybe 1.5)

Anyway this needs to be tested properly - watch the performance and tune
if needed.

 Did I do anything wrong? Any idea?

Not sure. My guess is you're getting bitten by a checkpoint. We need to
know a few more details.

1) What is dirty_background_ratio / dirty_ratio (see /proc/sys/vm/
directory)

2) enable log_checkpoints in postgresql.conf and see how it correlates to
the bad performance

3) check which processes are responsible for the I/O (use iotop)

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Tomas Vondra
On 24 Listopad 2011, 15:27, Gaëtan Allart wrote:
 Hi Thomas,

 I will be using iotop ;)
 Right now, most i/o come from postgres: wal writer process.

What do you mean by most I/O - how much data is it writing? Is there a
vacuum running at the same time? What other processes are doing I/O?

Post a few lines of iostat -x 5 so we know what kind of I/O we're
dealing with.

 - dirty :

 cat /proc/sys/vm/dirty_ratio 20
 cat /proc/sys/vm/dirty_background_ratio 10

This means the cache may contain up to 3.2GB of data before the system
starts to write them out on background, and about 6.4GB before the
processes can't use the write cache.

What about /proc/sys/vm/dirty_expire_centisecs?

Is there something interesting in the postgresql.log? Autovacuum maybe?

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Robert Treat
On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 Listopad 2011, 14:51, Gaëtan Allart wrote:
 Hello everyone,

 I'm having some troubles with a Postgresql server.
 We're using PG has a database backend for a very big website (lots of data
 and much traffic).

 The issue : server suddenly (1H after restart) becomes slow (queries not
 responding), load rises (20 instead of 1), iowait rises (20 to 70%)

 Version : 9.0.5
 Server : Dual Xeon X5650 (24  cores total)
 Memory : 48 GB
 Disks : SSD


 Top when overloaded :

 Top is not the most useful tool here, I guess. Use iotop (will show you
 which processes are doing the I/O) and tools like vmstat / iostat.

 Postgresql.conf :

 max_connections = 50
 shared_buffers = 12G
 temp_buffers = 40MB
 work_mem = 128MB
 maintenance_work_mem = 256MB
 max_files_per_process = 8192
 checkpoint_segments = 256
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.9

 Fine. Let's see the options that look suspicious.


I think you missed some suspicious settings... I'd recommend setting
shared_buffers to 8GB, and I'd likely reduce checkpoint_segments to
30 and set the checkpoint timeout back to 5 minutes. Everything about
the way this server is configured (including those vm settings) is
pushing it towards delaying the WAL/buffer/checkpoint work as long as
possible, which matches the pattern of good performance initially,
followed by a period of poor performance and heavy I/O.

On a side note, I'd guess your work_mem is probably too high. 50
(connections) x 128 (MB work_mem) x 2 (sorts per query) = ~12GB RAM,
which is 25% of the total RAM on the box. That doesn't necessarily mean
game over, but it seems like it wouldn't be that hard to end up thrashing
with it set up that way. YMMV.
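
That worst case can also be read off the running server - a rough sketch
only; the factor of two sorts per query is just the assumption above, and
pg_settings reports work_mem in kB:

-- worst case if every allowed connection ran two work_mem-sized sorts at once
SELECT pg_size_pretty(
         (SELECT setting::bigint FROM pg_settings WHERE name = 'max_connections')
       * (SELECT setting::bigint FROM pg_settings WHERE name = 'work_mem') * 1024
       * 2) AS worst_case_sort_memory;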

Robert Treat
conjecture: xzilla.net
consulting: omniti.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Tomas Vondra
On 24 Listopad 2011, 16:39, Robert Treat wrote:
 On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 Listopad 2011, 14:51, Gaëtan Allart wrote:
 Postgresql.conf :

 max_connections = 50
 shared_buffers = 12G
 temp_buffers = 40MB
 work_mem = 128MB
 maintenance_work_mem = 256MB
 max_files_per_process = 8192
 checkpoint_segments = 256
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.9

 Fine. Let's see the options that look suspicious.


 I think you missed some suspicious settings... I'd recommend setting
 shared buffers to 8gb, and I'd likely reduce checkpoint segements to
 30 and set the checkpoint timeout back to 5 minutes. Everything about
 the way this server is configured (including those vm settings) is
 pushing it towards delaying the WAL/Buffer/Checkpoint as long as
 possible, which matches with the idea of good performance initial
 followed by a period of poor performance and heavy i/o.

Yes, checkpoints were my first thought too. OTOH the OP reported that most
of the I/O is caused by WAL writer - that's not exactly the part that does
the work during checkpoint. Plus the WAL may not be postponed, as it's
usually O_DIRECT and fsynced, right.

You're right that the writes are postponed, but I generally see that as a
good thing when combined with spread checkpoints. And even with those vm
settings (about 3.2GB for background writes), I wouldn't expect this
behaviour (because the page cache usually expires after 30 seconds). Say
100% of the shared buffers are dirty and need to be written. You have
27 minutes (30*0.9) to do that - that means about 8MB/s. With a 30-second
expire there might be about 240MB accumulated before pdflush starts writing
the data to the SSD. And that can surely handle more than 50MB/s. So
why the long delay? The question is what else is going on there.
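
For the record, the 8MB/s is simple arithmetic - 12GB of dirty shared
buffers spread over checkpoint_timeout * checkpoint_completion_target:

-- 12288 MB spread over 30 min * 0.9 gives the required write rate in MB/s
SELECT round(12 * 1024 / (30 * 60 * 0.9), 1) AS mb_per_second;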

But all this is just guessing - I want to see the log_checkpoint message,
iostat results etc.

 On a side note, I'd guess your work_mem is probably too high. 50
 (connections) x 128 (mb work mem) x 2 (sorts per query) = 12GB RAM,
 which is 25% of total ram on the box. That doesn't necessarily mean
 game over, but it seem like it wouldn't be that hard to get thrashing
 being set up that way. YMMV.

This is one of the reasons why effective_cache_size should be lower than
32GB, probably ...

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Gaëtan Allart
Hi Robert,

Thanks for your help as well.
You're right about checkpoints: it runs pretty well at the start and then
runs into heavy I/O.

I've changed these settings and also reduced work_mem a little, and
reduced effective_cache_size btw.

LOG:  parameter work_mem changed to 96MB
LOG:  parameter effective_cache_size changed to 24GB
LOG:  parameter checkpoint_segments changed to 40
LOG:  parameter checkpoint_timeout changed to 5min

Apparently, it's been running fine since I made the first changes
recommended by Tomas. Let's wait for a couple of hours again to confirm
this.



Gaëtan





Le 24/11/11 16:39, « Robert Treat » r...@xzilla.net a écrit :

On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 Listopad 2011, 14:51, Gaëtan Allart wrote:
 Hello everyone,

 I'm having some troubles with a Postgresql server.
 We're using PG has a database backend for a very big website (lots of
data
 and much traffic).

 The issue : server suddenly (1H after restart) becomes slow (queries
not
 responding), load rises (20 instead of 1), iowait rises (20 to 70%)

 Version : 9.0.5
 Server : Dual Xeon X5650 (24  cores total)
 Memory : 48 GB
 Disks : SSD


 Top when overloaded :

 Top is not the most useful tool here, I guess. Use iotop (will show
you
 which processes are doing the I/O) and tools like vmstat / iostat.

 Postgresql.conf :

 max_connections = 50
 shared_buffers = 12G
 temp_buffers = 40MB
 work_mem = 128MB
 maintenance_work_mem = 256MB
 max_files_per_process = 8192
 checkpoint_segments = 256
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.9

 Fine. Let's see the options that look suspicious.


I think you missed some suspicious settings... I'd recommend setting
shared buffers to 8gb, and I'd likely reduce checkpoint segements to
30 and set the checkpoint timeout back to 5 minutes. Everything about
the way this server is configured (including those vm settings) is
pushing it towards delaying the WAL/Buffer/Checkpoint as long as
possible, which matches with the idea of good performance initial
followed by a period of poor performance and heavy i/o.

On a side note, I'd guess your work_mem is probably too high. 50
(connections) x 128 (mb work mem) x 2 (sorts per query) = 12GB RAM,
which is 25% of total ram on the box. That doesn't necessarily mean
game over, but it seem like it wouldn't be that hard to get thrashing
being set up that way. YMMV.

Robert Treat
conjecture: xzilla.net
consulting: omniti.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Gaëtan Allart
Tomas,

I've enabled logging of checkpoints.
I'm waiting for the next i/o crisis…

Gaëtan





Le 24/11/11 17:02, « Tomas Vondra » t...@fuzzy.cz a écrit :

On 24 Listopad 2011, 16:39, Robert Treat wrote:
 On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 Listopad 2011, 14:51, Gaëtan Allart wrote:
 Postgresql.conf :

 max_connections = 50
 shared_buffers = 12G
 temp_buffers = 40MB
 work_mem = 128MB
 maintenance_work_mem = 256MB
 max_files_per_process = 8192
 checkpoint_segments = 256
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.9

 Fine. Let's see the options that look suspicious.


 I think you missed some suspicious settings... I'd recommend setting
 shared buffers to 8gb, and I'd likely reduce checkpoint segements to
 30 and set the checkpoint timeout back to 5 minutes. Everything about
 the way this server is configured (including those vm settings) is
 pushing it towards delaying the WAL/Buffer/Checkpoint as long as
 possible, which matches with the idea of good performance initial
 followed by a period of poor performance and heavy i/o.

Yes, checkpoints were my first thought too. OTOH the OP reported that most
of the I/O is caused by WAL writer - that's not exactly the part that does
the work during checkpoint. Plus the WAL may not be postponed, as it's
usually O_DIRECT and fsynced, right.

You're right that the writes are postponed, but I generally see that as a
good thing when combined with spread checkpoints. And even with those vm
settings (about 3.2GB for background writes), I wouldn't expect this
behaviour (because the page cache usually expires after 30 seconds). Say
you need 100% of the shared buffers is dirty and need to be written. You
have 27 minutes (30*0.9) to do that - that means about 8MB/s. With 30
seconds expire there might be about 240MB before the pdflush starts to
write the data to the SSD. And that can surely handle more than 50MB/s. So
why the long delay? The question is what else is going on there.

But all this is just guessing - I want to see the log_checkpoint message,
iostat results etc.

 On a side note, I'd guess your work_mem is probably too high. 50
 (connections) x 128 (mb work mem) x 2 (sorts per query) = 12GB RAM,
 which is 25% of total ram on the box. That doesn't necessarily mean
 game over, but it seem like it wouldn't be that hard to get thrashing
 being set up that way. YMMV.

This is one of the reasons why effective_cache_size should be lower than
32GB, probably ...

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Cédric Villemain
Le 24 novembre 2011 17:02, Tomas Vondra t...@fuzzy.cz a écrit :
 On 24 Listopad 2011, 16:39, Robert Treat wrote:
 On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 Listopad 2011, 14:51, Gaëtan Allart wrote:
 Postgresql.conf :

 max_connections = 50
 shared_buffers = 12G
 temp_buffers = 40MB
 work_mem = 128MB
 maintenance_work_mem = 256MB
 max_files_per_process = 8192
 checkpoint_segments = 256
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.9

 Fine. Let's see the options that look suspicious.


 I think you missed some suspicious settings... I'd recommend setting
 shared buffers to 8gb, and I'd likely reduce checkpoint segements to
 30 and set the checkpoint timeout back to 5 minutes. Everything about
 the way this server is configured (including those vm settings) is
 pushing it towards delaying the WAL/Buffer/Checkpoint as long as
 possible, which matches with the idea of good performance initial
 followed by a period of poor performance and heavy i/o.

 Yes, checkpoints were my first thought too. OTOH the OP reported that most
 of the I/O is caused by WAL writer - that's not exactly the part that does
 the work during checkpoint. Plus the WAL may not be postponed, as it's
 usually O_DIRECT and fsynced, right.

 You're right that the writes are postponed, but I generally see that as a
 good thing when combined with spread checkpoints. And even with those vm
 settings (about 3.2GB for background writes), I wouldn't expect this
 behaviour (because the page cache usually expires after 30 seconds). Say
 you need 100% of the shared buffers is dirty and need to be written. You
 have 27 minutes (30*0.9) to do that - that means about 8MB/s. With 30
 seconds expire there might be about 240MB before the pdflush starts to
 write the data to the SSD. And that can surely handle more than 50MB/s. So
 why the long delay? The question is what else is going on there.

 But all this is just guessing - I want to see the log_checkpoint message,
 iostat results etc.

 On a side note, I'd guess your work_mem is probably too high. 50
 (connections) x 128 (mb work mem) x 2 (sorts per query) = 12GB RAM,
 which is 25% of total ram on the box. That doesn't necessarily mean
 game over, but it seem like it wouldn't be that hard to get thrashing
 being set up that way. YMMV.

 This is one of the reasons why effective_cache_size should be lower than
 32GB, probably ...

according to the 'free' output, 38GB is what is cached right now.
effective_cache_size is just informative, so you can set it to 1TB
without any memory issue.
And, it is OS cache + PG cache.

There is not enough information yet to be sure about what's happening.
The log_checkpoints output will help for sure.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Gaëtan Allart
Finally, it crashed again… :-(

Here's the output of iotop while the database was inaccessible :

32361 be/4 postgres0.00 B/s0.00 B/s  0.00 % 99.99 % postgres: mydb
mydb host(34847) idle
32244 be/4 postgres  163.48 K/s0.00 B/s  0.00 % 99.99 % postgres: mydb
mydb host(34660) SELECT
32045 be/4 postgres7.78 K/s0.00 B/s  0.00 % 99.99 % postgres: mydb
mydb host(33765) SELECT
32158 be/4 postgres7.78 K/s0.00 B/s  0.00 % 99.99 % postgres: mydb
mydb host(34112) SELECT
32242 be/4 postgres7.78 K/s0.00 B/s  0.00 % 99.99 % postgres: mydb
mydb host(34632) SELECT
32372 be/4 postgres0.00 B/s0.00 B/s  0.00 % 99.99 % postgres: mydb
mydb host(38858) idle in transaction
32231 be/4 postgres   15.57 K/s0.00 B/s  0.00 % 99.99 % postgres: mydb
mydb host(38602) SELECT
28811 be/4 postgres3.89 K/s0.00 B/s  0.00 % 99.99 % postgres: mydb
mydb host(40594) SELECT
32190 be/4 postgres3.89 K/s0.00 B/s  0.00 % 99.99 % postgres: mydb
mydb host(38497) SELECT



And the latest logs :

LOG:  checkpoint complete: wrote 3192 buffers (0.2%); 0 transaction log
file(s) added, 1 removed, 0 recycled; write=262.636 s, sync=135.456 s,
total=416.630 s
LOG:  checkpoint complete: wrote 716 buffers (0.0%); 0 transaction log
file(s) added, 1 removed, 0 recycled; write=165.497 s, sync=17.111 s,
total=193.199 s


WARNING:  pgstat wait timeout
STATEMENT:  SELECT 'commit',sum(pg_stat_get_db_xact_commit(oid)) FROM
pg_database WHERE datname=$1
  UNION ALL
 SELECT
'rollback',sum(pg_stat_get_db_xact_rollback(oid)) FROM pg_database WHERE
datname=$2

LOG:  checkpoint complete: wrote 699 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=203.023 s, sync=119.037 s,
total=335.012 s
LOG:  checkpoint complete: wrote 348 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=120.412 s, sync=0.020 s,
total=120.435 s





Does this help?

Gaëtan

Le 24/11/11 17:02, « Tomas Vondra » t...@fuzzy.cz a écrit :

On 24 Listopad 2011, 16:39, Robert Treat wrote:
 On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 Listopad 2011, 14:51, Gaëtan Allart wrote:
 Postgresql.conf :

 max_connections = 50
 shared_buffers = 12G
 temp_buffers = 40MB
 work_mem = 128MB
 maintenance_work_mem = 256MB
 max_files_per_process = 8192
 checkpoint_segments = 256
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.9

 Fine. Let's see the options that look suspicious.


 I think you missed some suspicious settings... I'd recommend setting
 shared buffers to 8gb, and I'd likely reduce checkpoint segements to
 30 and set the checkpoint timeout back to 5 minutes. Everything about
 the way this server is configured (including those vm settings) is
 pushing it towards delaying the WAL/Buffer/Checkpoint as long as
 possible, which matches with the idea of good performance initial
 followed by a period of poor performance and heavy i/o.

Yes, checkpoints were my first thought too. OTOH the OP reported that most
of the I/O is caused by WAL writer - that's not exactly the part that does
the work during checkpoint. Plus the WAL may not be postponed, as it's
usually O_DIRECT and fsynced, right.

You're right that the writes are postponed, but I generally see that as a
good thing when combined with spread checkpoints. And even with those vm
settings (about 3.2GB for background writes), I wouldn't expect this
behaviour (because the page cache usually expires after 30 seconds). Say
you need 100% of the shared buffers is dirty and need to be written. You
have 27 minutes (30*0.9) to do that - that means about 8MB/s. With 30
seconds expire there might be about 240MB before the pdflush starts to
write the data to the SSD. And that can surely handle more than 50MB/s. So
why the long delay? The question is what else is going on there.

But all this is just guessing - I want to see the log_checkpoint message,
iostat results etc.

 On a side note, I'd guess your work_mem is probably too high. 50
 (connections) x 128 (mb work mem) x 2 (sorts per query) = 12GB RAM,
 which is 25% of total ram on the box. That doesn't necessarily mean
 game over, but it seem like it wouldn't be that hard to get thrashing
 being set up that way. YMMV.

This is one of the reasons why effective_cache_size should be lower than
32GB, probably ...

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] General performance/load issue

2011-11-24 Thread Cédric Villemain
Le 24 novembre 2011 20:38, Gaëtan Allart gae...@nexylan.com a écrit :
 Finally, it crashed again… :-(

 Here's the output of iotop while databased was inaccessible :

 32361 be/4 postgres    0.00 B/s    0.00 B/s  0.00 % 99.99 % postgres: mydb
 mydb host(34847) idle
 32244 be/4 postgres  163.48 K/s    0.00 B/s  0.00 % 99.99 % postgres: mydb
 mydb host(34660) SELECT
 32045 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres: mydb
 mydb host(33765) SELECT
 32158 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres: mydb
 mydb host(34112) SELECT
 32242 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres: mydb
 mydb host(34632) SELECT
 32372 be/4 postgres    0.00 B/s    0.00 B/s  0.00 % 99.99 % postgres: mydb
 mydb host(38858) idle in transaction
 32231 be/4 postgres   15.57 K/s    0.00 B/s  0.00 % 99.99 % postgres: mydb
 mydb host(38602) SELECT
 28811 be/4 postgres    3.89 K/s    0.00 B/s  0.00 % 99.99 % postgres: mydb
 mydb host(40594) SELECT
 32190 be/4 postgres    3.89 K/s    0.00 B/s  0.00 % 99.99 % postgres: mydb
 mydb host(38497) SELECT



 And the latest logs :

 LOG:  checkpoint complete: wrote 3192 buffers (0.2%); 0 transaction log
 file(s) added, 1 removed, 0 recycled; write=262.636 s, sync=135.456 s,
 total=416.630 s
 LOG:  checkpoint complete: wrote 716 buffers (0.0%); 0 transaction log
 file(s) added, 1 removed, 0 recycled; write=165.497 s, sync=17.111 s,
 total=193.199 s


 WARNING:  pgstat wait timeout
 STATEMENT:  SELECT 'commit',sum(pg_stat_get_db_xact_commit(oid)) FROM
 pg_database WHERE datname=$1
                      UNION ALL
                     SELECT
 'rollback',sum(pg_stat_get_db_xact_rollback(oid)) FROM pg_database WHERE
 datname=$2

 LOG:  checkpoint complete: wrote 699 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=203.023 s, sync=119.037 s,
 total=335.012 s
 LOG:  checkpoint complete: wrote 348 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=120.412 s, sync=0.020 s,
 total=120.435 s





 Does this help?

yes.
It seems you have an issue with checkpoint sync times; it is
fixed in 9.1 and backported to 9.0 here :
http://projects.2ndquadrant.com/backports

It is possible you have other problems that explain the issue you
have. An immediate mitigation before trying a patch is to reduce your
shared_buffers setting to something very low, like 1GB.



 Gaëtan

 Le 24/11/11 17:02, « Tomas Vondra » t...@fuzzy.cz a écrit :

On 24 Listopad 2011, 16:39, Robert Treat wrote:
 On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 24 Listopad 2011, 14:51, Gaëtan Allart wrote:
 Postgresql.conf :

 max_connections = 50
 shared_buffers = 12G
 temp_buffers = 40MB
 work_mem = 128MB
 maintenance_work_mem = 256MB
 max_files_per_process = 8192
 checkpoint_segments = 256
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.9

 Fine. Let's see the options that look suspicious.


 I think you missed some suspicious settings... I'd recommend setting
 shared buffers to 8gb, and I'd likely reduce checkpoint segements to
 30 and set the checkpoint timeout back to 5 minutes. Everything about
 the way this server is configured (including those vm settings) is
 pushing it towards delaying the WAL/Buffer/Checkpoint as long as
 possible, which matches with the idea of good performance initial
 followed by a period of poor performance and heavy i/o.

Yes, checkpoints were my first thought too. OTOH the OP reported that most
of the I/O is caused by WAL writer - that's not exactly the part that does
the work during checkpoint. Plus the WAL may not be postponed, as it's
usually O_DIRECT and fsynced, right.

You're right that the writes are postponed, but I generally see that as a
good thing when combined with spread checkpoints. And even with those vm
settings (about 3.2GB for background writes), I wouldn't expect this
behaviour (because the page cache usually expires after 30 seconds). Say
you need 100% of the shared buffers is dirty and need to be written. You
have 27 minutes (30*0.9) to do that - that means about 8MB/s. With 30
seconds expire there might be about 240MB before the pdflush starts to
write the data to the SSD. And that can surely handle more than 50MB/s. So
why the long delay? The question is what else is going on there.

But all this is just guessing - I want to see the log_checkpoint message,
iostat results etc.

 On a side note, I'd guess your work_mem is probably too high. 50
 (connections) x 128 (mb work mem) x 2 (sorts per query) = 12GB RAM,
 which is 25% of total ram on the box. That doesn't necessarily mean
 game over, but it seem like it wouldn't be that hard to get thrashing
 being set up that way. YMMV.

This is one of the reasons why effective_cache_size should be lower than
32GB, probably ...

Tomas


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general




--