drvillo wrote:
-given the configuration attached (which is basically a vanilla one) and the
number of buffers written at each execution, are these execution times
normal or above average?

Given the configuration attached, most of them are normal. One problem is that your vanilla configuration leaves checkpoint_segments at 3. There is logic in the checkpoint code that tries to spread the checkpoint writes out over a longer period of time, the idea being that a slower trickle of writes disrupts concurrent client activity less. That spreading doesn't work very well unless you give it some more segments to work with.
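To put rough numbers on that: assuming the default checkpoint_timeout of 5 minutes and checkpoint_completion_target of 0.5, the write phase of each checkpoint is aimed to finish in roughly 0.5 * 300s = 150s. The bad checkpoint quoted further down spent write=409s, well past the point where the next checkpoint was already due, so the spreading logic isn't really helping you as configured.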

Also, with the default setting for shared_buffers, you are doing a lot more redundant writes than you should be. The following postgresql.conf changes should improve things for you:

shared_buffers=256MB
checkpoint_segments=10
wal_buffers=16MB

You may have to adjust your kernel shared memory settings for that to work. See http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for an intro to these and the other common parameters you should consider adjusting.
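As a rough sketch, assuming a Linux system (the exact values needed depend on shared_buffers plus PostgreSQL's other shared memory overhead, so treat these numbers as illustrative only):

sysctl -w kernel.shmmax=1073741824    # largest single segment allowed, here 1GB
sysctl -w kernel.shmall=2097152       # total shared memory, in 4kB pages

Put the same settings into /etc/sysctl.conf so they survive a reboot.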

-in the case of the execution that overruns past the timeout, what are the
implications wrt the client application?

There really aren't any in the database. The server will immediately begin another checkpoint. Some additional disk space is used. So long as the server doesn't run out of disk space from that, clients shouldn't care.
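If you want to keep an eye on how much extra space that turns out to be, something as simple as this works (a sketch only; substitute your real data directory for $PGDATA):

du -sh $PGDATA/pg_xlog

That directory holds the WAL segments, and it's where any temporary growth from an overrun checkpoint shows up.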


-AFAIU client connections are basically stalled during checkpoints. Is it
reasonable to infer that the application blocking on a getConnection() call
might be related to checkpoints being executed?

It can be.  What I suspect is happening during the bad one:

2011-04-22 06:51:41 CEST LOG:  checkpoint complete: wrote 108 buffers
(2.6%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=409.007 s, sync=4.672 s, total=414.070 s
2011-04-22 06:55:42 CEST LOG:  could not receive data from client: No
connection could be made because the target machine actively refused it.


What I suspect is that something is happening on the disks of the server that keeps the database from being able to write efficiently during this checkpoint. That then slows the checkpoint down so much that clients are timing out.

The tuning changes I suggested will lower the total amount of I/O the server does between checkpoints, which will mean there is less information in the OS cache to write out when the checkpoint comes. That may help, if the problem is really in the database.
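If you want a quick sanity check on how much of the write traffic is actually going through checkpoints, as opposed to the background writer or the backends themselves, the pg_stat_bgwriter view (8.3 and later) breaks that down; this is just for diagnosis, not part of the fix:

SELECT buffers_checkpoint, buffers_clean, buffers_backend FROM pg_stat_bgwriter;

If buffers_checkpoint is the biggest of the three but still small in absolute terms, the checkpoints themselves aren't writing much, which points back toward something else filling the OS cache.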

-considering some tuning on the PG side, should I try increasing
checkpoint_timeout and raising checkpoint_completion_target to lessen the
impact of I/O on the clients, or should I shorten the period so there's less
stuff to write? From the number of buffers written on average I'd assume the
first option is the one to go for, but I might be missing some bit of
reasoning here...

Your problems are likely because the operating system cache is getting filled with something that is slowing checkpoints down. Maybe it's the regular database writes during the five minutes between checkpoints; maybe it's something else running on the server. Whatever is happening, you're unlikely to make it better by adjusting how often the checkpoints happen. Either get the database to write less between checkpoints (like the changes I suggested), or figure out what else is doing the writes. I suspect they are coming from outside the database, if only because genuinely high write activity on this server would also be forcing checkpoints to happen more often than every five minutes.
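You can check which trigger is firing from the statistics as well. This query (against pg_stat_bgwriter again, 8.3 and later) shows how many checkpoints were started because checkpoint_timeout expired versus because checkpoint_segments filled up:

SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

If checkpoints_timed accounts for nearly all of them, as the five-minute spacing in your log suggests, write volume between checkpoints isn't what's forcing them.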


I've read about
RAID5 not being a wise setup for disks hosting PG, what about RAID1?

The problem with RAID5 is that it drags the write performance of a larger number of disks down to where it's potentially no better than a single drive. RAID1 is essentially a single drive for write purposes, too, so you may discover you're simply running over what one drive can do. Something odd does seem to be going on, though. Normally in your situation I would try to find some system downtime and test the read/write speed of the drives, looking for issues there. As Robert said already, you shouldn't be running this slowly unless something is going wrong.
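A rough sketch of that sort of test, assuming you can take the database down and have scratch space on the same disks; size the file at a couple of times RAM so the OS cache doesn't flatter the numbers, and the paths here are obviously placeholders:

time dd if=/dev/zero of=/path/on/db/disks/testfile bs=8192 count=1000000 conv=fdatasync
time dd if=/path/on/db/disks/testfile of=/dev/null bs=8192
rm /path/on/db/disks/testfile

The conv=fdatasync on the write (GNU dd) makes the timing include the flush to disk rather than just filling the cache. If the resulting MB/s figures are far below what a healthy single drive manages, the problem is underneath PostgreSQL rather than in it.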

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


