drvillo wrote:
-given the configuration attached (which is basically a vanilla one) and the
number of buffers written at each execution, are these execution times
normal or above average?

Given the configuration attached, most of them are normal. One problem is that your vanilla configuration leaves checkpoint_segments at 3. There is logic in the checkpoint code that tries to spread the checkpoint writes out over a longer period of time, the idea being that a slower trickle of writes disrupts concurrent client activity less. That spreading doesn't work very well unless you give it some more segments to work with.
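To put rough numbers on that: assuming the default checkpoint_timeout of 5 minutes and checkpoint_completion_target of 0.5, the write phase of each checkpoint is aimed to finish in roughly 0.5 * 300s = 150s. The bad checkpoint quoted further down spent write=409s, well past the point where the next checkpoint was already due, so the spreading logic isn't really helping you as configured.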

Also, with the default setting for shared_buffers, you are doing a lot more redundant writes than you should be. The following postgresql.conf changes should improve things for you:

shared_buffers=256MB
checkpoint_segments=10
wal_buffers=16MB

You may have to adjust your kernel shared memory settings for that to work. See http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for an intro to these and the other common parameters you should consider adjusting.
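As a rough sketch, assuming a Linux system (the exact values needed depend on shared_buffers plus PostgreSQL's other shared memory overhead, so treat these numbers as illustrative only):

sysctl -w kernel.shmmax=1073741824    # largest single segment allowed, here 1GB
sysctl -w kernel.shmall=2097152       # total shared memory, in 4kB pages

Put the same settings into /etc/sysctl.conf so they survive a reboot.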

-in the case of the execution that overruns past the timeout, what are the
implications wrt the client application?

There really aren't any in the database. The server will immediately begin another checkpoint. Some additional disk space is used. So long as the server doesn't run out of disk space from that, clients shouldn't care.
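If you want to keep an eye on how much extra space that turns out to be, something as simple as this works (a sketch only; substitute your real data directory for $PGDATA):

du -sh $PGDATA/pg_xlog

That directory holds the WAL segments, and it's where any temporary growth from an overrun checkpoint shows up.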


-AFAIU client connections are basically stalled during checkpoints. Is it
reasonable to infer that the application blocking on a getConnection() call
might be related to checkpoints being executed?

It can be.  What I suspect is happening during the bad one:

2011-04-22 06:51:41 CEST LOG:  checkpoint complete: wrote 108 buffers
(2.6%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=409.007 s, sync=4.672 s, total=414.070 s
2011-04-22 06:55:42 CEST LOG:  could not receive data from client: No
connection could be made because the target machine actively refused it.


What I suspect is that something is happening on the disks of the server that keeps the database from being able to write efficiently during this checkpoint. That then slows the checkpoint down so much that clients are timing out.

The tuning changes I suggested will lower the total amount of I/O the server does between checkpoints, which will mean there is less information in the OS cache to write out when the checkpoint comes. That may help, if the problem is really in the database.
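If you want a quick sanity check on how much of the write traffic is actually going through checkpoints, as opposed to the background writer or the backends themselves, the pg_stat_bgwriter view (8.3 and later) breaks that down; this is just for diagnosis, not part of the fix:

SELECT buffers_checkpoint, buffers_clean, buffers_backend FROM pg_stat_bgwriter;

If buffers_checkpoint is the biggest of the three but still small in absolute terms, the checkpoints themselves aren't writing much, which points back toward something else filling the OS cache.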

-considering some tuning on the PG side, should I try increasing
checkpoint_timeout and raising checkpoint_completion_target to lessen the
impact of I/O on the clients, or should I shorten the period so there's less
stuff to write? From the number of buffers written on average I'd assume the
first option is the one to go for, but I might be missing some bit of
reasoning here...

Your problems are likely because the operating system cache is getting filled with something that is slowing checkpoints down. Maybe it's the regular database writes during the five minutes between checkpoints; maybe it's something else running on the server. Whatever is happening, you're unlikely to make it better by adjusting how often the checkpoints happen. Either get the database to write less between checkpoints (like the changes I suggested), or figure out what else is doing the writes. I suspect they are coming from outside the database, if only because genuinely high write activity on this server would also be forcing checkpoints to happen more often than every five minutes.
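You can check which trigger is firing from the statistics as well. This query (against pg_stat_bgwriter again, 8.3 and later) shows how many checkpoints were started because checkpoint_timeout expired versus because checkpoint_segments filled up:

SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

If checkpoints_timed accounts for nearly all of them, as the five-minute spacing in your log suggests, write volume between checkpoints isn't what's forcing them.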


I've read about
RAID5 not being a wise setup for disks hosting PG, what about RAID1?

The problem with RAID5 is that it drags the write performance of a larger number of disks down to where it's potentially no better than a single drive. RAID1 is essentially a single drive for write purposes, too, so you may discover you're simply running over what one drive can do. Something odd does seem to be going on, though. Normally in your situation I would try to find some system downtime and test the read/write speed of the drives, looking for issues there. As Robert said already, you shouldn't be running this slowly unless something is going wrong.
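A rough sketch of that sort of test, assuming you can take the database down and have scratch space on the same disks; size the file at a couple of times RAM so the OS cache doesn't flatter the numbers, and the paths here are obviously placeholders:

time dd if=/dev/zero of=/path/on/db/disks/testfile bs=8192 count=1000000 conv=fdatasync
time dd if=/path/on/db/disks/testfile of=/dev/null bs=8192
rm /path/on/db/disks/testfile

The conv=fdatasync on the write (GNU dd) makes the timing include the flush to disk rather than just filling the cache. If the resulting MB/s figures are far below what a healthy single drive manages, the problem is underneath PostgreSQL rather than in it.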

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


