On Mon, Oct 27, 2008 at 8:23 PM, David Rees [EMAIL PROTECTED] wrote:
Hi,
I've got an OLTP application which occasionally suffers from slow
commit time. The process in question does something like this:
1. Do work
2. begin transaction
3. insert record
4. commit transaction
5. Do more work
6. begin transaction
7. update record
8. commit transaction
9. Do more work
The vast majority of the time, everything runs very quickly. The
median processing time for the whole thing is 7ms.
However, occasionally, processing time will jump up significantly -
the average processing time is around 20ms with the maximum processing
time taking 2-4 seconds for a small percentage of transactions. Ouch!
Turning on statement logging and analyzing the logs of the application
itself shows that step #4 is the culprit of the vast majority of the
slow transactions.
Software: CentOS 4.7, PostgreSQL 8.3.4, Slony-I 1.2.15 (the database
in question is replicated using slony)
Hardware: 2x Xeon 5130, 4GB RAM, 6-disk RAID10 15k RPM, BBU on the controller
Notable configuration changes:
shared_buffers = 800MB
temp_buffers = 200MB
work_mem = 16M
maintenance_work_mem = 800MB
vacuum_cost_delay = 10
checkpoint_segments = 10
effective_cache_size = 2500MB
I found this post[1] from a while back which was informative:
Both situations affect me in that I have Slony which I believe
executes triggers upon commit, and looking at the disk IO stats, there
is an elevated level of IO activity during this time, but it doesn't
appear to be heavy enough to cause the type of delays I am seeing.
Reading this page[2] indicates that I may want to increase my
checkpoint_segments, checkpoint_timeout and bgwriter settings, but
looking at pg_stat_bgwriter seems to indicate that my current settings
are probably OK?
# select * from pg_stat_bgwriter;
checkpoints_timed | checkpoints_req | buffers_checkpoint |
buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
---+-++---+--+-+---
3834 | 105 |3091905 |
25876 | 110 | 2247576 | 2889873
Any suggestions on how to proceed and debug the problem from here?
My only other guess is that there is some sort of locking issues going
on which is slowing things down and that it may also be slony related,
as I also see a high number of slony related queries taking longer
than 1 second...
I bet your problem is disk syncing. Your xlogs are on the data volume
so any type of burst activity can push back commit times. If this is
the case, you have basically three solutions to this problem:
*) buy more disks (i's start with pushing the xlogs out to dedicated volume)
*) disable fsync (very unsafe) or synchronous commit (somewhat less unsafe)
*) checkpoint/bgwriter tuning: can provide incremental gains. This is
not magic...at best you can smooth out bursty checkpoints. If your
problems are really serious (yours don't seem to be), you have to look
at the previous options.
Have you temporarily disabling slony to see if the problem goes away?
(My guess is it's not slony).
merlin
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance