I would like to thank everyone for their timely suggestions.
The problem appears to be resolved now. We verified/modified - locking/indexes/vacuum/checkpoints/IO bottleneck/queries, etc.
Couple significant changes were the number of checkpoint segments were increased, and we moved over the database to a new SAN RAID10 volume (which was in plan anyway, just did it sooner).
I am dealing with an app here that uses pg to handle a few thousand concurrent web users. It seems that under heavy load, the INSERT and UPDATE statements to one or two specific tables keep queuing up, to the count of 150+ (one table has about 432K rows, other has about 2.6Million rows), resulting in ‘wait’s for other queries, and then everything piles up, with the load average shooting up to 10+.
We (development) have gone through the queries/explain analyzes and made sure the appropriate indexes exist among other efforts put in.
I would like to know if there is anything that can be changed for better from the systems perspective. Here’s what I have done and some recent changes from the system side:
-Upgraded from 7.4.0 to 7.4.1 sometime ago
-Upgraded from RH8 to RHEL 3.0
-The settings from postgresql.conf (carried over, basically) are:
shared_buffers = 10240 (80MB)
max_connections = 400
sort_memory = 1024
effective_cache_size = 262144 (2GB)
checkpoint_segments = 15
stats_start_collector = true
stats_command_string = true
Rest everything is at default
In /etc/sysctl.conf (512MB shared mem)
kernel.shmall = 536870912
kernel.shmmax = 536870912
-This is a new Dell 6650 (quad XEON 2.2GHz, 8GB RAM, Internal HW RAID10), RHEL 3.0 (2.4.21-20.ELsmp), PG 7.4.1
-Vaccum Full run everyday
-contrib/Reindex run everyday
-Disabled HT in BIOS
I would greatly appreciate any helpful ideas.
Thanks in advance,