Hello guys, I've been trying to hunt down my problem and have found the following:

1) Emre Hasegeli suggested reducing my shared_buffers, but it's already low:
   total server memory: 141 GB
   shared_buffers: 16 GB
   Maybe it's too low? I've been thinking of increasing it to 32 GB.
   max_connections = 500, with ~400 connections on average.

2) Since the processes are "hanging" on "semop", I tried the following, as
   suggested on a tuning page on the web:
   echo "250 32000 100 128" > /proc/sys/kernel/sem

3) I think my problem could be related to LWLocks; I did some googling and
   found some similar problems and slides. Is there some way I can confirm
   this?

4) Rebooting the server didn't make any difference.

I appreciate any help,
Rafael

On Tue, Jun 11, 2013 at 9:48 AM, Rafael Domiciano <rafael.domici...@gmail.com> wrote:

> Hello all,
>
> Since Saturday I've been stuck in a very strange situation: from time to
> time (sometimes at intervals of less than 10 minutes), the server gets
> "stuck"/"hangs" (I don't know what to call it) and every connection to
> postgres (no matter whether it's SELECT, UPDATE, DELETE, INSERT, startup,
> authentication...) seems to get "paused"; after some seconds (~10 or
> ~15 sec, sometimes less) everything is OK again.
>
> So, my first step was to check the disks. Running "iostat" apparently
> showed that the disks were OK. It's a RAID 10 of 4 x 600 GB SAS disks on
> an IBM DS3512 storage array, over FC. IBM DS Storage Manager says that
> the disks are OK.
>
> Then, memory. Apparently almost no swap is being used:
> [###@### data]# free -m
>              total       used       free     shared    buffers     cached
> Mem:        145182     130977      14204          0         43     121407
> -/+ buffers/cache:       9526     135655
> Swap:         6143         65       6078
>
> No errors in /var/log/messages.
>
> Below is the strace output of one process, plus some other possibly
> useful info. Every process I've straced shows the same pattern: it seems
> to get stuck on semop.
>
> Nothing has changed on the server since last Monday, when I changed
> pg_hba.conf to authenticate via LDAP.
> The LDAP server apparently is OK, and tcpdump doesn't show any slow
> responses, nor heavy activity on that port.
>
> Any help appreciated,
>
> [###@### ~]# strace -ttp 5209
> Process 5209 attached - interrupt to quit
> 09:01:54.122445 semop(2293765, {{15, -1, 0}}, 1) = 0
> 09:01:55.368785 semop(2293765, {{15, -1, 0}}, 1) = 0
> 09:01:55.368902 semop(2523148, {{11, 1, 0}}, 1) = 0
> 09:01:55.368978 semop(2293765, {{15, -1, 0}}, 1) = 0
> 09:01:55.369861 semop(2293765, {{15, -1, 0}}, 1) = 0
> 09:01:55.370648 semop(3047452, {{6, 1, 0}}, 1) = 0
> 09:01:55.370694 semop(2293765, {{15, -1, 0}}, 1) = 0
> 09:01:55.370762 semop(2785300, {{12, 1, 0}}, 1) = 0
> 09:01:55.370805 access("base/2048098929", F_OK) = 0
> 09:01:55.370953 open("base/2048098929/PG_VERSION", O_RDONLY) = 5
>
> [###@### data]# ipcs -l
>
> ------ Shared Memory Limits --------
> max number of segments = 4096
> max seg size (kbytes) = 83886080
> max total shared memory (kbytes) = 17179869184
> min seg size (bytes) = 1
>
> ------ Semaphore Limits --------
> max number of arrays = 128
> max semaphores per array = 250
> max semaphores system wide = 32000
> max ops per semop call = 32
> semaphore max value = 32767
>
> ------ Messages: Limits --------
> max queues system wide = 32768
> max size of message (bytes) = 65536
> default max size of queue (bytes) = 65536
>
> [###@### data]# ipcs -u
>
> ------ Semaphore Status --------
> used arrays: 34
> allocated semaphores: 546
>
> [###@### data]# uname -a
> Linux ### 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> postgres=# select version();
>  version
> --------------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
> (1 row)
>
> [###@### data]# cat /etc/redhat-release
> CentOS release 6.3 (Final)
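To make the strace output above easier to read, here is a small Python sketch that decodes single-operation `semop(semid, {{sem_num, sem_op, sem_flg}}, nsops)` lines and flags long gaps between consecutive syscalls. It is not part of the original report, and the function and variable names are made up for illustration. A negative `sem_op` is a decrement, which blocks until the semaphore value is large enough. A positive one is an increment that can wake a sleeper. The sketch assumes `strace -tt` timestamps that do not cross midnight. The 0.5 s threshold is an arbitrary choice.

```python
import re
from datetime import datetime

# Sample copied verbatim from the `strace -ttp 5209` output in the message.
SAMPLE = """\
Process 5209 attached - interrupt to quit
09:01:54.122445 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.368785 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.368902 semop(2523148, {{11, 1, 0}}, 1) = 0
09:01:55.368978 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.369861 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.370648 semop(3047452, {{6, 1, 0}}, 1) = 0
09:01:55.370694 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.370762 semop(2785300, {{12, 1, 0}}, 1) = 0
09:01:55.370805 access("base/2048098929", F_OK) = 0
09:01:55.370953 open("base/2048098929/PG_VERSION", O_RDONLY) = 5
"""

# Wall-clock prefix written by `strace -tt` (microsecond resolution).
TS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) ")
# Single-operation semop: semop(semid, {{sem_num, sem_op, sem_flg}}, 1)
SEMOP_RE = re.compile(r"semop\((\d+), \{\{(\d+), (-?\d+), ([^}]*)\}\}, 1\)")


def analyze(lines, threshold=0.5):
    """Count semop decrements/increments and report gaps >= threshold seconds.

    A gap is charged to the earlier line: it covers that syscall's own
    duration plus any user-space time before the next syscall starts.
    Assumes all timestamps fall within one day (no midnight wrap-around).
    """
    counts = {"waits": 0, "posts": 0}
    stalls = []
    prev_ts = prev_line = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # e.g. the "Process ... attached" banner
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap >= threshold:
                stalls.append((gap, prev_line))
        op = SEMOP_RE.search(line)
        if op:
            sem_op = int(op.group(3))
            if sem_op < 0:
                counts["waits"] += 1  # decrement: a -1 blocks while the value is 0
            elif sem_op > 0:
                counts["posts"] += 1  # increment: may wake a sleeping process
        prev_ts, prev_line = ts, line
    return counts, stalls


if __name__ == "__main__":
    counts, stalls = analyze(SAMPLE.splitlines())
    print(counts)
    for gap, line in stalls:
        print(f"{gap:.6f}s after: {line}")
```

On the sample it counts 5 decrements and 3 increments. It reports one gap of about 1.25 s following the first `semop(2293765, {{15, -1, 0}}, 1)`. All the decrements target the same semaphore (set 2293765, number 15), while the increments go to other sets. That is consistent with a backend sleeping on its own semaphore and waking other backends through theirs. strace alone cannot say which lock the sleep was for.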
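Regarding point 2, the four numbers written to /proc/sys/kernel/sem are, in order, SEMMSL, SEMMNS, SEMOPM and SEMMNI. The "Semaphore Limits" block of `ipcs -l` above shows the same limits: 250 per array, 32000 system wide, 32 ops per call, 128 arrays. The sketch below parses that string and compares it with what PostgreSQL would request. It *assumes* the sizing rule from the PostgreSQL 9.2 kernel-resources documentation: ceil((max_connections + autovacuum_max_workers + 4) / 16) sets of 17 semaphores each. It also *assumes* the default autovacuum_max_workers = 3, which the message doesn't state. The helper names are hypothetical.

```python
import math

# Field order of /proc/sys/kernel/sem on Linux.
SEM_FIELDS = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")


def parse_kernel_sem(text):
    """Parse the four numbers written to /proc/sys/kernel/sem."""
    values = [int(v) for v in text.split()]
    if len(values) != len(SEM_FIELDS):
        raise ValueError(f"expected 4 fields, got {len(values)}")
    return dict(zip(SEM_FIELDS, values))


def pg92_semaphore_needs(max_connections, autovacuum_max_workers=3):
    """ASSUMPTION: sizing rule taken from the PostgreSQL 9.2 docs,
    ceil((max_connections + autovacuum_max_workers + 4) / 16) sets of 17."""
    sets = math.ceil((max_connections + autovacuum_max_workers + 4) / 16)
    return {"sets": sets, "per_set": 17, "total": sets * 17}


def check_limits(limits, needs):
    """True for each kernel limit that covers PostgreSQL's own requirement.
    Semaphores used by other applications are not accounted for here."""
    return {
        "SEMMSL": limits["SEMMSL"] >= needs["per_set"],
        "SEMMNS": limits["SEMMNS"] >= needs["total"],
        "SEMMNI": limits["SEMMNI"] >= needs["sets"],
    }


if __name__ == "__main__":
    before = parse_kernel_sem("250 32000 32 128")  # as reported by `ipcs -l`
    after = parse_kernel_sem("250 32000 100 128")  # value echoed in point 2
    needs = pg92_semaphore_needs(max_connections=500)
    print(needs)  # {'sets': 32, 'per_set': 17, 'total': 544}
    print(check_limits(before, needs))
    print(check_limits(after, needs))
```

Under these assumptions, max_connections = 500 needs 32 sets and 544 semaphores. That fits comfortably inside both the old and the new limits, and `ipcs -u` above reports 34 arrays and 546 semaphores in use. The echo only changed SEMOPM (32 to 100), and every semop in the strace passes a single operation. So, under these assumptions, the kernel semaphore limits themselves do not appear to be exhausted.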