Hello everyone,

I've already posted this same problem to the performance list. Some people replied, but that didn't resolve the situation.
For two weeks now I've been stuck with a very strange situation: from time to time (sometimes at intervals of less than 10 minutes) the server "gets stuck"/"hangs" (I don't know what else to call it), and every connection to Postgres (no matter whether it's a SELECT, UPDATE, DELETE, INSERT, startup, authentication...) seems to get "paused". After some seconds (say ~10 or ~15 s, sometimes less) everything goes back to normal.

So my first step was to check the disks. Running "iostat" apparently showed that the disks were OK. It's a RAID 10 of 4 x 600 GB SAS disks on an IBM Storage DS3512, over FC. IBM DS Storage Manager also says the disks are OK.

Then, memory. Apparently almost no swap is being used:

[###@### data]# free -m
             total       used       free     shared    buffers     cached
Mem:        145182     130977      14204          0         43     121407
-/+ buffers/cache:       9526     135655
Swap:         6143         65       6078

There are no errors in /var/log/messages.

This is what I've tried so far:

1) Emre Hasegeli suggested reducing shared_buffers, but it's already low:
   total server memory: 141 GB
   shared_buffers: 16 GB
   Maybe it's actually too low? I've been thinking about increasing it to 32 GB.
   max_connections = 500, with ~400 connections on average.

2) Since the processes are "hanging" on semop, I tried the following, as suggested on some tuning page on the web. Is it right?
   echo "250 32000 200 128" > /proc/sys/kernel/sem

3) I think my problem could be related to LWLocks: I did some googling and found some similar problems and slides. Is there some way I can confirm this?

4) Rebooting the server didn't make any difference.

Below is an strace of a couple of processes, plus some other possibly useful info. Every process I've straced shows the same picture: it seems to get stuck on semop.
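A quick way to sanity-check item 2 is to compare the kernel limits with what PostgreSQL requests at startup. The following is a minimal Python sketch under stated assumptions: that 9.2 allocates SysV semaphores in sets of 17 (16 per-process semaphores plus a marker) and needs roughly max_connections + autovacuum_max_workers + 4 process slots. The "+ 4" constant is my reading of the 9.2 "Managing Kernel Resources" docs and should be checked against your exact version; parse_kernel_sem, postgres_semaphore_needs and check_limits are hypothetical helper names.

```python
import math

# ASSUMPTIONS (verify against the docs for your PostgreSQL version):
# - PostgreSQL 9.2 on Linux uses SysV semaphores, created once at startup.
# - They come in sets of 17: 16 per-process semaphores plus 1 marker.
# - Process slots needed ~= max_connections + autovacuum_max_workers + 4.
SEMS_PER_SET = 17
PROCS_PER_SET = 16
EXTRA_SLOTS = 4


def parse_kernel_sem(text):
    """Parse the 4 fields of /proc/sys/kernel/sem: SEMMSL SEMMNS SEMOPM SEMMNI."""
    semmsl, semmns, semopm, semmni = (int(field) for field in text.split())
    return {"SEMMSL": semmsl, "SEMMNS": semmns, "SEMOPM": semopm, "SEMMNI": semmni}


def postgres_semaphore_needs(max_connections, autovacuum_max_workers=3):
    """Estimate (semaphore sets, total semaphores) under the assumptions above."""
    slots = max_connections + autovacuum_max_workers + EXTRA_SLOTS
    sets = math.ceil(slots / PROCS_PER_SET)
    return sets, sets * SEMS_PER_SET


def check_limits(limits, max_connections, autovacuum_max_workers=3):
    """Compare the estimate with the kernel limits; True means 'fits'."""
    sets, sems = postgres_semaphore_needs(max_connections, autovacuum_max_workers)
    return {
        "sets_needed": sets,
        "semaphores_needed": sems,
        "SEMMNI_ok": sets <= limits["SEMMNI"],
        "SEMMNS_ok": sems <= limits["SEMMNS"],
        "SEMMSL_ok": SEMS_PER_SET <= limits["SEMMSL"],
    }


if __name__ == "__main__":
    # Values echoed into /proc/sys/kernel/sem in item 2, with max_connections = 500.
    limits = parse_kernel_sem("250 32000 200 128")
    for key, value in check_limits(limits, max_connections=500).items():
        print(f"{key}: {value}")
```

With max_connections = 500 and the default autovacuum_max_workers = 3 this estimates 32 sets / 544 semaphores. That lines up with the 34 arrays / 546 semaphores ipcs -u reports on this server, if the other 2 arrays of 1 semaphore each belong to something else (an assumption). Either way it is far below 128 arrays and 32000 semaphores, and since the sets are allocated once at startup (the postmaster refuses to start if it cannot get them), raising these limits is not expected to change how long semop blocks.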
Any help is appreciated.

[###@### ~]# strace -ttp 5209
Process 5209 attached - interrupt to quit
09:01:54.122445 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.368785 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.368902 semop(2523148, {{11, 1, 0}}, 1) = 0
09:01:55.368978 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.369861 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.370648 semop(3047452, {{6, 1, 0}}, 1) = 0
09:01:55.370694 semop(2293765, {{15, -1, 0}}, 1) = 0
09:01:55.370762 semop(2785300, {{12, 1, 0}}, 1) = 0
09:01:55.370805 access("base/2048098929", F_OK) = 0
09:01:55.370953 open("base/2048098929/PG_VERSION", O_RDONLY) = 5

[###@### ~]# strace -p 16877 -tt
Process 16877 attached - interrupt to quit
09:57:56.305123 semop(163844, {{13, -1, 0}}, 1) = 0
09:57:59.453714 semop(163844, {{13, -1, 0}}, 1) = 0
09:58:04.004023 semop(163844, {{13, -1, 0}}, 1) = 0
09:58:04.004209 brk(0x1f44000) = 0x1f44000
09:58:04.004305 brk(0x1f42000) = 0x1f42000

[###@### data]# ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 83886080
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 200
semaphore max value = 32767

------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

[###@### data]# ipcs -u

------ Semaphore Status --------
used arrays: 34
allocated semaphores: 546

[###@### data]# uname -a
Linux ### 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

postgres=# select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
(1 row)

[###@### data]# cat /etc/redhat-release
CentOS release 6.3 (Final)
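To put numbers on the pauses in the strace output above, one can measure the gap between consecutive timestamps. strace -tt stamps each call when it starts, so a long gap before a line is time spent inside the previous call (or in user space between calls); strace -T would report the in-syscall time directly. The sketch below assumes exactly the line format shown above; find_stalls, parse_strace and the 0.5 s threshold are my own hypothetical choices. It also decodes the first sembuf of a semop call: per semop(2), a sem_op of -1 decrements the semaphore and blocks while its value is 0, and +1 increments it, waking a waiter. As far as I know, a 9.2 backend sleeps on its own semaphore this way while waiting for a lock (an LWLock or a heavyweight lock), so long -1 stalls mean "waiting to be woken by another backend"; they do not say which lock.

```python
import re
from datetime import datetime

# Matches strace -tt lines such as:
#   09:01:54.122445 semop(2293765, {{15, -1, 0}}, 1) = 0
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) (\w+)\((.*)\)\s*=\s*(-?\w+)")
# First sembuf of a semop call: {{sem_num, sem_op, sem_flg}}
SEMBUF_RE = re.compile(r"\{\{(\d+),\s*(-?\d+),\s*(\d+)\}\}")


def parse_strace(lines):
    """Yield (timestamp, syscall name, raw args) for each recognizable line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            yield ts, m.group(2), m.group(3)


def find_stalls(lines, threshold_s=0.5):
    """Return (syscall, sem_op or None, gap_seconds) for calls followed by a long gap.

    The gap is attributed to the *previous* call, because strace -tt prints
    the time each call started. Midnight wrap-around is not handled.
    """
    stalls = []
    prev = None
    for ts, name, args in parse_strace(lines):
        if prev is not None:
            gap = (ts - prev[0]).total_seconds()
            if gap > threshold_s:
                op = None
                if prev[1] == "semop":
                    sb = SEMBUF_RE.search(prev[2])
                    if sb:
                        op = int(sb.group(2))  # -1 = wait/decrement, +1 = wake/increment
                stalls.append((prev[1], op, round(gap, 3)))
        prev = (ts, name, args)
    return stalls


if __name__ == "__main__":
    sample = [
        "09:57:56.305123 semop(163844, {{13, -1, 0}}, 1) = 0",
        "09:57:59.453714 semop(163844, {{13, -1, 0}}, 1) = 0",
        "09:58:04.004023 semop(163844, {{13, -1, 0}}, 1) = 0",
        "09:58:04.004209 brk(0x1f44000) = 0x1f44000",
        "09:58:04.004305 brk(0x1f42000) = 0x1f42000",
    ]
    for stall in find_stalls(sample):
        print(stall)  # prints ('semop', -1, 3.149) then ('semop', -1, 4.55)
```

Run over the process 16877 trace, this attributes about 3.1 s and 4.6 s to two semop(..., -1) calls, i.e. the backend was asleep waiting to be woken, which fits the "paused for some seconds" symptom.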