Hi list,

we are currently running a rather large PostgreSQL installation with approximately 4k transactions and 50k index scans per second.

Over the last few days, at irregular times (3-4 times a day), some of the postmaster processes have been running at 100% CPU usage. This leads to a total breakdown of query execution. We see tons of statements which are, correctly, aborted automatically by our statement_timeout of 15 seconds.

I tried to search for the cause, but I cannot really figure out what the problem could be.

Some things I have checked:

- We are not running any bulk jobs or maintenance scripts at these times
- No system errors in any logs during the slowdowns
- I/O performance seems fine. There is no high I/O wait, but I/O writes drop completely during these periods, apparently because no postgres process can perform any update

I just installed a script which prints the top and ps axf output to help track down the problem. Here is a snippet of the top output:

> top - 15:55:02 up 59 days, 37 min, 1 user, load average: 35.95, 14.04, 7.32
> Tasks: 2417 total, 54 running, 2363 sleeping, 0 stopped, 0 zombie
> Cpu(s): 6.3%us, 1.0%sy, 0.0%ni, 90.2%id, 1.9%wa, 0.0%hi, 0.6%si, 0.0%st
> Mem: 264523700k total, 250145228k used, 14378472k free, 207032k buffers
> Swap: 2097144k total, 553624k used, 1543520k free, 166905748k cached
>
>   PID USER     PR NI VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
> 29852 postgres 20  0 131g  59m  35m R 100.0  0.0  1:27.71 postmaster
> 29854 postgres 20  0 131g  70m  45m R 100.0  0.0  1:35.43 postmaster
> 17449 postgres 20  0 131g 1.2g 1.2g R 100.0  0.5  1:52.62 postmaster
> 29868 postgres 20  0 131g 1.1g 1.0g R 100.0  0.4  1:58.93 postmaster
> 30136 postgres 20  0 131g  77m  52m R 100.0  0.0  1:34.33 postmaster
> 30294 postgres 20  0 131g  66m  41m R 100.0  0.0  1:33.33 postmaster
> 30864 postgres 20  0 131g  66m  41m R 100.0  0.0  1:36.17 postmaster
> 30872 postgres 20  0 131g  61m  36m R 100.0  0.0  1:26.81 postmaster
> 30876 postgres 20  0 131g  68m  43m R 100.0  0.0  1:33.97 postmaster
> 30899 postgres 20  0 131g  68m  44m R 100.0  0.0  1:38.95 postmaster
> 30906 postgres 20  0 131g  67m  42m R 100.0  0.0  1:27.82 postmaster
> 31173 postgres 20  0 131g  68m  44m R 100.0  0.0  1:28.49 postmaster
> 31239 postgres 20  0 131g  71m  46m R 100.0  0.0  1:31.42 postmaster
> 31248 postgres 20  0 131g  90m  65m R 100.0  0.0  1:26.20 postmaster
> 34934 postgres 20  0 131g 5580 3456 R 100.0  0.0  1:23.96 postmaster
> 47945 postgres 20  0 131g 3.0g 3.0g R 100.0  1.2  6:08.41 postmaster
> 16116 postgres 20  0 131g  84m  59m R 100.0  0.0  1:30.60 postmaster
> 16304 postgres 20  0 131g  85m  60m R 100.0  0.0  1:38.89 postmaster
> 17104 postgres 20  0 131g  96m  72m R 100.0  0.0  1:27.54 postmaster
> 17111 postgres 20  0 131g  98m  73m R 100.0  0.0  1:38.23 postmaster
> 17320 postgres 20  0 131g  98m  74m R 100.0  0.0  1:38.51 postmaster
> 31221 postgres 20  0 131g  63m  38m R 100.0  0.0  1:33.63 postmaster
> 31272 postgres 20  0 131g 1.0g 1.0g R 100.0  0.4  1:32.71 postmaster
>  3290 postgres 20  0 131g  99m  74m R 100.0  0.0  1:32.76 postmaster
>  3459 postgres 20  0 131g 2.1g 2.0g R 100.0  0.8  1:44.92 postmaster
> 16492 postgres 20  0 131g 100m  75m R 100.0  0.0  1:33.36 postmaster
> 16562 postgres 20  0 131g 114m  89m R 100.0  0.0  1:35.14 postmaster
> 17146 postgres 20  0 131g  91m  66m R 100.0  0.0  1:37.39 postmaster
> 17403 postgres 20  0 131g  98m  73m R 100.0  0.0  1:32.13 postmaster
> 31100 postgres 20  0 131g  62m  38m R 100.0  0.0  1:29.06 postmaster
>  2019 postgres 20  0 131g 1.2g 1.2g R  98.7  0.5  1:40.91 postmaster
>  2150 postgres 20  0 131g 1.3g 1.3g R  98.7  0.5  2:53.14 postmaster
> 16048 postgres 20  0 131g  71m  46m R  98.7  0.0  1:29.75 postmaster
> 30190 postgres 20  0 131g 1.4g 1.3g R  98.7  0.5  0:55.98 postmaster
> 16112 postgres 20  0 131g 862m 827m R  97.1  0.3  0:48.00 postmaster
> 31202 postgres 20  0 131g  74m  49m R  97.1  0.0  1:34.62 postmaster
> 35658 postgres 20  0 131g 5948 3788 R  97.1  0.0  0:12.29 postmaster
> 16134 postgres 20  0 131g 1.9g 1.9g R  95.4  0.8  1:47.27 postmaster
> 31034 postgres 20  0 131g  69m  44m R  95.4  0.0  1:26.35 postmaster
> 16120 postgres 20  0 131g 1.2g 1.2g R  93.8  0.5  2:04.02 postmaster
> 30891 postgres 20  0 131g  57m  33m R  93.8  0.0  1:23.08 postmaster
> 31261 postgres 20  0 131g  81m  56m R  93.8  0.0  1:24.51 postmaster
> 29790 postgres 20  0 131g  62m  37m R  92.2  0.0  1:35.34 postmaster
> 30426 postgres 20  0 131g  62m  37m R  87.4  0.0  1:34.51 postmaster
> 30857 postgres 20  0 131g  50m  26m R  79.3  0.0  1:37.82 postmaster
>   507 root     39 19    0    0    0 R  67.9  0.0 19:19.71 khugepaged
> 16095 postgres 20  0 131g  83m  58m R  67.9  0.0  1:27.64 postmaster
> 30856 postgres 20  0 131g  69m  44m R  67.9  0.0  1:34.46 postmaster
> 17442 postgres 20  0 131g 2.4g 2.4g S  11.3  0.9  1:02.14 postmaster

PostgreSQL version information:

- PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6), 64-bit
- Running hot replication to another node (same hardware setup there)

Server hardware:

- 4x 12-core AMD Magny-Cours
- 256 GB of RAM (36% currently used)
- 1.3 TB SAS RAID (LSI RAID controller), 15k rpm

If I forgot to include any information that is important for analyzing my problem, please let me know. I did my best to describe the problem as accurately as I could.

--
Best regards
Paul Dunkler
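
P.S. In case someone wants to capture the same data on their own system: below is a minimal sketch of a capture script of the kind described above, not the exact one running on our server. The threshold, the polling interval, the log file name and the function names (should_capture, snapshot, watch) are assumptions made up for illustration; only the `top -b -n 1` and `ps axf` invocations are standard commands.

```python
#!/usr/bin/env python3
"""Append `top` and `ps axf` snapshots to a log while the load is high."""
import os
import subprocess
import time
from datetime import datetime

LOAD_THRESHOLD = 20.0  # hypothetical trigger value; tune for the machine
POLL_SECONDS = 5       # hypothetical polling interval
COMMANDS = (["top", "-b", "-n", "1"], ["ps", "axf"])


def should_capture(load1, threshold=LOAD_THRESHOLD):
    """Return True when the 1-minute load average reaches the threshold."""
    return load1 >= threshold


def snapshot(commands=COMMANDS):
    """Run each command once and return the outputs under a timestamp header."""
    parts = ["=== " + datetime.now().isoformat(timespec="seconds") + " ==="]
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        parts.append("$ " + " ".join(cmd))
        parts.append(result.stdout)
    return "\n".join(parts)


def watch(logfile="pg_load_snapshots.log"):
    """Poll the load average forever and log a snapshot whenever it is high."""
    while True:
        load1, _, _ = os.getloadavg()  # Unix only
        if should_capture(load1):
            with open(logfile, "a") as fh:
                fh.write(snapshot() + "\n")
        time.sleep(POLL_SECONDS)
```

Calling watch() from a screen session or a small init script would keep it running in the background. Triggering on the load average rather than on a fixed schedule keeps the log small, since the slowdowns only happen a few times a day.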