Hi List,

We are currently running a rather large PostgreSQL installation with
approximately 4k transactions and 50k index scans per second.

Over the last few days, at irregular times (3-4 times a day), some of the
postmaster processes have been running at 100% CPU usage. This leads to a
complete breakdown of query execution. We see tons of statements that are
correctly aborted by our statement_timeout, which is set to 15 seconds. I have
searched for a cause, but cannot work out what the problem might be...

Some things I have checked:
- We are not running any bulk jobs or maintenance scripts at these times
- There are no system errors in any logs during the slowdowns
- I/O performance seems fine, with no high I/O wait. However, write I/O drops
almost completely during these periods, apparently because no postgres process
can perform any updates
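
To put a number on the I/O wait observation, something like the following
Python sketch could be used. It is only an illustration, assuming a Linux host
where the first line of /proc/stat is the aggregate "cpu" line. The function
names are made up for this example and are not part of any tool mentioned in
this mail.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Return the share of CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(parse_cpu_line(after), parse_cpu_line(before))]
    # Use the first eight counters: user, nice, system, idle, iowait, irq,
    # softirq, steal. Guest time is already accounted for in user/nice.
    deltas = deltas[:8]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


def sample_iowait(interval=1.0):
    """Take two samples `interval` seconds apart and return the iowait share."""
    before = read_cpu_line()
    time.sleep(interval)
    return iowait_percent(before, read_cpu_line())
```

Calling sample_iowait() in a loop during a slowdown would show whether the
"wa" value really stays low while the backends are spinning.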

I have just installed a script that records the output of top and ps axf to
help track down the problem. Here is a snippet of the top output:

> top - 15:55:02 up 59 days, 37 min,  1 user,  load average: 35.95, 14.04, 7.32
> Tasks: 2417 total,  54 running, 2363 sleeping,   0 stopped,   0 zombie
> Cpu(s):  6.3%us,  1.0%sy,  0.0%ni, 90.2%id,  1.9%wa,  0.0%hi,  0.6%si,  0.0%st
> Mem:  264523700k total, 250145228k used, 14378472k free,   207032k buffers
> Swap:  2097144k total,   553624k used,  1543520k free, 166905748k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 29852 postgres  20   0  131g  59m  35m R 100.0  0.0   1:27.71 postmaster
> 29854 postgres  20   0  131g  70m  45m R 100.0  0.0   1:35.43 postmaster
> 17449 postgres  20   0  131g 1.2g 1.2g R 100.0  0.5   1:52.62 postmaster
> 29868 postgres  20   0  131g 1.1g 1.0g R 100.0  0.4   1:58.93 postmaster
> 30136 postgres  20   0  131g  77m  52m R 100.0  0.0   1:34.33 postmaster
> 30294 postgres  20   0  131g  66m  41m R 100.0  0.0   1:33.33 postmaster
> 30864 postgres  20   0  131g  66m  41m R 100.0  0.0   1:36.17 postmaster
> 30872 postgres  20   0  131g  61m  36m R 100.0  0.0   1:26.81 postmaster
> 30876 postgres  20   0  131g  68m  43m R 100.0  0.0   1:33.97 postmaster
> 30899 postgres  20   0  131g  68m  44m R 100.0  0.0   1:38.95 postmaster
> 30906 postgres  20   0  131g  67m  42m R 100.0  0.0   1:27.82 postmaster
> 31173 postgres  20   0  131g  68m  44m R 100.0  0.0   1:28.49 postmaster
> 31239 postgres  20   0  131g  71m  46m R 100.0  0.0   1:31.42 postmaster
> 31248 postgres  20   0  131g  90m  65m R 100.0  0.0   1:26.20 postmaster
> 34934 postgres  20   0  131g 5580 3456 R 100.0  0.0   1:23.96 postmaster
> 47945 postgres  20   0  131g 3.0g 3.0g R 100.0  1.2   6:08.41 postmaster
> 16116 postgres  20   0  131g  84m  59m R 100.0  0.0   1:30.60 postmaster
> 16304 postgres  20   0  131g  85m  60m R 100.0  0.0   1:38.89 postmaster
> 17104 postgres  20   0  131g  96m  72m R 100.0  0.0   1:27.54 postmaster
> 17111 postgres  20   0  131g  98m  73m R 100.0  0.0   1:38.23 postmaster
> 17320 postgres  20   0  131g  98m  74m R 100.0  0.0   1:38.51 postmaster
> 31221 postgres  20   0  131g  63m  38m R 100.0  0.0   1:33.63 postmaster
> 31272 postgres  20   0  131g 1.0g 1.0g R 100.0  0.4   1:32.71 postmaster
>  3290 postgres  20   0  131g  99m  74m R 100.0  0.0   1:32.76 postmaster
>  3459 postgres  20   0  131g 2.1g 2.0g R 100.0  0.8   1:44.92 postmaster
> 16492 postgres  20   0  131g 100m  75m R 100.0  0.0   1:33.36 postmaster
> 16562 postgres  20   0  131g 114m  89m R 100.0  0.0   1:35.14 postmaster
> 17146 postgres  20   0  131g  91m  66m R 100.0  0.0   1:37.39 postmaster
> 17403 postgres  20   0  131g  98m  73m R 100.0  0.0   1:32.13 postmaster
> 31100 postgres  20   0  131g  62m  38m R 100.0  0.0   1:29.06 postmaster
>  2019 postgres  20   0  131g 1.2g 1.2g R 98.7  0.5   1:40.91 postmaster
>  2150 postgres  20   0  131g 1.3g 1.3g R 98.7  0.5   2:53.14 postmaster
> 16048 postgres  20   0  131g  71m  46m R 98.7  0.0   1:29.75 postmaster
> 30190 postgres  20   0  131g 1.4g 1.3g R 98.7  0.5   0:55.98 postmaster
> 16112 postgres  20   0  131g 862m 827m R 97.1  0.3   0:48.00 postmaster
> 31202 postgres  20   0  131g  74m  49m R 97.1  0.0   1:34.62 postmaster
> 35658 postgres  20   0  131g 5948 3788 R 97.1  0.0   0:12.29 postmaster
> 16134 postgres  20   0  131g 1.9g 1.9g R 95.4  0.8   1:47.27 postmaster
> 31034 postgres  20   0  131g  69m  44m R 95.4  0.0   1:26.35 postmaster
> 16120 postgres  20   0  131g 1.2g 1.2g R 93.8  0.5   2:04.02 postmaster
> 30891 postgres  20   0  131g  57m  33m R 93.8  0.0   1:23.08 postmaster
> 31261 postgres  20   0  131g  81m  56m R 93.8  0.0   1:24.51 postmaster
> 29790 postgres  20   0  131g  62m  37m R 92.2  0.0   1:35.34 postmaster
> 30426 postgres  20   0  131g  62m  37m R 87.4  0.0   1:34.51 postmaster
> 30857 postgres  20   0  131g  50m  26m R 79.3  0.0   1:37.82 postmaster
>   507 root      39  19     0    0    0 R 67.9  0.0  19:19.71 khugepaged
> 16095 postgres  20   0  131g  83m  58m R 67.9  0.0   1:27.64 postmaster
> 30856 postgres  20   0  131g  69m  44m R 67.9  0.0   1:34.46 postmaster
> 17442 postgres  20   0  131g 2.4g 2.4g S 11.3  0.9   1:02.14 postmaster
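
For anyone who wants to boil such a snapshot down to the busy backends, here is
a minimal Python sketch. It is an illustration only, not the script I am
running. It assumes the default procps top column layout shown above (12
columns, %CPU in the ninth, a "." as decimal separator), and the function names
capture_top and busy_processes are made up for this example.

```python
import subprocess


def capture_top():
    """Return one batch-mode snapshot of top as text (assumes procps top)."""
    result = subprocess.run(
        ["top", "-b", "-n", "1"], capture_output=True, text=True, check=True
    )
    return result.stdout


def busy_processes(top_output, command="postmaster", threshold=90.0):
    """Return (pid, cpu_percent) for rows of `command` at or above `threshold`."""
    busy = []
    for line in top_output.splitlines():
        # Drop a leading mail quote marker ("> ") and indentation, if present.
        fields = line.lstrip("> ").split()
        # Process rows have 12 columns and start with a numeric PID;
        # summary and header lines are skipped by this check.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        if fields[11] != command:
            continue
        cpu = float(fields[8])
        if cpu >= threshold:
            busy.append((int(fields[0]), cpu))
    return busy


# Four rows copied from the snapshot above.
SAMPLE = """\
> 29852 postgres  20   0  131g  59m  35m R 100.0  0.0   1:27.71 postmaster
> 30857 postgres  20   0  131g  50m  26m R 79.3  0.0   1:37.82 postmaster
>   507 root      39  19     0    0    0 R 67.9  0.0  19:19.71 khugepaged
> 17442 postgres  20   0  131g 2.4g 2.4g S 11.3  0.9   1:02.14 postmaster
"""

if __name__ == "__main__":
    print(busy_processes(SAMPLE))
```

On a live system, busy_processes(capture_top()) would give the same kind of
list for the current moment, which could then be logged with a timestamp.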



PostgreSQL version information:
- PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.5 
20110214 (Red Hat 4.4.5-6), 64-bit
- Running Hot Replication to another node (same hardware setup there)

Server hardware:
- 4x 12-core AMD Magny-Cours
- 256 GB of RAM (36% currently used)
- 1.3 TB SAS RAID (LSI RAID controller), 15k rpm

If I have forgotten to include any information that is important for analyzing
my problem, please let me know. I did my best to describe the problem as
accurately as I can.

--
Kind regards

Paul Dunkler
