Re: server memory & swap eaten up then freaks out
On Tue, 2009-04-28 at 20:16 +0100, Pete Boyd wrote:
> Before seeing this issue, this server has run fine for two years with
> Sarge, and fine for a couple of months with Etch. Thunderbird 2 was
> installed on workstations in December. Other than that barely any changes
> have been made and nothing jumps out at me as a meaningful change.
> ...other than a faulty web app that emailed 50GB of small emails, that
> have been manually removed at the command-line from a .Trash maildir
> directory with rm so as far as I can imagine are long gone.
>
> If anyone can suggest which way I should progress with this that would be
> really appreciated thanks.

Play with the settings until they fit your needs. The dovecot.conf is
very well documented.

> # Number of login processes to keep for listening new connections.
login_processes_count = 3

> # Maximum number of login processes to create. The listening process count
> # usually stays at login_processes_count, but when multiple users start logging
> # in at the same time more extra processes are created. To prevent fork-bombing
> # we check only once in a second if new processes should be created - if all
> # of them are used at the time, we double their amount until the limit set by
> # this setting is reached.
login_max_processes_count = 64

> # Maximum number of connections allowed per each login process. This setting
> # is used only if login_process_per_connection=no. Once the limit is reached,
> # the process notifies master so that it can create a new login process.
> # You should make sure that the process has at least
> # 16 + login_max_connections * 2 available file descriptors.
login_max_connections = 128

Frank

-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
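The file descriptor rule quoted from dovecot.conf (16 + login_max_connections * 2) can be worked out by hand for the values in this thread. A minimal sketch, assuming a POSIX shell; the function name needed_fds is made up for illustration and is not a dovecot tool. According to the quoted comment, the rule only matters when login_process_per_connection=no.

```shell
#!/bin/sh
# needed_fds N: file descriptors a login process should have available
# for login_max_connections = N, per the dovecot.conf comment:
#   16 + login_max_connections * 2
# (needed_fds is a hypothetical helper name, not part of dovecot.)
needed_fds() {
    echo $((16 + $1 * 2))
}

needed_fds 128   # value suggested in this reply
needed_fds 256   # commented-out value in Pete's dovecot.conf
```

The results can be compared against the output of `ulimit -n` in the environment that starts dovecot.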
Re: server memory & swap eaten up then freaks out
> On Mon, 2009-04-27 at 12:33 +0100, Pete Boyd wrote:
>> I have a mail and Samba PDC server that, after a couple of days, runs out
>> of RAM then swap then freaks out with oom killer kicking in, at which
>> point it becomes very unresponsive and needs rebooting.
>>
>> Does anyone have any ideas of what could be causing this please?
>
> Watch the ten biggest processes, are they growing permanently?
> ps aux | sort -nr -k5 | grep -v USER | head -10
>
> What about the process count? How many processes running? Is this
> number constantly growing?
> ps aux | wc -l

I monitored for 2 days. On day one only three staff were in and the
server showed no strain. On day two there were 10 staff. I found that no
single process hogs the RAM, but the number of imap instances grows and
grows throughout the day, and with it the amount of RAM and swap used.
Eventually there were 250 imap instances, at which point nearly all of
swap had been used up and I pre-emptively rebooted. I've summarised my
findings below.

Before seeing this issue, this server has run fine for two years with
Sarge, and fine for a couple of months with Etch. Thunderbird 2 was
installed on workstations in December. Other than that barely any changes
have been made and nothing jumps out at me as a meaningful change.
...other than a faulty web app that emailed 50GB of small emails, which
have been manually removed at the command line from a .Trash maildir
directory with rm, so as far as I can imagine they are long gone.

If anyone can suggest which way I should progress with this, that would
be really appreciated, thanks.
This is how it looks right after a reboot:

# ps aux | wc -l
139
# ps aux | grep imap | wc -l
20
# free -m
             total       used       free     shared    buffers     cached
Mem:          2027        379       1647          0         19        134
-/+ buffers/cache:        225       1801
Swap:         2572          0       2572

This is how it looked at 15:20 just before I rebooted because it was
looking as though it would fall over soon:

# free -m
             total       used       free     shared    buffers     cached
Mem:          2027       1961         66          0        232        130
-/+ buffers/cache:       1597        429
Swap:         2572       1772        800
# ps aux | wc -l
370
# ps aux | grep imap | wc -l
250

This was vmstat -s at 14:50:

      2075976  total memory
      2023420  used memory
      1726100  active memory
       208488  inactive memory
        52556  free memory
       256460  buffer memory
       168756  swap cache
      2634620  total swap
      1362060  used swap
      1272560  free swap
        92674  non-nice user cpu ticks
        10611  nice user cpu ticks
      1318293  system cpu ticks
     38722678  idle cpu ticks
      3335461  IO-wait cpu ticks
         1057  IRQ cpu ticks
         6754  softirq cpu ticks
            0  stolen cpu ticks
     11745364  pages paged in
     21594305  pages paged out
        41058  pages swapped in
       342237  pages swapped out
     13813415  interrupts
     29292565  CPU context switches
   1240817136  boot time
        74626  forks

I'm wondering if these settings in dovecot.conf could be used to fix the
issue:

# Set max. process size in megabytes. If you don't use
# login_process_per_connection you might need to grow this.
#login_process_size = 32

# Should each login be processed in it's own process (yes), or should one
# login process be allowed to process multiple connections (no)? Yes is more
# secure, espcially with SSL/TLS enabled. No is faster since there's no need
# to create processes all the time.
#login_process_per_connection = yes

# Number of login processes to keep for listening new connections.
#login_processes_count = 3

# Maximum number of login processes to create. The listening process count
# usually stays at login_processes_count, but when multiple users start logging
# in at the same time more extra processes are created. To prevent fork-bombing
# we check only once in a second if new processes should be created - if all
# of them are used at the time, we double their amount until the limit set by
# this setting is reached.
#login_max_processes_count = 128

# Maximum number of connections allowed per each login process. This setting
# is used only if login_process_per_connection=no. Once the limit is reached,
# the process notifies master so that it can create a new login process.
# You should make sure that the process has at least
# 16 + login_max_connections * 2 available file descriptors.
#login_max_connections = 256

ps aux from 12:35:

server:~# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   1940   636 ?        Ss   Apr27   0:01 init [2]
root         2  0.0  0.0      0     0 ?        S<   Apr27   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S<   Apr27   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S<   Apr27   0:06 [ksoftirqd/0]
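The two free -m snapshots in this message allow a rough estimate of how much memory each additional imap instance accounts for. This is a sketch under loose assumptions: it reads the figures as 225 MB used and 0 MB swap with 20 imap matches after the reboot, and 1597 MB used and 1772 MB swap with 250 imap matches at 15:20; it attributes all growth to imap; and it ignores other processes and any overlap between swap and swap cache. The helper name mb_per_extra_proc is hypothetical.

```shell
#!/bin/sh
# mb_per_extra_proc USED0 SWAP0 N0 USED1 SWAP1 N1
# Growth in (RAM used minus buffers/cache) plus swap used, in MB,
# divided by the growth in process count.
# (Hypothetical helper; the inputs are read off the free -m and
# "ps aux | grep imap | wc -l" output in this message.)
mb_per_extra_proc() {
    awk -v u0="$1" -v s0="$2" -v n0="$3" \
        -v u1="$4" -v s1="$5" -v n1="$6" \
        'BEGIN { printf "%.1f\n", ((u1 + s1) - (u0 + s0)) / (n1 - n0) }'
}

# After reboot: 225 MB used, 0 MB swap, 20 imap matches.
# At 15:20:    1597 MB used, 1772 MB swap, 250 imap matches.
mb_per_extra_proc 225 0 20 1597 1772 250
```

With these inputs the estimate comes out at roughly 13.7 MB per extra imap match, which is only an average over the day and not a measurement of any single process.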
Re: server memory & swap eaten up then freaks out
Hi Pete,

On Mon, 2009-04-27 at 12:33 +0100, Pete Boyd wrote:
> I have a mail and Samba PDC server that, after a couple of days, runs out
> of RAM then swap then freaks out with oom killer kicking in, at which
> point it becomes very unresponsive and needs rebooting.
>
> Does anyone have any ideas of what could be causing this please?

Watch the ten biggest processes, are they growing permanently?
ps aux | sort -nr -k5 | grep -v USER | head -10

What about the process count? How many processes are running? Is this
number constantly growing?
ps aux | wc -l

Ciao
Frank
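To see whether the process count keeps growing, the two checks above can be recorded at intervals instead of being run by hand. A minimal sketch, assuming a POSIX shell, awk and a ps that accepts `-e -o comm=`; the function names count_cmd and snapshot are made up for illustration. It matches the command name exactly, whereas `ps aux | grep imap | wc -l` can also count imap-login processes and the grep process itself.

```shell
#!/bin/sh
# count_cmd NAME: read one command name per line on stdin and
# print how many lines are exactly NAME.
count_cmd() {
    awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# snapshot: one log line with a timestamp, the total number of
# processes and the number of processes whose command name is "imap".
snapshot() {
    total=$(ps -e -o comm= | wc -l | tr -d ' ')
    imap=$(ps -e -o comm= | count_cmd imap)
    echo "$(date '+%Y-%m-%d %H:%M:%S') total=$total imap=$imap"
}

snapshot
```

Calling snapshot every few minutes and appending the output to a file gives a simple record of whether the imap count climbs through the working day.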
server memory & swap eaten up then freaks out
I have a mail and Samba PDC server that, after a couple of days, runs out
of RAM then swap then freaks out with oom killer kicking in, at which
point it becomes very unresponsive and needs rebooting.

Does anyone have any ideas of what could be causing this please?

The server has a Pentium 4 era Intel Xeon CPU with 2GB RAM and 2GB swap.
It runs Debian 4.0 Etch with volatile updates. The issue happens with
Debian 4.0 Etch kernel 2.6.18 or EtchnHalf kernel 2.6.24. This server has
operated fine for the past 2 years on Debian 3.1 Sarge, only having been
upgraded to Etch 2 months ago.

The mail server is set up as per
http://workaround.org/articles/ispmail-etch/
and the Samba setup is as per
http://thegoldenear.org/toolbox/unices/samba-3-pdc-print-server-debian-etch.html

The symptom is that all available physical RAM is used up (as reported by
free), then swap is used up, then the server freaks out, giving out of
memory errors continually to the screen (see the actual message below).
The server seems to operate at its maximum capacity of physical RAM for a
day or two before eating into swap.

Initially when I was seeing this, much of the time, but not all of the
time, there were a _lot_ of 'imap' instances running. But now I don't see
that at all and yet it crashes all the same.

There are around 10 concurrent users, each with up to 5 mailboxes open in
Thunderbird 2. Thunderbird's 'Maximum number of server connections to
cache' is set to 1 (previously 5, but I had it turned down recently,
though according to http://kb.mozillazine.org/IMAP_servers 5 should be OK
with dovecot).

I would have thought 2GB of RAM would be plenty, is that true?

I ran memtest86+ and the memory tested OK.
Here's the output of various tools and syslog, from back when I was
getting lots of instances of imap:

#top:
11598 vmail     18   0  4700  352  132 R    6  0.0   1:16.50 imap
11309 vmail     18   0  4696  316  116 R    6  0.0   1:19.11 imap
16184 vmail     18   0 63144  500  276 D    4  0.0   1:05.17 imap
23445 vmail     18   0 63144 7112  276 R    4  0.3   0:42.32 imap
27232 vmail     18   0 63144 7004  112 D    4  0.3   0:28.84 imap
20573 vmail     18   0 63144  524  276 D    3  0.0   0:45.58 imap
11865 vmail     18   0  4696  240   48 R    3  0.0   1:17.85 imap
13167 vmail     18   0  5868  520  276 D    3  0.0   1:15.14 imap
14976 vmail     18   0 63320  544  280 D    3  0.0   1:11.70 imap
16272 vmail     18   0 63148  520  276 D    3  0.0   1:09.01 imap
25847 vmail     18   0 63144 7172  276 D    3  0.3   0:31.96 imap
 5260 root      16   0  2892 1316  456 R    3  0.1   0:02.92 top
 4534 vmail     18   0  4824  452  276 D    3  0.0   2:04.29 imap
 4633 vmail     18   0  4824  460  276 D    3  0.0   1:57.42 imap
13094 vmail     18   0  4700  480  276 D    3  0.0   1:11.33 imap
23244 vmail     18   0 63148  488  276 D    3  0.0   0:45.67 imap
24505 vmail     18   0 63148 7240  276 D    3  0.3   0:40.54 imap
26400 vmail     18   0 63148 7156  276 D    3  0.3   0:32.45 imap
27212 vmail     18   0 63144 7184  276 D    3  0.3   2:18.90 imap
27226 vmail     18   0 63148 7232  276 D    3  0.3   0:28.54 imap
 2724 amavis    18   0 55196  10m   44 R    2  0.5   6:32.14 amavisd-new
 3751 root      18   0  9668  516  136 D    2  0.0   7:02.11 miniserv.pl
 6401 vmail     18   0  4828  576  276 D    2  0.0   1:46.62 imap
23341 vmail     18   0 63148 7068  276 D    2  0.3   0:39.53 imap
28622 vmail     18   0 63148 7080  112 D    2  0.3   0:26.98 imap
 4563 vmail     18   0  4824  288  112 D    2  0.0   1:56.91 imap
 9561 vmail     18   0  4700  308  112 D    2  0.0   1:31.07 imap
21819 vmail     18   0 63144  364  112 D    2  0.0   0:46.23 imap
25949 vmail     18   0 63148 6924  112 D    2  0.3   0:33.42 imap
 9885 vmail     18   0  4700  324  112 D    1  0.0   1:26.72 imap
10127 vmail     18   0  4700  308  112 D    1  0.0   1:23.44 imap

# free -m
             total       used       free     shared    buffers     cached
Mem:          2027       1976         50          0        226        612
-/+ buffers/cache:       1138        889
Swap:         2572          0       2572

# vmstat -s
      2076388  total memory
      2023448  used memory
      1395056  active memory
       444268  inactive memory
        52940  free memory
       241696  buffer memory
       500268  swap cache
      2634620  total swap
           56  used swap
      2634564  free swap
        33797  non-nice user cpu ticks
         5477  nice user cpu ticks
       731695  system cpu ticks
     19954325  idle cpu ticks
       378484  IO-wait cpu ticks
         1368  IRQ cpu ticks
         4493  softirq cpu ticks
            0  stolen cpu ticks
      3196946  pages paged in
      6119050  pages paged out
            0  pages swapped in
           14  pages swapped out
     17992684  interrupts
     10515665  CPU context switches
   1237926809  boot time
        25277  forks

From /var/log/syslog:
Mar 14 12:58:20 localhost kernel: oom-killer: gfp_mask=0x201d2, order
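The free -m output in this message shows nearly all RAM as used, but much of that is buffers and page cache, which the kernel can reclaim. The "-/+ buffers/cache" line is the more useful figure, and it can be reproduced from the "Mem:" line. A minimal sketch, reading that line as used 1976, free 50, buffers 226 and cached 612 (all in MB); the helper names are hypothetical.

```shell
#!/bin/sh
# used_by_apps USED BUFFERS CACHED: memory in use once buffers and
# page cache are subtracted (the "used" figure on the
# "-/+ buffers/cache" line of free).
used_by_apps() {
    echo $(($1 - $2 - $3))
}

# free_for_apps FREE BUFFERS CACHED: free memory plus buffers and page
# cache (the "free" figure on the same line).
free_for_apps() {
    echo $(($1 + $2 + $3))
}

# Figures in MB from the "Mem:" line of the free -m output.
used_by_apps 1976 226 612
free_for_apps 50 226 612
```

This gives 1138 MB used and 888 MB free, matching the reported 1138 and 889 to within rounding, so at the time of this snapshot a little under half of the RAM was still effectively available.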