Re: server memory swap eaten up then freaks out

2009-04-29 Thread frank
On Tue, 2009-04-28 at 20:16 +0100, Pete Boyd wrote:

 Before seeing this issue, this server has run fine for two years with
 Sarge, and fine for a couple of months with Etch. Thunderbird 2 was
 installed on workstations in December. Other than that barely any changes
 have been made and nothing jumps out at me as a meaningful change.
 ...other than a faulty web app that emailed 50GB of small emails, that
 have been manually removed at the command-line from a .Trash maildir
 directory with rm so as far as I can imagine are long gone.
 
 If anyone can suggest which way I should progress with this that would be
 really appreciated thanks.

Play with the settings until they fit your needs. The dovecot.conf is
very well documented.

 # Number of login processes to keep for listening new connections.
login_processes_count = 3
 
 # Maximum number of login processes to create. The listening process count
 # usually stays at login_processes_count, but when multiple users start
 # logging in at the same time more extra processes are created. To prevent
 # fork-bombing we check only once in a second if new processes should be
 # created - if all of them are used at the time, we double their amount
 # until the limit set by this setting is reached.
login_max_processes_count = 64
 
 # Maximum number of connections allowed per each login process. This setting
 # is used only if login_process_per_connection=no. Once the limit is reached,
 # the process notifies master so that it can create a new login process.
 # You should make sure that the process has at least
 # 16 + login_max_connections * 2 available file descriptors.
login_max_connections = 128
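
The descriptor rule in the last comment can be worked through for the value above. This is only a sketch of the arithmetic: the variable names are made up for illustration, and `ulimit -n` reports the limit of the current shell, which is not necessarily the limit dovecot's login processes actually run with.

```shell
# Rule quoted from the dovecot.conf comment:
#   16 + login_max_connections * 2 available file descriptors
login_max_connections=128   # value from the setting above

# Hypothetical variable name, used only for this illustration.
required_fds=$((16 + login_max_connections * 2))

echo "file descriptors needed per login process: $required_fds"
echo "soft limit in this shell:                  $(ulimit -n)"
```

With login_max_connections = 128 the rule gives 272 descriptors per login process.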

Frank


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: server memory swap eaten up then freaks out

2009-04-28 Thread Pete Boyd
 On Mon, 2009-04-27 at 12:33 +0100, Pete Boyd wrote:
 I have a mail and Samba PDC server that, after a couple of days, runs out
 of RAM then swap then freaks out with oom killer kicking in, at which
 point it becomes very unresponsive and needs rebooting.

 Does anyone have any ideas of what could be causing this please?

 Watch the ten biggest processes. Are they growing steadily?
 ps aux | sort -nr -k5 | grep -v USER | head -10

 What about the process count? How many processes are running? Is this
 number constantly growing?
 ps aux | wc -l

I monitored for 2 days. On day one only three staff were in and the server
showed no strain. On day two there were 10 staff. I found that no single
process consumes the RAM, but the number of imap instances grows and grows
throughout the day, and with it the amount of RAM and swap used.
Eventually there were 250 imap instances, at which point nearly all of
swap had been used up and I pre-emptively rebooted. I've summarised my
findings below.
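
When no single process stands out, it can help to total memory per command name rather than per process. The following is a rough sketch, not a definitive tool: the function name `rss_by_command` is hypothetical, the input is assumed to be lines of the form "rss command" (as procps `ps -eo rss=,comm=` prints them), and summed RSS overstates real usage because pages shared between processes are counted once per process.

```shell
# Sum resident memory (RSS, in KB) per command name.
# Reads "<rss> <command>" lines on stdin and prints
# "<command> <process count> <total rss>", largest total first.
# Only the first word of the command name is used.
rss_by_command() {
    awk '{ total[$2] += $1; count[$2]++ }
         END { for (c in total) printf "%s %d %d\n", c, count[c], total[c] }' |
        sort -k3,3nr
}

# Made-up sample input. On a live system the input would come from:
#   ps -eo rss=,comm= | rss_by_command | head -n 10
printf '%s\n' '7112 imap' '7004 imap' '1316 top' | rss_by_command
```

For the sample input the first line printed is "imap 2 14116", that is, two imap processes holding 14116 KB between them.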

Before seeing this issue, this server has run fine for two years with
Sarge, and fine for a couple of months with Etch. Thunderbird 2 was
installed on workstations in December. Other than that barely any changes
have been made and nothing jumps out at me as a meaningful change.
...other than a faulty web app that emailed 50GB of small emails, that
have been manually removed at the command-line from a .Trash maildir
directory with rm so as far as I can imagine are long gone.

If anyone can suggest which way I should progress with this that would be
really appreciated thanks.

This is how it looks right after a reboot:

# ps aux | wc -l
139

# ps aux | grep imap | wc -l
20

# free -m
             total       used       free     shared    buffers     cached
Mem:          2027        379       1647          0         19        134
-/+ buffers/cache:        225       1801
Swap:         2572          0       2572


This is how it looked at 15:20 just before I rebooted because it was
looking as though it would fall over soon:

# free -m
             total       used       free     shared    buffers     cached
Mem:          2027       1961         66          0        232        130
-/+ buffers/cache:       1597        429
Swap:         2572       1772        800

# ps aux | wc -l
370

# ps aux | grep imap | wc -l
250
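
A caveat on `ps aux | grep imap | wc -l`: the pattern also matches dovecot's `imap-login` processes and, usually, the `grep` command itself, so the figure can be somewhat higher than the number of real imap sessions. A hedged sketch of an exact-name count follows. `count_comm` is a hypothetical helper, and it assumes the mail processes are named exactly `imap`.

```shell
# Count lines on stdin that are exactly the command name given as $1.
# -x matches whole lines only, -F treats the name as a fixed string.
count_comm() {
    grep -cxF -- "$1"
}

# Made-up sample input. On a live system the input would come from:
#   ps -eo comm= | count_comm imap
printf '%s\n' imap imap-login imap grep | count_comm imap
```

The sample prints 2: the `imap-login` and `grep` lines are not counted.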


This was vmstat -s at 14:50:

      2075976  total memory
      2023420  used memory
      1726100  active memory
       208488  inactive memory
        52556  free memory
       256460  buffer memory
       168756  swap cache
      2634620  total swap
      1362060  used swap
      1272560  free swap
        92674 non-nice user cpu ticks
        10611 nice user cpu ticks
      1318293 system cpu ticks
     38722678 idle cpu ticks
      3335461 IO-wait cpu ticks
         1057 IRQ cpu ticks
         6754 softirq cpu ticks
            0 stolen cpu ticks
     11745364 pages paged in
     21594305 pages paged out
        41058 pages swapped in
       342237 pages swapped out
     13813415 interrupts
     29292565 CPU context switches
   1240817136 boot time
        74626 forks
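
To put the swap figures in proportion, the share in use can be worked out from the numbers in this message. This is a small sketch with a hypothetical helper name, using integer division, so results are rounded down.

```shell
# Integer percentage of swap in use: used * 100 / total.
# Both arguments must be in the same unit.
swap_used_pct() {
    echo $(( $1 * 100 / $2 ))
}

swap_used_pct 1362060 2634620   # vmstat -s at 14:50, values in KB
swap_used_pct 1772 2572         # free -m at 15:20, values in MB
```

This prints 51 and 68: swap use went from about half to about two thirds in the half hour before the reboot.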


I'm wondering if these settings in dovecot.conf could be used to fix the
issue:

# Set max. process size in megabytes. If you don't use
# login_process_per_connection you might need to grow this.
#login_process_size = 32

# Should each login be processed in its own process (yes), or should one
# login process be allowed to process multiple connections (no)? Yes is more
# secure, especially with SSL/TLS enabled. No is faster since there's no need
# to create processes all the time.
#login_process_per_connection = yes

# Number of login processes to keep for listening new connections.
#login_processes_count = 3

# Maximum number of login processes to create. The listening process count
# usually stays at login_processes_count, but when multiple users start
# logging in at the same time more extra processes are created. To prevent
# fork-bombing we check only once in a second if new processes should be
# created - if all of them are used at the time, we double their amount
# until the limit set by this setting is reached.
#login_max_processes_count = 128

# Maximum number of connections allowed per each login process. This setting
# is used only if login_process_per_connection=no. Once the limit is reached,
# the process notifies master so that it can create a new login process.
# You should make sure that the process has at least
# 16 + login_max_connections * 2 available file descriptors.
#login_max_connections = 256


ps aux from 12:35:

server:~# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   1940   636 ?        Ss   Apr27   0:01 init [2]
root         2  0.0  0.0      0     0 ?        S    Apr27   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Apr27   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S    Apr27   0:06 [ksoftirqd/0]
root 5  0.0  

server memory swap eaten up then freaks out

2009-04-27 Thread Pete Boyd
I have a mail and Samba PDC server that, after a couple of days, runs out
of RAM then swap then freaks out with oom killer kicking in, at which
point it becomes very unresponsive and needs rebooting.

Does anyone have any ideas of what could be causing this please?

The server has a Pentium 4 era Intel Xeon CPU with 2GB RAM and 2GB swap.
It runs Debian 4.0 Etch with volatile updates. The issue happens with
Debian 4.0 Etch kernel 2.6.18 or EtchnHalf kernel 2.6.24. This server has
operated fine for the past 2 years on Debian 3.1 Sarge, only having been
upgraded to Etch 2 months ago.

The mail server is set up as per
http://workaround.org/articles/ispmail-etch/; the Samba setup is as per
http://thegoldenear.org/toolbox/unices/samba-3-pdc-print-server-debian-etch.html.

The symptom is that all available physical RAM is used up (as reported by
free), then swap is used up, then the server freaks out, giving out of
memory errors continually to the screen (see the actual message below).

The server seems to operate at its maximum capacity of physical RAM for a
day or two before eating into swap.

Initially when I was seeing this, much of the time, but not all of the
time, there were a _lot_ of 'imap' instances running. But now I don't see
that at all and yet it crashes all the same.

There are around 10 concurrent users, each with up to 5 mailboxes open in
Thunderbird 2. Thunderbird's 'Maximum number of server connections to
cache' setting is now 1. It was previously 5, which according to
http://kb.mozillazine.org/IMAP_servers should be OK with dovecot, but I
had it turned down recently.

I would have thought 2GB of RAM would be plenty, is that true?

I ran memtest86+ and the memory tested OK.

Here's the output of various tools and syslog, from back when I was
getting lots of instances of imap:

#top:

11598 vmail     18   0  4700  352  132 R    6  0.0   1:16.50 imap
11309 vmail     18   0  4696  316  116 R    6  0.0   1:19.11 imap
16184 vmail     18   0 63144  500  276 D    4  0.0   1:05.17 imap
23445 vmail     18   0 63144 7112  276 R    4  0.3   0:42.32 imap
27232 vmail     18   0 63144 7004  112 D    4  0.3   0:28.84 imap
20573 vmail     18   0 63144  524  276 D    3  0.0   0:45.58 imap
11865 vmail     18   0  4696  240   48 R    3  0.0   1:17.85 imap
13167 vmail     18   0  5868  520  276 D    3  0.0   1:15.14 imap
14976 vmail     18   0 63320  544  280 D    3  0.0   1:11.70 imap
16272 vmail     18   0 63148  520  276 D    3  0.0   1:09.01 imap
25847 vmail     18   0 63144 7172  276 D    3  0.3   0:31.96 imap
 5260 root      16   0  2892 1316  456 R    3  0.1   0:02.92 top
 4534 vmail     18   0  4824  452  276 D    3  0.0   2:04.29 imap
 4633 vmail     18   0  4824  460  276 D    3  0.0   1:57.42 imap
13094 vmail     18   0  4700  480  276 D    3  0.0   1:11.33 imap
23244 vmail     18   0 63148  488  276 D    3  0.0   0:45.67 imap
24505 vmail     18   0 63148 7240  276 D    3  0.3   0:40.54 imap
26400 vmail     18   0 63148 7156  276 D    3  0.3   0:32.45 imap
27212 vmail     18   0 63144 7184  276 D    3  0.3   2:18.90 imap
27226 vmail     18   0 63148 7232  276 D    3  0.3   0:28.54 imap
 2724 amavis    18   0 55196  10m   44 R    2  0.5   6:32.14 amavisd-new
 3751 root      18   0  9668  516  136 D    2  0.0   7:02.11 miniserv.pl
 6401 vmail     18   0  4828  576  276 D    2  0.0   1:46.62 imap
23341 vmail     18   0 63148 7068  276 D    2  0.3   0:39.53 imap
28622 vmail     18   0 63148 7080  112 D    2  0.3   0:26.98 imap
 4563 vmail     18   0  4824  288  112 D    2  0.0   1:56.91 imap
 9561 vmail     18   0  4700  308  112 D    2  0.0   1:31.07 imap
21819 vmail     18   0 63144  364  112 D    2  0.0   0:46.23 imap
25949 vmail     18   0 63148 6924  112 D    2  0.3   0:33.42 imap
 9885 vmail     18   0  4700  324  112 D    1  0.0   1:26.72 imap
10127 vmail     18   0  4700  308  112 D    1  0.0   1:23.44 imap


# free -m
             total       used       free     shared    buffers     cached
Mem:          2027       1976         50          0        226        612
-/+ buffers/cache:       1138        889
Swap:         2572          0       2572


# vmstat -s
      2076388  total memory
      2023448  used memory
      1395056  active memory
       444268  inactive memory
        52940  free memory
       241696  buffer memory
       500268  swap cache
      2634620  total swap
           56  used swap
      2634564  free swap
        33797 non-nice user cpu ticks
         5477 nice user cpu ticks
       731695 system cpu ticks
     19954325 idle cpu ticks
       378484 IO-wait cpu ticks
         1368 IRQ cpu ticks
         4493 softirq cpu ticks
            0 stolen cpu ticks
      3196946 pages paged in
      6119050 pages paged out
            0 pages swapped in
           14 pages swapped out
     17992684 interrupts
     10515665 CPU context switches
   1237926809 boot time
        25277 forks

From /var/log/syslog:

Mar 14 12:58:20 localhost kernel: oom-killer: gfp_mask=0x201d2, 

Re: server memory swap eaten up then freaks out

2009-04-27 Thread frank
Hi Pete,

On Mon, 2009-04-27 at 12:33 +0100, Pete Boyd wrote:
 I have a mail and Samba PDC server that, after a couple of days, runs out
 of RAM then swap then freaks out with oom killer kicking in, at which
 point it becomes very unresponsive and needs rebooting.
 
 Does anyone have any ideas of what could be causing this please?

Watch the ten biggest processes. Are they growing steadily?
ps aux | sort -nr -k5 | grep -v USER | head -10

What about the process count? How many processes are running? Is this
number constantly growing?
ps aux | wc -l

Ciao
Frank

