Have you set up a serial console, and/or the DRAC with serial console,
so that you can still get an interactive shell if the networking
layer is broken?

Have you got any kind of monitoring agent running on the box, reporting
to a separate management box (munin, cacti etc), so you can see trends
in memory usage, kernel free space, swap, active network sockets,
firewall/iptables states etc?

If it's not memory, it could be the process table getting full, too many
sockets, too many iptables states, too many open file handles...
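
If you have no monitoring in place yet, a rough stopgap is to log those
counters yourself. The script below is only a sketch, assuming a Linux
box with /proc mounted. The function names are made up for illustration,
and the conntrack files only exist when the nf_conntrack module is
loaded (older kernels expose them under different paths), so anything
missing is simply skipped.

```python
#!/usr/bin/env python3
"""Print a one-line snapshot of kernel resource counters from /proc (Linux only)."""
import os
import time


def read_text(path):
    """Return the file's contents, or None if it is missing or unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def parse_file_nr(text):
    """/proc/sys/fs/file-nr holds: allocated, unused, maximum file handles."""
    allocated, _unused, maximum = (int(x) for x in text.split())
    return allocated, maximum


def parse_sockstat(text):
    """Return the 'sockets: used N' figure from /proc/net/sockstat, or None."""
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["sockets:", "used"]:
            return int(fields[2])
    return None


def count_processes(proc_dir="/proc"):
    """Count numeric directory names under /proc, one per running process."""
    return sum(1 for name in os.listdir(proc_dir) if name.isdigit())


def snapshot():
    """Collect whatever counters this kernel exposes; missing ones are skipped."""
    stats = {}
    text = read_text("/proc/sys/fs/file-nr")
    if text:
        stats["files_open"], stats["files_max"] = parse_file_nr(text)
    text = read_text("/proc/net/sockstat")
    if text:
        stats["sockets_used"] = parse_sockstat(text)
    # Assumption: conntrack counters live here (nf_conntrack module loaded).
    for key, path in (
        ("conntrack_count", "/proc/sys/net/netfilter/nf_conntrack_count"),
        ("conntrack_max", "/proc/sys/net/netfilter/nf_conntrack_max"),
    ):
        text = read_text(path)
        if text:
            stats[key] = int(text)
    text = read_text("/proc/sys/kernel/pid_max")
    if text:
        stats["pid_max"] = int(text)
    if os.path.isdir("/proc"):
        stats["processes"] = count_processes()
    return stats


if __name__ == "__main__":
    # One timestamped line per run, e.g. appended to a log file from cron.
    fields = " ".join(f"{k}={v}" for k, v in sorted(snapshot().items()))
    print(f"{int(time.time())} {fields}")
```

Run it from cron every minute and append the output to a file (better
still, ship it to another box), then look at the last few lines before
the next hang to see which counter was climbing towards its limit.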


Have you got the routers set up to syslog to a separate box, so you can
capture what they did even if they stop writing to disk?
