This morning, spamd was having problems on one of our servers. There were 8 spamd children running, but "top" showed that only two of them were using any CPU time, even though there was a backlog of messages waiting to be processed. The log file included lines like this:
    Aug 24 08:49:22 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:49:23 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:49:32 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:49:41 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:49:49 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:49:51 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:49:59 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:50:00 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:50:04 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:50:06 localhost spamd[21051]: prefork: child states: BBBKKKKK
    Aug 24 08:50:12 localhost spamd[21051]: prefork: child states: BBBKKKKK

I think this means 3 children are busy and 5 children are waiting to die. The multiple "K" children had been present for a few hours, and they prevented new children from being spawned to handle the load. Restarting spamd via "kill -HUP" restored normal operation.

Why were the killed processes not dying?
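To make the state string easier to read, here is a minimal Python sketch that tallies the letters in one "child states" line. The letter meanings (B busy, I idle, K killed, S starting, E error) are an assumption based on my reading of Mail::SpamAssassin::SpamdForkScaling, not something the log itself states; the function name child_state_counts is hypothetical.

```python
from collections import Counter

# ASSUMPTION: letter meanings taken from my reading of
# Mail::SpamAssassin::SpamdForkScaling; check the source of your installed version.
STATE_NAMES = {
    "B": "busy",
    "I": "idle",
    "K": "killed (waiting to exit)",
    "S": "starting",
    "E": "error",
}

MARKER = "prefork: child states: "


def child_state_counts(log_line: str) -> Counter:
    """Return a Counter of state names for one 'prefork: child states:' syslog line.

    Lines without the marker (or with nothing after it) yield an empty Counter.
    Unrecognised letters are counted as 'unknown <letter>'.
    """
    idx = log_line.find(MARKER)
    if idx < 0:
        return Counter()
    rest = log_line[idx + len(MARKER):].split()
    if not rest:
        return Counter()
    return Counter(STATE_NAMES.get(ch, "unknown " + ch) for ch in rest[0])


if __name__ == "__main__":
    line = ("Aug 24 08:49:22 localhost spamd[21051]: "
            "prefork: child states: BBBKKKKK")
    for name, count in sorted(child_state_counts(line).items()):
        print(f"{name}: {count}")
```

For the lines above this prints "busy: 3" and "killed (waiting to exit): 5", which matches the reading of 3 busy children and 5 waiting to die.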
System information:

    SunOS email 5.9 Generic_118558-39 sun4u sparc SUNW,Sun-Fire-V210

    SpamAssassin Server version 3.2.3
      running on Perl 5.8.8
      with SSL support (IO::Socket::SSL 0.97)
      with zlib support (Compress::Zlib 1.41)

Process information (combination of "top" and "ps"):

    Fri Aug 24 08:56:52 2007
    last pid: 23996;  load averages:  0.55,  0.54,  0.52
    192 processes: 190 sleeping, 2 on cpu
    CPU states: 88.4% idle, 4.2% user, 3.4% kernel, 3.9% iowait, 0.0% swap
    Memory: 2048M real, 342M free, 1323M swap in use, 6109M swap free

    USER    PID  PPID     STIME  TIME  STATE  SIZE  RES     CPU
    spamd 23740 21051  08:55:39  0:17  sleep   72M  59M  11.40%
    spamd 20459 21051  08:16:13  7:08  cpu/1   77M  66M   6.27%
    root  21051     1  09:47:26  2:17  sleep   66M  58M   0.03%
    spamd 27830 21051  03:39:24  5:28  sleep   81M  70M   0.00%
    spamd 27926 21051  03:39:37  0:26  sleep   76M  63M   0.00%
    spamd 14411 21051  00:53:09  0:11  sleep   70M  51M   0.00%
    spamd 22780 21051  02:37:46  0:06  sleep   71M  57M   0.00%
    spamd 22775 21051  02:37:32  0:04  sleep   70M  56M   0.00%
    spamd 22776 21051  02:37:32  0:01  sleep   68M  54M   0.00%

spamd startup command:

    ulimit -n 256
    spamd -d -u spamd -r $pidfile -x -m 8 --syslog=local2 --syslog-socket=inet -i -A $me,$em1,$em2,$em3,$em4
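The process listing can be checked the same way. The sketch below parses rows in the USER PID PPID STIME TIME STATE SIZE RES CPU layout shown above and separates the children of the spamd master (PID 21051) that were using CPU from those that were not. The Proc class and the parse_rows and children_of helpers are hypothetical, and the parser assumes whitespace-separated columns with CPU as the last field ending in "%".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Proc:
    user: str
    pid: int
    ppid: int
    stime: str
    cpu_pct: float


def parse_rows(text: str) -> list[Proc]:
    """Parse rows laid out as USER PID PPID STIME TIME STATE SIZE RES CPU.

    ASSUMPTION: whitespace-separated columns, CPU last, e.g. "11.40%".
    """
    procs: list[Proc] = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 9 or not fields[1].isdigit():
            continue  # header line or anything that is not a process row
        user, pid, ppid, stime, _time, _state, _size, _res, cpu = fields
        procs.append(Proc(user, int(pid), int(ppid), stime,
                          float(cpu.rstrip("%"))))
    return procs


def children_of(procs: list[Proc], parent_pid: int) -> list[Proc]:
    """Return the processes whose parent is parent_pid (here, the spamd master)."""
    return [p for p in procs if p.ppid == parent_pid]


# Rows copied from the listing above.
PS_ROWS = """
USER    PID  PPID     STIME  TIME  STATE  SIZE  RES     CPU
spamd 23740 21051  08:55:39  0:17  sleep   72M  59M  11.40%
spamd 20459 21051  08:16:13  7:08  cpu/1   77M  66M   6.27%
root  21051     1  09:47:26  2:17  sleep   66M  58M   0.03%
spamd 27830 21051  03:39:24  5:28  sleep   81M  70M   0.00%
spamd 27926 21051  03:39:37  0:26  sleep   76M  63M   0.00%
spamd 14411 21051  00:53:09  0:11  sleep   70M  51M   0.00%
spamd 22780 21051  02:37:46  0:06  sleep   71M  57M   0.00%
spamd 22775 21051  02:37:32  0:04  sleep   70M  56M   0.00%
spamd 22776 21051  02:37:32  0:01  sleep   68M  54M   0.00%
"""

if __name__ == "__main__":
    kids = children_of(parse_rows(PS_ROWS), 21051)
    active = [p.pid for p in kids if p.cpu_pct > 0]
    idle = [p.pid for p in kids if p.cpu_pct == 0]
    print(f"{len(kids)} children; using CPU: {active}; idle: {idle}")
```

On these rows it reports 8 children, with only 23740 and 20459 using CPU; the other six are the long-lived children started between 00:53 and 03:39.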
