Howdy list,
I finally got a chance to run ktrace and kdump (freebsd things)
on my "runaway" sqwebmail processes today. (They don't show
up terribly often, but when they DO it brings my system to a
crawl.)
Basically, from what I could tell, one of these sqwebmail processes
would cause 3 or 4 others to hang. Either that, or it's spawning
them as subprocesses. I don't know how to tell if a process is a parent
or child under FreeBSD...
Below is a `ps auxwww | grep sqwebmail` output near the beginning
of my ktracing @ 9:34 AM:
------ BEGIN top output @ 9:34 AM ------
vpopmail 36068 17.3 6.2 130320 64528 ?? R 12:27AM 71:29.37 sqwebmail
vpopmail 36160 17.7 6.2 130544 64640 ?? R 12:28AM 71:28.22 sqwebmail
vpopmail 36315 17.7 6.2 129936 64336 ?? R 12:28AM 71:16.03 sqwebmail
vpopmail 36071 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36163 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36318 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36322 0.0 6.9 128864 72144 ?? RN 12:28AM 69:52.01 sqwebmail
vpopmail 36325 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
root 55261 0.0 0.0 272 128 p1 R+ 9:34AM 0:00.00 grep sqwebmail
------ END top output @ 9:34 AM ------
And below is a session transcript from when I started killing the bad
processes (and had already run ktrace for a few minutes on most of them):
------ BEGIN top output @ 9:39 AM ------
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36068 17.3 6.2 130784 64760 ?? R 12:27AM 72:05.76 sqwebmail
vpopmail 36315 17.9 6.2 130400 64568 ?? R 12:28AM 71:52.35 sqwebmail
vpopmail 36160 16.5 6.2 130992 64864 ?? R 12:28AM 72:03.73 sqwebmail
vpopmail 36071 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36163 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36318 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36322 0.0 6.1 128880 63880 ?? RN 12:28AM 69:52.48 sqwebmail
vpopmail 36325 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
root 57313 0.0 0.0 272 128 p1 R+ 9:38AM 0:00.00 grep sqwebmail
[9:[EMAIL PROTECTED]:[~]# kill 36068
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36315 17.9 9.7 130416 101388 ?? R 12:28AM 71:54.45 sqwebmail
vpopmail 36160 17.9 6.2 131024 64880 ?? R 12:28AM 72:05.97 sqwebmail
vpopmail 36163 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36318 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36322 0.0 6.1 128880 63880 ?? RN 12:28AM 69:52.48 sqwebmail
vpopmail 36325 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
[9:[EMAIL PROTECTED]:[~]# kill 36160
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36315 29.4 9.6 130496 100548 ?? R 12:28AM 72:00.67 sqwebmail
vpopmail 36318 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36322 0.0 6.1 128880 63880 ?? RN 12:28AM 69:52.48 sqwebmail
vpopmail 36325 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36315 43.5 8.9 130592 92876 ?? R 12:28AM 72:08.03 sqwebmail
vpopmail 36318 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36322 0.0 6.1 128880 63880 ?? RN 12:28AM 69:52.48 sqwebmail
vpopmail 36325 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
[9:[EMAIL PROTECTED]:[~]# kill 36322
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36315 61.9 6.2 66620 65068 ?? R 12:28AM 72:56.65 sqwebmail
vpopmail 36318 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 36322 0.0 6.1 128880 63884 ?? RN 12:28AM 69:53.08 sqwebmail
vpopmail 36325 0.0 0.0 1820 0 ?? IW - 0:00.00 sqwebmail
vpopmail 58658 0.0 0.0 0 0 ?? Z 9:39AM 0:00.00 (sqwebmail)
[9:[EMAIL PROTECTED]:[~]# kill 36315
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 58743 0.0 0.0 0 0 ?? Z 9:40AM 0:00.00 (sqwebmail)
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
Exit 1
------ END top output @ 9:39 AM ------
In the third from the bottom `ps auxwww | grep sqwebmail` output above,
process 36315 appears to be taking up roughly 66M of memory, and process
36322 appears to be taking up roughly 128M of memory. Not sure if that's
shared with other processes though...
Here is some ktrace data from the processes above:
------ BEGIN ktrace.out from process 36068 ------
36068 sqwebmail CALL break(0xbf50000)
36068 sqwebmail RET break 0
36068 sqwebmail CALL break(0xfdb5000)
36068 sqwebmail RET break 0
36068 sqwebmail CALL read(0x4,0x80d8000,0x4000)
36068 sqwebmail GIO fd 4 read 16384 bytes
"*
*
*
*
*
*
< this goes on for tens of thousands of lines >
*
*
*
*
*
*
*
*
*
"
36068 sqwebmail RET read 16384/0x4000
36068 sqwebmail CALL break(0xbfce000)
36068 sqwebmail RET break 0
36068 sqwebmail CALL break(0xfeb1000)
36068 sqwebmail RET break 0
36068 sqwebmail PSIG SIGTERM caught handler=0x807138c mask=0x0 code=0x0
36068 sqwebmail CALL write(0x2,0xbfbff7c8,0x9)
36068 sqwebmail GIO fd 2 wrote 9 bytes
"sqwebmail"
36068 sqwebmail RET write 9
36068 sqwebmail CALL write(0x2,0x281b8e8d,0xb)
36068 sqwebmail GIO fd 2 wrote 11 bytes
" in free():"
36068 sqwebmail RET write 11/0xb
36068 sqwebmail CALL write(0x2,0x281b8cc9,0xa)
36068 sqwebmail GIO fd 2 wrote 10 bytes
" warning: "
36068 sqwebmail RET write 10/0xa
36068 sqwebmail CALL write(0x2,0x281b8e6e,0xf)
36068 sqwebmail GIO fd 2 wrote 15 bytes
"recursive call
"
36068 sqwebmail RET write 15/0xf
36068 sqwebmail CALL exit(0)
------ END ktrace.out from process 36068 ------
The longer I let the process (and ktrace) run, the
more asterisks this file fills up with in the center.
------ BEGIN ktrace.out from process 36322 ------
36322 sqwebmail CALL read(0x4,0x80d8000,0x4000)
36322 sqwebmail GIO fd 4 read 16384 bytes
"*
*
*
*
< this goes on for tens of thousands of lines >
*
*
*
*
"
36322 sqwebmail RET read 16384/0x4000
36322 sqwebmail CALL break(0xbede000)
36322 sqwebmail RET break 0
36322 sqwebmail CALL break(0xfcd1000)
36322 sqwebmail RET break 0
36322 sqwebmail PSIG SIGTERM caught handler=0x807138c mask=0x0 code=0x0
36322 sqwebmail CALL write(0x2,0xbfbff7c8,0x9)
36322 sqwebmail GIO fd 2 wrote 9 bytes
"sqwebmail"
36322 sqwebmail RET write 9
36322 sqwebmail CALL write(0x2,0x281b8e8d,0xb)
36322 sqwebmail GIO fd 2 wrote 11 bytes
" in free():"
36322 sqwebmail RET write 11/0xb
36322 sqwebmail CALL write(0x2,0x281b8cc9,0xa)
36322 sqwebmail GIO fd 2 wrote 10 bytes
" warning: "
36322 sqwebmail RET write 10/0xa
36322 sqwebmail CALL write(0x2,0x281b8e6e,0xf)
36322 sqwebmail GIO fd 2 wrote 15 bytes
"recursive call
"
36322 sqwebmail RET write 15/0xf
36322 sqwebmail CALL exit(0)
------ END ktrace.out from process 36322 ------
Ditto.
The other process's ktrace.out's look pretty much
the same.
Any ideas? Anyone know how I can troubleshoot/debug
this more accurately?
Thanks!
--
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v) 423-559-5145 (f)
http://www.wingnet.net