Howdy list,

I finally got a chance to run ktrace and kdump (freebsd things)
on my "runaway" sqwebmail processes today. (They don't show
up terribly often, but when they DO it brings my system to a
crawl.)

Basically, from what I could tell, one of these sqwebmail processes
would cause 3 or 4 others to hang. Either that, or it's spawning
them as subprocesses. I don't know how to tell if a process is a parent
or child under FreeBSD...

Below is a `ps auxwww | grep sqwebmail` output near the beginning
of my ktracing @ 9:34 AM:

------ BEGIN top output @ 9:34 AM ------
vpopmail 36068 17.3  6.2 130320 64528  ??  R    12:27AM  71:29.37 sqwebmail
vpopmail 36160 17.7  6.2 130544 64640  ??  R    12:28AM  71:28.22 sqwebmail
vpopmail 36315 17.7  6.2 129936 64336  ??  R    12:28AM  71:16.03 sqwebmail
vpopmail 36071  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36163  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36318  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36322  0.0  6.9 128864 72144  ??  RN   12:28AM  69:52.01 sqwebmail
vpopmail 36325  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
root     55261  0.0  0.0   272  128  p1  R+    9:34AM   0:00.00 grep sqwebmail
------ END top output @ 9:34 AM ------

And below is a session transcript from when I started killing the bad
processes (and had already run ktrace for a few minutes on most of them):

------ BEGIN top output @ 9:39 AM ------
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36068 17.3  6.2 130784 64760  ??  R    12:27AM  72:05.76 sqwebmail
vpopmail 36315 17.9  6.2 130400 64568  ??  R    12:28AM  71:52.35 sqwebmail
vpopmail 36160 16.5  6.2 130992 64864  ??  R    12:28AM  72:03.73 sqwebmail
vpopmail 36071  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36163  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36318  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36322  0.0  6.1 128880 63880  ??  RN   12:28AM  69:52.48 sqwebmail
vpopmail 36325  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
root     57313  0.0  0.0   272  128  p1  R+    9:38AM   0:00.00 grep sqwebmail
[9:[EMAIL PROTECTED]:[~]# kill 36068
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36315 17.9  9.7 130416 101388  ??  R    12:28AM  71:54.45 sqwebmail
vpopmail 36160 17.9  6.2 131024 64880  ??  R    12:28AM  72:05.97 sqwebmail
vpopmail 36163  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36318  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36322  0.0  6.1 128880 63880  ??  RN   12:28AM  69:52.48 sqwebmail
vpopmail 36325  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
[9:[EMAIL PROTECTED]:[~]# kill 36160
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36315 29.4  9.6 130496 100548  ??  R    12:28AM  72:00.67 sqwebmail
vpopmail 36318  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36322  0.0  6.1 128880 63880  ??  RN   12:28AM  69:52.48 sqwebmail
vpopmail 36325  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36315 43.5  8.9 130592 92876  ??  R    12:28AM  72:08.03 sqwebmail
vpopmail 36318  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36322  0.0  6.1 128880 63880  ??  RN   12:28AM  69:52.48 sqwebmail
vpopmail 36325  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
[9:[EMAIL PROTECTED]:[~]# kill 36322
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 36315 61.9  6.2 66620 65068  ??  R    12:28AM  72:56.65 sqwebmail
vpopmail 36318  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 36322  0.0  6.1 128880 63884  ??  RN   12:28AM  69:53.08 sqwebmail
vpopmail 36325  0.0  0.0  1820    0  ??  IW   -         0:00.00 sqwebmail
vpopmail 58658  0.0  0.0     0    0  ??  Z     9:39AM   0:00.00  (sqwebmail)
[9:[EMAIL PROTECTED]:[~]# kill 36315
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
vpopmail 58743  0.0  0.0     0    0  ??  Z     9:40AM   0:00.00  (sqwebmail)
[9:[EMAIL PROTECTED]:[~]# ps auxwww | grep sqwebmail
Exit 1
------ END top output @ 9:39 AM ------

In the third from the bottom `ps auxwww | grep sqwebmail` output above,
process 36315 appears to be taking up roughly 66M of memory, and process
36322 appears to be taking up roughly 128M of memory. Not sure if that's
shared with other processes though...

Here is some ktrace data from the processes above:

------ BEGIN ktrace.out from process 36068 ------
 36068 sqwebmail CALL  break(0xbf50000)
 36068 sqwebmail RET   break 0
 36068 sqwebmail CALL  break(0xfdb5000)
 36068 sqwebmail RET   break 0
 36068 sqwebmail CALL  read(0x4,0x80d8000,0x4000)
 36068 sqwebmail GIO   fd 4 read 16384 bytes
       "*
        *
        *
        *
        *
        *
< this goes on for tens of thousands of lines >
        *
        *
        *
        *
        *
        *
        *
        *
        *
       "
 36068 sqwebmail RET   read 16384/0x4000
 36068 sqwebmail CALL  break(0xbfce000)
 36068 sqwebmail RET   break 0
 36068 sqwebmail CALL  break(0xfeb1000)
 36068 sqwebmail RET   break 0
 36068 sqwebmail PSIG  SIGTERM caught handler=0x807138c mask=0x0 code=0x0
 36068 sqwebmail CALL  write(0x2,0xbfbff7c8,0x9)
 36068 sqwebmail GIO   fd 2 wrote 9 bytes
       "sqwebmail"
 36068 sqwebmail RET   write 9
 36068 sqwebmail CALL  write(0x2,0x281b8e8d,0xb)
 36068 sqwebmail GIO   fd 2 wrote 11 bytes
       " in free():"
 36068 sqwebmail RET   write 11/0xb
 36068 sqwebmail CALL  write(0x2,0x281b8cc9,0xa)
 36068 sqwebmail GIO   fd 2 wrote 10 bytes
       " warning: "
 36068 sqwebmail RET   write 10/0xa
 36068 sqwebmail CALL  write(0x2,0x281b8e6e,0xf)
 36068 sqwebmail GIO   fd 2 wrote 15 bytes
       "recursive call
       "
 36068 sqwebmail RET   write 15/0xf
 36068 sqwebmail CALL  exit(0)
------ END ktrace.out from process 36068 ------


The longer I let the process (and ktrace) run, the
more asterisks this file fills up with in the center.


------ BEGIN ktrace.out from process 36322 ------
 36322 sqwebmail CALL  read(0x4,0x80d8000,0x4000)
 36322 sqwebmail GIO   fd 4 read 16384 bytes
       "*
        *
        *
        *
< this goes on for tens of thousands of lines >
        *
        *
        *
        *
       "
 36322 sqwebmail RET   read 16384/0x4000
 36322 sqwebmail CALL  break(0xbede000)
 36322 sqwebmail RET   break 0
 36322 sqwebmail CALL  break(0xfcd1000)
 36322 sqwebmail RET   break 0
 36322 sqwebmail PSIG  SIGTERM caught handler=0x807138c mask=0x0 code=0x0
 36322 sqwebmail CALL  write(0x2,0xbfbff7c8,0x9)
 36322 sqwebmail GIO   fd 2 wrote 9 bytes
       "sqwebmail"
 36322 sqwebmail RET   write 9
 36322 sqwebmail CALL  write(0x2,0x281b8e8d,0xb)
 36322 sqwebmail GIO   fd 2 wrote 11 bytes
       " in free():"
 36322 sqwebmail RET   write 11/0xb
 36322 sqwebmail CALL  write(0x2,0x281b8cc9,0xa)
 36322 sqwebmail GIO   fd 2 wrote 10 bytes
       " warning: "
 36322 sqwebmail RET   write 10/0xa
 36322 sqwebmail CALL  write(0x2,0x281b8e6e,0xf)
 36322 sqwebmail GIO   fd 2 wrote 15 bytes
       "recursive call
       "
 36322 sqwebmail RET   write 15/0xf
 36322 sqwebmail CALL  exit(0)
------ END ktrace.out from process 36322 ------

Ditto.

The other process's ktrace.out's look pretty much
the same.

Any ideas? Anyone know how I can troubleshoot/debug
this more accurately?

Thanks!

-- 
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v)  423-559-5145 (f)
http://www.wingnet.net



Reply via email to