On Fri, 31 Mar 2006, Daryl C. W. O'Shea wrote:

(copying Justin since this has to do with pre-forking)

Dan Mahoney, System Admin wrote:
On Fri, 10 Mar 2006, Daryl C. W. O'Shea wrote:

On 3/10/2006 11:22 AM, Dan Mahoney, System Admin wrote:

Okay,

I'm still getting these issues. I've corrected every other issue that's plagued us, and the thing still locks up. USUALLY when a user gets some form of dictionary spam. For the users I can identify I've been keeping copies of their stuff.

NOTE: This is under a stock 3.1.1, if there are any other patches I should be using from the previous conversations that are NOT in 3.1.1, please let me know, and I'll make sure I have those too. I'm seeing lots of the following:

Mar 30 21:52:14 quark spamd[45835]: __alarm__
Mar 30 21:52:14 quark spamd[45835]: __alarm__
Mar 30 21:52:14 quark spamd[45835]: spamd: copy_config timeout (with empty $@), respawning child process after 25 messages at /usr/local/bin/spamd line 982.
Mar 30 21:52:16 quark spamd[52479]: __alarm__
Mar 30 21:52:16 quark spamd[52479]: __alarm__
Mar 30 21:52:16 quark spamd[52479]: spamd: copy_config timeout (with empty $@), respawning child process after 9 messages at /usr/local/bin/spamd line 982.

This indicates that the patch from bug 4699 is working -- spamd now recognizes that the alarm timed out on copy_config.


And also some of this:

Mar 30 21:52:31 quark spamd[42292]: syswrite() on closed filehandle GEN88 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451. Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330. Mar 30 21:52:31 quark spamd[42292]: prefork: write of ping failed to 52479 fd=: at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330. Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127. Mar 30 21:52:31 quark spamd[42292]: prefork: killing failed child 52479 fd= at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127. Mar 30 21:52:31 quark spamd[42292]: prefork: killed child 52479 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141. Mar 30 21:52:31 quark spamd[42292]: syswrite() on closed filehandle GEN70 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451. Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330. Mar 30 21:52:31 quark spamd[42292]: prefork: write of ping failed to 45835 fd=: at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330. Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127. Mar 30 21:52:31 quark spamd[42292]: prefork: killing failed child 45835 fd= at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127. Mar 30 21:52:31 quark spamd[42292]: prefork: killed child 45835 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141.

This indicates that the child is exiting, but SpamdForkScaling doesn't know about it until a ping fails 150 seconds later, so a new child isn't spawned for a long time after one of them commits suicide.


Example at or around Mar 30 01:48:16 in this file:

http://www.gushi.org/maillog33106-0.txt

And another similar lockup at Mar 30 21:49:50 -- SAME USER, go figure.

I don't have archived copies of this user's mail -- yet. I've set up archiving for them, and we have everything from now forward, but I'm convinced there's SOMETHING in the spam they're getting that causes a lockup.

I think it's actually load related... spamd is timing out the copy_config sooner than it's really taking under high load. If you were to change the alarm value from 10 to 100 or so, around spamd line 949 this may go away.

Oddly enough, one of our other machines on our network (which also runs spamd) also seems to die around the same time. I'm concerned about IT as well, but less than this one. Still, snagging logs there is probably not a bad idea.

Any idea what sort of load averages you've got when this starts to happen? It looks like it starts off with a couple children timing out, then you become short on children, mail starts stacking up, and it snowballs from there.

I know somewhere in those logs it started rejecting mail on load average 12. A simple one-liner in spamd to echo the load into the logs could be useful (I don't need a patch, but telling me what to put and where to put it could be useful). Alternatively I could just do something with logger(1), echo(1), uptime(1) and cron.

Somewhere along the lines last night I also lost connection to AIM (which runs from that netblock) so it's quite possibly network congestion related. Even so, if I theoretically had 30 seconds of latency 6 hours ago, spamd should theoretically NOT still be hanging now...

BTW, we should probably find or open a bugzilla ticket for this. Bug 4699 is related. The pre-fork issue is probably another bug of its own.

If you say to do that, then I'll certainly do that -- last thing I wanted to do was open a false bug, at least until the other unrelated annoyances were cleared up (and thank you guys immensely for that). Didn't you say you yourself (Daryl) were having a similar issue?

-Dan

--


--------Dan Mahoney--------
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---------------------------

Reply via email to