On 10/03/2008 4:46 AM, Johann Spies wrote:
> Hello Daryl,
> 
> Thanks for your reply.
> 
> On Fri, Mar 07, 2008 at 02:23:13PM -0500, Daryl C. W. O'Shea wrote:
>> On 05/03/2008 5:44 AM, Johann Spies wrote:
>>> On Thu, Feb 28, 2008 at 02:44:02PM +0200, Johann Spies wrote:
>>>> On a new mailserver with 8 GB RAM and two dual-core CPUs we get regular
>>>> messages in the log:
>>>>
>>>> Feb 28 12:52:43 mail2 spamd[32558]: prefork: child states: BIBBB
>>>> Feb 28 12:52:44 mail2 spamd[459]: rules: failed to run TVD_STOCK1 test, skipping:
>>>> Feb 28 12:52:44 mail2 spamd[459]:  (child processing timeout at /usr/sbin/spamd line 1246.
>>>> Feb 28 12:52:44 mail2 spamd[459]: )
>>>>
>>>> And every time it involves TVD_STOCK1.
>> The rule doesn't look particularly bad.  Have you been able to capture a
>> sample email that causes this?  Perhaps it's an issue with a large
>> text/plain body with no line breaks.
> 
> 
> Unfortunately I do not know which messages caused the problem.
> 
>> 3.2.3 changes the way DNS timeouts are calculated (SA used to time out
>> its second round of DNS lookups way too early).  Is the machine (or
>> specifically the spamd children) actually busy, or is everything sitting
>> rather idle?
> 
> I am comparing an older server (mail1) with the new server (mail2) in
> this case.  Both run exim, clamav and spamassassin.  These statistics
> cover a period of about 24 hours on 7/8 March last week.
> 
> 
>                    mail1            mail2
> SA-version         3.0.3-2sarge1    3.2.3-0.volatile1
> Messages scanned   43338            22873
> Timeouts (exim)    0                36
> --max-children     5                15
> RAM                4G               7G

Try to roughly compare the actual amount of CPU time that the spamd
children are using on each server.  3.2 will use more resources than
3.0, but not twice as much (and throughput shouldn't be 1/2 on faster
hardware).  I suspect you'll find the children on the new server mainly
idle compared to those on the old server.
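
One rough way to make that comparison is to sum the cumulative CPU time
and the elapsed time that ps reports for the spamd processes on each
box and compare the ratios.  The following is only a sketch: it assumes
a procps-style ps (Linux) and that the processes show up under the
command name "spamd"; the function names are made up for illustration.
Children get replaced over time, so treat the result as a snapshot.

```python
import subprocess

def to_seconds(stamp):
    """Convert a ps-style [[DD-]HH:]MM:SS value to seconds."""
    days = 0
    if "-" in stamp:
        day_part, stamp = stamp.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in stamp.split(":")]
    while len(parts) < 3:      # etime may omit the hours field
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def cpu_and_elapsed(ps_output):
    """Sum CPU and elapsed seconds over lines of 'pid time etime'."""
    cpu = elapsed = 0
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue           # skip blank or unexpected lines
        cpu += to_seconds(fields[1])
        elapsed += to_seconds(fields[2])
    return cpu, elapsed

def spamd_ps_output():
    """Ask ps for every process named spamd (the parent is included).

    Assumption: procps ps, where -C selects by command name and an
    empty header (pid=) suppresses the header line.
    """
    result = subprocess.run(
        ["ps", "-C", "spamd", "-o", "pid=", "-o", "time=", "-o", "etime="],
        capture_output=True, text=True, check=False)
    return result.stdout
```

Run cpu_and_elapsed(spamd_ps_output()) on mail1 and mail2 and compare
cpu / elapsed.  A much lower ratio on the new server would suggest the
children are mostly waiting rather than working.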

> I have activated the --debug option now and so far have seen 14
> dns-timeouts in the past 40 minutes on mail2.

Keep an eye on your log volume free space if you're not rotating out
logs based on size.

Are the timeouts for the same zone(s)?  Test lookups against those zones
manually.  Is your upstream (or downstream) bandwidth usage near full
capacity?  Do the two servers share the same DNS setup?  Is there
something else running on the new server that is driving the load
average up (a common cause of the "child processing timeout" message)?
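
For the manual lookups, a small timing script run on both servers can
show whether one resolver path is slow.  This is a sketch under
assumptions: the zone name below is a placeholder (substitute the zones
from your debug log), and it uses the usual DNSBL convention of
reversed octets plus the zone, with 127.0.0.2 as the conventional test
entry.  It goes through the system resolver, not SA's own DNS code, so
it only approximates what spamd sees.

```python
import socket
import time

def dnsbl_name(ip, zone):
    """Build an IPv4 DNSBL query name: reversed octets, then the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def timed_lookup(name, resolver=socket.gethostbyname):
    """Resolve name once; return (seconds_elapsed, answer_or_None)."""
    start = time.monotonic()
    try:
        answer = resolver(name)
    except OSError:            # socket.gaierror and herror are OSErrors
        answer = None          # not listed, NXDOMAIN, or resolver failure
    return time.monotonic() - start, answer

def report(zones, ip="127.0.0.2"):
    """Print how long each zone takes to answer for the test address."""
    for zone in zones:
        elapsed, answer = timed_lookup(dnsbl_name(ip, zone))
        print(f"{zone}: {elapsed:.2f}s -> {answer}")

# Example call (placeholder zone, not taken from your logs):
# report(["dnsbl.example.org"])
```

Lookups that take several seconds, or that are consistently slower on
mail2 than on mail1, would point at the DNS setup and not at SA.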

A little more work... review the debug output for a bunch of messages
(you'll have to separate each message's debug info from the combined
debug log).  Which parts of the scanning process are taking the most
time?
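
As a starting point for that review, something like the following can
split the combined log by spamd child PID and list the largest gaps
between consecutive lines from the same child.  It is a sketch that
assumes the syslog line format shown in the log excerpt above; the year
is not in the log, so it is passed in.  A child scans many messages one
after another, so you will still need to cut each child's lines into
individual messages by eye, and syslog timestamps only resolve to one
second.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches e.g. "Feb 28 12:52:44 mail2 spamd[459]: rules: failed ..."
LINE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+spamd\[(\d+)\]:\s?(.*)$")

def split_by_child(lines, year=2008):
    """Group log lines by spamd PID as (timestamp, text) tuples."""
    by_pid = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue           # not a spamd line
        stamp = datetime.strptime(
            f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        by_pid[int(match.group(2))].append((stamp, match.group(3)))
    return by_pid

def largest_gaps(entries, top=5):
    """Return the biggest pauses as (seconds, text of the line before)."""
    gaps = []
    for (t0, text0), (t1, _) in zip(entries, entries[1:]):
        gaps.append(((t1 - t0).total_seconds(), text0))
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]
```

The line printed next to each large gap is the last thing the child
logged before it went quiet, which is usually a good hint about where
the time went.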

Daryl
