On Mon, 1 Jun 2009, Rich Shepard wrote:

On Mon, 1 Jun 2009, John Hardin wrote:

 If these are system-generated messages, something is improperly training
 SA that they are spam. Do you use autolearn?

John,

 No. Once a week or so I run sa-learn specifying spam on the spam-uncaught
mbox file. Less frequently I run it on mail list files specifying them as
ham.

And I assume you look at the spam-uncaught file before learning it?

If some log files got in there and were learned, that could explain the deterioration.

Have you kept your spam and ham corpora? I would suggest wiping your Bayes database and retraining it, after reviewing the corpora.

 Primarily I'd suggest you exclude locally-generated emails from SA
 completely. If you'd post the Received: headers from such a message and
 the procmail stanza where you pass messages to SA for scoring I could
 suggest something.

 Here are all headers from the mail log summary:

From [email protected] Mon Jun  1 11:25:44 2009
Return-Path: <[email protected]>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.5-ph20040310.0 (2008-06-10) on
        salmo.appl-ecosys.com
X-Spam-Level: ****
X-Spam-Status: Yes, score=4.9 required=4.0 tests=ALL_TRUSTED,AWL,BAYES_99,
         EMPTY_BODY,NORMAL_HTTP_TO_IP,NUMERIC_HTTP_ADDR,URI_HEX,URI_NOVOWEL
         autolearn=no version=3.2.5-ph20040310.0
X-Spam-Report:
         * -1.3 ALL_TRUSTED Passed through trusted hosts only via SMTP
         *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
         *      [score: 1.0000]
         *  2.5 EMPTY_BODY BODY: Message has subject but no body
         *  0.0 NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in
         URL
         *  0.4 URI_HEX URI: URI hostname has long hexadecimal sequence
         *  0.0 NUMERIC_HTTP_ADDR URI: Uses a numeric IP address in URL
         *  1.6 URI_NOVOWEL URI: URI hostname has long non-vowel sequence
         * -1.8 AWL AWL: From: address is in the auto white-list
X-Original-To: [email protected]
Delivered-To: [email protected]
Received: from salmo.appl-ecosys.com (localhost.localdomain [127.0.0.1])
         by salmo.appl-ecosys.com (Postfix) with ESMTP id 8DA0F1026
         for <[email protected]>; Mon,  1 Jun 2009 11:25:44 -0700
         (PDT)

Okay, let's key on that one.

## Call SpamAssassin
:0fw: spamassassin.lock
* < 256000
|  spamassassin

:0 fw: spamassassin.lock
* < 256000
* ! ^TO_abuse@
* ! ^List-Id: .*<?use...@.]spamassassin\.apache\.org>?
* ! ^Received: from salmo\.appl-ecosys\.com \(localhost\.localdomain \[127\.0\.0\.1\]\) by salmo\.appl-ecosys\.com
| /usr/bin/spamc

Using spamc creates less load than launching spamassassin from scratch for every email, but you do have to manage the daemon (i.e. restart it if the rules change).

Are your resources really so limited that you want to serialize all email delivery? As a middle ground you might consider per-user lockfiles instead, e.g.:

   :0 fw: $HOME/.spamassassin.lock

I'd also suggest upping the size limit a bit, but that's not a big issue.
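
Putting those suggestions together, a recipe might look roughly like the sketch below. The 512000 limit is only an illustrative value (no specific number was proposed here), and the Received: condition is deliberately trimmed to the part of the header that sits on its first line, so it does not depend on how the folded header is joined before matching. Test it against a saved copy of a log summary message before relying on it.

```procmail
# Sketch only, not a drop-in replacement.
# - per-user lockfile instead of one global spamassassin.lock
# - larger size cap (512000 is an assumed, illustrative value)
# - skip mail that was generated on salmo itself
# - hand the message to the spamd daemon via spamc
:0 fw: $HOME/.spamassassin.lock
* < 512000
* ! ^Received: from salmo\.appl-ecosys\.com \(localhost\.localdomain \[127\.0\.0\.1\]\)
| /usr/bin/spamc
```

Note that this trimmed condition would exempt every message submitted through localhost on that machine, which is the intent for system-generated mail but may be broader than you want if other local software sends mail the same way.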

There are more complex things you can do; you might want to take a look at http://www.impsec.org/~jhardin/antispam/spamassassin.procmail

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [email protected]    FALaholic #11174     pgpk -a [email protected]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  We have to realize that people who run the government can and do
  change. Our society and laws must assume that bad people -
  criminals even - will run the government, at least part of the
  time.                                               -- John Gilmore
-----------------------------------------------------------------------
 5 days until the 65th anniversary of D-Day
