Re: Identifying Source of False Positives

John Hardin Mon, 01 Jun 2009 13:36:43 -0700

On Mon, 1 Jun 2009, Rich Shepard wrote:

On Mon, 1 Jun 2009, John Hardin wrote:

 If these are system-generated messages, something is improperly training
 SA that they are spam. Do you use autolearn?


John,

 No. Once a week or so I run sa-learn specifying spam on the spam-uncaught
mbox file. Less frequently I run it on mail list files specifying them as
ham.


And I assume you look at the spam-uncaught file before learning it?

If some log files got in there and were learned, that could explain the
deterioration.
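
One way to check for that before (re)learning is to scan the mbox for
messages that never left the local machine. Here is a rough Python sketch,
not something taken from the setup described in this thread: it assumes
"locally generated" means every Received: header mentions [127.0.0.1], and
the function names are made up for illustration.

```python
import mailbox
import re

# Assumption: a message is "local" if all of its Received: headers
# mention the loopback address in square brackets.
LOCAL_RE = re.compile(r"\[127\.0\.0\.1\]")


def is_locally_generated(msg):
    """Return True if the message has Received: headers and all are loopback."""
    received = msg.get_all("Received", [])
    return bool(received) and all(LOCAL_RE.search(h) for h in received)


def find_local_messages(mbox_path):
    """Return (key, subject) pairs for locally generated messages in an mbox."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            (key, msg.get("Subject", ""))
            for key, msg in box.iteritems()
            if is_locally_generated(msg)
        ]
    finally:
        box.close()
```

Calling find_local_messages() on the spam-uncaught file would return the
messages worth checking by hand. It only reads the mbox; it does not change
anything sa-learn has already learned.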

Have you kept your spam and ham corpora? I would suggest wiping your Bayes
database and retraining it, after reviewing the corpora.
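
The wipe-and-retrain steps could be scripted along these lines. This is only
a sketch: the mbox file names passed in are placeholders, the helper names
are hypothetical, and it defaults to a dry run that just prints the sa-learn
commands (--clear, then --spam and --ham with --mbox, then --dump magic to
look at the token counts afterwards).

```python
import subprocess


def retrain_commands(spam_mbox, ham_mbox):
    """Return the sa-learn invocations for a wipe-and-retrain, in order."""
    return [
        ["sa-learn", "--clear"],                      # wipe the Bayes database
        ["sa-learn", "--spam", "--mbox", spam_mbox],  # relearn reviewed spam
        ["sa-learn", "--ham", "--mbox", ham_mbox],    # relearn reviewed ham
        ["sa-learn", "--dump", "magic"],              # show resulting counts
    ]


def retrain(spam_mbox, ham_mbox, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    cmds = retrain_commands(spam_mbox, ham_mbox)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

Run it with dry_run=True first and only switch to dry_run=False once both
corpora have been reviewed, since --clear discards everything learned so far.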

 Primarily I'd suggest you exclude locally-generated emails from SA
 completely. If you'd post the Received: headers from such a message and
 the procmail stanza where you pass messages to SA for scoring I could
 suggest something.


 Here are all headers from the mail log summary:

From [email protected] Mon Jun  1 11:25:44 2009

Return-Path: <[email protected]>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.5-ph20040310.0 (2008-06-10) on
        salmo.appl-ecosys.com
X-Spam-Level: ****
X-Spam-Status: Yes, score=4.9 required=4.0 tests=ALL_TRUSTED,AWL,BAYES_99,
         EMPTY_BODY,NORMAL_HTTP_TO_IP,NUMERIC_HTTP_ADDR,URI_HEX,URI_NOVOWEL
         autolearn=no version=3.2.5-ph20040310.0
X-Spam-Report:
         * -1.3 ALL_TRUSTED Passed through trusted hosts only via SMTP
         *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
         *      [score: 1.0000]
         *  2.5 EMPTY_BODY BODY: Message has subject but no body
         *  0.0 NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in
         URL
         *  0.4 URI_HEX URI: URI hostname has long hexadecimal sequence
         *  0.0 NUMERIC_HTTP_ADDR URI: Uses a numeric IP address in URL
         *  1.6 URI_NOVOWEL URI: URI hostname has long non-vowel sequence
         * -1.8 AWL AWL: From: address is in the auto white-list
X-Original-To: [email protected]
Delivered-To: [email protected]
Received: from salmo.appl-ecosys.com (localhost.localdomain [127.0.0.1])
         by salmo.appl-ecosys.com (Postfix) with ESMTP id 8DA0F1026
         for <[email protected]>; Mon,  1 Jun 2009 11:25:44 -0700
         (PDT)


Okay, let's key on that one.

## Call SpamAssassin
:0fw: spamassassin.lock
* < 256000
|  spamassassin


:0 fw: spamassassin.lock
* < 256000
* ! ^TO_abuse@
* ! ^List-Id: .*<?use...@.]spamassassin\.apache\.org>?
* ! ^Received: from salmo\.appl-ecosys\.com \(localhost\.localdomain \[127\.0\.0\.1\]\) by salmo\.appl-ecosys\.com
| /usr/bin/spamc

Using spamc creates less load than launching spamassassin from scratch for
every email, but you do have to manage the daemon (i.e. restart it if the
rules change).

Are your resources really so limited that you want to serialize all email
delivery? As a middle ground you might consider per-user lockfiles
instead, e.g.:


   :0 fw: $HOME/.spamassassin.lock

I'd also suggest upping the size limit a bit, but that's not a big issue.

There are more complex things you can do; you might want to take a look at
http://www.impsec.org/~jhardin/antispam/spamassassin.procmail


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [email protected]    FALaholic #11174     pgpk -a [email protected]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  We have to realize that people who run the government can and do
  change. Our society and laws must assume that bad people -
  criminals even - will run the government, at least part of the
  time.                                               -- John Gilmore
-----------------------------------------------------------------------
 5 days until the 65th anniversary of D-Day
