On Mon, 1 Jun 2009, Rich Shepard wrote:
On Mon, 1 Jun 2009, John Hardin wrote:
If these are system-generated messages, something is improperly training
SA that they are spam. Do you use autolearn?
John,
No. Once a week or so I run sa-learn specifying spam on the spam-uncaught
mbox file. Less frequently I run it on mail list files specifying them as
ham.
And I assume you look at the spam-uncaught file before learning it?
If some log files got in there and were learned, that could explain the
deterioration.
Have you kept your spam and ham corpora? I would suggest wiping your Bayes
database and retraining it, after reviewing the corpora.
Primarily I'd suggest you exclude locally-generated emails from SA
completely. If you'd post the Received: headers from such a message and
the procmail stanza where you pass messages to SA for scoring I could
suggest something.
Here are all headers from the mail log summary:
From [email protected] Mon Jun 1 11:25:44 2009
Return-Path: <[email protected]>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.5-ph20040310.0 (2008-06-10) on
salmo.appl-ecosys.com
X-Spam-Level: ****
X-Spam-Status: Yes, score=4.9 required=4.0 tests=ALL_TRUSTED,AWL,BAYES_99,
EMPTY_BODY,NORMAL_HTTP_TO_IP,NUMERIC_HTTP_ADDR,URI_HEX,URI_NOVOWEL
autolearn=no version=3.2.5-ph20040310.0
X-Spam-Report:
* -1.3 ALL_TRUSTED Passed through trusted hosts only via SMTP
* 3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
* [score: 1.0000]
* 2.5 EMPTY_BODY BODY: Message has subject but no body
* 0.0 NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in
URL
* 0.4 URI_HEX URI: URI hostname has long hexadecimal sequence
* 0.0 NUMERIC_HTTP_ADDR URI: Uses a numeric IP address in URL
* 1.6 URI_NOVOWEL URI: URI hostname has long non-vowel sequence
* -1.8 AWL AWL: From: address is in the auto white-list
X-Original-To: [email protected]
Delivered-To: [email protected]
Received: from salmo.appl-ecosys.com (localhost.localdomain [127.0.0.1])
by salmo.appl-ecosys.com (Postfix) with ESMTP id 8DA0F1026
for <[email protected]>; Mon, 1 Jun 2009 11:25:44 -0700
(PDT)
Okay, let's key on that one.
## Call SpamAssassin
:0fw: spamassassin.lock
* < 256000
| spamassassin
:0 fw: spamassassin.lock
* < 256000
* ! ^TO_abuse@
* ! ^List-Id: .*<?use...@.]spamassassin\.apache\.org>?
* ! ^Received: from salmo\.appl-ecosys\.com \(localhost\.localdomain \[127\.0\.0\.1\]\) by salmo\.appl-ecosys\.com
| /usr/bin/spamc
Using spamc creates less load than launching spamassassin from scratch for
every email, but you do have to manage the spamd daemon (e.g. restart it
when the rules change).
Are your resources really so limited that you want to serialize all email
delivery? As a middle ground you might consider per-user lockfiles
instead, e.g.:
:0 fw: $HOME/.spamassassin.lock
I'd also suggest upping the size limit a bit, but that's not a big issue.
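Putting those two suggestions together, a minimal sketch might look like
the following. The 512000-byte limit is only an illustrative value I picked
for the example, not a tested recommendation, and the spamc path is the one
from the recipe above; adjust both for your system.

```procmail
## Call SpamAssassin (sketch: per-user lockfile, larger size limit)
# The lockfile lives in each user's home directory, so deliveries for
# different users are not serialized against each other.
:0 fw: $HOME/.spamassassin.lock
# Assumed value: skip messages of 512000 bytes or more
* < 512000
# Skip locally-generated mail (same Received: test as in the recipe above)
* ! ^Received: from salmo\.appl-ecosys\.com \(localhost\.localdomain \[127\.0\.0\.1\]\) by salmo\.appl-ecosys\.com
| /usr/bin/spamc
```

With a per-user lockfile, two messages for the same user still wait on each
other, but mail for other users is scored in parallel.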
There are more complex things you can do; you might want to take a look at
http://www.impsec.org/~jhardin/antispam/spamassassin.procmail
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
[email protected] FALaholic #11174 pgpk -a [email protected]
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79