Calling spamassassin directly yields very different results than calling spamassassin via amavis-new

Ben Johnson Wed, 09 Jan 2013 14:14:45 -0800

About five months ago, I experienced a problem that I *thought* I had
resolved, but I am observing similar behavior after retraining the Bayes
database. While the symptoms are similar, the root cause seems to be
different (thankfully). The original problem is documented at
http://spamassassin.1065346.n5.nabble.com/Very-spammy-messages-yield-BAYES-00-1-9-td101167.html


In any case, I am again seeing SA scores that seem way too low for the
message content in question. My "glue", as it were, is Amavis-New.

In particular, certain messages that are clearly spam are scored between
0 and 3 when processed via Amavis. However, if I process the same
messages directly with the "spamassassin" binary, the scores are much
higher and much more in line with what one would expect.

The X-Spam-Status header, when processed via Amavis, looks like this:

X-Spam-Status: No, score=0.8 tagged_above=-999 required=2
tests=[BAYES_50=0.8, HTML_MESSAGE=0.001, SPF_PASS=-0.001] autolearn=disabled

When I process the same message with spamassassin, directly
(spamassassin -t -D < /tmp/msg.txt), the header looks like this:

----------------------------------------------------------------------
X-Spam-Status: Yes, score=7.5 required=5.0
tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS
autolearn=disabled version=3.3.1

[...]

Content analysis details:   (7.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 1.2 MISSING_HEADERS        Missing To: header
 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.5000]
 1.2 MISSING_MID            Missing Message-Id: header
 1.3 MISSING_SUBJECT        Missing Subject: header
-0.0 NO_RECEIVED            Informational: message has no Received headers
 1.8 MISSING_DATE           Missing Date: header
 0.0 NO_HEADERS_MESSAGE     Message appears to be missing most RFC-822 headers
----------------------------------------------------------------------

In short, my question is, how the **** is the message scoring 0.8 in one
case and 7.5 in another? That is a massive discrepancy.

From what I can tell, the same tests aren't even being performed in each
case.
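
To make the difference concrete, the two X-Spam-Status headers quoted above can be compared mechanically. The following is a minimal Python sketch, not part of any SpamAssassin or Amavis tooling: the helper name parse_tests is hypothetical, and the header strings are copied from this message and unfolded onto one line each.

```python
import re

# Headers copied from this message, unfolded onto a single line each.
AMAVIS_HEADER = (
    "X-Spam-Status: No, score=0.8 tagged_above=-999 required=2 "
    "tests=[BAYES_50=0.8, HTML_MESSAGE=0.001, SPF_PASS=-0.001] "
    "autolearn=disabled"
)
DIRECT_HEADER = (
    "X-Spam-Status: Yes, score=7.5 required=5.0 "
    "tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,"
    "NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS "
    "autolearn=disabled version=3.3.1"
)


def parse_tests(header):
    """Return {rule_name: score_or_None} from an X-Spam-Status header.

    Hypothetical helper. Handles both forms seen above:
    tests=[NAME=score, ...] and tests=NAME,NAME,...
    """
    bracketed = re.search(r"tests=\[(.*?)\]", header)
    plain = re.search(r"tests=(\S+)", header)
    if bracketed:
        raw = bracketed.group(1)
    elif plain:
        raw = plain.group(1)
    else:
        return {}
    tests = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, score = item.partition("=")
        # The direct-run header carries no per-rule scores, so store None.
        tests[name] = float(score) if score else None
    return tests


amavis = parse_tests(AMAVIS_HEADER)
direct = parse_tests(DIRECT_HEADER)

only_amavis = sorted(set(amavis) - set(direct))
only_direct = sorted(set(direct) - set(amavis))
shared = sorted(set(amavis) & set(direct))

print("Only via Amavis:", only_amavis)
print("Only via direct call:", only_direct)
print("In both:", shared)
```

Run against these two headers, the only rule the runs share is BAYES_50, and every rule unique to the direct call concerns missing headers or relays. Note also that BAYES_50 itself is worth 0.8 under Amavis but 2.0 in the direct report, so the per-rule scores differ as well as the rule sets.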

I have to assume that the options that are passed to SA are wildly
different in each case.

It bears mentioning that the server in question uses ISPConfig 3. ISPConfig
allows for SA policies to be configured per-domain and per-user, and
Amavis leverages MySQL to make that happen. If relevant, I can provide
more information about this aspect of my setup.

These are the only directives that I've added to /etc/spamassassin/local.cf:

----------------------------------------------------------------------
bayes_path /var/lib/amavis/.spamassassin/bayes

use_bayes 1
bayes_auto_expire 0
bayes_store_module              Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                   DBI:mysql:sa_bayes:localhost
bayes_sql_username              sa_user
bayes_sql_password              [scrubbed]
bayes_sql_override_username amavis
----------------------------------------------------------------------

Given the first directive, SA should always use the same Bayes database
(the one I've configured in MySQL), regardless of how SA is called, right?

For those curious about the state of the Bayes database, here's the
output from "sa-learn --dump magic":

0.000          0          3          0  non-token data: bayes db version
0.000          0       2007          0  non-token data: nspam
0.000          0       6554          0  non-token data: nham
0.000          0     188379          0  non-token data: ntokens
0.000          0 1356345829          0  non-token data: oldest atime
0.000          0 1357769317          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1357727978          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       3191          0  non-token data: last expire reduction count
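
For readers who want to interpret that dump, here is a minimal Python sketch that pulls out the message counts and converts the atime values to dates. It assumes each record sits on a single line with the value in the third column, as in the output above; the function name parse_magic is hypothetical and not part of sa-learn.

```python
from datetime import datetime, timezone

# A subset of the "sa-learn --dump magic" output above, one record per line.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       2007          0  non-token data: nspam
0.000          0       6554          0  non-token data: nham
0.000          0     188379          0  non-token data: ntokens
0.000          0 1356345829          0  non-token data: oldest atime
0.000          0 1357769317          0  non-token data: newest atime
"""

PREFIX = "non-token data: "


def parse_magic(text):
    """Return {label: integer value} for each non-token record (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[4].startswith(PREFIX):
            continue  # skip blank lines and anything that is not a magic record
        values[fields[4][len(PREFIX):].strip()] = int(fields[2])
    return values


magic = parse_magic(DUMP)

# Share of learned messages that were spam.
spam_share = magic["nspam"] / (magic["nspam"] + magic["nham"])

# atime values are Unix timestamps; convert them to UTC for readability.
oldest = datetime.fromtimestamp(magic["oldest atime"], tz=timezone.utc)
newest = datetime.fromtimestamp(magic["newest atime"], tz=timezone.utc)

print(f"spam={magic['nspam']} ham={magic['nham']} spam share={spam_share:.1%}")
print(f"tokens last seen between {oldest:%Y-%m-%d} and {newest:%Y-%m-%d} (UTC)")
```

Both counts are well above SpamAssassin's default minimum of 200 learned spam and 200 learned ham messages, so this database, if it is the one actually consulted, has enough training data for the Bayes rules to fire.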

Ultimately, it seems that I should be trying to figure out how, exactly,
Amavis is calling SpamAssassin in the course of normal operation.

Thanks for any help here, folks!

-Ben
