About five months ago, I experienced a problem that I *thought* I had resolved, but I am observing similar behavior after retraining the Bayes database. While the symptoms are similar, the root cause seems to be different (thankfully). The original problem is documented at http://spamassassin.1065346.n5.nabble.com/Very-spammy-messages-yield-BAYES-00-1-9-td101167.html .
In any case, I am again seeing SA scores that seem way too low for the message content in question. My "glue", as it were, is amavisd-new. In particular, certain messages that are clearly spam are scored between 0 and 3 when processed via Amavis. However, if I process the same messages with the "spamassassin" binary directly, the scores are much higher and much more in line with what one would expect.

The X-Spam-Status header, when the message is processed via Amavis, looks like this:

X-Spam-Status: No, score=0.8 tagged_above=-999 required=2 tests=[BAYES_50=0.8, HTML_MESSAGE=0.001, SPF_PASS=-0.001] autolearn=disabled

When I process the same message with spamassassin directly (spamassassin -t -D < /tmp/msg.txt), the header looks like this:

----------------------------------------------------------------------
X-Spam-Status: Yes, score=7.5 required=5.0 tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS autolearn=disabled version=3.3.1

[...]

Content analysis details:   (7.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 1.2 MISSING_HEADERS        Missing To: header
 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.5000]
 1.2 MISSING_MID            Missing Message-Id: header
 1.3 MISSING_SUBJECT        Missing Subject: header
-0.0 NO_RECEIVED            Informational: message has no Received headers
 1.8 MISSING_DATE           Missing Date: header
 0.0 NO_HEADERS_MESSAGE     Message appears to be missing most RFC-822 headers
----------------------------------------------------------------------

In short, my question is, how the **** is the message scoring 0.8 in one case and 7.5 in another? That is a massive discrepancy.

From what I can tell, the same tests aren't even being performed in each case. Only BAYES_50 appears in both results, and even that rule is scored differently (0.8 via Amavis versus 2.0 directly). I have to assume that the options that are passed to SA are wildly different in each case.
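To pin down exactly which rules differ between the two runs, the two X-Spam-Status headers can be compared mechanically. The following is only a minimal Python sketch and not anything that ships with SpamAssassin or Amavis: the helper name parse_tests and the two constants are made up for illustration, and the header strings are copied from the output quoted above.

```python
import re

# Header values copied from the two runs quoted in this post.
AMAVIS_HEADER = (
    "X-Spam-Status: No, score=0.8 tagged_above=-999 required=2 "
    "tests=[BAYES_50=0.8, HTML_MESSAGE=0.001, SPF_PASS=-0.001] "
    "autolearn=disabled"
)
DIRECT_HEADER = (
    "X-Spam-Status: Yes, score=7.5 required=5.0 "
    "tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,MISSING_MID,"
    "MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS "
    "autolearn=disabled version=3.3.1"
)


def parse_tests(header):
    """Hypothetical helper: map rule name -> score (or None if not listed)."""
    # Unfold the header by collapsing any run of whitespace to one space.
    flat = re.sub(r"\s+", " ", header)
    match = re.search(r"tests=(.*?)(?: autolearn=|$)", flat)
    if not match:
        return {}
    # Amavis wraps the list in [...]; the spamassassin binary does not.
    body = match.group(1).strip().strip("[]")
    tests = {}
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, score = item.partition("=")
        tests[name] = float(score) if score else None
    return tests


via_amavis = parse_tests(AMAVIS_HEADER)
direct = parse_tests(DIRECT_HEADER)

only_amavis = sorted(set(via_amavis) - set(direct))
only_direct = sorted(set(direct) - set(via_amavis))
common = sorted(set(via_amavis) & set(direct))

print("only via Amavis:", only_amavis)
print("only via spamassassin:", only_direct)
print("in both:", common)
```

With the two headers above, the only rule in common is BAYES_50. Every rule that fires only in the direct run concerns missing headers or relay information, while HTML_MESSAGE and SPF_PASS appear only in the Amavis run.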
It bears mention that the server in question uses ISPConfig 3. ISPConfig allows for SA policies to be configured per-domain and per-user, and Amavis leverages MySQL to make that happen. If relevant, I can provide more information about this aspect of my setup.

These are the only directives that I've added to /etc/spamassassin/local.cf:

----------------------------------------------------------------------
bayes_path /var/lib/amavis/.spamassassin/bayes
use_bayes 1
bayes_auto_expire 0
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:sa_bayes:localhost
bayes_sql_username sa_user
bayes_sql_password [scrubbed]
bayes_sql_override_username amavis
----------------------------------------------------------------------

Given the first directive, SA should always use the same Bayes database (the one I've configured in MySQL), regardless of how SA is called, right?

For those curious about the state of the Bayes database, here's the output from "sa-learn --dump magic" (sorry for the wrapping):

0.000          0          3          0  non-token data: bayes db version
0.000          0       2007          0  non-token data: nspam
0.000          0       6554          0  non-token data: nham
0.000          0     188379          0  non-token data: ntokens
0.000          0 1356345829          0  non-token data: oldest atime
0.000          0 1357769317          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1357727978          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       3191          0  non-token data: last expire reduction count

Ultimately, it seems that I should be trying to figure out how, exactly, Amavis is calling SpamAssassin in the course of normal operation.

Thanks for any help here, folks!

-Ben