Re: Different bayes results from command line and through MTA

Sebastian Arcus Fri, 23 Dec 2016 16:20:45 -0800

On 23/12/16 17:02, Andrzej A. Filip wrote:

Sebastian Arcus <s.ar...@open-t.co.uk> wrote:

On 23/12/16 10:12, Sebastian Arcus wrote:

I know this hot potato has been discussed before, but I'm afraid it's
back to haunt me and I can't fathom it out. I'm again getting different
Bayes results when I test a message on the command line compared with
when it goes through Exim -> SpamAssassin.


<snip>


OK, after staring for a good while at debug logs, I think I finally
found the culprit. The saved .eml file which I pass through spamc
contains the report that SpamAssassin embedded in the headers (that's
how my Exim is configured). This report includes the first few lines
of the actual email body, so those lines effectively count twice
towards the Bayes score: SpamAssassin tokenizes the sample lines in
the header on top of the actual email body. As the body of these
particular spam emails is small, the sample in the header is almost
equal in size to the text in the body itself.
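To make the double-counting concrete, here is a toy Python sketch. The tokenizer is a hypothetical simplification (lowercase words of three or more letters, with header and body words pooled together); SpamAssassin's real Bayes tokenizer is considerably more elaborate and may treat header words differently from body words, so this only illustrates the general effect, not SA's exact arithmetic. The `X-Spam-Report` header and its content are likewise assumed examples.

```python
import re
from collections import Counter
from email import message_from_string

# Hypothetical toy tokenizer: lowercase words of three or more letters.
# NOT SpamAssassin's real Bayes tokenizer.
WORD = re.compile(r"[a-z]{3,}")


def tokenize(text):
    return WORD.findall(text.lower())


def token_counts(raw_message):
    """Naively count tokens from every header value plus the (non-multipart) body."""
    msg = message_from_string(raw_message)
    counts = Counter()
    for _name, value in msg.items():
        counts.update(tokenize(value))
    counts.update(tokenize(msg.get_payload()))
    return counts


body = "Cheap pills shipped overnight\n"
plain = "Subject: hello\n\n" + body
# Assumed report header quoting the start of the body, as in the Exim setup above.
marked = ("Subject: hello\n"
          "X-Spam-Report: Content preview: Cheap pills shipped overnight\n\n" + body)

print(token_counts(plain)["pills"])   # → 1
print(token_counts(marked)["pills"])  # → 2
```

With a short body, almost every body token gains a second copy from the quoted preview, which is the situation described for these small spam messages.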

As soon as I manually delete the SA headers and report from the .eml
file and pass the message through spamc again, I get Bayes scores
identical to those produced when the message first passes through
Exim -> SA.
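The manual clean-up step could be scripted roughly as follows. This is a minimal sketch using Python's standard `email` package, assuming that every header Exim/SpamAssassin added starts with `X-Spam-`; the actual header names depend on your Exim configuration, and the `strip_sa_headers` helper and `SA_PREFIX` constant are hypothetical names.

```python
import sys
from email import policy
from email.parser import BytesParser

# Assumption: every header added by Exim/SpamAssassin starts with "X-Spam-"
# (e.g. X-Spam-Score, X-Spam-Report). Adjust to match your Exim configuration.
SA_PREFIX = "x-spam-"


def strip_sa_headers(raw, prefix=SA_PREFIX):
    """Return the raw message (bytes) without headers whose name starts with prefix."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    doomed = {name for name in msg.keys() if name.lower().startswith(prefix)}
    for name in doomed:
        del msg[name]  # removes all occurrences; no error if already gone
    # Note: re-serialising may refold long header lines.
    return msg.as_bytes()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python strip_sa.py saved.eml | spamc
    with open(sys.argv[1], "rb") as fh:
        sys.stdout.buffer.write(strip_sa_headers(fh.read()))
```

Deleting whole headers by name, rather than editing the text by hand, also removes folded continuation lines of a multi-line report header in one go.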

However, this raises some interesting questions. It would appear that
SA is incapable of recognising its own reports in the headers of the
emails, so it tokenizes them as well and includes them in the Bayes
calculation. Is that right?

Also, does this mean that, since SA tokenizes all the information in
the headers, my own email address, as the recipient of the email, will
also be added to the database of spam tokens when I ask SA to learn a
message as spam?

I seem to have ended up with more questions than I started with :-)


Have you considered using bayes_ignore_header in the SpamAssassin
configuration file?

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html



Many thanks for the suggestion; I didn't know about bayes_ignore_header.

One quick question: does anybody know whether bayes_ignore_header takes effect both when classifying email *and* when learning spam/ham?
