On 07/04/18 17:14, Reindl Harald wrote:

On 07.04.2018 at 18:10, Sebastian Arcus wrote:
And the way I customise the scores is based on the type of emails
received at this particular site. It might seem "idiotic" to you, but
there are reasons for those scores. Not everyone receives the same mix
of email - so it isn't constructive to start calling other people's
scoring "idiotic" just because they are not the same as your own or the
If a single misfired rule turns a BAYES_00 message into a spam message,
it's idiotic - it's that easy. With or without MSGID_SPAM_CAPS, that can
happen at any moment. If you trust your bayes, -0.2 is not justified,
and if you don't trust your bayes, train it.

A default score of 3.1 for MSGID_SPAM_CAPS is pretty high - even compared with some of the DNS blacklist rules - and some of those are pretty powerful IMHO. That is why I was trying to understand why this rule is assigned such a high score and what its significance is.

Secondly, I found in the past that a high negative score for BAYES_00 is counter-productive, because:

1. As soon as you receive a spam message with a new type of content, it essentially has a free ride until it gets put through the bayes training - as the high negative score on BAYES_00 counteracts any other rule it hits - even pretty effective rules, such as Pyzor and blacklists.

2. Spammers have learned from the above, and I get a lot of spam which changes the wording all the time, so that bayes becomes essentially ineffective against it - but at the same time it stops other rules from working - because of the high negative scores on low BAYES.

3. Spammers have also learned from no. 1, and I see a lot of extremely short spam messages - just one short line of a few words. Bayes seems to be extremely ineffective on these very short messages, no matter how much you train it - because of the small amount of data to work on, and with a little bit of cunning and varying the words used - they all score as BAYES_00. Again, the high negative score gives these spammers a guaranteed free ride, as it overrides any other rules.

So at least from the type of spam that I see, BAYES_00 with a large negative score is really counter-productive and it makes SA far less efficient at picking spam.
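To make the arithmetic concrete, here is a rough Python sketch of how the summed score changes with the BAYES_00 value. It is only an illustration, not how SA is implemented: SOME_DNSBL is a made-up rule name with a made-up score, the -3.0 is a hypothetical "large negative" value, and 5.0 is the stock required_score. The 3.1 and -0.2 figures are the ones mentioned in this thread.

```python
# Illustration only: a message is tagged as spam when the sum of the
# scores of all rules it hits reaches the required score.
REQUIRED_SCORE = 5.0  # stock required_score; adjust to your own setup


def total_score(hits, scores):
    """Sum the scores of the rules a message hit (rounded for display)."""
    return round(sum(scores[name] for name in hits), 2)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """True when the summed score reaches the required score."""
    return total_score(hits, scores) >= required


# SOME_DNSBL and its 2.5 are hypothetical, for illustration only.
base = {"MSGID_SPAM_CAPS": 3.1, "SOME_DNSBL": 2.5}
strong_bayes = {**base, "BAYES_00": -3.0}  # hypothetical large negative score
weak_bayes = {**base, "BAYES_00": -0.2}    # the value discussed in this thread

# A new or very short spam: bayes knows nothing about it, two other rules hit.
hits = ["BAYES_00", "MSGID_SPAM_CAPS", "SOME_DNSBL"]

print(total_score(hits, strong_bayes), is_spam(hits, strong_bayes))
print(total_score(hits, weak_bayes), is_spam(hits, weak_bayes))
```

With the large negative score, the two other hits are cancelled out and the message stays under the threshold; with -0.2 the same hits are enough to tag it.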

BAYES_00 doesn't necessarily mean "I am sure this is not spam" - as a good quality whitelist rule would, for example. It merely means "I haven't really seen this type of spam before", or simply "this message is too short and I really can't say anything useful about it". For these reasons, I don't think low BAYES scores should be given large negative scores - which is why I changed them on my systems - with really good results.
