On 07/04/18 17:14, Reindl Harald wrote:
On 07.04.2018 at 18:10, Sebastian Arcus wrote:
And the way I customise the scores is based on the type of emails
received at this particular site. It might seem "idiotic" to you, but
there are reasons for those scores. Not everyone receives the same mix
of email - so it isn't constructive to start calling other people's
scoring "idiotic" just because they are not the same as your own or the
If a single misfired rule can turn a BAYES_00 message into a spam
message, it's idiotic - it's that easy. With or without MSGID_SPAM_CAPS,
that can happen at any moment. If you trust your bayes, -0.2 is not
justified, and if you don't trust your bayes, train it.
A default score of 3.1 for MSGID_SPAM_CAPS is pretty high - even
compared with some of the DNS blacklists rules - and some of those are
pretty powerful IMHO. That is why I was trying to understand why this
rule is assigned such a high score and what its significance is.
Secondly, I found in the past that a high negative score for BAYES_00 is
counter-productive, for several reasons:
1. As soon as you receive a spam message with a new type of content, it
essentially has a free ride until it gets put through the bayes training
- as the high negative on BAYES_00 counteracts any other rule it hits -
even pretty effective rules, such as Pyzor and blacklists.
2. Spammers have learned from the above, and I get a lot of spam which
changes the wording all the time, so that bayes becomes essentially
ineffective against it - but at the same time it stops other rules from
working - because of the high negative scores on low BAYES.
3. Spammers have also learned from no. 1, and I see a lot of extremely
short spam messages - just one short line of a few words. Bayes seems to
be extremely ineffective on these very short messages, no matter how
much you train it - because of the small amount of data to work on, and
with a little bit of cunning and varying the words used - they all score
as BAYES_00. Again, the high negative score gives these spammers a
guaranteed free ride, as it overrides any other rules.
So at least from the type of spam that I see, BAYES_00 with a large
negative score is really counter-productive and it makes SA far less
efficient at picking spam.
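
To make the arithmetic behind this concrete, here is a rough sketch of additive scoring. It is not SpamAssassin's actual code. The two "HYPOTHETICAL_*" rule names and their scores are invented for illustration, the 5.0 threshold is assumed to be the stock required_score, -0.2 is the BAYES_00 value mentioned in the reply above, and -1.9 is simply an assumed larger negative value for comparison.

```python
# Minimal sketch of additive rule scoring; not SpamAssassin's real implementation.
REQUIRED_SCORE = 5.0  # assumption: the stock required_score threshold


def total_score(hits, scores):
    """Sum the scores of every rule that fired on a message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits, scores) >= required


# Hypothetical rule names and scores, for illustration only.
other_rules = {"HYPOTHETICAL_HASH_HIT": 2.0, "HYPOTHETICAL_DNSBL_HIT": 3.5}

# Same rules, two different BAYES_00 scores.
small_negative = {**other_rules, "BAYES_00": -0.2}  # value discussed in this thread
large_negative = {**other_rules, "BAYES_00": -1.9}  # assumed larger negative value

# A new type of spam: bayes has not seen it yet, but two other rules fire.
hits = ["BAYES_00", "HYPOTHETICAL_HASH_HIT", "HYPOTHETICAL_DNSBL_HIT"]

for label, scores in (("small", small_negative), ("large", large_negative)):
    print(label, round(total_score(hits, scores), 1), is_spam(hits, scores))
```

With these made-up numbers the same two hits tag the message as spam when BAYES_00 is -0.2, but not when it is -1.9, which is the "free ride" described in the list above. The flip side, as the reply points out, is that a small negative BAYES_00 also leaves less margin for a legitimate message when some other rule misfires.
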
BAYES_00 doesn't necessarily mean "I am sure this is not spam" - as a
good quality whitelist rule would, for example. It merely means "I
haven't really seen this type of spam before", or simply "this message
is too short and I really can't say anything useful about it". For these
reasons, I don't think low BAYES scores should be given large negative
scores - and hence why I changed them on my systems - with really good