Hi all,
I'd like to thank you, Bill, for looking into this. I was a bit
disappointed by the way the issue was handled at first on Bugzilla.
I must agree that the server's locale could be information to be
considered but I don't think it solves the issue. I agree that this test
is effective on catching the type of spam it was intended for. I found a
number of spam messages caught by this while investigating the issue.
What should be considered is the message's language. All messages that
were false positives had the following MIME encoding (the messages were
actually in Greek):
Content-Type: text/[plain|html]; charset="windows-1253" or
Content-Type: text/[plain|html]; charset="iso-8859-7"
while all messages that were actual spam and were properly detected had:
Content-Type: text/[plain|html]; charset="utf-8"
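The observation above can be checked mechanically. The sketch below is not part of SpamAssassin; it is a hypothetical Python helper (the names `GREEK_CHARSETS`, `text_part_charsets` and `declares_greek_charset` are made up for illustration) that uses only the standard-library `email` package to list the charset declared by each text part of a message and flag the two Greek legacy charsets seen on the false positives. A declared charset is only a hint about the language, so treat this as a heuristic, not a language detector.

```python
from email import message_from_string
from email.message import Message

# Hypothetical set for this sketch: the two charsets seen on the Greek false positives.
GREEK_CHARSETS = {"windows-1253", "iso-8859-7"}

def text_part_charsets(msg: Message) -> list[str]:
    """Return the declared charset of every text/* part, in message order."""
    charsets = []
    for part in msg.walk():  # walk() also visits multipart containers; they are skipped below
        if part.get_content_maintype() == "text":
            charset = part.get_content_charset()  # the stdlib lower-cases this value
            if charset is not None:
                charsets.append(charset)
    return charsets

def declares_greek_charset(msg: Message) -> bool:
    """True if any text part declares one of the Greek legacy charsets."""
    return any(cs in GREEK_CHARSETS for cs in text_part_charsets(msg))

sample = message_from_string(
    'Content-Type: text/plain; charset="windows-1253"\n\nbody\n'
)
print(declares_greek_charset(sample))  # True
```

A message whose only text part declares `charset="utf-8"` would return `False`, matching the split between false positives and real spam described above.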
I'm afraid I cannot provide any sample of the false positives at the moment.
I hope the above helps. SpamAssassin is a great project, and we are
trying to help improve it.
--
Savvas Karagiannidis
On 21/3/2019 16:52, John Wilcock wrote:
On 21/03/2019 at 14:52, John Wilcock wrote:
On 20/03/2019 at 20:19, Bill Cole wrote:
I've added these lines to the block that defines MIXED_ES, which may
help some sites:
lang pl score MIXED_ES 0.01
lang cz score MIXED_ES 0.01
lang sk score MIXED_ES 0.01
lang hr score MIXED_ES 0.01
lang el score MIXED_ES 0.01
Those should get into the default rules channel within a few days.
All very well, except [...]
Also, there are *lots* of other languages that legitimately use E-like
characters and would need to be added to the list. For example, there's
a Cyrillic "е", so for a start you could add ru, bg, uk, be, bs, sr, kk,
ky, mn, tg and others ;-). You'll be fighting a losing battle there...