Hi all,

I'd like to thank you, Bill, for looking into this. I was a bit disappointed by the way the issue was first handled on Bugzilla.

I agree that the server's locale is worth taking into account, but I don't think it solves the issue. I also agree that this test is effective at catching the type of spam it was intended for; while investigating the issue I found a number of spam messages that it caught.

What should be considered is the message's language. All of the false positives had one of the following MIME charsets (the messages were actually in Greek):

Content-Type: text/[plain|html]; charset="windows-1253" or
Content-Type: text/[plain|html]; charset="iso-8859-7"

while all messages that were actual spam and were properly detected had:

Content-Type: text/[plain|html]; charset="utf-8"
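As a rough sketch of how a site could act on this locally today, a meta rule can key on the declared charset of any MIME part. This assumes the MIMEHeader plugin is loaded (it normally is via the stock v310.pre). The rule names __GREEK_CHARSET and MIXED_ES_NOT_EL are made up for illustration, and the 1.0 score is a placeholder, not a tuned value:

     # hypothetical: true if any MIME part declares a Greek charset
     mimeheader __GREEK_CHARSET  Content-Type =~ /charset="?(?:windows-1253|iso-8859-7)/i
     # hypothetical: fire only when MIXED_ES hits on a non-Greek-charset message
     meta       MIXED_ES_NOT_EL  MIXED_ES && !__GREEK_CHARSET
     # keep MIXED_ES running (a score of 0 would disable it) but near-neutral
     score      MIXED_ES         0.01
     score      MIXED_ES_NOT_EL  1.0

This only looks at the declared charset, so a Greek message sent as utf-8 would still be scored.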

I'm afraid I cannot provide any sample of the false positives at the moment.

Hope the above helps. SpamAssassin is a great project and we are trying to help improve it.

--

Savvas Karagiannidis

On 21/3/2019 16:52, John Wilcock wrote:
On 21/03/2019 at 14:52, John Wilcock wrote:
On 20/03/2019 at 20:19, Bill Cole wrote:
I've added these lines to the block that defines MIXED_ES, which may help some sites:

     lang pl  score MIXED_ES  0.01
     lang cz  score MIXED_ES  0.01
     lang sk  score MIXED_ES  0.01
     lang hr  score MIXED_ES  0.01
     lang el  score MIXED_ES  0.01

Those should get into the default rules channel within a few days.

All very well, except [...]
Also, there are *lots* of other languages that legitimately use E-like characters and would need to be added to the list (e.g. there's a Cyrillic "е", so you can add ru, bg, uk, be, bs, sr, kk, ky, mn, tg and others, for a start ;-). You'll be fighting a losing battle there...
