On 21 Mar 2019, at 10:52, John Wilcock wrote:

On 21/03/2019 at 14:52, John Wilcock wrote:
On 20/03/2019 at 20:19, Bill Cole wrote:
I've added these lines to the block that defines MIXED_ES which may help some sites:

     lang pl  score MIXED_ES  0.01
     lang cz  score MIXED_ES  0.01
     lang sk  score MIXED_ES  0.01
     lang hr  score MIXED_ES  0.01
     lang el  score MIXED_ES  0.01

Those should get into the default rules channel within a few days.

All very well, except [...]
Also, there are *lots* of other languages that legitimately use E-like characters that should be added to the list (e.g. there's a Cyrillic "е", so you can add ru, bg, uk, be, bs, sr, kk, ky, mn, tg and others, for a start). You'll be fighting a losing battle there...

Actually not a battle I'm fighting...

I have seen direct reports of this rule (which is substantially narrower than just 'has mixed e-like characters') matching ham in the above listed languages. I know that on the order of 0.001% of ham in the masscheck data submitted to SA Rule QA matches the rule and that the bulk of that is from a single small corpus (from a Polish source) in which ~0.5% of ham matches. It appears that occasionally that match rate results in a classification false positive, which is a real but small and constrained problem.

I have never seen an actual ham message matching the rule, much less had access to a mail stream including a steady stream of such messages. I have only ever seen vague reports of classification FPs, all of which cite the score as 3.999, which has not been accurate for most of the lifetime of the rule. As such, I have no real weapons in this battle and a foe who is invisible but noisy, to overstretch your analogy.

Individual sites are always free to kill or redefine rules from the default set or peg their scores to limit FPs.
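
As a rough sketch of what that could look like in a site's local configuration (commonly local.cf), assuming the rule is still named MIXED_ES in the installed rule set. The 1.0 value is an arbitrary placeholder for illustration, not a recommended score:

     # Peg the score low so that a match alone cannot decide classification
     score MIXED_ES  1.0

     # Alternative: disable the rule entirely (a score of 0 stops it from running)
     # score MIXED_ES  0

Only one of the two score lines should be active at a time. Pegging keeps the rule visible in reports while limiting its weight; zeroing it removes it from scanning altogether.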

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole
