Re: No longer just embedded =9D characters in blackmail emails.

Bill Cole Thu, 21 Mar 2019 08:34:31 -0700

On 21 Mar 2019, at 10:52, John Wilcock wrote:

On 21/03/2019 at 14:52, John Wilcock wrote:
On 20/03/2019 at 20:19, Bill Cole wrote:
I've added these lines to the block that defines MIXED_ES, which may help some sites:
     lang pl  score MIXED_ES  0.01
     lang cz  score MIXED_ES  0.01
     lang sk  score MIXED_ES  0.01
     lang hr  score MIXED_ES  0.01
     lang el  score MIXED_ES  0.01

Those should get into the default rules channel within a few days.
All very well, except [...]
Also, there are *lots* of other languages that legitimately use E-like characters that should be added to the list (e.g. there's a Cyrillic "е", so you can add ru, bg, uk, be, bs, sr, kk, ky, mn, tg and others, for a start). You'll be fighting a losing battle there...


Actually not a battle I'm fighting...

I have seen direct reports of this rule (which is substantially more narrow than just 'has mixed e-like characters') matching ham in the above listed languages. I know that on the order of 0.001% of ham in the masscheck data submitted to SA Rule QA matches the rule and that the bulk of that is from a single small corpus (from a Polish source) in which ~0.5% of ham matches. It appears that occasionally that match rate results in a classification false positive, which is a real but small and constrained problem.

I have never seen an actual ham message matching the rule, much less had access to a mail stream including a steady stream of such messages. I have only ever seen vague reports of classification FPs, all of which cite the score as 3.999, which has not been accurate for most of the lifetime of the rule. As such, I have no real weapons in this battle and a foe who is invisible but noisy, to overstretch your analogy.

Individual sites are always free to kill or redefine rules from the default set or peg their scores to limit FPs.
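
As a rough sketch of what that could look like in a site's local.cf (the file name assumes a standard install, and the 0.5 value is an arbitrary illustration, not a recommendation):

     # Kill the rule outright: a score of 0 disables it.
     score MIXED_ES  0

     # Or, instead of the line above, peg it to a small fixed score:
     # score MIXED_ES  0.5

Either line overrides the score shipped in the default rules channel, so it is worth revisiting after rule updates.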


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole
