On 12/05/2018 02:45 PM, John Hardin wrote:
I've added a "too many [ascii][unicode][ascii]" rule based on that, but I suspect it will be pretty FP-prone and will get quite large if we want to avoid whack-a-mole syndrome. For this, normalize + bayes is probably the best bet.
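
Just to make sure I understand the shape of the rule you describe, here's a rough Python sketch (not rule syntax, and the character classes are just my guess at what you mean) of counting those [ascii][unicode][ascii] runs:

import re

# A non-ASCII character sandwiched between two ASCII letters, e.g. a
# Cyrillic 'a' dropped into an otherwise ASCII word.
MIXED_RUN = re.compile(r'[A-Za-z][^\x00-\x7F][A-Za-z]')

def mixed_run_count(text):
    """How many ascii/unicode/ascii sandwiches appear in the text."""
    return len(MIXED_RUN.findall(text))

# "Please" and "paypal" spelled with Cyrillic lookalikes -> 2 runs
print(mixed_run_count("Pl\u0435ase verify your p\u0430ypal account"))

I'd guess a real rule would score on the count relative to message length rather than a fixed number of hits.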

Is it possible to detect when a Unicode code point is being used in place of an ASCII / ANSI character specifically to avoid pattern detection? I.e., the multiple Unicode code points that represent, or otherwise stand in for, an ASCII / ANSI "a"?
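
To illustrate what I'm asking about, here is a toy Python sketch that folds a handful of known lookalikes down to ASCII and flags text that changes under the folding. The mapping is just a few entries picked by hand; the real inventory would be something like Unicode's confusables.txt, which is far larger:

import unicodedata

# Tiny hand-picked subset of lookalike mappings, purely for illustration.
CONFUSABLES = {
    '\u0430': 'a',  # CYRILLIC SMALL LETTER A
    '\u0435': 'e',  # CYRILLIC SMALL LETTER IE
    '\u043e': 'o',  # CYRILLIC SMALL LETTER O
    '\u0440': 'p',  # CYRILLIC SMALL LETTER ER
}

def skeleton(text):
    """NFKC-normalize, then fold the known lookalikes down to ASCII."""
    text = unicodedata.normalize('NFKC', text)
    return ''.join(CONFUSABLES.get(ch, ch) for ch in text)

def looks_disguised(text):
    """True if normalization or the lookalike map changes the text."""
    return skeleton(text) != text

print(looks_disguised("p\u0430ypal"))   # True: Cyrillic 'a' standing in for ASCII 'a'
print(looks_disguised("paypal"))        # False

Pattern rules could then run against skeleton(text) instead of the raw body, which I take to be roughly the "normalize" part of the normalize + bayes suggestion above.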

Or is keeping up with this list tantamount to whack-a-mole?

I would think that too high a percentage of Unicode, where bog-standard ASCII / ANSI would suffice, would be an indication in and of itself. I'm not seeing how legitimate (non-spam) email would trigger a false positive if the percentage were tuned correctly.
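
Something along these lines is what I have in mind, with the threshold as the obvious knob to tune (0.30 here is just a placeholder):

def non_ascii_letter_ratio(text):
    """Fraction of alphabetic characters that fall outside plain ASCII."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for ch in letters if ord(ch) > 127) / len(letters)

THRESHOLD = 0.30  # placeholder; would need tuning against real ham/spam corpora

def too_much_unicode(text):
    return non_ascii_letter_ratio(text) > THRESHOLD

# Cyrillic lookalikes scattered through an otherwise English phrase -> True
print(too_much_unicode("V\u0456\u0430gr\u0430 f\u043er fr\u0435\u0435"))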



--
Grant. . . .
unix || die
