On 30 Nov 2018, at 8:29, Amir Caspi wrote:
On Nov 30, 2018, at 6:09 AM, RW <rwmailli...@googlemail.com> wrote:
The most substantial problem here is that these invisible characters
make it very hard to write ordinary body rules.
Thanks for the clarification on my confusion. Since HTML is already
getting rendered to text, perhaps the conversion code should
strip (literally, just delete) any zero-width characters during this
conversion? That should make normal body rules, and Bayes, function
properly, no?
Not if they are *looking for* those characters.
Is there a reason not to strip out zero-width characters? That is, is
there any benefit or reason to maintain invisible chars versus
throwing them out?
The presence of zero-width characters is a very strong spam indicator.
It isn't quite perfect, however, since at least one procedurally
legitimate and rather popular US entity is sending mail like this
that people affirmatively want to receive:
https://www.scconsult.com/atkspam.txt
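
To make the trade-off concrete, here is a minimal sketch (in Python,
not SpamAssassin's actual Perl code) of stripping zero-width
characters while still keeping a count of them, so the "presence"
signal is not lost when the text is normalized. The function name
strip_zero_width and the particular set of code points are
assumptions for illustration only; the set is not exhaustive.

```python
import re

# Assumption: an illustrative, non-exhaustive set of zero-width code points:
# ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER,
# WORD JOINER, and ZERO WIDTH NO-BREAK SPACE (BOM).
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"
_ZW_RE = re.compile("[" + ZERO_WIDTH + "]")


def strip_zero_width(text):
    """Return (cleaned_text, count) where count is how many
    zero-width characters were removed from text."""
    cleaned, count = _ZW_RE.subn("", text)
    return cleaned, count


if __name__ == "__main__":
    # Hypothetical obfuscated body text, as it might look after
    # HTML has been rendered to plain text.
    sample = "fr\u200bee off\u200cer"
    cleaned, hits = strip_zero_width(sample)
    print(repr(cleaned), hits)
```

Ordinary body rules and Bayes could then run against the cleaned
text, while the count stays available as a separate signal. Note
that U+200C and U+200D have legitimate uses in some scripts and in
emoji sequences, so a nonzero count alone would not be conclusive.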
--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole