On 30 Nov 2018, at 8:29, Amir Caspi wrote:

On Nov 30, 2018, at 6:09 AM, RW <rwmailli...@googlemail.com> wrote:

The most substantial problem here is that these invisible characters
make it very hard to write ordinary body rules.

Thanks for the clarification on my confusion. Since HTML is already getting rendered to text, perhaps the conversion code should strip (literally, just delete) any zero-width characters during this conversion? That should make normal body rules, and Bayes, function properly, no?

Not if they are *looking for* those characters.

Is there a reason not to strip out zero-width characters? That is, is there any benefit or reason to maintain invisible chars versus throwing them out?

The presence of zero-width characters is a very strong spam indicator. It isn't quite perfect, however, since at least one procedurally legitimate and rather popular US entity sends mail like this that people affirmatively want to receive: https://www.scconsult.com/atkspam.txt
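
To illustrate how the two goals could coexist (body rules and Bayes seeing clean text, while rules can still look for the invisible characters), here is a rough Python sketch. It is not SpamAssassin code: the function name strip_zero_width and the particular set of code points are assumptions made for the example, and the set is not exhaustive.

```python
import re

# Assumed, non-exhaustive set of zero-width code points:
# ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER,
# WORD JOINER, and ZERO WIDTH NO-BREAK SPACE (BOM).
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def strip_zero_width(text):
    """Delete zero-width characters and report how many were removed.

    Returns (cleaned_text, count). The cleaned text is what ordinary
    body rules or a Bayes tokenizer would see; the count keeps the
    presence of the characters available as a separate signal.
    """
    cleaned, count = ZERO_WIDTH.subn("", text)
    return cleaned, count


if __name__ == "__main__":
    cleaned, hits = strip_zero_width("V\u200bi\u200ca\u200dgra")
    print(cleaned, hits)
```

The count would be a score input rather than a verdict: besides the legitimate sender mentioned above, the joiner and non-joiner characters have ordinary uses in some scripts and in emoji sequences.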

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole
