> Or one could do like Theo, and strip all HTML content from
> the emails. :)

Or do that. I'd love to do that. But unfortunately, some users actually like HTML mail. No accounting for taste :)

> The problem with the normalization, is like anything else.
> One man's ham, another's spam. Repetitive letters show up in
> item codes, code snippets, fubar'd uuencoding, etc...
>
> It would also void out a lot of pre-existing rules that look
> for some of these filter bypassing codes.

Which is why I suggested introducing a different rule class. So "body", "rawbody", "full", "header" and "uri" scan the regular mail, but "normalbody" scans the normalized mail.

> I always try to turn their attempts to bypass, into spam flags.

True, but with viiagraa, ciia-liis etc., you either get to write new rules every other day or write rather expensive ones. And in an international organization (not that I am, but just for argument's sake), can you sleep well awarding

body /\bv.{0,3}i.{0,3}a.{0,3}g.{0,3}r.{0,3}a\b/i

a score of 5.0 if some Portuguese, Burmese or Danish word would match that pattern as well? :)

Regs,
Sven
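
P.S. To make the "normalbody" idea a bit more concrete, here is a rough Python sketch. It is only an illustration and not existing SpamAssassin code: normalize(), NORMAL_RULE and the sample sentences are made up for this mail, and a real normalizer would need more care. The point is that the regular rules keep seeing the untouched mail, while a cheap rule runs against a normalized copy.

```python
import re

# The loose pattern quoted above, in Python's re syntax.
LOOSE = re.compile(r"\bv.{0,3}i.{0,3}a.{0,3}g.{0,3}r.{0,3}a\b", re.I)

def normalize(text):
    """Hypothetical normalization pass, applied to a copy of the body only."""
    text = text.lower()
    # Drop separators wedged between letters: "ciia-liis" -> "ciialiis"
    text = re.sub(r"(?<=[a-z])[-_.*]+(?=[a-z])", "", text)
    # Collapse runs of the same letter: "viiagraa" -> "viagra"
    text = re.sub(r"([a-z])\1+", r"\1", text)
    return text

# A cheap rule that would only ever be run against the normalized copy.
NORMAL_RULE = re.compile(r"\bviagra\b")

if __name__ == "__main__":
    for sample in ("Buy viiagraa now", "The tour goes via Agra."):
        print(sample,
              "| loose:", bool(LOOSE.search(sample)),
              "| normalized:", bool(NORMAL_RULE.search(normalize(sample))))
```

In this sketch the loose pattern also fires on the harmless "via Agra", while the simple rule on the normalized copy does not. The normalization itself mangles legitimate text (it would turn "letter" into "leter"), which is exactly why it should feed its own rule class and leave the existing ones alone.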