http://bugzilla.spamassassin.org/show_bug.cgi?id=1987





------- Additional Comments From [EMAIL PROTECTED]  2004-02-05 15:07 -------
Most MUAs, except Mozilla, tend to produce rather poor, if not horrific, HTML.

Why not just take the DTD:
http://www.w3.org/TR/html4/loose.dtd

and note any tag that doesn't match the DTD's specs?

The problem I see with using Tidy is that HTML has so much room to be wrong.
First, most mailers write bad code.  As much as it sucks, it's true, and Tidy
demands perfection.  Secondly, some email clients can forward web pages as
email, so bad code in the web page triggers it.  Lastly, people mess with
making a little table in their email or changing a few fonts and, most
importantly, when they go back and modify it, rogue unneeded nested tags remain.

All of that will trigger Tidy, and wrongfully so, as none of it is really spam.


Methods:
-  Check for bad tags (e.g. <viagra>) and count them.
-  Check for wacky parameters to tags and count them.
-  Note the incorrect use of quotes in tags.
-  Count unclosed tags.
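
Roughly, the four checks above could be sketched like this.  This is only an
illustration in Python, not SpamAssassin code: the tag and attribute sets are
a small hand-picked stand-in for the loose DTD, the name audit_html is made
up, the quote check only catches unquoted values, and what to do with the
counts (scores, thresholds) is left open.

```python
import re
from html.parser import HTMLParser

# Assumption: tiny hand-picked subsets standing in for the HTML 4 loose DTD.
KNOWN_TAGS = {"html", "head", "title", "body", "p", "br", "hr", "a", "b", "i",
              "font", "table", "tr", "td", "li", "ul", "img", "div", "span"}
KNOWN_ATTRS = {"href", "src", "alt", "color", "size", "face", "width",
               "height", "border", "align", "class", "style", "name", "id"}
# Empty elements, plus elements whose end tag HTML 4 lets you omit.
NO_CLOSE_NEEDED = {"br", "hr", "img", "p", "li", "tr", "td",
                   "html", "head", "body"}

QUOTED_VALUE = re.compile(r'"[^"]*"|\'[^\']*\'')
UNQUOTED_VALUE = re.compile(r"=[^\s>]")


class TagAudit(HTMLParser):
    """Counts the four kinds of sloppiness instead of rejecting the message."""

    def __init__(self):
        super().__init__()
        self.bad_tags = 0
        self.bad_attrs = 0
        self.unquoted_attrs = 0
        self.unclosed = 0
        self.open_stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            self.bad_tags += 1
        elif tag not in NO_CLOSE_NEEDED:
            self.open_stack.append(tag)
        for name, _value in attrs:
            if name not in KNOWN_ATTRS:
                self.bad_attrs += 1
        # Drop properly quoted values, then count '=' followed by a bare value.
        raw = QUOTED_VALUE.sub("", self.get_starttag_text() or "")
        self.unquoted_attrs += len(UNQUOTED_VALUE.findall(raw))

    def handle_endtag(self, tag):
        if tag not in self.open_stack:
            return  # stray or optional end tag; ignored in this sketch
        # Anything opened after the matching start tag was never closed.
        while self.open_stack:
            if self.open_stack.pop() == tag:
                break
            self.unclosed += 1


def audit_html(text):
    """Return the four counts for one HTML message body."""
    parser = TagAudit()
    parser.feed(text)
    parser.close()
    return {
        "bad_tags": parser.bad_tags,
        "bad_attrs": parser.bad_attrs,
        "unquoted_attrs": parser.unquoted_attrs,
        "unclosed": parser.unclosed + len(parser.open_stack),
    }
```

Because these are counts rather than a pass/fail verdict, a mail with one
stray nested tag would look very different from one stuffed with made-up
tags, which is the distinction a strict validator loses.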

I'm going to attach a sample spam that shows this pretty well.



