On Mon, Nov 10, 2008 at 6:29 AM, Michael Scheidell <[EMAIL PROTECTED]> wrote:
> looks like spammers are using <style> (some random text from books) </style>
> to try to poison baysian
>
> seems text inside of <style></> doesn't show up on the page, and if they
> wrap their spam in style tags, they can hide the background noise.
>
> I also noticed that valid emails won't have style tags AFTER THE </head> or
> after the <body> tag, is this right?
>
> would tests that checked for <body>.*<style> inside html portion be a valid
> test of spam?
http://www.w3schools.com/TAGS/tag_style.asp

Supposedly <style> should only occur inside the <head> section. If everyone obeyed the HTML standards to the letter, then yes, that would be a valid test. Of course, in that fairy-tale land it would also be a test that hits no mail at all, spam or otherwise.

You'll probably just have to test and find out. Realistically, I doubt any email programs craft legitimate messages this way, so it's probably safe. If you've got a corpus of ham/spam, you could run a grep on both to get a rough idea.

On the other hand, I don't see offhand why the spammers couldn't just put the <style> block in the head, then wrap the junk text in a <div class="myhiddentextstyle">. Sounds to me almost like you've got some lazy-coding spammers :).

Jake
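
P.S. For the corpus check, here is a rough Python sketch of the "<style> after <body>" idea. It is only an illustration, not a tested rule. It assumes each message is stored as its own file somewhere under a directory, and the names has_style_after_body, message_hits and count_hits are made up for this example. It will not catch the <div class=...> trick mentioned above, and it ignores HTML parts that have no <body> tag at all.

```python
import re
import sys
from email import message_from_binary_file, policy
from pathlib import Path

# An opening <body ...> tag followed, anywhere later, by an opening <style ...> tag.
# Closing tags such as </style> do not match because of the literal "<style".
STYLE_AFTER_BODY = re.compile(r"<body\b[^>]*>.*?<style\b", re.IGNORECASE | re.DOTALL)


def has_style_after_body(html):
    """Return True if a <style> tag opens somewhere after the <body> tag."""
    return STYLE_AFTER_BODY.search(html) is not None


def message_hits(path):
    """Return True if any text/html part of the message file trips the test."""
    with path.open("rb") as fh:
        msg = message_from_binary_file(fh, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        try:
            html = part.get_content()
        except (LookupError, UnicodeError):
            # Unknown or broken charset: skip this part rather than guess.
            continue
        if has_style_after_body(html):
            return True
    return False


def count_hits(corpus_dir):
    """Return (hits, total) over every file found under corpus_dir."""
    files = [p for p in Path(corpus_dir).rglob("*") if p.is_file()]
    hits = sum(1 for p in files if message_hits(p))
    return hits, len(files)


if __name__ == "__main__":
    # Pass one or more corpus directories as arguments (e.g. a ham dir and a spam dir).
    for corpus in sys.argv[1:]:
        hits, total = count_hits(corpus)
        print(f"{corpus}: {hits}/{total} messages have <style> after <body>")
```

Run it once against the ham directory and once against the spam directory and compare the two counts. If the ham count is anything other than zero, the test is probably not safe to score heavily.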