On Mon, Nov 10, 2008 at 6:29 AM, Michael Scheidell <[EMAIL PROTECTED]> wrote:
> Looks like spammers are using <style> (some random text from books) </style>
> to try to poison Bayes.
>
> It seems text inside <style></style> doesn't show up on the page, so if they
> wrap it in style tags, they can hide the background noise.
>
> I also noticed that valid emails won't have style tags AFTER THE </head> or
> after the <body> tag, is this right?
>
> Would a test that checks for <body>.*<style> inside the HTML part be a valid
> test for spam?

http://www.w3schools.com/TAGS/tag_style.asp

Supposedly <style> should only occur inside the <head> section. If
everyone obeys the HTML standards to the letter, yes, that would be a
valid test... of course, in that fairy-tale land, it would also be a
test that hits no mail at all, spam or otherwise.

You'll probably just have to test and find out. Realistically I doubt
any email programs craft legitimate messages this way, so it's
probably safe. If you've got a corpus of ham/spam, you could run a
grep on them to get a rough idea.
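
If grep gets awkward because of MIME encoding or tags split across lines,
here is a rough Python sketch of the same check. It assumes one message per
file under a directory you pass on the command line; the function names and
the regex are my own for illustration, not anything SpamAssassin ships, and
a match only means "<style> appears somewhere after <body>", nothing more.

```python
import re
import sys
from email import message_from_bytes, policy
from pathlib import Path

# <body ...> followed anywhere later by <style ...>.
# DOTALL lets ".*?" span newlines; IGNORECASE catches <BODY>/<STYLE>.
STYLE_AFTER_BODY = re.compile(r"<body\b[^>]*>.*?<style\b",
                              re.IGNORECASE | re.DOTALL)


def style_after_body(html):
    """True if a <style> tag opens after the <body> tag in this HTML text."""
    return STYLE_AFTER_BODY.search(html) is not None


def message_hits(raw):
    """True if any text/html part of a raw message (bytes) matches."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        try:
            html = part.get_content()  # decodes transfer encoding + charset
        except (LookupError, UnicodeError):
            continue  # unknown or broken charset: skip this part
        if isinstance(html, str) and style_after_body(html):
            return True
    return False


def count_hits(corpus_dir):
    """Return (matching messages, total messages) for one directory tree."""
    files = [p for p in Path(corpus_dir).rglob("*") if p.is_file()]
    hits = sum(message_hits(p.read_bytes()) for p in files)
    return hits, len(files)


if __name__ == "__main__":
    # Hypothetical usage: python style_check.py ham/ spam/
    for corpus in sys.argv[1:]:
        hits, total = count_hits(corpus)
        print(f"{corpus}: {hits}/{total} messages have <style> after <body>")
```

Run it once against ham and once against spam and compare the two ratios;
if ham shows more than a handful of hits, the test is probably not safe.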

On the other hand, I don't see offhand why the spammers couldn't just
put the <style> in the head, then wrap the junk text in a <div
class="myhiddentextstyle">. Sounds to me almost like you've got some
lazy-coding spammers :).

Jake
