----- Original Message -----
From: "ian douglas" <[EMAIL PROTECTED]>
To: "Matthew Cline" <[EMAIL PROTECTED]>; "Spamassassin List"
<[EMAIL PROTECTED]>
Sent: Friday, August 01, 2003 4:24 AM
Subject: RE: [SAtalk] those pesky small v*agra ads
> > Hmmmm, maybe we should make some new rules that test the ratio
> > of invisible text to visible text?
>
> But if the background is BLACK, white text is perfectly acceptable ...
> right?
>
> So defining "visible" vs "invisible" is your toughest chore.
Exactly, and that is close to impossible. Last night I enthusiastically wrote
three rules, which I have attached as a .txt file to avoid line wrapping:
1): MASKED_HTML_TEXT
This rule looks for a <body> element with a hex bgcolor attribute and
matches it against a font color attribute with the same value. That
combination is flagged as possible spam. It will match in:
$body = '<body bgcolor = "#fffffe"> yada yada
yada <font face="two" color="#fffffe">';
2): MASKED_HTML_TEXT_1
Same as the first rule, but looks for named colors (like "white"). It will
match in:
$body = '<body bgcolor = "white"> yada yada
yada <font face="two" color="white">';
3): MASKED_HTML_TEXT_2
Same as before, except that it looks for a <body> tag that sets no bgcolor
and matches that with a font color of either "white" or "#ffffff". It will
match in:
$body = '<body> yada yada
yada <font face="two" color="#ffffff">';
That is the good news. :) The bad news is that the true background color, or
I should say background appearance, is almost impossible to determine.
Consider table colors, <td> colors, etc. Not to mention a white, stretched
GIF used as the background. And that is just 'old' style HTML. :)
Hence I gave my rules a low score. But still, you might find them useful.
- Mark
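
To make the bad news concrete, here is a minimal sketch that runs a Python
transcription of MASKED_HTML_TEXT against two made-up messages. It uses
Python's re module rather than SpamAssassin itself, writes the hex class as
[0-9a-f] so that digits match too (an assumption about the rule's intent),
and the two sample bodies and their variable names are hypothetical.

```python
import re

# Rule 1 (MASKED_HTML_TEXT) transcribed for Python's re module. Perl's /mi
# flags become re.M | re.I, and "\043" is the octal escape for "#".
# Assumption: the hex class is [0-9a-f] so that digits match as well.
MASKED_HTML_TEXT = re.compile(
    r"\<[^>]*?body +?[^>]*?bgcolor[^>]*?(\043[0-9a-f]{6})[^>]*?\>(.|\s)*?"
    r"\<[^>]*?font *?[^>]*?color[^>]*?(\1)[^>]*?\>",
    re.M | re.I,
)

# Hypothetical message: the table cell, not the body, supplies the white
# background, so the white text is invisible to the reader.
hidden_in_cell = (
    '<body bgcolor="#000000"><table><tr><td bgcolor="#ffffff">'
    '<font color="#ffffff">hidden words</font></td></tr></table></body>'
)

# Hypothetical message: white text on a black cell is perfectly readable,
# but the font color equals the body bgcolor.
visible_in_cell = (
    '<body bgcolor="#ffffff"><table><tr><td bgcolor="#000000">'
    '<font color="#ffffff">readable words</font></td></tr></table></body>'
)

for label, body in (("hidden_in_cell", hidden_in_cell),
                    ("visible_in_cell", visible_in_cell)):
    flagged = MASKED_HTML_TEXT.search(body) is not None
    print(label, "flagged" if flagged else "not flagged")
```

The first body hides its text in a white table cell and is not flagged; the
second is readable and is flagged, because the rule only ever compares the
font color with the <body> bgcolor.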
full MASKED_HTML_TEXT /\<[^>]*?body +?[^>]*?bgcolor[^>]*?(\043[0-9a-f]{6})[^>]*?\>(.|\s)*?\<[^>]*?font *?[^>]*?color[^>]*?(\1)[^>]*?\>/mi
describe MASKED_HTML_TEXT Masked HTML text
score MASKED_HTML_TEXT 0.5
full MASKED_HTML_TEXT_1 /\<[^>]*?body +?[^>]*?bgcolor[^\043>]*?(\w{3,})[^\w>]*?\>(.|\s)*?\<[^>]*?font *?[^>]*?color[^>]*?(\1)\W[^>]*?\>/mi
describe MASKED_HTML_TEXT_1 Masked HTML text
score MASKED_HTML_TEXT_1 0.5
full MASKED_HTML_TEXT_2 /\<[^>]*?body[^c>]*?\>(.|\s)*?\<[^>]*?font *?[^>]*?color[^>]*?(white|\043ffffff)[^>]*?\>/mi
describe MASKED_HTML_TEXT_2 Masked HTML text
score MASKED_HTML_TEXT_2 0.3
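
For anyone who wants to try the patterns without installing them, the
following is a rough harness, not part of the attachment. It transcribes the
three regexes for Python's re module and runs them over the three sample
bodies from the message, plus one black-background message. Assumptions: each
pattern is written on a single line with a space before "+?" and "*?", the
hex class in the first rule is written as [0-9a-f], the helper name hits and
the black-background sample are made up for illustration, and the strings
stand in for the full message that a SpamAssassin "full" rule would see.

```python
import re

FLAGS = re.M | re.I  # equivalent of Perl's /mi

# The three rules, transcribed for Python's re module. "\043" is the octal
# escape for "#" and "\1" is a backreference to the captured bgcolor value.
RULES = {
    "MASKED_HTML_TEXT": re.compile(
        r"\<[^>]*?body +?[^>]*?bgcolor[^>]*?(\043[0-9a-f]{6})[^>]*?\>(.|\s)*?"
        r"\<[^>]*?font *?[^>]*?color[^>]*?(\1)[^>]*?\>", FLAGS),
    "MASKED_HTML_TEXT_1": re.compile(
        r"\<[^>]*?body +?[^>]*?bgcolor[^\043>]*?(\w{3,})[^\w>]*?\>(.|\s)*?"
        r"\<[^>]*?font *?[^>]*?color[^>]*?(\1)\W[^>]*?\>", FLAGS),
    "MASKED_HTML_TEXT_2": re.compile(
        r"\<[^>]*?body[^c>]*?\>(.|\s)*?"
        r"\<[^>]*?font *?[^>]*?color[^>]*?(white|\043ffffff)[^>]*?\>", FLAGS),
}

# The sample bodies from the message, keyed by the rule each should trigger.
SAMPLES = {
    "MASKED_HTML_TEXT":
        '<body bgcolor = "#fffffe"> yada yada\nyada <font face="two" color="#fffffe">',
    "MASKED_HTML_TEXT_1":
        '<body bgcolor = "white"> yada yada\nyada <font face="two" color="white">',
    "MASKED_HTML_TEXT_2":
        '<body> yada yada\nyada <font face="two" color="#ffffff">',
}

# Hypothetical contrast case: white text on a black background is visible.
WHITE_ON_BLACK = '<body bgcolor="black"> text <font color="white">'


def hits(body):
    """Return the names of the rules whose pattern matches body."""
    return [name for name, rx in RULES.items() if rx.search(body)]


for name, body in SAMPLES.items():
    print(name, "sample ->", hits(body))
print("white on black ->", hits(WHITE_ON_BLACK))
```

Each sample should trigger the rule it was written for, and the
white-on-black message should trigger none, since its font color differs
from its bgcolor.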