> From: Charles Gregory [mailto:[EMAIL PROTECTED]
> On Thu, 19 Feb 2004, Berend De Schouwer wrote:
> > rawbody MK_HTML_TINY /\<font style=[a-z\-]*size:1px\>/i
> > describe MK_HTML_TINY Unreadably small font.
> > score MK_HTML_TINY 2.0
>
> There are actually 0px and 0/1pt versions of this trick. And it is not
> always in a 'font'. My rule has evolved to:
> rawbody LOC_HTMLINVISTEXTZERO /style="[^>"]*font-size: *[01]p[tx]/i
> describe LOC_HTMLINVISTEXTZERO invisible text - zero point
> score LOC_HTMLINVISTEXTZERO 1.8
>
> I notice you don't use the quote ("). I also note that the spam you
> quote is just using 'size: 1;' without the pt/px.
>
> The game goes on.....
>
> - Charles
Indeed it does. There are other ways to make invisible text that this rule
does not catch. Consider using single quotes instead of double quotes:
<div style='font-size: 1px'>
or using an embedded style block:
<style>
#bayespoison { font-size:
1px;
}
</style>
<div id="bayespoison">
blah blah
</div>
or using other CSS properties like visibility: hidden or display: none.
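
To make the gap concrete, here is a minimal Python sketch that runs the
LOC_HTMLINVISTEXTZERO pattern quoted above against these variants. It assumes
Python's re module behaves like SpamAssassin's Perl regex engine for a pattern
this simple. The sample strings and the names CHARLES_RULE, SAMPLES and
rule_hits are made up for illustration and are not taken from real spam.

```python
import re

# Pattern copied from the LOC_HTMLINVISTEXTZERO rule quoted above.
CHARLES_RULE = re.compile(r'style="[^>"]*font-size: *[01]p[tx]', re.IGNORECASE)

# Illustrative samples (assumptions, not taken from real spam).
SAMPLES = {
    "double quotes": '<div style="font-size: 1px">',
    "single quotes": "<div style='font-size: 1px'>",
    "style block": "#bayespoison { font-size: 1px; }",
    "display none": '<div style="display: none">',
}


def rule_hits(samples=SAMPLES):
    """Map each sample name to True if the rule matches it, else False."""
    return {name: bool(CHARLES_RULE.search(text)) for name, text in samples.items()}


if __name__ == "__main__":
    for name, hit in rule_hits().items():
        print(f"{name}: {'caught' if hit else 'missed'}")
```

Only the double-quoted inline style is caught; the other three slip through.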
It's almost enough to make one consider some kind of optical character
recognition. Who's up for bundling an open-source browser with
SpamAssassin, sending HTML email through the browser, bundling an
open-source print-to-file mechanism to change the rendered HTML to an image,
then passing the image through an open-source optical character recognition
system? Then some kind of analysis can be performed on "only the visible
words."
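
Short of a full render-and-OCR pipeline, a much cruder approximation of "only
the visible words" is to parse the HTML and drop text that sits under an
obviously hiding inline style. The sketch below is not the OCR approach
described above; it is a swapped-in shortcut using Python's standard
html.parser module. The names HIDDEN_STYLE, VOID_TAGS, VisibleText and
visible_words are hypothetical, the list of hiding styles is incomplete, and
the code assumes well-nested tags. It does not resolve selectors, so the
style-block trick shown earlier still gets past it.

```python
import re
from html.parser import HTMLParser

# Inline styles treated as "invisible" (an illustrative, incomplete list).
HIDDEN_STYLE = re.compile(
    r"font-size:\s*[01](?:p[tx])?\s*(?:;|$)|display:\s*none|visibility:\s*hidden",
    re.IGNORECASE,
)

# Elements that never have a closing tag, so they must not touch the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link"}


class VisibleText(HTMLParser):
    """Collect words that are not inside an element hidden by inline style."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one bool per open element: is this subtree hidden?
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        parent_hidden = bool(self.stack and self.stack[-1])
        hidden_here = tag in ("style", "script") or bool(HIDDEN_STYLE.search(style))
        self.stack.append(parent_hidden or hidden_here)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing element is marked hidden.
        if not (self.stack and self.stack[-1]):
            self.words.extend(data.split())


def visible_words(html):
    """Return the whitespace-separated words judged visible in the HTML."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.words
```

Whatever analysis follows would then run on visible_words(body) rather than on
the raw body, with the caveat that a real renderer sees far more than this does.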
Matthew van Eerde
Software Engineer
Hispanic Business Inc.
HireDiversity.com
805.964.4554 x902
[EMAIL PROTECTED]
http://www.hispanicbusiness.com
http://www.hirediversity.com