> From: Charles Gregory [mailto:[EMAIL PROTECTED]
> On Thu, 19 Feb 2004, Berend De Schouwer wrote:
> > rawbody         MK_HTML_TINY    /\<font style=[a-z\-]*size:1px\>/i
> > describe        MK_HTML_TINY    Unreadably small font.
> > score           MK_HTML_TINY    2.0
> 
> There are actually 0px and 0/1pt versions of this trick. And it is not
> always in a 'font'. My rule has evolved to:
> rawbody LOC_HTMLINVISTEXTZERO  /style="[^>"]*font-size: *[01]p[tx]/i
> describe LOC_HTMLINVISTEXTZERO invisible text - zero point
> score LOC_HTMLINVISTEXTZERO    1.8
> 
> I notice you don't use the quote ("). I also note that the spam you quote
> is just using 'size: 1;' without the pt/px.
> 
> The game goes on.....
> 
> - Charles


Indeed it does.  There are other ways to make invisible text that this rule
does not catch.  Consider using single quotes instead of double quotes:
<div style='font-size: 1px'>

or using an embedded style block:
<style>
#bayespoison { font-size:
1px;
}
</style>
<div id="bayespoison">
blah blah
</div>

or using other CSS properties such as visibility: hidden or display: none.
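
To make the gaps concrete, here is a minimal Python sketch that runs the
quoted rule's regex against the variants mentioned above.  The LOOSER pattern,
the SAMPLES table and the evaded() helper are my own illustrations, not tested
SpamAssassin rules, and the sketch does not model how SpamAssassin splits the
raw body before applying rawbody rules.

```python
import re

# The regex from the quoted LOC_HTMLINVISTEXTZERO rule, in Python's re syntax.
QUOTED_RULE = re.compile(r'style="[^>"]*font-size: *[01]p[tx]', re.I)

# Hypothetical looser pattern (an assumption for illustration only): match the
# tiny font-size or hiding declaration itself, wherever it appears.
LOOSER = re.compile(
    r'font-size\s*:\s*[01]\s*(?:p[tx])?\s*[;"\'}]'
    r'|visibility\s*:\s*hidden'
    r'|display\s*:\s*none',
    re.I,
)

# Small hand-written samples based on the examples in this thread.
SAMPLES = {
    "double quotes": '<div style="font-size: 1px">',
    "single quotes": "<div style='font-size: 1px'>",
    "style block": "<style>\n#bayespoison { font-size:\n1px;\n}\n</style>",
    "display none": '<div style="display: none">',
}


def evaded(pattern):
    """Return the names of the samples that the pattern fails to match."""
    return [name for name, html in SAMPLES.items() if not pattern.search(html)]


print("quoted rule misses:", evaded(QUOTED_RULE))
print("looser pattern misses:", evaded(LOOSER))
```

A pattern this loose will probably also hit legitimate mail that hides
elements with display: none, so it would need a low score or further
conditions before being used for real.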

It's almost enough to consider using some kind of optical character
recognition.  Who's up for bundling an open-source browser with
SpamAssassin, sending HTML email through the browser, bundling an
open-source print-to-file mechanism to change the rendered HTML to an image,
then passing the image through an open-source optical character recognition
system?  Then some kind of analysis can be performed on "only the visible
words."

Matthew van Eerde
Software Engineer
Hispanic Business Inc.
HireDiversity.com
805.964.4554 x902
[EMAIL PROTECTED]
http://www.hispanicbusiness.com
http://www.hirediversity.com
