Scott Rothgaber wrote to [EMAIL PROTECTED]:
> SA caught this one with the new Bayes poison rule but it missed the tiny
> font. I took a peek at 20_html_tests.cf but I'm Perl-impaired. :( Can
> anyone suggest a way to catch this:
>
> <font style=3Dfont-size:1px>
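
Assuming you're running 2.63, I don't believe there *is* any kind of
tiny font rule, unless I'm really missing something. HTML_FONT_BIG is an
eval rule ultimately handled by HTML.pm, and it's quite simple, so a
tiny_font flag can be set right next to it. Note that this code only
looks at the size attribute of <font>, so a style=font-size:1px tag like
yours would need an additional check on $attr->{style}:
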
--- HTML.pm Sat Jan 17 17:56:07 2004
+++ /staff/ryan/HTML.pm Thu Jun 3 21:07:29 2004
@@ -383,6 +383,10 @@
     if ($tag eq "font" && exists $attr->{size}) {
       $self->{html}{big_font} = 1 if (($attr->{size} =~ /^\s*(\d+)/ && $1 > 3) ||
           ($attr->{size} =~ /\+(\d+)/ && $1 >= 1));
+      # Absolutely untested
+      $self->{html}{tiny_font} = 1 if (($attr->{size} =~ /^\s*(\d+)/ && $1 <= 1) ||
+          ($attr->{size} =~ /\-(\d+)/ && $1 >= 1) || # -1 or less
+          ($attr->{size} =~ /^\s*(\d+)\s*px/ && $1 <= 5)); # 5px or smaller
     }
     if ($tag eq "font" && exists $attr->{color}) {
       my $bg = $self->{bgcolor_color}[-1];

Then, you'd need to define a new rule like so:

  body HTML_FONT_TINY eval:html_test('tiny_font')
  # score HTML_FONT_TINY 0.001 # ...responsibly
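
For playing with the thresholds outside of SpamAssassin, here is a minimal standalone sketch of the same check. It also looks at the style attribute, since the quoted sample uses style rather than size (the =3D in it is just the quoted-printable encoding of "=", so the decoded tag should read <font style=font-size:1px>). The helper name is_tiny_font is hypothetical and not part of HTML.pm. The size patterns are anchored a little more tightly than in the patch, and only px units are handled in style, both of which are assumptions you may want to change.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of SpamAssassin): returns 1 if the attributes
# of a <font> tag describe "tiny" text, using the thresholds from the patch.
sub is_tiny_font {
    my ($attr) = @_;

    if (defined $attr->{size}) {
        my $size = $attr->{size};
        return 1 if $size =~ /^\s*(\d+)\s*px/i && $1 <= 5;   # 5px or smaller
        return 1 if $size =~ /^\s*-\s*(\d+)/   && $1 >= 1;   # relative: -1 or less
        return 1 if $size =~ /^\s*(\d+)\s*$/   && $1 <= 1;   # absolute: size 1 (or 0)
    }

    if (defined $attr->{style}) {
        # Inline CSS such as style="font-size:1px"; other units are ignored here.
        return 1 if $attr->{style} =~ /\bfont-size\s*:\s*(\d+)\s*px/i && $1 <= 5;
    }

    return 0;
}

# Try a few attribute sets, including the one from the quoted spam sample.
for my $attr ({ style => 'font-size:1px' }, { size => '-2' }, { size => '3' }) {
    my ($name) = keys %$attr;
    printf "%s=%s => %d\n", $name, $attr->{$name}, is_tiny_font($attr);
}
```

The loop flags the first two attribute sets as tiny and leaves size=3 alone. If the style check proves useful, the same regex could be dropped into a sibling "if ($tag eq "font" && exists $attr->{style})" block in HTML.pm, with the same untested caveat as the patch.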

Unfortunately, I'm taking a break from frantically packing for a
much-needed vacation, and my head is fuzzy, so the patch above may well
start the next world war when you try to compile it. Still, it may be
worth a little fine-tuning (i.e., deciding what you consider "tiny"
enough to be a spam indicator) and corpus testing.

Good luck,
- Ryan
--
Ryan Thompson <[EMAIL PROTECTED]>
SaskNow Technologies - http://www.sasknow.com
901-1st Avenue North - Saskatoon, SK - S7K 1Y4
Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon
Toll-Free: 877-727-5669 (877-SASKNOW) North America