Scott Rothgaber wrote to [EMAIL PROTECTED]:

> SA caught this one with the new Bayes poison rule but it missed the tiny
> font. I took a peek at 20_html_tests.cf but I'm Perl-impaired.  :(  Can
> anyone suggest a way to catch this:
>
> <font style=font-size:1px>

Assuming you're running 2.63, I don't believe there *is* any kind of
tiny font rule, unless I'm really missing something. There is an
HTML_FONT_BIG rule, though. It's an eval rule ultimately handled by
HTML.pm, and it's simple enough that a tiny-font counterpart can be
patched in right next to it:

--- HTML.pm     Sat Jan 17 17:56:07 2004
+++ /staff/ryan/HTML.pm Thu Jun  3 21:07:29 2004
@@ -383,6 +383,10 @@
   if ($tag eq "font" && exists $attr->{size}) {
     $self->{html}{big_font} = 1 if (($attr->{size} =~ /^\s*(\d+)/ && $1 > 3) ||
                            ($attr->{size} =~ /\+(\d+)/ && $1 >= 1));
+    # Absolutely untested
+    $self->{html}{tiny_font} = 1 if (($attr->{size} =~ /^\s*(\d+)/ && $1 <= 1) ||
+        ($attr->{size} =~ /\-(\d+)/ && $1 >= 1) ||       # -1 or less
+        ($attr->{size} =~ /^\s*(\d+)\s*px/ && $1 <= 5)); # 5px or smaller
   }
   if ($tag eq "font" && exists $attr->{color}) {
     my $bg = $self->{bgcolor_color}[-1];

Then, you'd need to define a new rule like so:

    body HTML_FONT_TINY         eval:html_test('tiny_font')
    # score HTML_FONT_TINY      0.001   # ...responsibly

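Unfortunately, I'm taking a break from frantically packing for a
much-needed vacation, and my head is fuzzy, so the patch above may well
start the next world war on compilation. It also only looks at the size
attribute, so the style attribute form in your example (font-size:1px)
would still need a similar check against the style attribute. Still, it
may be worth a little fine tuning (i.e., what you consider "tiny" enough
to be a spam indicator) and corpus testing.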
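
For the style attribute form specifically, a rawbody rule might be a
lighter option than patching HTML.pm. The following is only a sketch
for local.cf: the rule name LOCAL_FONT_TINY_STYLE and the 5px cutoff
are assumptions made up for illustration, the pattern has not been run
against a corpus, and it relies on rawbody rules seeing the decoded
HTML source, which is worth confirming on your version.

```
# Hypothetical local rule: a <font ...> tag whose attributes include
# a CSS font-size of 0px through 5px (10px, 15px, etc. do not match).
rawbody  LOCAL_FONT_TINY_STYLE  /<font\b[^>]{0,200}\bfont-size\s*:\s*[0-5]\s*px/i
describe LOCAL_FONT_TINY_STYLE  Font tag with CSS font-size of 5px or less
score    LOCAL_FONT_TINY_STYLE  0.001
```

The bounded [^>]{0,200} keeps the match inside a single tag and limits
backtracking. As with the eval rule, the score is kept negligible until
the rule has been checked against real ham and spam.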

Good luck,
- Ryan

-- 
  Ryan Thompson <[EMAIL PROTECTED]>

  SaskNow Technologies - http://www.sasknow.com
  901-1st Avenue North - Saskatoon, SK - S7K 1Y4

        Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
  Toll-Free: 877-727-5669     (877-SASKNOW)     North America
