http://bugzilla.spamassassin.org/show_bug.cgi?id=3163





------- Additional Comments From [EMAIL PROTECTED]  2004-03-13 08:50 -------
Thanks to Daniel Quinlan and Theo Van Dinter for quickly trying out
the variations and posting the data yesterday.

One minor suggestion: remove the unnecessary \S* from each regexp:

 $self->{html_text}[-1] =~ /[A-Za-z]\S*\z/ && $text =~ /^\S*[A-Za-z]/
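
A minimal sketch comparing the quoted form with the form that results
from dropping the \S*. The helper names and sample strings are made up
for illustration; $prev stands in for $self->{html_text}[-1]. With the
\S* removed, each pattern requires a letter directly at the join, so
the middle sample below is the case where the two forms differ.

```perl
use strict;
use warnings;

# Hypothetical helpers: $prev stands in for $self->{html_text}[-1],
# $text for the text that follows the tag.

# Form quoted above, with \S* in each regexp.
sub joins_with_s_star {
    my ($prev, $text) = @_;
    return ($prev =~ /[A-Za-z]\S*\z/ && $text =~ /^\S*[A-Za-z]/) ? 1 : 0;
}

# Suggested form, with \S* removed from each regexp.
sub joins_without_s_star {
    my ($prev, $text) = @_;
    return ($prev =~ /[A-Za-z]\z/ && $text =~ /^[A-Za-z]/) ? 1 : 0;
}

# Made-up samples: letter at the join, punctuation at the join,
# whitespace at the join.
for my $pair (["Via", "gra"], ["Via.", "gra"], ["Via ", "gra"]) {
    my ($prev, $text) = @$pair;
    printf "%-8s %-6s with=%d without=%d\n",
        "'$prev'", "'$text'",
        joins_with_s_star($prev, $text),
        joins_without_s_star($prev, $text);
}
```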

Also, the "$" in the backhair regexp could be changed to \z.
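
For reference, a small sketch of how "$" and \z differ in Perl when
the string ends in a newline. The sample fragment is made up, and the
pattern is a stand-in rather than the actual backhair regexp.

```perl
use strict;
use warnings;

# Hypothetical sample: a text fragment that ends in a newline.
my $fragment = "word\n";

# Without /m, "$" matches at the very end of the string or just
# before a final newline.
my $dollar_matches = ($fragment =~ /[A-Za-z]$/) ? 1 : 0;

# \z matches only at the very end of the string.
my $z_matches = ($fragment =~ /[A-Za-z]\z/) ? 1 : 0;

print "dollar=$dollar_matches z=$z_matches\n";
```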

Below is the data from last night's run
(http://www.pathname.com/~corpus/HTML.1day). (The accuracy is harder
to see in the DETAILS data, which includes both HTML and non-HTML
messages, because HTML messages are more likely to be spam.)

Here are the additional cases that Daniel Quinlan added last night:

    if ($self->{html_text}[-1] =~ m{[^\s\(\)\<\>\[\]\$\,\"\;\/\#]\z}s &&
        $text =~ m{^[^\s\(\)\<\>\[\]\$\,\"\;\/\#]}s)
    {
      $self->{html}{t_obfuscation4}++;
    }
    if ($self->{html_text}[-1] =~ /[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]\z/s &&
        $text =~ /^[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/s)
    {
      $self->{html}{t_obfuscation5}++;
    }

(For obfuscation4, a "." may have been left out of the character class
  [^\s\(\)\<\>\[\]\$\,\"\;\/\#] .)
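
A minimal sketch of what adding "." to the obfuscation4 class would
change. The helper name boundary_hit, the variable names, and the
sample strings are assumptions for illustration; only the first
character class is taken from the code quoted above.

```perl
use strict;
use warnings;

# Character class from the obfuscation4 test as posted.
my $posted   = qr{[^\s\(\)\<\>\[\]\$\,\"\;\/\#]};
# Same class with "." added (the possible omission noted above).
my $with_dot = qr{[^\s\(\)\<\>\[\]\$\,\"\;\/\#\.]};

# Hypothetical helper: does the join between $prev and $text count as
# a hit when both sides must match the given class at the boundary?
sub boundary_hit {
    my ($class, $prev, $text) = @_;
    return ($prev =~ m{$class\z}s && $text =~ m{^$class}s) ? 1 : 0;
}

# Made-up samples: a sentence end followed by a new word, and a word
# split in two.
for my $pair (["end.", "Next"], ["Via", "gra"]) {
    my ($prev, $text) = @$pair;
    printf "%-8s %-8s posted=%d with_dot=%d\n",
        "'$prev'", "'$text'",
        boundary_hit($posted, $prev, $text),
        boundary_hit($with_dot, $prev, $text);
}
```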


OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  63467    60908     2559    0.960   0.00    0.00  (all messages)
100.000  95.9680   4.0320    0.960   0.00    0.00  (all messages as %)

  2.929   2.9241   3.0481    0.490   0.11    0.01  00_05_T_HTML_OBFUSCATE5
  6.567   6.3309  12.1923    0.342   0.04    0.01  00_05_T_HTML_OBFUSCATE4
  3.547   3.4166   6.6432    0.340   0.04    0.01  00_05_T_HTML_OBFUSCATE3
  3.107   2.9914   5.8617    0.338   0.04    0.01  00_05_T_HTML_OBFUSCATE2
  7.777   7.0845  24.2673    0.226   0.02    0.01  00_05_T_HTML_OBFUSCATE
  7.429   6.7282  24.1110    0.218   0.02    0.01  00_05_T_HTML_OBFUSCATE1

 23.675  23.1677  35.7562    0.393   0.09    1.00  00_10_HTML_OBFUSCATE
  3.561   3.5759   3.2044    0.527   0.14    0.01  00_10_T_HTML_OBFUSCATE5
  4.012   3.9305   5.9398    0.398   0.06    0.01  00_10_T_HTML_OBFUSCATE2
  7.796   7.5983  12.5049    0.378   0.06    0.01  00_10_T_HTML_OBFUSCATE4
  4.462   4.3541   7.0340    0.382   0.06    0.01  00_10_T_HTML_OBFUSCATE3
  8.874   8.1582  25.9086    0.239   0.02    0.01  00_10_T_HTML_OBFUSCATE1

  0.904   0.9391   0.0782    0.923   0.71    0.01  05_10_T_HTML_OBFUSCATE2
  0.632   0.6518   0.1563    0.807   0.48    0.01  05_10_T_HTML_OBFUSCATE5
  1.229   1.2675   0.3126    0.802   0.47    0.01  05_10_T_HTML_OBFUSCATE4
  0.915   0.9375   0.3908    0.706   0.32    0.01  05_10_T_HTML_OBFUSCATE3
  1.445   1.4300   1.7976    0.443   0.08    0.01  05_10_T_HTML_OBFUSCATE1
  1.492   1.4744   1.9148    0.435   0.08    0.01  05_10_T_HTML_OBFUSCATE

  5.311   5.5034   0.7425    0.881   0.63    1.00  10_20_HTML_OBFUSCATE
  1.788   1.8635   0.0000    1.000   0.91    0.01  10_20_T_HTML_OBFUSCATE3
  1.760   1.8339   0.0000    1.000   0.91    0.01  10_20_T_HTML_OBFUSCATE2
  1.803   1.8766   0.0391    0.980   0.85    0.01  10_20_T_HTML_OBFUSCATE5
  1.910   1.9833   0.1563    0.927   0.72    0.01  10_20_T_HTML_OBFUSCATE4
  1.963   2.0342   0.2735    0.881   0.62    0.01  10_20_T_HTML_OBFUSCATE1

  3.657   3.8090   0.0391    0.990   0.88    1.00  20_30_HTML_OBFUSCATE
  1.654   1.7239   0.0000    1.000   0.91    0.01  20_30_T_HTML_OBFUSCATE2
  1.541   1.6057   0.0000    1.000   0.91    0.01  20_30_T_HTML_OBFUSCATE3
  1.538   1.6024   0.0000    1.000   0.91    0.01  20_30_T_HTML_OBFUSCATE4
  1.525   1.5893   0.0000    1.000   0.91    0.01  20_30_T_HTML_OBFUSCATE5
  1.480   1.5384   0.0782    0.952   0.78    0.01  20_30_T_HTML_OBFUSCATE1

  4.492   4.6792   0.0391    0.992   0.89    1.00  30_40_HTML_OBFUSCATE
  2.184   2.2756   0.0000    1.000   0.91    0.01  30_40_T_HTML_OBFUSCATE4
  2.138   2.2280   0.0000    1.000   0.91    0.01  30_40_T_HTML_OBFUSCATE5
  2.074   2.1606   0.0000    1.000   0.91    0.01  30_40_T_HTML_OBFUSCATE2
  2.069   2.1557   0.0000    1.000   0.91    0.01  30_40_T_HTML_OBFUSCATE3
  2.058   2.1442   0.0000    1.000   0.91    0.01  30_40_T_HTML_OBFUSCATE1
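
As a reading aid for the table above: the S/O values are consistent
with SPAM% / (SPAM% + HAM%). The sketch below assumes that formula
(it is inferred from the numbers, not stated in this comment) and
recomputes a few rows; the function name s_over_o is made up.

```perl
use strict;
use warnings;

# Assumption: S/O = SPAM% / (SPAM% + HAM%), which reproduces the
# values in the table above to three decimal places.
sub s_over_o {
    my ($spam_pct, $ham_pct) = @_;
    my $total = $spam_pct + $ham_pct;
    return if $total == 0;    # undefined when the rule never hits
    return $spam_pct / $total;
}

# Rows copied from the table: [name, SPAM%, HAM%].
my @rows = (
    ["00_05_T_HTML_OBFUSCATE5", 2.9241,  3.0481],
    ["00_05_T_HTML_OBFUSCATE4", 6.3309, 12.1923],
    ["05_10_T_HTML_OBFUSCATE2", 0.9391,  0.0782],
);

for my $row (@rows) {
    my ($name, $spam, $ham) = @$row;
    printf "%-26s %.3f\n", $name, s_over_o($spam, $ham);
}
```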



