http://bugzilla.spamassassin.org/show_bug.cgi?id=3163
------- Additional Comments From [EMAIL PROTECTED] 2004-03-13 08:50 -------
Thanks to Daniel Quinlan and Theo Van Dinter for quickly trying out
the variations and posting the data yesterday.
One minor suggestion: remove the unnecessary \S* from each regexp in:
$self->{html_text}[-1] =~ /[A-Za-z]\S*\z/ && $text =~ /^\S*[A-Za-z]/
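Dropping the \S* is not purely cosmetic. Without it, the letter must sit immediately at the tag boundary. With it, a letter anywhere in the adjoining run of non-space characters is enough. The Python sketch below is a transliteration for illustration only, since SpamAssassin itself is Perl. `split_word` is a hypothetical helper, and Python's \Z stands in for Perl's \z. It shows inputs where the two forms disagree:

```python
import re

# Python's \Z matches only at the very end of the string, like Perl's \z.
WITH_S = (re.compile(r"[A-Za-z]\S*\Z"), re.compile(r"^\S*[A-Za-z]"))
WITHOUT_S = (re.compile(r"[A-Za-z]\Z"), re.compile(r"^[A-Za-z]"))


def split_word(prev, text, patterns):
    """Hypothetical helper: True if prev and text both pass the boundary test."""
    tail, head = patterns
    return bool(tail.search(prev)) and bool(head.search(text))


for prev, text in [("Vi", "agra"), ("order.", "now"), ("see", "(here)")]:
    print(prev, text, split_word(prev, text, WITH_S), split_word(prev, text, WITHOUT_S))
```

Only the "Vi"/"agra" pair passes both forms. The other two pass only while \S* is present, so removing it makes the test stricter.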
Also, the "$" in the backhair regexp could be changed to \z.
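The two anchors differ only when the string ends in a newline. `$` also matches just before a final "\n", while \z matches only at the absolute end. Python's re module behaves the same way, with \Z playing the role of Perl's \z. The following is a quick sketch in Python, used for illustration, and the sample string is made up:

```python
import re

text = "hidden words\n"  # hypothetical text chunk that ends in a newline

loose = re.search(r"s$", text)    # like Perl /s$/: end, or just before a final "\n"
strict = re.search(r"s\Z", text)  # like Perl /s\z/: absolute end of string only

print(loose is not None, strict is not None)
```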
Below is the data from last night's run
(http://www.pathname.com/~corpus/HTML.1day). (The accuracy is harder
to see in the DETAILS data, which mixes HTML and non-HTML messages,
because HTML messages are more likely to be spam to begin with.)
Here are the additional cases that Daniel Quinlan added last night:
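# obfuscation4: the last stored text chunk ends, and the new chunk begins,
# with a character that is neither whitespace nor one of ( ) < > [ ] $ , " ; / #
if ($self->{html_text}[-1] =~ m{[^\s\(\)\<\>\[\]\$\,\"\;\/\#]\z}s &&
    $text =~ m{^[^\s\(\)\<\>\[\]\$\,\"\;\/\#]}s)
{
    $self->{html}{t_obfuscation4}++;
}

# obfuscation5: same test, but excluding whitespace and all ASCII punctuation
# (\x21-\x2f, \x3a-\x40, \x5b-\x60, \x7b-\x7e)
if ($self->{html_text}[-1] =~
        /[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]\z/s &&
    $text =~ /^[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/s)
{
    $self->{html}{t_obfuscation5}++;
}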
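Both tests appear designed to fire when a word runs straight across an HTML tag, as in "Vi<b></b>agra". A rough Python transliteration of the obfuscation5 variant follows. It assumes, hypothetically, that the parser hands over the text between tags as a sequence of chunks. `count_obfuscation5` is an illustrative name, not SpamAssassin code:

```python
import re

# A "word character" for obfuscation5: neither whitespace nor ASCII punctuation,
# so letters, digits and non-ASCII characters all qualify.
WORDCH = r"[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]"
TAIL5 = re.compile(WORDCH + r"\Z")  # last stored chunk ends in a word char
HEAD5 = re.compile("^" + WORDCH)    # new chunk starts with a word char


def count_obfuscation5(chunks):
    """Count boundaries where a word seems to continue across two text chunks.

    `chunks` is a hypothetical stand-in for the successive pieces of text
    found between tags, e.g. "Vi<b></b>agra" -> ["Vi", "agra"].
    """
    hits = 0
    html_text = []
    for text in chunks:
        # Guard against the empty list (Perl would just see undef and not match).
        if html_text and TAIL5.search(html_text[-1]) and HEAD5.search(text):
            hits += 1
        html_text.append(text)
    return hits


print(count_obfuscation5(["Vi", "agra", " is ", "cheap"]))
```

Only the "Vi"/"agra" boundary counts. The chunks around " is " start or end with whitespace, so they are not counted.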
(In obfuscation4, a "." may have been left out of the character class
[^\s\(\)\<\>\[\]\$\,\"\;\/\#].)
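The two classes differ only in which punctuation they exclude. obfuscation5's four hex ranges cover exactly the 32 printable ASCII punctuation characters. obfuscation4 excludes just 12 of them, so characters such as "." still count as part of a word there. A short Python check, for illustration only, makes the comparison explicit:

```python
import string

# Punctuation excluded by the obfuscation4 class [^\s\(\)\<\>\[\]\$\,\"\;\/\#]
EXCLUDED4 = set('()<>[]$,";/#')

# Punctuation excluded by the obfuscation5 class: the four ASCII ranges
# \x21-\x2f, \x3a-\x40, \x5b-\x60 and \x7b-\x7e.
RANGES5 = ((0x21, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E))
EXCLUDED5 = {chr(c) for lo, hi in RANGES5 for c in range(lo, hi + 1)}

# ASCII punctuation that obfuscation4 still treats as part of a word.
still_word4 = sorted(set(string.punctuation) - EXCLUDED4)

print(EXCLUDED5 == set(string.punctuation))
print("".join(still_word4))
```

The second line of output lists the 20 punctuation characters, "." among them, that obfuscation4 would let a split "word" start or end with.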
OVERALL%    SPAM%     HAM%    S/O  RANK SCORE  NAME
   63467    60908     2559  0.960  0.00  0.00  (all messages)
 100.000  95.9680   4.0320  0.960  0.00  0.00  (all messages as %)
   2.929   2.9241   3.0481  0.490  0.11  0.01  00_05_T_HTML_OBFUSCATE5
   6.567   6.3309  12.1923  0.342  0.04  0.01  00_05_T_HTML_OBFUSCATE4
   3.547   3.4166   6.6432  0.340  0.04  0.01  00_05_T_HTML_OBFUSCATE3
   3.107   2.9914   5.8617  0.338  0.04  0.01  00_05_T_HTML_OBFUSCATE2
   7.777   7.0845  24.2673  0.226  0.02  0.01  00_05_T_HTML_OBFUSCATE
   7.429   6.7282  24.1110  0.218  0.02  0.01  00_05_T_HTML_OBFUSCATE1
  23.675  23.1677  35.7562  0.393  0.09  1.00  00_10_HTML_OBFUSCATE
   3.561   3.5759   3.2044  0.527  0.14  0.01  00_10_T_HTML_OBFUSCATE5
   4.012   3.9305   5.9398  0.398  0.06  0.01  00_10_T_HTML_OBFUSCATE2
   7.796   7.5983  12.5049  0.378  0.06  0.01  00_10_T_HTML_OBFUSCATE4
   4.462   4.3541   7.0340  0.382  0.06  0.01  00_10_T_HTML_OBFUSCATE3
   8.874   8.1582  25.9086  0.239  0.02  0.01  00_10_T_HTML_OBFUSCATE1
   0.904   0.9391   0.0782  0.923  0.71  0.01  05_10_T_HTML_OBFUSCATE2
   0.632   0.6518   0.1563  0.807  0.48  0.01  05_10_T_HTML_OBFUSCATE5
   1.229   1.2675   0.3126  0.802  0.47  0.01  05_10_T_HTML_OBFUSCATE4
   0.915   0.9375   0.3908  0.706  0.32  0.01  05_10_T_HTML_OBFUSCATE3
   1.445   1.4300   1.7976  0.443  0.08  0.01  05_10_T_HTML_OBFUSCATE1
   1.492   1.4744   1.9148  0.435  0.08  0.01  05_10_T_HTML_OBFUSCATE
   5.311   5.5034   0.7425  0.881  0.63  1.00  10_20_HTML_OBFUSCATE
   1.788   1.8635   0.0000  1.000  0.91  0.01  10_20_T_HTML_OBFUSCATE3
   1.760   1.8339   0.0000  1.000  0.91  0.01  10_20_T_HTML_OBFUSCATE2
   1.803   1.8766   0.0391  0.980  0.85  0.01  10_20_T_HTML_OBFUSCATE5
   1.910   1.9833   0.1563  0.927  0.72  0.01  10_20_T_HTML_OBFUSCATE4
   1.963   2.0342   0.2735  0.881  0.62  0.01  10_20_T_HTML_OBFUSCATE1
   3.657   3.8090   0.0391  0.990  0.88  1.00  20_30_HTML_OBFUSCATE
   1.654   1.7239   0.0000  1.000  0.91  0.01  20_30_T_HTML_OBFUSCATE2
   1.541   1.6057   0.0000  1.000  0.91  0.01  20_30_T_HTML_OBFUSCATE3
   1.538   1.6024   0.0000  1.000  0.91  0.01  20_30_T_HTML_OBFUSCATE4
   1.525   1.5893   0.0000  1.000  0.91  0.01  20_30_T_HTML_OBFUSCATE5
   1.480   1.5384   0.0782  0.952  0.78  0.01  20_30_T_HTML_OBFUSCATE1
   4.492   4.6792   0.0391  0.992  0.89  1.00  30_40_HTML_OBFUSCATE
   2.184   2.2756   0.0000  1.000  0.91  0.01  30_40_T_HTML_OBFUSCATE4
   2.138   2.2280   0.0000  1.000  0.91  0.01  30_40_T_HTML_OBFUSCATE5
   2.074   2.1606   0.0000  1.000  0.91  0.01  30_40_T_HTML_OBFUSCATE2
   2.069   2.1557   0.0000  1.000  0.91  0.01  30_40_T_HTML_OBFUSCATE3
   2.058   2.1442   0.0000  1.000  0.91  0.01  30_40_T_HTML_OBFUSCATE1
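The table's figures are consistent with two simple formulas. The first is S/O = SPAM% / (SPAM% + HAM%), the spam share computed on per-class hit rates, so the 60908/2559 spam/ham imbalance does not inflate it. The second is that OVERALL% is the message-weighted average of SPAM% and HAM%. A small Python sketch recomputes a few rows. The values are copied from the table, and the helper names are hypothetical:

```python
# Figures copied from the table: name -> (SPAM%, HAM%, reported S/O).
ROWS = {
    "(all messages as %)": (95.9680, 4.0320, 0.960),
    "00_05_T_HTML_OBFUSCATE5": (2.9241, 3.0481, 0.490),
    "05_10_T_HTML_OBFUSCATE2": (0.9391, 0.0782, 0.923),
    "10_20_HTML_OBFUSCATE": (5.5034, 0.7425, 0.881),
    "20_30_T_HTML_OBFUSCATE2": (1.7239, 0.0000, 1.000),
}

N_SPAM, N_HAM = 60908, 2559  # message counts from the "(all messages)" row


def s_over_o(spam_pct, ham_pct):
    """Spam share of a rule's hits, computed on per-class hit rates."""
    return spam_pct / (spam_pct + ham_pct)


def overall_pct(spam_pct, ham_pct):
    """Hit rate over all messages: per-class rates weighted by message counts."""
    return (spam_pct * N_SPAM + ham_pct * N_HAM) / (N_SPAM + N_HAM)


for name, (spam, ham, reported) in ROWS.items():
    print(f"{name:26} S/O {s_over_o(spam, ham):.3f} (table: {reported:.3f})")
```

Each recomputed S/O rounds to the table's value. For example, overall_pct(2.9241, 3.0481) comes out at about 2.929, matching OVERALL% for 00_05_T_HTML_OBFUSCATE5.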