Tuesday, February 10, 2004, 4:50:09 PM, Keith C. Ivey <[EMAIL PROTECTED]> wrote:
KCI> Robert Menschel <[EMAIL PROTECTED]> wrote:
>> I use:
>> header RM_tz_TooMany ToCc =~ /,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,/
>> describe RM_tz_TooMany List of recipients seems to exceed 20
>> score RM_tz_TooMany 0.342 # 161s/46h of 97268 corpus (79437s/17831h) 01/24/04
>> # max: 456 spam, 118 ham, Sep 5 2003
KCI> The regex is better written as
KCI> /(?:,.*){24},/
KCI> It might be more efficient to use something
KCI> like this instead:
KCI> /(?:[^,]+,){24}/
KCI> (excluding commas from the stretches between the commas means
KCI> less backtracking when attempting to match).
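A minimal sketch of how the two forms Keith suggests behave, written in Python rather than Perl, so engine details may differ from what SpamAssassin sees. The helper names and the example.com addresses are hypothetical. It suggests the two patterns are not quite interchangeable: the greedy form needs one more comma than the negated-class form for the same repeat count. The greedy form is only exercised with a small repeat count here, because near-miss inputs at {24} may backtrack heavily.

```python
import re

def make_header(n):
    """Build a hypothetical To/Cc value with n comma-separated addresses."""
    return ", ".join(f"user{i}@example.com" for i in range(n))

def first_matching_count(pattern, limit):
    """Return the smallest recipient count in 1..limit that the pattern matches."""
    for n in range(1, limit + 1):
        if pattern.search(make_header(n)):
            return n
    return None

def greedy(k):
    # Same shape as /(?:,.*){24},/ : k repeats of "comma, anything", then a comma.
    return re.compile(r"(?:,.*){%d}," % k)

def negated(k):
    # Same shape as /(?:[^,]+,){24}/ : k runs of non-comma text, each ending in a comma.
    return re.compile(r"(?:[^,]+,){%d}" % k)

small_greedy = first_matching_count(greedy(4), 10)    # needs 5 commas
small_negated = first_matching_count(negated(4), 10)  # needs 4 commas
full_negated = first_matching_count(negated(24), 40)  # needs 24 commas

print(small_greedy, small_negated, full_negated)  # prints: 6 5 25
```

Under these assumptions, /(?:[^,]+,){24}/ starts matching at 25 recipients, one fewer than the greedy form would need.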
I've finally gotten back to working on this rule, and used Keith's
last suggestion to create a multitude of TooMany rules. The results of
running them against my corpus are below.
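For anyone who wants to repeat the experiment, a family of rules like this could be generated rather than typed by hand. This is only a sketch: it assumes the number in each rule name is the repeat count in Keith's pattern, which the post does not state, and the describe text and the 0.001 placeholder score are made up for illustration.

```python
def too_many_rules(low, high):
    """Return SpamAssassin-style rule text for repeat counts low..high.

    Assumption: rule RM_tz_TooManyNN uses {NN} as the repeat count.
    """
    lines = []
    for n in range(low, high + 1):
        name = f"RM_tz_TooMany{n:02d}"
        # Doubled braces emit a literal {n} quantifier in the regex.
        lines.append(f"header   {name} ToCc =~ /(?:[^,]+,){{{n}}}/")
        lines.append(f"describe {name} To/Cc contains at least {n} commas")
        lines.append(f"score    {name} 0.001")  # placeholder score for corpus testing
    return "\n".join(lines)

print(too_many_rules(9, 11))
```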
Analysis: From my point of view, it doesn't matter where the line for
"too many" To's and Cc's is drawn -- any test with 9 or more To's or
Cc's results in an S/O worse than random, and any test with 20 or more
hits a higher percentage of my ham than it does of my spam.
YMMV, but I'm dropping this rule from my collection.
Bob Menschel
(First numeric frequencies, followed by percentage frequencies)
OVERALL  SPAM   HAM    S/O    RANK  SCORE  NAME
100689   81249  19440  0.807  0.00  0.00   (all messages)
1188     1026   162    0.602  1.00  0.00   RM_tz_TooMany09
994      843    151    0.572  0.85  0.00   RM_tz_TooMany10
881      741    140    0.559  0.79  0.00   RM_tz_TooMany11
777      649    128    0.548  0.75  0.00   RM_tz_TooMany12
712      594    118    0.546  0.74  0.00   RM_tz_TooMany13
638      529    109    0.537  0.70  0.00   RM_tz_TooMany14
561      464    97     0.534  0.69  0.00   RM_tz_TooMany15
429      354    75     0.530  0.68  0.00   RM_tz_TooMany17
476      391    85     0.524  0.65  0.00   RM_tz_TooMany16
400      327    73     0.517  0.63  0.00   RM_tz_TooMany18
356      291    65     0.517  0.63  0.00   RM_tz_TooMany19
302      241    61     0.486  0.52  0.00   RM_tz_TooMany20
248      192    56     0.451  0.41  0.00   RM_tz_TooMany21
219      167    52     0.435  0.37  0.00   RM_tz_TooMany22
200      151    49     0.424  0.34  0.00   RM_tz_TooMany23
186      137    49     0.401  0.29  0.27   RM_tz_TooMany24
162      116    46     0.376  0.23  0.25   RM_tz_TooMany25
146      101    45     0.349  0.19  0.22   RM_tz_TooMany26
133      88     45     0.319  0.14  0.19   RM_tz_TooMany27
124      81     43     0.311  0.13  0.18   RM_tz_TooMany28
106      66     40     0.283  0.09  0.00   RM_tz_TooMany29
102      62     40     0.271  0.08  0.00   RM_tz_TooMany30
89       51     38     0.243  0.05  0.00   RM_tz_TooMany31
80       45     35     0.235  0.05  0.00   RM_tz_TooMany32
72       39     33     0.220  0.04  0.00   RM_tz_TooMany33
69       36     33     0.207  0.03  0.00   RM_tz_TooMany34
67       34     33     0.198  0.02  0.00   RM_tz_TooMany35
52       26     26     0.193  0.02  0.00   RM_tz_TooMany39
55       27     28     0.187  0.02  0.00   RM_tz_TooMany38
51       25     26     0.187  0.02  0.00   RM_tz_TooMany40
58       28     30     0.183  0.02  0.00   RM_tz_TooMany37
46       22     24     0.180  0.02  0.00   RM_tz_TooMany42
61       29     32     0.178  0.01  0.00   RM_tz_TooMany36
49       23     26     0.175  0.01  0.00   RM_tz_TooMany41
42       19     23     0.165  0.01  0.00   RM_tz_TooMany44
42       19     23     0.165  0.01  0.00   RM_tz_TooMany43
39       17     22     0.156  0.01  0.00   RM_tz_TooMany45
37       16     21     0.154  0.01  0.00   RM_tz_TooMany46
35       15     20     0.152  0.00  0.00   RM_tz_TooMany48
35       15     20     0.152  0.00  0.00   RM_tz_TooMany47
35       15     20     0.152  0.00  0.00   RM_tz_TooMany49
33       13     20     0.135  0.00  0.00   RM_tz_TooMany50
OVERALL%  SPAM%    HAM%     S/O    RANK  SCORE  NAME
100689    81249    19440    0.807  0.00  0.00   (all messages)
100.000   80.6930  19.3070  0.807  0.00  0.00   (all messages as %)
1.180     1.2628   0.8333   0.602  1.00  0.00   RM_tz_TooMany09
0.987     1.0376   0.7767   0.572  0.85  0.00   RM_tz_TooMany10
0.875     0.9120   0.7202   0.559  0.79  0.00   RM_tz_TooMany11
0.772     0.7988   0.6584   0.548  0.75  0.00   RM_tz_TooMany12
0.707     0.7311   0.6070   0.546  0.74  0.00   RM_tz_TooMany13
0.634     0.6511   0.5607   0.537  0.70  0.00   RM_tz_TooMany14
0.557     0.5711   0.4990   0.534  0.69  0.00   RM_tz_TooMany15
0.426     0.4357   0.3858   0.530  0.68  0.00   RM_tz_TooMany17
0.473     0.4812   0.4372   0.524  0.65  0.00   RM_tz_TooMany16
0.397     0.4025   0.3755   0.517  0.63  0.00   RM_tz_TooMany18
0.354     0.3582   0.3344   0.517  0.63  0.00   RM_tz_TooMany19
0.300     0.2966   0.3138   0.486  0.52  0.00   RM_tz_TooMany20
0.246     0.2363   0.2881   0.451  0.41  0.00   RM_tz_TooMany21
0.218     0.2055   0.2675   0.435  0.37  0.00   RM_tz_TooMany22
0.199     0.1858   0.2521   0.424  0.34  0.00   RM_tz_TooMany23
0.185     0.1686   0.2521   0.401  0.29  0.27   RM_tz_TooMany24
0.161     0.1428   0.2366   0.376  0.23  0.25   RM_tz_TooMany25
0.145     0.1243   0.2315   0.349  0.19  0.22   RM_tz_TooMany26
0.132     0.1083   0.2315   0.319  0.14  0.19   RM_tz_TooMany27
0.123     0.0997   0.2212   0.311  0.13  0.18   RM_tz_TooMany28
0.105     0.0812   0.2058   0.283  0.09  0.00   RM_tz_TooMany29
0.101     0.0763   0.2058   0.271  0.08  0.00   RM_tz_TooMany30
0.088     0.0628   0.1955   0.243  0.05  0.00   RM_tz_TooMany31
0.079     0.0554   0.1800   0.235  0.05  0.00   RM_tz_TooMany32
0.072     0.0480   0.1698   0.220  0.04  0.00   RM_tz_TooMany33
0.069     0.0443   0.1698   0.207  0.03  0.00   RM_tz_TooMany34
0.067     0.0418   0.1698   0.198  0.02  0.00   RM_tz_TooMany35
0.052     0.0320   0.1337   0.193  0.02  0.00   RM_tz_TooMany39
0.055     0.0332   0.1440   0.187  0.02  0.00   RM_tz_TooMany38
0.051     0.0308   0.1337   0.187  0.02  0.00   RM_tz_TooMany40
0.058     0.0345   0.1543   0.183  0.02  0.00   RM_tz_TooMany37
0.046     0.0271   0.1235   0.180  0.02  0.00   RM_tz_TooMany42
0.061     0.0357   0.1646   0.178  0.01  0.00   RM_tz_TooMany36
0.049     0.0283   0.1337   0.175  0.01  0.00   RM_tz_TooMany41
0.042     0.0234   0.1183   0.165  0.01  0.00   RM_tz_TooMany44
0.042     0.0234   0.1183   0.165  0.01  0.00   RM_tz_TooMany43
0.039     0.0209   0.1132   0.156  0.01  0.00   RM_tz_TooMany45
0.037     0.0197   0.1080   0.154  0.01  0.00   RM_tz_TooMany46
0.035     0.0185   0.1029   0.152  0.00  0.00   RM_tz_TooMany48
0.035     0.0185   0.1029   0.152  0.00  0.00   RM_tz_TooMany47
0.035     0.0185   0.1029   0.152  0.00  0.00   RM_tz_TooMany49
0.033     0.0160   0.1029   0.135  0.00  0.00   RM_tz_TooMany50
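
The S/O values for the individual rules appear to be computed from the per-class percentages rather than from the raw hit counts, i.e. SPAM% / (SPAM% + HAM%), where SPAM% is the share of all spam a rule hits and HAM% the share of all ham. The sketch below checks that reading against a few rows copied from the tables. The function name is hypothetical, and the formula is inferred from these numbers, not taken from the tool that produced them.

```python
TOTAL_SPAM = 81249  # spam messages in the corpus ("all messages" row)
TOTAL_HAM = 19440   # ham messages in the corpus

def s_o(spam_hits, ham_hits):
    """S/O from class-normalised hit rates: spam% / (spam% + ham%)."""
    spam_pct = spam_hits / TOTAL_SPAM
    ham_pct = ham_hits / TOTAL_HAM
    return spam_pct / (spam_pct + ham_pct)

# (spam hits, ham hits) copied from the numeric table above
rows = {
    "RM_tz_TooMany09": (1026, 162),
    "RM_tz_TooMany19": (291, 65),
    "RM_tz_TooMany20": (241, 61),
    "RM_tz_TooMany50": (13, 20),
}

for name, (spam_hits, ham_hits) in rows.items():
    print(f"{name}  {s_o(spam_hits, ham_hits):.3f}")
```

On this reading, an S/O below 0.5 means the rule hits a larger share of ham than of spam, which is where the table crosses over between TooMany19 and TooMany20.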