Tuesday, February 10, 2004, 4:50:09 PM, Keith C. Ivey <[EMAIL PROTECTED]> wrote:

KCI> Robert Menschel <[EMAIL PROTECTED]> wrote:

>> I use:
>> header    RM_tz_TooMany           ToCc =~ /,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,/
>> describe  RM_tz_TooMany           List of recipients seems to exceed 20
>> score     RM_tz_TooMany           0.342  # 161s/46h of 97268 corpus (79437s/17831h) 01/24/04
>>                                          # max: 456 spam, 118 ham, Sep 5 2003

KCI> The regex is better written as 
KCI>    /(?:,.*){24},/

KCI> It might be more efficient to use something
KCI> like this instead:
KCI>    /(?:[^,]+,){24}/
KCI> (excluding commas from the stretches between the commas means
KCI> less backtracking when attempting to match).
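
A rough way to compare the two forms, sketched in Python (its re module
standing in for the Perl regexes SpamAssassin actually runs). The
example.com addresses and the helper names are made up for illustration.
The repeat count is kept at 3 rather than 24 on purpose: the ",.*" form
can backtrack very heavily on near-miss headers, so a small count keeps
the sketch cheap to run. Note the two forms are not quite
interchangeable: the first needs one more comma than the repeat count,
the second needs exactly that many commas, each preceded by at least
one non-comma character.

```python
import re

def greedy(k):
    """First form: k repeats of ',.*' followed by one more comma."""
    return re.compile(r"(?:,.*){%d}," % k)

def negated(k):
    """Second form: k stretches of non-comma characters, each ending in a comma."""
    return re.compile(r"(?:[^,]+,){%d}" % k)

def make_header(n):
    """Build a hypothetical To/Cc value with n comma-separated addresses."""
    return ", ".join(f"user{i}@example.com" for i in range(n))

def hits(n, k=3):
    """Return (greedy_matched, negated_matched) for an n-recipient header."""
    header = make_header(n)
    return (greedy(k).search(header) is not None,
            negated(k).search(header) is not None)
```

With k=3, a 4-address header (3 commas) hits only the second form, and
a 5-address header (4 commas) hits both, so the two rules fire one
recipient apart.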

I've finally gotten back to working on this rule, and used Keith's
last suggestion to create a multitude of TooMany rules.  Results from
running them against my corpus are below.
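
For anyone who wants to build a similar family of rules, here is a
minimal Python sketch that writes them out. It is an assumption that
the NN in each rule name is also the repeat count in the regex (the
thread does not say how the two map), and the describe text and the
0.01 placeholder score are made up for illustration.

```python
def too_many_rules(low=9, high=50):
    """Generate SpamAssassin-style rule text for RM_tz_TooManyNN rules.

    Assumption: rule NN uses the pattern /(?:[^,]+,){NN}/.
    The describe line and the 0.01 score are hypothetical placeholders.
    """
    lines = []
    for n in range(low, high + 1):
        name = f"RM_tz_TooMany{n:02d}"
        lines.append(f"header    {name}  ToCc =~ /(?:[^,]+,){{{n}}}/")
        lines.append(f"describe  {name}  Recipient list matched {n} times")
        lines.append(f"score     {name}  0.01")
    return "\n".join(lines)
```

Calling too_many_rules(9, 10) yields six lines, three per rule, which
can be pasted into a local .cf file for testing.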

Analysis: It doesn't matter how many To's and Cc's are too many from
our/my point of view -- any test with 9 or more To's or Cc's results in
an S/O worse than random (every rule's S/O is below the 0.807 of the
corpus as a whole), and any test with 20 or more hits a higher
percentage of my ham than it does my spam.
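
The S/O figures in the results can be reproduced from the raw counts.
The sketch below assumes S/O for a rule is spam% / (spam% + ham%), with
each percentage taken against its own class total; that formula is an
assumption, but it does reproduce the table values. The function name
is made up for illustration.

```python
TOTAL_SPAM = 81249  # spam messages in the corpus (from the results)
TOTAL_HAM = 19440   # ham messages in the corpus (from the results)

def s_over_o(spam_hits, ham_hits, total_spam=TOTAL_SPAM, total_ham=TOTAL_HAM):
    """S/O ratio, assuming spam% / (spam% + ham%) with per-class percentages."""
    spam_pct = 100.0 * spam_hits / total_spam
    ham_pct = 100.0 * ham_hits / total_ham
    return spam_pct / (spam_pct + ham_pct)
```

Under this formula 0.5 is the point where a rule hits the same
percentage of ham as of spam, which is why the rules from TooMany20
upward (S/O 0.486 and falling) are the ones hitting proportionally more
ham.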

YMMV, but I'm dropping this rule from my collection.

Bob Menschel

(First numeric frequencies, followed by percentage frequencies)

OVERALL     SPAM      HAM     S/O    RANK  SCORE  NAME
 100689    81249    19440    0.807   0.00   0.00  (all messages)
   1188     1026      162    0.602   1.00   0.00  RM_tz_TooMany09
    994      843      151    0.572   0.85   0.00  RM_tz_TooMany10
    881      741      140    0.559   0.79   0.00  RM_tz_TooMany11
    777      649      128    0.548   0.75   0.00  RM_tz_TooMany12
    712      594      118    0.546   0.74   0.00  RM_tz_TooMany13
    638      529      109    0.537   0.70   0.00  RM_tz_TooMany14
    561      464       97    0.534   0.69   0.00  RM_tz_TooMany15
    429      354       75    0.530   0.68   0.00  RM_tz_TooMany17
    476      391       85    0.524   0.65   0.00  RM_tz_TooMany16
    400      327       73    0.517   0.63   0.00  RM_tz_TooMany18
    356      291       65    0.517   0.63   0.00  RM_tz_TooMany19
    302      241       61    0.486   0.52   0.00  RM_tz_TooMany20
    248      192       56    0.451   0.41   0.00  RM_tz_TooMany21
    219      167       52    0.435   0.37   0.00  RM_tz_TooMany22
    200      151       49    0.424   0.34   0.00  RM_tz_TooMany23
    186      137       49    0.401   0.29   0.27  RM_tz_TooMany24
    162      116       46    0.376   0.23   0.25  RM_tz_TooMany25
    146      101       45    0.349   0.19   0.22  RM_tz_TooMany26
    133       88       45    0.319   0.14   0.19  RM_tz_TooMany27
    124       81       43    0.311   0.13   0.18  RM_tz_TooMany28
    106       66       40    0.283   0.09   0.00  RM_tz_TooMany29
    102       62       40    0.271   0.08   0.00  RM_tz_TooMany30
     89       51       38    0.243   0.05   0.00  RM_tz_TooMany31
     80       45       35    0.235   0.05   0.00  RM_tz_TooMany32
     72       39       33    0.220   0.04   0.00  RM_tz_TooMany33
     69       36       33    0.207   0.03   0.00  RM_tz_TooMany34
     67       34       33    0.198   0.02   0.00  RM_tz_TooMany35
     52       26       26    0.193   0.02   0.00  RM_tz_TooMany39
     55       27       28    0.187   0.02   0.00  RM_tz_TooMany38
     51       25       26    0.187   0.02   0.00  RM_tz_TooMany40
     58       28       30    0.183   0.02   0.00  RM_tz_TooMany37
     46       22       24    0.180   0.02   0.00  RM_tz_TooMany42
     61       29       32    0.178   0.01   0.00  RM_tz_TooMany36
     49       23       26    0.175   0.01   0.00  RM_tz_TooMany41
     42       19       23    0.165   0.01   0.00  RM_tz_TooMany44
     42       19       23    0.165   0.01   0.00  RM_tz_TooMany43
     39       17       22    0.156   0.01   0.00  RM_tz_TooMany45
     37       16       21    0.154   0.01   0.00  RM_tz_TooMany46
     35       15       20    0.152   0.00   0.00  RM_tz_TooMany48
     35       15       20    0.152   0.00   0.00  RM_tz_TooMany47
     35       15       20    0.152   0.00   0.00  RM_tz_TooMany49
     33       13       20    0.135   0.00   0.00  RM_tz_TooMany50

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 100689    81249    19440    0.807   0.00    0.00  (all messages)
100.000  80.6930  19.3070    0.807   0.00    0.00  (all messages as %)
  1.180   1.2628   0.8333    0.602   1.00    0.00  RM_tz_TooMany09
  0.987   1.0376   0.7767    0.572   0.85    0.00  RM_tz_TooMany10
  0.875   0.9120   0.7202    0.559   0.79    0.00  RM_tz_TooMany11
  0.772   0.7988   0.6584    0.548   0.75    0.00  RM_tz_TooMany12
  0.707   0.7311   0.6070    0.546   0.74    0.00  RM_tz_TooMany13
  0.634   0.6511   0.5607    0.537   0.70    0.00  RM_tz_TooMany14
  0.557   0.5711   0.4990    0.534   0.69    0.00  RM_tz_TooMany15
  0.426   0.4357   0.3858    0.530   0.68    0.00  RM_tz_TooMany17
  0.473   0.4812   0.4372    0.524   0.65    0.00  RM_tz_TooMany16
  0.397   0.4025   0.3755    0.517   0.63    0.00  RM_tz_TooMany18
  0.354   0.3582   0.3344    0.517   0.63    0.00  RM_tz_TooMany19
  0.300   0.2966   0.3138    0.486   0.52    0.00  RM_tz_TooMany20
  0.246   0.2363   0.2881    0.451   0.41    0.00  RM_tz_TooMany21
  0.218   0.2055   0.2675    0.435   0.37    0.00  RM_tz_TooMany22
  0.199   0.1858   0.2521    0.424   0.34    0.00  RM_tz_TooMany23
  0.185   0.1686   0.2521    0.401   0.29    0.27  RM_tz_TooMany24
  0.161   0.1428   0.2366    0.376   0.23    0.25  RM_tz_TooMany25
  0.145   0.1243   0.2315    0.349   0.19    0.22  RM_tz_TooMany26
  0.132   0.1083   0.2315    0.319   0.14    0.19  RM_tz_TooMany27
  0.123   0.0997   0.2212    0.311   0.13    0.18  RM_tz_TooMany28
  0.105   0.0812   0.2058    0.283   0.09    0.00  RM_tz_TooMany29
  0.101   0.0763   0.2058    0.271   0.08    0.00  RM_tz_TooMany30
  0.088   0.0628   0.1955    0.243   0.05    0.00  RM_tz_TooMany31
  0.079   0.0554   0.1800    0.235   0.05    0.00  RM_tz_TooMany32
  0.072   0.0480   0.1698    0.220   0.04    0.00  RM_tz_TooMany33
  0.069   0.0443   0.1698    0.207   0.03    0.00  RM_tz_TooMany34
  0.067   0.0418   0.1698    0.198   0.02    0.00  RM_tz_TooMany35
  0.052   0.0320   0.1337    0.193   0.02    0.00  RM_tz_TooMany39
  0.055   0.0332   0.1440    0.187   0.02    0.00  RM_tz_TooMany38
  0.051   0.0308   0.1337    0.187   0.02    0.00  RM_tz_TooMany40
  0.058   0.0345   0.1543    0.183   0.02    0.00  RM_tz_TooMany37
  0.046   0.0271   0.1235    0.180   0.02    0.00  RM_tz_TooMany42
  0.061   0.0357   0.1646    0.178   0.01    0.00  RM_tz_TooMany36
  0.049   0.0283   0.1337    0.175   0.01    0.00  RM_tz_TooMany41
  0.042   0.0234   0.1183    0.165   0.01    0.00  RM_tz_TooMany44
  0.042   0.0234   0.1183    0.165   0.01    0.00  RM_tz_TooMany43
  0.039   0.0209   0.1132    0.156   0.01    0.00  RM_tz_TooMany45
  0.037   0.0197   0.1080    0.154   0.01    0.00  RM_tz_TooMany46
  0.035   0.0185   0.1029    0.152   0.00    0.00  RM_tz_TooMany48
  0.035   0.0185   0.1029    0.152   0.00    0.00  RM_tz_TooMany47
  0.035   0.0185   0.1029    0.152   0.00    0.00  RM_tz_TooMany49
  0.033   0.0160   0.1029    0.135   0.00    0.00  RM_tz_TooMany50
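
To pick out the ham-heavy rules mechanically, a short Python sketch can
parse rows of the percentage table. The four sample rows are copied
from the table above; the function names and the dictionary layout are
made up for illustration.

```python
SAMPLE = """\
  0.397   0.4025   0.3755    0.517   0.63    0.00  RM_tz_TooMany18
  0.354   0.3582   0.3344    0.517   0.63    0.00  RM_tz_TooMany19
  0.300   0.2966   0.3138    0.486   0.52    0.00  RM_tz_TooMany20
  0.246   0.2363   0.2881    0.451   0.41    0.00  RM_tz_TooMany21
"""

def parse_rows(text):
    """Parse 'OVERALL% SPAM% HAM% S/O RANK SCORE NAME' rows; skip anything else."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # blank lines and the '(all messages)' rows
        try:
            overall, spam, ham, so, rank, score = map(float, fields[:6])
        except ValueError:
            continue  # header line
        rows.append({"name": fields[6], "spam_pct": spam,
                     "ham_pct": ham, "so": so})
    return rows

def ham_heavy(rows):
    """Names of rules that hit a larger share of ham than of spam."""
    return [r["name"] for r in rows if r["ham_pct"] > r["spam_pct"]]
```

On the sample rows this returns TooMany20 and TooMany21 but not
TooMany18 or TooMany19, matching the cutoff at 20 noted in the
analysis.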


