I've had a spate of unwanted emails, apparently with empty bodies and very
short Message-IDs.

Built a set of rules:

header    RM_hm_ShortMsgid06     Message-ID =~ /^.{1,6}$/
describe  RM_hm_ShortMsgid06     Message ID is too short to be valid. Possible spam/virus sign
score     RM_hm_ShortMsgid06     0.800  #
header    RM_hm_ShortMsgid07     Message-ID =~ /^.{1,7}$/
describe  RM_hm_ShortMsgid07     Message ID is too short to be valid. Possible spam/virus sign
score     RM_hm_ShortMsgid07     0.800  #
header    RM_hm_ShortMsgid08     Message-ID =~ /^.{1,8}$/
describe  RM_hm_ShortMsgid08     Message ID is too short to be valid. Possible spam/virus sign
score     RM_hm_ShortMsgid08     0.800  #

etc.
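
Writing out one rule per length cutoff by hand gets tedious. The sketch below generates the same family of rules for cutoffs 6 through 18, the range that appears in the results. It is only an illustration: the function name short_msgid_rules and its parameters are made up here, and nothing in this post says the rules were actually produced this way.

```python
def short_msgid_rules(min_len=6, max_len=18, score=0.8):
    """Return SpamAssassin rule text, one header/describe/score trio per cutoff.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    blocks = []
    for n in range(min_len, max_len + 1):
        name = f"RM_hm_ShortMsgid{n:02d}"
        blocks.append(
            # Doubled braces emit literal { and } around the quantifier.
            f"header    {name}     Message-ID =~ /^.{{1,{n}}}$/\n"
            f"describe  {name}     Message ID is too short to be valid. "
            f"Possible spam/virus sign\n"
            f"score     {name}     {score:.3f}\n"
        )
    return "".join(blocks)


if __name__ == "__main__":
    # Redirect this output into a .cf file and lint it before use.
    print(short_msgid_rules())
```

Each describe line is emitted on a single line, so a mail client's line wrapping cannot split it.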

Results:

Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)

OVERALL     SPAM      HAM     S/O    RANK   SCORE  NAME
  97268    79437    17831    0.817   0.00    0.00  (all messages)
     27       27        0    1.000   1.00   1.80  RM_hm_ShortMsgid13
     24       24        0    1.000   1.00   1.80  RM_hm_ShortMsgid12
     21       21        0    1.000   1.00   1.80  RM_hm_ShortMsgid11
     16       16        0    1.000   1.00   1.80  RM_hm_ShortMsgid10
     13       13        0    1.000   1.00   1.80  RM_hm_ShortMsgid09
     12       12        0    1.000   1.00   1.80  RM_hm_ShortMsgid08
     11       11        0    1.000   1.00   1.80  RM_hm_ShortMsgid07
     10       10        0    1.000   1.00   1.80  RM_hm_ShortMsgid06
    139      125       14    0.667   0.24   1.80  RM_hm_ShortMsgid18
    108       94       14    0.601   0.15   1.80  RM_hm_ShortMsgid17
     89       75       14    0.546   0.09   1.80  RM_hm_ShortMsgid16
     51       41       10    0.479   0.04   1.80  RM_hm_ShortMsgid15
     43       33       10    0.426   0.00   1.80  RM_hm_ShortMsgid14

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  97268    79437    17831    0.817   0.00    0.00  (all messages)
100.000  81.6682  18.3318    0.817   0.00    0.00  (all messages as %)
  0.028   0.0340   0.0000    1.000   1.00    1.80  RM_hm_ShortMsgid13
  0.025   0.0302   0.0000    1.000   1.00    1.80  RM_hm_ShortMsgid12
  0.022   0.0264   0.0000    1.000   1.00    1.80  RM_hm_ShortMsgid11
  0.016   0.0201   0.0000    1.000   1.00    1.80  RM_hm_ShortMsgid10
  0.013   0.0164   0.0000    1.000   1.00    1.80  RM_hm_ShortMsgid09
  0.012   0.0151   0.0000    1.000   1.00    1.80  RM_hm_ShortMsgid08
  0.011   0.0138   0.0000    1.000   1.00    1.80  RM_hm_ShortMsgid07
  0.010   0.0126   0.0000    1.000   1.00    1.80  RM_hm_ShortMsgid06
  0.143   0.1574   0.0785    0.667   0.24    1.80  RM_hm_ShortMsgid18
  0.111   0.1183   0.0785    0.601   0.15    1.80  RM_hm_ShortMsgid17
  0.091   0.0944   0.0785    0.546   0.09    1.80  RM_hm_ShortMsgid16
  0.052   0.0516   0.0561    0.479   0.04    1.80  RM_hm_ShortMsgid15
  0.044   0.0415   0.0561    0.426   0.00    1.80  RM_hm_ShortMsgid14
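
For anyone wondering how the per-rule S/O column relates to the hit counts: the figures above are consistent with S/O being the spam hit rate divided by the sum of the spam and ham hit rates, each rate taken relative to its own corpus. The sketch below recomputes it from the counts in the table. The function name s_over_o is made up here, and the formula is inferred from these numbers rather than taken from the mass-check tooling. The "(all messages)" line appears to use the plain spam fraction (79437/97268) instead.

```python
TOTAL_SPAM = 79437  # spam messages in the corpus (from the table above)
TOTAL_HAM = 17831   # ham messages in the corpus (from the table above)


def s_over_o(spam_hits, ham_hits, total_spam=TOTAL_SPAM, total_ham=TOTAL_HAM):
    """Spam hit rate over combined hit rate (formula inferred, not official)."""
    spam_rate = spam_hits / total_spam
    ham_rate = ham_hits / total_ham
    return spam_rate / (spam_rate + ham_rate)


# (rule name, spam hits, ham hits) copied from the numeric table.
ROWS = [
    ("RM_hm_ShortMsgid12", 24, 0),
    ("RM_hm_ShortMsgid14", 33, 10),
    ("RM_hm_ShortMsgid16", 75, 14),
    ("RM_hm_ShortMsgid18", 125, 14),
]

for name, spam_hits, ham_hits in ROWS:
    print(f"{name}  {s_over_o(spam_hits, ham_hits):.3f}")
```

Because the rates are normalised per corpus, a rule that hits more spam than ham in absolute terms can still land below 0.5, as ShortMsgid14 and ShortMsgid15 do here.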

The shortest Message-ID in ham was 14 characters long. Leaving one character
of elbow room (13), I'm adding the following rule, which matches Message-IDs
of up to 12 characters, to my custom rule set:

header    RM_hm_ShortMsgid12     Message-ID =~ /^.{1,12}$/
describe  RM_hm_ShortMsgid12     Message ID is too short to be valid. Possible spam/virus sign
score     RM_hm_ShortMsgid12     1.800  # 24s/0h of 97268 corpus (79437s/17831h) 01/29/04
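
To sanity-check the cutoff, the same pattern can be tried against a few strings of known length. This is a rough sketch using Python's re module rather than SpamAssassin itself, so details such as how the header value is trimmed before matching may differ. The sample Message-IDs and the name is_short_msgid are made up for illustration.

```python
import re

# Same pattern as the RM_hm_ShortMsgid12 rule: 1 to 12 characters in total,
# angle brackets included.
SHORT_MSGID = re.compile(r"^.{1,12}$")


def is_short_msgid(message_id):
    """Return True if the whole Message-ID value is 12 characters or fewer."""
    return SHORT_MSGID.search(message_id) is not None


# Made-up sample values, chosen only for their lengths (12, 13 and 14).
SAMPLES = ["<abcd@h.tld>", "<abcde@h.tld>", "<abcdef@h.tld>"]

for msgid in SAMPLES:
    print(len(msgid), msgid, is_short_msgid(msgid))
```

Only the 12-character sample matches. The 13-character one is the elbow room, and 14 is the shortest length seen in ham.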

Bob Menschel





