Re[2]: Truncated Messages and X-Mesage-Info: contents

Robert Menschel 22 May 2004 03:20:54 -0000

Hello Justin,

Friday, May 21, 2004, 1:32:51 AM, you wrote:


>> header    RM_hex_MessageInfo     exists:X-Message-Info
>> describe  RM_hex_MessageInfo     X-Message-Info header found
>> score     RM_hex_MessageInfo     4.000  # type=spamp
>> #stype    RM_hex_MessageInfo     spamp
>> #counts   RM_hex_MessageInfo     1392s/0h of 115937 corpus (94614s/21323h) 04/29/04

JM> hey Robert -- what does the "spamp" mean?

My home-grown mass-check script (which calls masses/mass-check and
hit-frequencies) not only gives me the #counts line above, but also
recommends scores based on a series of algorithms.

I indicate which algorithm should apply to any given rule in my special
#stype line above.  (The "# type=keyword" on the score line is an older
version of the same thing.)

The default algorithm is my "spam" rule, which starts at a very minimal
score for a single spam, and grows slowly to 1/3 of required-hits at
200s/0h, 400s/1h, 600s/2h, etc. The great majority of my custom rules
fall into this category.
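
As a rough sketch of the thresholds just described (not the actual script): the spam count at which a rule reaches RH/3 appears to follow 200 * (ham + 1). The growth below that point is only described as "slow", so the linear ramp and the cap used here are assumptions, and both function names are hypothetical.

```python
def spam_hits_for_full_score(ham_hits: int) -> int:
    """Spam hits needed to reach RH/3, following the 200s/0h, 400s/1h, 600s/2h pattern."""
    return 200 * (ham_hits + 1)


def spam_rule_score(spam_hits: int, ham_hits: int, required_hits: float) -> float:
    """Score for a default "spam" type rule.

    ASSUMPTION: linear ramp from 0 up to RH/3, capped at RH/3.
    The post only says the score "grows slowly"; the real curve is not given.
    """
    target = required_hits / 3
    needed = spam_hits_for_full_score(ham_hits)
    return target * min(spam_hits, needed) / needed
```

Whatever curve the real script uses, the point of the pattern is that each ham hit raises the bar by another 200 spam hits before the rule earns a third of required-hits.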

My "spamp" rule (probable spam) is used for email characteristics which
very strongly suggest spam, such as the header above which is not used by
any non-spam email client. Another example:

header    RM_ft_KS5601             From:raw =~ /\=\?ks_c_5601\-1987\?/i
describe  RM_ft_KS5601             From header specifies display in Korean?, unnecessary unless spam hides subject
score     RM_ft_KS5601             1.000  # type=spamp
#stype    RM_ft_KS5601             spamp
#counts   RM_ft_KS5601             9s/0h of 125163 corpus (104972s/20191h) 03/28/04

This rule scores RH/9 (RH being required-hits) for 1-9 spam, 2*RH/9 for
10-99 spam, 3*RH/9 for
100-999 spam, etc.
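
A minimal sketch of that step function, assuming the "etc." continues with one more RH/9 step per power of ten and that the rule has no ham hits (the post does not say how ham hits change a spamp score). The function name is hypothetical.

```python
def spamp_score(spam_hits: int, required_hits: float) -> float:
    """RH/9 for 1-9 spam hits, 2*RH/9 for 10-99, 3*RH/9 for 100-999, ...

    ASSUMPTION: one extra RH/9 step per decimal digit of the spam count.
    """
    if spam_hits < 1:
        return 0.0  # assumption: a rule with no spam hits gets no score
    steps = len(str(spam_hits))  # number of decimal digits in the hit count
    return steps * required_hits / 9
```

With required_hits set to 9 (an inferred value, not one stated in the post), this gives 1.0 for the 9-hit RM_ft_KS5601 rule and 4.0 for the 1392-hit RM_hex_MessageInfo rule, which agrees with the score lines quoted for both.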

My "spamg" rule (guaranteed spam) is used for BigEvil type rules, where
I'm very confident that they won't match ham. Example:

header    RM_hr_carat            Received =~ /\^/
describe  RM_hr_carat            Received header has apparently invalid character
score     RM_hr_carat            3.000  # type=spamg
#stype    RM_hr_carat            spamg
#counts   RM_hr_carat            8s/0h of 96854 corpus (75458s/21396h) 05/03/04
#hist     RM_hr_carat            Created by Bob Menschel May 3 2004

Scoring for these rules starts at RH/3 and goes up from there (provided
no ham hits).

I'm expecting/hoping much of this will go away when I'm able to migrate
to 3.0 and use the new perceptron methods for scoring my rules.

Bob Menschel
