Hello Eric,
Thursday, February 5, 2004, 4:40:28 PM, you wrote:
EF> but the typo (below) saying "with with" is a good identifier for this
EF> particular program.
Agreed. "with with" hits 3 spam here, no ham.
Just developed this rule, which I'll be testing tonight:
header RM_hr_WithWith Received =~ / with with /
describe RM_hr_WithWith Spam identified by typo in received header
score RM_hr_WithWith 1.000 # type=spamp -
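For anyone who wants to sanity-check the pattern before loading it into SpamAssassin, here is a minimal Python sketch of the same match. The two Received lines are invented for illustration (they are not taken from real spam), and the helper name has_with_with is hypothetical.

```python
import re

# Same pattern as the RM_hr_WithWith rule: the doubled word with a space on each side.
WITH_WITH = re.compile(r" with with ")

def has_with_with(received_header):
    """Return True if the Received header text contains the 'with with' typo."""
    return bool(WITH_WITH.search(received_header))

# Hypothetical Received headers, made up for this example.
typo_header = "from mail.example.com by mx.example.net with with SMTP id ABC123"
normal_header = "from mail.example.com by mx.example.net with ESMTP id ABC123"

print(has_with_with(typo_header))
print(has_with_with(normal_header))
```

Because the pattern requires a space on both sides, it would presumably miss a "with with" that ends a line, so treat this as a rough check only.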
EF> The from/reply-to address made from this program is always a randomly
EF> generated username with a valid domain. The username seems to be 6 or more
EF> characters, often with few vowels. Here's a few examples:
EF> I wonder if a low/med scoring rule can be created to look for usernames of 6
EF> or more alpha only chars with large groups (4+) of back-to-back consonants?
EF> Sticking with 6 or more chars should avoid simple abbreviations like
EF> [EMAIL PROTECTED] or [EMAIL PROTECTED], but be more successful with
EF> [EMAIL PROTECTED]
I use:
header RM_fl_ConsWord6s From =~ /\b[bcghjklmnpqrtvwxz]{6,20}\b/
describe RM_fl_ConsWord6s From contains word consisting of consecutive consonants
score RM_fl_ConsWord6s 3.000 # 460s/1h of 97268 corpus (79437s/17831h) 01/24/04
header RM_fl_ConsWord9 From =~ /\b[bcghjklmnpqrstvwxz]{9,20}\b/
describe RM_fl_ConsWord9 From contains word consisting of consecutive consonants
score RM_fl_ConsWord9 3.000 # type=spamp - 137s/0h of 97268 corpus (79437s/17831h) 01/24/04
Note that the 6-consonant test has had "s" removed to cut down on ham
hits.
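To see how the two consonant patterns behave, including the effect of dropping "s", here is a small Python sketch that applies the same character classes to a few From values. The addresses are made up for illustration and the helper name matched_rules is hypothetical; this only mimics the regex match, not SpamAssassin's header handling or scoring.

```python
import re

# Same character classes as the two rules above.
# The 6-consonant class omits "s"; the 9-consonant class keeps it.
CONS_WORD_6S = re.compile(r"\b[bcghjklmnpqrtvwxz]{6,20}\b")
CONS_WORD_9 = re.compile(r"\b[bcghjklmnpqrstvwxz]{9,20}\b")

def matched_rules(from_header):
    """Return the names of the consonant rules whose pattern matches the From text."""
    hits = []
    if CONS_WORD_6S.search(from_header):
        hits.append("RM_fl_ConsWord6s")
    if CONS_WORD_9.search(from_header):
        hits.append("RM_fl_ConsWord9")
    return hits

# Hypothetical addresses, invented for this example.
samples = [
    "xkqzvw@example.com",     # 6 consonants, none of them "s"
    "bcghjklmn@example.com",  # 9 consonants
    "strngth@example.com",    # 7 consonants, but starts with "s"
    "robert@example.com",     # ordinary name
]

for address in samples:
    print(address, matched_rules(address))
```

Both patterns are case-sensitive as written, so an all-uppercase username would not match unless the rule is given the /i modifier.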
Bob Menschel