Hello Robert,
Wednesday, February 11, 2004, 8:05:38 AM, you wrote:
RSS> As indicated in the subject line, I'm getting negative hit rates on spam
RSS> which uses random dictionary words. Obviously sa-learn cannot learn how
RSS> to deal with such an approach, and my formerly brilliant
RSS> sendmail/spamassassin configuration is now next to useless - as I'm
RSS> getting 200 - 300 spam's per day.
RSS> Can anyone point me to a solution or a counter-counter measure to kill
RSS> this damn spam??
1) Yes, sa-learn DOES deal with these emails, and does so exceedingly
well here. I call them "bayes fodder", since those random words teach
bayes that emails containing them are spam.
2) I then augment bayes with the following rules:
# longwords -- possible sign of random words placed into spam to confuse
# anti-spam software
body RM_bpt_longwords68a /\b(?:[a-z]{6,}\s+){8}/
describe RM_bpt_longwords68a Long string of long words
score RM_bpt_longwords68a 1.500 # type=FP - 7429s/2h of 91714 corpus (74113s/17601h) 01/23/04
# ham: userid list,
# "improving compatibility between computer platforms demands certain levels "
body RM_bpt_longwords69a /\b(?:[a-z]{6,}\s+){9}/
describe RM_bpt_longwords69a Long string of long words
score RM_bpt_longwords69a 1.000 # type=max:1 (add to 59a,68a) - 6595s/1h of 91714 corpus (74113s/17601h) 01/23/04
# ham: userid list
body RM_bpt_longwords78a /\b(?:[a-z]{7,}\s+){8}/
describe RM_bpt_longwords78a Long string of long words
score RM_bpt_longwords78a 2.000 # type=max:2 (add to 68a) - 4163s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords59a /\b(?:[a-z]{5,}\s+){9}/
describe RM_bpt_longwords59a Long string of long words
score RM_bpt_longwords59a 1.500 # type=FP - 8753s/8h of 91714 corpus (74113s/17601h) 01/23/04
# ham: userid list
body RM_bpt_longwords79a /\b(?:[a-z]{7,}\s+){9}/
describe RM_bpt_longwords79a Long string of long words
score RM_bpt_longwords79a 1.000 # type=max:1 (add to 78a) - 2950s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords96a /\b(?:[a-z]{9,}\s+){6}/
describe RM_bpt_longwords96a Long string of long words
score RM_bpt_longwords96a 4.000 # 1162s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords88a /\b(?:[a-z]{8,}\s+){8}/
describe RM_bpt_longwords88a Long string of long words
score RM_bpt_longwords88a 4.000 # 1025s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords89a /\b(?:[a-z]{8,}\s+){9}/
describe RM_bpt_longwords89a Long string of long words
score RM_bpt_longwords89a 1.000 # type=max:1 (add to 88a) - 590s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords97 /\b(?:\w{9,}\s+){7}/
describe RM_bpt_longwords97 Long string of long words
score RM_bpt_longwords97 3.000 # 545s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords98 /\b(?:\w{9,}\s+){8}/
describe RM_bpt_longwords98 Long string of long words
score RM_bpt_longwords98 1.000 # type=max:1 (add to 97) - 442s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords99 /\b(?:\w{9,}\s+){9}/
describe RM_bpt_longwords99 Long string of long words
score RM_bpt_longwords99 1.000 # type=max:1 (add to 98) - 330s/0h of 91714 corpus (74113s/17601h) 01/23/04
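
If you want to see how these rules stack up before loading them, here is a rough Python sketch that applies the same patterns to a piece of text and adds up the scores of every rule that hits. The patterns and scores are copied from the rules above; the RULES table, the helper names (rules_hit, total_score) and the spam sample are made up for illustration. It is only an approximation: SpamAssassin runs body rules against its own rendered, whitespace-normalized view of the message, and Python's \w is Unicode-aware where Perl's may not be, so treat the results as a sanity check rather than what SpamAssassin would report.

```python
import re

# Patterns and scores copied from the rules above; the table layout is
# just a convenience for this sketch.
RULES = {
    "RM_bpt_longwords68a": (r"\b(?:[a-z]{6,}\s+){8}", 1.5),
    "RM_bpt_longwords69a": (r"\b(?:[a-z]{6,}\s+){9}", 1.0),
    "RM_bpt_longwords78a": (r"\b(?:[a-z]{7,}\s+){8}", 2.0),
    "RM_bpt_longwords59a": (r"\b(?:[a-z]{5,}\s+){9}", 1.5),
    "RM_bpt_longwords79a": (r"\b(?:[a-z]{7,}\s+){9}", 1.0),
    "RM_bpt_longwords96a": (r"\b(?:[a-z]{9,}\s+){6}", 4.0),
    "RM_bpt_longwords88a": (r"\b(?:[a-z]{8,}\s+){8}", 4.0),
    "RM_bpt_longwords89a": (r"\b(?:[a-z]{8,}\s+){9}", 1.0),
    "RM_bpt_longwords97": (r"\b(?:\w{9,}\s+){7}", 3.0),
    "RM_bpt_longwords98": (r"\b(?:\w{9,}\s+){8}", 1.0),
    "RM_bpt_longwords99": (r"\b(?:\w{9,}\s+){9}", 1.0),
}


def rules_hit(body):
    """Return the names of all rules whose pattern is found in body."""
    return [name for name, (pattern, _score) in RULES.items()
            if re.search(pattern, body)]


def total_score(body):
    """Sum the scores of every rule that hits, as SpamAssassin would add them."""
    return sum(RULES[name][1] for name in rules_hit(body))


# The ham phrase quoted in the 68a comment above (note the trailing space).
ham = "improving compatibility between computer platforms demands certain levels "

# Hypothetical "random dictionary words" sample: nine 9-letter words.
spam = ("chocolate furniture yesterday wonderful elephants "
        "mountains telephone adventure butterfly ")

for label, text in (("ham", ham), ("spam", spam)):
    print(label, rules_hit(text), total_score(text))
```

The ham phrase has exactly eight words of six or more letters, which is why it trips 68a and nothing stricter. A long enough run of long words trips several rules at once, and the scores add together.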
Bob Menschel