I have a question about how the rulesets for SpamAssassin are generated.

For example, consider this rule in 20_drugs.cf:
header SUBJECT_DRUG_GAP_C       Subject =~ /\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i
describe SUBJECT_DRUG_GAP_C     Subject contains a gappy version of 'cialis'
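To make concrete what this pattern catches, here is a minimal Python sketch. It is only an illustration: SpamAssassin itself evaluates rules as Perl regexes, and this assumes Python's `re` handles `\b`, `.{0,2}` and case-insensitive matching the same way for this pattern. The helper name `hits_rule` is hypothetical.

```python
import re

# Pattern copied from the SUBJECT_DRUG_GAP_C rule in 20_drugs.cf.
# The trailing /i flag of the Perl-style rule maps to re.IGNORECASE here.
SUBJECT_DRUG_GAP_C = re.compile(
    r"\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b", re.IGNORECASE
)


def hits_rule(subject: str) -> bool:
    """Return True if the Subject text matches the gappy 'cialis' pattern."""
    return SUBJECT_DRUG_GAP_C.search(subject) is not None


if __name__ == "__main__":
    samples = [
        "Cheap cialis here",    # plain spelling
        "Buy C-i-a-l-i-s now",  # one filler character between letters
        "C*I*A*L*I*S",          # different filler, upper case
        "Meet a specialist",    # 'cialis' inside a word: no \b before 'c'
    ]
    for subject in samples:
        print(f"{subject!r}: {hits_rule(subject)}")
```

The `\b` anchors are what keep an innocent word like "specialist" from firing the rule, while the `.{0,2}` gaps absorb the punctuation and spacing spammers insert to dodge a literal match.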

Who wrote the regular expression
"/\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b/i"?

a. Is it done manually, with people writing regexes and checking how well
they catch spam?
b. Is there an algorithm that analyzes a large corpus of spam and then comes
up with these regexes on its own?
c. Is it a combination of (a) and (b)?
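The pattern follows a simple mechanical template (the keyword's letters joined by `.{0,2}` and wrapped in `\b`), so it could plausibly come from either a person or a script. As a purely hypothetical illustration, not SpamAssassin's actual tooling, a helper that expands a keyword into this gappy form might look like this; `gappy_pattern` and its `max_gap` parameter are invented names.

```python
import re


def gappy_pattern(word: str, max_gap: int = 2) -> re.Pattern:
    """Hypothetical helper: build a case-insensitive regex that matches
    `word` with up to `max_gap` arbitrary characters between its letters."""
    gap = ".{0,%d}" % max_gap
    # Escape each character in case the keyword contains regex metacharacters.
    body = gap.join(re.escape(ch) for ch in word)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


if __name__ == "__main__":
    pat = gappy_pattern("cialis")
    print(pat.pattern)
```

Running it prints `\bc.{0,2}i.{0,2}a.{0,2}l.{0,2}i.{0,2}s\b`, the same body as the rule above. A template like this only produces candidate patterns; it says nothing about how candidates would be vetted against spam and ham before being shipped.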

I know that rule scores are assigned using "a neural network trained with
error back propagation" (see
http://wiki.apache.org/spamassassin/HowScoresAreAssigned).

But how are the rules themselves generated?

Thanks
-- 
View this message in context: 
http://www.nabble.com/SpamAssassin-Ruleset-Generation-tp25773508p25773508.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
