SpamAssassin includes a naive Bayesian classifier that can be trained to recognize spam probabilistically from the words a message contains. The classifier's result is boiled down into one of several rules: BAYES_00, BAYES_05, BAYES_20, ..., BAYES_95, BAYES_99. These rules have statically assigned scores. Alongside them there is a whole plethora of other, more complex rules (for things like header bugs, DNSBLs, body formatting, etc.). The scores of all the rules a message triggers are added up, and the total is used to determine whether the message is actually spam.
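To make that last step concrete, here is a minimal sketch in Python of the "add up the scores and compare" stage. The score values and the two non-BAYES rule names are made up for illustration, and the `>=` comparison is my assumption about how the total is checked against the threshold (5.0 is SpamAssassin's default `required_score`).

```python
# Hypothetical per-rule scores, for illustration only. Real values live in
# SpamAssassin's rule files and can be overridden with "score" lines.
RULE_SCORES = {
    "BAYES_00": -1.5,            # classifier thinks the message is very likely ham
    "BAYES_99": 3.5,             # classifier thinks the message is very likely spam
    "EXAMPLE_HEADER_RULE": 1.0,  # made-up name standing in for a header check
    "EXAMPLE_DNSBL_RULE": 2.0,   # made-up name standing in for a DNSBL lookup
}

REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def total_score(hits, scores=RULE_SCORES):
    """Sum the static scores of every rule the message triggered."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, scores=RULE_SCORES, required=REQUIRED_SCORE):
    """Flag the message when the summed score reaches the threshold (assumed >=)."""
    return total_score(hits, scores) >= required
```

With these invented numbers, a message hitting BAYES_99 and EXAMPLE_DNSBL_RULE totals 5.5 and is flagged, while one hitting BAYES_00 and EXAMPLE_DNSBL_RULE totals 0.5 and is not. The values in `RULE_SCORES` are exactly the weights that would have to be learned.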
The scores for these rules can be customized manually in ~/.spamassassin/user_prefs, or system-wide in files under /etc/spamassassin. Is there any utility for SpamAssassin that could be used to train the scores for all of its rules automatically, in a Bayesian or support-vector-machine kind of way? Note that I'm not talking about training the Bayesian filter itself. I'm curious about automatically training the step that comes after the Bayesian filter, the one that combines the rule scores.

--Ken Bloom

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory. Department of Computer Science. Illinois Institute of Technology. http://www.iit.edu/~kbloom1/
_______________________________________________ vox-tech mailing list [email protected] http://lists.lugod.org/mailman/listinfo/vox-tech
