[TriLUG] Spamassassin question - Bayesian filtering

Jeremy Portzer Thu, 13 Mar 2003 10:10:36 -0800

Good afternoon folks,

I've been playing around with the new SpamAssassin, version 2.50, which
includes Bayesian filtering (see http://www.paulgraham.com/spam.html for
the paper describing the approach, which was mentioned at ESR's talk, and
see the man page for the "sa-learn" command).


As per the sa-learn man page, SA 2.50 operates in unsupervised
auto-learning mode by default.  This means that each message is added to
the "ham" or "spam" database based on whether SpamAssassin's other rules
mark it as spam or not.  The man page mentions that this "should be
supplemented with some supervised training in addition, if possible."

How do I go about "supplementing" the auto-learning mode?  One problem I
can see with auto-learning is that missed spams get learned as "ham"
(non-spam) and could mess up the database.  So I'm collecting these
mistakes, but how do I properly adjust the database?  Do I need to make
it "forget" the mistaken emails first, and then run them through
sa-learn with --spam?  Or is running them through with --spam enough?

Does anyone know of resources/HOWTOs/examples with actual commands,
instead of generalized statements like "supplement with supervised
training"?

====

If anyone else is interested in testing SpamAssassin, it is installed on
the TriLUG mail server now.  Just put something like this in your
.procmailrc :

:0fw
| /usr/bin/spamc

Then your spam will be marked with an "X-Spam-Status: Yes" header, which
you can filter on if you like.
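
For example, a follow-up recipe along these lines, placed after the spamc
recipe above, should file marked messages into a separate mailbox.  Treat
it as a sketch: the mailbox name "spam" is only a placeholder, and the
match assumes spam is tagged with a header beginning "X-Spam-Status: Yes".

# File anything SpamAssassin tagged as spam into its own mbox file.
# "spam" is a placeholder mailbox name; the trailing colon on ":0:"
# asks procmail to use a local lockfile while delivering.
:0:
* ^X-Spam-Status: Yes
spam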

Regards,
Jeremy

-- 
/=====================================================================\
| Jeremy Portzer       [EMAIL PROTECTED]       trilug.org/~jeremy     |
| GPG Fingerprint: 712D 77C7 AB2D 2130 989F  E135 6F9F F7BC CC1A 7B92 |
\=====================================================================/
