kylepetersen wrote:
Can you point me to a procedure for Bayesian filter usage on James?  I've
seen it mentioned in the config.xml file (James' config.xml, that is), so it
seems to already be a part of James?

Yes, that's right. The Bayesian analysis filter started appearing a couple of releases ago. I think it was contributed by Vincenzo Gianferrari Pini, who did most of the work on it.

The web page describing it is on the Wiki [1].

You'll need to have MySQL installed on your server, as the filter uses it to manage the corpus (the body of email that new messages are compared against).

Essentially, when you receive a message that you consider to be spam, you forward it as an attachment (to preserve all its contents) to a special email address. Every ten minutes the Bayesian mailet checks for any new messages you have sent it and begins the training process. Likewise, if you receive an email that is 'ham' (a good email), you forward it to the special ham email address.
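
To make the training step concrete, here is a rough sketch in Python of what "training" amounts to: counting how often each token turns up in spam and in ham. This is not James's actual code (James is Java and keeps its corpus in the database); the Corpus class, the tokenize function and the token pattern are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical token pattern: runs of letters, digits and a few symbols.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split a message into lower-cased tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class Corpus:
    """In-memory stand-in for the corpus the filter keeps in its database."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> occurrences in spam
        self.ham_tokens = Counter()   # token -> occurrences in ham
        self.spam_messages = 0
        self.ham_messages = 0

    def train(self, text, is_spam):
        """Record one forwarded message as a spam or ham exemplar."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_messages += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_messages += 1


corpus = Corpus()
corpus.train("Buy cheap pills now", is_spam=True)   # forwarded to the spam address
corpus.train("Lunch at noon?", is_spam=False)       # forwarded to the ham address
```

The more exemplars of each kind you feed it, the better the counts reflect your own mail, which is why you keep forwarding both spam and ham.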

Your server can then be set up to compare incoming emails against this database of good and bad exemplars and to process them as you wish. I have mine set up to delete emails which the system thinks are 50% or more likely to be spam. Some people prefer to keep these emails in a separate mailbox in case 'good' emails fail the test. I did this at first as well, but for me the false positives became so rare that I thought I could live with it. In any case, I've reduced the chances of false positives still further by enabling whitelist processing. This essentially disables spam processing for any messages received from people I have sent messages to before.
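
As a rough illustration of that decision, here is a Python sketch that combines per-token spam probabilities with the usual naive Bayes formula, then applies a 50% threshold and a whitelist check. It is only a sketch under my own assumptions, not what James does internally: the function names, the clamping to 0.01..0.99 and the treatment of an empty message are all invented for the example.

```python
from math import prod


def combine(token_probs):
    """Combine per-token spam probabilities into one message probability.

    Uses the naive Bayes combination p1*p2*... / (p1*p2*... + (1-p1)*(1-p2)*...).
    """
    if not token_probs:
        return 0.0  # assumption: with no evidence, treat the message as not spam
    # Clamp so that a single 0.0 or 1.0 cannot force a division by zero.
    clamped = [min(0.99, max(0.01, p)) for p in token_probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


def disposition(sender, token_probs, whitelist, threshold=0.5):
    """Return "deliver" or "discard" for one incoming message."""
    # Whitelisted senders skip spam processing entirely.
    if sender.lower() in whitelist:
        return "deliver"
    return "discard" if combine(token_probs) >= threshold else "deliver"


whitelist = {"friend@example.com"}  # people I have sent messages to before
print(disposition("friend@example.com", [0.99, 0.95], whitelist))
print(disposition("stranger@example.com", [0.99, 0.95], whitelist))
print(disposition("stranger@example.com", [0.2, 0.2], whitelist))
```

If you would rather keep suspect mail than delete it, the same check could return a "quarantine" result and file the message in a separate mailbox instead.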

You may find, like me, that at first killing spam becomes a passion and you find yourself reading up on all sorts of schemes for killing it, such as:

 Spam URI Realtime Blocklists [2]
 Tarpitting [3]
 Teergrubing [4]
 SMTP transaction delays [5]
 Sender Policy Framework [6]
 Greylisting [7]

However, I can save you the trouble of hurting your eyes any further and recommend Bayesian Analysis! You should keep in mind, though, that unlike some of the methods I have listed, Bayesian Analysis is not really suitable for situations where a server is handling a lot of individual accounts. Different people may inflict different spam on themselves as a result of their online behaviour. To work effectively in that case, every user would have to be in charge of updating the filter for their own spam, and that is a recipe for disaster.

Regards,
- David.



[1] http://wiki.apache.org/james/Bayesian_Analysis
[2] http://www.surbl.org/
[3] http://www.palomine.net/qmail/tarpit.html
[4] http://www.iks-jena.de/mitarb/lutz/usenet/teergrube.en.html
[5] http://tldp.org/HOWTO/Spam-Filtering-for-MX/smtpdelays.html
[6] http://www.openspf.org/svn/project/specs/rfc4408.html
[7] http://projects.puremagic.com/greylisting/whitepaper.html




