kylepetersen wrote:
Can you point me to a procedure for Bayesian filter usage on James? I've
seen it mentioned in the config.xml file (James' config.xml, that is), so it
seems to already be a part of James?
Yes, that's right. The Bayesian analysis filter started appearing a
couple of releases ago. I think it was contributed by Vincenzo
Gianferrari Pini, who did most of the work on it.
The web page describing it is on the Wiki [1].
You'll need to have MySQL installed on your server, because the filter uses
it to store the corpus (the body of email that new messages are compared against).
Essentially, when you receive a message that you consider to be spam, you
forward it as an attachment (to preserve all of its contents) to a
special email address. Every ten minutes the Bayesian mailet checks for
any new messages you have sent it and begins the training process.
Likewise, if you receive an email that is 'ham' (a good email), you
forward it to the special ham email address.
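To make the training step concrete, here is a minimal sketch of what "training" amounts to: counting how often each token appears in messages you have marked as spam versus ham. It is not the James implementation. The real filter keeps its corpus in MySQL and runs on a ten-minute schedule; this sketch keeps counts in memory, leaves the scheduling out, and the class and method names (`TokenCorpus`, `trainSpam`, `trainHam`) are hypothetical. It also assumes a very simple tokenizer (lower-case, split on anything that isn't a letter or digit) and counts every occurrence of a token.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for the training corpus. The real James
// filter keeps its corpus in MySQL; these names are illustrative only.
public class TokenCorpus {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private int spamMessages = 0;
    private int hamMessages = 0;

    // Assumed tokenizer: lower-case the text and split on runs of non-alphanumerics.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    // Hypothetical hook for a message forwarded to the spam training address.
    public void trainSpam(String message) {
        spamMessages++;
        addTokens(message, spamCounts);
    }

    // Hypothetical hook for a message forwarded to the ham training address.
    public void trainHam(String message) {
        hamMessages++;
        addTokens(message, hamCounts);
    }

    // Counts every occurrence of every token; split() can yield a leading "".
    private static void addTokens(String message, Map<String, Integer> counts) {
        for (String token : tokenize(message)) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    public int spamCount(String token) { return spamCounts.getOrDefault(token, 0); }
    public int hamCount(String token) { return hamCounts.getOrDefault(token, 0); }
    public int spamMessages() { return spamMessages; }
    public int hamMessages() { return hamMessages; }

    public static void main(String[] args) {
        TokenCorpus corpus = new TokenCorpus();
        corpus.trainSpam("Cheap pills! Buy cheap pills now");
        corpus.trainHam("Minutes from the James PMC meeting");
        System.out.println(corpus.spamCount("cheap")); // prints 2
        System.out.println(corpus.hamCount("james"));  // prints 1
    }
}
```

A periodic job would simply call `trainSpam` or `trainHam` for each newly forwarded attachment; the per-token counts are what later lets the filter estimate how "spammy" each word is.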
Your server can then be set up to compare incoming emails against this
database of good and bad exemplars and to process them as you wish. I have
mine set up to delete emails that the system judges to be 50% or more
likely to be spam. Some people prefer to keep these emails in
a separate mailbox in case 'good' emails fail the test. I did this at
first as well, but for me false positives became so rare that I decided I
could live with them. In any case, I've reduced the chances of false
positives still further by enabling whitelist processing. This
essentially disables spam processing for any message received from
someone I have sent a message to before.
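The decision described above (skip whitelisted senders, otherwise delete anything scored at 50% or more) can be sketched as follows. This is a hedged illustration, not the James mailet code: it assumes per-token spam probabilities have already been derived from the corpus, and it combines them with a naive Bayes product in the spirit of Paul Graham's "A Plan for Spam" (unseen tokens get 0.4, probabilities are clamped to [0.01, 0.99]). Unlike Graham, it uses every token rather than only the 15 most "interesting" ones. The class, the `recordOutgoing` hook, and the in-memory whitelist are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the James BayesianAnalysis or whitelist mailet API.
public class SpamDecision {
    // Delete anything judged 50% or more likely to be spam.
    static final double THRESHOLD = 0.5;
    // Assumed score for tokens never seen in training (Graham used 0.4).
    static final double UNKNOWN_TOKEN = 0.4;

    enum Action { DELIVER, DELETE }

    // Addresses the local user has written to; mail from them skips the test.
    private final Set<String> whitelist = new HashSet<>();

    // Hypothetical hook: call this for every outgoing message.
    void recordOutgoing(List<String> recipients) {
        for (String r : recipients) {
            whitelist.add(r.toLowerCase(Locale.ROOT));
        }
    }

    // Naive Bayes combination P = prod(p) / (prod(p) + prod(1 - p)),
    // computed in log space so long messages cannot underflow to 0/0.
    static double spamProbability(List<String> tokens, Map<String, Double> tokenProb) {
        double logHamOverSpam = 0.0;
        for (String t : tokens) {
            double p = tokenProb.getOrDefault(t, UNKNOWN_TOKEN);
            p = Math.min(0.99, Math.max(0.01, p)); // no single token is decisive
            logHamOverSpam += Math.log((1.0 - p) / p);
        }
        return 1.0 / (1.0 + Math.exp(logHamOverSpam));
    }

    Action decide(String sender, List<String> tokens, Map<String, Double> tokenProb) {
        if (whitelist.contains(sender.toLowerCase(Locale.ROOT))) {
            return Action.DELIVER; // known correspondent: bypass spam processing
        }
        return spamProbability(tokens, tokenProb) >= THRESHOLD ? Action.DELETE : Action.DELIVER;
    }

    public static void main(String[] args) {
        Map<String, Double> probs = Map.of("cheap", 0.99, "pills", 0.95, "james", 0.05);
        SpamDecision filter = new SpamDecision();
        filter.recordOutgoing(List.of("friend@example.org"));
        System.out.println(filter.decide("spammer@example.com", List.of("cheap", "pills"), probs)); // DELETE
        System.out.println(filter.decide("friend@example.org", List.of("cheap", "pills"), probs));  // DELIVER
        System.out.println(filter.decide("dev@example.net", List.of("james"), probs));              // DELIVER
    }
}
```

Moving the threshold up (say to 0.9) trades more missed spam for fewer false positives; routing `DELETE` to a quarantine folder instead is the "separate mailbox" option mentioned above.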
You may find, as I did, that at first killing spam becomes a passion and
you find yourself reading up on all sorts of schemes for fighting it,
such as:
Spam URI Realtime Block Lists [2]
Tarpitting [3]
Teergrubing [4]
SMTP transaction delays [5]
Sender Policy Framework [6]
Greylisting [7]
However, I can save you the trouble of straining your eyes any further
and simply recommend Bayesian analysis! Keep in mind, though, that
unlike some of the methods listed above, Bayesian analysis is not really
suitable when a server handles a lot of individual accounts. Different
people attract different spam as a result of their online behaviour, so
to work effectively in that case every user would have to be in charge
of training the filter on their own spam, and that is a recipe for disaster.
Regards,
- David.
[1] http://wiki.apache.org/james/Bayesian_Analysis
[2] http://www.surbl.org/
[3] http://www.palomine.net/qmail/tarpit.html
[4] http://www.iks-jena.de/mitarb/lutz/usenet/teergrube.en.html
[5] http://tldp.org/HOWTO/Spam-Filtering-for-MX/smtpdelays.html
[6] http://www.openspf.org/svn/project/specs/rfc4408.html
[7] http://projects.puremagic.com/greylisting/whitepaper.html