Re: Integrate Spamassassin with James?

David Legg Fri, 18 Jan 2008 02:42:09 -0800

kylepetersen wrote:

Can you point me to a procedure for Bayesian filter usage on James?  I've
seen it mentioned in the config.xml file (James' config.xml, that is), so it
seems to already be a part of James?

Yes, that's right. The Bayesian analysis filter started appearing a couple of releases ago. I think it was contributed by Vincenzo Gianferrari Pini, who did most of the work on it.


The web page describing it is on the Wiki [1].

You'll need to have MySQL installed on your server, as the filter uses that to manage the corpus (the body of email it compares new emails to).

Essentially, when you receive a message that you consider to be spam, you forward the email as an attachment (to preserve all its contents) to a special email address. Every ten minutes the Bayesian mailet checks for any new messages you have sent it and begins the training process. Likewise, if you receive an email that is 'ham' (a good email), you forward it to the special ham email address.
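
To give a rough feel for what "training" means here, this is a minimal Python sketch of a corpus that counts tokens from messages labelled spam or ham. It is not James's code: the Corpus class, its method names, the tokenizer and the add-one smoothing are all assumptions made for illustration.

```python
import re
from collections import Counter


class Corpus:
    """Hypothetical token store built from user-labelled spam and ham."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.spam_messages = 0
        self.ham_messages = 0

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lower-case words, digits and a few symbols.
        # A set is used so each token counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, is_spam):
        """Add one labelled message to the corpus."""
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_messages += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_messages += 1

    def token_spam_probability(self, token):
        """Estimate how strongly a token indicates spam (0.5 means no evidence)."""
        # Add-one smoothing (an assumption) keeps unseen tokens at exactly 0.5.
        spam_rate = (self.spam_tokens[token] + 1) / (self.spam_messages + 2)
        ham_rate = (self.ham_tokens[token] + 1) / (self.ham_messages + 2)
        return spam_rate / (spam_rate + ham_rate)


corpus = Corpus()
corpus.train("Buy cheap pills now", is_spam=True)
corpus.train("Meeting agenda for Monday", is_spam=False)
print(corpus.token_spam_probability("pills"))    # above 0.5
print(corpus.token_spam_probability("meeting"))  # below 0.5
```

Each message you forward to the spam or ham address would correspond to one train() call in this picture.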

Your server can then be set up to compare incoming emails against this database of good and bad examples and process them as you wish. I have mine set up to delete emails which the system thinks are 50% or more likely to be spam. Some people prefer to keep these emails in a separate mailbox in case 'good' emails fail the test. I did this at first as well, but for me the false positives became so rare that I thought I could live with it. In any case, I've reduced the chances of false positives still further by enabling whitelist processing. This essentially disables spam processing for any messages received from people I have sent messages to before.
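
The policy described above can be sketched in a few lines of Python. This is only an illustration and not how James is configured: the function names, the "deliver"/"discard"/"quarantine" labels and the example addresses are hypothetical, and the combining formula is the one from Paul Graham's "A Plan for Spam", used here as an assumption about how per-token scores become a per-message score.

```python
from math import prod


def combine(token_probabilities):
    """Combine per-token spam probabilities into one message-level score."""
    # Clamp to [0.01, 0.99] so a single token can never force a division by zero.
    clamped = [min(0.99, max(0.01, p)) for p in token_probabilities]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


def decide(sender, spam_probability, whitelist, threshold=0.5, on_spam="discard"):
    """Return the action for one incoming message.

    whitelist holds addresses you have written to before; mail from them
    skips spam processing entirely. Set on_spam="quarantine" to keep
    suspected spam in a separate mailbox instead of deleting it.
    """
    if sender.lower() in whitelist:
        return "deliver"
    if spam_probability >= threshold:
        return on_spam
    return "deliver"


whitelist = {"friend@example.com"}
score = combine([0.9, 0.9, 0.2])
print(decide("stranger@example.net", score, whitelist))
print(decide("friend@example.com", 0.99, whitelist))  # whitelisted, so delivered
```

Raising the threshold, or switching on_spam to "quarantine", trades a little more spam in the inbox for a lower risk of losing a good message.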

You may find, like me, that at first killing spam becomes a passion and you find yourself reading up on all sorts of schemes for killing it, such as:


 Spam URI Realtime Blocklists [2]
 Tarpitting [3]
 Teergrubing [4]
 SMTP transaction delays [5]
 Sender Policy Framework [6]
 Greylisting [7]

However, I can save you the trouble of hurting your eyes any further and recommend Bayesian Analysis! You should keep in mind, though, that unlike some of the methods I have listed, Bayesian Analysis is not really suitable for situations where a server is handling a lot of individual accounts. Different people may inflict different spam on themselves as a result of their online behaviour. To work effectively in that case, every user would have to be in charge of updating the filter for their own spam, and that is a recipe for disaster.


Regards,
- David.



[1] http://wiki.apache.org/james/Bayesian_Analysis
[2] http://www.surbl.org/
[3] http://www.palomine.net/qmail/tarpit.html
[4] http://www.iks-jena.de/mitarb/lutz/usenet/teergrube.en.html
[5] http://tldp.org/HOWTO/Spam-Filtering-for-MX/smtpdelays.html
[6] http://www.openspf.org/svn/project/specs/rfc4408.html
[7] http://projects.puremagic.com/greylisting/whitepaper.html




