Re: Improving sa

Mike Jackson Mon, 28 Nov 2005 08:32:29 -0800

When manually applying the filters "Mark as SPAM" or "Mark as HAM", which pipe
the message to the command sa-learn --spam or sa-learn --ham respectively, it
takes up to a minute to process on a PIV 4.3Ghz HT with 1Gb of RAM, which
seems like ages.

I've noticed that the SQL backends to Bayes and AWL are quite a bit faster. If you're learning on a single-message basis, you might want to add --no-sync to your sa-learn invocation so that it doesn't sync the journal and the database with every single message. Do the sync as a cron job instead, on an appropriate schedule, like once a day.
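
Roughly, the split between per-message learning and a scheduled sync could look like the sketch below. The function names and the SA_LEARN variable are made up for illustration; only the sa-learn options (--spam, --ham, --no-sync, --sync) are real. Adjust it to however your mail client pipes messages.

```shell
#!/bin/sh
# Hypothetical helper functions; SA_LEARN is an illustrative override
# so the commands can be dry-run with something other than sa-learn.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Learn a single message read from stdin, skipping the journal/database sync.
learn_spam() { "$SA_LEARN" --no-sync --spam; }
learn_ham()  { "$SA_LEARN" --no-sync --ham; }

# Merge the journal into the Bayes database; meant to be run from cron.
sync_bayes() { "$SA_LEARN" --sync; }

# Example crontab entry (assumed schedule: once a day at 03:00):
#   0 3 * * * sa-learn --sync
```

The "Mark as SPAM" and "Mark as HAM" filters would then pipe to the --no-sync variants, and the daily cron entry takes care of the sync.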

Second, it seems that spamassassin vs spam is nothing less than an arms race,
with spamassassin perpetually running behind.

Well, of course. Any rules are going to be reactive to what they've seen, not proactive. The Bayesian filter gets much closer to being an "on the fly" reaction to the mail you see, but it still needs a historical record to go on, not intuition. Anything else would end up resembling a Douglas Adams novel :)

As more and more rules are added, doesn't it come to a point where deciding if
a message is spam or ham takes longer and longer or up to a point where
spamassassin alone can't handle it anymore?

I'm not geeky enough to formulate this in fancier words, but it seems like there's an upper threshold to how complicated you can make a mail message, therefore there should be an upper limit to the rules to identify a message automatically based on certain characteristics. But, there may come a time when the "arms race" goes thermonuclear and the only way to deal with spam is to nuke SMTP as we know it and formulate a new system that better deals with the loopholes spammers exploit to send their ads.

Lastly, I am running spamassassin 3.1 out of the box, that is installed the
rpm and that's it.
What can I do to increase effectiveness of spamassassin in differentiating spam
from ham? Right now, there's about 10% of all messages that come in on a day
(4.500) that are unjustly marked as ham or spam (10% is not a lot, but still
45 messages each day!)


Uh, wouldn't 10% be 450 messages?  ;)

This is my prejudice showing, but personally I would compile SA from scratch rather than relying on an RPM. I rarely trust that precompiled packages are going to contain the options I want, or exclude the options I'll never use (but then, I'm also a FreeBSD user, and even when you install something from ports it's compiled from scratch and can be fine-tuned). Make sure you have all the SQL tools you need and use your favorite database backend for Bayes and AWL. This is purely anecdotal, but it seems much faster on the several servers where I've implemented it than the older database methods. I'd also look for other bottlenecks, because with a 4GHz processor and 1GB RAM, SA should kick booty. Either something else is consuming your resources, or something's rotten in Denmark.
