At 02:32 PM 10/18/2004, Mathieu Nantel wrote:
As I've read a few articles on DSPAM claiming that it's better/faster/sexier
than spamassassin, I would appreciate having this list's comment on DSPAM.
I'm sure quite a few of you have tried it and might have some interesting
experiences to share. My understanding is that DSPAM relies solely on
algorithms (Bayes, CHI^2), and that complications arise when you have to
teach your users on how to train the system (which SA doesn't require as it's
based on other things aside from Bayes).

You've pretty much summed up the answers yourself.

There are a lot of pure-learning systems out there that work VERY well if you've got the time to train them and keep them well trained: DSPAM, CRM114, spambayes, etc.

Pure Learning:

The strength of pure-learning systems is speed and simplicity.

They also adapt quickly to whatever training you feed them.

The weakness is their need for training (they can't run without it), and that their accuracy is entirely a function of the training quality. If you don't set up good training, they suck completely (garbage in, garbage out).

Another weakness is vulnerability to bayes-poison type attacks. Many deal quite well with this type of attack, but all have some degree of exposure to it that rule and DNSBL systems don't share.
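
To make the training and poisoning points concrete, here is a rough sketch of a toy token-based Bayesian filter. This is not DSPAM's, CRM114's or spambayes' actual algorithm; the class name, the whitespace tokenizer, the add-one smoothing and the sample messages are all assumptions made up for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive-Bayes filter (hypothetical; not any real product's algorithm)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Everything the filter knows comes from here: garbage in, garbage out.
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"]))
        total_msgs = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Class prior with add-one smoothing.
            score = math.log((self.messages[label] + 1) / (total_msgs + 2))
            total_tokens = sum(self.tokens[label].values())
            for tok in text.lower().split():
                # Smoothed per-token likelihood; unseen tokens get a small weight.
                count = self.tokens[label][tok]
                score += math.log((count + 1) / (total_tokens + vocab + 1))
            log_score[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = min(log_score["ham"] - log_score["spam"], 700.0)
        return 1.0 / (1.0 + math.exp(diff))


bayes = TinyBayes()
bayes.train("cheap pills buy now", "spam")
bayes.train("buy cheap watches now", "spam")
bayes.train("meeting agenda for monday", "ham")
bayes.train("lunch on monday at noon", "ham")

clean_spam = bayes.spam_probability("buy cheap pills")
# Poison attempt: the same spam padded with words this user's ham contains.
poisoned = bayes.spam_probability(
    "buy cheap pills monday meeting agenda lunch noon"
)
```

With this tiny corpus the padded message scores lower than the plain one, which is the whole idea behind a poison attack. Real filters use better tokenizers and combining methods that blunt this, but the basic exposure is the same.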


SpamAssassin:

The strength of SA is its use of many sources of spam criteria. It combines bayesian analysis, regex rules, perl-coded rules, DNSBLs, URIBLs, hash systems, and past score-averaging systems. This makes it fairly resistant to poisoning: poison techniques that work against one element of SA's analysis won't work against all of them. However, to some degree this is both a strength and a weakness.
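
A minimal sketch of why combining sources helps, assuming a simple additive model: every test that hits adds its score, and the total is compared against a threshold (5.0 is SA's default required score). The rule names and score values below are invented for illustration and are not SA's real rule set.

```python
REQUIRED_SCORE = 5.0  # SA's default threshold; sites can tune it

# Hypothetical rule names and scores, one per kind of evidence.
EXAMPLE_SCORES = {
    "BAYES_HIGH": 3.5,     # bayes thinks the message is spammy
    "BAYES_LOW": -1.5,     # bayes thinks it is hammy (e.g. after poisoning)
    "DNSBL_LISTED": 2.5,   # sending relay is on a DNS blocklist
    "URIBL_LISTED": 3.0,   # a URL in the body is on a URI blocklist
    "REGEX_PHRASE": 1.5,   # a body regex rule matched
}


def total_score(hits, scores=EXAMPLE_SCORES):
    """Sum the scores of every rule that hit."""
    return sum(scores[name] for name in hits)


def is_spam(hits, scores=EXAMPLE_SCORES, required=REQUIRED_SCORE):
    """Tag as spam when the combined score reaches the threshold."""
    return total_score(hits, scores) >= required


# A poisoned message fools bayes, but the network and regex tests still hit.
poisoned_hits = ["BAYES_LOW", "DNSBL_LISTED", "URIBL_LISTED", "REGEX_PHRASE"]
poisoned_is_spam = is_spam(poisoned_hits)
```

In this made-up case the bayes element votes "ham", yet the other tests still carry the message over the threshold. The flip side is that every one of those tests has to be run and kept up to date.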

SA's also got learning ability, and even has a self-training ability based on the results of the other rules in the system. I know of no other self-trainers, but I could be wrong on this.
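
Roughly, the self-training idea can be sketched like this: score the message using everything except bayes, and only feed it to the learner when that score is confidently high or low. The thresholds below (learn as spam above 12.0, as ham below 0.1) are the defaults as I understand them, so check the docs for your version. The function, rule names and scores are hypothetical, and the real logic has extra conditions this sketch leaves out.

```python
def auto_learn_decision(hits, scores, ham_below=0.1, spam_above=12.0):
    """Return "spam", "ham", or None (don't learn) for one scanned message.

    Simplified sketch: bayes rules are excluded from the sum so the
    learner is not trained on its own opinion.
    """
    non_bayes = sum(
        scores[name] for name in hits if not name.startswith("BAYES_")
    )
    if non_bayes > spam_above:
        return "spam"
    if non_bayes < ham_below:
        return "ham"
    return None  # too uncertain to train on


# Hypothetical rule scores for illustration only.
scores = {
    "BAYES_HIGH": 3.5,
    "DNSBL_LISTED": 4.5,
    "URIBL_LISTED": 5.0,
    "REGEX_PHRASE": 3.0,
}

obvious_spam = auto_learn_decision(
    ["BAYES_HIGH", "DNSBL_LISTED", "URIBL_LISTED", "REGEX_PHRASE"], scores
)
borderline = auto_learn_decision(["BAYES_HIGH", "DNSBL_LISTED"], scores)
clean_mail = auto_learn_decision([], scores)
```

The wide gap between the two thresholds is the point: mail in the middle is never auto-learned, so a mistake by the rules is less likely to be baked into the bayes database.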

Unlike pure-learning systems, SA has the ability to run without any training, for those who can't do bayes training. Its results are a bit less accurate, but it's quite workable, particularly with the aid of network checks.

Another strength of SA is a high degree of user customization. You can add your own regex rules, and now even code-level plugins.

One of SA's weaknesses is speed and resource usage. A fully network-and-bayes enabled SA queries a lot of stuff, which can take time compared to a pure bayes-only system. It can also chew up a lot of memory (although pure-learning systems can chew up a lot too, SA takes a bit of an extra hit here due to its "kitchen sink" approach).

Another weakness is the slow rate of release for new versions of the regex rules.





