The author of the paper, Gordon Cormack, has a lot of experience in the area
of information retrieval.  It would be a good idea to carefully analyse his
results and conclusions for ways to improve SpamAssassin and for approaches
that we should ignore.

I've been very skeptical of hand-wavy approaches that have little theoretical
background or were improperly evaluated (e.g. Dobly and Bayesian chains).  The
results in Cormack's paper should warn us against blindly accepting "cool"
ideas without taking the steps to ensure their validity.

Lastly, the positive results speak for themselves.  Kudos, guys!

Henry

> -----Original Message-----
> From: Justin Mason [mailto:[EMAIL PROTECTED]
> Sent: June 22, 2004 3:42 AM
> To: [EMAIL PROTECTED]
> Subject: interesting paper on SpamAssassin (fwd)
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> http://plg.uwaterloo.ca/~gvcormac/spamcormack.html
> 
> A good study comparing SpamAssassin (in several configurations) and
> several other spam filtering systems, over the course of 8 months (Aug
> 2003 to Mar 2004).   The measurements and methodology are all pretty
> sound, as far as I can see.
> 
> Well worth a read...
> 
> - --j.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
> Comment: Exmh CVS
> 
> iD8DBQFA19SiQTcbUG5Y7woRAikGAJ4ye0EFbwOC0CrMtX8wk/TiIrNVnACgxWX/
> 4XDSllJJSiRBFIklOrF93fE=
> =xzPv
> -----END PGP SIGNATURE-----
