At 12:16 PM -0700 5/30/07, Christopher Bort imposed structure on a
stream of electrons, yielding:
On 05/30/07 10:56, [EMAIL PROTECTED] (Charles Mangin) wrote:
one of them that i'm working on now is bayesian filtering within
spamassassin. i've got it marking/learning spam and ham, but it's slow
going. what i'd love to find is a compilation of example spams that i
can dump into my database so it can start with a critical mass of spam
to check against. jumpstart the "training" process, so to speak.
do any mail admins on this list know where to get such an archive,
other than to open up one of my own domains to the floodgates and just
capture it myself?
It is highly recommended that you train your Bayes database only
with messages that have actually been received at your own
installation. Using someone else's spam and ham is likely to skew your
database and result in inaccuracies. In other words, SpamAssassin
needs to know what _you_ see as spam and ham, not what someone else
sees.
AMEN!
Training a Bayes database from someone else's spam/ham corpus is a
path to trouble. Aside from the subjective issue of mail you want
having been reported as spam by someone else and vice-versa, there is
a significant ephemeral quality to spam that causes trouble with
using any corpus that isn't extremely current. Because of how Bayes
filtering works, that means you can easily get significantly worse
results from a large aged database than from a small but very current one.
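
To make the staleness point concrete, here is a toy sketch in Python.
It is not SpamAssassin's actual Bayes code: the add-one smoothing, the
function names (train, token_spam_prob) and the sample messages are all
made up for illustration. It only shows how a token that today's spam
relies on can look like ham to a database trained on a large pile of
old spam.

```python
from collections import Counter


def train(messages):
    """Count, per token, how many messages contain it (hypothetical helper)."""
    counts = Counter()
    for msg in messages:
        # Use a set so a token counts once per message.
        counts.update(set(msg.lower().split()))
    return counts, len(messages)


def token_spam_prob(token, spam, ham):
    """Rough per-token spam probability with add-one smoothing.

    An illustrative formula, not the one SpamAssassin uses.
    """
    spam_counts, n_spam = spam
    ham_counts, n_ham = ham
    s = (spam_counts[token] + 1) / (n_spam + 2)  # share of spam containing token
    h = (ham_counts[token] + 1) / (n_ham + 2)    # share of ham containing token
    return s / (s + h)


# Invented sample data: the old spam never mentions "stock",
# the current spam always does, and some legitimate mail does too.
aged_spam = train(["cheap pills online"] * 200)
current_spam = train(["stock alert penny"] * 10)
ham = train(["meeting notes attached"] * 40 + ["stock report attached"] * 10)

print("aged database:   ", round(token_spam_prob("stock", aged_spam, ham), 3))
print("current database:", round(token_spam_prob("stock", current_spam, ham), 3))
```

With this invented data, the aged database rates "stock" as ham-like
(below 0.5) because none of its 200 spam messages contain it, while the
ten current messages are enough to rate it as spam-like (above 0.5).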
--
Bill Cole
[EMAIL PROTECTED]