At 12:16 PM -0700 5/30/07, Christopher Bort imposed structure on a stream of electrons, yielding:
On 05/30/07 10:56, [EMAIL PROTECTED] (Charles Mangin) wrote:

one of them that i'm working on now is bayesian filtering within
spamassassin. i've got it marking/learning spam and ham, but it's slow
going. what i'd love to find is a compilation of example spams that i
can dump into my database so it can start with a critical mass of spam
to check against. jumpstart the "training" process, so to speak.

do any mail admins on this list know where to get such an archive,
other than to open up one of my own domains to the floodgates and just
capture it myself?

It is highly recommended that you train your Bayes database only with messages that have actually been received at your own installation. Using someone else's spam and ham is likely to skew your database and result in inaccuracies. In other words, SpamAssassin needs to know what _you_ see as spam and ham, not what someone else sees.

AMEN!

Training a Bayes database from someone else's spam/ham corpus is a path to trouble. Aside from the subjective problem that someone else may have reported mail you want as spam (and vice versa), spam has a significant ephemeral quality that causes trouble with any corpus that isn't extremely current. Because of how Bayes filtering works, that means you can easily get significantly worse results from a large aged database than from a small but very current one.
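
To make the aging point concrete, here is a rough toy sketch in Python. It is not SpamAssassin's actual algorithm (which combines token probabilities in a more sophisticated way); it is just a simple per-token estimate with add-one smoothing and equal priors, and the message texts, corpus sizes, and function names are all made up for illustration. It compares how one token that appears in every current spam gets scored by a small current database versus a large aged one with the same current messages mixed in.

```python
from collections import Counter

def train(messages):
    """Count, per token, how many messages contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts, len(messages)

def token_spam_prob(token, spam, ham):
    """Toy estimate of P(spam | token): add-one smoothing, equal priors assumed."""
    spam_counts, n_spam = spam
    ham_counts, n_ham = ham
    p_tok_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_tok_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_tok_spam / (p_tok_spam + p_tok_ham)

# Hypothetical corpora: the same ham, a little current spam, a lot of old spam.
ham = train(["meeting notes attached"] * 20)
current_spam = ["refinance your mortgage today"] * 10
aged_spam = ["cheap pills online"] * 1000

small_current = token_spam_prob("refinance", train(current_spam), ham)
large_aged = token_spam_prob("refinance", train(aged_spam + current_spam), ham)

print("small, current database:", round(small_current, 3))
print("large, aged database:   ", round(large_aged, 3))
```

In this toy model the small current database treats "refinance" as a strong spam indicator, while the large aged one dilutes it so much that the token looks more like ham, even though it appears in every piece of current spam.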


--
Bill Cole [EMAIL PROTECTED]


#############################################################
This message is sent to you because you are subscribed to
 the mailing list <SIMS@mail.stalker.com>.
