On Mon, Jul 05, 2004 at 01:34:37PM -0700, Dan Quinlan wrote:
> So, before we make the pre2 release and start mass-checks, there's one
> thing I want to nail down in the corpus policy: should we just remove
> any spam list that has tons of false positives?

It would depend on what the FPs are from, I'd say.

> Removing the SpamAssassin ones is just common sense, but I looked at my
> false positives and 59 out of 102 of my false positives are from another
> anti-spam mailing list that frequently includes snippets of spam, URLs,

Ah. IMO, spam-related mails have no place in a ham corpus. They're not
going to be considered "standard" for most people, and as you've said,
they have a strong tendency to include spam snippets, etc., that cause
filters to go all gonzo.

-- 
Randomly Generated Tagline:
"To love another person is to see the face of God." - Les Miserables
