-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Daniel Quinlan writes:
> Bob Menschel <[EMAIL PROTECTED]> writes:
>
> > I can see the reason for most of Daniel's suggestions, and while I
> > think 12 months is too short a period for ham (I'd favor 18 or 24
> > months), I could live with that.
>
> I might be able to live with 18, but I think we should stick with 12
> because of the network tests (which are on for 2 of the 3 mass-check
> runs if I recall correctly). The problem is that you get more and more
> mail that is no longer representative of the current sender
> configuration: SPF negative, host no longer exists, IP address has
> changed, etc.

Yes, I agree -- this is the problem with older ham (especially the SPF
problem; SPF is very brittle on this point). How about putting stricter
limits on the net check corpora?

I would suggest, though, that Malte's point is also valid -- some "special
case" reported FP mails should be kept in the ham corpus, if they really
are special cases that the submitter is worried about.

> > Ham bounces (valid bounces of ham sent from our systems) are ham, and
> > should be in the ham corpus. Spam bounces (blind bounces of spam sent
> > back to forged or faked from addresses) are spam, often containing the
> > content of the spam as well as the notification.
>
> I agree those are spam, but since those can be addressed with techniques
> like envelope rewriting that are 100% reliable and non-probabilistic, I
> think we should just remove them.

And the ham? I'm +1 on keeping ham bounces. Spam bounces, however, I
don't think should be used in the corpus at all.

> >>> 5. no mailing list moderation administrative messages since these also
> >>> contain spam
> >
> > They also contain ham. If a system administrator can differentiate
> > between them, why shouldn't the spam messages be in a spam corpus, and
> > the ham messages in a ham corpus?
>
> Moderators can't ignore either type of moderation message for a large
> proportion of mailing list software (especially mailman). If anything,
> they should all be ham and I don't think we want to do that. I think
> it's better to just remove them.

OK, I've come around to that view, BTW. +1

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFA2e1wQTcbUG5Y7woRApGIAJ96HbTdMromHvsVa/gH1BOev1FtvgCgtbDM
dngT9ZZmVyR1VUa1MKwgT9U=
=WjwV
-----END PGP SIGNATURE-----
