Bob Menschel <[EMAIL PROTECTED]> writes:

> I can see the reason for most of Daniel's suggestions, and while I
> think 12 months is too short a period for ham (I'd favor 18 or 24
> months), I could live with that.

I might be able to live with 18, but I think we should stick with 12
because of the network tests (which are on for 2 of the 3 mass-check
runs, if I recall correctly). The problem is that you get more and more
mail that is no longer representative of the current sender
configuration: SPF negative, host no longer exists, IP address has
changed, etc.

> Ham bounces (valid bounces of ham sent from our systems) are ham, and
> should be in the ham corpus. Spam bounces (blind bounces of spam sent
> back to forged or faked from addresses) are spam, often containing the
> content of the spam as well as the notification.

I agree those are spam, but since they can be addressed with techniques
like envelope rewriting, which are 100% reliable and non-probabilistic,
I think we should just remove them.

>>> 5. no mailing list moderation administrative messages since these also
>>> contain spam
>
> They also contain ham. If a system administrator can differentiate
> between them, why shouldn't the spam messages be in a spam corpus, and
> the ham messages in a ham corpus?

With a large proportion of mailing list software (especially mailman),
moderators can't ignore either type of moderation message. If anything,
they should all be ham, and I don't think we want to do that. I think
it's better to just remove them.

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
