Bob Menschel <[EMAIL PROTECTED]> writes:

> I can see the reason for most of Daniel's suggestions, and while I
> think 12 months is too short a period for ham (I'd favor 18 or 24
> months), I could live with that.

I might be able to live with 18, but I think we should stick with 12
because of the network tests (which are on for 2 of the 3 mass-check
runs if I recall correctly).  The problem is that you get more and more
mail that is no longer representative of the current sender
configuration: SPF negative, host no longer exists, IP address has
changed, etc.
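
A minimal sketch of such an age cutoff, assuming the Date header is trusted
as the message timestamp and "12 months" is read as 365 days. The name
is_recent and both of those choices are illustrative only, not part of the
mass-check tooling:

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

MAX_AGE = timedelta(days=365)  # assumed reading of "12 months"


def is_recent(raw_message, now, max_age=MAX_AGE):
    """Return True if the Date header is no older than max_age relative to now."""
    date_header = message_from_string(raw_message)["Date"]
    if date_header is None:
        return False  # no Date header: treat as too old to keep
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # unparseable Date header
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; assume UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= max_age
```

Here now must be a timezone-aware datetime.  A real cutoff might prefer
the topmost Received header, since Date is set by the sender.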
 
> Ham bounces (valid bounces of ham sent from our systems) are ham, and
> should be in the ham corpus.  Spam bounces (blind bounces of spam sent
> back to forged or faked from addresses) are spam, often containing the
> content of the spam as well as the notification.

I agree those are spam, but since those can be addressed with techniques
like envelope rewriting that are 100% reliable and non-probabilistic, I
think we should just remove them.
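
As a rough illustration of why this is non-probabilistic, here is a sketch
of one possible envelope-signing scheme.  The address format, the key, and
the names sign_sender and is_genuine_bounce are all hypothetical; deployed
schemes differ in detail and typically add an expiry timestamp:

```python
import hashlib
import hmac

SECRET = b"example-secret"  # hypothetical site-local key


def _tag(local, domain, key):
    """Short keyed digest of the original envelope sender."""
    digest = hmac.new(key, f"{local}@{domain}".encode(), hashlib.sha256)
    return digest.hexdigest()[:8]


def sign_sender(local, domain, key=SECRET):
    """Rewrite the envelope sender of outgoing mail to carry a tag."""
    return f"{local}+{_tag(local, domain, key)}@{domain}"


def is_genuine_bounce(rcpt, key=SECRET):
    """Accept a bounce only if its recipient carries a tag we issued."""
    local_part, _, domain = rcpt.rpartition("@")
    local, sep, tag = local_part.rpartition("+")
    if not sep:
        return False  # untagged address: we never sent mail with it
    expected = _tag(local, domain, key)
    return hmac.compare_digest(tag.encode(), expected.encode())
```

A bounce addressed to an untagged or wrongly tagged envelope sender cannot
be a response to mail we actually sent, so it can be rejected outright.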
 
>>> 5. no mailing list moderation administrative messages since these also
>>>    contain spam
> 
> They also contain ham. If a system administrator can differentiate
> between them, why shouldn't the spam messages be in a spam corpus, and
> the ham messages in a ham corpus?

Moderators can't ignore either type of moderation message for a large
proportion of mailing list software (especially Mailman).  If anything,
they should all be ham and I don't think we want to do that.  I think
it's better to just remove them.

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
