[EMAIL PROTECTED] (Justin Mason) writes:

> Yes, I agree -- this is the problem with older ham. (esp. the SPF
> problem. SPF is very brittle on this point.)
>
> How's about putting stricter limits on the net check corpora?

Well, do we really want to use an extra 6 months on only one of the runs?
I think it would be better to use more or less the same data.

> I would suggest though that Malte's point is also valid -- some "special
> case" reported FP mails should be kept in the ham corpus, if they really
> are special cases that the submitter is worried about.

Yes, I *am* keeping my non-SpamAssassin-list spam-related mail in the
corpus. The main reason to remove the SpamAssassin list mail is that
we'll totally bias the corpus; I'm sure we'll have more than enough FPs
for iffy rules by virtue of our everyday mail.

> And the ham? I'm +1 on keeping ham bounces.

Agreed, I am keeping ham bounces.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
