On 2014-04-06 17:21, John Hardin wrote:
On Sun, 6 Apr 2014, Dave Warren wrote:

Is older ham useful? It specifically mentions that older spam isn't useful, and why, but I'm thinking older ham is probably still useful, since old mail clients and legitimately sent mail never die. But I could filter based on date.

There's some debate about that. :)

I personally agree with you. Others disagree.

I've been giving it some thought and I think that perhaps limiting it to the last few months will make it easier to get a sane set of TRUSTED_NETWORKS and INTERNAL_NETWORKS; I've got mail going back to ~2002 but no real recollection of how things were set up or named prior to 2007 or so.

Initially I'll limit it to mail from the last couple of months, but perhaps later expand that to 24-36 months for non-spam and 6 months for spam. Is that sane/reasonable?
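For illustration, a date cutoff like that could look roughly like the sketch below. It is not part of the masscheck scripts; the function name, the `MAX_AGE_DAYS` table and the 30-day month approximation are all assumptions, and it trusts the Date header rather than the Received headers.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical windows based on the figures above; a month is approximated as 30 days.
MAX_AGE_DAYS = {"ham": 36 * 30, "spam": 6 * 30}


def is_recent_enough(raw_message, kind, now=None):
    """Return True if the Date header falls inside the age window for `kind`."""
    now = now or datetime.now(timezone.utc)
    header = message_from_string(raw_message).get("Date")
    if header is None:
        return False  # no Date header: leave the message out
    try:
        sent = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return False  # unparseable Date header
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    age = now - sent
    # Messages dated in the future are rejected as well.
    return timedelta(0) <= age <= timedelta(days=MAX_AGE_DAYS[kind])
```

Passing `now` explicitly keeps the check repeatable when the same corpus is filtered again later.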


Yes, ham-only masscheck submissions would be very welcome.

Perfect, glad to hear it. At this point I've built a dedicated box to run the masscheck scripts, so now it's just a matter of putting together a corpus and doing some sanity checking and testing.

My current thought is to take user-fed spam and non-spam folders and place copies of messages into a staging path which will then be reviewed before being added to the corpus for learning. Hopefully I'll be ready to go live within a day or two.
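A rough sketch of the copy-to-staging step, assuming one message per file (Maildir-style) and hypothetical directory names; the manual review and the move into the corpus would happen separately:

```python
import shutil
from pathlib import Path


def stage_copies(source_dir, staging_dir):
    """Copy each file from source_dir into staging_dir, skipping names already staged."""
    source, staging = Path(source_dir), Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in sorted(source.iterdir()):
        target = staging / path.name
        if path.is_file() and not target.exists():
            # copy2 keeps the file's timestamps, and the user's original stays in place.
            shutil.copy2(path, target)
            staged.append(target)
    return staged
```

Copying rather than moving means nothing in the user-fed folders changes until a message has actually been reviewed.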


--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren