On 2014-04-06 17:21, John Hardin wrote:
> On Sun, 6 Apr 2014, Dave Warren wrote:
>> Is older ham useful? It specifically mentions that older spam isn't
>> useful, and why, but I'm thinking older ham is probably useful since
>> old mail clients and legitimately sent mail never die. But I could
>> filter based on date.
>
> There's some debate about that. :)
> I personally agree with you. Others disagree.
I've been giving it some thought and I think that perhaps limiting it to
the last few months will make it easier to get a sane set of
TRUSTED_NETWORKS and INTERNAL_NETWORKS; I've got mail going back to
~2002 but no real recollection of how things were set up or named prior
to 2007 or so.
Initially I'll limit it to mail from the last couple of months, but
perhaps later expand that to 24-36 months for non-spam and 6 months for
spam. Is that sane/reasonable?
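
For illustration, the age cutoff could be applied with something like the
sketch below. This is only a rough idea, not part of the masscheck scripts:
the function name, the MAX_AGE table, and the 30-day month are assumptions
made up for the example, and the upper figures (36 months for ham, 6 for
spam) are simply the ones floated above. It trusts the sender-supplied Date
header, which can be wrong, especially in spam.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical cutoffs: 36 months for ham, 6 months for spam,
# approximating a month as 30 days.
MAX_AGE = {"ham": timedelta(days=36 * 30), "spam": timedelta(days=6 * 30)}


def within_age_limit(raw_message, kind, now=None):
    """Return True if the message's Date header is recent enough for its kind."""
    now = now or datetime.now(timezone.utc)
    date_header = message_from_string(raw_message)["Date"]
    if date_header is None:
        return False  # no Date header, so the age cannot be judged
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # unparseable Date header
    if sent.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= MAX_AGE[kind]
```

Messages with a missing or unparseable Date header are skipped rather than
guessed at, which seems the safer default when building a corpus.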
> Yes, ham-only masscheck submissions would be very welcome.
Perfect, glad to hear it. At this point I've built a dedicated box to
run the masscheck scripts, so now it's just a matter of putting together
a corpus and doing some sanity checking and testing.
My current thought is to take user-fed spam and non-spam folders and
place copies of messages into a staging path which will then be reviewed
before being added to the corpus for learning. Hopefully I'll be ready
to go live within a day or two.
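
The staging step might look roughly like the sketch below. It assumes one
message per file (Maildir-style storage); the function name and both path
arguments are hypothetical. It only copies, so the users' folders are left
untouched and nothing reaches the corpus until it has been reviewed by hand.

```python
import shutil
from pathlib import Path


def stage_messages(user_folder, staging_dir):
    """Copy each message file from user_folder into staging_dir for review.

    Files whose names already exist in staging_dir are skipped, so the
    function can be re-run without overwriting earlier copies.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(user_folder).iterdir()):
        dest = staging / src.name
        if src.is_file() and not dest.exists():
            shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
            copied.append(dest)
    return copied
```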
--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren