formail recipe

2004-03-10 Thread David Bear
Hope I'm not imposing too much on this group.. but since this group
has a collection of the best, brightest, and generous..

I wonder if someone might have a formail recipe that would randomly
select N messages from a mailbox of M messages?  I have a spam corpus
that's well over 1 and need to trim it down.
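
For reference, the selection being asked about (N random messages out of
an mbox of M) can be sketched outside formail with Python's standard
mailbox module. This is only a minimal sketch, not a formail recipe: the
function name sample_mbox, its parameters, and the file names in the usage
note are made up for illustration, and it assumes the corpus is a single
mbox file.

```python
import mailbox
import random


def sample_mbox(src_path, dst_path, n, seed=None):
    """Copy n randomly chosen messages from src_path into dst_path.

    Hypothetical helper: both paths are assumed to be mbox files.
    If dst_path already exists, the sample is appended to it.
    Returns the number of messages written.
    """
    rng = random.Random(seed)  # pass a seed for a repeatable sample
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        keys = src.keys()
        # Sample without replacement; cap n at the mailbox size.
        chosen = rng.sample(keys, min(n, len(keys)))
        # Write in the original mailbox order.
        for key in sorted(chosen):
            dst.add(src[key])
        dst.flush()
        return len(chosen)
    finally:
        dst.close()
        src.close()
```

Something like sample_mbox("spam.mbox", "spam-sample.mbox", 2000) would
then leave the original corpus untouched and write the trimmed sample to
a new file.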


-- 
David Bear
phone:  480-965-8257
fax:480-965-9189
College of Public Programs/ASU
Wilson Hall 232
Tempe, AZ 85287-0803
 Beware the IP portfolio, everyone will be suspect of trespassing
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: formail recipe

2004-03-10 Thread Louis LeBlanc
I know what you mean.  Mine's over 6700, and that's just since 1/1/04.
I have no doubt whatsoever there are a good number of people here that
have that beat several times over in the same period of time.

What I do to trim mine down is just take the oldest messages out.
Naturally, this can be tricky since the Date: header is often bogus,
but it's a place to start.  Come the end of the quarter, I'll be
blocking off this archive folder and starting a new one.  At that
time, I'll be rebuilding my SA bayes db to make sure I have a
'correct' base.  The next quarter's worth (which I'd like to delude
myself into believing will be smaller) will be fed in on a regular basis
to keep the bayes db on track.
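
The "take the oldest messages out" step could be sketched roughly as
below, again with Python's standard mailbox module. The names keep_newest
and message_date are hypothetical, and the handling of bogus Date: headers
is an assumption made for this sketch: any message whose date cannot be
parsed is treated as the oldest, so it is dropped first.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Stand-in date for messages with a missing or unparseable Date: header.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def message_date(msg):
    """Return the Date: header as an aware datetime, or EPOCH if unusable."""
    try:
        when = parsedate_to_datetime(str(msg.get("Date", "")))
    except (TypeError, ValueError):
        return EPOCH
    if when is None:
        return EPOCH
    if when.tzinfo is None:
        # A date without usable zone information; assume UTC.
        when = when.replace(tzinfo=timezone.utc)
    return when


def keep_newest(src_path, dst_path, keep):
    """Copy the `keep` newest messages (by Date: header) to dst_path.

    Returns the number of messages written.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        dates = {key: message_date(src[key]) for key in src.keys()}
        newest = sorted(dates, key=dates.get, reverse=True)[:keep]
        # Write in the original mailbox order.
        for key in sorted(newest):
            dst.add(src[key])
        dst.flush()
        return len(newest)
    finally:
        dst.close()
        src.close()
```

Since the Date: header is set by the sender, this is only as reliable as
the caveat above suggests; sorting on a locally added Received: header
would be one possible refinement.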

The reason I suggest removing the oldest messages is that spammers
seem to evolve their methods, and the bayes db will be most accurate
with a more complete picture of CURRENT practices, unaffected by
methods that are no longer in use.  Over the last month,
I've seen their evolving methods start sneaking in under the SA radar,
and have slowly but surely dropped my threshold down to 1.0 rather
than the default 5.0.  So far, no FNs, and the FPs have gone away (for
now).

There will be lots of arguments to the contrary of at least some of
what I've said here, but the great thing about all this is you get to
decide what approach you have more confidence in.  This is the
approach I have more confidence in - though I'm open to any
suggestions for tweaking it.

Good luck.

Lou

On 03/10/04 09:27 AM, David Bear sat at the `puter and typed:
> Hope I'm not imposing too much on this group.. but since this group
> has a collection of the best, brightest, and generous..
>
> I wonder if someone might have a formail recipe that would randomly
> select N messages from a mailbox of M messages?  I have a spam corpus
> that's well over 1 and need to trim it down.
 
 

-- 
Louis LeBlanc   [EMAIL PROTECTED]
Fully Funded Hobbyist, KeySlapper Extraordinaire :)
http://www.keyslapper.org

An age is called Dark not because the light fails to shine, but because
people refuse to see it.
-- James Michener, Space