Re: Processing many mbox folders

Gary D. Margiotta Fri, 02 Jun 2006 20:47:15 -0700


Thanks, that handles the top level. ;)

Yeah, it was quick and simple for just the one scenario you had in your e-mail.

Me, I redirect mail using a combo of procmail and Postfix header checks to 2 users on the border servers (hamfilter and spamfilter), then I do 2 nightly script runs to sa-learn ham and spam. I feed somewhere around 6,000 spam e-mails alone nightly to sa-learn. Maybe that's a bit much, but I get awesome results, and my FP rates are next to nil. Mind you, I'm doing this site-wide on border servers; we pass 30k e-mails daily through those particular systems.

I figure I'll need to do something like:

find mail/Lists -type f -exec sa-learn --ham --mbox {} \;
(I'd need the same for mail/Friends and a few other top-level hierarchies, excluding my mail/Spam one. Within that tree, I need to put SpamAssassin and Uncaught under --spam and FalsePositives under --ham.)

Well, another way you could do it is just keep a text list of your spam and ham folders:


ham.txt:

mail/foo/hambox1
mail/bar/hambox1

spam.txt:

mail/foo/spambox1
mail/bar/spambox1

Then, the original for loop would work:

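#!/bin/sh
# Learn every mbox listed in ham.txt (one path per line) as ham.
# Reading line by line keeps paths with spaces intact.
while IFS= read -r x
do
        # </dev/null stops sa-learn from touching the list on stdin
        sa-learn --ham --progress --mbox "$x" < /dev/null >> outfile
done < ham.txt

mail [EMAIL PROTECTED] < outfile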

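#!/bin/sh
# Learn every mbox listed in spam.txt (one path per line) as spam.
# Reading line by line keeps paths with spaces intact.
while IFS= read -r y
do
        # </dev/null stops sa-learn from touching the list on stdin
        sa-learn --spam --progress --mbox "$y" < /dev/null >> outfile
done < spam.txt

mail [EMAIL PROTECTED] < outfile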

But I want to exclude my .imap folders created by the dovecot IMAP server to hold state data. I might also need to wrap sa-learn in a script to lock the mailboxes against modification by dovecot and procmail (my LDA).

To build the original text files, you could use find, or edit by hand. This way you could build a list of your mailboxes, and you can include/exclude whatever you want.
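If you go the find route, something along these lines might do for building the lists while skipping the .imap directories. This is only a rough sketch: list_mboxes is a helper name I made up, and it assumes every regular file outside a .imap directory is an mbox you want, which may not hold for your tree.

```shell
#!/bin/sh
# list_mboxes DIR...: print every regular file under the given
# directories, one per line, without descending into any directory
# named .imap (the dovecot state directories).
# Hypothetical helper, not part of SpamAssassin or dovecot.
list_mboxes() {
        find "$@" -name .imap -prune -o -type f -print
}
```

With the folder names from your mail, the lists could then be built with "list_mboxes mail/Lists mail/Friends mail/Spam/FalsePositives > ham.txt" and "list_mboxes mail/Spam/SpamAssassin mail/Spam/Uncaught > spam.txt", and you can still prune them by hand afterwards.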

If those boxes are active, then yes. But then again, if you re-learn a mailbox that you have learned before, it's a waste of cycles for the mails SA has already seen.
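One way to cut down on the wasted cycles might be to skip mailboxes that have not changed since the last run. A rough sketch, assuming you keep a timestamp file between runs; changed_since and the stamp file are names I made up, not anything sa-learn provides:

```shell
#!/bin/sh
# changed_since STAMP LIST: print the paths from LIST (one per line)
# that exist and were modified more recently than the file STAMP.
# If STAMP does not exist yet (first run), every existing path is printed.
# Hypothetical helper; relies only on POSIX find -newer.
changed_since() {
        stamp=$1
        list=$2
        while IFS= read -r mbox
        do
                [ -f "$mbox" ] || continue
                if [ ! -e "$stamp" ] || [ -n "$(find "$mbox" -newer "$stamp")" ]
                then
                        printf '%s\n' "$mbox"
                fi
        done < "$list"
}
```

You would feed the output of "changed_since ham.stamp ham.txt" to the learning loop instead of the full list, and touch the stamp file just before the run starts, so that mail delivered while sa-learn is working gets picked up the next night.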

And what would be the equivalent for mass-checks?


Don't use those, sorry...

-Gary
