You covered a lot of ground here. Thanks. If you have some spare cycles, I have a few follow-up questions to better understand how you process your email:

21 seconds, and that includes: fetch the samples via IMAP from two
folders, fire them against a bayes-only SpamAssassin instance,

What is a "bayes-only" instance? I don't follow. What other kinds of instances are there?


ignore
BAYES_00/BAYES_99 messages, move the rest to both training
folders, anonymize them, strip useless headers, fire sa-learn against

OK, so it looks like you are suggesting that emails get kind of pre-screened to determine if they are obvious spam or not.
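
To check that I have the pre-screening step right, here is a rough Python sketch of how I picture it. It assumes the bayes-only instance records the matched rule names in an X-Spam-Status header on each message; the function names are my own invention and not anything from your setup.

```python
import email
import re

# Rule names that mean the Bayes classifier is already confident.
CONFIDENT = {"BAYES_00", "BAYES_99"}

def bayes_tests(raw_message):
    """Return the set of BAYES_nn rule names found in X-Spam-Status."""
    msg = email.message_from_string(raw_message)
    # Assumption: the scanner added an X-Spam-Status header listing the tests.
    status = msg.get("X-Spam-Status", "")
    return set(re.findall(r"\bBAYES_\d{2}\b", status))

def needs_training(raw_message):
    """True if the message did not hit BAYES_00 or BAYES_99."""
    return not (CONFIDENT & bayes_tests(raw_message))
```

If that is right, only the messages where needs_training() returns True would be moved into the training folders.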

And by anonymize, what do you mean? Remove the headers that contain email addresses? What other headers are useless? What exactly is the goal of anonymizing and removing the headers? I think I have a vague idea why but can't quite crystallize it in my head.

both folders, fire bogofilter training against both folders and verify
that the new sample files score with BAYES_99/BAYES_00 now

bogofilter training?

So the goal is to get all the new emails to score either 99 (spam) or 00 (ham).

So once I verify they score 00 or 99, do I then throw them on the larger collection of ham/spam with all headers restored? And what do I do if they still don't score 00 or 99?

