We seeded the Bayes DB with manual training at the beginning (a few hundred messages each of ham and spam, I think), and have been letting it auto-learn since then. Is that a bad paradigm?
I also saw a score config for HABEAS_VIOLATOR, but it wasn't triggered by our spam with habeas headers.
-glenn
Matt Kettler wrote:
At 06:05 PM 3/10/2004, [EMAIL PROTECTED] wrote:
We have auto-learn, and many spams don't make a high enough score to be auto-learned as spam. In addition, some spams actually score low enough (see the habeas problem I mentioned earlier) to be auto-learned as ham :-(
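To see why both failure modes happen, here is a minimal sketch of an auto-learn decision. It assumes the two thresholds behave like SpamAssassin's bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam options, using the commonly documented defaults of 0.1 and 12.0. The function name, the exact comparison operators, and the sample scores are hypothetical, and the real implementation applies extra conditions (for example, it excludes Bayes' own contribution from the score) that this sketch does not model.

```python
# Hypothetical sketch of an auto-learn decision, not SpamAssassin's code.
# Assumed defaults for bayes_auto_learn_threshold_nonspam / _spam.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(score: float) -> str:
    """Classify a message's (non-Bayes) score for auto-learning.

    Returns "ham", "spam", or "no" (not learned at all).
    """
    if score < NONSPAM_THRESHOLD:
        # Very low scores are learned as ham, even if the message is spam
        # that picked up large negative points (e.g. from Habeas headers).
        return "ham"
    if score >= SPAM_THRESHOLD:
        # Only very high scores are learned as spam.
        return "spam"
    # Everything in between, including most spam that merely crosses the
    # tagging threshold, is never auto-learned.
    return "no"


# Hypothetical scores for three spams:
for s in (-2.0, 5.3, 15.0):
    print(s, autolearn_decision(s))
```

The wide "no" band in the middle is the point: a spam scoring 5.3 gets tagged but teaches Bayes nothing, while a spam dragged below 0.1 actively teaches it the wrong lesson. Manual training is what fills that gap.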
Autolearn is a good thing, but how much manual training are you doing?
Using autolearning as your sole source of Bayes training is a very bad idea, and prone to disaster.
I might also suggest the following to help mitigate some of the habeas damage:
bayes_ignore_header X-Habeas-SWE-1
bayes_ignore_header X-Habeas-SWE-2
bayes_ignore_header X-Habeas-SWE-3
bayes_ignore_header X-Habeas-SWE-4
bayes_ignore_header X-Habeas-SWE-5
bayes_ignore_header X-Habeas-SWE-6
bayes_ignore_header X-Habeas-SWE-7
bayes_ignore_header X-Habeas-SWE-8
bayes_ignore_header X-Habeas-SWE-9
This will keep the Bayes database from giving either ham or spam points just because an email has these headers. Since there's already a rule for them, there's no reason to give them "double credit" by letting Bayes consider them as well.
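Conceptually, bayes_ignore_header just drops the named headers before tokenization, so they never become tokens that Bayes can associate with ham or spam. The sketch below illustrates that idea in Python under stated assumptions: the header_tokens function and its "Header:word" token format are hypothetical and much simpler than SpamAssassin's real tokenizer. Only the header names come from the bayes_ignore_header lines above.

```python
from email import message_from_string

# Hypothetical ignore list mirroring the bayes_ignore_header lines above.
# Header names are compared case-insensitively, as in RFC 2822.
IGNORED_HEADERS = {f"x-habeas-swe-{i}" for i in range(1, 10)}


def header_tokens(raw_message: str) -> list:
    """Return simple "Header:word" tokens, skipping ignored headers.

    A toy tokenizer for illustration only; real Bayes tokenizers do far more.
    """
    msg = message_from_string(raw_message)
    tokens = []
    for name, value in msg.items():
        if name.lower() in IGNORED_HEADERS:
            continue  # ignored headers contribute no tokens at all
        for word in value.split():
            tokens.append(f"{name}:{word}")
    return tokens


raw = (
    "Subject: hello world\n"
    "X-Habeas-SWE-1: winter into spring\n"
    "\n"
    "body text\n"
)
print(header_tokens(raw))
```

Because the Habeas headers never reach the token database, Bayes can learn nothing from them, and the dedicated Habeas rule remains the only thing scoring their presence.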
