Matt Kettler wrote:
Gary Smith wrote:
We have a process in place that uses the Perl CPAN module to invoke SA. This is
outside the scope of the normal mail system. Basically we use it to see what
scores emails would generate, for some statistical work. The spam engine this
calls is set to use -100 as the required score, so that everything is
considered spam. Our production spam engine is set to 7. We look at the score
that the Perl module returns and log it (rather than the isspam flag). To
complicate things a little more, we are using MySQL for the Bayes store. This
store is also used by our production boxes. This isn't the problem, just what
we are doing.
The CPAN module has this as the description:
public instance (\%) process (String $msg, Boolean $is_check_p)
Description:
This method makes a call to the spamd server and depending on the value of
C<$is_check_p> either calls PROCESS or CHECK.
Given that the Perl call has a boolean option for PROCESS and CHECK, I would assume that
they make some difference, but it really doesn't say what the difference is. Currently in
our code we are calling it with a false value, which executes the "PROCESS" command.
What I'm wondering is, will this throw off Bayes if we keep doing it, given
that everything SA returns is considered spam? I'm just worried that these
continued tests will cause Bayes to get wacky. Also, should we be using
PROCESS or CHECK when doing this type of check?
Gary
The bayes auto-learning system does not care what your "required_score"
is set to, and does not care if messages are tagged as spam or not. It
uses its own thresholds, and its own additional criteria for learning.
So, feeding it lots of mail with the threshold set to -100 shouldn't
matter at all.
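
For reference, the auto-learn thresholds are their own settings, separate
from required_score. A rough sketch of the relevant options follows; the
values are what I believe the stock defaults to be, so treat them as an
assumption and check the documentation for your SA version:

```
# Auto-learn thresholds, independent of required_score.
# Values below are assumed defaults; verify them for your version.
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam    12.0
```

As far as I know, the score compared against these thresholds is computed
without the BAYES_* rules, so it will not necessarily match the score your
Perl call logs.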
If you're worried about it, set "bayes_auto_learn 0" in whatever conf
file you use for your statistical setup. That way, you can take
advantage of Bayes for scoring, but nothing you do on that system will
affect the db.
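
As a minimal sketch, the config for the statistics-only instance might look
something like this. The comments and layout are illustrative, and which
file it lives in depends on your setup:

```
# Config for the statistics-only spamd instance (location is an assumption)
required_score   -100   # tag everything, as in your current setup
use_bayes        1      # keep using Bayes results for scoring
bayes_auto_learn 0      # never auto-train the shared db from this instance
```

Explicit training with sa-learn is a separate path and is not switched off
by this setting, so avoid running it against the shared store from this box.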
--
Bowie