That's okay, the problem is just that one cannot be sure how accurate it is. Knowing
that you use MS would have been useful, anyway :-)
(BTW: my version of Mailwatch can't show this, do you use a CVS version?)
Indeed, this is the CVS version :-)
See the number of tokens: we have ten times yours with fewer learned mails. That
means that our db has many more tokens to qualify an email as ham or spam. Also
This is perhaps because I have been using only 'mistake-based' training (i.e. training only when a false classification happens). However, this used to work fine.
your "hold time" is quite low, it's about a month. I think we have tokens from
even a year ago. That's maybe a bit too much, but I strongly suggest upping
your bayes_expiry_max_db_size to something like 500.000 or so. Since you have a
much higher flux of messages than we have on that machine you are literally
"burning" your db to uselessness.
So what would you suggest? I certainly don't want to lose everything that has been learned until now.
And you learned by specifying the config file? I suspect that you are at least
occasionally using two SA configurations, the one coming with MS and the one
coming with SA.
Nope, there is definitely only the one coming with MS. I never use SA from the command line anyway.
Oh. Still possible, though. You don't need to have one, but on high volume
systems it's highly recommended. Check your SA config (wherever it is :-) for
bayes_learn_to_journal 1. I don't know if it is 1 by default, though. What do
you have starting with bayes in your config file?
# grep bayes /opt/MailScanner/etc/spam.assassin.prefs.conf
# be created as /var/spool/spamassassin/bayes_msgcount, etc.
#bayes_path /var/spool/spamassassin/bayes
#bayes_file_mode 0600
bayes_path /var/spool/MailScanner/bayes/bayes
bayes_file_mode 0666
# MailScanner: big bayes_toks.new files wasting space.
bayes_auto_expire 0
bayes_expiry_max_db_size 500000
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
# use_bayes 0
Don't know if this would be of any help. As I said, I suspect you are using at
least two different bayes dbs. At least when you do it from the command line.
Run an "updatedb" and then "locate bayes" (this may not locate all files, f.i.
not in /var !).
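
Since locate may miss files under /var, a direct search is one way to double-check. This is only a rough sketch: the helper name find_bayes_files is made up for illustration, and the directory list is an assumption (the two spool paths come from the config quoted above, /root/.spamassassin is the usual per-user SA directory for root).

```shell
# Hypothetical helper: list Bayes db files (bayes_toks, bayes_seen, ...)
# found under the given directories. Missing directories are skipped.
find_bayes_files() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -type f -name 'bayes_*' 2>/dev/null
        fi
    done
    return 0
}

# Assumed candidate locations; adjust to the local layout.
find_bayes_files /var/spool/MailScanner /var/spool/spamassassin /root/.spamassassin
```

If files show up in more than one directory, more than one Bayes db has probably been written to at some point.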
I think there is only one.
MS, of course, can only use one and doesn't have a chance of confusing that, so
when it uses SA that learns and checks the same db. And so far that part seems
to be okay (except for the bigger size of bayes_seen, but as I said, this may
be normal for your setup, I really don't know). But you burn your tokens too
fast. At least that's what I think.
If I get it right, you mean that the tokens are lost very quickly? I think I am confused: if bayes works with tokens, why does it need nspam and nham? Or are they just counters?
In general, do you think that setting bayes_expiry_max_db_size would be enough?
One final thing: why, even if I manually expire, does the date of the last expiration remain old?
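
For reference, the manual expiry and the check of the expiration date can be sketched roughly as below. This assumes sa-learn is pointed at the MailScanner prefs file from the grep output above via -p, so that it works on the same db MS uses; whether it must be run as a particular user on this machine is not known here.

```shell
# Prefs file path taken from the grep output earlier in this thread.
PREFS=/opt/MailScanner/etc/spam.assassin.prefs.conf

if command -v sa-learn >/dev/null 2>&1; then
    # Counters and timestamps (nspam, nham, ntokens, last expiry atime, ...).
    sa-learn -p "$PREFS" --dump magic
    # Force an expiry run on that db.
    sa-learn -p "$PREFS" --force-expire
    # Show only the expiry-related lines afterwards.
    sa-learn -p "$PREFS" --dump magic | grep -i expir
else
    echo "sa-learn not found in PATH" >&2
fi
```

Comparing the expiry lines before and after the forced run shows whether the expiry actually touched this db.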