GRP Productions wrote on Mon, 14 Mar 2005 00:32:42 +0200:

> You are right, I am using MailWatch. I just posted this output to be easy
> for one to see the actual dates without having to convert.
That's okay, the problem is just that one cannot be sure how accurate it
is. Knowing that you use MailScanner would have been useful, anyway :-)
(BTW: my version of MailWatch can't show this, do you use a CVS version?)

> Here is the actual output:
>
> # /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0      49740          0  non-token data: nspam
> 0.000          0      47167          0  non-token data: nham
> 0.000          0     123325          0  non-token data: ntokens

I didn't look at this closely before, but I think this ratio indicates a
problem. For instance, this is from our own mail server (just getting our
own mail, not our clients'):

0.000          0      30089          0  non-token data: nspam
0.000          0      12515          0  non-token data: nham
0.000          0    1001630          0  non-token data: ntokens

See the number of tokens: we have ten times yours, with less learned mail.
That means our db has many more tokens with which to qualify an email as
ham or spam. Also, your "hold time" is quite low, about a month; I think we
have tokens from even a year ago. That's maybe a bit too much, but I
strongly suggest upping your bayes_expiry_max_db_size to something like
500,000 or so. Since you have a much higher flux of messages than we have
on that machine, you are literally "burning" your db into uselessness.

> No it isn't. This is exactly the point I mentioned.

But you didn't prove it ;-)

> But as I said earlier, sa-learn claims it has learned, even from the web
> interface:
>
> SA Learn: Learned from 1 message(s) (1 message(s) examined).

And did you learn by specifying the config file? I suspect that you are at
least occasionally using two SA configurations: the one coming with
MailScanner and the one coming with SA.

> This is getting more suspicious: there is no bayes_journal file!

Oh. Still possible, though. You don't need to have one, but on high-volume
systems it's highly recommended. Check your SA config (wherever it is :-)
for bayes_learn_to_journal 1. I don't know if it is 1 by default, though.
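Pulled together, the Bayes-related settings mentioned above might look
something like this in the SpamAssassin preferences file. The option names
are standard SpamAssassin Bayes settings; the values are only illustrative,
following the advice above, so treat this as a sketch rather than a
recommended configuration:

```
# Bayes settings, e.g. in /opt/MailScanner/etc/spam.assassin.prefs.conf
# (values illustrative, per the discussion above)
bayes_expiry_max_db_size  500000   # raise the token limit so tokens live longer
bayes_learn_to_journal    1        # recommended on high-volume systems
bayes_auto_expire         1        # let SA expire old tokens automatically
```

After changing these, re-run sa-learn --dump magic with the same -p config
file to confirm the new limit is actually being picked up.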
What do you have starting with bayes in your config file?

> -rw-rw-rw-  1 root  nobody      1236  Mar 14 00:22  bayes.mutex
> -rw-rw-rw-  1 root  nobody  10452992  Mar 14 00:22  bayes_seen
> -rw-rw-rw-  1 root  nobody   5509120  Mar 14 00:02  bayes_toks

bayes_seen is quite high. I have never seen it bigger than bayes_toks on
our systems, but maybe that's normal for high-volume setups, I don't know.
On the MailScanner list many people complain about very big bayes_seen
files. Someone else on this list should comment on the size.

> I can assure you no one has touched anything inside this directory. If
> this is the reason for the problems I've been facing, is there a way to
> recreate the file without having to lose my current data? (perhaps by
> copying the above files somewhere, executing sa-learn --clear and some
> time later restoring the above files?)

I don't know if this would be of any help. As I said, I suspect you are
using at least two different bayes dbs, at least when you learn from the
command line. Run "updatedb" and then "locate bayes" (this may not locate
all files, e.g. not in /var!). MailScanner, of course, can only use one db
and has no chance of confusing that, so when it uses SA it learns from and
checks against the same db. And so far that part seems to be okay (except
for the bigger size of bayes_seen, but as I said, this may be normal for
your setup, I really don't know). But you burn your tokens too fast. At
least that's what I think.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org
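To make the "two bayes dbs" check above concrete, here is a small sketch.
It simulates two separate database locations in a temporary directory and
shows how find reveals both; the directory names are made up for the demo.
On a live system you would run "updatedb && locate bayes" (or search from /
with find, which also covers paths locate may skip, such as /var):

```shell
# Illustrative only: simulate two separate Bayes db locations, then
# demonstrate that a filesystem search turns up both. Finding more than
# one bayes_toks would mean sa-learn and MailScanner may use different dbs.
demo=$(mktemp -d)
mkdir -p "$demo/root/.spamassassin" "$demo/var/spool/sa"
touch "$demo/root/.spamassassin/bayes_toks" "$demo/var/spool/sa/bayes_toks"
found=$(find "$demo" -name 'bayes_*' | wc -l | tr -d ' ')
echo "bayes files found: $found"
rm -rf "$demo"
```

If the count is greater than one (outside this sandbox), compare the
bayes_path settings of the configs involved to see which db each tool uses.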