Re: Bayes DB does not grow anymore

Kai Schaetzl 15 Mar 2005 16:34:02 -0000

GRP Productions wrote on Tue, 15 Mar 2005 01:12:53 +0200:

> >I have been trying to get something from CVS for several days now, no luck. 
>  
> Send me your email in private ([EMAIL PROTECTED]) to send it to you.


Thanks for the offer. You can send it to the email address I use for this list, 
or you could just send me an FTP URL for retrieval.

> I will probably start again from scratch. One point: Do you think I should 
> put custom rules inside /etc/mail/spamassassin or the default installation 
> is enough? 

Oh yes, you should use custom rules. You also need to have SURBL switched on 
via the init.pre (I think it's off by default). I use a set of carefully chosen 
rulesets, mostly from SARE and updated via rulesdujour, plus some more rules of 
my own accumulated over time.

> Yes I just added this. Should auto_expire remain always at 0?

I think on a heavy-traffic machine it's preferable to have it off, especially 
when using MailScanner. Otherwise the expiry can kick in at random times every 
few hours (you can set a minimum time, though, e.g. one day). Some people run a 
scheduled expiry three times a day. That advice often comes up on the 
MailScanner list (which is a very helpful list, btw).
How often you need it depends on how quickly the db reaches the size limit you 
want to hold. Starting with one expiry per night should be fine, but you should 
occasionally expire manually and look at the output, in case there are 
problems.
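
A minimal sketch of such a scheduled or manual expiry run, assuming sa-learn 
is on the PATH and is run as the user that owns the Bayes db. The function 
names are made up for illustration; --force-expire and --dbpath are the 
sa-learn options I mean, and the path in the comment is only a placeholder.

```python
import subprocess


def expire_command(dbpath=None):
    """Build the argument list for a forced Bayes expiry run."""
    cmd = ["sa-learn", "--force-expire"]
    if dbpath is not None:
        # Only needed when the db is not in the user's default location.
        cmd += ["--dbpath", dbpath]
    return cmd


def run_expiry(dbpath=None):
    """Run the expiry; return (exit code, combined output) for inspection."""
    result = subprocess.run(
        expire_command(dbpath),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout


# Example call (placeholder path, not executed here):
#   code, output = run_expiry("/path/to/bayes")
#   print(output)
```

Calling run_expiry() from a nightly cron job and mailing yourself the output 
is one way to keep an eye on what the expiry actually does.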


> Also, do you think it would be better if the db NEVER expired?

No. One should get rid of really old tokens, they are only "ballast" in the db. 
I don't know how a big db behaves on a busy site. Ours contain 1 million tokens 
each and have a size of 40 MB. They work very well with no resource hogging. 
But I have only a few thousand messages running through each of our servers, 
there's probably none which gets more than 10,000 a day. If you get 100,000 it 
may be different.


> Would this value of 500000 achieve that? I don't want to come at work some 
> day and see my tokens were lost again :-( 

Just look at what the dump says about your oldest token. If your Bayes 
"performance" is good, then the hold time is probably of no interest. But if 
the spam detection from Bayes is bad and the oldest token is quite recent, the 
short hold time is one of the first things I would look at.
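
To read the hold time off the dump, something like the following could work. 
It is only a sketch: it assumes the usual column layout of 
"sa-learn --dump magic", the sample numbers are made up rather than taken from 
a real db, and the function names are my own.

```python
import re

# Made-up sample in the layout printed by `sa-learn --dump magic`;
# the values are illustrative only.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      41250          0  non-token data: nspam
0.000          0      18730          0  non-token data: nham
0.000          0     498112          0  non-token data: ntokens
0.000          0 1108339200          0  non-token data: oldest atime
0.000          0 1110844800          0  non-token data: newest atime
"""

# The third column carries the value, the trailing text names the field.
LINE_RE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$"
)


def parse_magic(text):
    """Return a dict mapping field name to integer value."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def hold_time_days(values):
    """Days between the oldest and the newest token access time."""
    span = values["newest atime"] - values["oldest atime"]
    return span / 86400.0


values = parse_magic(SAMPLE_DUMP)
print(values["ntokens"], "tokens, hold time",
      hold_time_days(values), "days")
```

With real data you would feed in the captured output of the dump instead of 
SAMPLE_DUMP and compare the result with how long you want tokens to be kept.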


>  
> In general, should I do as you said, ie. trust the autolearn system and 
> never use sa-learn again, provided that I do not have the time to do full 
> training. 

That's what we do. I only learn messages which were categorized wrongly, not by 
Bayes but by SA as a whole. Most spam messages which get a score lower than 5 
still get a BAYES_99, which means that Bayes identifies them all. Nevertheless, 
I learn these messages because they are spam and it reassures Bayes that they 
are spam.
BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.

>  
> Thanks for giving me so much of your time, and being so patient with my 
> silly questions.

No problem :-) I tend to be a bit snappy on first messages which look to me 
like the author could have done a bit more research, but once we are over that 
stage I hope I can give some good advice based on my experience.


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org
