GRP Productions wrote on Tue, 15 Mar 2005 01:12:53 +0200:

> > I have been trying to get something from CVS for several days now, no luck.
>
> Send me your email in private ([EMAIL PROTECTED]) to send it to you.

Thanks for the offer. You can send it to the email address I use for this list, or you could just send me an FTP URL for retrieval.

> I will probably start again from scratch. One point: Do you think I should
> put custom rules inside /etc/mail/spamassassin or the default installation
> is enough?

Oh, yes. You need to have SURBL switched on via the init.pre (I think it's off by default) and you should use custom rules. I use a set of carefully chosen rulesets, mostly from SARE and updated via rulesdujour, plus some more rules of my own accumulated over time.

> Yes I just added this. Should auto_expire remain always at 0?

I think on a heavy-traffic machine it's preferable to have it off, especially when using MailScanner. Otherwise the expiry can kick in at random times every few hours (you can set a minimum time, though, e.g. one day). Some people run a scheduled expiry three times a day. That's advice which often comes up on the MailScanner list (which is a very helpful list, btw). How often you need it depends on whether the db reaches the limit you want to hold more often or not. Starting with one expiry per night should be fine, but you should occasionally expire manually and look at the output, in case there are problems.

> Also, do you think it would be better if the db NEVER expired?

No. One should get rid of really old tokens; they are only "ballast" in the db. I don't know how a big db behaves on a busy site. Ours contain 1 million tokens and have a size of 40 MB. They work very well with no resource hogging. But I have only a few thousand messages running through each of our servers; there's probably none which gets more than 10,000 a day. If you get 100,000 it may be different.

> Would this value of 500000 achieve that? I don't want to come at work some
> day and see my tokens were lost again :-(

Just look at what the dump says about your oldest token.
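
For what it's worth, here is a minimal Python sketch of that check. It assumes the dump is the text printed by `sa-learn --dump magic` and that the line labelled "oldest atime" carries a Unix timestamp in its third column. The function name and the sample lines are made up for illustration, so compare them against your own dump before relying on this.

```python
import time


def oldest_token_age_days(dump_text, now=None):
    """Return the age in days of the oldest Bayes token, or None if not found.

    Assumption: dump_text is the output of `sa-learn --dump magic`, where
    the "oldest atime" line holds a Unix timestamp in its third column.
    """
    now = time.time() if now is None else now
    for line in dump_text.splitlines():
        if "non-token data: oldest atime" in line:
            fields = line.split()
            oldest = int(fields[2])  # assumed position of the timestamp
            return (now - oldest) / 86400.0
    return None


# Hypothetical sample lines, invented for illustration only.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0    1000000          0  non-token data: ntokens
0.000          0 1108000000          0  non-token data: oldest atime
0.000          0 1110841973          0  non-token data: newest atime
"""

if __name__ == "__main__":
    # Pretend "now" is exactly 30 days after the oldest token was last seen.
    age = oldest_token_age_days(SAMPLE, now=1108000000 + 30 * 86400)
    print(f"oldest token is {age:.1f} days old")
```

If the age it reports is only a few days, the hold time is short and expiry is throwing tokens away quickly.
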
If your Bayes "performance" is good, then the hold time is probably of no interest. But if the spam detection from Bayes is bad and you have a short hold time, that short hold time is one of the things I would look at.

> In general, should I do as you said, ie. trust the autolearn system and
> never use sa-learn again, provided that I do not have the time to do full
> training.

That's what we do. I only learn messages which were categorized wrong, not by Bayes but by SA. Most messages which get a score lower than 5 still get a BAYES_99, which means that Bayes identifies them all. Nevertheless, I learn these messages because they are spam and it reassures Bayes that they are spam. BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.

> Thanks for giving me so much of your time, and being so patient with my
> silly questions.

No problem :-) I tend to be a bit snappy on first messages which look to me like the author could have done a bit more research, but once we are over that stage I hope I can give some good advice based on my experience.

Kai

--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org
