Your approach to training it should be fine. What you should do is periodically feed it some ham and spam so messages don't become FPs. The problem I have had, solved by Bayes, is that the SARE rules, in conjunction with the stock rules and some of the RBLs, generate a score very close to the threshold for some hams; sometimes they sneak over the top. With Bayes in operation the problem goes away, because most of the ham hits the 0-10% probability rules, which give it a negative score (we have lowered those scores a little for our particular environment).

I reset Bayes a couple of weeks back, while I was offsite, to fix another problem, and during the time it was waiting to train, many hams got marked as spam, over the threshold by roughly 0.3. We fixed this temporarily by raising the threshold while Bayes was learning. (I didn't have access to my spam/ham databases from where I was, so I couldn't train it until I got back.)

So, here is what I would do: get my ham/spam mboxes in order, stop spamd, tarball the existing Bayes database, delete the files, restart spamd (so it will re-create the files), and then train it with your ham/spam. That way you don't have to wait until Bayes has learned more than 200 hams/spams.

BTW, there should be a document somewhere on the "Proper Care and Feeding of Bayes for a Long and Healthy Life". If there isn't, we should create one.

Gary
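To make the reset-and-retrain steps above concrete, here is a rough sketch as a shell script. The paths, init script, and mbox locations are assumptions for a typical per-user setup (Bayes files under ~/.spamassassin, spamd run from an init script) and will differ per site; the script only prints each command so you can review the steps before executing anything for real.

```shell
#!/bin/sh
# Sketch of the reset-and-retrain procedure from the post above.
# All paths below are assumptions -- adjust for your site.
BAYES_DIR=~/.spamassassin          # where bayes_toks / bayes_seen live
HAM_MBOX=~/mail/ham.mbox           # hypothetical corpus locations
SPAM_MBOX=~/mail/spam.mbox

# Dry run: print each step instead of executing it.
# Change the body to "$@" to actually run the commands.
run() { echo "$@"; }

run /etc/init.d/spamd stop
run tar czf bayes-backup.tar.gz "$BAYES_DIR"
run rm -f "$BAYES_DIR"/bayes_toks "$BAYES_DIR"/bayes_seen
run /etc/init.d/spamd start        # spamd re-creates the files
run sa-learn --ham "$HAM_MBOX"     # then train from your corpora
run sa-learn --spam "$SPAM_MBOX"
```

Backing up before deleting means you can restore the old database with a single untar if the retrain goes badly.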
________________________________
From: news on behalf of Alt Thomy
Sent: Tue 7/13/2004 6:19 AM
To: [EMAIL PROTECTED]
Subject: Bayes issues

Hi,

my bayes looks like this:

0.000          0          2          0  non-token data: bayes db version
0.000          0       4588          0  non-token data: nspam
0.000          0      15006          0  non-token data: nham
0.000          0     148621          0  non-token data: ntokens
0.000          0 1088644104          0  non-token data: oldest atime
0.000          0 1089366749          0  non-token data: newest atime
0.000          0 1089366089          0  non-token data: last journal sync atime
0.000          0 1089335321          0  non-token data: last expiry atime
0.000          0     691200          0  non-token data: last expire atime delta
0.000          0       7297          0  non-token data: last expire reduction count

I have been using it for a long time only with SA's autolearn, and recently I started training it manually. Basically I train it only with false positives and false negatives (mistake-based learning). It seems to work fine, properly classifying spam and ham messages. Is my whole approach incorrect?

Also, based on the numbers of ham and spam above, and considering that sa-learn's man page says there is no significant improvement beyond 5,000 messages, how much more should I let it grow?

On the other hand, my experience suggests that, with a large set of SA rules, it would not be a problem to empty the database, as the rules will most probably identify the spam anyway. All I have to do is keep training at the same frequency I do now (i.e. it doesn't really matter if already manually learned spams and hams are lost - my workload stays the same!). It's a strange approach, but it works for me (I get about 4,000 messages per day, of which about 40% is spam).

I would appreciate any comments.

Regards,
Alty
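For anyone comparing their own database to the numbers Alty quotes: the nspam/nham counters can be pulled out of `sa-learn --dump magic` output with a little awk. A minimal sketch, using two sample lines copied from the dump above instead of invoking sa-learn, so the snippet is self-contained:

```shell
#!/bin/sh
# Extract the nspam/nham counters from `sa-learn --dump magic` output.
# Sample input copied from the dump in the message above; in practice
# you would pipe in:  sa-learn --dump magic | awk ...
dump='0.000          0       4588          0  non-token data: nspam
0.000          0      15006          0  non-token data: nham'

# The count is the third whitespace-separated field of each line.
nspam=$(printf '%s\n' "$dump" | awk '/non-token data: nspam$/ {print $3}')
nham=$(printf '%s\n' "$dump" | awk '/non-token data: nham$/ {print $3}')

echo "nspam=$nspam nham=$nham"
```

With the sample data this prints `nspam=4588 nham=15006`, i.e. well past the minimum of 200 each that Bayes needs before it starts scoring.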
