On Sunday 21 March 2004 15:49, Alan Baxter wrote:
> It looks like you're using a database that you got from somewhere else
> instead of one that's based on the spam and ham that you've seen since
> you started using SA. That's not bad per se. Is bayes working
> effectively for you? It won't expire any tokens until you've accessed
> at least 112500, and I think it might be several weeks before you reach
> that number as a single user.

Well, I use Gentoo and all I did was emerge the package. Unless there's a
sample database in Gentoo's package (which, by the way, I don't think there
is), I am sure this database was created by me. In the first weeks of use,
I ran sa-learn --spam every day, since about 40% of my spam wasn't being
flagged. Nowadays the accuracy is extremely good: I rarely get false
positives or unflagged spam.

I must say I am happy with SA so far. However, I started having this
5-minute hang problem, which wasn't happening in the first weeks of use.
Since I run SA as a pipe-through filter action in KMail, my mail client
would hang for 5 minutes every day.

> Your bayes might be more effective if you eliminate all of those unused
> tokens. You ought to be able to force it to remove all of the tokens
> you haven't used by putting "bayes_expiry_max_db_size 28000" in your
> user_prefs. This should cause it to purge all of the tokens that you
> haven't used the next time an expiry is attempted. Once you've done
> that you can remove the bayes_expiry_max_db_size line and bayes will
> grow using only the tokens that you learn from your email. I have a
> single user installation too, so I don't need auto expiration. I just
> do a manual expire once every month or so.

Well, first I tried feeding the database as Theo Van Dinter suggested. Then
I reran the forced expire, and indeed it worked:

debug: bayes: token count: 1100702, final goal reduction size: 988202
debug: bayes: First pass? Current: 1079895562, Last: 1079891344, atime: 1382400, count: 1789, newdelta: 2502, ratio: 552.376746785914
debug: bayes: Can't use estimation method for expiry, something fishy, calculating optimal atime delta (first pass)
debug: bayes: atime     token reduction
debug: bayes: ========  ===============
debug: bayes: 43200     1083283
debug: bayes: 86400     1083143
debug: bayes: 172800    1082351
debug: bayes: 345600    1078392
debug: bayes: 691200    1067224
debug: bayes: 1382400   901614
debug: bayes: 2764800   879583
debug: bayes: 5529600   859620
debug: bayes: 11059200  0
debug: bayes: 22118400  0
debug: bayes: First pass decided on 1382400 for atime delta
debug: bayes: 6312 untie-ing
debug: bayes: 6312 untie-ing db_toks
debug: bayes: 6312 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 6312 unlink /home/gmichels/.spamassassin/bayes.lock
expired old Bayes database entries in 196 seconds
199088 entries kept, 901614 deleted
token frequency: 1-occurence tokens: 93.19%
token frequency: less than 8 occurrences: 5.69%
debug: Syncing complete.
debug: bayes: 6312 untie-ing

Then I tried your "bayes_expiry_max_db_size 28000" suggestion, but there
seems to be a lower limit on the db size:

debug: bayes: expiry check keep size, 75% of max: 21000
debug: bayes: expiry keep size too small, resetting to 100,000 tokens

So I guess I will leave it at 100,000, not use auto-expire, and do it like
you do, once a month. My main problem was KMail hanging for 5 minutes, and
that's not going to happen anymore.

cheers
Gustavo
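
For anyone reading this in the archives, the numbers in the debug output can be checked with a short Python sketch. It is only an illustration and not SpamAssassin's actual code. The 75% factor and the 100,000 floor come from the debug lines above. The 150,000 maximum is an assumption, inferred from the 112500 figure Alan mentioned (112500 / 0.75). The selection rule in pick_atime_delta (take the delta that removes the most tokens without exceeding the goal) is also an assumption; it happens to agree with what the first pass decided here. The function names are made up for this example.

```python
# Figures copied from the debug output in this message.
TOKEN_COUNT = 1_100_702
GOAL_REDUCTION = 988_202
MIN_KEEP = 100_000  # floor reported by "expiry keep size too small"

# (atime delta in seconds, tokens that would be removed) from the first pass.
FIRST_PASS = [
    (43200, 1083283), (86400, 1083143), (172800, 1082351),
    (345600, 1078392), (691200, 1067224), (1382400, 901614),
    (2764800, 879583), (5529600, 859620), (11059200, 0), (22118400, 0),
]


def keep_size(max_db_size):
    """75% of the configured maximum, but never below the 100,000 floor."""
    return max(max_db_size * 3 // 4, MIN_KEEP)


def pick_atime_delta(table, goal):
    """Assumed rule: largest non-zero reduction that does not exceed the goal."""
    candidates = [(removed, delta) for delta, removed in table if 0 < removed <= goal]
    if not candidates:
        return None
    removed, delta = max(candidates)
    return delta, removed


if __name__ == "__main__":
    # 150000 is an assumed maximum, inferred from the 112500 keep size.
    print("keep size at 150000:", keep_size(150_000))
    print("keep size at 28000:", keep_size(28_000))
    print("goal reduction:", TOKEN_COUNT - keep_size(150_000))
    delta, removed = pick_atime_delta(FIRST_PASS, GOAL_REDUCTION)
    print("chosen delta:", delta, "entries kept:", TOKEN_COUNT - removed)
```

With the 28000 setting the computed keep size is 21000, which is under the floor, so the sketch returns 100,000, the same reset the debug output reports.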
