On Sunday 21 March 2004 15:49, Alan Baxter wrote:

> It looks like you're using a database that you got from somewhere else
> instead of one that's based on the spam and ham that you've seen since
> you started using SA.  That's not bad per se.  Is bayes working
> effectively for you?  It won't expire any tokens until you've accessed
> at least 112500, and I think it might be several weeks before you reach
> that number as a single user.

Well, I use Gentoo and all I did was emerge the package. Unless there's a
sample database in Gentoo's package (which, by the way, I don't think there
is), I am sure this database was created by me.

In the first weeks of use, I ran sa-learn --spam every day, since about 40% of
my spam wasn't being flagged. Nowadays the accuracy is extremely good: I
rarely get false positives or unflagged spam. I must say I am happy with SA so
far. However, I started having this 5-minute hang problem (since I use it as a
pipe-through filter action in KMail, it made my mail client hang for 5 minutes
every day), and this wasn't happening in the first weeks of use.

> Your bayes might be more effective if you eliminate all of those unused
> tokens.  You ought to be able to force it to remove all of the tokens
> you haven't used by putting "bayes_expiry_max_db_size 28000" in your
> user_prefs.  This should cause it to purge all of the tokens that you
> haven't used the next time an expiry is attempted.  Once you've done
> that you can remove the bayes_expiry_max_db_size line and bayes will
> grow using only the tokens that you learn from your email.  I have a
> single user installation too, so I don't need auto expiration.  I just
> do a manual expire once every month or so.

Well, first I tried feeding the database as Theo Van Dinter suggested. Then I
reran the forced expire, and this time it worked:

debug: bayes: token count: 1100702, final goal reduction size: 988202
debug: bayes: First pass?  Current: 1079895562, Last: 1079891344, atime: 1382400, count: 1789, newdelta: 2502, ratio: 552.376746785914
debug: bayes: Can't use estimation method for expiry, something fishy, calculating optimal atime delta (first pass)
debug: bayes: atime     token reduction
debug: bayes: ========  ===============
debug: bayes: 43200     1083283
debug: bayes: 86400     1083143
debug: bayes: 172800    1082351
debug: bayes: 345600    1078392
debug: bayes: 691200    1067224
debug: bayes: 1382400   901614
debug: bayes: 2764800   879583
debug: bayes: 5529600   859620
debug: bayes: 11059200  0
debug: bayes: 22118400  0
debug: bayes: First pass decided on 1382400 for atime delta
debug: bayes: 6312 untie-ing
debug: bayes: 6312 untie-ing db_toks
debug: bayes: 6312 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 6312 unlink /home/gmichels/.spamassassin/bayes.lock
expired old Bayes database entries in 196 seconds
199088 entries kept, 901614 deleted
token frequency: 1-occurence tokens: 93.19%
token frequency: less than 8 occurrences: 5.69%
debug: Syncing complete.
debug: bayes: 6312 untie-ing
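
The numbers in that log are consistent with each other, and a minimal sketch
can reproduce them. Everything here is inferred from the log output alone, not
taken from the SpamAssassin source: the function names are hypothetical, the
112500 keep size is the figure from the quoted message, and the selection rule
(smallest atime delta whose non-zero reduction stays within the goal) is an
assumption that happens to match what the first pass decided.

```python
# Figures copied from the debug log above.
TOKEN_COUNT = 1_100_702
KEEP_SIZE = 112_500  # assumption: the threshold quoted earlier is the keep size

# atime delta in seconds -> tokens that would be removed (the log's table)
REDUCTIONS = {
    43200: 1083283,
    86400: 1083143,
    172800: 1082351,
    345600: 1078392,
    691200: 1067224,
    1382400: 901614,
    2764800: 879583,
    5529600: 859620,
    11059200: 0,
    22118400: 0,
}


def goal_reduction(token_count, keep_size):
    """Number of tokens that must go to get down to the keep size."""
    return token_count - keep_size


def pick_atime_delta(reductions, goal):
    """Assumed rule: smallest delta that removes some tokens but no more than the goal."""
    for delta in sorted(reductions):
        removed = reductions[delta]
        if 0 < removed <= goal:
            return delta
    return None


goal = goal_reduction(TOKEN_COUNT, KEEP_SIZE)
delta = pick_atime_delta(REDUCTIONS, goal)
kept = TOKEN_COUNT - REDUCTIONS[delta]
print(goal, delta, kept)  # 988202 1382400 199088
```

The three printed values match the log: the goal reduction of 988202, the
chosen atime delta of 1382400, and the 199088 entries kept.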

Then I tried your "bayes_expiry_max_db_size 28000" suggestion, but there seems
to be a lower limit for the db size:

debug: bayes: expiry check keep size, 75% of max: 21000
debug: bayes: expiry keep size too small, resetting to 100,000 tokens
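
The two debug lines above amount to a small piece of arithmetic, sketched here
under stated assumptions: the 75% factor and the 100,000-token floor are read
straight off the messages, while the function and constant names are
hypothetical and not SpamAssassin's own.

```python
MIN_KEEP_SIZE = 100_000  # floor reported by the "resetting to 100,000 tokens" message


def expiry_keep_size(max_db_size):
    """Keep size as the log describes it: 75% of max, raised to the floor if too small."""
    keep = max_db_size * 3 // 4  # 75% of bayes_expiry_max_db_size, integer math
    if keep < MIN_KEEP_SIZE:
        keep = MIN_KEEP_SIZE
    return keep


print(expiry_keep_size(28000))   # 100000 (21000 is below the floor)
print(expiry_keep_size(150000))  # 112500
```

The second call uses 150000 only because 75% of it gives the 112500 figure
quoted earlier in this thread.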

So I guess I will leave it at 100,000, not use auto-expire and do it like you 
do, once a month. My main problem was KMail being hung for 5 minutes, and 
that's not going to happen anymore.

cheers
Gustavo
