Okay, I went ahead and deleted all of my bayes_* files and started back from scratch. In less than two days I have the same problem. Here is the sa-learn --dump magic output:
0.000          0          2          0  non-token data: bayes db version
0.000          0       5661          0  non-token data: nspam
0.000          0       2781          0  non-token data: nham
0.000          0    2345030          0  non-token data: ntokens
0.000          0  927744915          0  non-token data: oldest atime
0.000          0 1117360742          0  non-token data: newest atime
0.000          0 1085667739          0  non-token data: last journal sync atime
0.000          0 1085652585          0  non-token data: last expiry atime
0.000          0     172800          0  non-token data: last expire atime delta
0.000          0      18800          0  non-token data: last expire reduction count

As you can see, I have a large number of tokens and a wide range in atimes. If I run --force-expire I get output very similar to what you see in my previous post.

I guess I don't understand what atime is. Is it a numerical form of when the token was placed in the DB? If so, then why in the world does it slowly seem to be getting older tokens? Is this the problem? Is auto-learning using the wrong date/time when adding tokens?

Anybody else experience this problem before?

Thanks,
Kris

-----Original Message-----
From: Kristopher Austin
Sent: Friday, May 21, 2004 4:01 PM
To: [EMAIL PROTECTED]
Subject: RE: Bayes DB possible problem

This is the output from sa-learn -D --force-expire. It seems that Theo's guess is correct according to the error toward the end. I guess the next question is what harm is there in leaving this until 3.0? I do not have a set of spam to feed the Bayes system anymore. I'm not quite sure how inaccurate SA will be if I start fresh. Any suggestions?

Thanks for the help and I am running 2.63.

debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/bin/X11', which doesn't exist, dropping.
debug: Final PATH set to: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/spamassassin" for site rules dir
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 27381 tie-ing to DB file R/O /etc/spamassassin/bayes_toks
debug: bayes: 27381 tie-ing to DB file R/O /etc/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 2 chosen.
debug: Initialising learner
debug: Initialising learner
debug: Syncing Bayes journal and expiring old tokens...
debug: lock: 27381 created /etc/spamassassin/bayes.lock.gateway2.oc.edu.27381
debug: lock: 27381 trying to get lock on /etc/spamassassin/bayes with 0 retries
debug: lock: 27381 link to /etc/spamassassin/bayes.lock: link ok
debug: bayes: 27381 tie-ing to DB file R/W /etc/spamassassin/bayes_toks
debug: bayes: 27381 tie-ing to DB file R/W /etc/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
..
debug: bayes: expiry check keep size, 75% of max: 225000
debug: bayes: token count: 2331423, final goal reduction size: 2106423
debug: bayes: First pass? Current: 1085172306, Last: 1085163864, atime: 172800, count: 40459, newdelta: 3319, ratio: 52.0631503497368
debug: bayes: Can't use estimation method for expiry, something fishy, calculating optimal atime delta (first pass)
debug: bayes: atime     token reduction
debug: bayes: ========  ===============
debug: bayes: 43200     2330836
debug: bayes: 86400     2330836
debug: bayes: 172800    2330836
debug: bayes: 345600    2330836
debug: bayes: 691200    2330836
debug: bayes: 1382400   2330836
debug: bayes: 2764800   2330836
debug: bayes: 5529600   2330836
debug: bayes: 11059200  2330836
debug: bayes: 22118400  2330836
debug: bayes: couldn't find a good delta atime, need more token difference, skipping expire.
debug: Syncing complete.
debug: bayes: 27381 untie-ing
debug: bayes: 27381 untie-ing db_toks
debug: bayes: 27381 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 27381 unlink /etc/spamassassin/bayes.lock

-----Original Message-----
From: Matt Kettler [mailto:[EMAIL PROTECTED]
Sent: Friday, May 21, 2004 2:43 PM
To: Kristopher Austin; [EMAIL PROTECTED]
Subject: Re: Bayes DB possible problem

At 02:35 PM 5/21/2004, Kristopher Austin wrote:
>I was looking at my Bayes DB files and noticed that they seem very
>large. Is this a problem?

In your case, yes.

>54K May 21 13:28 bayes_journal
>82M May 21 13:28 bayes_seen
>80M May 21 13:28 bayes_toks
>
>I went ahead and did a sa-learn --dump magic and this is the output:
>
>0.000 0 2 0 non-token data: bayes db version
>0.000 0 70627 0 non-token data: nspam
>0.000 0 29182 0 non-token data: nham
>0.000 0 2041152 0 non-token data: ntokens
>0.000 0 956386256 0 non-token data: oldest atime
>0.000 0 2093049063 0 non-token data: newest atime
>0.000 0 1085163866 0 non-token data: last journal
<snip>
>Does it seem unusual to have 2 million tokens in the database?

Yes, it also seems strange for the "newest atime" to be so high relative to oldest and last journal times.

What version of SA are you on? I've had problems with strange atimes on SA 2.5x, but I've been free of them ever since I upgraded to 2.63.

Try doing an expire with debug output on:

sa-learn -D --force-expire

Maybe the debug output can offer some clues.
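
On the "what is atime" question: the values in both --dump magic listings read naturally as Unix epoch seconds. Under that assumption (it is an assumption, not something the thread confirms), a minimal Python sketch to decode them looks like the following. The dictionary labels and the helper name to_utc are mine, purely for illustration.

```python
from datetime import datetime, timezone

# Values copied from the two "sa-learn --dump magic" listings in this thread.
# Assumption: each atime is a count of seconds since 1970-01-01 00:00 UTC.
ATIMES = {
    "oldest atime (rebuilt DB)": 927744915,
    "newest atime (rebuilt DB)": 1117360742,
    "last journal sync atime (rebuilt DB)": 1085667739,
    "oldest atime (original DB)": 956386256,
    "newest atime (original DB)": 2093049063,
}


def to_utc(epoch_seconds: int) -> datetime:
    """Interpret an atime value as Unix epoch seconds in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


for label, value in ATIMES.items():
    print(f"{label:38s} {value:>11d}  {to_utc(value):%Y-%m-%d %H:%M} UTC")
```

Read that way, the journal sync time falls in May 2004, which matches the dates on these messages, while the rebuilt DB has an oldest atime in 1999 and a newest atime in 2005, and the original DB had a newest atime in 2036. So at least some tokens carry timestamps that cannot be the time they were actually stored.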

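
The figures in the --force-expire debug output also hang together arithmetically. The sketch below reproduces them in Python. The formulas are inferred from the logged numbers, not taken from the SpamAssassin source, and every variable name is made up for illustration.

```python
# Figures copied from the "sa-learn -D --force-expire" debug output in this thread.
TOKEN_COUNT = 2331423          # "token count"
KEEP_SIZE = 225000             # "expiry check keep size, 75% of max"
LAST_EXPIRE_DELTA = 172800     # "atime: 172800" (seconds, i.e. 2 days)
LAST_EXPIRE_COUNT = 40459      # "count: 40459"
TABLE_REDUCTION = 2330836      # "token reduction" column, same for every row

# Inferred relationships (assumptions that happen to match the log exactly):
goal_reduction = TOKEN_COUNT - KEEP_SIZE           # tokens that should be dropped
ratio = goal_reduction / LAST_EXPIRE_COUNT         # goal vs. what the last expire removed
new_delta = int(LAST_EXPIRE_DELTA / ratio)         # estimated atime delta, truncated

# The candidate deltas in the table double from 12 hours up to 256 days.
candidate_deltas = [43200 * 2**k for k in range(10)]
tokens_left = TOKEN_COUNT - TABLE_REDUCTION        # survivors at every candidate delta

print("goal reduction:", goal_reduction)
print("ratio:", ratio)
print("new delta:", new_delta)
print("candidate deltas:", candidate_deltas)
print("tokens left after any candidate expire:", tokens_left)
```

If that reading is right, every candidate window from 12 hours to 256 days would throw away all but 587 tokens, far below the 225000 the expiry wants to keep, which would explain why it gives up with "couldn't find a good delta atime". A newest atime far in the future would produce exactly this picture if token age is measured against the newest atime, but that last part is my guess.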