Okay, I went ahead and deleted all of my bayes_* files and started back
from scratch.  In less than two days I have the same problem.  Here is
the sa-learn --dump magic output:

0.000          0          2          0  non-token data: bayes db version
0.000          0       5661          0  non-token data: nspam
0.000          0       2781          0  non-token data: nham
0.000          0    2345030          0  non-token data: ntokens
0.000          0  927744915          0  non-token data: oldest atime
0.000          0 1117360742          0  non-token data: newest atime
0.000          0 1085667739          0  non-token data: last journal sync atime
0.000          0 1085652585          0  non-token data: last expiry atime
0.000          0     172800          0  non-token data: last expire atime delta
0.000          0      18800          0  non-token data: last expire reduction count

As you can see, I have a large number of tokens and a wide range in
atimes.  If I run --force-expire I get output very similar to what you
see in my previous post.

I guess I don't understand what atime is.  Is it a numerical form of
when the token was placed in the DB?  If so, then why in the world does
it slowly seem to be getting older tokens?  Is this the problem?  Is
auto-learning using the wrong date/time when adding tokens?
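One way to look at the atime question is to turn the numbers into dates. The sketch below assumes that the atime fields in the dump above are ordinary Unix timestamps (seconds since 1970-01-01 UTC). The values are copied from the dump, and the names DUMP_ATIMES and to_utc are only for illustration.

```python
from datetime import datetime, timezone

# atime-related values copied from the "sa-learn --dump magic" output above.
# Assumption: each one is a Unix timestamp (seconds since 1970-01-01 UTC).
DUMP_ATIMES = {
    "oldest atime": 927744915,
    "newest atime": 1117360742,
    "last journal sync atime": 1085667739,
    "last expiry atime": 1085652585,
}


def to_utc(epoch_seconds: int) -> datetime:
    """Interpret a Unix timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


for label, value in DUMP_ATIMES.items():
    print(f"{label:>24}: {value:>10} -> {to_utc(value):%Y-%m-%d %H:%M:%S} UTC")
```

If that assumption holds, the oldest atime falls in May 1999 and the newest atime in May 2005, about a year after this message was written, while the last journal sync and last expiry both land on 27 May 2004.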

Anybody else experience this problem before?

Thanks,
Kris

-----Original Message-----
From: Kristopher Austin 
Sent: Friday, May 21, 2004 4:01 PM
To: [EMAIL PROTECTED]
Subject: RE: Bayes DB possible problem

This is the output from sa-learn -D --force-expire.  It seems that
Theo's guess is correct according to the error toward the end.  I guess
the next question is what harm is there in leaving this until 3.0?  I do
not have a set of spam to feed the Bayes system anymore.  I'm not quite
sure how inaccurate SA will be if I start fresh.  Any suggestions?

Thanks for the help. I am running 2.63.

debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/bin/X11', which doesn't exist, dropping.
debug: Final PATH set to: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/spamassassin" for site rules dir
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 27381 tie-ing to DB file R/O /etc/spamassassin/bayes_toks
debug: bayes: 27381 tie-ing to DB file R/O /etc/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 2 chosen.
debug: Initialising learner
debug: Initialising learner
debug: Syncing Bayes journal and expiring old tokens...
debug: lock: 27381 created /etc/spamassassin/bayes.lock.gateway2.oc.edu.27381
debug: lock: 27381 trying to get lock on /etc/spamassassin/bayes with 0 retries
debug: lock: 27381 link to /etc/spamassassin/bayes.lock: link ok
debug: bayes: 27381 tie-ing to DB file R/W /etc/spamassassin/bayes_toks
debug: bayes: 27381 tie-ing to DB file R/W /etc/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
..
debug: bayes: expiry check keep size, 75% of max: 225000
debug: bayes: token count: 2331423, final goal reduction size: 2106423
debug: bayes: First pass?  Current: 1085172306, Last: 1085163864, atime: 172800, count: 40459, newdelta: 3319, ratio: 52.0631503497368
debug: bayes: Can't use estimation method for expiry, something fishy, calculating optimal atime delta (first pass)
debug: bayes: atime     token reduction
debug: bayes: ========  ===============
debug: bayes: 43200     2330836
debug: bayes: 86400     2330836
debug: bayes: 172800    2330836
debug: bayes: 345600    2330836
debug: bayes: 691200    2330836
debug: bayes: 1382400   2330836
debug: bayes: 2764800   2330836
debug: bayes: 5529600   2330836
debug: bayes: 11059200  2330836
debug: bayes: 22118400  2330836
debug: bayes: couldn't find a good delta atime, need more token difference, skipping expire.
debug: Syncing complete.
debug: bayes: 27381 untie-ing
debug: bayes: 27381 untie-ing db_toks
debug: bayes: 27381 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 27381 unlink /etc/spamassassin/bayes.lock
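The numbers in this debug output are internally consistent, and a little arithmetic shows why expiry gives up. The sketch below recomputes the logged values. The formulas are inferred from the numbers themselves (they reproduce what the log prints) rather than taken from the SpamAssassin source, so treat them as an assumption; the constant names are hypothetical.

```python
# Numbers copied from the "sa-learn -D --force-expire" debug output above.
KEEP_SIZE = 225_000          # "expiry check keep size, 75% of max"
TOKEN_COUNT = 2_331_423      # "token count"
LAST_ATIME_DELTA = 172_800   # "atime" on the "First pass?" line (seconds)
LAST_REDUCTION = 40_459      # "count" on the "First pass?" line
TABLE_REDUCTION = 2_330_836  # identical for every delta in the atime table
SMALLEST_DELTA = 43_200      # first row of the atime table (12 hours)

# Relationships inferred from the logged numbers, not from SpamAssassin's code.
goal = TOKEN_COUNT - KEEP_SIZE            # "final goal reduction size"
ratio = goal / LAST_REDUCTION             # "ratio"
newdelta = int(LAST_ATIME_DELTA / ratio)  # "newdelta", in seconds
implied_max = KEEP_SIZE / 0.75            # maximum size implied by "75% of max"

print(f"goal reduction : {goal}")
print(f"ratio          : {ratio}")
print(f"newdelta       : {newdelta} s (smallest table delta: {SMALLEST_DELTA} s)")
print(f"implied max    : {implied_max:.0f} tokens")
print(f"tokens left after any table row : {TOKEN_COUNT - TABLE_REDUCTION}")
print(f"overshoot of goal by any row    : {TABLE_REDUCTION - goal}")
```

Under those assumptions, the estimated delta (3319 seconds, under an hour) is far below the 12-hour minimum in the table, and every row of the table would remove 2330836 tokens, leaving only 587, well under the 225000-token keep size. That fits the "something fishy" and "couldn't find a good delta atime" messages. The keep size also implies a configured maximum of 300000 tokens, presumably from bayes_expiry_max_db_size.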

-----Original Message-----
From: Matt Kettler [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 21, 2004 2:43 PM
To: Kristopher Austin; [EMAIL PROTECTED]
Subject: Re: Bayes DB possible problem

At 02:35 PM 5/21/2004, Kristopher Austin wrote:
>I was looking at my Bayes DB files and noticed that they seem very
>large.  Is this a problem?

In your case, yes.


>54K May 21 13:28 bayes_journal
>82M May 21 13:28 bayes_seen
>80M May 21 13:28 bayes_toks
>
>I went ahead and did a sa-learn --dump magic and this is the output:
>
>0.000          0          2          0  non-token data: bayes db version
>0.000          0      70627          0  non-token data: nspam
>0.000          0      29182          0  non-token data: nham
>0.000          0    2041152          0  non-token data: ntokens
>0.000          0  956386256          0  non-token data: oldest atime
>0.000          0 2093049063          0  non-token data: newest atime
>0.000          0 1085163866          0  non-token data: last journal

<snip>

>Does it seem unusual to have 2 million tokens in the database?


Yes. It also seems strange for the "newest atime" to be so high relative to the oldest and last journal times.
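Matt's observation can be turned into a quick sanity check: how far does the newest atime lie beyond the last journal sync? The sketch assumes both fields are Unix timestamps in seconds and includes both dumps from this thread. The one-day tolerance, the labels, and the function name days_ahead are hypothetical choices for illustration, not anything SpamAssassin itself uses.

```python
# Compare each dump's "newest atime" with its "last journal sync atime".
# Assumption: both are Unix timestamps in seconds. The one-day tolerance is an
# arbitrary, hypothetical threshold chosen for this illustration.
SECONDS_PER_DAY = 86_400
TOLERANCE_DAYS = 1

DUMPS = {
    "original post": {"newest": 2093049063, "last_journal": 1085163866},
    "after rebuild": {"newest": 1117360742, "last_journal": 1085667739},
}


def days_ahead(newest: int, last_journal: int) -> float:
    """Days by which the newest token atime lies beyond the last journal sync."""
    return (newest - last_journal) / SECONDS_PER_DAY


for name, values in DUMPS.items():
    ahead = days_ahead(values["newest"], values["last_journal"])
    flag = "suspicious" if ahead > TOLERANCE_DAYS else "ok"
    print(f"{name}: newest atime is {ahead:.1f} days past last journal sync ({flag})")
```

With those values, the newest atime in the original post sits roughly 32 years past the last journal sync, and even after the rebuild it is about a year ahead.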

What version of SA are you on? I've had problems with strange atimes on SA 2.5x, but I've been free of them ever since I upgraded to 2.63.

Try doing an expire with debug output on:

sa-learn -D --force-expire

Maybe the debug output can offer some clues.