I was looking at my Bayes DB files and noticed that they seem very large. Is this a problem?
In your case, yes.
54K May 21 13:28 bayes_journal
82M May 21 13:28 bayes_seen
80M May 21 13:28 bayes_toks
I went ahead and did a sa-learn --dump magic and this is the output:
0.000          0          2          0  non-token data: bayes db version
0.000          0      70627          0  non-token data: nspam
0.000          0      29182          0  non-token data: nham
0.000          0    2041152          0  non-token data: ntokens
0.000          0  956386256          0  non-token data: oldest atime
0.000          0 2093049063          0  non-token data: newest atime
0.000          0 1085163866          0  non-token data: last journal
<snip>
Does it seem unusual to have 2 million tokens in the database?
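Yes. It also seems strange for the "newest atime" to be so high relative to the oldest atime and the last journal time. Read as seconds since the Unix epoch, 2093049063 falls in 2036, while the last journal value of 1085163866 is May 21, 2004, which matches the file dates in your listing.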
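If you want to check those atimes yourself, here is a minimal Python sketch. It assumes the atime and journal values are seconds since the Unix epoch; the numbers are copied from your dump, and the function names `to_utc` and `ahead_of_journal` are just made up for this example.

```python
from datetime import datetime, timezone

# Values copied from the "sa-learn --dump magic" output in this thread.
MAGIC_TIMES = {
    "oldest atime": 956386256,
    "newest atime": 2093049063,
    "last journal": 1085163866,
}


def to_utc(epoch_seconds):
    """Treat the value as Unix epoch seconds (an assumption) and return a UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def ahead_of_journal(times):
    """Return the names of the atimes that are later than the last journal time."""
    reference = times["last journal"]
    return [
        name
        for name, value in times.items()
        if name != "last journal" and value > reference
    ]


if __name__ == "__main__":
    for name, value in MAGIC_TIMES.items():
        print(f"{name:13} {value:>11} {to_utc(value):%Y-%m-%d %H:%M:%S} UTC")
    print("later than last journal:", ahead_of_journal(MAGIC_TIMES))
```

A token atime that is decades later than the last journal time is the kind of value I would be suspicious of.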
What version of SA are you on? I've had problems with strange atimes on SA 2.5x, but I've been free of them ever since I upgraded to 2.63.
Try doing an expire with debug output on:
sa-learn -D --force-expire
Maybe the debug output can offer some clues.
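The debug output can be long, so it may be easier to save it to a file and read it afterwards. A rough Python sketch, assuming the debug messages are written to stderr; the helper name `run_and_save_stderr` and the log file name `expire-debug.log` are only placeholders:

```python
import subprocess


def run_and_save_stderr(cmd, log_path):
    """Run cmd, write whatever it prints on stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stderr=log)
    return result.returncode


if __name__ == "__main__":
    # Same command as above; the log file name is a placeholder.
    code = run_and_save_stderr(
        ["sa-learn", "-D", "--force-expire"], "expire-debug.log"
    )
    print("sa-learn exited with", code)
```

Run it as the same user that owns the Bayes files so it works on the same database.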
