I see 3 DBs in my user directory (.spamassassin):
auto-whitelist (~80MB)
bayes_seen (~40MB)
bayes_toks (~20MB)
I was trying to find the relation of 'bayes_expiry_max_db_size' to the physical
size of the above files. I'm finding some answers, but I've run into some
seeming "contradictions". I had bayes_expiry_max_db_size set to 500,000, then
reduced it to 250,000 and then to the default (150,000) during testing.
To see how lowering the limit affected the physical sizes,
I ran 'sa-learn --force-expire' and saw these debug messages of note:
[30905] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[30905] dbg: bayes: token count: 0, final goal reduction size: -112500
[30905] dbg: bayes: reduction goal of -112500 is under 1,000 tokens, skipping expire
[30905] dbg: bayes: expiry completed
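For reference, the numbers in those debug lines are consistent with the arithmetic below. This is only a sketch inferred from the log output, not taken from the SpamAssassin source; the function name and its parameters are made up for illustration.

```python
def expiry_goal(max_db_size, token_count, keep_fraction=0.75, min_reduction=1000):
    """Reproduce the figures printed in the expiry debug lines.

    Assumption (inferred from the log, not from the SpamAssassin code):
    keep size = keep_fraction * max, goal = token count - keep size,
    and expiry is skipped when the goal is under min_reduction tokens.
    """
    keep_size = int(max_db_size * keep_fraction)
    goal = token_count - keep_size
    would_expire = goal >= min_reduction
    return keep_size, goal, would_expire


# Values from the debug run: default limit of 150,000 and a reported token count of 0.
print(expiry_goal(150000, 0))

# Same limit, but using the ntokens figure that --dump magic reports.
print(expiry_goal(150000, 491743))
```

With a token count of 0 the goal is negative, so the expire is skipped, which matches the log. With the 491743 figure the goal would be well over 1,000 tokens, so the token count the expiry run sees is what decides the outcome.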
---
First problem (contradiction): the debug output above says "token count: 0".
This is with a combined bayes DB size of 60MB (_seen + _toks).
It seems to think I have no bayes data. I saw another debug message indicating
that the bayes classifier was untrained (<~150? entries) and disabled.
I don't know how it got zeroed, but I tried adding ham by running sa-learn over
a despammed mailbox of mine. The first run showed:
Learned tokens from 55 message(s) (55 message(s) examined)
But subsequent runs of sa-learn with debug and expire still show "token count: 0".
sa-learn --dump magic shows something different:
0.000 0 3 0 non-token data: bayes db version
0.000 0 556414 0 non-token data: nspam
0.000 0 574441 0 non-token data: nham
0.000 0 491743 0 non-token data: ntokens
0.000 0 1216456288 0 non-token data: oldest atime
0.000 0 1237796146 0 non-token data: newest atime
0.000 0 1220476831 0 non-token data: last journal sync atime
0.000 0 1217838535 0 non-token data: last expiry atime
0.000 0 1382400 0 non-token data: last expire atime delta
0.000 0 70612 0 non-token data: last expire reduction count
---------
Does the above indicate 0 tokens? I.e., doesn't 'ntokens' = 491743 mean
slightly under 500K tokens (my original limit before I tried running
'sa-learn --force-expire' with debug manually)?
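To compare the two figures without eyeballing the dump, the "non-token data" lines can be parsed mechanically. A minimal sketch, assuming the value always sits in the third whitespace-separated column as it does in the output above; parse_dump_magic and SAMPLE are illustrative names, and the sample lines are copied from that output.

```python
SAMPLE = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 556414 0 non-token data: nspam
0.000 0 574441 0 non-token data: nham
0.000 0 491743 0 non-token data: ntokens
"""


def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value.

    Assumes the layout seen in the dump above: four numeric columns,
    then 'non-token data: <label>', with the value in the third column.
    """
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # not a magic line
        fields = head.split()
        if len(fields) < 3:
            continue  # unexpected layout, skip rather than guess
        values[label.strip()] = int(fields[2])
    return values


magic = parse_dump_magic(SAMPLE)
print(magic["ntokens"])
```

Feeding it the real output of 'sa-learn --dump magic' instead of SAMPLE gives a number to set against the "token count" in the expiry debug lines.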
It's as if 'sa-learn --dump magic' shows a DB corresponding to my old limit
(one that I think is still being auto-expired, so it might not have been
pruned to the new figure yet, since auto-expire runs about once per 24 hours,
if I understand normal spamd workings).
So is the --dump magic output perhaps what auto-expire sees and size-controls
(it was ~500K before the recent test changes)?
Why isn't 'sa-learn --force-expire' seeing the tokens indicated in
'sa-learn --dump magic'? The debug messages point at the same file
for both operations, so how can --dump magic indicate ~500K while the
debug output of 'sa-learn --force-expire' somehow sees 0 tokens?
Am I misinterpreting the debug output?
Thanks,
Linda