Re: user-db size, content confusions (how many toks?)

Linda Walsh Tue, 31 Mar 2009 19:58:07 -0700

Matt Kettler wrote:

I see 3 DB's in my user directory (.spamassassin).
   auto-whitelist (~80MB),      bayes_seen (~40MB),     bayes_toks (~20MB)
Was trying to find relation of 'bayes_expiry_max_db_size' to the physical
size of the above files.
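
For anyone wanting to reproduce the size figures above, here is a minimal sketch in Python. The helper name `db_sizes` is made up for illustration, and the directory is assumed to be the default `~/.spamassassin`; adjust it if your setup differs.

```python
import os

def db_sizes(directory):
    """Return {filename: size_in_bytes} for the regular files in `directory`."""
    sizes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip subdirectories
            sizes[name] = os.path.getsize(path)
    return sizes

if __name__ == "__main__":
    user_dir = os.path.expanduser("~/.spamassassin")  # assumed default location
    if os.path.isdir(user_dir):
        for name, size in db_sizes(user_dir).items():
            print(f"{name:20s} {size / 2**20:8.1f} MB")
```

Note that this only reports on-disk size, which is not the same thing as the token count that the expiry setting is concerned with.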

---

expiry will only affect bayes_toks. Currently neither auto-whitelist nor
bayes_seen have any expiry mechanism at all.

---
So they just grow without limit?  How often are they loaded?
Does only "spamd" access the auto-whitelist?

Optimally, I would assume spamd opens it upon start, but it needs to update
the disk file periodically (sync the db) for reliability.  How often does
it 'sync'?

bayes_seen can safely be deleted if you need to. It keeps track of what
messages have already been learned to prevent relearning them. However,
unless you're likely to re-feed messages to SA, bayes_seen isn't strictly
necessary.
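
To make the role of bayes_seen concrete, here is a toy model in Python. It is only a conceptual sketch, not SpamAssassin's actual implementation: the class `SeenLog`, its methods, and the idea of keying on a message id are all assumptions made for illustration.

```python
class SeenLog:
    """Toy stand-in for bayes_seen: remembers how each message was learned."""

    def __init__(self):
        self._seen = {}  # message id -> "spam" or "ham" (hypothetical keying)

    def should_learn(self, msg_id, label):
        """True if the message has not yet been learned under this label."""
        return self._seen.get(msg_id) != label

    def record(self, msg_id, label):
        """Remember that the message was learned under this label."""
        self._seen[msg_id] = label


log = SeenLog()
for msg_id, label in [("msg-1", "ham"), ("msg-1", "ham"), ("msg-1", "spam")]:
    if log.should_learn(msg_id, label):
        log.record(msg_id, label)
        print(f"learned {msg_id} as {label}")
    else:
        print(f"skipped {msg_id}, already learned as {label}")
```

In this model, deleting the log just means starting from an empty dictionary: nothing breaks, but every re-fed message looks new again.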

---
        The only refeeding would usually be 'ham', because I might rerun over
an "Inbox" that might have old messages in it.  I don't rerun "ham" training
often -- except to "despam" a message (one that was marked spam and shouldn't
have been).

I'm finding some answers, but I've run into some seeming "contradictions"....
---
First prob (contradiction): dbg above says "token count: 0".  (This is
with a combined bayes db size of 60MB (_seen, _toks).)

Are you sure your sa-learn was using the same DB path?

---
        Sure??  It listed the same filename (default location
/home/<user>/.spamassassin/<bayes...>).  Other than that, I haven't
tried to trace perl running spamassassin to see if it is really accessing
the same file.  I'm only going off the 'debug' messages (which correspond to
the settings in the "user_prefs" that's in the default location dir).

From the sounds of it, sa-learn is using a directory with an empty DB.

----
        Yeah...Doesn't make sense to me -- how would "sa-learn --dump magic"
use a different location?  I.e. it showed ~500K tokens...

I.e. doesn't 'ntokens' = 491743 mean slightly under 500K tokens?

Yep, looks like you have 491,743 tokens to me.
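
As a cross-check on reading that number, here is a small Python sketch that pulls the 'non-token data' values out of `sa-learn --dump magic` output. The column layout (value in the third column) is my assumption about the usual format, and the sample line is reconstructed from the ntokens figure in this thread rather than copied from real output. The function names are made up, and the 150000 default is the "150K limit" mentioned in this thread.

```python
# Sample line reconstructed for illustration; the spacing is an assumption.
SAMPLE = "0.000          0     491743          0  non-token data: ntokens\n"

DEFAULT_MAX_DB_SIZE = 150000  # the default 150K limit mentioned in this thread

def parse_magic(text):
    """Map each 'non-token data' label to its integer value."""
    marker = "non-token data:"
    values = {}
    for line in text.splitlines():
        if marker not in line:
            continue
        columns, label = line.split(marker, 1)
        fields = columns.split()
        if len(fields) < 3:
            continue
        try:
            values[label.strip()] = int(fields[2])  # assumed value column
        except ValueError:
            continue
    return values

def over_limit(ntokens, max_db_size=DEFAULT_MAX_DB_SIZE):
    """True if the token count exceeds the configured maximum."""
    return ntokens > max_db_size

magic = parse_magic(SAMPLE)
print(magic["ntokens"], over_limit(magic["ntokens"]))
```

To run it against a live database, feed it the captured text of `sa-learn --dump magic` in place of `SAMPLE`.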

It's like the sa-learn magic shows a 'db' corresponding to my old limit
(which I think is still being 'auto-expired', so the figure might not have
been pruned yet, as expiry runs about once per 24 hours, if I understand
normal spamd workings).

Approximately. Also, be aware that in order for spamd to use new
settings it needs to be restarted.

----
        Having changed the user_prefs file back to the default
setting (i.e. deleted my previous addition) 2 days ago, and the system having
been rebooted 1 day 14 hours ago, I'm certain spamd has been restarted.
YET: all db sizes are the same as before (no reduction in size
corresponding to going 'back' to the default 150K limit), though sa-learn
run with dbg and --force-expire indicated 0 tokens -- but sa-learn --dump magic
indicates 500K tokens.  How can "expire" say 0 toks but --dump magic say 500K?

        File timestamps show all 3 db files have been updated today
(presumably by spamd processing email as it comes in).  But file sizes
are still at the sizes indicated at the top of this message: 80/40/20 MB.

So is the --dump magic output maybe what is seen and being
'size-controlled' by auto-expire?

Yes, at least, it should be.

Why isn't 'sa-learn --force-expire' seeing the tokens indicated in
'sa-learn --dump magic'?

That is particularly strange to me, and it sounds like there are some
problems there.

---
*sigh*


Can you give a bit of detail, ie: what paths are you looking at for the
files, what version of SA,

---
        SA = old version, 3.1.7.
        Which at the very least points to an upgrade possibly solving the
problem, BUT this was working at one point, and I don't know why it 'stopped'.
I'm generally uncomfortable with fixing things that were working just because
they have randomly stopped working, without knowing *why* (though that
discomfort has become something I've just had to deal with more as the
Microsoft SW maintenance method becomes the norm: update and see if bug is
gone... yes?  ok, bug gone; unclear if fixed or hidden, unclear about effects
of other changes in a new version...).

Am I misinterpreting the debug output?

No, you don't seem to be.

---
        Thanks for the confirmation of my 'reality'.  Really, the most logical
and time-efficient way to proceed is likely to upgrade to newer version at some
point soon (and ignore my discontent regarding 'not knowing' why or what caused
the break).

*sigh*
Linda
