Linda Walsh wrote:
> Matt Kettler wrote:
>>> I see 3 DBs in my user directory (.spamassassin):
>>> auto-whitelist (~80MB), bayes_seen (~40MB), bayes_toks (~20MB).
>>> Was trying to find the relation of 'bayes_expiry_max_db_size' to the
>>> physical size of the above files.
> ---
>
>> Expiry will only affect bayes_toks. Currently neither auto-whitelist nor
>> bayes_seen has any expiry mechanism at all.
> ---
> So they just grow without limit?

Yep. Not ideal, and there are bugs open on both.

> How often are they loaded?

IIRC, at the creation of a Mail::SpamAssassin instance, but I'm not well
versed in that aspect of the code.

> Does only "spamd" access the auto-whitelist?

Well, any Mail::SpamAssassin instance (spamd, the "spamassassin" script,
etc.). spamc, on the other hand, is not a Mail::SpamAssassin instance, and
doesn't access *any* of the SA config files or databases.
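
Since auto-whitelist and bayes_seen have no expiry, it can be useful to
record how large the files are over time. A minimal Python sketch, assuming
the default per-user directory ~/.spamassassin and the three file names
mentioned above; the helper name db_sizes is made up for illustration and is
not part of SpamAssassin:

```python
import os

# File names as listed earlier in this thread; adjust if your setup differs.
DB_FILES = ("auto-whitelist", "bayes_seen", "bayes_toks")


def db_sizes(directory):
    """Return {file name: size in bytes} for each DB file that exists."""
    sizes = {}
    for name in DB_FILES:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


if __name__ == "__main__":
    # Assumption: the default per-user location.
    user_dir = os.path.expanduser("~/.spamassassin")
    for name, size in db_sizes(user_dir).items():
        print(f"{name}: {size / (1024 * 1024):.1f} MB")
```

Files that are missing are simply left out of the result, so the same
helper can be pointed at a site-wide directory as well.
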
>
> Optimally, I would assume spamd opens it upon start, but it needs to
> update the disk file periodically (sync the db) for reliability. How
> often does it 'sync'?

In the case of the whitelist, it's per-message. In the case of
bayes_seen, every time a message is learned.

>> bayes_seen can safely be deleted if you need to. It keeps track of what
>> messages have already been learned to prevent relearning them. However,
>> unless you're likely to re-feed messages to SA, bayes_seen isn't strictly
>> necessary.
> ---
> The only refeeding would usually be 'ham', because I might rerun over
> an "Inbox" that might have old messages in it. I don't rerun "ham"
> training often -- except to "despam" a message (one that was marked spam
> and shouldn't have been).
>
>>> I'm finding some answers, but I've run into some seeming
>>> "contradictions". ...
>>> ---
>>> First prob (contradiction): dbg above says "token count: 0". (This is
>>> with a combined bayes db size of 60MB (_seen, _toks).)
>> Are you sure your sa-learn was using the same DB path?
> ---
> Sure?? It listed the same filename (default location
> /home/<user>/.spamassassin/<bayes...>). Other than that, I haven't
> tried to trace perl running spamassassin to see if it is really
> accessing the same file. Only going off the 'debug' messages (which
> correspond to the settings in "user_prefs" that's in the default
> location dir).
>
>> From the sounds of it, sa-learn is using a directory with an empty DB.
> ----
> Yeah... Doesn't make sense to me -- how would "sa-learn --dump magic"
> use a different location? I.e. it showed ~500K tokens...
>
>>> I.e. doesn't 'ntokens' = 491743 mean slightly under 500K tokens?
>> Yep, looks like you have 491,743 tokens to me.
>
>>> It's like the sa-learn magic shows a 'db' corresponding to my old limit
>>> (that I think is still being 'auto-expired', so might not have pruned
>>> figure as it runs about once per 24 hours, if I understand normal spamd
>>> workings).
>> Approximately. Also, be aware that in order for spamd to use new
>> settings it needs to be restarted.
> ----
> Having changed the user_prefs file back to the default
> setting (i.e. deleted my previous addition) 2 days ago, and the system
> having been rebooted 1 day 14 hours ago, I'm certain spamd has been
> restarted.

Hmm, can you set bayes_expiry_max_db_size in a user_prefs file? That seems
like an option that might be privileged and only honored at the site-wide
level. An absurdly large value can bog the whole server down when
processing mail, so an end user could DoS your machine if allowed to set
this.

> YET: all db sizes are the same as before (no reduction in size
> corresponding to going 'back' to a default 150K limit), though sa-learn
> run with dbg and force-expire indicated 0 tokens -- but sa-learn
> w/dump magic indicates 500K tokens. How can "expire" say 0 toks but
> dump-magic say 500K?

That's a big mystery to me. Doesn't make sense.

> File timestamps show all 3 db files have been updated today
> (presumably by spamd processing email as it comes in). But file sizes
> are still at the sizes indicated at the top of this message: 80/40/20 MB.
>
>>> So is the --magic output maybe what is seen and being
>>> 'size-controlled' by auto-expire?
>> Yes, at least, it should be.
>
>>> Why isn't 'sa-learn --force-expire' seeing the TOKENs indicated in
>>> sa-learn --dump magic?
>> That is particularly strange to me, and it sounds like there are some
>> problems there.
> ---
> *sigh*
>
>> Can you give a bit of detail, ie: what paths are you looking at for the
>> files, what version of SA,
> ---
> SA = old version of 3.1.7.
> Which at the very least points to an upgrade possibly solving the
> problem, BUT this was working at one point, and I don't know why it
> 'stopped'.
> I'm generally uncomfortable with fixing things that were working just
> because they have randomly stopped working without knowing *why*
> (though that discomfort has become something I've just had to deal
> with more as the Microsoft SW maintenance method becomes the norm:
> update and see if bug is gone... yes? ok, bug gone; unclear if fixed or
> hidden, unclear about effects of other changes in a new version...).

Understood. That said, 3.1.7 is vulnerable to CVE-2007-0451 and
CVE-2007-2873. You should seriously consider upgrading for the first one.

http://wiki.apache.org/spamassassin/Security
<http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2007-2873>

>>> Am I misinterpreting the debug output?
>> No, you don't seem to be.
> ---
> Thanks for the confirmation of my 'reality'. Really, the most logical
> and time-efficient way to proceed is likely to upgrade to a newer
> version at some point soon (and ignore my discontent regarding 'not
> knowing' why or what caused the break).
>
> *sigh*
> Linda
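
One way to narrow down the "0 tokens vs. 500K tokens" mismatch is to read
ntokens from "sa-learn --dump magic" under the same user and environment
that the expiry run uses, and compare it with the limit. A rough Python
sketch, assuming the usual dump layout in which the value is the third
whitespace-separated column of the line ending in "non-token data: ntokens";
the helper names and the DEFAULT_MAX_DB_SIZE constant are made up for
illustration, with 150000 being the default limit quoted in this thread:

```python
import subprocess

DEFAULT_MAX_DB_SIZE = 150_000  # default limit quoted in this thread


def ntokens_from_dump(dump_text):
    """Extract the ntokens value from `sa-learn --dump magic` output.

    Assumes the value is the third whitespace-separated column of the
    line ending in "non-token data: ntokens". Returns None if absent.
    """
    for line in dump_text.splitlines():
        if line.rstrip().endswith("non-token data: ntokens"):
            fields = line.split()
            return int(fields[2])
    return None


def over_limit(ntokens, max_db_size=DEFAULT_MAX_DB_SIZE):
    """True if the token count exceeds the configured expiry limit."""
    return ntokens > max_db_size


def read_ntokens():
    """Run sa-learn as the current user and return its ntokens value."""
    result = subprocess.run(
        ["sa-learn", "--dump", "magic"],
        capture_output=True, text=True, check=True,
    )
    return ntokens_from_dump(result.stdout)
```

Calling read_ntokens() once as the mail user and once as whatever user
runs the expiry would show whether both are looking at the same database;
a differing count would point at two different DB paths.
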