bayes expiry not happening when it should

2015-08-05 Thread Ian Zimmerman
~$ grep '^bayes_expiry_max_db_size' ~/.spamassassin/user_prefs | awk '{print $2}'
200
~$ sa-learn --force-expire
bayes: synced databases from journal in 0 seconds: 2784 unique entries (2805 total entries)
~$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  24501  0  non-token data: nspam
0.000  0  23548  0  non-token data: nham
0.000  02009202  0  non-token data: ntokens
0.000  0 100071  0  non-token data: oldest atime
0.000  0 1438755640  0  non-token data: newest atime
0.000  0 1438755988  0  non-token data: last journal sync atime
0.000  0 1438756034  0  non-token data: last expiry atime
0.000  0   11059200  0  non-token data: last expire atime delta
0.000  0  20174  0  non-token data: last expire reduction count

??wth???  I thought I _finally_ understood this stuff :-(

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: bayes expiry not happening when it should

2015-08-05 Thread RW
On Tue, 4 Aug 2015 23:36:51 -0700
Ian Zimmerman wrote:

> ~$ grep '^bayes_expiry_max_db_size' ~/.spamassassin/user_prefs | awk '{print $2}'
> 200
> ~$ sa-learn --force-expire
>
> 0.000  02009202  0  non-token data: ntokens
>
> ??wth???  I thought I _finally_ understood this stuff :-(


The number of tokens is within 0.5% of the configured value. It's
designed to produce a value between 75% and roughly 150%.
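
For what it's worth, the 75% figure can be worked out from the rule in the sa-learn documentation (quoted later in this thread): expiry aims to keep the larger of 75% of bayes_expiry_max_db_size or 100,000 tokens. A minimal Python sketch of that arithmetic follows. The function names and the sample numbers are made up for illustration; this is not SpamAssassin's own code.

```python
def tokens_to_keep(max_db_size, floor=100_000):
    """Target size after expiry: 75% of the configured maximum,
    but never fewer than `floor` tokens (rule as quoted from the docs)."""
    return max(floor, max_db_size * 3 // 4)


def goal_reduction(ntokens, max_db_size):
    """How many tokens expiry would like to remove (never negative)."""
    return max(0, ntokens - tokens_to_keep(max_db_size))


# Hypothetical numbers, not taken from the dump in this thread.
default_target = tokens_to_keep(150_000)         # 75% of the maximum wins
small_db_target = tokens_to_keep(100_000)        # the 100,000 floor wins
wanted_reduction = goal_reduction(200_000, 150_000)
```

With the default maximum of 150000 the target is 112500 tokens; a maximum below roughly 133,000 falls back to the 100,000 floor.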


Re: bayes expiry not happening when it should

2015-08-05 Thread Ian Zimmerman
On 2015-08-05 12:58 +0100, RW wrote:

> The number of tokens is within 0.5% of the configured value. It's
> designed to produce a value between 75% and roughly 150%.

I can't quite parse that answer, so let's be more specific.

Doc says:

  bayes_expiry_max_db_size  (default: 150000)

What should be the maximum size of the Bayes tokens database?  When
expiry occurs, the Bayes system will keep either 75% of the maximum
value, or 100,000 tokens, whichever has a larger value.

From this (and the more elaborate description in the EXPIRATION section,
which I've also read) I thought it worked roughly like this:

if (ntokens < bayes_expiry_max_db_size)
    do_nothing()
else
    goal_ntokens = max(100000, 0.75 * bayes_expiry_max_db_size)
    while (ntokens > goal_ntokens)
        kill_oldest_tokens()

If I misunderstood, how/where?  Sorry for my density :-(




Re: bayes expiry not happening when it should

2015-08-05 Thread RW
On Wed, 5 Aug 2015 07:47:20 -0700
Ian Zimmerman wrote:

> On 2015-08-05 12:58 +0100, RW wrote:
>
> > The number of tokens is within 0.5% of the configured value. It's
> > designed to produce a value between 75% and roughly 150%.
>
> I can't quite parse that answer, so let's be more specific.
>
> Doc says:
>
>   bayes_expiry_max_db_size  (default: 150000)
>
> What should be the maximum size of the Bayes tokens database?
> When expiry occurs, the Bayes system will keep either 75% of the
> maximum value, or 100,000 tokens, whichever has a larger value.
>
> From this (and the more elaborate description in the EXPIRATION
> section, which I've also read) I thought it worked roughly like this:
>
> if (ntokens < bayes_expiry_max_db_size)
>     do_nothing()
 

That bit is only for auto-expiry


> goal_ntokens = max(100000, 0.75 * bayes_expiry_max_db_size)
> while (ntokens > goal_ntokens)
>     kill_oldest_tokens()


What it actually does is estimate a cut-off time and then delete all
tokens older than that. How it gets the cut-off time is described in the
next two sections: EXPIRE LOGIC and ESTIMATION PASS LOGIC.
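
A rough Python sketch of that idea, based on my reading of the ESTIMATION PASS LOGIC section of the sa-learn documentation: candidate cut-offs are 12 hours times powers of two, measured back from the newest token atime, and the pass picks the one that expires the most tokens without going above the goal. The function names, the max_exponent parameter and the sample data are hypothetical, and the sanity checks and abort conditions described in EXPIRE LOGIC are left out, so treat it as an illustration rather than what SpamAssassin actually runs.

```python
HALF_DAY = 12 * 60 * 60  # 43200 seconds, the smallest candidate delta


def count_expired(atimes, newest, delta):
    """Tokens last accessed more than `delta` seconds before `newest`."""
    return sum(1 for t in atimes if newest - t > delta)


def estimate_cutoff_delta(atimes, goal_reduction, max_exponent=9):
    """Pick the delta that expires the most tokens without exceeding the goal.

    Returns (delta_seconds, tokens_expired), or None if even the largest
    candidate delta would expire more than `goal_reduction` tokens.
    max_exponent=9 (12h * 512) is an assumption, not a verified constant.
    """
    newest = max(atimes)
    best = None
    # Walk from the largest delta down; the expired count can only grow.
    for exponent in range(max_exponent, -1, -1):
        delta = HALF_DAY * 2 ** exponent
        expired = count_expired(atimes, newest, delta)
        if expired > goal_reduction:
            break
        best = (delta, expired)
    return best


def expire_older_than(atimes, delta):
    """Drop every token older than the cut-off in one go."""
    newest = max(atimes)
    return [t for t in atimes if newest - t <= delta]


# Hypothetical token access times: 10 fresh, 5 a day old,
# 4 three days old, 3 ten days old.
DAY = 86400
now = 1_438_755_640
atimes = [now] * 10 + [now - DAY] * 5 + [now - 3 * DAY] * 4 + [now - 10 * DAY] * 3
choice = estimate_cutoff_delta(atimes, goal_reduction=8)
remaining = expire_older_than(atimes, choice[0])
```

Because everything older than the chosen cut-off goes at once, the number of tokens removed only lands near the goal, not on it. The "last expire atime delta" of 11059200 in the dump at the top of the thread is 43200 * 256, i.e. 128 days, which fits the power-of-two pattern.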


Re: bayes expiry not happening when it should

2015-08-05 Thread Ian Zimmerman
On 2015-08-05 19:34 +0100, RW wrote:

> What it actually does is estimate a cut-off time and then delete all
> tokens older than that. How it gets the cut-off time is described in the
> next two sections: EXPIRE LOGIC and ESTIMATION PASS LOGIC.

OMG.  For one thing, are the clauses in the definition of "weird"
conjunctive or disjunctive?

A more insolent question, why this complexity?  Why can't I force an
expire when I feel like it? :-P  Or can I?
