Package: spamassassin
Version: 3.3.1-1
Severity: normal

Hello,

I am having trouble with Bayes token expiration.

I use a per-user MySQL database to store Bayes tokens.
I set `bayes_auto_expire 0` and `bayes_expiry_max_db_size 150000`.
I run a cron job to force Bayes expiry for each user, like this:
----8<----
sa-learn --username=exam...@example.com --force-expire
---->8----
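
A minimal sketch of how such a per-user cron job could be structured. The user list file, the `USER_LIST` and `DRY_RUN` variables, and the `expire_user` helper are assumptions made for illustration; only the `sa-learn --username=... --force-expire` invocation comes from this report. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical list of mailbox names, one per line. The report does not say
# where the real list of users comes from.
USER_LIST="${USER_LIST:-/etc/spamassassin/bayes-users.txt}"

# Print the expiry command for one user, or run it when DRY_RUN=0.
expire_user() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'sa-learn --username=%s --force-expire\n' "$1"
    else
        sa-learn --username="$1" --force-expire
    fi
}

# Walk the list if it is readable, skipping blank lines.
if [ -r "$USER_LIST" ]; then
    while IFS= read -r user; do
        if [ -n "$user" ]; then
            expire_user "$user"
        fi
    done < "$USER_LIST"
fi
```

Running it once with the default dry-run setting shows which commands would be issued before anything touches the database.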

The problem is that Bayes tokens do not seem to expire, and the
`bayes_token` table recently exceeded 80 million records. For example,
one user has 288278 tokens:
----8<----
$ sa-learn --username=us...@example.com --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        261          0  non-token data: nspam
0.000          0       6339          0  non-token data: nham
0.000          0     288278          0  non-token data: ntokens
0.000          0 1277480932          0  non-token data: oldest atime
0.000          0 1381762106          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1381754818          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
---->8----
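
To put the dump above in perspective, here is a small sketch that only does arithmetic on the values shown: how far the token count is above the configured maximum, and how many days separate the oldest and newest atime. The variable names are mine; the `date -d @EPOCH` form is a GNU coreutils extension, so it is guarded.

```shell
#!/bin/sh
# Values copied from the `--dump magic` output in this report.
ntokens=288278
oldest_atime=1277480932
newest_atime=1381762106
max_db_size=150000   # bayes_expiry_max_db_size from this report

# Whole days between the oldest and newest token access times.
span_days=$(( (newest_atime - oldest_atime) / 86400 ))
# Tokens stored beyond the configured maximum.
excess=$(( ntokens - max_db_size ))

echo "atime span: ${span_days} days"
echo "tokens above configured maximum: ${excess}"

# Human-readable dates (GNU date only; falls back to the raw epoch).
for t in "$oldest_atime" "$newest_atime"; do
    date -u -d "@$t" '+%Y-%m-%d %H:%M:%S UTC' 2>/dev/null \
        || echo "$t (GNU date not available)"
done
```

The atime range covers a bit more than three years, from mid-2010 to October 2013, which suggests that old tokens have never been expired for this user.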

So I ran an expire manually for this user and got these debug logs:
----8<----
$ sa-learn --username=us...@example.com --force-expire -D
dbg: bayes: bayes journal sync starting
dbg: bayes: bayes journal sync completed
dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x22da088) implements 'learner_expire_old_training', priority 0
bayes: expiry starting
dbg: bayes: expiry check keep size, 0.75 * max: 112500
dbg: bayes: token count: 288103, final goal reduction size: 175603
dbg: bayes: first pass? current: 1381754809, Last: 1381716636, atime: 0, count: 0, newdelta: 0, ratio: 0, period: 43200
dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
dbg: bayes: expiry max exponent: 9
dbg: bayes: atime token reduction
dbg: bayes: ======== ===============
dbg: bayes: 43200 287217
dbg: bayes: 86400 287217
dbg: bayes: 172800 287189
dbg: bayes: 345600 285030
dbg: bayes: 691200 281466
dbg: bayes: 1382400 276177
dbg: bayes: 2764800 266007
dbg: bayes: 5529600 256971
dbg: bayes: 11059200 237786
dbg: bayes: 22118400 201142
dbg: bayes: couldn't find a good delta atime, need more token difference, skipping expire
---->8----
It seems that sa-learn tries to reduce the number of tokens but ends up
skipping the expire process because it "couldn't find a good delta atime".
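
The numbers in the debug log can be cross-checked with a short sketch. It only restates values printed in the log (keep size, goal reduction, and the atime/reduction table) and does not attempt to reproduce sa-learn's internal selection logic. Variable names are mine.

```shell
#!/bin/sh
# Values taken from the report and the debug log.
max_db_size=150000
token_count=288103

# "expiry check keep size, 0.75 * max" and "final goal reduction size".
keep_size=$(( max_db_size * 3 / 4 ))
goal_reduction=$(( token_count - keep_size ))
echo "keep size: $keep_size, goal reduction: $goal_reduction"

within_goal=0
min_reduction=$token_count
# Each row is: atime delta in seconds, tokens that delta would remove.
while read -r delta reduction; do
    remaining=$(( token_count - reduction ))
    if [ "$reduction" -le "$goal_reduction" ]; then
        within_goal=$(( within_goal + 1 ))
    fi
    if [ "$reduction" -lt "$min_reduction" ]; then
        min_reduction=$reduction
    fi
    printf '%9s s (%5s h): removes %6s, leaves %6s\n' \
        "$delta" "$(( delta / 3600 ))" "$reduction" "$remaining"
done <<'EOF'
43200 287217
86400 287217
172800 287189
345600 285030
691200 281466
1382400 276177
2764800 266007
5529600 256971
11059200 237786
22118400 201142
EOF

echo "candidates at or below the goal: $within_goal"
echo "smallest reduction on offer: $min_reduction"
```

Every candidate delta in the table would remove more tokens than the goal of 175603. Even the largest one (22118400 seconds, 256 days) removes 201142 and would leave 86961 tokens, below the 112500 keep size.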

I tried both reducing and increasing `bayes_expiry_max_db_size`, but that
does not fix the issue, and the number of tokens keeps growing beyond the
150000 limit.

Is this a bug, or have I misunderstood something?
Should I manually purge old tokens using something like `DELETE FROM
bayes_token WHERE bayes_token.atime <= ...`?

Thanks in advance for your help.

Best regards,

Thomas Pierson


-- System Information:
Debian Release: 6.0.7
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable'), (100, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/dash

Versions of packages spamassassin depends on:
pn  libarchive-tar-perl    <none>            (no description available)
ii  libdigest-sha1-perl    2.13-1            NIST SHA-1 message digest algorith
ii  libhtml-parser-perl    3.66-1            collection of modules that parse H
ii  libnet-dns-perl        0.66-2            Perform DNS queries from a Perl sc
ii  libnetaddr-ip-perl     4.028+dfsg-1      IP address manipulation module
ii  libsocket6-perl        0.23-1            Perl extensions for IPv6
ii  libsys-hostname-long-p 1.4-2             Figure out the long (fully-qualifi
ii  libwww-perl            5.836-1           Perl HTTP/WWW client/server librar
ii  perl                   5.10.1-17squeeze6 Larry Wall's Practical Extraction
ii  perl-modules [libio-zl 5.10.1-17squeeze6 Core Perl modules

Versions of packages spamassassin recommends:
ii  gcc                    4:4.4.5-1         The GNU C compiler
ii  gnupg                  1.4.10-4+squeeze3 GNU privacy guard - a free PGP rep
ii  libc6-dev              2.11.3-4          Embedded GNU C Library: Developmen
ii  libio-socket-inet6-per 2.65-1.1          Object interface for AF_INET6 doma
ii  libmail-spf-perl       2.007-1           Perl implementation of Sender Poli
ii  make                   3.81-8            An utility for Directing compilati
ii  perl [libsys-syslog-pe 5.10.1-17squeeze6 Larry Wall's Practical Extraction
ii  re2c                   0.13.5-1          tool for generating fast C-based r
ii  spamc                  3.3.1-1           Client for SpamAssassin spam filte

Versions of packages spamassassin suggests:
ii  libdbi-perl            1.612-1           Perl Database Interface (DBI)
pn  libio-socket-ssl-perl  <none>            (no description available)
ii  libmail-dkim-perl      0.38-1            cryptographically identify the sen
pn  libnet-ident-perl      <none>            (no description available)
ii  perl [libcompress-zlib 5.10.1-17squeeze6 Larry Wall's Practical Extraction
pn  pyzor                  <none>            (no description available)
pn  razor                  <none>            (no description available)

