On 2011-08-01 16:50, monolit939 wrote:


Axb wrote:

On 2011-08-01 9:52, monolit939 wrote:


Axb wrote:

wrong!

http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt

see "bayes_path"

in your case:
bayes_path /var/mail/.spamassassin/bayes


Hello,

first of all, thank you for your advice. I added bayes_path
/var/mail/.spamassassin/bayes to local.cf and followed the steps you
recommended in your previous post, BUT I performed them as user root. I
think the conversion from Berkeley DB to SDBM was successful.
Unfortunately, SpamAssassin gives the same results with Berkeley DB and
SDBM.

I am not sure whether SpamAssassin really uses the SDBM database when
scanning mail. I performed the following as root:

1) stop spamd
2) sa-learn --backup > /tmp/bayes_export
3) add the following lines to local.cf
bayes_store_module           Mail::SpamAssassin::BayesStore::SDBM
bayes_path /var/mail/.spamassassin/bayes
4) sa-learn --restore /tmp/bayes_export

test change:
5) spamassassin -D --lint 2>&1 | grep -i bayes # I didn't notice any errors
Jul 31 19:53:39.813 [2485] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
Jul 31 19:53:39.887 [2485] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Jul 31 19:53:40.688 [2485] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0) implements 'learner_new', priority 0
Jul 31 19:53:40.688 [2485] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0), bayes_store_module=Mail::SpamAssassin::BayesStore::SDBM
Jul 31 19:53:40.702 [2485] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::SDBM=HASH(0xb167590)
Jul 31 19:53:40.702 [2485] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0xae6a2a0) implements 'learner_is_scan_available', priority 0
Jul 31 19:53:40.703 [2485] dbg: bayes: tie-ing to DB file R/O /var/mail/.spamassassin/bayes_toks
Jul 31 19:53:40.703 [2485] dbg: bayes: tie-ing to DB file R/O /var/mail/.spamassassin/bayes_seen
Jul 31 19:53:40.703 [2485] dbg: bayes: found bayes db version 3
Jul 31 19:53:40.703 [2485] dbg: bayes: DB journal sync: last sync: 0
Jul 31 19:53:40.729 [2485] dbg: bayes: DB journal sync: last sync: 0
Jul 31 19:53:40.730 [2485] dbg: bayes: corpus size: nspam = 311537, nham = 240966
Jul 31 19:53:40.734 [2485] dbg: bayes: score = 0.468256978075479
Jul 31 19:53:40.735 [2485] dbg: bayes: DB expiry: tokens in DB: 118976, Expiry max size: 150000, Oldest atime: 1255330288, Newest atime: 1266342672, Last expire: 0, Current time: 1312134820
Jul 31 19:53:40.735 [2485] dbg: bayes: DB journal sync: last sync: 0
Jul 31 19:53:40.745 [2485] dbg: bayes: untie-ing
Jul 31 19:53:41.074 [2485] dbg: rules: ran eval rule BAYES_50 ======> got hit (1)
Jul 31 19:53:41.135 [2485] dbg: check: tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
Jul 31 19:53:41.136 [2485] dbg: timing: total 1327 ms - init: 896 (67.5%), parse: 0.71 (0.1%), extract_message_metadata: 1.30 (0.1%), get_uri_detail_list: 1.11 (0.1%), tests_pri_-1000: 8 (0.6%), compile_gen: 151 (11.4%), compile_eval: 17 (1.3%), tests_pri_-950: 5 (0.3%), tests_pri_-900: 5 (0.4%), tests_pri_-400: 21 (1.6%), check_bayes: 16 (1.2%), tests_pri_0: 337 (25.4%), tests_pri_500: 51 (3.8%)
if you see no errors
6) restart spamd
7) ls -lh /var/mail/.spamassassin/*
-rw-r--r-- 1 mail root  12K 2010-02-16 19:39 /var/mail/.spamassassin/auto-whitelist
-rw-r--r-- 1 mail root    6 2010-02-16 19:39 /var/mail/.spamassassin/auto-whitelist.mutex
-rw-r--r-- 1 mail root 2.7K 2011-07-31 19:53 /var/mail/.spamassassin/bayes_journal
-rw-rw-r-- 1 mail root 3.8K 2011-07-31 19:50 /var/mail/.spamassassin/bayes.mutex
-rw-r--r-- 1 mail root  78M 2010-02-09 12:40 /var/mail/.spamassassin/bayes_seen
-rw----r-- 1 root root  16K 2011-07-31 19:51 /var/mail/.spamassassin/bayes_seen.dir
-rw----r-- 1 root root 128M 2011-07-31 19:51 /var/mail/.spamassassin/bayes_seen.pag
-rw-r--r-- 1 mail root 5.1M 2010-02-16 18:51 /var/mail/.spamassassin/bayes_toks
-rw----r-- 1 root root 4.0K 2011-07-31 19:51 /var/mail/.spamassassin/bayes_toks.dir
-rw----r-- 1 root root 4.0M 2011-07-31 19:51 /var/mail/.spamassassin/bayes_toks.pag
-rw-r--r-- 1 mail root 1.2K 2010-02-09 10:20 /var/mail/.spamassassin/user_prefs

file /var/mail/.spamassassin/*
/var/mail/.spamassassin/auto-whitelist:       Berkeley DB (Hash, version 8, native byte-order)
/var/mail/.spamassassin/auto-whitelist.mutex: ASCII text
/var/mail/.spamassassin/bayes_journal:        ASCII text
/var/mail/.spamassassin/bayes.mutex:          ASCII text
/var/mail/.spamassassin/bayes_seen:           Berkeley DB (Hash, version 8, native byte-order)
/var/mail/.spamassassin/bayes_seen.dir:       DOS executable (device driver) for DOS
/var/mail/.spamassassin/bayes_seen.pag:       data
/var/mail/.spamassassin/bayes_toks:           Berkeley DB (Hash, version 9, native byte-order)
/var/mail/.spamassassin/bayes_toks.dir:       DOS executable (device driver) for DOS
/var/mail/.spamassassin/bayes_toks.pag:       data
/var/mail/.spamassassin/mnt:                  setgid directory
/var/mail/.spamassassin/ol:                   setgid directory
/var/mail/.spamassassin/user_prefs:           ASCII English text


Finally I started this script:
#!/bin/bash

# Feed every stored message to spamd; the glob keeps the full path in $i
for i in /path/to/emails/*; do
         spamc -c -s 10000000 < "$i"
done

Results:
Scanning with Berkeley DB:
real    87m2.779s
user    0m16.881s
sys     0m33.826s

Scanning with SDBM:
real    86m32.543s
user    0m17.105s
sys     0m33.802s

As you can see, the results are almost the same. I suspect that
SpamAssassin was still using the Berkeley DB database during the second
test (with SDBM).

Is there any way to find out which kind of database SpamAssassin uses?

you're seeing it:
bayes_store_module=Mail::SpamAssassin::BayesStore::SDBM

move away the old files (you don't need these anymore)
bayes_toks
bayes_seen
bayes_journal

SDBM files are *.dir *.pag
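
The debug line quoted above already names the active backend. As a minimal sketch, assuming the `bayes_store_module=...` debug format shown earlier in this thread, a small helper (the name `bayes_backend` is made up for illustration) can pull the module name out of the lint output:

```shell
#!/bin/bash
# Print the Bayes storage module named in SpamAssassin debug output on stdin.
# Assumes the "bayes_store_module=<module>" debug format quoted in this thread.
bayes_backend() {
    sed -n 's/.*bayes_store_module=\([A-Za-z:]*\).*/\1/p' | head -n 1
}

# Typical use (needs spamassassin installed, so it is left commented out):
#   spamassassin -D --lint 2>&1 | bayes_backend
```

If nothing is printed, the debug output contained no such line, and the store module would have to be checked another way.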



Hello,

I am afraid that doesn't work either. Here is what I did:

1) removed the old files as you recommended (have a look):
/var/mail/.spamassassin# ls -la
-rw----r-- 1 root root     16384 2011-07-31 19:51 bayes_seen.dir
-rw----r-- 1 root root 134169600 2011-07-31 19:51 bayes_seen.pag
-rw----r-- 1 root root      4096 2011-07-31 19:51 bayes_toks.dir
-rw----r-- 1 root root   4194304 2011-07-31 19:51 bayes_toks.pag

2) stop spamassassin
3) start spamassassin
4) start the script
#!/bin/bash
# Feed every stored message to spamd; the glob keeps the full path in $i
for i in /path/to/emails/*; do
          spamc -c -s 10000000 < "$i"
done

The results:
real    84m55.472s
user    0m17.145s
sys     0m34.466s

Unfortunately, the results are the same as before. That probably means
that SpamAssassin still uses the same type of database (Berkeley DB).

Any idea what could be wrong?

nothing seems wrong.

I have no idea what you're trying to prove or measure.
Bayes on steroids?

if whatever user runs your spamd can read/write to bayes then you're set.
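
A rough way to test that condition is sketched below. It has to be run as the user spamd runs as; the `ls` listing earlier in this thread suggests `mail`, but that is an assumption, and the new `.dir`/`.pag` files in that listing are owned by root with mode `-rw----r--`. The helper name `check_rw` is made up for illustration:

```shell
#!/bin/bash
# Report whether the current user can both read and write each given file.
# Prints "ok <file>" or "NO <file>"; returns non-zero if any file fails.
check_rw() {
    local rc=0 f
    for f in "$@"; do
        if [ -r "$f" ] && [ -w "$f" ]; then
            echo "ok $f"
        else
            echo "NO $f"
            rc=1
        fi
    done
    return "$rc"
}

# Typical use, as the spamd user (paths taken from the listing in this thread):
#   check_rw /var/mail/.spamassassin/bayes_*.dir /var/mail/.spamassassin/bayes_*.pag
```

Run as root, the check passes for any existing file, so it only says something when run under the unprivileged account.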

sa-learn --dump magic
will show you what state your bayes DB is in.

if you need more help, start by checking
http://spamassassin.apache.org/full/3.3.x/doc/

maybe somebody else can chip in and figure out what you need.



