Jeff Mincy <[email protected]> writes:
>> I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
>> more time with 3.2.5 as it took with 3.0.4. Can this be true?
>>
>> It is not a problem, because it is done by cron-tab, but I am just
>> curious.
>
> You can use spamc -L spam/ham to learn messages. Spamc -L is faster
> than sa-learn. The spamd daemon needs to be started with
> --allow-tell.
That is not really an answer to my question. ;-)
And it does not seem to be of much use in my situation.
First my code has to grow from:
sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
to:
for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
    spamc -L ${typeStr} <${i}
done
And that is not even enough: I also need to handle the case where the
directory is empty, and I need to write code to show the feedback that
sa-learn prints.
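A rough sketch of what that extra code could look like, so the cost is
concrete. The names learn_dir and LEARN_CMD are made up for this example
and are not part of SpamAssassin; the loop only counts the files it fed
to spamc and lets spamc's own output pass through unchanged.

```shell
#!/bin/sh
# Sketch only: feed every message in a directory to "spamc -L".
# learn_dir and LEARN_CMD are hypothetical names used for illustration.

LEARN_CMD=${LEARN_CMD:-spamc}

learn_dir() {
    type_str=$1   # learn type passed to spamc -L, e.g. spam or ham
    dir=$2        # directory that holds one message per file
    examined=0
    for msg in "$dir"/*; do
        # In an empty directory the pattern stays unexpanded, so skip
        # anything that is not a regular file.
        [ -f "$msg" ] || continue
        examined=$((examined + 1))
        "$LEARN_CMD" -L "$type_str" < "$msg"
    done
    echo "Examined ${examined} message(s)"
}
```

It would be called as: learn_dir ${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur
That is already a lot more than the single sa-learn line, and it still
gives only a count, not the "Learned tokens from ..." summary.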
With a low level of spam it works, but when it becomes bigger, it does
not:
date
echo ${echoStr}
sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
date
for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
    spamc -L ${typeStr} <${i}
done
echo learned in the new way
date
gives:
za jan 9 16:09:25 CET 2010
Increase
Learned tokens from 0 message(s) (45 message(s) examined)
za jan 9 16:09:40 CET 2010
learned in the new way
za jan 9 16:10:00 CET 2010
So sa-learn takes 15 seconds and spamc -L takes 20 seconds. (And I need
more code: besides taking care of an empty directory, I also need to
reproduce the feedback given by sa-learn.)
> You can try using bayes_learn_to_journal - and do a separate sa-learn
> --sync job in cron. Learning to the journal is faster.
I'll look into that.
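As far as I understand the suggestion, it comes down to something like
the fragment below. This is a sketch, not a tested setup: which
configuration file the option belongs in (local.cf here) and the
half-hourly schedule are assumptions on my part.

```
# SpamAssassin configuration (assumed location: local.cf)
bayes_learn_to_journal 1

# crontab entry (schedule is an arbitrary example):
# merge the journal into the Bayes database
*/30 * * * * sa-learn --sync
```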
> Also, What is the size of your database? Maybe you are spending lots
> of time doing expires or something.
sa-learn --dump magic gives:
0.000 0 3 0 non-token data: bayes db version
0.000 0 57538 0 non-token data: nspam
0.000 0 74876 0 non-token data: nham
0.000 0 166338 0 non-token data: ntokens
0.000 0 1257478501 0 non-token data: oldest atime
0.000 0 1263049426 0 non-token data: newest atime
0.000 0 1263049538 0 non-token data: last journal sync atime
0.000 0 1263044805 0 non-token data: last expiry atime
0.000 0 5529600 0 non-token data: last expire atime delta
0.000 0 1868 0 non-token data: last expire reduction count
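To put those atime values in perspective, a small calculation with the
numbers copied from the dump above. It assumes the atime fields are Unix
timestamps in seconds and that the expire delta is in seconds as well;
the variable names are only for this example.

```shell
#!/bin/sh
# Values copied from the sa-learn --dump magic output.
# Assumption: atimes are Unix timestamps, the delta is in seconds.
oldest_atime=1257478501
newest_atime=1263049426
expire_delta=5529600

secs_per_day=86400

# Whole days between the oldest and the newest token access time.
echo "token atime span: $(( (newest_atime - oldest_atime) / secs_per_day )) days"
# Whole days covered by the delta used in the last expiry.
echo "last expire atime delta: $(( expire_delta / secs_per_day )) days"
```

Both come out at 64 days, so the database holds roughly two months of
tokens, with 166338 tokens in total.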
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof