Jeff Mincy <j...@delphioutpost.com> writes:

>    I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
>    more time with 3.2.5 than it did with 3.0.4. Can this be true?
>    
>    It is not a problem, because it is done by cron-tab, but I am just
>    curious.
>
> You can use spamc -L spam/ham to learn messages.  Spamc -L is faster
> than sa-learn.  The spamd daemon needs to be started with
> --allow-tell.

That is not really an answer to my question. ;-)

But it does not look attractive in my situation.
First, my code has to grow from:
    sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
to:
    for i in "${HOME}/Maildir/.SpamDir.${dirStr}/cur/"*; do
        spamc -L "${typeStr}" <"${i}"
    done

And even that is not enough: I also need to handle the case where the
directory is empty, and I need to write code that reports the feedback
sa-learn prints.
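
A minimal sketch of what the loop would have to become, assuming a plain
POSIX shell. The helper name learn_dir and its summary line are made up
for illustration; the summary only imitates the kind of feedback sa-learn
gives and is not the same wording.

```shell
# learn_dir TYPE DIR
# Feed every message file in DIR to spamd via "spamc -L TYPE".
# learn_dir is a hypothetical helper name, not part of SpamAssassin.
learn_dir() {
    typeStr=$1          # "spam" or "ham"
    dir=$2
    examined=0
    for i in "${dir}"/*; do
        # When the directory is empty the glob stays unexpanded,
        # so only continue with entries that are real files.
        [ -f "${i}" ] || continue
        spamc -L "${typeStr}" <"${i}"
        examined=$((examined + 1))
    done
    # Home-made summary line, printed after whatever spamc itself prints.
    echo "${examined} message(s) examined"
}
```

It would be called as: learn_dir "${typeStr}" "${HOME}/Maildir/.SpamDir.${dirStr}/cur"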

With a low volume of spam it works, but when the volume gets bigger it
does not:
    date
    echo ${echoStr}
    sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
    date
    for i in "${HOME}/Maildir/.SpamDir.${dirStr}/cur/"*; do
        spamc -L "${typeStr}" <"${i}"
    done
    echo learned in the new way
    date
gives:
    za jan  9 16:09:25 CET 2010
    Increase
    Learned tokens from 0 message(s) (45 message(s) examined)
    za jan  9 16:09:40 CET 2010
    learned in the new way
    za jan  9 16:10:00 CET 2010

So sa-learn takes 15 seconds and spamc -L takes 20 seconds. (And I need
more code: besides taking care of an empty directory, I also need to
implement the feedback given by sa-learn.)
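
The subtraction of the date stamps could also be left to the shell. A
rough sketch, assuming a date that supports +%s (GNU and BSD date do, but
it is not in POSIX); timed is a hypothetical helper name:

```shell
# timed COMMAND [ARGS...]
# Run COMMAND and report the elapsed wall-clock seconds on stderr.
# Assumes "date +%s" is available; resolution is whole seconds.
timed() {
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s" >&2
    # Hand back the exit status of the command that was timed.
    return "${status}"
}
```

For example: timed sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/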


> You can try using bayes_learn_to_journal - and do a separate sa-learn
> --sync job in cron.   Learning to the journal is faster.

I'll look into that.
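
As far as I understand the suggestion, it would come down to something
like the sketch below. It assumes bayes_learn_to_journal has been switched
on in the SpamAssassin configuration, and it runs the sync in the same job
for brevity, where a separate cron entry was suggested. learn_then_sync is
a hypothetical helper name.

```shell
# Assumes this line is present in the SpamAssassin configuration
# (local.cf, or user_prefs if the option is accepted there):
#     bayes_learn_to_journal 1
#
# learn_then_sync TYPE DIR   (hypothetical helper name)
learn_then_sync() {
    typeStr=$1          # "spam" or "ham"
    dir=$2
    # With the option set, this step is expected to write to the journal.
    sa-learn "--${typeStr}" "${dir}" || return 1
    # Merge the journal into the Bayes database in one pass.
    sa-learn --sync
}
```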


> Also, What is the size of your database?   Maybe you are spending lots
> of time doing expires or something.

sa-learn --dump magic gives:
    0.000          0          3          0  non-token data: bayes db version
    0.000          0      57538          0  non-token data: nspam
    0.000          0      74876          0  non-token data: nham
    0.000          0     166338          0  non-token data: ntokens
    0.000          0 1257478501          0  non-token data: oldest atime
    0.000          0 1263049426          0  non-token data: newest atime
    0.000          0 1263049538          0  non-token data: last journal sync atime
    0.000          0 1263044805          0  non-token data: last expiry atime
    0.000          0    5529600          0  non-token data: last expire atime delta
    0.000          0       1868          0  non-token data: last expire reduction count
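
The atime values are Unix epoch seconds and the delta is a number of
seconds. A small sketch for turning the expire delta into days, assuming
the unwrapped output of sa-learn --dump magic on stdin; expire_delta_days
is a hypothetical helper name:

```shell
# expire_delta_days
# Read "sa-learn --dump magic" output on stdin and print the
# "last expire atime delta" value converted to whole days.
# The value is the third whitespace-separated field of that line.
expire_delta_days() {
    awk '/last expire atime delta/ { print int($3 / 86400) }'
}
```

Used as: sa-learn --dump magic | expire_delta_days
For the database above, 5529600 seconds comes to 64 days.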

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
