Jeff Mincy <[email protected]> writes:
> But it does not seem to be worthwhile in my situation.
> First my code has to grow from:
> sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
> to:
> for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
> spamc -L ${typeStr} <${i}
> done
>
> Which is not even enough, because I also need to handle the case
> where the directory is empty, and I need to reproduce the feedback
> that sa-learn gives about the processed messages.
>
> Oh. You're learning all of the messages in a directory. spamc -L is
> faster than sa-learn for learning single messages because sa-learn is
> a perl script that has to load Mail::SpamAssassin each time. For a
> large directory the slower startup of sa-learn is less of an issue.
> sa-learn is fine for doing directories.
I have continued and have something that seems to work. The only
problem is that spamc returns a different exit code than the
documentation suggests. I'll finish the code, and once it has been
tested tonight I'll post it here.
Again: it is not strictly necessary, but I like efficient code.
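In the meantime, a rough sketch of the direction I am taking (a hedged sketch only: dirStr and typeStr are the same placeholders as above, and the exit-status handling is deliberately loose because, as mentioned, the observed spamc exit codes differ from the man page):

```shell
# Sketch: feed each message in the Maildir to spamc, guarding against
# an empty directory (an unmatched glob stays as the literal pattern).
dir="${HOME}/Maildir/.SpamDir.${dirStr}/cur"
processed=0
for i in "${dir}"/*; do
    [ -e "${i}" ] || continue            # empty directory: nothing matched
    # Record the exit status instead of assuming the documented codes.
    spamc -L "${typeStr}" < "${i}" > /dev/null
    echo "${i}: spamc exit status $?"
    processed=$((processed + 1))
done
echo "processed ${processed} message(s)"
```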
> With a low level of spam it works, but when the volume grows it does
> not:
> date
> echo ${echoStr}
> sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
> date
> for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
> spamc -L ${typeStr} <${i}
> done
> echo learned in the new way
> date
> gives:
> za jan 9 16:09:25 CET 2010
> Increase
> Learned tokens from 0 message(s) (45 message(s) examined)
> za jan 9 16:09:40 CET 2010
> learned in the new way
> za jan 9 16:10:00 CET 2010
>
> So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
> code: besides handling an empty directory, I also need to reproduce
> the feedback given by sa-learn.)
>
> You learned tokens from 0 messages while examining 45 messages.
> You had already learned from those 45 messages earlier, so you were
> just timing how fast sa-learn can do nothing.
I changed the code and now it moves already processed messages out of
the way. As mentioned above: I'll post it after it has been tested.
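The move-aside step looks roughly like this (a sketch only; the `processed` directory name is my own choice and may change in the final script):

```shell
# Sketch: move every message that has been learned out of cur/ into a
# sibling "processed" directory, so the next run only sees new mail.
move_processed() {
    src="$1"
    done_dir="${src}/../processed"
    mkdir -p "${done_dir}"
    for i in "${src}"/*; do
        [ -e "${i}" ] || continue        # empty directory: nothing to do
        # The actual learning step (spamc -L ...) would run here first.
        mv "${i}" "${done_dir}/"
    done
}
```

It would be called as `move_processed "${HOME}/Maildir/.SpamDir.${dirStr}/cur"` after the learning loop.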
> > Also, What is the size of your database? Maybe you are spending lots
> > of time doing expires or something.
>
> sa-learn --dump magic gives:
> 0.000 0 3 0 non-token data: bayes db
> version
> 0.000 0 57538 0 non-token data: nspam
> 0.000 0 74876 0 non-token data: nham
> 0.000 0 166338 0 non-token data: ntokens
> 0.000 0 1257478501 0 non-token data: oldest atime
> 0.000 0 1263049426 0 non-token data: newest atime
> 0.000 0 1263049538 0 non-token data: last journal
> sync atime
> 0.000 0 1263044805 0 non-token data: last expiry
> atime
> 0.000 0 5529600 0 non-token data: last expire
> atime delta
> 0.000 0 1868 0 non-token data: last expire
> reduction count
>
> Your database has 166338 tokens which is larger than the default
> bayes_expiry_max_db_size 150000. The last expiration ran this morning
> at 8:46. You could try letting the bayes database get larger and turn
> off bayes_auto_expire. If you turn off bayes_auto_expire you'll have
> to add something to cron to periodically expire tokens.
> bayes_auto_expire is fine for lower volumes of email, but can get in
> the way with higher volumes.
With the changed code it only takes a few seconds, so I probably do not
have to worry about this.
Also, the learning is already done in a cron job at a time when I am
normally not working on the computer.
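Should the database size become a problem after all, my understanding of the setup Jeff describes is roughly the following (the schedule is only an example):

```shell
# In the SpamAssassin configuration (e.g. ~/.spamassassin/user_prefs):
#   bayes_auto_expire 0
#
# And a crontab entry to expire tokens periodically instead,
# for example nightly at 03:30:
#   30 3 * * * sa-learn --force-expire
```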
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof