Hello Tony,

Tuesday, March 16, 2004, 2:45:59 PM, you wrote:

>> If you have access to the /etc/mail/spamassassin/local.cf 
>> file (may be in a different directory according to how SA is 
>> called), then you can add the parameter
>> > bayes_auto_learn 1

TB> Hey cool, done that now.  Just looked at the headers of a message
TB> received which says "autolearn=ham" This was a message from the SA
TB> group funnily enough - presumably that is correct?

Unless that message included spam samples, there's no problem.

I suggest you set your non-spam auto-learn threshold
(bayes_auto_learn_threshold_nonspam) to -0.01, so that spam which hits no
rules, and therefore scores 0, is never auto-learned as ham.
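In local.cf that could look something like the fragment below. Treat it as a sketch: bayes_auto_learn_threshold_nonspam is the option name given in the Mail::SpamAssassin::Conf documentation. A message is auto-learned as ham only if it scores below this value, so with -0.01 a message needs at least some negative score to qualify.

```
# local.cf (e.g. /etc/mail/spamassassin/local.cf)
bayes_auto_learn 1
# Auto-learn as ham only below -0.01, so a spam that hits
# no rules (score 0) is never learned as ham.
bayes_auto_learn_threshold_nonspam -0.01
```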

TB> I managed to get sa-learn to work for a non-root account by deleting the
TB> bayes* files as you suggested.
TB> Presumably the bayes database then applies to any address for the
TB> userid I run it under?

My understanding is that each domain with a $HOME will have one
$HOME/.spamassassin directory, and the Bayes database built there will
apply to all addresses in that domain.

TB> I've had a look at your script and it's given me some ideas thanks - I have
TB> written a script which will look for all files called learn_spam or
TB> learn_ham and run sa-learn on them, then "empties" the files by removing
TB> them and touching them (is there a better way?)

cp /dev/null "$file"
or
cat </dev/null >"$file"
are two methods I've used to empty files.
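A third option is a bare shell redirection, `: > "$file"`. All three truncate the existing file in place, so unlike rm followed by touch the file keeps its inode, owner and permissions. A minimal sketch; `empty_file` is a hypothetical helper name:

```sh
# empty_file FILE...: truncate each file to zero length in place.
# Unlike "rm; touch", this keeps the file's inode, owner and permissions.
# (A file that does not exist yet is created empty.)
empty_file() {
    for f in "$@"; do
        : > "$f"
    done
}

# Usage:
#   empty_file "$HOME/mail/learn_spam" "$HOME/mail/learn_ham"
```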

TB> I know nothing about shell programming other than what I have picked up from
TB> Bob's script and Google, so forgive me if it's a little rough around the edges
TB> - it's my first ever shell script!

TB> ====================================
TB> #!/bin/sh
TB> SARGS=""
TB> if [ "$1" = "d" ] ; then
TB>         SARGS="--showdots"
TB> fi

TB> echo "Learning SPAM"
TB> for FILE in `find $HOME -name learn_spam -print`
TB> do
TB>         echo "Processing $FILE"
TB>         sa-learn --spam --mbox "$FILE" $SARGS
TB>         rm "$FILE"
TB>         touch "$FILE"
TB> done

TB> echo "Learning HAM"
TB> for FILE in `find $HOME -name learn_ham -print`
TB> do
TB>         echo "Processing $FILE"
TB>         sa-learn --ham --mbox "$FILE" $SARGS
TB>         rm "$FILE"
TB>         touch "$FILE"
TB> done
TB> echo "Done"
TB> ====================================

TB> Any obvious flaws there guys, or something I could do better?   It *seems*
TB> to work okay anyway.
TB> Should I bung them all into one file first????

Looks good to me.  I wouldn't cat them all into one file first, since my
understanding is that shorter, quicker sa-learn runs are better: they are
less likely to block Bayes updates from incoming mail and auto-learning.
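If mail can be dropped into learn_spam or learn_ham while the script runs, one possible refinement (a sketch, not Bob's or Tony's script) is to rename each file before learning from it, so mail that arrives after the rename lands in a fresh file instead of being deleted unread. It also reads find's output line by line so directory names with spaces survive, skips empty files, and still does one short sa-learn run per file. `learn_folders` and the `SA_LEARN` override are hypothetical names; `--spam`, `--ham`, `--mbox` and `--showdots` are the sa-learn flags from the script above.

```sh
#!/bin/sh
# Sketch only: a variant of the learn_spam/learn_ham loops above.
# SA_LEARN is a hypothetical override (handy for testing); default is sa-learn.
SA_LEARN=${SA_LEARN:-sa-learn}
SARGS=${SARGS:-}

# learn_folders KIND NAME
#   KIND: "spam" or "ham" (becomes --spam / --ham)
#   NAME: file name to look for under $HOME, e.g. learn_spam
learn_folders() {
    kind=$1
    name=$2
    # -size +0 skips empty files; read -r copes with spaces in paths
    find "$HOME" -type f -name "$name" -size +0 -print |
    while IFS= read -r file; do
        echo "Processing $file"
        work="$file.learning.$$"
        # Move the mailbox aside first, then recreate it empty, so mail
        # arriving from now on goes into the new file.
        mv "$file" "$work" || continue
        : > "$file"
        # $SARGS is deliberately unquoted so an empty value adds no argument
        if $SA_LEARN "--$kind" --mbox $SARGS "$work"; then
            rm -f "$work"
        else
            echo "sa-learn failed on $work; left in place" >&2
        fi
    done
}

# Usage, mirroring the original script:
#   learn_folders spam learn_spam
#   learn_folders ham learn_ham
```

The recreated file gets the running user's default permissions rather than the original's; if that matters, adjust it after the `: >` line.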

TB> The other thing is, how often should I run it - I've seen it mentioned
TB> before that you need about 200 spams and 200 hams for sa-learn to be
TB> effective - does this mean 200 _per run_ or that you need to have learned
TB> about that number in total for it to be effective?
TB> If the former, then presumably my script would be better off concatenating
TB> the spam and ham files before passing them to a single run of sa-learn?

I run my scripts once an hour.
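As an illustration, an hourly crontab entry might look like the line below; the script path is hypothetical. Only stdout is discarded, so cron will typically mail you any error output.

```
# min hour day-of-month month day-of-week  command
0 * * * * $HOME/bin/learn-folders.sh >/dev/null
```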

You need 200+ spams and 200+ hams before Bayes takes effect and starts
applying its scores to your emails. Those are cumulative totals, not
per-run counts: once reached, Bayes stays in effect unless you drop below
them (for example, by deleting the database files and starting over).
How often you run sa-learn doesn't change that threshold; running it more
often just keeps your Bayes database more current.
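To see where you stand, `sa-learn --dump magic` prints the database counters, including lines ending in `nspam` and `nham`. The sketch below reads that output on stdin and checks both counts against 200, the figure quoted above (configurable in SpamAssassin via bayes_min_spam_num and bayes_min_ham_num). It assumes the count is the third whitespace-separated field of those lines, as in the 2.6x dump format; check against your version. `bayes_ready` is a hypothetical helper name.

```sh
# bayes_ready [MIN]: read "sa-learn --dump magic" output on stdin, print the
# nspam/nham counts, and exit 0 only if both are at least MIN (default 200).
# Assumes the count is the 3rd field of the "non-token data: nspam/nham" lines.
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data:/ && $NF == "nspam" { nspam = $3 }
        /non-token data:/ && $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam + 0 >= min + 0 && nham + 0 >= min + 0)
        }'
}

# Usage:
#   sa-learn --dump magic | bayes_ready && echo "enough mail learned for Bayes"
```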

Bob Menschel


