Thanks Bob. Very interesting.
After you run sa-learn on both the spam and the ham, do you then delete
those files in order to keep them small?
Thanks,
Mike
Robert Menschel wrote:
Hello M.Lewis,
Saturday, October 15, 2005, 10:40:29 PM, you wrote:
ML> Is there a best-practices recommendation for how often to run sa-learn?
IMO, best (least system impact, least problematic) is to run sa-learn
for each individual message as it's identified.
However, that's probably not feasible/practical for most systems. When
emails are batched (such as moved/copied to "spam" and "not-spam"
folders, or uploaded via ftp, or whatever), then obviously you need to
also batch the sa-learn runs.
In my experience, sa-learn cycles every 10 minutes have given very
reasonable results.
When the ham/spam files are empty, it costs almost nothing to run:
    while : ; do                                    # run without end
        while [[ ! -s pause-learn ]]; do            # learn until pause-learn is non-empty
            for file in /...mail-spool.../*.*am ; do    # loop thru learn spools
                [[ -s "$file" ]] || continue        # skip empty or unmatched spools
                if [[ "$file" == *spam ]]           # *.spam files are spam, the rest ham
                then sa-learn --spam "$file"
                else sa-learn --ham "$file"
                fi
            done
            sleep 600                               # sleep 10 min between each cycle
        done
        if [[ -s expire-learn ]]                    # daily expire?
        then sa-learn --force-expire ; rm expire-learn
        fi
        sleep 600                                   # sleep 10 min when paused
    done
The problems I've run into are:
1) when the files being fed into sa-learn are excessively large, so
frequent runs minimize that exposure,
2) when sa-learn tries to do its expire while other things are
running, so add a once-a-day expire, and pause learning runs during
that.
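
As a rough sketch of how the once-a-day expire and pause in point 2 might
be driven: the loop above only checks whether pause-learn and expire-learn
are non-empty, so something else has to write and clear those flag files.
The function names and the LEARN_DIR variable below are hypothetical, not
part of sa-learn or of the loop itself.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the flag files read by the learn loop.
# LEARN_DIR is an assumed name for the directory the loop runs in.
LEARN_DIR="${LEARN_DIR:-.}"

request_expire() {
    # The loop tests both flags with -s, so they must be non-empty.
    # Write expire-learn first so it is already in place when the
    # loop notices the pause.
    date > "$LEARN_DIR/expire-learn"
    date > "$LEARN_DIR/pause-learn"
}

resume_learning() {
    # With pause-learn gone, the inner learn loop starts again on the
    # next pass. expire-learn is left alone; the loop removes it
    # itself once the expire has run.
    rm -f "$LEARN_DIR/pause-learn"
}
```

One way to use this would be to call request_expire from cron once a day
and resume_learning some time later. Since the loop sleeps 10 minutes
between checks, the gap should be long enough for the loop to notice the
pause and for the expire to finish.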
Bob Menschel