Thanks Bob. Very interesting.

After you run sa-learn on both the spam and ham files, do you then delete those files to keep them small?

Thanks,
Mike


Robert Menschel wrote:
Hello M.Lewis,

Saturday, October 15, 2005, 10:40:29 PM, you wrote:


ML> Is there a best-practices recommendation for how often to run sa-learn?

IMO, the best approach (least system impact, fewest problems) is to run
sa-learn on each individual message as soon as it's identified.
However, that's probably not feasible or practical for most systems.
When emails are batched (for example, moved or copied to "spam" and
"not-spam" folders, uploaded via ftp, or whatever), then you obviously
need to batch the sa-learn runs as well.
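For the per-message case, a minimal sketch might look like the function below. It assumes that whatever files a message as spam or not-spam (a mail filter, an MUA hook, or similar) can call it with a verdict and the path to a single message file; the function name `learn_message` and its argument order are hypothetical. It uses only the `--spam` and `--ham` sa-learn options that also appear in the batch loop.

```shell
#!/bin/bash
# Hypothetical per-message hook: learn one message as soon as it is classified.
# Usage: learn_message spam|ham /path/to/message
learn_message() {
  local verdict=$1 msg=$2
  if [[ ! -f $msg ]]; then
    echo "learn_message: no such file: $msg" >&2
    return 1
  fi
  case $verdict in
    spam) sa-learn --spam "$msg" ;;   # learn this one message as spam
    ham)  sa-learn --ham "$msg" ;;    # learn this one message as ham
    *)    echo "usage: learn_message spam|ham FILE" >&2; return 2 ;;
  esac
}
```

Because each call handles a single message, sa-learn never receives a large batch, which is the low-impact case described above.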

In my experience, running sa-learn cycles every 10 minutes has worked
very well.

Running a loop like this costs almost nothing when the ham/spam files are empty:

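#!/bin/bash
# Assumes spam spools end in .spam and ham spools in .ham (hypothetical
# naming), and that the pause-learn / expire-learn flag files are in the
# current directory.
while : ; do                                   # run without end
  while [[ ! -s pause-learn ]]; do             # allow for pause
    for file in /...mail-spool.../*.*am ; do   # loop through learn spools
      [[ -e $file ]] || continue               # skip if the glob matched nothing
      if [[ $file == *.spam ]]; then
        sa-learn --spam "$file"                # add --mbox if spools are mbox files
      else
        sa-learn --ham "$file"
      fi
    done
    sleep 600                                  # sleep 10 min between each cycle
  done
  if [[ -s expire-learn ]]; then               # daily expire requested?
    sa-learn --expire
    rm -f expire-learn
  fi
  sleep 600                                    # sleep 10 min when paused
done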


The problems I've run into are:
1) the files being fed into sa-learn become excessively large;
frequent runs minimize that exposure.
2) sa-learn tries to do its expire while other things are running;
so I add a once-a-day expire and pause the learning runs during it.
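The loop above only reacts to the `pause-learn` and `expire-learn` flag files; it doesn't show what creates them. One possible way to drive the once-a-day expire, assuming a daily cron job that runs in the same directory as the loop, is a small set of helper functions like these. The function names are hypothetical; the one requirement taken from the loop itself is that each flag file be non-empty, because the loop tests them with `-s`.

```shell
#!/bin/bash
# Hypothetical helpers for a daily expire window. They must run in the same
# directory as the learning loop, which looks for the flag files by relative name.

# Ask the loop to stop learning and run sa-learn --expire while paused.
# Both files need content: the loop checks them with -s (exists and non-empty).
request_expire() {
  echo expire > expire-learn   # written first, so it already exists once the loop pauses
  echo pause  > pause-learn
}

# The loop deletes expire-learn right after running sa-learn --expire.
expire_done() {
  [[ ! -e expire-learn ]]
}

# Remove the pause flag; the loop resumes learning after its current 10-minute sleep.
resume_learning() {
  rm -f pause-learn
}

# Example daily job: pause, wait for the expire to finish, then resume.
daily_expire() {
  request_expire
  until expire_done; do
    sleep 60
  done
  resume_learning
}
```

Pausing through a flag file rather than killing the loop means the loop only stops between cycles, never in the middle of an sa-learn run.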

Bob Menschel




