On Mon, 2004-03-01 at 06:59, Steven Dickenson wrote:
> John Hardin wrote:
> > All:
> >
> > Here's what we're doing to allow Microsoft Outlook users to
> > train a global SA Bayes database:
>
> Seems like a bit much for a normal user to follow easily.
My beta group includes some fairly technically-challenged users, and
the only problem we've had so far was one instance of someone copying the
HAM folder to the SPAM directory on the spamassassin machine. (We get so
few false positives, y'know... :)

I cleaned that up, changed the script (attached) to be more picky (and
also to do --mbox imports), and emphasized where to put things in the
README.

After they do it a couple of times, it shouldn't be too difficult to
grasp.
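
To illustrate what "more picky" means in practice, here is a minimal sketch (not the attached script itself) of the check: an exported mailbox is only learned from when its name ends in the suffix that matches the drop directory it arrived in. The function name and the sample mailbox names are hypothetical; the real names depend on what readpst exports.

```shell
# Sketch only: succeed when the mailbox name suffix agrees with the
# drop directory ("...ham" wants "*-ham", "...spam" wants "*-spam",
# in any letter case). Anything else is rejected.
mbox_matches_dir() {
    case "$1" in
        *ham)
            case "$2" in *-[Hh][Aa][Mm]) return 0 ;; esac ;;
        *spam)
            case "$2" in *-[Ss][Pp][Aa][Mm]) return 0 ;; esac ;;
    esac
    return 1
}

# Hypothetical mailbox names dropped into the spam directory.
for name in "Outlook-SPAM" "Outlook-HAM" "Inbox"
do
    if mbox_matches_dir /home/spamd/spam "$name"
    then
        echo "$name: would be learned as spam"
    else
        echo "$name: skipped"
    fi
done
```

With a check like this, a HAM folder copied into the spam directory by mistake is skipped rather than learned as spam.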
> Your solution seems to be more oriented
> towards people using PST files.
It is.
--
John Hardin KA7OHZ
Internal Systems Administrator/Guru voice: (425) 672-1304
Apropos Retail Management Systems, Inc. fax: (425) 672-0192
-----------------------------------------------------------------------
Failure to plan ahead on someone else's part does not constitute an
emergency on my part.
- David W. Barts in a.s.r
-----------------------------------------------------------------------
Today: ICQ Corp goes away - have you installed Jabber yet?
#!/bin/bash
#
# Train spamassassin global bayes filter
#
# extract messages from .PST files
for DIR in /home/spamd/spam /home/spamd/ham
do
    # skip drop directories that are missing or inaccessible
    if [ -d "$DIR" ]
    then
        cd "$DIR" || continue
    else
        continue
    fi

    [ -d export ] || mkdir export

    # pick the exported-folder name pattern and sa-learn mode for this directory
    unset MSGTYPE LEARN
    case $DIR in
        *ham)
            MSGTYPE='[Hh][Aa][Mm]'
            LEARN='--ham'
            ;;
        *spam)
            MSGTYPE='[Ss][Pp][Aa][Mm]'
            LEARN='--spam'
            ;;
        *)
            echo "$0: $DIR not supported"
            continue
            ;;
    esac

    for PST in *.[Pp][Ss][Tt]
    do
        unset LEARNED
        if [ -s "$PST" ]
        then
            echo "Processing $PST"
            rm -rf export/*
            /usr/local/bin/readpst -o export "$PST"
            mv -f "$PST" "${DIR}.old/"
            cd export || continue
            # MSGTYPE is deliberately unquoted so it expands as a glob
            for MBOX in *-$MSGTYPE
            do
                if [ -s "$MBOX" ]
                then
                    echo "Learning $LEARN from $PST/$MBOX"
                    /usr/bin/sa-learn $LEARN -C /etc/mail/spamassassin --mbox "$MBOX"
                    LEARNED=1
                fi
            done
            cd "$DIR" || break
            rm -rf export/*
            if [ -z "$LEARNED" ]
            then
                echo "$0: NOTICE! Didn't find any mail folders in $PST to learn from..."
            fi
        fi
    done
done
# only process properly-formatted saved messages
# (skip the check if the cd fails, so nothing is moved from the wrong directory)
cd /home/spamd/spam &&
file * | grep -vi "mail text" | grep -vi "directory" | sed -e 's/:.*//' -e 's/\*//' | xargs -r -I{} mv {} /home/spamd/invalid-format-spam/
cd /home/spamd/ham &&
file * | grep -vi "mail text" | grep -vi "directory" | sed -e 's/:.*//' -e 's/\*//' | xargs -r -I{} mv {} /home/spamd/invalid-format-ham/
# educate SpamAssassin
cd /home/spamd
echo "Learning spams"
/usr/bin/sa-learn --spam -C /etc/mail/spamassassin -L /home/spamd/spam
echo "Learning hams"
/usr/bin/sa-learn --ham -C /etc/mail/spamassassin -L /home/spamd/ham
# Report status
echo "Bayes Statistics:"
/usr/bin/sa-learn --dump magic
# archive old messages
# this may need to be revisited
find /home/spamd/spam -type f -mtime +10 -exec mv -f {} /home/spamd/spam.old/ \;
find /home/spamd/ham -type f -mtime +10 -exec mv -f {} /home/spamd/ham.old/ \;
gzip -9f /home/spamd/spam.old/* /home/spamd/ham.old/* >/dev/null 2>&1
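
The format check in the script keeps only what file(1) labels as mail text and moves everything else aside. Below is a rough sketch of just the filtering step, fed with made-up file(1) output so it can be tried without real messages. The function name, the sample file names, and the sample descriptions are all hypothetical, and the exact wording file(1) prints varies between versions.

```shell
# Sketch only: read "name: description" lines as printed by file(1) and
# print the names that are neither mail text nor directories, i.e. the
# ones the script would move to invalid-format-*.
list_invalid_names() {
    grep -vi "mail text" | grep -vi "directory" | sed -e 's/:.*//'
}

# Made-up file(1) output for three entries in a drop directory.
printf '%s\n' \
    'msg1: RFC 822 mail text' \
    'notes.doc: Microsoft Word document' \
    'export: directory' |
    list_invalid_names
```

For this sample input only notes.doc is printed. One limitation carried over from the original pipeline: the sed expression cuts at the first colon, so a file whose name itself contains a colon would be truncated.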