On Sun, 23 Sep 2012, James wrote:
> I wrote this little script to update the bayes rules. I can do this on
> my imap account but my pop3 account gets way more spam and the messages
> are no longer on the machine with sa once I pop them off.
> Any comments on my script?
Bear in mind that spams which don't score high enough to be quarantined or
discarded will end up in your inbox, as will false negatives. Training all
of the mail in all of your inboxes as ham will train these messages as ham
and make any small error in classification much worse over time.
During the initial training period you want to manually review messages
and build a ham corpus and a spam corpus. Once bayes is running you
generally only want to train on misclassified messages. This
decision-making process cannot be automated; if it could be, the errors
wouldn't occur in the first place.
You should set up per-user train-as-ham and train-as-spam mailboxes, and
only train from those, only for the users whose judgement you trust. Then,
those users should copy misclassified messages to the appropriate folder
and may also add samples of ham to the train-as-ham folder whenever
desired.
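
As a rough sketch of that setup (not a tested recipe), something like the
following trains only from the hand-sorted folders. The folder names
.train-as-ham and .train-as-spam, the SA_LEARN override, and the
train_maildir function are made up for illustration; only "sa-learn --ham"
and "sa-learn --spam" are taken from your script. It assumes each argument
is the Maildir of a user whose judgement you trust.

```shell
#!/bin/bash
# Sketch: train Bayes only from hand-sorted folders, never from the inbox.
# Folder names below are assumptions; adjust them to your own layout.

SA_LEARN=${SA_LEARN:-sa-learn}   # override with SA_LEARN=echo for a dry run
HAM_FOLDER=.train-as-ham         # hypothetical folder name
SPAM_FOLDER=.train-as-spam       # hypothetical folder name

# train_maildir MAILDIR
# Learn from the two training folders under MAILDIR, skipping any that
# do not exist.
train_maildir() {
    local maildir=$1
    if [ -d "$maildir/$HAM_FOLDER" ]; then
        $SA_LEARN --ham "$maildir/$HAM_FOLDER"
    fi
    if [ -d "$maildir/$SPAM_FOLDER" ]; then
        $SA_LEARN --spam "$maildir/$SPAM_FOLDER"
    fi
}

# Each command-line argument is the Maildir of one trusted user.
for m in "$@"; do
    train_maildir "$m"
done
```

Run it first with SA_LEARN=echo to see which commands it would issue, for
example: SA_LEARN=echo ./train.sh /home/alice/Maildir (the script name and
path are placeholders).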
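#!/bin/bash
# Split the find output on newlines only, so folder names with spaces survive.
IFS=$'\n'
FOLDERLIST=$(find Maildir -name '.INBOX*' -type d)
for i in $FOLDERLIST; do
    echo "Processing $i"
    # Uncomment once the folder list printed above looks right:
    # sudo sa-learn --ham "$i"
done
# sudo sa-learn --spam Maildir/.Junk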
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Users mistake widespread adoption of Microsoft Office for the
development of a document format standard.
-----------------------------------------------------------------------
115 days since the first successful private support mission to ISS (SpaceX)