On Sun, 23 Sep 2012, James wrote:

I wrote this little script to update the bayes rules. I can do this on my IMAP account, but my POP3 account gets far more spam, and the messages are no longer on the machine running SA once I pop them off.

Any comments on my script?

Bear in mind that spams which don't score high enough to be quarantined or discarded will end up in your inbox, as will false negatives. Training all of the mail in all of your inboxes as ham will train these messages as ham and make any small error in classification much worse over time.

During the initial training period you want to manually review messages and build a ham corpus and a spam corpus. Once bayes is running you generally only want to train on misclassified messages. This decision-making process cannot be automated; if it could be, the errors wouldn't occur in the first place.

You should set up per-user train-as-ham and train-as-spam mailboxes, and only train from those, only for the users whose judgement you trust. Then, those users should copy misclassified messages to the appropriate folder and may also add samples of ham to the train-as-ham folder whenever desired.
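
As a rough sketch of that arrangement (not a tested recipe): the folder names .train-as-ham and .train-as-spam, the /home/<user>/Maildir layout, and the TRUSTED_USERS, MAIL_ROOT and SA_LEARN variables are all assumptions made for illustration. Adjust them to your site, and run it as whichever account owns the bayes database you want to update.

```shell
#!/bin/bash
# Train bayes only from hand-curated folders belonging to trusted users.
# ASSUMPTIONS (not from the thread): the folder names, the Maildir layout
# under MAIL_ROOT, and the variable names below are illustrative only.

SA_LEARN=${SA_LEARN:-sa-learn}      # set SA_LEARN=echo for a dry run
MAIL_ROOT=${MAIL_ROOT:-/home}       # parent directory of the users' homes
TRUSTED_USERS=${TRUSTED_USERS:-}    # space-separated list, e.g. "alice bob"

train_user() {
    local maildir="$MAIL_ROOT/$1/Maildir"
    local ham="$maildir/.train-as-ham"
    local spam="$maildir/.train-as-spam"
    # Skip a folder that does not exist instead of failing.
    [ -d "$ham" ]  && "$SA_LEARN" --ham  "$ham"
    [ -d "$spam" ] && "$SA_LEARN" --spam "$spam"
    return 0
}

for user in $TRUSTED_USERS; do
    train_user "$user"
done
```

Running it with SA_LEARN=echo first prints the sa-learn arguments it would use without touching the database. Note that nothing here walks the inboxes: only messages a person has deliberately filed get trained.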


#!/bin/bash

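# Split the find output on newlines only, so folder names with spaces survive.
IFS=$'\n'
FOLDERLIST=$(find Maildir -name '.INBOX*' -type d)

for i in $FOLDERLIST; do
   echo "Processing $i"
   # Left disabled, as in the original; uncomment to train this folder as ham.
#    sudo sa-learn --ham "$i"
done

# Left disabled, as in the original; uncomment to train the Junk folder as spam.
# sudo sa-learn --spam Maildir/.Junk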



--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Users mistake widespread adoption of Microsoft Office for the
  development of a document format standard.
-----------------------------------------------------------------------
 115 days since the first successful private support mission to ISS (SpaceX)
