--On Tuesday, September 23, 2003 07:26:08 -0700 [EMAIL PROTECTED] wrote:

1. update bogofilter's wordlists with every incoming message, using the
   -u option.  if i understand it, -u will first classify the spam, then
   update bogofilter's wordlist.  that seems like asking for trouble.
   if you filter to /dev/null based on bogofilter's output, how do you
   correct mistakes?  and it seems like mistakes here will cause more
   mistakes in the future.

i assume you do this with:

   :0fw
   | bogofilter -f -p -u -l -e -v

   also, shouldn't there be a "c" in the procmail colon line?  how does
   mail get past this recipe?  isn't it considered "delivered" when an
   email matches a recipe unless you use ":0c"?

A procmail recipe tagged with "f" is a filtering recipe. Procmail pipes the message through the specified program, then continues on using the filtered version of the message. It's not a delivering recipe, so "c" isn't needed.


I seeded bogofilter just like you did. I use maildirs for my email, so every message is in a separate file. I built a list of every message less than a year old, divided the list into spam and non-spam, and piped each set into bogofilter.
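
For anyone who wants to try the same seeding, here is a rough sketch of how it could be scripted. It is not the exact command I ran: the train function and the folder names in the comments are made up for illustration, and it registers one message at a time rather than piping a whole set. It uses only bogofilter's -s (register as spam) and -n (register as non-spam) options.

```shell
#!/bin/sh
# Hypothetical seeding helper; train() and the folder names are assumptions.
BOGOFILTER=${BOGOFILTER:-bogofilter}

# train FLAG MAILDIR
# Register every message under MAILDIR/cur and MAILDIR/new that was
# modified within the last 365 days, one message per bogofilter run.
# FLAG is -s for a spam folder or -n for a non-spam folder.
train() {
    flag=$1
    dir=$2
    find "$dir/cur" "$dir/new" -type f -mtime -365 -print |
    while IFS= read -r msg; do
        "$BOGOFILTER" "$flag" < "$msg"
    done
}

# Example calls (folder names are placeholders, adjust to your layout):
#   train -s "$HOME/Maildir/.spam"
#   train -n "$HOME/Maildir"
```

One bogofilter process per message is slow on a large archive, but it only has to be done once.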

Incoming mail is piped through this set of rules:

       :0 fw
       | /usr/bin/bogofilter -u -2 -p -e

       # Spam? Save it in the spam folder
       :0
       * ^X-Bogosity: (yes|spam)
       $SPAM

It's a good idea to collect your spam rather than deleting it. You might want to delete your wordlist one day and build a new one; you'll need a collection of current spam to do that. More important, any time bogofilter makes a mistake you need to correct it, whether it was a false positive or false negative. I can't remember the last time I found non-spam in my spam folder, but it does happen from time to time.

You'll need to find a method of feeding mail back into bogofilter that works for you. I copy the mail into a special mailbox that's swept by a cron job several times per day. These messages are fed back into procmail using a special set of rules:

# Messages labelled spam. Tell bogofilter it's not, and save to INBOX
:0
* ^X-Bogosity: (Spam|Yes)
{
       :0 c
       | /usr/bin/bogofilter -Sn

       :0
       $DEFAULT
}

# Messages not labelled spam. Tell bogofilter it is spam, and save to the spam folder
:0 E
{
       :0 c
       * ^X-Bogosity: (ham|no)
       | /usr/bin/bogofilter -Ns

       :0
       $SPAM
}

Note that I'm not using bogofilter as a filter this time. Without -p (passthrough mode) it doesn't output a new copy of the message, so the saved copy keeps its original, incorrect X-Bogosity header.
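
The cron sweep mentioned above could look something like the sketch below. Treat it as an illustration under assumptions, not my actual script: the sweep function, RETRAIN_DIR and RETRAIN_RC are invented names, the correction mailbox is assumed to be a maildir, and the rc file is assumed to hold the retraining rules and to set its own MAILDIR, DEFAULT and SPAM. It runs procmail with -m (mail filter mode), which takes the rc file on the command line.

```shell
#!/bin/sh
# Hypothetical cron sweep; all names and paths here are assumptions.
PROCMAIL=${PROCMAIL:-procmail}
RETRAIN_DIR=${RETRAIN_DIR:-$HOME/Maildir/.retrain}
RETRAIN_RC=${RETRAIN_RC:-$HOME/.procmailrc.retrain}

# sweep
# Feed each message in the correction maildir to procmail using the
# retraining rc file. A message is removed from the correction maildir
# only if procmail exits successfully, so failures are retried next run.
sweep() {
    for msg in "$RETRAIN_DIR"/new/* "$RETRAIN_DIR"/cur/*; do
        [ -f "$msg" ] || continue      # skip unmatched globs
        if "$PROCMAIL" -m "$RETRAIN_RC" < "$msg"; then
            rm -f "$msg"
        fi
    done
}

# A cron job would source this file and then call: sweep
```

Keeping the message when procmail fails means a broken rc file leaves the corrections queued instead of losing them.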
--
"We actually do 100,000 pages or more a day in Bork"
-- Marissa Mayer, Google
Kenneth Herron [EMAIL PROTECTED] 916-366-7338