Hi jdow,

On Fri, 24 Sep 2004, jdow wrote:

> From: "Gregory Zornetzer" <[EMAIL PROTECTED]>
>
<cut for easy reading>
> >  I would like to train the Bayesian filter with these
> > messages, so using pine, I put them in a mail folder called spam, and I
> > run sa-learn on it as follows:
> > sa-learn --spam --mbox --showdots mail/spam
> >
> > Generally, I notice that sa-learn processes exactly one more message than
> > I thought was in the folder.  When I take a look in the folder with a text
> > editor, I see that there's a fake message that reads as follows:
> > ---------
> > From MAILER-DAEMON Tue Dec  9 23:05:26 2003
> > Date: Tue, 9 Dec 2003 23:05:26 -0600
> > From: Mail System Internal Data <[EMAIL PROTECTED]>
>
> Gregory, I have a cure for that. It's ugly and involves a few dozen lines
> of C code.
>
> I use the C code to find the second "^From " in the file. I save
> everything after that including the "From " to ./training/spam_train
> for training. I save everything before that to its original file. I
> arranged to do this with safe saves so data loss won't happen. Once
> I have cleaned out the spam mailbox I run sa-learn on the spam_train
> mailbox. Finally I append all the spam_train messages to "oldspam",
> delete spam_train, and touch spam_train so it's present for the next
> round.
>
> I use the same generic code for learning ham as well as spam. I just
> change the input parameters around a little. It's all part of a
> script "satrain" that I run as a cron job once a day.
Makes sense.
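
For anyone wanting to try the same thing, here is a rough Perl sketch of the splitting step jdow describes. It is my own guess at the approach, not her "imapstrip" code. The file name stripmbox.pl and the subroutines slurp, split_at_second_from and strip_mailbox are made up for illustration, and the script does no mailbox locking. It keeps the first message (the folder-internal-data one) in the original mailbox and appends everything from the second "From " line onward to the training mailbox. It then rewrites the original through a temporary file plus rename() as a simple form of safe save.

```perl
#!/usr/bin/perl
# Hypothetical sketch, NOT jdow's "imapstrip": split an mbox at its
# second "From " separator line. No mailbox locking is done, so don't
# run it while pine has the folder open.
use strict;
use warnings;

# Read a whole file into a string ("" for an empty file).
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    local $/;                      # slurp mode
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : "";
}

# Return (head, rest): head is everything before the second "From " line
# (i.e. the first message, such as the folder-internal-data one); rest is
# that line plus everything after it, or "" if there is no second message.
sub split_at_second_from {
    my ($text) = @_;
    my ($seen, $pos) = (0, 0);
    for my $line (split /^/, $text) {   # /^/ splits into lines, keeping "\n"
        if ($line =~ /^From / && ++$seen == 2) {
            return (substr($text, 0, $pos), substr($text, $pos));
        }
        $pos += length $line;
    }
    return ($text, "");
}

# Append the real messages to $train first (a crash after this step leaves
# them duplicated rather than lost), then rewrite $mbox with only the first
# message via a temp file and rename(). Returns 1 if anything was moved.
sub strip_mailbox {
    my ($mbox, $train) = @_;
    my ($keep, $rest) = split_at_second_from(slurp($mbox));
    return 0 if $rest eq "";

    open my $out, '>>', $train or die "open $train: $!";
    print {$out} $rest or die "write $train: $!";
    close $out or die "close $train: $!";

    my $tmp = "$mbox.tmp.$$";      # hypothetical temp-file naming
    open my $fh, '>', $tmp or die "open $tmp: $!";
    print {$fh} $keep or die "write $tmp: $!";
    close $fh or die "close $tmp: $!";
    rename $tmp, $mbox or die "rename $tmp to $mbox: $!";
    return 1;
}

# Usage: perl stripmbox.pl MBOX TRAIN_MBOX
strip_mailbox(@ARGV) if @ARGV == 2;
```

Something like `perl stripmbox.pl mail/spam training/spam_train` followed by `sa-learn --spam --mbox training/spam_train` would then cover the first two steps of the cycle jdow outlines.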

>
> For one or two people this is quite satisfactory. For large numbers
> of users an alternative approach might be called for.
Heh, luckily it's just a single-user install, though I get the feeling
that others in my group might start pestering the sysadmin for system-wide
spam protection.

>
> I can send you the source for the "imapstrip" utility I built for
> doing this. (Imap and Ipop3 have the same header file these days.)
Ah, thanks for the tip. I'm going to guess that it looks pretty similar
to the following Perl code I just wrote (please excuse my lack of finesse
with Perl), except that this reads its input from stdin and writes to
stdout.


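#!/usr/bin/perl
# Drop the leading pine "folder internal data" pseudo-message (if any)
# from an mbox on STDIN and copy the remaining messages to STDOUT.
use strict;
use warnings;

my $line = <STDIN>;
exit 0 unless defined $line;      # empty input: nothing to do
if ($line =~ /^From\sMAILER-DAEMON/) {
   # Skip to the next "From " separator (or end of input).
   do {
        $line = <STDIN>;
   } until (!defined $line || $line =~ /^From\s/);
}
print $line if defined $line;
print while <STDIN>;              # copy the rest unchanged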


Guess it's time for me to write some scripts.
Thanks,
-Greg
