Hi jdow,

On Fri, 24 Sep 2004, jdow wrote:
> From: "Gregory Zornetzer" <[EMAIL PROTECTED]>
>
> <cut for easy reading>
>
> > I would like to train the Bayesian filter with these
> > messages, so using pine, I put them in a mail folder called spam, and I
> > run sa-learn on it as follows:
> >
> > sa-learn --spam --mbox --showdots mail/spam
> >
> > Generally, I notice that sa-learn processes exactly one more message than
> > I thought was in the folder. When I take a look in the folder with a text
> > editor, I see that there's a fake message that reads as follows:
> > ---------
> > >From MAILER-DAEMON Tue Dec 9 23:05:26 2003
> > Date: Tue, 9 Dec 2003 23:05:26 -0600
> > From: Mail System Internal Data <[EMAIL PROTECTED]>
>
> Gregory, I have a cure for that. It's ugly and involves a few dozen lines
> of C code.
>
> I use the C code to find the second "^From " in the file. I save
> everything after that including the "From " to ./training/spam_train
> for training. I save everything before that to its original file. I
> arranged to do this with safe saves so data loss won't happen. Once
> I have cleaned out the spam mailbox I run sa-learn on the spam_train
> mailbox. Finally I append all the spam_train messages to "oldspam",
> delete spam_train, and touch spam_train so it's present for the next
> round.
>
> I use the same generic code for learning ham as well as spam. I just
> change the input parameters around a little. It's all part of a
> script "satrain" that I run as a cron job once a day.

Makes sense.

> For one or two people this is quite satisfactory. For large numbers
> of users an alternative approach might be called for.

Heh, luckily, it's just a single-user install. Though I get the feeling
that others in my group might start pestering the sysadmin for system-wide
spam protection.

> I can send you the source for the "imapstrip" utility I built for
> doing this. (Imap and Ipop3 have the same header file these days.)

Ah - thanks for the tip.
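I'm going to take a guess and say that it looks pretty similar to the
following Perl code I just wrote (please excuse my lack of finesse with
Perl coding), except that this takes input on stdin and writes to stdout.

#!/usr/bin/perl
use strict;
use warnings;

# Read the first line. If the mailbox starts with the internal
# MAILER-DAEMON pseudo-message, skip ahead to the next "From " line
# (or to end of input if there is none).
my $line = <STDIN>;
if (defined $line && $line =~ /^From\sMAILER-DAEMON/) {
    do { $line = <STDIN> } until (!defined $line || $line =~ /^From\s/);
}

# Print the first line of the first real message (if any),
# then copy the rest of the input through unchanged.
print $line if defined $line;
while (<STDIN>) { print; }

Guess it's time for me to write some scripts.

Thanks,
-Greg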
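
For reference, the split step described in the quoted message (everything before the second "From " line stays behind, everything from it onward goes to a training mailbox) might be sketched in shell roughly as follows. This is only a sketch under assumptions, not the "imapstrip" utility itself: the function name split_mbox and the output paths are made up for illustration, it assumes the usual mbox convention that body lines starting with "From " are escaped, and it writes to two new files rather than doing the safe in-place save described in the quoted message.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread):
#   split_mbox SRC HEAD REST
# Copies everything before the second "From " line of SRC into HEAD
# (i.e. the first message, assumed to be the internal-data pseudo-message)
# and everything from the second "From " line onward into REST.
split_mbox() {
    # Make sure both outputs exist and are empty, even if awk writes nothing.
    : > "$2"
    : > "$3"
    awk -v head="$2" -v rest="$3" '
        /^From / { n++ }                 # count message separator lines
        n >= 2   { print > rest; next }  # second message onward
                 { print > head }        # first message only
    ' "$1"
}

# Illustrative use (paths are assumptions):
#   split_mbox mail/spam mail/spam.head training/spam_train
#   sa-learn --spam --mbox --showdots training/spam_train
```

Note that the split is unconditional: the first message goes to HEAD whether or not it is the MAILER-DAEMON pseudo-message, so a check like the one in the Perl filter above would be needed for mailboxes that do not start with it.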