Hi all, I recently installed SpamAssassin 3.0.0 on my Unix account on an SGI IRIX 6.5 box. I'm using Perl 5.8.5, and I generally read my email with pine, though sometimes I view it remotely using Evolution through the machine's IMAP server.
The following is the portion of my .procmailrc file that handles spamassassin filtering of my email:

    :0fw: spamassassin.lock
    * < 80000
    | spamassassin

    :0:
    * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
    mail/spam-definitely

    :0:
    * ^X-Spam-Status: Yes
    mail/spam-probably

I have noticed that the mail that gets into the spam-probably folder generally doesn't get autolearned by spamassassin. Also, I've noticed one message that snuck through the spam filter (it only got a score of 3, and I haven't gotten enough spams trained in the Bayesian filter to activate it).

I would like to train the Bayesian filter with these messages, so using pine, I put them in a mail folder called spam, and I run sa-learn on it as follows:

    sa-learn --spam --mbox --showdots mail/spam

Generally, I notice that sa-learn processes exactly one more message than I thought was in the folder. When I take a look in the folder with a text editor, I see that there's a fake message that reads as follows:

---------
>From MAILER-DAEMON Tue Dec 9 23:05:26 2003
Date: Tue, 9 Dec 2003 23:05:26 -0600
From: Mail System Internal Data <[EMAIL PROTECTED]>
Subject: DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA
X-IMAP: 0945113015 0000000396
Status: RO

This text is part of the internal format of your mail folder, and is not
a real message. It is created automatically by the mail system software.
If deleted, important folder data will be lost, and it will be re-created
with the data reset to initial values.
---------

I am worried that the Bayesian filter is learning this folder-internal-data message as spam and that this may skew the results of the filter in the future. Note that the folder-internal-data message appears to change whenever the mailbox is changed, so each time I run sa-learn, the message will get learned again rather than simply being passed over as an already-learned message.

I've found that some other people have asked a similar question in the past, but I didn't see any good answers to it.
Should I submit a bugzilla report on this? Any scripts to automagically strip out this message from an MBOX file?

Thanks very much,
Greg Zornetzer
gaz at nmrfam dot wisc dot edu
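
For illustration, here is a minimal Python sketch of the kind of script asked about above: it copies an mbox file while dropping any message that looks like the folder-internal-data pseudo-message. It is not part of SpamAssassin, and the function names (split_mbox, is_pseudo_message, strip_pseudo_messages) and the script name strip_pseudo.py are made up for this example. It assumes that every message starts at a line beginning with "From " and that the pseudo-message can be recognized by an X-IMAP header together with the subject line quoted above; both assumptions may not hold for every mail system.

```python
import re
import sys

# Subject line of the pseudo-message, copied from the example in this post.
PSEUDO_SUBJECT = b"DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA"


def split_mbox(data):
    """Split raw mbox bytes into messages.

    Assumption: a new message starts at every line beginning with b"From ".
    (Body lines that begin with "From " are normally escaped as ">From ".)
    """
    messages = []
    current = []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


def is_pseudo_message(message):
    """Return True if the header block has an X-IMAP header and the pseudo subject."""
    # The header block ends at the first blank line (LF or CRLF line endings).
    headers = re.split(rb"\r?\n\r?\n", message, maxsplit=1)[0]
    flags = re.MULTILINE | re.IGNORECASE
    has_ximap = re.search(rb"^X-IMAP:", headers, flags) is not None
    has_subject = (
        re.search(rb"^Subject:\s*" + re.escape(PSEUDO_SUBJECT), headers, flags)
        is not None
    )
    return has_ximap and has_subject


def strip_pseudo_messages(data):
    """Return the mbox bytes with every pseudo-message removed."""
    return b"".join(m for m in split_mbox(data) if not is_pseudo_message(m))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Reads the mbox named on the command line, writes the filtered copy to stdout.
    with open(sys.argv[1], "rb") as fh:
        sys.stdout.buffer.write(strip_pseudo_messages(fh.read()))
```

One way to use it would be to write the filtered copy somewhere outside the mail directory and train on that copy, leaving the original folder (and its internal data) untouched:

    python strip_pseudo.py mail/spam > /tmp/spam-clean
    sa-learn --spam --mbox --showdots /tmp/spam-clean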