Hi all,

I recently installed spamassassin 3.0.0 onto my Unix account on an SGI IRIX
6.5 box.  I'm using perl 5.8.5, and I generally read my email with pine,
though sometimes I view it remotely using Evolution through the
machine's IMAP server.

The following is a portion of my .procmailrc file that is used for
spamassassin filtering of my email:

:0fw: spamassassin.lock
* < 80000
| spamassassin

:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
mail/spam-definitely

:0:
* ^X-Spam-Status: Yes
mail/spam-probably


I have noticed that the mail that gets into the spam-probably folder
generally doesn't get autolearned by spamassassin.  Also, I've noticed
one message that snuck through the spam filter (it only got a score of 3,
and I haven't yet trained the Bayesian filter on enough spam to
activate it).  I would like to train the Bayesian filter with these
messages, so using pine, I put them in a mail folder called spam, and I
run sa-learn on it as follows:
sa-learn --spam --mbox --showdots mail/spam

Generally, I notice that sa-learn processes exactly one more message than
I thought was in the folder.  When I take a look in the folder with a text
editor, I see that there's a fake message that reads as follows:
---------
From MAILER-DAEMON Tue Dec  9 23:05:26 2003
Date: Tue, 9 Dec 2003 23:05:26 -0600
From: Mail System Internal Data <[EMAIL PROTECTED]>
Subject: DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA
X-IMAP: 0945113015 0000000396
Status: RO

This text is part of the internal format of your mail folder, and is not
a real message.  It is created automatically by the mail system software.
If deleted, important folder data will be lost, and it will be re-created
with the data reset to initial values.
---------
I am worried that the Bayesian filter is learning this
folder-internal-data message as spam and that this may skew the results of
the filter in the future.  Note that the folder-internal data message
appears to change when the mailbox is changed, so each time I run
sa-learn, the message will get learned again, and not simply passed over
as an already-learned message.

I've found that other people have asked a similar question in the past,
but I didn't see any good answers to it.  Should I submit a Bugzilla
report on this?  Are there any scripts to automagically strip this message
out of an MBOX file?
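
To make the request concrete, here is a minimal sketch of the kind of
stripping script meant above.  It assumes Python 3 and its standard
mailbox module are available (neither is mentioned in this post), and it
assumes the pseudo-message can be recognised by the Subject and X-IMAP
headers quoted above.  The function names are hypothetical, and this is
an illustration rather than a tested tool:

```python
import mailbox


def is_folder_internal_data(msg):
    """Heuristic match for the folder-internal-data pseudo-message.

    Assumption: it always carries an X-IMAP header and a Subject
    containing "FOLDER INTERNAL DATA", as in the example quoted above.
    """
    subject = str(msg.get("Subject", ""))
    return "FOLDER INTERNAL DATA" in subject and msg.get("X-IMAP") is not None


def strip_internal_data(src_path, dst_path):
    """Copy src_path to dst_path, leaving out the pseudo-message.

    The source folder is only read, never modified.  If dst_path already
    exists, messages are appended to it.  Returns (kept, dropped).
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    kept = dropped = 0
    try:
        for msg in src:
            if is_folder_internal_data(msg):
                dropped += 1
            else:
                dst.add(msg)
                kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept, dropped
```

Calling strip_internal_data("mail/spam", "mail/spam-clean") and then
running sa-learn --spam --mbox --showdots mail/spam-clean would train on
the cleaned copy only, leaving the original folder (and its internal
data) untouched for pine and the IMAP server.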

Thanks very much,
Greg Zornetzer
gaz at nmrfam dot wisc dot edu
