On Thu, 2009-07-09 at 08:50 -0400, Steve Bertrand wrote:
> My question is, given that the messages have already been processed by
> the 'cuda's (with their header stamps in place), am I damaging, or at
> risk of confusing the learning process of SA when I classify these
> messages as SPAM?
> 
Not really answering your question, but I find it's helpful to strip SA
headers out of the message collection I use for testing private rules.
Here's a simple bash shell script fragment that does the job and does it
fairly fast:

========================================================================
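for f in data/*.txt
do
        echo "Cleaning $f"
        gawk '
                # Copy lines by default.
                BEGIN           { act = "copy" }
                # Drop each X-Spam header line and start skipping.
                /^X-Spam/       { act = "skip"; next }
                # Folded continuation lines begin with whitespace; any
                # other line (header, blank, body) resumes copying.
                !/^[ \t]/       { act = "copy" }
                act == "copy"   { print }
        ' "$f" >temp.txt && mv temp.txt "$f"
done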
========================================================================
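
For a quick sanity check before running this over real mail, here is a
rough sketch of the header-stripping filter applied to a made-up message.
It is written so that a skipped header ends at the next line that does
not begin with whitespace. The function name strip_sa_headers and the
sample message are invented for illustration, and it uses plain awk on
the assumption that nothing gawk-specific is needed:

```shell
# Hypothetical helper (name invented for this sketch): reads a message on
# stdin and writes it to stdout without its X-Spam headers.
strip_sa_headers() {
        awk '
                BEGIN           { act = "copy" }
                # Drop the X-Spam header line itself and start skipping.
                /^X-Spam/       { act = "skip"; next }
                # Any line not starting with whitespace ends the skipped header.
                !/^[ \t]/       { act = "copy" }
                act == "copy"   { print }
        '
}

# Made-up sample message with a folded X-Spam-Status header.
sample=$(printf '%s\n' \
        'From: someone@example.com' \
        'X-Spam-Status: Yes, score=7.1' \
        '        tests=BAYES_99' \
        'X-Mailer: ExampleMailer' \
        'Subject: test' \
        '' \
        'xylophone body line')

cleaned=$(printf '%s\n' "$sample" | strip_sa_headers)
printf '%s\n' "$cleaned"
```

The X-Spam-Status line and its folded continuation should disappear,
while X-Mailer, the blank separator line and the body are left alone.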


Martin

