On Thu, 2009-07-09 at 08:50 -0400, Steve Bertrand wrote:
> My question is, given that the messages have already been processed by
> the 'cuda's (with their header stamps in place), am I damaging, or at
> risk of confusing the learning process of SA when I classify these
> messages as SPAM?
>
Not really answering your question, but I find it helpful to strip the
X-Spam-* headers that SA adds out of the message collection I use for
testing private rules. Here's a simple bash script fragment that does the
job, and does it fairly fast:
========================================================================
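for f in data/*.txt
do
    echo "Cleaning $f"
    # Drop X-Spam* header lines (and their indented continuation lines)
    # from the header block; copy every other header and the whole body.
    gawk '
    BEGIN { in_hdr = 1; act = "copy" }
    in_hdr && /^\r?$/ { in_hdr = 0; act = "copy" }    # blank line ends headers
    in_hdr && /^X-Spam/ { act = "skip" }                # start of an SA header
    in_hdr && /^[^ \t]/ && !/^X-Spam/ { act = "copy" }  # start of any other header
    act == "copy" { print }
    ' "$f" > temp.txt && mv temp.txt "$f"
done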
========================================================================
Martin