OK, not sure if this is handy to anyone, but here goes.

Horde/IMP webmail's "Report as Spam" link sends the reported mail as an attachment, which made it difficult for me to process. So I wrote a quick script that goes through a Maildir full of Horde/IMP "Report as Spam" mails and strips the unnecessary headers, leaving them ready to be fed to sa-learn.

I have attached the script.

Hope it is useful to someone somewhere.

jon

Loren Wilton wrote:
I now have a folder full of emails from legitimate users with verified
spam as attachments - the spam attachments appear to have all headers.

Can i just run "sa-learn --spam" on that folder? Does spamassassin know
to only look at the attachment?


No and no.


Or do i somehow have to extract these attachments first?


'Fraid so.

Likely someone already has some tools to do this sort of thing, but I don't,
so can't help there.


Any help would be most appreciated (i have been running sa-learn on the
whole folder, and am thinking that that might be wrong!)


Yea, it has learned that the headers from the users themselves constitute
spam keys, which probably isn't a real good thing for overall filtering
accuracy.  It has also learned to associate any filtering headers in the
forwarded messages as part of the spam.

        Loren



#!/usr/bin/perl
#
# iisjsoo
# 3 March 2004
#
# This script is used to clean up mail forwarded into a Maildir directory by
# Horde/IMP through the 'Report as Spam' link.
# After running this script you can then run 'sa-learn' on the directory to
# learn all the mail as spam or ham.
# This script can be safely run over the same directory again and again (useful
# if you're just going to let more mail accumulate).

use strict;
use File::Find;
@ARGV = ('.') unless @ARGV;

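if (@ARGV != 1)
{
  # Stop here instead of carrying on with an unexpected argument list.
  die "Usage: $0 directory-name\n";
}

sub process_file {
  # File::Find chdir()s into each directory and sets $_ to the file's basename.
  # Copy it into a lexical so that reading lines below does not overwrite it.
  my $filename = $_;
  my $outfile  = "$filename.out";
  my $count    = 0;   # number of Return-Path headers seen so far

  return unless -f $filename;

  # Three-argument open copes with leading whitespace or odd characters in names.
  open(my $in,  '<', $filename) or die "could not open $filename: $!";
  open(my $out, '>', $outfile)  or die "could not open $outfile: $!";
  while (my $line = <$in>)
  {
    # Everything from the second Return-Path onward is kept as the reported mail.
    $count++ if $line =~ /^Return-Path/;
    print $out $line if $count >= 2;
  }
  close($in);
  close($out);

  if ($count == 2) # This means there were the two 'Return-Path's
  {
    print "Moving $outfile to $filename\n";
    rename($outfile, $filename) or warn "could not rename $outfile: $!";
  }
  else
  {
    print "Did not receive two Return-Paths: $filename\n";
    unlink($outfile) or warn "could not remove $outfile: $!";
  }
}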

find(\&process_file, @ARGV);
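
To make it easier to see what the script keeps and what it throws away, here is a minimal sketch that applies the same Return-Path rule to a message held in memory. The helper name strip_wrapper and the sample message are made up for illustration; real Horde/IMP reports carry more headers and MIME structure than this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return everything
# from the second Return-Path header onward, or undef if the text does
# not contain exactly two Return-Path headers.
sub strip_wrapper {
  my ($text) = @_;
  my ($count, $kept) = (0, '');
  for my $line (split /^/, $text) {   # split into lines, keeping newlines
    $count++ if $line =~ /^Return-Path/;
    $kept .= $line if $count >= 2;
  }
  return $count == 2 ? $kept : undef;
}

# Made-up sample: a user's report wrapping the original spam.
my $report = <<'END';
Return-Path: <user@example.org>
Subject: Spam report

Return-Path: <spammer@example.com>
Subject: Buy now

spam body
END

# Prints only the part starting at the spammer's Return-Path line.
print strip_wrapper($report);
```

A file that has already been cleaned contains only one Return-Path, so the rule leaves it alone, which is why running the script repeatedly over the same directory is safe.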
