OK, not sure if this is handy to anyone,
but since Horde/IMP webmail's "Report as Spam" link forwards the reported mail as an attachment, which made it difficult for me to process, I thought I'd write a quick script that processes a Maildir full of Horde/IMP "Report as Spam" mails and strips off the unnecessary headers, leaving them ready to be fed to sa-learn.
I have attached the script.
Hope it is useful to someone somewhere.
jon
Loren Wilton wrote:
I now have a folder full of emails from legitimate users with verified spam as attachments; the spam attachments appear to have all their headers.
Can I just run "sa-learn --spam" on that folder? Does SpamAssassin know to look only at the attachment?
No and no.
Or do I somehow have to extract these attachments first?
'Fraid so.
Likely someone already has some tools to do this sort of thing, but I don't, so can't help there.
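For what it's worth, a MIME-aware way to do the extraction is to find the attached message/rfc822 part and keep only that. This is a rough core-Perl sketch, not an existing tool: it assumes the spam is attached as a message/rfc822 part of a single-level multipart message (worth checking against one of your forwarded mails), and extract_rfc822 is a hypothetical name. No decoding is attempted, since RFC 2046 only allows 7bit, 8bit or binary encoding for message/rfc822 bodies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the body of the first message/rfc822 part
# of a single-level multipart message (passed as one string), or undef.
sub extract_rfc822 {
    my ($raw) = @_;
    $raw =~ s/\r\n/\n/g;                      # normalise CRLF line endings

    # Top-level headers end at the first blank line.
    my ($head, $body) = split /\n\n/, $raw, 2;
    return unless defined $body;
    $head =~ s/\n[ \t]+/ /g;                  # unfold continuation lines

    # Boundary parameter of the top-level Content-Type header.
    my ($boundary) = $head =~ /^Content-Type:.*?boundary="?([^";\n]+)"?/mi;
    return unless defined $boundary;

    # Split the body on "--boundary" and the closing "--boundary--" lines.
    my @parts = split /^--\Q$boundary\E(?:--)?[ \t]*(?:\n|\z)/m, $body;
    for my $part (@parts) {
        my ($phead, $pbody) = split /\n\n/, $part, 2;
        next unless defined $pbody;           # preamble or empty part
        # No decoding: RFC 2046 restricts message/rfc822 to 7bit/8bit/binary.
        return $pbody if $phead =~ m{^Content-Type:\s*message/rfc822}mi;
    }
    return;
}
```

To use it on a Maildir you would slurp each file (local $/), pass the contents to extract_rfc822, and write the result back only when it returns something, so files that have already been cleaned are left alone.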
Any help would be most appreciated (I have been running sa-learn on the whole folder, and am thinking that might be wrong!)
Yeah, it has learned that the headers from the users themselves constitute spam keys, which probably isn't a real good thing for overall filtering accuracy. It has also learned to treat any filtering headers in the forwarded messages as part of the spam.
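If the whole folder has already been learned, sa-learn's --forget option can back those messages out before relearning. As far as I know, --forget has to be given the same unmodified messages that were learned (sa-learn recognises previously learned messages by their message ID), so it should be run before any stripping. The forget_command helper is hypothetical and only wraps the real sa-learn command line; the paths are whatever you pass in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the sa-learn command line that backs previously
# learned messages out of the Bayes database. sa-learn takes files or
# directories (one message per file) as arguments.
sub forget_command {
    my @paths = @_;
    return ('sa-learn', '--forget', @paths);
}

# Run it for the paths given on the command line, e.g. the Maildir's
# cur/ and new/ directories, BEFORE the cleanup script rewrites them.
if (@ARGV) {
    my @cmd = forget_command(@ARGV);
    system(@cmd) == 0 or die "@cmd failed (status $?)\n";
}
```

The list form of system avoids the shell, so Maildir file names need no quoting.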
Loren
#!/usr/bin/perl
#
# iisjsoo
# 3 March 2004
#
# This script is used to clean up mail forwarded into a Maildir directory by Horde/IMP through
# the 'Report as Spam' link.
# After running this script you can then run 'sa-learn' on the directory to learn all the mail
# as spam or ham.
# This script can be safely run over the same directory again and again (useful if you're just
# going to let more mail accumulate).

use strict;
use warnings;
use File::Find;

# Default to the current directory; accept at most one directory argument.
@ARGV = ('.') unless @ARGV;
if (@ARGV > 1)
{
    die "Usage: $0 [directory-name]\n";
}

sub process_file {
    # File::Find chdirs into each directory, so $_ is the bare file name.
    my $filename = $_;
    my $outfile  = "$filename.out";
    return unless -f $filename;

    my $count = 0;
    # Three-argument open copes with odd file names (e.g. leading spaces).
    open(my $in,  '<', $filename) or die "could not open $filename: $!";
    open(my $out, '>', $outfile)  or die "could not open output file $outfile: $!";
    while (my $line = <$in>)
    {
        # The first Return-Path belongs to the forwarding user's mail, the
        # second to the attached spam: keep everything from the second on.
        $count++ if $line =~ m/^Return-Path:/;
        print $out $line if $count >= 2;
    }
    close($in);
    close($out) or die "could not write $outfile: $!";

    if ($count == 2) # This means there were the two 'Return-Path's
    {
        print "Moving $outfile to $filename\n";
        # Perl's rename/unlink instead of backticked mv/rm: no shell quoting issues.
        rename($outfile, $filename) or die "could not rename $outfile: $!";
    }
    else
    {
        print "Did not receive two Return-Paths: $filename\n";
        unlink($outfile) or warn "could not remove $outfile: $!";
    }
}

find(\&process_file, @ARGV);
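The heart of process_file is the Return-Path counting. Pulled out into a pure function (strip_wrapper is a hypothetical name, not something in the script), it can be tested on its own, and the test also shows why reruns are safe: an already-cleaned file has only one Return-Path, so it is left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical refactor of the loop in process_file: given the lines of
# one forwarded message, return the lines from the second "Return-Path:"
# header onwards, or an empty list unless exactly two were seen.
sub strip_wrapper {
    my @lines = @_;
    my $count = 0;
    my @kept;
    for my $line (@lines) {
        $count++ if $line =~ /^Return-Path:/;
        push @kept, $line if $count >= 2;
    }
    return $count == 2 ? @kept : ();
}
```

One consequence of counting lines: a spam whose body contains a line starting with "Return-Path:" (a quoted bounce, say) gives a count of three, so that file is skipped rather than mangled. Once the directory is clean, sa-learn --spam can be pointed at the Maildir's cur/ and new/ directories.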
