Sorry for taking so long to get this out - I had a problem with our email servers not liking the spammy messages we were receiving (you may have noticed the bounced Bellsouth.com emails from my co-worker).
I have to restart spamd when I modify my badwords file. The file is read
into memory once, when spamd (or amavis, in my case) starts, so there is
no per-message I/O cost. Plus, it is loaded into a hash, so it takes up
very little memory. My concern about performance is the split it does on
every message to break the text into words - that's fine for the subject
(subjects usually have only a few words), but I don't think I'd run it
against the body, although it would be GREAT if I could.
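For anyone who missed the earlier posting, here is a rough sketch of the idea behind the subject check: build a hash of bad words once, then split each subject into words and look every word up. This is not the actual badwords code; `load_bad_words` and `subject_has_bad_word` are hypothetical names, and the sketch takes the words as arguments instead of reading the badwords file, so it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash from a list of bad words.
# In the real setup the words would come from the badwords file, read
# once at startup, so there is no per-message I/O.
sub load_bad_words {
    my @words = @_;
    my %bads;
    for my $w (@words) {
        $w = lc $w;
        $w =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $bads{$w} = 1 if length $w;
    }
    return \%bads;
}

# Hypothetical check: split the subject into words and look each one up.
# The split runs on every message; cheap for a short subject, costlier
# if it were run over a whole body.
sub subject_has_bad_word {
    my ($subject, $bads) = @_;
    for my $word (split /\W+/, lc $subject) {
        return 1 if $word ne '' && $bads->{$word};
    }
    return 0;
}

my $bads = load_bad_words('viagra', 'mortgage');
print subject_has_bad_word('Cheap VIAGRA here', $bads), "\n";          # prints 1
print subject_has_bad_word('Meeting notes for Monday', $bads), "\n";   # prints 0
```

The hash lookup is constant time per word, so the cost grows with the number of words in the text, not the size of the word list.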
You guys are welcome to use the code. It's blocking ~200 messages/day
on my servers, which receive about 15,000 per day (after my blackhole
list blocks).
I have one other rule that's pretty cool too - it blocks about 320
messages/day. It also requires a change to EvalTests.pm. I got tired
of seeing spam come in like this: vi<notatag>agra. Most HTML email
clients ignore unknown tags, so the user just sees "viagra", but since
the raw text doesn't say "Viagra" (or anything close to it), SA doesn't
pick it up. This bit of code counts the number of unknown HTML tags. If
the count falls within one threshold range, it adds a few points to the
spam score; if there are too many, it adds enough to block the message
outright.
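To see why this catches `vi<notatag>agra`, here is a stripped-down sketch of just the counting idea: pull the name out of each `<...>` tag and count it if it isn't in the known-tag hash. The tiny `%known` list is a stand-in for the real knowntags file, and `count_unknown_tags` / `visible_text` are hypothetical names for illustration; the real rule in EvalTests.pm also has the XML, attachment and URL exceptions.

```perl
use strict;
use warnings;

# Stand-in for the knowntags file: a handful of real HTML tag names.
my %known = map { $_ => 1 } qw(html head body p br b i a font table tr td);

# Count tags whose name is not in %known. Closing tags (</b>) are
# matched by name too, and the comparison is case-insensitive.
sub count_unknown_tags {
    my ($html) = @_;
    my $unknown = 0;
    while ($html =~ /<\/?([^\s>=\/]+)/g) {
        $unknown++ unless $known{ lc $1 };
    }
    return $unknown;
}

# Roughly what the reader sees: every tag dropped from the text.
sub visible_text {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;
    return $text;
}

my $spam = '<html><body>Buy vi<notatag>agra now</body></html>';
print visible_text($spam), "\n";
print count_unknown_tags($spam), "\n";
```

Running it prints "Buy viagra now" (what the reader sees) and a count of 1 for the made-up tag, even though the raw text never contains the word.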
This rule requires a control file: a list of known HTML tags (at least,
known to me), one per line. I've attached it here. Put it in
/etc/mail/spamassassin.
Here are the rules I added to local.cf. The first number is "at least
this many bad tags" and the second (optional) number is "no more than
this many bad tags":
rawbody BAD_TAGS2 eval:check_for_bad_tags('2','7')
describe BAD_TAGS2 Between 2-7 bad HTML tags in message
score BAD_TAGS2 4.0
rawbody BAD_TAGS8 eval:check_for_bad_tags('8')
describe BAD_TAGS8 8 or more bad HTML tags in message
score BAD_TAGS8 8.0
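Reading the two arguments as "at least" and "no more than" (both inclusive), the two rules together cover every count from 2 up, with no overlap and no gap at 8. A quick sketch of that range check; `in_range` is a made-up helper for illustration, not part of SA:

```perl
use strict;
use warnings;

# Inclusive range test: at least $min bad tags and, when $max is given,
# no more than $max. Mirrors the meaning of the two eval arguments.
sub in_range {
    my ($n, $min, $max) = @_;
    return 0 if $n < $min;
    return 0 if defined $max && $n > $max;
    return 1;
}

# Which of the two rules would fire for a given number of bad tags.
for my $n (0, 1, 2, 7, 8, 20) {
    my @hits;
    push @hits, 'BAD_TAGS2' if in_range($n, 2, 7);
    push @hits, 'BAD_TAGS8' if in_range($n, 8);
    printf "%2d bad tags: %s\n", $n, @hits ? join(',', @hits) : 'none';
}
```

Run as-is, it reports none for 0 and 1, BAD_TAGS2 for 2 and 7, and BAD_TAGS8 for 8 and 20.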
This one also uses very little memory: it loads the known HTML tags into
a hash once, and that's it. To add new tags, you have to restart spamd.
It ignores XML documents (they have lots of unknown tags) and messages
with attachments (in practice, any multipart message, since many Word
documents and zip files contain < and > characters). It also ignores
all non-HTML email - this rule only applies to HTML messages. It's also
smart enough to ignore URLs, since some email clients turn URLs into
<url.com>, and it ignores <mailto:...> tags too.
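Here is a small sketch of the link-like exceptions, for anyone who wants to check which "tags" get skipped before the known-tag lookup. `is_ignored_tag` is a hypothetical name, and the patterns are my own illustration of the exceptions in the code (URLs, www addresses, mailto:, relative paths, e-mail addresses, comments), not the exact regexes from the rule; in particular the e-mail address pattern is an assumption.

```perl
use strict;
use warnings;

# Decide whether the text right after a "<" is a link, address or
# comment rather than an HTML tag name. Illustrative patterns only.
sub is_ignored_tag {
    my ($name) = @_;
    $name = lc $name;
    return 1 if $name =~ m{^https?:};                   # <http://...>, <https://...>
    return 1 if $name =~ /^mailto:/;                     # <mailto:...>
    return 1 if $name =~ /^www\.\S+\.\S+/;               # <www.example.com>
    return 1 if $name =~ m{^\.\./};                      # relative paths like <../x>
    return 1 if $name =~ /^[^\s\@]+\@[^\s\@]+\.\S+$/;    # user@example.com (assumed pattern)
    return 1 if $name =~ /^!--/;                         # HTML comments
    return 0;
}

for my $t ('http://example.com/', 'mailto:bob@example.com', 'www.example.com',
           'bob@example.com', '!--', 'notatag', 'font') {
    printf "%-26s %s\n", $t, is_ignored_tag($t) ? 'ignored' : 'checked';
}
```

Everything in the sample list except `notatag` and `font` comes out as ignored; those two go on to the known-tag lookup, where `notatag` would be counted as bad.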
Here's the code change (it looks a lot like the other rule). Edit the
EvalTests.pm library that comes with SA (mine is in
/opt/local/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/EvalTests.pm):
use vars qw{
  $IP_ADDRESS $IPV4_ADDRESS
  $CCTLDS_WITH_LOTS_OF_OPEN_RELAYS
  $ROUND_THE_WORLD_RELAYERS
  $WORD_OBFUSCATION_CHARS
  $CHARSETS_LIKELY_TO_FP_AS_CAPS
  %TAGS
  %BADS
};
# %TAGS is the addition for this rule. %BADS is from my other rule (see
# my previous posting) and is not required for this one. Keep notes like
# these outside qw{}: inside it "#" is not a comment, and "use vars"
# would reject the extra words.
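# Load the known tags once, when spamd starts (restart spamd after editing).
if (open(TAGS, "< /etc/mail/spamassassin/knowntags")) {
  while (<TAGS>) {
    my $tag = lc($_);
    $tag =~ s/^\s+|\s+$//g;         # strip newline and stray whitespace
    $TAGS{$tag} = 1 if $tag ne "";
  }
  close(TAGS);
} else {
  warn "check_for_bad_tags: cannot open /etc/mail/spamassassin/knowntags: $!\n";
}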
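sub check_for_bad_tags {
  my ($self, $body, $count, $max) = @_;
  $max = "" if (!defined $max);     # the second threshold is optional
  my $message = "";
  my $invalidcount = 0;

  # Join the raw body into one lowercased string, dropping the newlines.
  foreach my $line (@$body) {
    chomp($message .= lc($line));
  }

  # Skip XML documents, SA's own reports, and multipart messages.
  return 0 if ($message =~ /<xml/ || $message =~ /<\?xml/);
  return 0 if ($message =~ /spamassassin spam filter/);
  return 0 if ($message =~ /boundary=/ && $message =~ /multipart/);

  # Only look at HTML messages.
  if ($message =~ /<body/ || $message =~ /<html/) {
    my @tags = split(/</, $message);
    shift @tags;                    # text before the first "<" is not a tag
    foreach my $tag (@tags) {
      my $thistag = "";
      if ($tag =~ /^\/?([^\s>=]*)[\s>=]/) {
        $thistag = $1;              # tag name, without a leading "/"
      }
      # Skip e-mail addresses, URLs, relative paths and mailto: links.
      if ($thistag !~ /\S{1,[EMAIL PROTECTED],}\.[\S]{1,}/ &&
          $thistag !~ /www\.\S{1,}\.\S{1,}/ && $thistag !~ /^http:/ &&
          $thistag !~ /^https:/ && $thistag !~ /^\.\./ && $thistag !~ /^mailto/) {
        # Count anything that is neither a known tag nor a comment.
        if (!$TAGS{$thistag} && $thistag !~ /^!--/) {
          $invalidcount++;
        }
      }
    }
  }

  # Hit when there are at least $count bad tags and, if given, no more than $max.
  if ($max ne "") {
    return 1 if ($invalidcount >= $count && $invalidcount <= $max);
  }
  else {
    return 1 if ($invalidcount >= $count);
  }
  return 0;
}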
-----Original Message-----
From: Chris Santerre [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 10, 2004 3:18 PM
To: [EMAIL PROTECTED]
Subject: RE: possibly a dumb comment, apologies if I'm being a n00b
> -----Original Message-----
> From: Hackworth, Keith A [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 10, 2004 1:12 PM
> To: R Michael Harman
> Cc: [EMAIL PROTECTED]
> Subject: RE: possibly a dumb comment, apologies if I'm being a n00b
>
>
> I've worked on one and it's working great for me. The only problem is
> that I had to modify the EvalTests.pm file that came in SA, so I'll have
> to add it again when I upgrade. The rule only applies to the subject.
> I haven't officially tested it, but it's running in my production
> environment due to its success. I'd appreciate any comments on this,
> besides me not officially testing it first 8+), and would REALLY
> appreciate it if someone could test it.
Yeah, I have a comment......FINALLY!!! :)
I've looked at this for a while and it seems really cool. But I don't
publish anyone's work like this unless they give me permission. This gem
was burning a hole in my hard drive for quite some time! :) My question is
for JM on this. Is there a penalty for reading this info from a file? Does
it get reread every time the rule is run? And finally, do changes to the
control file get picked up right away, or do you still need to restart spamd?
Also on the subject of OBFU, check out this link:
http://www.exit0.us/index.php/ChrissMediocreObfuScript
hth,
Chris
