Martin -

Interesting. How many mailboxes does your deployment cover?



-----Original Message-----
From: Martin Gregorie [mailto:mar...@gregorie.org] 
Sent: Thursday, April 25, 2013 8:08 PM
To: users@spamassassin.apache.org
Subject: Re: More longer rules or fewer shorter ones?

On Thu, 2013-04-25 at 18:45 -0400, Andrew Talbot wrote:

> I like your point about the portmanteau rules (and I award you two 
> Points for using one of my favorite words in a new - yet appropriate - 
> manner!).
> 
:-)


> I never thought about scoring each rule at 0.001 or something really 
> low, then tying them all together with meta-rules. It's been a while 
> since I separated everything out but I believe I have around 1000 
> different checks (most of them portmanteau'd) so it seems like those 
> meta rules would just get ... Messy. But it's a good idea, and I think 
> I can especially make use of it in my Specific Word list.
> 
The metas aren't too bad, though I must admit to building some of them as metas 
of metas to keep all lines down to 72 chars or so. Most of these submetas are 
simply lists of other rules that have been ANDed or ORed together.
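Building metas of metas by hand gets tedious, so here is a minimal Python sketch of how the line-length splitting could be automated. Everything in it is hypothetical: the function build_nested_meta, the __NAME_L<level>_<n> naming scheme and the greedy packing are assumptions, not part of any existing tool. It also assumes one operator is used throughout (all || or all &&), so regrouping the operands doesn't change the result.

```python
def build_nested_meta(name: str, subrules: list[str], op: str = "||",
                      width: int = 72) -> list[str]:
    """Return SpamAssassin 'meta' lines combining `subrules` with `op`.

    While the top-level meta line would be wider than `width`, the current
    operands are packed greedily into sub-metas (__NAME_L1_1, __NAME_L1_2, ...)
    and those sub-metas become the operands of the next level up.
    """
    joiner = f" {op} "
    lines: list[str] = []
    operands = list(subrules)
    level = 0
    while len(f"meta {name} ({joiner.join(operands)})") > width:
        level += 1
        groups: list[list[str]] = []
        current: list[str] = []
        for rule in operands:
            sub = f"__{name}_L{level}_{len(groups) + 1}"
            trial = f"meta {sub} ({joiner.join(current + [rule])})"
            if current and len(trial) > width:
                groups.append(current)  # this group is full; start the next one
                current = []
            current.append(rule)
        groups.append(current)
        if len(groups) == len(operands):
            break  # nothing could be combined; accept one long line instead
        operands = []
        for i, group in enumerate(groups, start=1):
            sub = f"__{name}_L{level}_{i}"
            lines.append(f"meta {sub} ({joiner.join(group)})")
            operands.append(sub)
    lines.append(f"meta {name} ({joiner.join(operands)})")
    return lines


rules = [f"__LOCAL_WORD_{i:02d}" for i in range(1, 13)]
for line in build_nested_meta("LOCAL_WORDS", rules):
    print(line)
```

With twelve subrule names, the sketch adds levels of sub-metas until every line, including the final "meta LOCAL_WORDS (...)" line, fits within 72 characters.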

You may find that the Portmanteau Generator reduces your rule count, because it 
too can generate metas. I use these to deal with situations where a term can 
appear in more than one place, e.g. a generated rule can have this form:

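describe GENRULE Example rule
header   __GR1   Reply-to =~ /(\@spam1\.com|\@spammer\.co\.uk|....)/
header   __GR2   From     =~ /(\@spam1\.com|\@spammer\.co\.uk|....)/
# uri rules take no header name: the pattern is matched against URIs in the body
uri      __GR3   /(\@spam1\.com|\@spammer\.co\.uk|....)/
# GENRULE fires if any of the three subrules above matched
meta     GENRULE (__GR1 || __GR2 || __GR3)
score    GENRULE 1.5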

which has two advantages. First, GENRULE is a single name that covers the same 
spammy term regardless of where it was used. Second, since each generated rule 
has its own source file, the three related lists are easier to edit together, 
and there's a good chance that a spammy term will be used in more than one of 
them.
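The Portmanteau Generator's own code isn't shown here, so purely as an illustration, here is a hypothetical Python sketch that emits a rule of the same shape (two header subrules, one uri subrule, a meta and a score) from a single list of terms. The function names and the __NAME_1..3 subrule naming are assumptions. It backslash-escapes regex metacharacters, the '/' delimiter and '@' before joining the terms into one alternation.

```python
# Characters escaped before a term goes into the /.../ pattern:
# regex metacharacters, the '/' delimiter, and '@' (escaped as \@ in rules).
SPECIAL = set("\\.^$|?*+()[]{}/@")


def escape_term(term: str) -> str:
    """Backslash-escape every character of `term` that is in SPECIAL."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)


def generate_rule(name: str, terms: list[str], score: float,
                  description: str) -> str:
    """Emit header, uri, meta and score lines matching any of `terms` in
    Reply-To, From or a URI, combined into one scored meta rule."""
    pattern = "/(" + "|".join(escape_term(t) for t in terms) + ")/"
    subrules = [f"__{name}_1", f"__{name}_2", f"__{name}_3"]
    lines = [
        f"describe {name} {description}",
        f"header   {subrules[0]} Reply-To =~ {pattern}",
        f"header   {subrules[1]} From     =~ {pattern}",
        f"uri      {subrules[2]} {pattern}",
        f"meta     {name} ({' || '.join(subrules)})",
        f"score    {name} {score}",
    ]
    return "\n".join(lines)


print(generate_rule("GENRULE", ["@spam1.com", "@spammer.co.uk"], 1.5,
                    "Example rule"))
```

Keeping the term list in one place means a new spammy domain is added once and lands in all three subrules at the same time.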
  
> Keeping the rules under 1-2 MB is a good rule of thumb to follow.
> Luckily we're nowhere near that point yet. 
> 
Nor am I. As I said, my biggest generated rule is a bit over 9 KB.

> Can I ask how many rules you have, and how many of those are meta 
> rules?
>
I have 31 portmanteau rules, of which 9 contain metas. Only 12 of these have a 
score exceeding 1.0, and these are not usually used as part of higher-level 
metarules.

My local.cf is where any very specific rules live, along with the higher-level 
metarules that use the low-scoring portmanteau rules. It contains 129 rules, 
which between them contain 96 'meta' statements. 36 of these have scores under 
1.0, so they are probably used as components of metarules. I got the total 
number of rules by using grep and wc to count the lines matching '^score', i.e. 
lines that start with 'score'.

My local.cf and portmanteau.cf files are both 29 KB in size.


Martin




