Re: [SAtalk] Re: Accumulator rules (Re: 'random' character sets)

David B Funk Fri, 07 Nov 2003 23:23:15 -0800

> DBF> On Fri, 7 Nov 2003, Robert Menschel wrote:
>
> >> Or better: what if we specified in the rule a maximum score to accumulate
> >> to? Maybe something like:
> >>
> >> accumbody  T_SAMPLE  /(?:word1|word2|word3|word4|word5)/i,max=2.5
> >> describe   T_SAMPLE  Message has medical words frequently used in spam
> >> score      T_SAMPLE  0.5
> >>
> >> Each time any of the five words was used, it'd score 0.5, to a maximum
> >> score of 2.5. No matter how long the message was, this rule could not by
> >> itself cause an FP, and would work in conjunction only with other rules
> >> to flag something as spam.
>
> DBF> A slight modification of the above idea, rather than 'max=2.5' have
> DBF> 'maxhits=5'. IE that particular rule fires no more than 5 times and then
> DBF> the matching engine can drop it and move on to the next rule.
>
> DBF> The final score would be 'nhits' * score. That way the matching engine
> DBF> does not need to worry about any score calculations, just tallying up
> DBF> number of matches.
> DBF> There should also be a default implicit 'maxhits' value to keep the
> DBF> matching process moving along and not slow things down too much. ;)
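A quick Python sketch, using hypothetical helper names, of why the two forms can be interchangeable. When the cap is a whole multiple of the per-hit score, clamping the total score at 2.5 gives the same result as clamping the hit count at 5, since 2.5 = 5 * 0.5. The 'maxhits' form only has to count.

```python
# Hypothetical helpers comparing the two proposed capping styles.

def capped_by_score(nhits: int, score: float, max_score: float) -> float:
    # 'max=' form: accumulate score per hit, then clamp the total.
    return min(nhits * score, max_score)


def capped_by_hits(nhits: int, score: float, maxhits: int) -> float:
    # 'maxhits=' form: clamp the hit count, then multiply once.
    return min(nhits, maxhits) * score


for n in range(10):
    print(n, capped_by_score(n, 0.5, 2.5), capped_by_hits(n, 0.5, 5))
```

If the cap is not a multiple of the per-hit score, the 'max=' form can stop partway through a hit's score, while the 'maxhits=' form always adds whole hits.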


OK, here's the next revision (I've put my programmer's hat on ;).

When parsing the config files and generating the rule structures, add
two new variables to each rule: 'maxhits', defaulting to 1, and 'nhits',
initialized to 0. If the rule has a "maxhits=n" argument, set maxhits
to n.
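A rough Python sketch of the parsing step. It assumes a simplified, hypothetical rule grammar (`body NAME /regex/flags[,maxhits=N]`) and is not how Conf.pm actually parses rules. It only shows the default of 1 being overridden by an optional argument.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: str
    flags: str
    maxhits: int = 1  # default: behave like a normal single-hit rule
    nhits: int = 0    # hit counter, starts at zero


# Hypothetical grammar: body NAME /regex/flags[,maxhits=N]
BODY_RE = re.compile(r'^body\s+(\S+)\s+/(.*)/([a-z]*)(?:,maxhits=(\d+))?\s*$')


def parse_body_rule(line: str) -> Rule:
    m = BODY_RE.match(line)
    if m is None:
        raise ValueError(f"not a body rule: {line!r}")
    name, pattern, flags, maxhits = m.groups()
    rule = Rule(name, pattern, flags)
    if maxhits is not None:
        rule.maxhits = int(maxhits)  # explicit "maxhits=n" overrides the default
    return rule
```

Because the argument is optional, existing rule lines parse exactly as before and simply keep maxhits = 1.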

When the matching engine evaluates a rule, increment nhits and decrement
maxhits on each hit. If maxhits < 1, stop evaluating that rule.
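A minimal Python sketch of that matching loop. The function name `run_rule` is hypothetical and stands in for the real eval engine. Python's `re.finditer` yields matches lazily, so breaking out of the loop really does stop scanning the rest of the message.

```python
import re


def run_rule(pattern: re.Pattern, text: str, maxhits: int = 1) -> int:
    """Count matches of pattern in text, stopping once maxhits runs out."""
    nhits = 0
    for _ in pattern.finditer(text):
        nhits += 1
        maxhits -= 1
        if maxhits < 1:
            break  # terminate this rule; the engine moves on to the next one
    return nhits


pat = re.compile(r"word\d", re.I)
text = "word1 word2 word3 word4 word5 word1 word2"
print(run_rule(pat, text), run_rule(pat, text, 5), run_rule(pat, text, 10))
```

With the default maxhits = 1 the loop stops at the first match, which matches today's single-hit behaviour.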

When scoring and running the meta-rules, treat 'nhits' as the "value"
of that rule, i.e. in boolean metas it is false if nhits == 0 and true
otherwise; in arithmetic metas it is the actual value of 'nhits'.
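A small Python illustration of the two contexts. The hit counts are made-up example values and `truth`/`value` are hypothetical helpers. A real meta-rule parser is not shown; the two meta expressions from the example further down are simply written out by hand.

```python
# Example hit counts after matching one message (illustrative values only).
nhits = {"__T_SAMPLE": 7, "__OTHER": 0}


def truth(name: str) -> bool:
    # Boolean context: 0 hits is false, any hits is true.
    return nhits.get(name, 0) != 0


def value(name: str) -> int:
    # Arithmetic context: the raw hit count.
    return nhits.get(name, 0)


# meta T_SAMPLE  (__T_SAMPLE)        -> boolean use
# meta T_SAMPLEA ( __T_SAMPLE > 5 )  -> arithmetic use
print(truth("__T_SAMPLE"), truth("__OTHER"), value("__T_SAMPLE") > 5)
# prints: True False True
```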

The score added to the message's 'hits' total is 'nhits' * score for
that rule.
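A Python sketch of the totalling step, using a hypothetical `message_score` helper and made-up rule names. Each rule contributes nhits * score, so a rule with zero hits adds nothing. When every rule keeps maxhits = 1, nhits is only ever 0 or 1, and the total is just the sum of the scores of the rules that fired.

```python
def message_score(nhits: dict[str, int], scores: dict[str, float]) -> float:
    # Each rule contributes nhits * score; a rule with 0 hits adds nothing.
    return sum(n * scores.get(name, 0.0) for name, n in nhits.items())


# Rule A hit 3 times, B never, C once (illustrative values only).
print(message_score({"A": 3, "B": 0, "C": 1}, {"A": 0.5, "B": 4.0, "C": 1.0}))
# prints: 2.5
```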

So if you leave maxhits = 1 for each rule (the default), everything
works as it does right now. The accumulating part only kicks in when a
"maxhits=n" argument is added to a particular rule.

Thus only one user-visible part of the config syntax would change: the
new "maxhits=" argument. All other rules would still look and act the
same by default.

In theory you need only modify two parts of the whole kit: Conf.pm, to
parse the optional "maxhits=n" argument, and the matching eval engine,
to keep searching while maxhits > 0 (if I understand the code correctly ;).

You'd probably also want to modify the reporting code.

Looking at Robert's 'T_SAMPLE' example from his previous message, you
would implement it as:

body      __T_SAMPLE  /(?:word1|word2|word3|word4|word5)/i,maxhits=6
meta      T_SAMPLE (__T_SAMPLE)
score     T_SAMPLE  0.5
describe  T_SAMPLE  Message has medical words frequently used in spam
meta      T_SAMPLEA ( __T_SAMPLE > 5 )
score     T_SAMPLEA 2.0


Dave

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
