> DBF> On Fri, 7 Nov 2003, Robert Menschel wrote:
>
> >> Or better: what if we specified in the rule a maximum score to accumulate
> >> to? Maybe something like:
> >>
> >> accumbody T_SAMPLE /(?:word1|word2|word3|word4|word5)/i,max=2.5
> >> describe T_SAMPLE Message has medical words frequently used in spam
> >> score T_SAMPLE 0.5
> >>
> >> Each time any of the five words was used, it'd score 0.5, to a maximum
> >> score of 2.5. No matter how long the message was, this rule could not by
> >> itself cause an FP, and would work in conjunction only with other rules
> >> to flag something as spam.
>
> DBF> A slight modification of the above idea, rather than 'max=2.5' have
> DBF> 'maxhits=5'. IE that particular rule fires no more than 5 times and then
> DBF> the matching engine can drop it and move on to the next rule.
>
> DBF> The final score would be 'nhits' * score. That way the matching engine
> DBF> does not need to worry about any score calculations, just tallying up
> DBF> number of matches.
> DBF> There should also be a default implicit 'maxhits' value to keep the
> DBF> matching process moving along and not slow things down too much. ;)

OK, here's the next revision (I've put my programmer's hat on ;).

When parsing the config files and generating the rule structures, add two
new variables to each rule: 'maxhits', defaulting to 1, and 'nhits',
initialized to 0. If the rule has a "maxhits=n" argument, set maxhits to
that value.

When running the matching engine eval for a rule, each time there's a hit,
increment nhits and decrement maxhits. If maxhits < 1, terminate that rule.

In the scoring, and when running the meta-rules, consider 'nhits' to be the
"value" of that rule. I.e., for boolean purposes the rule is false if
nhits == 0 and true if nhits != 0; for arithmetic metas, it's the actual
value of 'nhits'. The score part, added to 'hits', is 'nhits' * score for
that rule.

So if you leave maxhits = 1 for each rule (the default), everything works
as it does right now. The accumulating part only kicks in if a "maxhits=n"
argument is added to a particular rule.

Thus you would only need to modify one user-visible part of the conf stuff:
add the "maxhits=" argument. All other rule stuff would still look & act
the same (by default). Theoretically you need only modify two parts of the
whole kit: Conf.pm, to parse the optional "maxhits=n" argument, and the
matching eval engine, to keep searching if maxhits > 0 (if I understand the
code correctly ;). Probably also want to modify the report stuff.

Looking at Robert's 'T_SAMPLE' example from his previous message, you would
implement it as:

  body     __T_SAMPLE  /(?:word1|word2|word3|word4|word5)/i,maxhits=6
  meta     T_SAMPLE    (__T_SAMPLE)
  score    T_SAMPLE    0.5
  describe T_SAMPLE    Message has medical words frequently used in spam
  meta     T_SAMPLEA   ( __T_SAMPLE > 5 )
  score    T_SAMPLEA   2.0

Dave

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better.
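
The bookkeeping described above can be sketched roughly as follows. This is
only an illustration in Python, not SpamAssassin's Perl code: the names
Rule, run_rule and rule_score are hypothetical, and the sketch counts down
a copy of maxhits (rather than the stored value) so that a rule can be run
against more than one message.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule record; field names follow the message above."""
    name: str
    pattern: re.Pattern
    score: float = 0.0
    maxhits: int = 1   # default of 1 keeps the existing fire-once behaviour
    nhits: int = 0


def run_rule(rule: Rule, body: str) -> int:
    """Count matches in body, stopping once the maxhits budget is used up."""
    remaining = rule.maxhits      # assumption: count down a copy, not the config
    rule.nhits = 0
    for _ in rule.pattern.finditer(body):
        rule.nhits += 1
        remaining -= 1
        if remaining < 1:         # budget exhausted: drop the rule and move on
            break
    return rule.nhits


def rule_score(rule: Rule) -> float:
    """Contribution to the message total: nhits * score."""
    return rule.nhits * rule.score


words = re.compile(r"(?:word1|word2|word3|word4|word5)", re.I)
body = "word1 word2 WORD3 word1 word5 word4 word2 word3"   # eight matches

# Sub-rule with maxhits=6, as in the __T_SAMPLE example.
sub = Rule("__T_SAMPLE", words, maxhits=6)
run_rule(sub, body)
t_sample_a = sub.nhits > 5        # arithmetic meta: ( __T_SAMPLE > 5 )

# A rule without maxhits= stops at the first hit, as rules do today.
plain = Rule("PLAIN", words, score=0.5)
run_rule(plain, body)

# An accumulating rule: 0.5 per hit, at most five hits.
accum = Rule("ACCUM", words, score=0.5, maxhits=5)
run_rule(accum, body)
```

With this message body the capped sub-rule stops after six of the eight
matches, so the "> 5" meta is true; the default rule still contributes a
single 0.5, and the maxhits=5 rule cannot contribute more than 2.5 however
long the message is.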
B{

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk