On 2017-03-14 16:23, Kris Deugau wrote:
mar...@mejor.pl wrote:
Hi!
Thanks to AXB, seek-phrases-in-log works OK. Now I'm on the next
step: creating rules automatically.
I suspect that mk_meta_rule_scores doesn't assign scores correctly. I
set in mk_meta_rule_scores:
my %scoremap = (
  '70' => '1.5',
  '4' => '2.0',
  '0.01' => '3.0',
);

If I understand correctly (quite possible I don't;  I haven't dug in
to the internals of this stage), the %scoremap hash above indicates
the percentage of the messages that need to hit on a subrule for that
subrule to be included in one of the meta rules.

So, 70% of the mail in the set would need to hit on a subrule for it
to be included in the first group, 4% in the second, and 0.01% in the
third.

If I read the information flow correctly, this is actually decided by
seek-phrases-in-log, which spits out subrules that reached a certain
hit rate in blocks, followed by the "# passed hit-rate threshold nnn"
line. mk_meta_rule_scores just takes that in, collects the rule names
in each block, and spits out the meta.
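
The flow described above can be sketched roughly as follows. This is a Python illustration, not the actual Perl script: the "body __NAME /pattern/" subrule shape, the SEEK_META name prefix, and the helper names group_subrules and make_metas are assumptions made for the example. Only the "# passed hit-rate threshold" marker and the %scoremap values come from this thread.

```python
import re

# Marker line as quoted in this thread; the exact number format printed
# by the real tool is an assumption.
THRESHOLD_RE = re.compile(r"^# passed hit-rate threshold\s+(\S+)")


def group_subrules(lines):
    """Collect subrule names into blocks, one block per marker line.

    Returns a list of (threshold, [rule names]) in input order.
    """
    blocks = []
    current = []
    for line in lines:
        match = THRESHOLD_RE.match(line)
        if match:
            # The marker closes the block of subrules printed before it.
            blocks.append((match.group(1), current))
            current = []
            continue
        parts = line.split()
        # Assumed subrule shape: "body __NAME /pattern/"
        if len(parts) >= 2 and parts[0] == "body":
            current.append(parts[1])
    return blocks


def make_metas(blocks, scoremap, prefix="SEEK_META"):
    """Emit one meta line (and a score line, if mapped) per non-empty block."""
    out = []
    for index, (threshold, names) in enumerate(blocks, 1):
        if not names:
            continue
        name = f"{prefix}_{index}"
        out.append(f"meta {name} ({' || '.join(names)})")
        score = scoremap.get(threshold)  # exact string lookup
        if score is not None:
            out.append(f"score {name} {score}")
    return out


sample = [
    "body __SEEK_A /cheap pills/",
    "# passed hit-rate threshold 4",
    "body __SEEK_B /act now/",
    "body __SEEK_C /limited offer/",
    "# passed hit-rate threshold 0.01",
]
scoremap = {"70": "1.5", "4": "2.0", "0.01": "3.0"}
metas = make_metas(group_subrules(sample), scoremap)
print("\n".join(metas))
```

Note that the sketch looks thresholds up as exact strings. That is a property of the sketch only, not a claim about how the real script behaves.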


I ran some tests and watched the output to understand how some of the parameters work. It seems to me that "--reqhitrate" works this way:

a) If --reqhitrate contains only one value, the output of seek-phrases-in-log contains only rules that hit more than that value. So this cuts off rules that hit rarely.

b) If --reqhitrate contains more than one value, the output is laid out as:
<high cutoff, equal to the highest value passed to --reqhitrate> rules <second value> other rules <low cutoff>
Example:
--reqhitrate "70 10 1" gives:
<100%> - no rules here - <70%> - rules that match less than 70% of spam and more than 10% - <10%> - rules that match less than 10% of spam and more than 1% - <1%> - cutoff, no rules here
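
To make the observation above concrete, here is a small Python sketch of the banding as described. It models only the behaviour observed in these tests, not the tool's actual code. The function name band_for and the handling of values that fall exactly on a threshold are assumptions.

```python
def band_for(hit_rate, thresholds):
    """Return the index of the band a hit rate falls into, or None if cut off.

    thresholds: values as passed to --reqhitrate, e.g. [70, 10, 1].
    Band 0 lies between the highest and second-highest value, and so on.
    """
    ts = sorted(thresholds, reverse=True)
    if len(ts) == 1:
        # Case (a): a single value is a plain lower cutoff.
        return 0 if hit_rate > ts[0] else None
    # Case (b): nothing at or above the highest value,
    # nothing at or below the lowest value.
    if hit_rate >= ts[0] or hit_rate <= ts[-1]:
        return None
    for i in range(1, len(ts)):
        if hit_rate > ts[i]:
            return i - 1
    return None


for rate in (80, 50, 5, 0.5):
    print(rate, band_for(rate, [70, 10, 1]))
```

With "70 10 1", a rule hitting 50% of spam lands in the first band, one hitting 5% lands in the second, and rules at 80% or 0.5% are dropped.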

(In fact the stock setup refers to mk_meta_rule, although it's nearly
identical to mk_meta_rule_scores.)

By raising the hit percentage to 70%, you're requiring that 70% of the
spam you're using must hit on one of the subrules.  TBH, by that
point, you may as well hand-extract a couple of the subrules and make
them static, standalone rules.

Have you tried using mk_meta_rule_scores, and did you get more than two score values? I get only two: the default and the mid-range value. I suspect that mk_meta_rule_scores doesn't play well with ranges. It is something I can live with, but if there is a bug somewhere I would try to report it. Even if it is not fixed, a report could save some time for other users trying to use this script.


My experience has been that you need lots of mail to generate multiple
metas in any case;  I've taken a different tack and separated out
different groups of mail for different generated rule sets instead.


Thanks,
Marcin
