Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Loren Wilton

> I guess the risk is exactly the same as rulenames colliding.. better not
> use very generic names and you can always prepend the rulename yourself. :-)


My other concern is that as far as I know, SA rules are still limited to a
single line of text. If the rule name plus item name gets long, the rule
text using rule_name:item_name starts to become very long and unreadable,
especially when multiple items are used in a single rule body.


   Loren



Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread John Hardin

On Sat, 8 May 2021, Henrik K wrote:

> On Fri, May 07, 2021 at 02:44:48PM -0700, John Hardin wrote:
> > On Fri, 7 May 2021, Loren Wilton wrote:
> >
> > > The only nitpick I'd offer is that I'd prefer that the capture tokens be
> > > at a single level, like rule names. So you might get:
> > >
> > > > $pms->{captured_values}->{NAME} = $+{NAME};
> > > >
> > > > Then use it in a rule:
> > > >
> > > > body MATCHER /My name is ${NAME}/
> >
> > The risk with that is rules from multiple sources using colliding variable
> > names.
> >
> >   body MATCHER /My name is ${FROM_NAME:NAME}/
> >
> > ...is explicit and doesn't carry that risk.
>
> I guess the risk is exactly the same as rulenames colliding.. better not
> use very generic names and you can always prepend the rulename yourself. :-)


heh. True.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Tomorrow: the 76th anniversary of VE day


Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Henrik K
On Fri, May 07, 2021 at 02:44:48PM -0700, John Hardin wrote:
> On Fri, 7 May 2021, Loren Wilton wrote:
> 
> > The only nitpick I'd offer is that I'd prefer that the capture tokens be
> > at a single level, like rule names. So you might get:
> > 
> > > $pms->{captured_values}->{NAME} = $+{NAME};
> > > 
> > > Then use it in a rule:
> > > 
> > > body MATCHER /My name is ${NAME}/
> 
> The risk with that is rules from multiple sources using colliding variable
> names.
> 
>   body MATCHER /My name is ${FROM_NAME:NAME}/
> 
> ...is explicit and doesn't carry that risk.

I guess the risk is exactly the same as rulenames colliding.. better not
use very generic names and you can always prepend the rulename yourself. :-)



Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread John Hardin

On Fri, 7 May 2021, Loren Wilton wrote:

> The only nitpick I'd offer is that I'd prefer that the capture tokens be at a
> single level, like rule names. So you might get:
>
> > $pms->{captured_values}->{NAME} = $+{NAME};
> >
> > Then use it in a rule:
> >
> > body MATCHER /My name is ${NAME}/


The risk with that is rules from multiple sources using colliding 
variable names.


  body MATCHER /My name is ${FROM_NAME:NAME}/

...is explicit and doesn't carry that risk.




Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Henrik K
On Fri, May 07, 2021 at 10:34:10AM -0700, Loren Wilton wrote:
>
> The only nitpick I'd offer is that I'd prefer that the capture tokens be at
> a single level, like rule names. So you might get:
> > 
> > body MATCHER /My name is ${NAME}/

Yes, probably more flexible this way.



Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Loren Wilton

> Perl already has named capture groups as legit syntax, so it would be most
> simple to actually use them.
>
> https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)
>
> header FROM_NAME /^From: "(?<NAME>\w+)/


Good. I thought there was something there, but I didn't remember the exact
syntax and was too lazy to dig it out. Works for me.




> ... just save the matches in the rule code
> $pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};
>
> Then use it in a rule:
>
> body MATCHER /My name is ${FROM_NAME:NAME}/
>
> Don't nitpick on ${}, could be any similar syntax.  Code adds this rule to
> FROM_NAME dependency chain.  When FROM_NAME hits, run MATCHER regex
> (obviously first recompile the regexp).


The only nitpick I'd offer is that I'd prefer that the capture tokens be at 
a single level, like rule names. So you might get:



$pms->{captured_values}->{NAME} = $+{NAME};

Then use it in a rule:

body MATCHER /My name is ${NAME}/


   Loren



[Bug 7735] Meta rules need to handle missing/unrun dependencies

2021-05-07 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #9 from Loren Wilton  ---
Minor correction to the previous post. I said there that there was a sync rule
list and a sync meta list, and the sync rules were processed in priority order,
then the sync meta rules. This isn't quite correct.

The two sync lists need to be interleaved. The sync rules of priority N are
run, then the sync meta rules of the same priority. The list assignment
processing will have seen to it that a meta rule is at a higher (later
evaluation) priority than any rules that it depends on.
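
As a rough illustration of that interleaving, here is a minimal Python sketch. It is not SpamAssassin code: the `interleave` helper and the rule names are made up, and it assumes lower priority numbers are evaluated first.

```python
from collections import defaultdict

def interleave(sync_rules, sync_metas):
    """Return the evaluation order: for each priority N, the sync rules
    of priority N, then the sync metas of priority N.

    Both arguments are lists of (name, priority) pairs.
    """
    by_prio = defaultdict(lambda: ([], []))
    for name, prio in sync_rules:
        by_prio[prio][0].append(name)
    for name, prio in sync_metas:
        by_prio[prio][1].append(name)

    order = []
    for prio in sorted(by_prio):
        rules, metas = by_prio[prio]
        order.extend(rules)   # plain rules of this priority first
        order.extend(metas)   # then the metas of the same priority
    return order

# Hypothetical rules and metas, purely for illustration.
rules = [("__A", 0), ("__B", 10), ("__C", 0)]
metas = [("META_AC", 0), ("META_B", 10)]
print(interleave(rules, metas))
```

This only shows the ordering. It relies on the list assignment step having already placed each meta at a priority no lower than its dependencies.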

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7735] Meta rules need to handle missing/unrun dependencies

2021-05-07 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

Loren Wilton  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lwil...@earthlink.net

--- Comment #8 from Loren Wilton  ---
Just thinking off the top of my head, I think I'd consider something along the
following lines.

1. Conceptually build four prioritized tree-lists of rules:
 a. non-net, non-meta
 b. non-net, meta
 c. net, non-meta
 d. net, meta

The tree nodes are the priority order, for each priority where rules exist
there is a list of rules of that priority. The rules are evaluated in tree
order and branch order, in the order given above.

While I've described this as four trees of lists, it could equally be done with
one partitioned tree, where each partition is assigned a base rule priority
higher than any rule priority that can be manually assigned, for instance 0-99,
100-199, etc. The goal is to create the above-described rule evaluation order
constraints.
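
A small Python sketch of the single partitioned tree idea, purely illustrative. The 0..99 range for manual priorities and the `effective_priority` helper are assumptions made for the example, not anything SpamAssassin defines.

```python
# Hypothetical assumption: manually assigned priorities fall in 0..99.
PARTITION_SIZE = 100

def effective_priority(is_net, is_meta, priority):
    """Map (partition, manual priority) onto one sortable number.

    Partition order follows the list above:
    0 = non-net non-meta, 1 = non-net meta, 2 = net non-meta, 3 = net meta.
    """
    if not 0 <= priority < PARTITION_SIZE:
        raise ValueError("priority outside the assumed 0..99 range")
    partition = (2 if is_net else 0) + (1 if is_meta else 0)
    return partition * PARTITION_SIZE + priority

# (name, is_net, is_meta, manual priority); all names are made up.
rules = [
    ("NET_META", True, True, 0),
    ("PLAIN_LATE", False, False, 50),
    ("NET_RULE", True, False, 10),
    ("LOCAL_META", False, True, 0),
    ("PLAIN_EARLY", False, False, 0),
]
order = sorted(rules, key=lambda r: effective_priority(r[1], r[2], r[3]))
print([name for name, *_ in order])
```

Sorting on the combined number gives the same evaluation order as walking four separate trees one after the other.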

Non-meta rules are trivially assigned a priority, as it is a given for each
rule.

Meta rules may have been given priorities, but that needs to be considered as a
minimum priority, and can only be used to delay the meta evaluation.

As an implementation I think I'd initially throw all meta rules into a
bucket/list of some sort, remembering their priorities, but not assigning them
to the trees until all non-meta rules are assigned to the correct tree nodes.

Then I would pull meta rules out of the bucket and look up each referenced
rule. I would remember the highest tree node of any referenced rule.

If all referenced rules are found for a meta, it does not depend on any
still-unplaced meta rules, and can be assigned to the meta tree at the level of
the highest dependency rule found, or at the level given for the meta itself if
that is higher.

If not all dependencies are found, I would throw it at the back of the
unprocessed meta list, with a flag saying it had been seen once. (Alternately,
into a reprocessing meta list separate from the current list.) 

When the current meta list is depleted, it can be replaced with the postponed
meta list, and that list again processed exactly as above. A flag can be set if
at least one meta is removed from the list into a tree. Any metas still with
unresolved dependencies can again be thrown back into an alternate pool. As
long as at least one meta is removed from the unprocessed list the reprocessing
can be repeated. If all remaining metas are passed and none are placed into the
tree, the remaining metas are either circular or depend on undefined rules. I
would throw them out as errors at that point.
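
The repeated-pass placement described above could be sketched roughly like this in Python. It is a simplified illustration, not SpamAssassin code: `place_metas`, the dict layout and the rule names are assumptions, and a single level map stands in for the trees.

```python
def place_metas(rule_levels, metas):
    """Assign each meta a level by repeated passes over the unplaced metas.

    rule_levels: {rule name: priority} for the already placed non-meta rules.
    metas: {meta name: (minimum priority, [dependency names])}.
    Returns (levels, errors). errors lists the metas that never resolved,
    i.e. circular ones or ones depending on undefined rules.
    """
    levels = dict(rule_levels)
    pending = list(metas)
    while pending:
        postponed = []
        progress = False   # set when at least one meta is placed in this pass
        for name in pending:
            min_prio, deps = metas[name]
            if all(dep in levels for dep in deps):
                highest_dep = max((levels[dep] for dep in deps), default=min_prio)
                # The given priority is only a minimum; dependencies can delay it.
                levels[name] = max(min_prio, highest_dep)
                progress = True
            else:
                postponed.append(name)
        if not progress:
            return levels, postponed
        pending = postponed
    return levels, []

# Hypothetical rule set for illustration.
rule_levels = {"__A": 0, "__B": 5}
metas = {
    "M2": (10, ["M1"]),          # depends on another meta, resolved on pass 2
    "M1": (0, ["__A", "__B"]),
    "M3": (0, ["M4"]),           # circular with M4
    "M4": (0, ["M3"]),
    "M5": (0, ["__MISSING"]),    # depends on an undefined rule
}
levels, errors = place_metas(rule_levels, metas)
print(levels["M1"], levels["M2"], errors)
```

Note that a meta depending on another meta can land on the same level as that meta here, so a real implementation would still need to order metas within a level, or bump the dependent one level up.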

Obviously there are faster ways to perform the above evaluation, but since it
only needs to be done once, simplicity may be a virtue in the implementation.

In any case, the end result is an ordered list of rules to evaluate in the
order they are found in the list. The non-net stuff gets done quickly, and the
net rules queued quickly.

Metas that depend on async net rules are a pain because the net results are
returned out of order. I can't think of a clever way to deal with this, but a
simple brute-force approach is easy. Each net rule needs a list of the metas
that depend on it. Each net meta needs a count of dependencies required to
complete, and a counter initialized to zero. As each net rule completes, it
walks its dependent list and increments the counter of each dependent meta. If
the incremented counter matches the total dependency count, the meta rule can
be run.
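
A brute-force sketch of that counter scheme in Python, for illustration only. The `NetMeta` class, the callback and the rule names are hypothetical, and it assumes each net rule completes exactly once and appears at most once in a meta's dependency list.

```python
from collections import defaultdict

class NetMeta:
    def __init__(self, name, deps):
        self.name = name
        self.deps = list(deps)
        self.needed = len(self.deps)  # dependencies required to complete
        self.completed = 0            # counter initialized to zero

def build_dependents(metas):
    """Give each net rule the list of metas that depend on it."""
    dependents = defaultdict(list)
    for meta in metas:
        for dep in meta.deps:
            dependents[dep].append(meta)
    return dependents

def on_net_rule_complete(rule_name, dependents, run_meta):
    """Called when an async net rule finishes, in whatever order that happens."""
    for meta in dependents.get(rule_name, []):
        meta.completed += 1
        if meta.completed == meta.needed:
            run_meta(meta)

# Hypothetical metas; the results arrive out of order.
metas = [
    NetMeta("META_DNS", ["__RBL_A", "__RBL_B"]),
    NetMeta("META_ONE", ["__RBL_B"]),
]
dependents = build_dependents(metas)
ran = []
for finished in ["__RBL_B", "__RBL_A"]:
    on_net_rule_complete(finished, dependents, lambda m: ran.append(m.name))
print(ran)
```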

Overall I think that will handle getting things into the right evaluation order
fairly easily and also evaluating them in the correct order without too much
pain.

(Of course I don't recall what the short-circuit flag does exactly. It probably
throws a wrench in the above evaluation mechanism.)


Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Henrik K
On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
> > > >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> > > 
> > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
> > 
> > It might as well be a tflag or something.  Why limit capturing to headers
> > only?
> 
> I hadn't intended it to be limited to headers only, but I guess the syntax
> would have to be a little different for raw, body, full, etc, since they
> don't have a part keyword in the rule syntax.

Perl already has named capture groups as legit syntax, so it would be most
simple to actually use them.

https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)

header FROM_NAME /^From: "(?<NAME>\w+)/

... just save the matches in the rule code
$pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};

Then use it in a rule:

body MATCHER /My name is ${FROM_NAME:NAME}/

Don't nitpick on ${}, could be any similar syntax.  Code adds this rule to
FROM_NAME dependency chain.  When FROM_NAME hits, run MATCHER regex
(obviously first recompile the regexp).
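
To make the flow concrete, here is a hedged Python sketch of capture and substitution. It is not the proposed SpamAssassin implementation: the `expand` helper and the sample message are made up, the `${RULE:NAME}` placeholder is just the syntax floated above, and Python spells named groups `(?P<NAME>...)` where Perl uses `(?<NAME>...)` and `%+`.

```python
import re

# Matches the ${RULE:NAME} placeholder syntax used in the example above.
PLACEHOLDER = re.compile(r"\$\{(\w+):(\w+)\}")

def expand(pattern, captured_values):
    """Replace each ${RULE:NAME} with the escaped captured text.

    Returns None when a referenced capture is missing, i.e. the rule it
    depends on has not hit, so the dependent regex should not be run.
    """
    missing = False

    def substitute(match):
        nonlocal missing
        rule, name = match.group(1), match.group(2)
        value = captured_values.get(rule, {}).get(name)
        if value is None:
            missing = True
            return ""
        return re.escape(value)  # captured text must match literally

    expanded = PLACEHOLDER.sub(substitute, pattern)
    return None if missing else expanded

# Stand-in for the FROM_NAME rule hitting and saving its named capture.
captured_values = {}
hit = re.search(r'^From: "(?P<NAME>\w+)', 'From: "Alice Example" <alice@example.com>')
if hit:
    captured_values["FROM_NAME"] = hit.groupdict()

# Stand-in for MATCHER: expand the placeholder, then compile and run.
matcher = expand(r"My name is ${FROM_NAME:NAME}", captured_values)
print(matcher)
print(bool(re.search(matcher, "Hello. My name is Alice and I have an offer.")))
```

Escaping the captured value before recompiling keeps message content from being interpreted as regex syntax, which seems like the safer default for text taken from mail.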



Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Loren Wilton

> > >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> >
> > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
>
> It might as well be a tflag or something.  Why limit capturing to headers
> only?


I hadn't intended it to be limited to headers only, but I guess the syntax
would have to be a little different for raw, body, full, etc, since they
don't have a part keyword in the rule syntax.


Originally I hadn't wanted to have the ":Capture" part, just have the 
capture assignment following the rule body. But then, how do you know if 
there is a capture assignment at the end? I didn't like the idea of trying 
to stick it into the match flags, especially for the (probably rare) case of 
multiple captures in a single rule.


I suppose that the rule scanner probably is looking past the flags that may 
follow a regex closing bracket, so would pick up an assignment if there was 
one there. So, for instance, this should work:


   body SOME_RULE /Your (\w+) Order/i $(__COMPANY)=\1

Alternately (which I don't much care for) we could have

   body SOME_RULE /Your (\w+) Order for \$(\d+)/i
   assign __COMPANY,__AMOUNT

or keyworded

   assign 1=__COMPANY,2=__AMOUNT

What worries me about that sort of syntax is there is no real 
juxtapositioning requirement between a rule name definition and any modifier 
flag lines with the same rule name. The capture could be in a completely 
different rule file, and I suppose could even be before the defining rule by 
a thousand lines or so in a single file. But you pretty much need to see 
both the regex and the assignments to know what is happening to what. So 
allowing the assignments to be separated from the regex isn't necessarily 
good.
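
For the trailing-assignment form, a rule scanner could pick the assignments up along these lines. This is a Python sketch under stated assumptions, not the actual SpamAssassin parser: `parse_tail` is hypothetical, it only looks at the text after the regex's closing delimiter, and it assumes multiple assignments are separated by whitespace.

```python
import re

# One assignment of the form $(VARIABLE)=\N, as in the example above.
ASSIGNMENT = re.compile(r"\$\((\w+)\)=\\(\d+)")

def parse_tail(tail):
    """Split the text after the closing regex delimiter into
    (flags, {capture group number: variable name})."""
    flags, captures = "", {}
    for index, part in enumerate(tail.split()):
        match = ASSIGNMENT.fullmatch(part)
        if match:
            captures[int(match.group(2))] = match.group(1)
        elif index == 0 and part.isalpha():
            flags = part  # regex modifier letters such as "i"
        else:
            raise ValueError(f"unrecognized text after regex: {part!r}")
    return flags, captures

print(parse_tail(r"i $(__COMPANY)=\1"))
print(parse_tail(r"i $(__COMPANY)=\1 $(__AMOUNT)=\2"))
```

Keeping the assignment on the rule line this way means the regex and its captures are always read together, which is the property argued for above.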


   Loren



Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Henrik K
On Fri, May 07, 2021 at 06:08:00PM +0300, Henrik K wrote:
> 
> All this is petty details compared to the overall logic that is required in
> the background.

I'm mostly interested in tackling the meta-rule dependency mess right now:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735#c7

Variable capturing would be a logical enhancement that follows it, as the
supporting dependency logic would likely be already implemented then.

Any thoughts on implementing it efficiently are welcome.



Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Henrik K
On Fri, May 07, 2021 at 07:58:18AM -0700, John Hardin wrote:
> On Sun, 2 May 2021, Loren Wilton wrote:
> 
> > Now consider variable capture from the message:
> > 
> >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> 
> I like this syntax. I was thinking that the capture would be implied - any
> capturing group in a rule would automagically save its (single) match in a
> variable named after the rule (kept separate from the rule's score) for
> later use, but I like the explicit nature of this approach.
> 
> Would :capture play well with (e.g.) :addr, :name, :raw, etc?

It might as well be a tflag or something.  Why limit capturing to headers
only?

Or not a tflag at all.  Just an uncommon enclosure format that is parsed from
_any_ regex anywhere.

All this is petty details compared to the overall logic that is required in
the background.



Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread John Hardin

On Sun, 2 May 2021, Loren Wilton wrote:


> Now consider variable capture from the message:
>
>  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1


I like this syntax. I was thinking that the capture would be implied - any 
capturing group in a rule would automagically save its (single) match in a 
variable named after the rule (kept separate from the rule's score) for 
later use, but I like the explicit nature of this approach.


Would :capture play well with (e.g.) :addr, :name, :raw, etc?
