Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-16 Thread ph10
On Sat, 14 Jul 2018, ND via Pcre-dev wrote:

> PCRE2 version 10.31 2018-02-12
> /(*NO_START_OPT)\A(?>(*:1)a)((*:2)x|)/mark
> ab
> 0: a
> 1:
> MK: 1
> 
> The resulting mark is "1" even though no backtracking into it is allowed.

It just remembers "most recent mark" in the backtracking frame.
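
A minimal sketch of that idea, written as a toy Python model and not as
PCRE2's actual C code. It hand-traces the pattern quoted above; the Frame
class and the trace_mark function are hypothetical names invented for this
illustration. Each saved backtracking point carries the most recent mark
name, so the name outlives the atomic group even though the frames pushed
inside the group are discarded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    """One saved backtracking point (toy model, not PCRE2's real frame layout)."""
    position: int          # subject offset to resume from
    mark: Optional[str]    # most recent mark name when the frame was saved


def trace_mark(subject):
    # Hand-trace of (?>(*:1)a)((*:2)x|) anchored at offset 0.
    # Returns the mark reported on a successful match, or None on no match.
    stack = []
    pos, mark = 0, None

    # Atomic group (?>(*:1)a): remember the stack height, run the body, then
    # drop every frame pushed inside so nothing can backtrack into the group.
    height = len(stack)
    mark = "1"                                  # (*:1)
    if not subject.startswith("a", pos):
        return None
    pos += 1
    del stack[height:]                          # the mark *name* survives in `mark`

    # Group 1, first alternative (*:2)x: save a frame before trying it.
    stack.append(Frame(pos, mark))
    mark = "2"                                  # (*:2)
    if subject.startswith("x", pos):
        return mark

    # "x" failed: backtrack, restoring the position and the remembered mark.
    frame = stack.pop()
    pos, mark = frame.position, frame.mark

    # The second alternative is empty and always matches.
    return mark


print(trace_mark("ab"))   # 1
print(trace_mark("ax"))   # 2
```

In this model the answer "1" for the subject "ab" falls out of ordinary
frame restoring; no separate table of marks is needed.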

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 


Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-16 Thread ph10
On Thu, 12 Jul 2018, ND via Pcre-dev wrote:

> One more thing should also be clarified in the docs:
> the MARK name, unlike the MARK position, is saved outside an assertion or atomic group:

I have tried to clarify this.

Philip

-- 
Philip Hazel



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-16 Thread ph10
On Sun, 15 Jul 2018, ND via Pcre-dev wrote:

> PCRE2 version 10.31 2018-02-12
> /(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
> abc
> 0: bc
> 
> 
> If a MARK in an atomic group doesn't matter for SKIP, why is the result "bc" and not "abc"?
> If a MARK in an atomic group does matter for SKIP, why is the result not "c"?

This was an obscure bug, which got the backtracking wrong. It was even 
wrong for /(?>a(*:1))b(?>)(*SKIP:1)x|.*/ and I am amazed nobody spotted 
it earlier. The bug was in the interpreter; JIT did not have the bug. I 
have fixed it and committed the patch. Thanks for the report. The 
pattern now matches "abc", as does Perl.

Philip

-- 
Philip Hazel



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-15 Thread ph10
On Sun, 15 Jul 2018, ND via Pcre-dev wrote:

> PCRE2 version 10.31 2018-02-12
> /(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
> abc
> 0: bc
> 
> 
> If a MARK in an atomic group doesn't matter for SKIP, why is the result "bc" and not "abc"?
> If a MARK in an atomic group does matter for SKIP, why is the result not "c"?

This does look like a bug. Perl matches "abc" and that's what I would 
expect. I will investigate.

Philip

-- 
Philip Hazel



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-15 Thread ND via Pcre-dev

And one more possible bug:


PCRE2 version 10.31 2018-02-12
/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
abc
 0: bc


If a MARK in an atomic group doesn't matter for SKIP, why is the result "bc"
and not "abc"?

If a MARK in an atomic group does matter for SKIP, why is the result not "c"?



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-14 Thread ND via Pcre-dev

On 2018-07-14 15:12, ph10 wrote:

> Feel free to look at the code and suggest patches. However, I don't
> think it is easy.

Sorry. I'm not a C programmer.



> It doesn't have to do anything
> special when it passes a (*MARK:NAME) other than record a backtracking
> point. Then when (*SKIP:NAME) is triggered, it backtracks till it hits a
> matching (*MARK:NAME) and then the current position in the subject is
> where to bumpalong to.

I doubt that the engine does nothing special.
Consider this example:

PCRE2 version 10.31 2018-02-12
/(*NO_START_OPT)\A(?>(*:1)a)((*:2)x|)/mark
ab
 0: a
 1:
MK: 1


The resulting mark is "1" even though no backtracking into it is allowed.
So my guess is that PCRE does do something special: it copies pointers to
mark names to other memory locations. Maybe it copies a pointer to the
current mark into every backtracking frame. Maybe it uses some other
*special* tactic.

Is that right?




> Keeping a separate table would require memory management, and its own
> backtracking mechanism! If a branch that contains a (*MARK:NAME) fails
> to match, the (*MARK) must be forgotten. Consider
>
> /(xxx(*MARK:A)xxx|yyy(*MARK:A)yyy)...(*SKIP:A).../

I still don't see any need for extended backtracking, only for unsetting
the "Mark-Position" fields of the table.



However, I see that such a change could in theory be made only after Perl
changes accordingly. I can neither report this weird and unobvious behavior
to the Perl authors nor program in C.

So feel free to close this topic.



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-14 Thread ph10
On Sat, 14 Jul 2018, ND via Pcre-dev wrote:

> It seems that instead of maintaining only MarkNames, PCRE could maintain a
> table of MarkName-MarkPosition pairs. Then it would not need to backtrack to
> access MARK position data, and would not lose MarkPosition information.

Feel free to look at the code and suggest patches. However, I don't 
think it is easy.

At present, the matching engine does everything by backtracking. Note
that this gives the same results as Perl. It doesn't have to do anything
special when it passes a (*MARK:NAME) other than record a backtracking
point. Then when (*SKIP:NAME) is triggered, it backtracks till it hits a
matching (*MARK:NAME) and then the current position in the subject is
where to bumpalong to.

Keeping a separate table would require memory management, and its own
backtracking mechanism! If a branch that contains a (*MARK:NAME) fails
to match, the (*MARK) must be forgotten. Consider
 
/(xxx(*MARK:A)xxx|yyy(*MARK:A)yyy)...(*SKIP:A).../

The *SKIP must activate whichever MARK matched, because they may have 
different bumpalong points. I think what you are suggesting would be 
very difficult to implement.
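
To make the point about "whichever MARK matched" concrete, here is a small
sketch in Python. It is a toy model under stated assumptions, not PCRE2's
implementation: the function names, the tuple layout of a frame, and the
BRANCHES pattern are all hypothetical. The branches are deliberately given
different prefix lengths, unlike the xxx/yyy example above, so that the two
marks record different bumpalong points.

```python
def match_branch(subject, pos, items, frames):
    """Match literals and marks in order; on failure, drop frames pushed here."""
    height = len(frames)
    for item in items:
        if isinstance(item, tuple):                 # ("MARK", name)
            frames.append(("mark", item[1], pos))   # record name and offset
        elif subject.startswith(item, pos):
            pos += len(item)
        else:
            del frames[height:]                     # failed branch: forget its marks
            return None
    return pos


def match_alternation(subject, pos, branches, frames):
    """Try each branch in turn; return the end offset of the first that matches."""
    for items in branches:
        end = match_branch(subject, pos, items, frames)
        if end is not None:
            return end
    return None


def skip_target(frames, name):
    """Unwind frames, newest first, until a MARK called `name` is found.
    Returns the offset saved with that mark, or None if it is unreachable."""
    while frames:
        kind, label, position = frames.pop()
        if kind == "mark" and label == name:
            return position
    return None


# Hypothetical pattern: (xx(*MARK:A)xxx|xxxx(*MARK:A)y)
BRANCHES = [
    ["xx", ("MARK", "A"), "xxx"],
    ["xxxx", ("MARK", "A"), "y"],
]

for subject in ("xxxxx", "xxxxy"):
    frames = []
    end = match_alternation(subject, 0, BRANCHES, frames)
    print(subject, end, skip_target(frames, "A"))
# xxxxx 5 2
# xxxxy 5 4
```

For "xxxxy" the first branch passes its MARK at offset 2 and then fails, so
that mark has to be forgotten before the second branch records its own at
offset 4. In this model the forgetting is just the normal unwinding of
frames; a separate name-to-position table would have to undo its entries in
the same way.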

Philip

-- 
Philip Hazel



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-14 Thread ND via Pcre-dev

On 2018-07-14 07:16, ph10 wrote:


> > Why does it need to backtrack?
> > Why not do a "bumpalong" advance to the next starting character straight
> > away?
>
> It has to backtrack to the *MARK because that is where the bumpalong data
> is remembered. There may be many *MARKs, each with a different name. You
> can't just keep a single data item.




I think there is something weird here: during the matching process PCRE has
information about all the MARK names and MARK positions it has passed.
Information about MARK names is kept close at hand and can easily be
retrieved. Information about MARK positions is saved somewhere deep and can
be retrieved only by backtracking. If no backtracking is available, there is
no way to access that information.


It seems that instead of maintaining only MarkNames, PCRE could maintain a
table of MarkName-MarkPosition pairs. Then it would not need to backtrack to
access MARK position data, and would not lose MarkPosition information.




Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-14 Thread ph10
On Sat, 14 Jul 2018, ND via Pcre-dev wrote:

> On 2018-07-13 16:08, ph10 wrote:
> > When SKIP has a name, it backtracks until it hits a MARK with the same name.
> >
> 
> Why does it need to backtrack?
> Why not do a "bumpalong" advance to the next starting character straight away?

It has to backtrack to the *MARK because that is where the bumpalong 
data is remembered. There may be many *MARKs, each with a different 
name. You can't just keep a single data item.

Philip

-- 
Philip Hazel



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-13 Thread ND via Pcre-dev

On 2018-07-13 16:08, ph10 wrote:
> When SKIP has a name, it backtracks until it hits a MARK with the same
> name.

Why does it need to backtrack?
Why not do a "bumpalong" advance to the next starting character straight
away?




Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-13 Thread ph10
On Fri, 13 Jul 2018, ND via Pcre-dev wrote:

> The SKIP verb doesn't need backtracking after it fires: there is a bumpalong
> and a new match. If the MARK position is saved, then the engine has no
> problem discarding the current match and starting a new one at the saved
> position without any backtracking. Is that right?

When SKIP has a name, it backtracks until it hits a MARK with the same 
name.

Philip

-- 
Philip Hazel



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-13 Thread ND via Pcre-dev

On 2018-07-13 07:23, ph10 wrote:

> On Thu, 12 Jul 2018, ND via Pcre-dev wrote:
>
> > One more thing should also be clarified in the docs:
> > the MARK name, unlike the MARK position, is saved outside an assertion
> > or atomic group:
>
> The MARK position *is* saved; it's just that there is never a backtrack
> into an atomic group, so that data is never accessed. But I'll take a
> look at the wording again.




The SKIP verb doesn't need backtracking after it fires: there is a bumpalong
and a new match. If the MARK position is saved, then the engine has no
problem discarding the current match and starting a new one at the saved
position without any backtracking. Is that right?


So maybe the impossibility of backtracking into an atomic group is not the
reason for the current behavior. Maybe Perl compatibility is?




Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-13 Thread ph10
On Thu, 12 Jul 2018, ND via Pcre-dev wrote:

> One more thing should also be clarified in the docs:
> the MARK name, unlike the MARK position, is saved outside an assertion or atomic group:

The MARK position *is* saved; it's just that there is never a backtrack 
into an atomic group, so that data is never accessed. But I'll take a 
look at the wording again.

Philip

-- 
Philip Hazel



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-12 Thread ND via Pcre-dev

On 2018-07-12 07:25, ph10 wrote:
> The (*MARK) is inside the assertion. That is what matters. I have updated
> the documentation to say this:
>
>   The search for a (*MARK) name uses the normal backtracking mechanism,
>   which means that it does not see (*MARK) settings that are inside
>   atomic groups or assertions, because they are never re-entered by
>   backtracking.



One more thing should also be clarified in the docs:
the MARK name, unlike the MARK position, is saved outside an assertion or
atomic group:


PCRE2 version 10.31 2018-02-12
/a(?=.(*:1))/mark
ab
 0: a
MK: 1




Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-12 Thread ph10
On Wed, 11 Jul 2018, ND via Pcre-dev wrote:

> I have seen these docs before.
> But in the example the verb does not appear inside the assertion. It appears after it.

The (*MARK) is inside the assertion. That is what matters. I have 
updated the documentation to say this:

  The search for a (*MARK) name uses the normal backtracking mechanism,
  which means that it does not see (*MARK) settings that are inside
  atomic groups or assertions, because they are never re-entered by
  backtracking. Compare the following pcre2test examples:
 
      re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
    data: abc
     0: a
     1: a
    data:
      re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
    data: abc
     0: b
     1: b

  In the first example, the (*MARK) setting is in an atomic group, so it
  is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be
  ignored. This allows the second branch of the pattern to be tried at
  the first character position. In the second example, the (*MARK)
  setting is not in an atomic group. This allows (*SKIP:X) to
  immediately cause a new matching attempt to start at the second
  character. This time, the (*MARK) is never seen because "a" does not
  match "b", so the matcher immediately jumps to the second branch of
  the pattern.
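
The two pcre2test runs above can be imitated with a short Python sketch.
This is only a toy model of the behaviour described in the documentation
text, not how PCRE2 is written; first_match and its frame tuples are
hypothetical names. The one difference between the two cases is whether
the frames pushed inside the group are discarded when the group is left.

```python
def first_match(subject, atomic):
    """Toy trace of a(?>(*MARK:X))(*SKIP:X)(*F)|(.)  when atomic is True,
    and of       a(?:(*MARK:X))(*SKIP:X)(*F)|(.)  when atomic is False.
    Returns the single character matched by the second branch, or None."""
    start = 0
    while start < len(subject):
        frames = []
        skip_to = None
        if subject.startswith("a", start):      # first branch: a ... (*SKIP:X)(*F)
            pos = start + 1
            height = len(frames)
            frames.append(("mark", "X", pos))   # (*MARK:X) passed at `pos`
            if atomic:
                del frames[height:]             # leaving (?>...) discards its frames
            # (*SKIP:X) is passed, then (*F) fails: unwind looking for MARK:X.
            while frames:
                kind, name, where = frames.pop()
                if kind == "mark" and name == "X":
                    skip_to = where
                    break
        if skip_to is not None:
            start = skip_to                     # SKIP fired: restart at the mark
            continue
        return subject[start]                   # SKIP ignored or not reached: (.) matches
    return None


print(first_match("abc", atomic=True))    # a
print(first_match("abc", atomic=False))   # b
```

With atomic=True the unwinding finds no MARK:X, so the (*SKIP) is ignored
and the second branch matches "a" at the first position. With atomic=False
the mark is found and the next attempt starts at "b".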

This is exactly the same behaviour as Perl.

Philip

-- 
Philip Hazel



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-11 Thread ND via Pcre-dev

On 2018-07-11 16:27, ph10 wrote:

> This already appears in the docs:
>
>   However, when one of these verbs appears inside an atomic group or in
>   an assertion that is true, its effect is confined to that group,
>   because once the group has been matched, there is never any
>   backtracking into it.

I have seen these docs before.
But in the example the verb does not appear inside the assertion. It appears
after it.



Re: [pcre-dev] (*SKIP:NAME) when (*MARK:NAME) is in assertion

2018-07-11 Thread ph10
On Sun, 8 Jul 2018, ND via Pcre-dev wrote:

> It seems that if a mark name is defined in an assertion, then a SKIP with
> this name is ignored.
> Maybe a little clarification of the docs is needed.

This already appears in the docs:

  However, when one of these verbs appears inside an atomic group or in
  an assertion that is true, its effect is confined to that group,
  because once the group has been matched, there is never any
  backtracking into it.

The "verbs" are the backtracking control verbs. Perl behaves in the same 
way. I will repeat the information above in some more places in the 
documentation.

Philip

-- 
Philip Hazel
