On Sat, 27 Jul 2019, ND via Pcre-dev wrote:

> There are some kinds of problems that exist with max_lookbehind:

It was always a hack to try to make it possible to do multi-segment 
matching using the normal matching function, something for which it was
not designed.

> - bugs
> - performance issues
> - brings excessive work to user
> 
> Now I report only about potential bugs.

Unfortunately I believe we have reached the limit of what can be done to 
the existing PCRE2 design to support multi-segment matching using 
pcre2_match().

The code that is compiled for each lookbehind contains the fixed number 
of characters by which the matching position is moved back before 
running that lookbehind. When computing these values, it is easy to 
remember the biggest one, and that is how max lookbehind was originally 
computed.

I did manage recently (with a bit of effort) to take note of directly
nested lookbehinds so that, for example, /(?<=(?<=..).)/ ends up with a
maximum of 3 rather than 2. This has changed the meaning of max 
lookbehind, and now that I have looked further, I am not entirely sure
that it is the right thing to do. (As it has not been released, it could
be removed.)
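
To make the difference between the two meanings concrete, here is a toy 
model in Python. It is only a sketch and not the PCRE2 code: the 
Lookbehind class and both function names are invented for illustration, 
and it assumes a nested lookbehind sits at the start of its parent, which 
is the worst case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lookbehind:
    # Fixed number of characters the matching position is moved back
    # before this lookbehind is run (hypothetical representation).
    length: int
    # Lookbehinds nested directly inside this one.
    nested: List["Lookbehind"] = field(default_factory=list)


def max_single(node: Lookbehind) -> int:
    """Original meaning: the biggest individual move-back value."""
    return max([node.length] + [max_single(n) for n in node.nested])


def max_with_nesting(node: Lookbehind) -> int:
    """Revised meaning: a directly nested lookbehind adds to its parent."""
    deepest = max((max_with_nesting(n) for n in node.nested), default=0)
    return node.length + deepest


# /(?<=(?<=..).)/ : the outer lookbehind moves back 1, the inner one 2.
pattern = Lookbehind(1, [Lookbehind(2)])
print(max_single(pattern))        # 2
print(max_with_nesting(pattern))  # 3
```
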

However, a nested *lookahead* is handled as an independent subpattern. It 
does not interact with an outer lookbehind. That is why you see this:

> /(?<=(?=(?<=.)).)/info,allusedtext
> Capture group count = 0
> Max lookbehind = 1

Each individual lookbehind has a length of 1. I have looked at the code 
and I can see no easy way of doing anything about this.
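
The effect can be sketched with another toy model in Python. Again this 
is an assumption-laden illustration rather than the PCRE2 code: the 
Assertion class and the functions reported_max and literal_reach are 
invented names. reported_max treats a lookahead as an independent 
subpattern, as described above, while literal_reach simply adds up every 
move back along the nesting, which seems to be what the report expects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assertion:
    kind: str        # "lookbehind" or "lookahead" (hypothetical model)
    length: int = 0  # move-back count; always 0 for a lookahead
    nested: List["Assertion"] = field(default_factory=list)


def reported_max(node: Assertion) -> int:
    """Directly nested lookbehinds add up; a lookahead starts afresh."""
    if node.kind == "lookahead":
        return max((reported_max(n) for n in node.nested), default=0)
    direct = 0       # best value from directly nested lookbehinds
    independent = 0  # best value from nested lookaheads
    for n in node.nested:
        if n.kind == "lookbehind":
            direct = max(direct, reported_max(n))
        else:
            independent = max(independent, reported_max(n))
    return max(node.length + direct, independent)


def literal_reach(node: Assertion) -> int:
    """Add up every move back along the nesting, lookaheads included."""
    deepest = max((literal_reach(n) for n in node.nested), default=0)
    return node.length + deepest


# /(?<=(?=(?<=.)).)/ : lookbehind(1) > lookahead > lookbehind(1)
inner = Assertion("lookbehind", 1)
pattern = Assertion("lookbehind", 1, [Assertion("lookahead", 0, [inner])])
print(reported_max(pattern))   # 1
print(literal_reach(pattern))  # 2
```
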

> /(?<=\A.)/info,allusedtext
> Capture group count = 0
> Max lookbehind = 1

This is correct, because the lookbehind does indeed just move back by 
one character.

> /(?<=\G.)/info,allusedtext
> Capture group count = 0
> Max lookbehind = 1

And this is the same.

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
