On Sat, 27 Jul 2019, ND via Pcre-dev wrote:

> There are some kinds of problems that exist with max_lookbehind:

It was always a hack to try to make it possible to do multi-segment matching using the normal matching function, something for which it was not designed.

> - bugs
> - performance issues
> - brings excessive work to user
>
> Now I report only about potential bugs.

Unfortunately, I believe we have reached the limit of what can be done to the existing PCRE2 design to support multi-segment matching using pcre2_match().

The code that is compiled for each lookbehind contains the fixed number of characters by which the matching position is moved back before running that lookbehind. When computing these values, it is easy to remember the biggest one, and that is how max lookbehind was originally computed.

I did recently manage (with a bit of effort) to take note of directly nested lookbehinds so that, for example, /(?<=(?<=..).)/ ends up with a maximum of 3 rather than 2: the outer lookbehind moves back one character, and the inner one then moves back two more. This has changed the meaning of max lookbehind, and now that I have looked further, I am not entirely sure that it is the right thing to do. (As this change has not yet been released, it could be removed.)

However, a nested *lookahead* is handled as an independent subpattern. It does not interact with an outer lookbehind. That is why you see this:

> /(?<=(?=(?<=.)).)/info,allusedtext
> Capture group count = 0
> Max lookbehind = 1

Each individual lookbehind has a length of 1. I have looked at the code and I can see no easy way of doing anything about this.

> /(?<=\A.)/info,allusedtext
> Capture group count = 0
> Max lookbehind = 1

This is correct, because the lookbehind does indeed just move back by one character.

> /(?<=\G.)/info,allusedtext
> Capture group count = 0
> Max lookbehind = 1

And this is the same.

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
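A toy model may make the two meanings of max lookbehind discussed in this thread more concrete. This is a sketch, not PCRE2 code: the Node class and the functions max_individual() and max_reach() are hypothetical names, each pattern is hand-translated into a tree of assertions, and every nested assertion is assumed to sit at the start of its parent (a conservative upper bound). max_individual() takes the largest single lookbehind length, the original meaning. max_reach() adds the lengths of directly nested lookbehinds and, by default, restarts from zero at a lookahead, which reproduces the value of 1 reported for /(?<=(?=(?<=.)).)/ even though that pattern can inspect two characters before the current position.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical toy AST node, not a PCRE2 data structure.

    kind is "pattern" (top level), "lookbehind" or "lookahead".
    length is the fixed number of characters a lookbehind moves back
    (ignored for other kinds). Nested assertions are listed in children.
    """
    kind: str
    length: int = 0
    children: List["Node"] = field(default_factory=list)


def max_individual(node: Node) -> int:
    """Original meaning: the largest single lookbehind length in the tree."""
    own = node.length if node.kind == "lookbehind" else 0
    return max([own] + [max_individual(c) for c in node.children])


def max_reach(node: Node, offset: int = 0, reset_at_lookahead: bool = True) -> int:
    """Nested meaning: a directly nested lookbehind adds its length to the
    distance its parent has already moved back (assuming it sits at the
    start of the parent).

    With reset_at_lookahead=True a lookahead is treated as an independent
    subpattern and restarts from 0; with False the offset carries through.
    """
    if node.kind == "lookbehind":
        here = offset + node.length
    elif node.kind == "lookahead" and reset_at_lookahead:
        here = 0
    else:
        here = offset
    return max([here] + [max_reach(c, here, reset_at_lookahead) for c in node.children])


# /(?<=(?<=..).)/ : outer lookbehind of length 1 containing one of length 2
nested = Node("pattern", children=[Node("lookbehind", 1, [Node("lookbehind", 2)])])

# /(?<=(?=(?<=.)).)/ : lookbehind -> lookahead -> lookbehind, each lookbehind of length 1
via_lookahead = Node("pattern", children=[
    Node("lookbehind", 1, [Node("lookahead", children=[Node("lookbehind", 1)])])
])

print(max_individual(nested), max_reach(nested))            # 2 3
print(max_reach(via_lookahead))                             # 1
print(max_reach(via_lookahead, reset_at_lookahead=False))   # 2
```

Passing reset_at_lookahead=False shows the distance that an application doing multi-segment matching would presumably need to retain for the second pattern, which is the discrepancy reported above.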