On Sat, 10 Aug 2019, ND via Pcre-dev wrote:

> I would appreciate it if we first reach a consensus on these suggestions
> before making the rewrite of partial matching that you are going to do.
I do not intend to make any more changes to the partial matching code. I
may do further updates to the documentation.

> As a lookahead is an independent pattern, we can calculate its own
> maxlookbehind and add it to the outer lookbehind's own maxlookbehind. We
> don't care much whether the "total" lookbehind is exactly the minimum
> optimal value. We really must care that it is not less than this value.

If you don't care about optimal values, why bother with lookbehind at all?
Just retain the entire first segment after a partial match.

> > >/(?<=\A.)/info,allusedtext
> > >Capture group count = 0
> > >Max lookbehind = 1
> >
> >This is correct, because the lookbehind does indeed just move back by
> >one character.
> >
>
> Not correct. We already talked about how \A is in its core equal to (?<!.),
> regardless of how it is really written in the PCRE code.
> It is a lookbehind assertion with length 1.

Unfortunately, we are talking about different things here. The length of
(?<=\A.) is 1 because it matches 1 character. When it is processed, the
current point is moved back by 1. There is then a check that the current
point is at the start, but no previous character is inspected.

> Consider an example. Let the subject "abc" come in two segments, "ab" and "c".
> First we try to match the segment "ab":
>
> /(?<=\Ab)c/
> ab\=ph
> No match
>
> After this we keep the last maxlookbehind=1

Why? It hasn't given a partial match. After no match you should throw away
everything.

> symbols ("b") and concatenate them
> with the second segment:
>
> /(?<=\Ab)c/
> bc\=offset=1
> 0: bc

I do not see this, either with 10.33 or the current code. I see this:

  PCRE2 version 10.33 2019-04-16
    re> /(?<=\Ab)c/
  data> bc\=offset=1
   0: c
  data>

which is correct, of course, for an independent match.

> It's a wrong result, as it's obvious that the whole subject must not match.

Yes, I see what you are saying. I think this shows that my attempt to suggest
using max_lookbehind for partial matching does not work for a number of cases.
In other words: max_lookbehind is not what you want it to be. That means it
cannot be used for partial matching. I think the answer is to recommend that
the entire previous segment is kept after a partial match. It is also the
case that \A (and probably \G as well) are confusing and useless in partial
matching.

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
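The multi-segment rule recommended in this thread (keep the entire previous segment after a partial match; throw everything away after no match) could be sketched in C roughly as follows. This is a minimal sketch, not PCRE2 code: struct subject and next_subject() are hypothetical names, and the sketch assumes the caller already knows whether the previous pcre2_match() call returned PCRE2_ERROR_PARTIAL and at which offset that partial match started. Only the no-match and partial-match outcomes discussed here are covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the subject of the next match attempt.
   This sketch does not call PCRE2; it only prepares the buffer and the
   start offset that would be passed to the next match call. */
struct subject {
    char *data;    /* retained text followed by the new segment, NUL-terminated */
    size_t len;    /* number of bytes in data, excluding the NUL */
    size_t start;  /* start offset for the next match attempt */
};

/* Hypothetical helper: build the subject for the next segment.
   prev, prev_len: the subject used for the previous attempt
   was_partial:    true if that attempt ended with a partial match
   partial_start:  assumed offset in prev where the partial match began
   seg, seg_len:   the newly arrived segment
   Returns 0 on success, -1 if memory allocation fails. */
static int next_subject(struct subject *out,
                        const char *prev, size_t prev_len,
                        bool was_partial, size_t partial_start,
                        const char *seg, size_t seg_len)
{
    /* After a partial match keep the whole previous subject, rather than
       only max_lookbehind characters; after no match keep nothing. */
    size_t keep = was_partial ? prev_len : 0;

    assert(!was_partial || partial_start <= prev_len);

    out->data = malloc(keep + seg_len + 1);
    if (out->data == NULL)
        return -1;
    if (keep > 0)
        memcpy(out->data, prev, keep);
    if (seg_len > 0)
        memcpy(out->data + keep, seg, seg_len);
    out->data[keep + seg_len] = '\0';
    out->len = keep + seg_len;

    /* Resume where the partial match began; otherwise start afresh. */
    out->start = was_partial ? partial_start : 0;
    return 0;
}
```

For the example in this thread, the segment "ab" gives no match for /(?<=\Ab)c/, so next_subject(&s, "ab", 2, false, 0, "c", 1) produces the subject "c" with start offset 0, and the pattern fails there just as it fails on the complete subject "abc". Keeping the whole segment costs more memory than keeping a few characters, but it does not rely on max_lookbehind, which this thread shows is not sufficient for partial matching.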