I have just committed a patch that makes some small changes to the way partial matches are handled in the interpreter. I hope Zoltán will in due course pick these up for the JIT. (There are new tests at the end of testinput2 which have no_jit set at the moment.)
The changes are really quite small, but I think they address some of the issues that have been discussed at length here. I am sorry that it has taken so long for me to see the essential points, but I think I do now have the general principles clear. The result is, in fact, two very minor changes:

(1) The "must have inspected at least one character" condition for recognizing a partial match is now extended with "OR the pattern must contain a lookbehind of non-zero length". This applies to both hard and soft partial matches. These two conditions ensure that a partial match is recognized when there is a possibility that adding more characters may enable a complete match to be found.

Interestingly, I discovered that I had documented this situation already when (in pcre2partial) I wrote:

  For this reason, a "no match" result should be interpreted as "partial match of an empty string" when the pattern contains lookbehinds.

This sentence has now been removed from pcre2partial because an empty partial match is now given.

(2) It was already documented that \z and \Z should not match at the end of a subject if PCRE2_PARTIAL_HARD is set. This was not working when no characters had been inspected (and, after (1) was implemented, still not working for non-lookbehind patterns). I have made patterns such as /\z/ give appropriate partial matches.

Further points:

On Fri, 19 Jul 2019, ND via Pcre-dev wrote:

> Alternative suggestion may be:
<snip>
> Disadvantages:
> 1. It may be a breaking change.

Indeed, and that is one reason I have not done it. Also, the changes I *have* made are very small, which I like. :-)

Finally: There is still the problem of patterns for which a return value "no match in this segment and it will never match however many more characters are added" would be useful. ND quoted /(*COMMIT)(*F)/ as a simple example. Is (*COMMIT) the only way this might happen?
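As a rough illustration of the extended condition in point (1) above, the rule can be written as a single predicate. This is a minimal sketch, not code from the PCRE2 sources: the function name `partial_match_allowed` and both parameter names are hypothetical, and the sketch covers only the "inspected a character OR has a non-zero-length lookbehind" test, not the rest of the partial-matching logic.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PCRE2): decides whether a partial
 * match may be reported, following the two-part condition in point (1).
 *
 *   chars_inspected - number of subject characters examined in this
 *                     match attempt (assumed bookkeeping value)
 *   max_lookbehind  - length of the longest lookbehind in the pattern,
 *                     0 if the pattern has none (assumed to be known)
 */
static bool partial_match_allowed(size_t chars_inspected,
                                  size_t max_lookbehind)
{
    /* Old rule: at least one character must have been inspected.
       New rule: OR the pattern contains a non-zero-length lookbehind. */
    return chars_inspected > 0 || max_lookbehind > 0;
}
```

Under this sketch, a pattern with a lookbehind that has inspected no characters still qualifies for a partial match, whereas a pattern with no lookbehind and no inspected characters does not.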
There is an item on the Wish List requesting a way of determining whether a match was failed by a start-up optimization or by running a matching engine. I haven't done anything about it because it would require JIT work. What could be done is to add a new field to the match data that records why a match failed. A new function (e.g. pcre2_get_fail_reason) could return this to the user. Possible returns could be:

  PCRE2_FAILEDBY_START_OPTIMIZATION
  PCRE2_FAILEDBY_INTERPRETER
  PCRE2_FAILEDBY_INTERPRETER_COMMIT
  PCRE2_FAILEDBY_JIT_START_OPTIMIZATION
  PCRE2_FAILEDBY_JIT
  PCRE2_FAILEDBY_JIT_COMMIT
  PCRE2_FAILEDBY_DFA_START_OPTIMIZATION
  PCRE2_FAILEDBY_DFA_INTERPRETER

Could this be useful?

Philip

-- 
Philip Hazel
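To make the fail-reason proposal above concrete, here is a minimal sketch of how a caller might consume such values. None of this exists in PCRE2: the constant names are the ones proposed in the message, while the numeric values, the `failedby_reason` type, and the helper `failedby_start_optimization` are assumptions made purely for illustration.

```c
#include <stdbool.h>

/* Proposed names from the message above; the numeric values and the
   enum type itself are assumptions, not an existing PCRE2 API. */
typedef enum {
    PCRE2_FAILEDBY_START_OPTIMIZATION = 1,
    PCRE2_FAILEDBY_INTERPRETER,
    PCRE2_FAILEDBY_INTERPRETER_COMMIT,
    PCRE2_FAILEDBY_JIT_START_OPTIMIZATION,
    PCRE2_FAILEDBY_JIT,
    PCRE2_FAILEDBY_JIT_COMMIT,
    PCRE2_FAILEDBY_DFA_START_OPTIMIZATION,
    PCRE2_FAILEDBY_DFA_INTERPRETER
} failedby_reason;

/* Hypothetical helper: answers the Wish List question of whether the
   failure came from a start-up optimization rather than from running
   a matching engine. */
static bool failedby_start_optimization(failedby_reason reason)
{
    switch (reason) {
    case PCRE2_FAILEDBY_START_OPTIMIZATION:
    case PCRE2_FAILEDBY_JIT_START_OPTIMIZATION:
    case PCRE2_FAILEDBY_DFA_START_OPTIMIZATION:
        return true;
    default:
        return false;
    }
}
```

A caller would presumably obtain the reason from the suggested pcre2_get_fail_reason function after a failed match and branch on a helper like this one; the exact signature of that function is left open in the proposal.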