On Tue, 4 Jun 2019, ND via Pcre-dev wrote: > PCRE2 version 10.33 2019-04-16 > /A(?:(*ACCEPT))?B/ > A > No match > > /A(?:(*ACCEPT))?B/no_start_optimize > A > 0: A > > /A(*ACCEPT)?B/ > Failed: error 109 at offset 10: quantifier does not follow a repeatable item > A > > > I have a two questions with it: > 1. Start optimizer brakes a result to "no match" from "match". Is there > documented (I remember only example with (*COMMIT) where optimizer can make > "match" from "no match")? May be there is a way to correct this PCRE > optimization to not break a result.
It is documented that the start-up optimizations and the backtracking control verbs such as (*ACCEPT) do not play well together. In particular, the result of a match may change. Here is an extract from the pcre2pattern documentation: Note that (*COMMIT) at the start of a pattern is not the same as an anchor, unless PCRE2's start-of-match optimizations are turned off, as shown in this output from pcre2test: re> /(*COMMIT)abc/ data> xyzabc 0: abc data> re> /(*COMMIT)abc/no_start_optimize data> xyzabc No match For the first pattern, PCRE2 knows that any match must start with "a", so the optimization skips along the subject to "a" before applying the pattern to the first set of data. The match attempt then succeeds. The second pattern disables the optimization that skips along to the first character. The pattern is now applied starting at "x", and so the (*COMMIT) causes the match to fail without trying any other starting points. There is also a whole section called "Optimizations that affect backtracking verbs" which contains this: PCRE2 contains some optimizations that are used to speed up matching by running some checks at the start of each match attempt. For example, it may know the minimum length of matching subject, or that a particular character must be present. When one of these optimizations bypasses the running of a match, any included backtracking verbs will not, of course, be processed. > 2. If "(?:(*ACCEPT))?" syntax is allowed, then why a more ergonomic > "(*ACCEPT)?" is disabled? Repetition is allowed for groups such as (?:...) but not for individual backtracking verbs (for which it is meaningless). PCRE2 is not clever enough to realize that (?:(*ACCEPT))? has nothing other than (*ACCEPT) inside (?:). If it did, I would expect an error. Repeating any group that has an unconditional (*ACCEPT) does not seem meaningful but there are cases such as (a(*ACCEPT))? that could perhaps be useful. Such a group can only match zero or one times. It means "If 'a' end the match; if not 'a' carry on with the pattern." Philip -- Philip Hazel -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev