Couldn't we recognize (?:.*) as what it really is, i.e .* ZA Sent from Yahoo Mail on Android On Tue, Dec 5, 2017 at 4:56 AM, p...@hermes.cam.ac.uk<p...@hermes.cam.ac.uk> wrote: On Tue, 5 Dec 2017, Zoltán Herczeg wrote:
> do you use JIT? > > The engine has special single character optimizations to make /.*/ fast. ... and even if you don't use JIT, the interpreting engine can handle .* quickly. Think about it - the fragment .* can be processed by zipping along the subject until either the end or a newline is reached. If you have set the DOTALL option, it's even faster: it can just jump to the end of the subject, effectively doing no work at all. By contrast, when a non-capturing parenthesis is reached, a backtracking point has to be established so that the engine can backtrack if the first branch of the group fails, so that it can try other branches (it doesn't know there is only one branch; this is generic code). The backtrack is also necessary for repeated groups so that it can carry on after a failure as long as an appropriate number of matches have happened. A group such as (?:A){1,50} (where A may be complex) might match 45 times, then hit a failure, so it must try again after 44 matches, 43 matches, etc. In other words, quite a lot of work has to be done to process parenthesized groups, and currently there is no special optimization that can turn (?:.)* into .* (that's effectively what it is, but why would anybody write it that way?). This kind of thing is discussed at length in Friedl's "Mastering Regular Expressions" (O'Reilly). In Chapter 6 ("Crafting an efficient expression") he actually says, under the heading "Don't add superfluous parentheses", "Unless you need to know the last character matched by .* don't use (.)*". Anybody who is working on optimizing regex matching who has not read Friedl's book should rush out and buy a copy immediately ... and then read it very carefully. :-) Philip -- Philip Hazel -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev