Couldn't we recognize (?:.*) as what it really is, i.e. .*
ZA

On Tue, Dec 5, 2017 at 4:56 AM, p...@hermes.cam.ac.uk wrote:

On Tue, 5 Dec 2017, Zoltán Herczeg wrote:

> do you use JIT?
> 
> The engine has special single character optimizations to make /.*/ fast. 

... and even if you don't use JIT, the interpreting engine can handle .* 
quickly. Think about it - the fragment .* can be processed by zipping 
along the subject until either the end or a newline is reached. If you 
have set the DOTALL option, it's even faster: it can just jump to the 
end of the subject, effectively doing no work at all. 
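
To make the "zipping along the subject" idea concrete, here is a minimal sketch in Python. It is not PCRE's code: the function name dot_star_end is invented for illustration, it assumes "\n" is the only newline sequence, and Python's re module is used purely as a stand-in engine to cross-check the result.

```python
import re


def dot_star_end(subject: str, start: int = 0, dotall: bool = False) -> int:
    """Return the index where a greedy .* starting at `start` first stops.

    Hypothetical helper, assuming "\n" is the only newline sequence.
    """
    if dotall:
        # With DOTALL the dot also matches newlines, so there is nothing
        # to scan for: the greedy match runs to the end of the subject.
        return len(subject)
    # Without DOTALL, scan forward to the next newline (or the end).
    newline = subject.find("\n", start)
    return len(subject) if newline == -1 else newline


subject = "first line\nsecond line"

# Cross-check against Python's re (a stand-in, not PCRE itself).
assert dot_star_end(subject) == re.match(r".*", subject).end()
assert dot_star_end(subject, dotall=True) == re.match(r".*", subject, re.DOTALL).end()
```

Neither path needs any per-character backtracking state to find the initial greedy extent, which is the point being made above.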

By contrast, when a non-capturing parenthesis is reached, a backtracking 
point has to be established so that, if the first branch of the group 
fails, the engine can backtrack and try other branches (it doesn't know 
there is only one branch; this is generic code). The
backtrack is also necessary for repeated groups so that it can carry on 
after a failure as long as an appropriate number of matches have 
happened. A group such as (?:A){1,50} (where A may be complex) might match 
45 times, then hit a failure, so it must try again after 44 matches, 43 
matches, etc.
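
The retry sequence described above can be sketched with a toy matcher in Python. This is an illustration only, not how PCRE is implemented: the names literal and match_repeated_group are invented here, and the sketch assumes that A can match in only one way at each position (so there is no backtracking inside A itself).

```python
from typing import Callable, List, Optional, Tuple

# A matcher takes (subject, position) and returns the new position or None.
Matcher = Callable[[str, int], Optional[int]]


def literal(text: str) -> Matcher:
    """Hypothetical helper: build a matcher for a fixed string."""
    def match(subject: str, pos: int) -> Optional[int]:
        return pos + len(text) if subject.startswith(text, pos) else None
    return match


def match_repeated_group(
    group: Matcher, rest: Matcher, subject: str, pos: int, lo: int, hi: int
) -> Tuple[Optional[int], Optional[int], List[int]]:
    """Toy version of (?:A){lo,hi} followed by the rest of the pattern."""
    # Greedy phase: remember the position after every iteration, because
    # each one is a point the matcher may have to come back to.
    ends = [pos]  # ends[k] is the position after k iterations
    while len(ends) - 1 < hi:
        nxt = group(subject, ends[-1])
        if nxt is None or nxt == ends[-1]:
            break
        ends.append(nxt)
    # Backtracking phase: try the rest of the pattern after the largest
    # repeat count first, then after one fewer, and so on down to `lo`.
    tried: List[int] = []
    for count in range(len(ends) - 1, lo - 1, -1):
        tried.append(count)
        end = rest(subject, ends[count])
        if end is not None:
            return count, end, tried
    return None, None, tried


# Roughly (?:ab){1,50}abc against "abababc": three iterations match
# greedily, the continuation fails there, and it succeeds after two.
count, end, tried = match_repeated_group(
    literal("ab"), literal("abc"), "abababc", 0, 1, 50
)
```

Even in this simplified form, the matcher has to keep one saved position per iteration, which is the bookkeeping that a bare .* avoids.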

In other words, quite a lot of work has to be done to process 
parenthesized groups, and currently there is no special optimization 
that can turn (?:.)* into .* (that's effectively what it is, but why 
would anybody write it that way?).

This kind of thing is discussed at length in Friedl's "Mastering Regular 
Expressions" (O'Reilly). In Chapter 6 ("Crafting an efficient 
expression") he actually says, under the heading "Don't add superfluous
parentheses", "Unless you need to know the last character matched by .*
don't use (.)*". Anybody who is working on optimizing regex matching who 
has not read Friedl's book should rush out and buy a copy immediately
... and then read it very carefully. :-)
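
As a small check of the point about superfluous parentheses, the sketch below uses Python's re module as a stand-in (it is not PCRE, and says nothing about PCRE's timings). It only shows that the three forms match the same text, and that the capturing form's one extra effect is remembering the last character.

```python
import re

subjects = ["", "abc", "first line\nsecond line"]

for subject in subjects:
    plain = re.match(r".*", subject)
    grouped = re.match(r"(?:.)*", subject)
    capturing = re.match(r"(.)*", subject)
    # All three forms match exactly the same span of the subject.
    assert plain.span() == grouped.span() == capturing.span()

# The capturing form additionally remembers the last character matched.
last_char = re.match(r"(.)*", "abc").group(1)
```

So unless that last character is needed, the parentheses add work without changing what is matched.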

Philip

-- 
Philip Hazel
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 