Re: [pcre-dev] Matching file contents as one string using PCRE_DOTALL

ph10 Fri, 10 May 2013 08:19:02 -0700

On Fri, 10 May 2013, [email protected] wrote:

> My personal thanks for taking the time to comment in this thread.


It's my birthday, so I thought I'd give you a present. <grin>

> I don't want any form of LookBehind (to minimize stack use) occurring
> at all. Also,  if you're telling me I can drop the trailing .* when
> looking inside of files and it increases performance then I'm all for
> that! However, it will be awhile before I can create a parser that is
> highly efficient. Because of the grouping I thought .* made things
> work in testing. I'll double check.

I don't think lookbehinds affect stack usage any more than lookaheads, 
which is what you are using. The only difference is where the look 
starts from. But any kind of assertion is more resource-hungry than a 
straight search. I created a line of data nearly 200,000 characters 
long, and searched for the last word, which happened to be "STUDYING".
Using pcretest's "-tm 2000" option, I got these times (with only a few 
tests, on a Linux box):

/STUDYING/           0.09 ms
/(?=.*STUDYING.*)/   0.2 ms
/(?=.*STUDYING)/     0.2 ms

So leaving off the trailing .* doesn't make much difference. I also 
tried capturing and non-capturing brackets instead of your lookaheads, 
and again got around 0.2 ms.
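
For anyone without pcretest to hand, a rough analogue of this timing run 
can be sketched in Python. This uses Python's own re engine, not PCRE, so 
neither the absolute figures nor the ratios need agree with the numbers 
above. The names build_subject and time_pattern, and the filler text, 
are made up for this sketch.

```python
import re
import timeit


def build_subject(length=200_000, needle="STUDYING"):
    """Return one long line of filler text that ends with the needle."""
    filler = "lorem ipsum dolor sit amet "  # hypothetical filler, no 'S' in it
    body = (filler * (length // len(filler) + 1))[: length - len(needle)]
    return body + needle


def time_pattern(pattern, subject, repeats=20):
    """Average time in milliseconds for one search of `subject`."""
    compiled = re.compile(pattern)  # compile once, time only the matching
    total = timeit.timeit(lambda: compiled.search(subject), number=repeats)
    return total / repeats * 1000.0


if __name__ == "__main__":
    subject = build_subject()
    for pattern in (r"STUDYING", r"(?=.*STUDYING.*)", r"(?=.*STUDYING)"):
        print(f"{pattern:<20} {time_pattern(pattern, subject):.3f} ms")
```

Compiling outside the timed call mirrors what "-tm" does in pcretest, 
which times matching rather than compilation.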

But why is there such a difference? Answer: because of certain 
optimizations, which can be checked with pcretest:

/STUDYING/
Capturing subpattern count = 0
No options
First char = 'S'
Need char = 'G'

In this situation, PCRE can whip through the string till it finds 'S', 
before it starts doing the match. That's fast.
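
The effect of a known first character can be pictured with a toy model 
in Python. This is only an illustration of the idea, not how PCRE is 
implemented internally; both function names are invented for the sketch, 
and it handles a plain literal rather than a real pattern.

```python
def count_attempts_with_first_char(subject, literal):
    """Count match attempts when only positions holding literal[0] are tried."""
    attempts = 0
    pos = subject.find(literal[0])  # fast scan for the first character
    while pos != -1:
        attempts += 1
        if subject.startswith(literal, pos):
            return attempts
        pos = subject.find(literal[0], pos + 1)
    return attempts


def count_attempts_everywhere(subject, literal):
    """Count match attempts when every start position is tried in turn."""
    attempts = 0
    for pos in range(len(subject) - len(literal) + 1):
        attempts += 1
        if subject.startswith(literal, pos):
            return attempts
    return attempts


if __name__ == "__main__":
    text = "xxSxxSTUDYING"
    print(count_attempts_with_first_char(text, "STUDYING"))  # prints 2
    print(count_attempts_everywhere(text, "STUDYING"))       # prints 6
```

In the toy model the first-character scan skips every position that 
cannot possibly begin a match, so far fewer full attempts are made.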

/(?:.*STUDYING.*)/
Capturing subpattern count = 0
No options
First char at start or follows newline
Need char = 'G'

In this case, match attempts will be tried at many more locations.

What I'm really pointing out in this post is that there is a tool 
(pcretest) that allows you to check out the performance of different 
patterns to see which works best.

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
