On Fri, 10 May 2013, [email protected] wrote:

> My personal thanks for taking the time to comment in this thread.

It's my birthday, so I thought I'd give you a present. <grin>

> I don't want any form of LookBehind (to minimize stack use) occurring
> at all. Also, if you're telling me I can drop the trailing .* when
> looking inside of files and it increases performance then I'm all for
> that! However, it will be awhile before I can create a parser that is
> highly efficient. Because of the grouping I thought .* made things
> work in testing. I'll double check.

I don't think look behinds affect stack usage any more than look aheads,
which is what you are using. The only difference is where the look
starts from. But any kind of assertion is more resource hungry than a
straight search.

I created a line of data nearly 200,000 characters long, and searched
for the last word, which happened to be "STUDYING". Using pcretest's
"-tm 2000" option, I got these times (with only a few tests, on a Linux
box):

  /STUDYING/            0.09 ms
  /(?=.*STUDYING.*)/    0.2 ms
  /(?=.*STUDYING)/      0.2 ms

So leaving off the trailing .* doesn't make much difference. I also
tried capturing and non-capturing brackets instead of your look aheads,
and again got around 0.2 ms.

But why is there such a difference? Answer: because of certain
optimizations, which can be checked with pcretest:

  /STUDYING/
  Capturing subpattern count = 0
  No options
  First char = 'S'
  Need char = 'G'

In this situation, PCRE can whip through the string till it finds 'S',
before it starts doing the match. That's fast.

  /(?:.*STUDYING.*)/
  Capturing subpattern count = 0
  No options
  First char at start or follows newline
  Need char = 'G'

In this case, match attempts will be tried at many more locations.

What I'm really pointing out in this post is that there is a tool
(pcretest) that allows you to check out the performance of different
patterns to see which works best.

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
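
For anyone without pcretest to hand, the same kind of comparison can be roughly sketched in Python. This is only an illustration under stated assumptions: it uses Python's own `re` engine rather than PCRE, so the optimizations and the absolute timings will differ from the pcretest figures above. The filler text, the `time_pattern` helper, and the repeat count are made up for the example.

```python
import re
import timeit

# One long line (200,000 characters) whose last word is the target.
# The filler text is invented; only the overall shape follows the message.
line = "lorem ipsum " * 16666 + "STUDYING"

# The three patterns timed in the message, compiled with Python's re
# module (NOT PCRE, so engine optimizations are not the same).
patterns = {
    "literal": re.compile(r"STUDYING"),
    "lookahead": re.compile(r"(?=.*STUDYING)"),
    "lookahead_trailing": re.compile(r"(?=.*STUDYING.*)"),
}


def time_pattern(compiled, text, repeats=20):
    """Return the average number of seconds per search over `repeats` runs."""
    total = timeit.timeit(lambda: compiled.search(text), number=repeats)
    return total / repeats


results = {name: time_pattern(p, line) for name, p in patterns.items()}

if __name__ == "__main__":
    for name, secs in results.items():
        print(f"{name:20s} {secs * 1000:.3f} ms")
```

Note that the literal pattern reports a match at the end of the line, while the look-ahead versions succeed at offset 0 with an empty match, since the assertion consumes nothing. The numbers printed are only meaningful relative to each other on one machine.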
