On Mon, 12 Dec 2011, Philip Hazel wrote: > On Mon, 12 Dec 2011, Graycode wrote: > > > When PCRE is compiled with NO_RECURSE then it uses allocated heap > > storage instead of recursion with the stack. > > > > The majority of expressions that I use do not require recursion and > > perhaps neither does a large portion of the PCRE testdata suite. > > I think you might have misunderstood this. The terminology is rather > badly chosen. "Recursion" here does not refer to recursion in the > patterns, but to the fact that the match() function calls itself > recursively while performing the match. This uses memory on the system > stack, and that can run out in environments that have a limited system > stack.
I do not think there is a misunderstanding but I will try to explain my position for the sake of clarity. Normally the NO_RECURSE option is not used, so match() calls itself recursively and thereby uses more and more memory on the system stack as the level deepens. With the NO_RECURSE option, the match() function simulates recursion using the defined C macros RMATCH() and RRETURN(). Here the match() function does not call itself, so it does not use true stack-based recursion. The stack does not grow for each (simulated) recursion. Instead a "frame" to hold the necessary variables is allocated and later freed from the heap each time. Variables that were put on the stack, such as 'caseless' and 'condcode', remain as a single instance on the stack until the function returns. What I referred to as expressions that "do not require recursion" means those where the match() function does not invoke the RMATCH() macro. That covers both true stack-based recursion when NO_RECURSE is not used, and also the simulated recursion using heap allocation when it is used. I hope that clarifies it. Currently the initial "frame" of variables is allocated from the heap each time match() is invoked and NO_RECURSE is being used. It's the first thing that the match() function does. What I am suggesting here is that the initial first-time allocation can be avoided by having a single instance (total of one) of that initial frame be placed on the stack. > Almost every pattern except the very simplest will use one or > more recursive calls. For example, the pattern /(ab)(cd)(ef)/ recurses 4 > times (you can find this out by running pcretest with the -M option). Here I am treading into areas where my lack of knowledge about PCRE may highlight some confusion. My having added other debugging code indicates that the maximum depth for /(ab)(cd)(ef)/ was 3. There was the initial frame allocation each time match() was invoked, and then internal variable 'rdepth' never went beyond 2 more than the initital frame. I've been using a source HTML page of www.yahoo.com for my test data. The content was 178,485 bytes at the time I saved it. Using the example of /(ab)(cd)(ef)/ indicates that the match() function was invoked 7,151 times for a single call to pcre_exec(). During those the match() function allocated a heap frame 7,375 times not counting the initial frame. So the total number of allocated heap frames was 14,526. Using the PCRE doc/pcre.3 file as test data for the same pattern /(ab)(cd)(ef)/ the pcre_exec() invoked match() 291 times and during those it allocated another 297 heap frames (sum total of 588 heap allocations) with the same maximum depth of 3. Of the 5,150 expressions contained in the PCRE testdata files that I normally run, 1,959 (roughly 38%) needed no further "recursive" process within the match() function beyond the single initial frame. The match() function never invoked RMATCH() to simulate a recursion for any of those 1,959 samples. Hence my statement that a "large portion of the PCRE testdata suite" would not require recursion. Hopefully in a few days I'll find the time to re-verify the added debugging code that I put in place to count the things that are happening internally. For my basis testing I've been using the 4 regular expressions that were used by http://lh3lh3.users.sourceforge.net/reb.shtml to compare the performance of various engines. The change I proposed was not detrimental to any of those, and it did noticeably help one of them. It increased the efficiency of the 3rd (Date) expression by over 35 percent. This is based on elapsed time measured over running it in series many times on the yahoo HTML file. During the testing the data file was loaded into memory to avoid variances such as those that could be caused by I/O operations. > The NO_RECURSE feature is designed specifically not to put data on the > system stack. Instead of recursively calling match() it saves what it > needs on a heap stack, and jumps to the start of the function. Yes, I fully comprehend that. > > > --- pcre_exec.c (version 8.21-RC1) > > +++ pcre_exec.c (working copy) > > @@ -328,7 +328,8 @@ > > {\ > > heapframe *oldframe = frame;\ > > frame = oldframe->Xprevframe;\ > > - (pcre_stack_free)(oldframe);\ > > + if (oldframe != &frame_zero)\ > > + (pcre_stack_free)(oldframe);\ > > if (frame != NULL)\ > > {\ > > rrc = ra;\ > > @@ -485,8 +486,9 @@ > > heap whenever RMATCH() does a "recursion". See the macro definitions > > above. */ > > > > #ifdef NO_RECURSE > > -heapframe *frame = (heapframe *)(pcre_stack_malloc)(sizeof(heapframe)); > > -if (frame == NULL) RRETURN(PCRE_ERROR_NOMEMORY); > > +heapframe frame_zero; > ^^^^^^^^^^^^^^^^^^^^^^^^ > ^^^^^^^^^^^^^^^^^^^^^^^^ > ^^^^^^^^^^^^^^^^^^^^^^^^ > This statement is creating a frame *on the system stack*, which is > precisely the thing NO_RECURSE is trying to avoid doing. > > Disclaimer: I'm writing this in a hurry. Apologies if I have > misunderstood you, but if all you are doing is saving one malloc for a > frame, I don't think it will apply to many regexes, if any. I hope you get a chance to review this again in the future. I am suggesting that ONE and a maximum of only one frame be placed on the stack. This will add a one-time total of roughly 276 bytes of stack usage. Regardless of how many times match() and its RMATCH() is used during execution of an expression, there will never be more than a single instance of 'frame_zero' residing on the stack. This is a simple tweak that seems to provide some better speed when the NO_RECURSE option is used. That option is probably not used very often because this isn't enough yet to make it comparable to the speed of stack-based recursion. Regards, Graycode -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
