This discussion deals with the NO_RECURSE compile-time PCRE option. It is a follow-up on a previous modification titled "A tweak for the NO_RECURSE option" from last December.
Quoting parts of that earlier discussion: Normally the NO_RECURSE option is not used, so match() calls itself recursively and thereby uses more and more memory on the system stack as the level deepens. With the NO_RECURSE option, the match() function simulates recursion using the defined C macros RMATCH() and RRETURN(). Here the match() function does not call itself, so it does not use true stack-based recursion. For my basis testing I've been using the 4 regular expressions that were used by http://lh3lh3.users.sourceforge.net/reb.shtml to compare the performance of various engines. The change I proposed was not detrimental to any of those, and it did noticeably help one of them. It increased the efficiency of the 3rd (Date) expression by over 35 percent. This is based on elapsed time measured over running it in series many times on the yahoo HTML file. During the testing the data file was loaded into memory to avoid variances such as those that could be caused by I/O operations. ------ PCRE benefitted from that "tweak" because it avoided the need for a frame allocation every single time the match() function was invoked. This is limited to expressions that require recursive processing, there is no impact on expressions that do not require recursion to execute. Now here is more to the story. For a single exec() the match() function may be called repeatedly, and with each call it may allocate and release heap frames many times when executing an expression that requires recursive processing. There can be a huge number recursions being done. On my Windows system the elapsed clock time for the frame memory allocations and release was exceeding all of the other work that PCRE was doing. What was needed is for the pcre_exec() process to be able to retain all previous frame allocations done by the match() function. Basically whatever maximum allocations are done for simulating the recursion need to be retained throughout the execute process. They would be released only at the end of that process. Attached is (hopefully) a GIF image showing a summary of test results. It shows millisecond measurements of using the same 4 expressions on the same 178K html data. The unmodified NO_RECURSE took (average) over 300% more time, or basically about 4 times as long on that test data. Applying the modification to retain recursive frame allocations throughout the life of a call to pcre_exec() has reduced the process to be only about 7% longer. Said differently, the modification reduced the NO_RECURSE clock time by almost 75%. The 7% probably can't be eliminated since it probably represents the match() function accessing its variables through a frame pointer vs. them being on the stack. It's a significant improvement. This modification brings the NO_RECURSE option to be very competitive with stack-based recursion, and of course it still means the avoidance of run-time stack faults. The impact will vary depending upon how much recursion is needed for executing a particular expression. Again, there is no impact on expressions that do not require recursive processing. I'm sending a patch to Philip instead of posting it here in order to enable him to review and adjust whatever he wants. How it was done: 1) I defined a new item in CONFIG.H to specify whether to retain the previous frame allocation behavior or to use this new method. Perhaps Philip will choose not to have an option of selecting each method. 2) A way was need to communicate the frame basis between pcre_exec() and match() functions. While a new parameter could have been used, I chose instead to add a (void *) pointer to the definition of the "match_data" structure declared in pcre_internal.h. 3) In pcre_exec.c, a forward chain pointer was added to the "heapframe" structure. This was key as it allows the match() function to re-use the next frame if it's previously already been down to that recursion depth at any time for the execution of that particular expression. 4) The RMATCH() and RRETURN() macros were modified to support this modification. 5) The pcre_exec() function initializes a base frame and populates a pointer to it in the "match_data" structure for use by the match() function. In 3 places where function pcre_exec() normally returns after having called match() any number of times, the chain of allocated frames is released prior to returning. A new function was created to release the chain of allocated frames. Regards, Graycode
<<attachment: Timing_No_Recurse.gif>>
-- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
