Folks,

I have just committed heavily refactored code for pcre2_match() to the
SVN repository. I have not yet updated the build system or the
documentation, which I will do over the next few weeks. There won't be a
new release for several months, but in the meantime I would appreciate
it if anybody could run tests on the new code to shake it down as much
as possible. (It passes all the current tests, of course, at least on my
box.)

The JIT code is not yet updated to track the interpreter changes (see 
below) but Zoltán will be doing that in due course. As well as a lot of 
code tidies, the main changes are as follows:

1. Backtracking is no longer implemented by recursive function calls,
and therefore does not use the system stack. The
--disable-stack-for-recursion build option is obsolete (I will make it
give a warning). Once this is released, the regular reports of "stack
exceeded" bugs should go away. Yay! Backtracking is implemented by using
a vector of fixed-size "frames" (the size depends on the number of
captures in a pattern). An initial 10K vector (enough for ~50 frames) is
allocated on the stack, but if this is too small, heap memory is used.
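
To illustrate the idea in point 1, here is a minimal sketch of
backtracking driven by an explicit vector of frames rather than by
recursion. It is not the pcre2_match() code: the matcher is a toy glob
('?' and '*' only), and the names frame, frame_vec, fv_push, glob_match
and INITIAL_FRAMES are all made up for this example. The frames here
have a constant two-field layout, whereas the real frame size depends
on the number of captures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INITIAL_FRAMES 8  /* hypothetical; small so the heap path is easy to hit */

/* Hypothetical fixed-size frame: one saved backtracking point. */
typedef struct { size_t pattern_pos; size_t subject_pos; } frame;

typedef struct {
    frame *frames;    /* points at the caller's stack buffer or at heap memory */
    size_t count;
    size_t capacity;
    int on_heap;
} frame_vec;

static void fv_init(frame_vec *v, frame *initial, size_t capacity) {
    assert(initial != NULL && capacity > 0);
    v->frames = initial;
    v->count = 0;
    v->capacity = capacity;
    v->on_heap = 0;
}

/* Push a frame, moving to (or growing) heap storage when full.
   Returns 1 on success, 0 if memory could not be obtained. */
static int fv_push(frame_vec *v, frame f) {
    if (v->count == v->capacity) {
        size_t new_cap = v->capacity * 2;
        frame *bigger;
        if (v->on_heap) {
            bigger = realloc(v->frames, new_cap * sizeof(frame));
            if (bigger == NULL) return 0;
        } else {
            bigger = malloc(new_cap * sizeof(frame));
            if (bigger == NULL) return 0;
            memcpy(bigger, v->frames, v->count * sizeof(frame));
            v->on_heap = 1;
        }
        v->frames = bigger;
        v->capacity = new_cap;
    }
    v->frames[v->count++] = f;
    return 1;
}

/* Pop the most recent frame; returns 0 if there is nothing to backtrack to. */
static int fv_pop(frame_vec *v, frame *out) {
    if (v->count == 0) return 0;
    *out = v->frames[--v->count];
    return 1;
}

static void fv_free(frame_vec *v) {
    if (v->on_heap) free(v->frames);
}

/* Toy matcher: 1 = match, 0 = no match, -1 = out of memory. */
static int glob_match(const char *pat, const char *sub) {
    frame initial[INITIAL_FRAMES];  /* initial vector lives on the system stack */
    frame_vec v;
    frame f;
    size_t p = 0, s = 0;
    int result;

    fv_init(&v, initial, INITIAL_FRAMES);
    for (;;) {
        if (pat[p] == '*') {
            /* Try the empty match first; remember the alternative of
               letting this '*' swallow one more subject character. */
            if (sub[s] != '\0') {
                f.pattern_pos = p;
                f.subject_pos = s + 1;
                if (!fv_push(&v, f)) { result = -1; break; }
            }
            p++;
            continue;
        }
        if (pat[p] == '\0' && sub[s] == '\0') { result = 1; break; }
        if (pat[p] != '\0' && sub[s] != '\0' &&
            (pat[p] == '?' || pat[p] == sub[s])) {
            p++; s++;
            continue;
        }
        /* Local failure: resume from the most recent backtracking point. */
        if (!fv_pop(&v, &f)) { result = 0; break; }
        p = f.pattern_pos;
        s = f.subject_pos;
    }
    fv_free(&v);
    return result;
}
```

For example, glob_match("a*b", "aXXb") succeeds after three backtracks,
and a pattern of ten consecutive '*' characters pushes more than
INITIAL_FRAMES frames, so it exercises the switch from the stack buffer
to heap memory. The system stack depth stays constant however much
backtracking occurs.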

2. The "match limit" and "match limit recursion" features still work.
The first limits the number of backtracking points that are ever
established, which is effectively a limit on computing resource. The
second limits the depth of nested backtracking, which is effectively a
limit on the amount of heap memory that is used. I may change the name
of "match limit recursion" to something more suitable, perhaps "match
limit depth", though of course the old name will remain as a synonym.
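
The difference between the two limits in point 2 can be sketched with a
toy example. This is only an illustration under my own assumptions, not
the PCRE2 implementation, and it does not call the PCRE2 API at all: the
function limited_glob, the error codes and MAX_FRAMES are hypothetical
names. One counter tracks every backtracking point ever established
(the "match limit"), while the other caps how many are live at once
(the depth limit).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch only. */
enum { MATCH_OK = 1, MATCH_FAIL = 0, ERR_MATCH_LIMIT = -1, ERR_DEPTH_LIMIT = -2 };

#define MAX_FRAMES 64  /* hypothetical fixed capacity, to keep the sketch short */

typedef struct { size_t p; size_t s; } frame;

/* Toy glob matcher ('?' and '*') with two independent limits:
   match_limit caps the total number of backtracking points ever
   established; depth_limit caps how many may be pending at once. */
static int limited_glob(const char *pat, const char *sub,
                        unsigned long match_limit, size_t depth_limit) {
    frame stack[MAX_FRAMES];
    size_t depth = 0;               /* frames currently pending */
    unsigned long established = 0;  /* frames ever pushed */
    size_t p = 0, s = 0;

    assert(depth_limit <= MAX_FRAMES);
    for (;;) {
        if (pat[p] == '*') {
            if (sub[s] != '\0') {
                if (++established > match_limit) return ERR_MATCH_LIMIT;
                if (depth >= depth_limit) return ERR_DEPTH_LIMIT;
                stack[depth].p = p;      /* retry this '*' ...            */
                stack[depth].s = s + 1;  /* ... after one more character  */
                depth++;
            }
            p++;
            continue;
        }
        if (pat[p] == '\0' && sub[s] == '\0') return MATCH_OK;
        if (pat[p] != '\0' && sub[s] != '\0' &&
            (pat[p] == '?' || pat[p] == sub[s])) {
            p++; s++;
            continue;
        }
        if (depth == 0) return MATCH_FAIL;
        depth--;
        p = stack[depth].p;
        s = stack[depth].s;
    }
}
```

With generous limits, limited_glob("a*b", "aXXb", 100, 64) matches after
establishing three backtracking points; lowering match_limit to 2 makes
the same call give up with ERR_MATCH_LIMIT. By contrast,
limited_glob("***", "abc", 100, 2) fails with ERR_DEPTH_LIMIT because
three points must be pending simultaneously, even though very little
work has been done.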

3. The new implementation allows backtracking into (possibly
recursive) subroutine calls within the pattern, which is how Perl
behaves. It would be easy to add a new option to force these calls to be
atomic, but I would like to be sure that such an option is wanted/needed
before adding it. An individual pattern can always use, for example,
(?>(?1)) instead of just (?1) if atomic behaviour is wanted.

4. When a callout is called, a pointer to the ovector is made available. 
Formerly, this was the ovector supplied by the caller in a match_data 
block. Now it is an internal private vector.

I ran some timing tests on the testdata/testinput1 file and on my Linux 
box the new interpreter seems to run a bit faster than the old one.

All feedback is welcome!

Philip

-- 
Philip Hazel
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
