On Sun, 10 Apr 2011, Herczeg Zoltán wrote:

> the current backtracking behaviour of PCRE for atomic blocks is not
> exactly logical to me, so I tried to compare it with PERL (v5.12.1). I
> don't know PERL (yet), this is actually my first PERL script:
> 
> my $str = "aaaabaaabaabab";
> @myarray = ($str =~ m/((?>(a+)b)+aabab)/);
> print join(",", @myarray), "\n";
> 
> Its result is: aaaabaaabaabab,aaa
> 
> I don't know how to extract the starting/ending positions, that is why
> I choose this tricky example, since I can see that the result of the
> internal capturing bracket is the second block. Even I don' understand
> why the array does not contain the full match...

The result of assigning a pattern match to an array in Perl is a list of 
the captured substrings (only). It does not contain the full match. That 
is why your array has only two elements set.

> With PCRE, the result is the following: 0, 14, 0, 14, 12, 13 - the
> internal capturing bracket holds the last 'a'. Am I interpret right
> that this is a different behaviour?

There seems to be a bug in PCRE. Using pcretest (and getting rid of your 
outermost parentheses, which just add complication), I see this:

/(?>(a+)b)+aabab/
aaaabaaabaabab
 0: aaaabaaabaabab
 1: a

Running the same thing through my Perl test script, I get this:

/(?>(a+)b)+aabab/
aaaabaaabaabab
 0: aaaabaaabaabab
 1: aaa

That is the same as you get. I think what is happening is that PCRE is
matching the atomic group four times, thereby setting the inner 
parentheses to "a", but then when it has to backtrack, it does not reset 
the value of the parentheses. However, I haven't actually looked at the
code to confirm this.

I have added this issue to my list of things to look at, which is now 
getting quite long. I am planning on restarting work on PCRE some time 
before the end of this month. It will probably take me several weeks to 
work through all the issues at my slow, retired rate of progress. :-(
I will then put out an 8.13 release candidate followed by a proper 
release. After that, I want to start straight away on integrating your
JIT code before anything else crops up.

Philip

-- 
Philip Hazel
-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 

Reply via email to