On Tue & Thu, 15 & 17 Sep 2009, I wrote:

> If you write the alternatives the other way round, it works as you
> expect:
> 
> /^((.)(?1)\2|.?)$/

Er, actually, it doesn't work if there are an even number of characters 
in the subject.

> I think I will use this example to make the point more clearly in the
> documentation.

... and while I did so, I figured out exactly what is going on. This is 
a difference between the way PCRE (and Python) handle recursive 
patterns, and Perl. In PCRE, they are always treated as atomic groups. 
This means that when a lower recursion has matched something, it cannot 
be re-entered to try another alternative if there is a failure higher
up. That's why the pattern above doesn't work: it matches a character at
the lowest level, but cannot come back to match an empty string when the
subject has an odd number of characters.

> I don't know how it works, but it recognizes palindromes 
> such as "A man, a plan, a canal, Panama!".
> 
> /^\W*(?:((.)\W*(?1)\W*\2|)|((.)\W*(?3)\W*\4|\W*.\W*))\W*$/i

I do now know how it works. It is essentially the same as the pattern 
above, but with the even and odd cases written out separately, so that 
the alternation happens at the outer level.

> Hm, I see that *this* pattern also makes Perl 5.10 loop. 

Well, strictly, not "loop", but "take an inordinate amount of time". It 
works fine if you use possessive quantifiers for the "non-word" 
characters:

  /^\W+(?:((.)\W+(?1)\W+\2|)|((.)\W+(?3)\W+\4|\W+.\W+))\W+$/i

I did some measurements in pcretest, and the possessive quantifier 
speeds some cases up by a factor of 10 or more.

Philip

-- 
Philip Hazel

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 

Reply via email to