[pcre-dev] [Bug 1347] Deep recursion causing SegFault

Philip Hazel Wed, 17 Apr 2013 08:56:22 -0700

http://bugs.exim.org/show_bug.cgi?id=1347

--- Comment #4 from Philip Hazel <[email protected]>  2013-04-17 16:56:03 ---
On Tue, 16 Apr 2013, [email protected] wrote:

> > This is not a bug, though it is frequently mis-reported as one, so much
> > so that I keep this standard response on file:
> Unfortunately that's the default behaviour for most distributions
> causing quite some hard to debug crashes as (see stacktrace of my
> original report) there's not much relation between e.g. the executed PHP
> code and the actual crash down in libpcre - thus I'm getting quite
> regularly reports on crashes where there's not much debugging that can
> be done. In this case I got quite lucky as there's a clear sample
> causing the trouble where even the code to reproduce is quite minimal.

As you probably realize, I work only on PCRE. I don't use PHP and know 
nothing about it. All those PHP functions in your backtrace don't mean 
anything to me, I'm afraid. Nor do I know anything about GeSHi or Lua. In
order to check for bugs, I have to have the pattern and subject string
that is being matched. Sorry if that sounds harsh!

> But even then I think here's the best way to work around the issue. How
> much work would it be to allow for libpcre to be compiled in a kind of
> "hybrid mode" where stack-based matching is tried first and if reaching
> the (stack) recursion limit a retry using the heap-based implementation
> is performed with much higher limits?

A client program could do that for itself. It could compile two separate
PCRE libraries (for each data size - 8, 16, 32 - that it needs) and when
a call to the stack-based one returned with a recursion limit exceeded,
it could call the other library. [To avoid name clashes, there would 
have to be some fiddling, probably involving macros, when compiling the 
two libraries. Sorting out this detail would involve some work, I 
guess.]
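
The control flow of such a client could be sketched as below. This is only
an illustration of the retry logic, not PCRE code: the two matcher
callables, the RecursionLimitExceeded exception, and the stub functions are
all hypothetical stand-ins (in the real C API the stack-based pcre_exec()
reports this condition with the PCRE_ERROR_RECURSIONLIMIT return code), and
Python's re module is used only so that the sketch runs.

```python
import re
from typing import Callable, Optional


class RecursionLimitExceeded(Exception):
    """Hypothetical stand-in for the PCRE_ERROR_RECURSIONLIMIT return code."""


# A matcher takes (pattern, subject) and returns the matched text or None.
Matcher = Callable[[str, str], Optional[str]]


def match_with_fallback(stack_match: Matcher, heap_match: Matcher,
                        pattern: str, subject: str) -> Optional[str]:
    """Try the stack-based matcher first; retry with the heap-based one
    only when the first gives up because of its recursion limit."""
    try:
        return stack_match(pattern, subject)
    except RecursionLimitExceeded:
        return heap_match(pattern, subject)


def stack_stub(pattern: str, subject: str, limit: int = 10) -> Optional[str]:
    """Stub for the stack-based library: pretends that any subject longer
    than `limit` characters exceeds the recursion limit."""
    if len(subject) > limit:
        raise RecursionLimitExceeded("recursion limit exceeded")
    m = re.match(pattern, subject)
    return m.group(0) if m else None


def heap_stub(pattern: str, subject: str) -> Optional[str]:
    """Stub for the heap-based library: no artificial limit."""
    m = re.match(pattern, subject)
    return m.group(0) if m else None


short = match_with_fallback(stack_stub, heap_stub, r"\d+", "42")
long_ = match_with_fallback(stack_stub, heap_stub, r"a+", "a" * 50)
```

Only the limit-exceeded case triggers the retry; an ordinary "no match"
result from the first matcher is returned as it is.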

> Is there a way to get some "stats" for a regexp like max recursion
> depth, taken branches, steps needed for solution, ...?

The pcretest program has a facility for determining the max recursion
depth, given a pattern and a subject string. Grep for \M in the pcretest
man page. 

> >   (a) Work on your regular expression pattern so that it uses less
> >       memory. Sometimes using atomic groups can help with this.
> Will need some debugging aid for this.

pcretest is your friend.

> Some questions on that:
> - Do you know if PHP uses the pcrejit option? And if yes, since around
> which version?

Sorry, I know nothing about PHP.

> - Given something like ^\d+(?:(?:A|B|BC|BD|C|DB)+\d+)*$ What would be
> the best way to ask libpcre to try various combinations of the letters
> while avoiding backtracking once one such consecutive group has been found?

Atomic groups and possessive quantifiers. For example, you probably 
should write \d++ instead of \d+ above. Given a run of digits, there is 
no point trying with all possible numbers of them, since what follows is 
never a digit. [PCRE can sometimes convert \d+ into \d++ for you 
automatically, but not in that example. It only does it when the very 
next item is obviously not a digit.]
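
A small sketch of why this is safe only when the next item cannot be a
digit. It uses Python's re module rather than PCRE, and because Python
versions before 3.11 have no \d++ syntax it emulates the possessive
quantifier with the lookahead-plus-backreference idiom; the pattern names
and the sample subjects are made up for the illustration.

```python
import re

# Emulation of \d++ (equivalently (?>\d+)): the lookahead captures the
# whole run of digits and is never re-entered, then the backreference
# consumes exactly that text, so no digits can be given back later.
ATOMIC_DIGITS = r"(?=(?P<d>\d+))(?P=d)"

# Next item is never a digit: making the run possessive changes nothing.
safe_plain = re.compile(r"^\d+[A-D]$")
safe_atomic = re.compile("^" + ATOMIC_DIGITS + r"[A-D]$")

# Next item IS a digit: the plain version must give back one digit to
# match, which the possessive version refuses to do.
risky_plain = re.compile(r"^\d+5$")
risky_atomic = re.compile("^" + ATOMIC_DIGITS + "5$")

samples = ["123A", "123", "A", "9D", "12E"]
same_results = all(
    bool(safe_plain.match(s)) == bool(safe_atomic.match(s)) for s in samples
)
```

In the "risky" pair the two patterns disagree on the subject 12345, which
is the kind of case where the quantifier cannot be made possessive
automatically.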

Again, if you write (?:A|B|BC|BD|C|DB)++ once it has found A it will 
carry on to look for a digit, but if there isn't one, it won't look for
other letters. But that means you should re-order the alternatives, as
otherwise it will never find BC (for example) because it will find B 
first.
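
The effect of the ordering can be checked with a short sketch. It is
written for Python's re module, not PCRE, and emulates the atomic groups
with a lookahead plus a named backreference so that it also runs on Python
versions without the ++ syntax; the helper names and the particular
re-ordering (longer alternatives first) are assumptions made for the
illustration. With this set of alternatives the subject 1BD2 shows the
problem most clearly, because D is not an alternative on its own.

```python
import re


def atomic(body: str, name: str) -> str:
    """Emulate the atomic group (?>body): capture body inside a lookahead,
    which is not re-entered on backtracking, then consume that same text."""
    return f"(?=(?P<{name}>{body}))(?P={name})"


def build(letters: str) -> "re.Pattern[str]":
    """Build ^\\d++(?:(?:letters)++\\d++)*$ using the emulation above."""
    first_digits = atomic(r"\d+", "d1")
    letter_run = atomic(f"(?:{letters})+", "run")
    more_digits = atomic(r"\d+", "d2")
    return re.compile(f"^{first_digits}(?:{letter_run}{more_digits})*$")


# Original order: B is tried before BC and BD.
original = build("A|B|BC|BD|C|DB")
# Re-ordered so that the two-letter alternatives are tried first.
reordered = build("BC|BD|DB|A|B|C")

lost = original.match("1BD2")    # B is taken, D is left over: no match
found = reordered.match("1BD2")  # BD is taken as a unit: match
```

Both versions still accept subjects such as 1A2 or 1DB2; only subjects
that need a longer alternative hidden behind a shorter one are affected.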

Regards,
Philip

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
