On Thu, Sep 22, 2005 at 02:31:50AM -0700, Brian Candler wrote:
> # New Ticket Created by  Brian Candler 
> # Please include the string:  [perl #37230]
> # in the subject line of all future correspondence about this issue. 
> # <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=37230 >
> 
> 
> Summary: regular expression overflows stack in 5.8.7 (but same test case
> works in 5.8.6).
> 
> A program which replicates this problem is attached, as a gzip file to
> prevent it being split by E-mail, as it contains one very long line (~32K)
> 
> Output of perl -v, perl -V and perlbug -d on the target system is also
> attached.
> 
> The regular expression is:
> 
>     ^"(?:[^"\\]|\\.)*"\s+\d+\s+(\d+)
> 
> (aside: it is used to parse a double-quoted string in an Apache log file; if
> the string contains a double quote it appears as \" and if it contains a
> backslash it appears as \\)
> 
> It was a real Apache log entry which made Perl bomb out, and this is what
> the attached test program contains.
> 
> I don't see any reason for deep recursion in this regexp: at each stage it
> has two options to match, { any character other than " or \ } or
> { \ followed by any character }, and if it cannot chomp one or the other
> then the regexp should fail. If it _can_ match either of those then there is
> no ambiguity and possibility of backtracking, as far as I can see.


Yeah, but whenever '[]' appears in a regex, it seems the optimizer
takes a vacation. ;-)

The problem with the construct  '([^"\\]|\\.)*' is that on each character,
there are three options to take: match it with '[^"\\]', match it with '\\.',
or don't match it at all (due to the *).

I prefer to match double quoted strings with:

    /"[^"\\]*(?:\\.[^"\\]*)*"/s

which eliminates all occurrences of alternation. If the above regex matches,
it doesn't have to backtrack at all.
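
For anyone who wants to compare the two forms, here is a minimal sketch. The
sample strings and the helper name matched_text are made up for illustration
(they are not from the attached log entry), I've added /s to the alternation
form so both treat '.' the same way, and $full simply glues the unrolled
string matcher onto the tail of the regex from the report. The inputs are kept
short on purpose, so this only shows that both forms accept the same strings;
it does not try to reproduce the stack overflow.

```perl
use strict;
use warnings;

# Alternation form from the report (string part only, /s added here).
my $alternation = qr/^"(?:[^"\\]|\\.)*"/s;

# Unrolled form: a run of ordinary characters, then zero or more
# groups of (escape pair followed by a run of ordinary characters).
my $unrolled = qr/^"[^"\\]*(?:\\.[^"\\]*)*"/s;

# Assumption: unrolled string matcher plus the tail of the reported regex.
my $full = qr/^"[^"\\]*(?:\\.[^"\\]*)*"\s+\d+\s+(\d+)/s;

# Hypothetical sample inputs, not taken from the real log.
my @samples = (
    '"GET /index.html HTTP/1.0" 200 1234',
    '"say \"hi\" and \\\\ done" 200 42',
    '"unterminated \" string 200 42',
);

# Return the text matched by $re, or undef if there is no match.
sub matched_text {
    my ($re, $text) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

for my $line (@samples) {
    my $old = matched_text($alternation, $line);
    my $new = matched_text($unrolled, $line);
    printf "alternation: %s\n", defined $old ? $old : '(no match)';
    printf "unrolled:    %s\n", defined $new ? $new : '(no match)';
    if ($line =~ $full) {
        print "last field:  $1\n";
    }
}
```

The unrolled form gives the engine only one way to consume each character, so
a failed match (the third sample) is rejected without retrying earlier
positions inside the string.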



Abigail
