On Thu, Sep 22, 2005 at 02:31:50AM -0700, Brian Candler wrote:
> # New Ticket Created by Brian Candler
> # Please include the string: [perl #37230]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=37230 >
>
> Summary: regular expression overflows stack in 5.8.7 (but same test case
> works in 5.8.6).
>
> A program which replicates this problem is attached, as a gzip file to
> prevent it being split by E-mail, as it contains one very long line (~32K)
>
> Output of perl -v, perl -V and perlbug -d on the target system is also
> attached.
>
> The regular expression is:
>
>     ^"(?:[^"\\]|\\.)*"\s+\d+\s+(\d+)
>
> (aside: it is used to parse a double-quoted string in an Apache log file; if
> the string contains a double quote it appears as \" and if it contains a
> backslash it appears as \\)
>
> It was a real Apache log entry which made Perl bomb out, and this is what
> the attached test program contains.
>
> I don't see any reason for deep recursion in this regexp: at each stage it
> has two options to match, { any character other than " or \ } or
> { \ followed by any character }, and if it cannot chomp one or the other
> then the regexp should fail. If it _can_ match either of those then there is
> no ambiguity and possibility of backtracking, as far as I can see.

Yeah, but whenever '[]' appears in a regex, it seems the optimizer takes
a vacation. ;-)

The problem with the construct '([^"\\]|\\.)*' is that on each character,
there are three options to take: match it with '[^"\\]', match it with
'\\.', or don't match it at all (due to the *).

I prefer to match double quoted strings with:

    /"[^"\\]*(?:\\.[^"\\]*)*"/s

which eliminates all occurrences of an alternation. If the above regex
matches, it doesn't have to backtrack at all.

Abigail
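
For anyone who wants to compare the two patterns side by side, here is a minimal sketch. The sample log fragments and the helper name bytes_field are made up for illustration (they are not from the attached test case), and the lines are kept short on purpose, so this only shows that both patterns agree on the result, not the stack behaviour reported in the ticket.

```perl
use strict;
use warnings;

# Hypothetical log fragment (NOT the one attached to the ticket): a quoted
# request containing an escaped quote (\") and an escaped backslash (\\),
# followed by a status code and a byte count.
my $line = q{"GET /a\"b\\\\c HTTP/1.0" 200 1234};

# Pattern from the report: alternation inside a starred group.
my $alternation = qr/^"(?:[^"\\]|\\.)*"\s+\d+\s+(\d+)/;

# Unrolled form suggested in the reply, with the same trailing fields added.
# Inside the quotes it alternates between runs of ordinary characters and
# single backslash-escaped characters, so there is no '|' to choose between.
my $unrolled = qr/^"[^"\\]*(?:\\.[^"\\]*)*"\s+\d+\s+(\d+)/s;

# Hypothetical helper: return the captured byte count, or undef on no match.
sub bytes_field {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

printf "alternation: %s\n", bytes_field($alternation, $line) // 'no match';
printf "unrolled:    %s\n", bytes_field($unrolled,    $line) // 'no match';
```

Both patterns should capture the same byte count here. The difference the reply is pointing at only matters for how much work (and, in the affected release, how much stack) the engine needs on a very long quoted string.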