Re: [HACKERS] Pathological regexp match

Michael Glaesemann Thu, 28 Jan 2010 19:43:10 -0800


On Jan 28, 2010, at 21:59 , Alvaro Herrera wrote:

Hi Michael,

Michael Glaesemann wrote:
We came across a regexp that takes very much longer than expected.
PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44), 64-bit
SELECT 'ooo...' ~ $r$Z(Q)[^Q]*A.*?(\1)$r$; -- omitted for email brevity
The ? after .* is pointless.

Interesting. I would expect that *? would be the non-greedy version of *, meaning match up to the first \1 (in this case the first Q following A), rather than as much as possible.


For example, in Perl:

$ perl -e " if ('oooZQoooAoooQooQooQooo' =~ /Z(Q)[^Q]*A.*(\1)/){ print \$&; } else { print 'NO'; }" && echo

ZQoooAoooQooQooQ

$ perl -e " if ('oooZQoooAoooQooQooQooo' =~ /Z(Q)[^Q]*A.*?(\1)/){ print \$&; } else { print 'NO'; }" && echo

ZQoooAoooQ

If I'm reading the docs right, Postgres does support non-greedy * as *?:

<http://www.postgresql.org/docs/8.4/interactive/functions-matching.html#POSIX-QUANTIFIERS-TABLE>

However, as you point out, Postgres doesn't appear to take this into account:

postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*A.*(\2))$r$, $s$X$s$);

 regexp_replace
----------------
 oooXooo
(1 row)

postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*A.*?(\2))$r$, $s$X$s$);

 regexp_replace
----------------
 oooXooo
(1 row)
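
For comparison, here is a minimal sketch of the result I would expect if the non-greedy quantifier were honored. It is written in Python rather than Postgres, on the assumption that Python's re module treats *? the same way Perl does. The variable names are only for illustration, and it says nothing about how the Postgres regex engine decides greediness.

```python
import re

# Same subject string as in the Perl and Postgres examples above.
subject = "oooZQoooAoooQooQooQooo"

# Greedy .* runs to the last Q; lazy .*? stops at the first Q after A.
greedy = re.compile(r"(Z(Q)[^Q]*A.*(\2))")
lazy = re.compile(r"(Z(Q)[^Q]*A.*?(\2))")

# count=1 mirrors regexp_replace without the 'g' flag (first match only).
greedy_result = greedy.sub("X", subject, count=1)
lazy_result = lazy.sub("X", subject, count=1)

print(greedy_result)  # oooXooo
print(lazy_result)    # oooXooQooQooo
```

Under those semantics the second query would return oooXooQooQooo rather than oooXooo.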

Michael Glaesemann
michael.glaesem...@myyearbook.com
