Re: [HACKERS] Pathological regexp match

2010-02-09 Thread Joachim Wieland
On Mon, Feb 8, 2010 at 6:07 PM, David E. Wheeler wrote: > On Feb 8, 2010, at 5:15 AM, Magnus Hagander wrote: > >> The text is about 180Kb. PostgreSQL takes ~40 seconds without the >> patch, ~36 seconds with it, to extract the match from it. Perl takes >> 0.016 seconds. > > Obviously we need to sup

Re: [HACKERS] Pathological regexp match

2010-02-08 Thread David E. Wheeler
On Feb 8, 2010, at 5:15 AM, Magnus Hagander wrote: > The text is about 180Kb. PostgreSQL takes ~40 seconds without the > patch, ~36 seconds with it, to extract the match from it. Perl takes > 0.016 seconds. Obviously we need to support Perl regular expressions in core. Not PCRE, but Perl. ;-P B

Re: [HACKERS] Pathological regexp match

2010-02-08 Thread Magnus Hagander
2010/2/1 Michael Glaesemann : > > On Jan 31, 2010, at 22:14 , Tom Lane wrote: > >> The Tcl folk accepted that patch, so I went ahead and applied it to >> our code.  It would still be a good idea for us to do any testing we >> can on it, though. > > I applied the patch and ran both the test query I

Re: [HACKERS] Pathological regexp match

2010-02-01 Thread Michael Glaesemann
On Jan 31, 2010, at 22:14 , Tom Lane wrote: The Tcl folk accepted that patch, so I went ahead and applied it to our code. It would still be a good idea for us to do any testing we can on it, though. I applied the patch and ran both the test query I submitted as well as original problematic

Re: [HACKERS] Pathological regexp match

2010-01-31 Thread Tom Lane
I wrote: > I found a possible patch that seems to improve matters: AFAICS the DFA > matching is independent of the checking that cdissect() and friends do, > so we can apply that check first instead of second. This brings the > runtime down from minutes to sub-second on my machine. However I don'

Re: [HACKERS] Pathological regexp match

2010-01-31 Thread Magnus Hagander
On Sat, Jan 30, 2010 at 08:07, Tom Lane wrote: > Michael Glaesemann writes: >> We came across a regexp that takes very much longer than expected. > > I poked into this a little bit.  What seems to be happening is that the > use of non-greedy quantifiers plus backreferences turns off most of the >

Re: [HACKERS] Pathological regexp match

2010-01-29 Thread Tom Lane
Michael Glaesemann writes: > We came across a regexp that takes very much longer than expected. I poked into this a little bit. What seems to be happening is that the use of non-greedy quantifiers plus backreferences turns off most of the optimization that the regexp engine usually does, leaving

Re: [HACKERS] Pathological regexp match

2010-01-29 Thread Alvaro Herrera
Magnus Hagander wrote: > 2010/1/29 Alvaro Herrera : > > (There's a badly needed CHECK_FOR_INTERRUPTS in this code BTW) > > Incidentally, I ran across the exact same issue with a non-greedy > regexp with a client earlier this week, and put on my TODO to figure > out a good place to stick a check fo

Re: [HACKERS] Pathological regexp match

2010-01-29 Thread Magnus Hagander
2010/1/29 Alvaro Herrera : > Hi Michael, > > Michael Glaesemann wrote: >> We came across a regexp that takes very much longer than expected. >> >> PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) >> 4.1.2 20080704 (Red Hat 4.1.2-44), 64-bit >> >> SELECT 'ooo...' ~ $r$Z(Q)[^Q

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Michael Glaesemann
On Jan 28, 2010, at 23:21 , Alvaro Herrera wrote: I think the reason for this is that the first * is greedy and thus the entire expression is considered greedy. The fact that you've made the second * non-greedy does not ungreedify the RE ... Note the docs say: The above rules associat

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Alvaro Herrera
Michael Glaesemann wrote: > However, as you point out, Postgres doesn't appear to take this into > account: > > postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q) > [^Q]*A.*(\2))$r$, $s$X$s$); > regexp_replace > > oooXooo > (1 row) > > postgres=# select regexp

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Alvaro Herrera
Michael Glaesemann wrote: > > On Jan 28, 2010, at 21:59 , Alvaro Herrera wrote: > > >Hi Michael, > > > >Michael Glaesemann wrote: > >>We came across a regexp that takes very much longer than expected. > >> > >>PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC > >>gcc (GCC) 4.1.2 20080

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Michael Glaesemann
On Jan 28, 2010, at 21:59 , Alvaro Herrera wrote: Hi Michael, Michael Glaesemann wrote: We came across a regexp that takes very much longer than expected. PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44), 64-bit SELECT 'ooo...' ~ $

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Alvaro Herrera
Hi Michael, Michael Glaesemann wrote: > We came across a regexp that takes very much longer than expected. > > PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 > 20080704 (Red Hat 4.1.2-44), 64-bit > > SELECT 'ooo...' ~ $r$Z(Q)[^Q]*A.*?(\1)$r$; -- omitted for email