Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-26 Thread Sandro Santilli
On Fri, Feb 21, 2014 at 08:17:36PM -0500, Tom Lane wrote:
 I wrote:
  Craig Ringer cr...@2ndquadrant.com writes:
  So I'd like to confirm that this issue doesn't affect 9.1.
 
  It doesn't.  I suspect it has something to do with 173e29aa5 or one
  of the nearby commits in backend/regex/.
 
 Indeed, git bisect fingers that commit as introducing the problem.

Is there anything I can do for this to not fade to oblivion ?
Will a mail to pgsql-bugs help ?

--strk;

 ()  ASCII ribbon campaign  --  Keep it simple !
 /\  http://strk.keybit.net/rants/ascii_mails.txt  


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Sandro Santilli
The following snippet reveals that 9.3.1 has a bug 
in regexp_matches, which uninterruptably keeps CPU
spinning for minutes:

-8---

\timing
SET statement_timeout = 2;
-- this is only to show statement_timeout is effective here
SELECT count(*) from generate_series(1, 10);
-- this one is uninterruptable!
SELECT regexp_matches($INPUT$








/a
$b$
$c$d
;
$INPUT$,
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*?\2)+)$REG$, 'g' );

-8---

The above has been tested to be harmless with PostgreSQL 9.1.11
in that the regexp_matches call is interrupted, but it is NOT
with PostgreSQL 9.3.1.

Is it a known bug ?

Please include my address in replies as I don't get notified
of list activity. Thanks.

--strk; 

 ()  ASCII ribbon campaign  --  Keep it simple !
 /\  http://strk.keybit.net/rants/ascii_mails.txt  


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Sandro Santilli
I've just tested 9.3.3 and it is _also_ affected.
Should I report the regression somewhere else ?

--strk;

On Fri, Feb 21, 2014 at 10:17:59AM +0100, Sandro Santilli wrote:
 The following snippet reveals that 9.3.1 has a bug 
 in regexp_matches, which uninterruptably keeps CPU
 spinning for minutes:
 
 -8---
 
 \timing
 SET statement_timeout = 2;
 -- this is only to show statement_timeout is effective here
 SELECT count(*) from generate_series(1, 10);
 -- this one is uninterruptable!
 SELECT regexp_matches($INPUT$
 
 
 
 
 
 
 
 
 /a
 $b$
 $c$d
 ;
 $INPUT$,
 $REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*?\2)+)$REG$, 'g' );
 
 -8---
 
 The above has been tested to be harmless with PostgreSQL 9.1.11
 in that the regexp_matches call is interrupted, but it is NOT
 with PostgreSQL 9.3.1.
 
 Is it a known bug ?
 
 Please include my address in replies as I don't get notified
 of list activity. Thanks.
 
 --strk; 
 
  ()  ASCII ribbon campaign  --  Keep it simple !
  /\  http://strk.keybit.net/rants/ascii_mails.txt  


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Craig Ringer
On 02/21/2014 05:17 PM, Sandro Santilli wrote:
 The following snippet reveals that 9.3.1 has a bug 
 in regexp_matches, which uninterruptably keeps CPU
 spinning for minutes:

Huh. So it does. That's interesting.

(You should generally report things to pgsql-b...@postgresql.org btw,
not -hackers)

Looks like it's busily looping within the regex.c code, never hitting a
CHECK_FOR_INTERRUPTS.

The real question IMO is why it's taking so long. It looks like
cfindloop(...) is being called multiple times, with each call taking a
couple of seconds.

A profile of the run is attached. I don't expect to have a chance to dig
into this right away, as I haven't touched the regexp code before and
would need to spend a bit of time studying it to achieve anything.
Hopefully the test, confirmation, and profile is useful.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
# Overhead   Command  Shared Object 
  Symbol
#     .  
...
#
40.33%  postgres  postgres   [.] longest

|
--- longest
cdissect
cdissect
cdissect
cdissect
cdissect
cdissect
   |  
   |--87.25%-- cdissect
   |  |  
   |  |--78.14%-- cdissect
   |  |  |  
   |  |  |--81.92%-- cdissect
   |  |  |  |  
   |  |  |  |--68.07%-- cdissect
   |  |  |  |  |  
   |  |  |  |  |--69.11%-- 
pg_regexec
   |  |  |  |  |  
   |  |  |  |   --30.89%-- cdissect
   |  |  |  | pg_regexec
   |  |  |  |  
   |  |  |   --31.93%-- pg_regexec
   |  |  |  
   |  |   --18.08%-- pg_regexec
   |  |  
   |   --21.86%-- pg_regexec
   |  
--12.75%-- pg_regexec

19.20%  postgres  postgres   [.] shortest   

|
--- shortest
cdissect
cdissect
cdissect
cdissect
cdissect
cdissect
cdissect
cdissect
cdissect
cdissect
pg_regexec

12.76%  postgres  postgres   [.] miss   

|
--- miss
   |  
   |--88.87%-- longest
   |  cdissect
   |  cdissect
   |  cdissect
   |  cdissect
   |  cdissect
   |  cdissect
   |  |  
   |  |--93.47%-- cdissect
   |  |  |  
   |  |  |--74.23%-- cdissect
   |  |  |  |  
   |  |  |  |--90.21%-- cdissect
   |  |  |  |  |  
   |  |  |  |  |--85.39%-- cdissect
   |  |  |  |  |  | 
 
   |  |  |  |  |  
|--91.67%-- pg_regexec
   |  |  |  |  |  | 
 
   |  |  |  |  |   
--8.33%-- cdissect
   |  |  |  |  |
 pg_regexec
   |  |  |  |  |  
   |  |  |  |   --14.61%-- 
pg_regexec
   |  |  |  |  
   |  |  |   --9.79%-- pg_regexec
   |  |  |  
   |  |   --25.77%-- pg_regexec
   |  |  
   |   --6.53%-- pg_regexec
   |  
--11.13%-- shortest
  cdissect
  cdissect
  cdissect
  cdissect
  cdissect

Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Florian Pflug
On Feb21, 2014, at 16:46 , Craig Ringer cr...@2ndquadrant.com wrote:
 The real question IMO is why it's taking so long. It looks like
 cfindloop(...) is being called multiple times, with each call taking a
 couple of seconds.

Yeah, I wondered about this too. I've shortened the example a bit - here
are a few observations

postgres=# select regexp_matches(' $a$b$c$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 33.026 ms

postgres=# select regexp_matches('  $a$b$c$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 60.594 ms

postgres=# select regexp_matches('   $a$b$c$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 114.410 ms

postgres=# select regexp_matches('$a$b$c$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 227.467 ms

postgres=# select regexp_matches(' $a$b$c$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 452.739 ms

postgres=# select regexp_matches('  $a$b$c$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 943.098 ms

postgres=# select regexp_matches(' $a$b$c$d$e$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 200.795 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 298.264 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$g$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 444.219 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$g$h$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 696.137 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$g$h$i$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 974.502 ms

postgres=# select regexp_matches(' $a$b$c$d$e$f$g$h$i$j$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 1369.703 ms

postgres=# select regexp_matches('  $a$b$c$d$e$f$g$h$i$j$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
Time: 2747.766 ms

In other words, the observes runtime is roughly 2^i * 1.5^j for inputs
consiting of i leading spaces (any character will probably do) followed
by j substring of the form $X$ (X is an arbitrary character).

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Craig Ringer
On 02/22/2014 12:04 AM, Florian Pflug wrote:
 On Feb21, 2014, at 16:46 , Craig Ringer cr...@2ndquadrant.com wrote:
 The real question IMO is why it's taking so long. It looks like
 cfindloop(...) is being called multiple times, with each call taking a
 couple of seconds.
 
 Yeah, I wondered about this too. I've shortened the example a bit - here
 are a few observations
 
 [snip]

 In other words, the observes runtime is roughly 2^i * 1.5^j for inputs
 consiting of i leading spaces (any character will probably do) followed
 by j substring of the form $X$ (X is an arbitrary character).

The biggest change in regexp support in the introduction of proper
unicode support, but that was before 9.1.

The problem report claims that the issue does not occur on 9.1, but yet:

git diff REL9_1_STABLE master  -- ./src/backend/utils/adt/regexp.c

is utterly trivial; a copyright date line change, and 1609797c which
just tweaks the includes. 9.0 has a much bigger diff.

So I'd like to confirm that this issue doesn't affect 9.1. I can
reproduce the issue againts 9.2. I don't have 9.1 or older lying around
to test against right this second.

Sandro, can you please provide the output of SELECT version() from a
PostgreSQL version that is not slow with this query?



(BTW, I'm highly amused by how the development style has changed around
here. From git blame, this is from 1997:

I haven't actually measured the speed improvement, but it `looks' a lot
quicker visually when watching regression test output.

Ha.)


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Florian Pflug
On Feb21, 2014, at 17:29 , Craig Ringer cr...@2ndquadrant.com wrote:
 The problem report claims that the issue does not occur on 9.1, but yet:
 
 git diff REL9_1_STABLE master  -- ./src/backend/utils/adt/regexp.c
 
 is utterly trivial; a copyright date line change, and 1609797c which
 just tweaks the includes. 9.0 has a much bigger diff.

On 9.1.12:

postgres=# select regexp_matches('  $a$b$c$d$e$f$g$h$i$j$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
   regexp_matches
-
 {  $a$b$c$d$e$f$g$h$i$j,NULL}
(1 row)
Time: 1.048 ms

On HEAD

postgres=# select regexp_matches('  $a$b$c$d$e$f$g$h$i$j$',
$REG$((?:[^'$;]+|[^]*|'(?:[^']*|'')*'|(\$[^$]*\$).*\2)+)$REG$, 'g');
 regexp_matches  
-
 {  ,NULL}
 {a,NULL}
 {b,NULL}
 {c,NULL}
 {d,NULL}
 {e,NULL}
 {f,NULL}
 {g,NULL}
 {h,NULL}
 {i,NULL}
 {j,NULL}
(11 rows)
Time: 4787.239 ms

Aha! Since we go rid of regex_flavor pre-9.1, I don't have an immediate 
suspect...

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 So I'd like to confirm that this issue doesn't affect 9.1.

It doesn't.  I suspect it has something to do with 173e29aa5 or one
of the nearby commits in backend/regex/.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-21 Thread Tom Lane
I wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 So I'd like to confirm that this issue doesn't affect 9.1.

 It doesn't.  I suspect it has something to do with 173e29aa5 or one
 of the nearby commits in backend/regex/.

Indeed, git bisect fingers that commit as introducing the problem.

What seems to be happening is that citerdissect() is trying some
combinatorially large number of ways to split the input string into
substrings that can satisfy the argument of the outer + iterator.
It keeps failing on the substring starting with the first '$', and
then vainly looking for other splits that dodge the problem.

I'm not entirely sure how come the previous coding didn't fall into
the same problem.  It's quite possible Henry Spencer is smarter than
I am, but there was certainly nothing there before that was obviously
avoiding this hazard.

Worthy of note is that I think pre-9.2 is actually giving the wrong
answer --- it's claiming the whole string matches the regex,
which it does not if I'm reading it right.  This may be related to
the problem that commit 173e29aa5 was trying to fix, ie failure to
enforce backref matches in some cases.  So one possible theory is
that by failing to notice that it *didn't* have a valid match,
the old code accidentally failed to go down the rabbit hole of trying
zillions of other ways to match.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers