On Sat, May 1, 2021, at 21:46, Tom Lane wrote: > I found a nasty performance problem in commit 824bf7190: given the > right sort of regex, checkmatchall() takes an unreasonable amount > of time.
Nice catch. > fix-exponential-cost-of-checkmatchall-1.patch I've successfully tested the patch on the regex corpus: SELECT is_match <> (subject ~ pattern), captured IS DISTINCT FROM regexp_match(subject, pattern, flags), COUNT(*) FROM performance_test GROUP BY 1,2 ORDER BY 1,2; ?column? | ?column? | count ----------+----------+--------- f | f | 3253889 (1 row) HEAD (651d005e76bc0b9542615f609b4d0d946035dc58) Time: 94096.020 ms (01:34.096) Time: 93102.287 ms (01:33.102) Time: 93333.746 ms (01:33.334) fix-exponential-cost-of-checkmatchall-1.patch Time: 95247.529 ms (01:35.248) Time: 92617.502 ms (01:32.618) Time: 93259.700 ms (01:33.260) I've also tested the problematic type of regexes: HEAD (651d005e76bc0b9542615f609b4d0d946035dc58) SELECT regexp_matches('', '(.|){20}',''); Time: 61.613 ms SELECT regexp_matches('', '(.|){25}',''); Time: 1928.674 ms (00:01.929) SELECT regexp_matches('', '(.|){27}',''); Time: 7789.601 ms (00:07.790) fix-exponential-cost-of-checkmatchall-1.patch SELECT regexp_matches('', '(.|){20}',''); Time: 0.965 ms SELECT regexp_matches('', '(.|){25}',''); Time: 0.586 ms SELECT regexp_matches('', '(.|){27}',''); Time: 0.788 ms Nice improvement, thanks. /Joel