On Wed, Mar 14, 2018 at 8:07 AM, Jeff Janes <jeff.ja...@gmail.com> wrote: > The following commit has caused a devastating performance regression > in concurrent refresh of MV: > > commit 7ca25b7de6aefa5537e0dbe56541bc41c0464f97 > Author: Tom Lane <t...@sss.pgh.pa.us> > Date: Wed Nov 29 22:00:29 2017 -0500 > > Fix neqjoinsel's behavior for semi/anti join cases. > > > The below reproduction goes from taking about 1 second to refresh, to taking > an amount of time I don't have the patience to measure. > > drop table foobar2 cascade; > create table foobar2 as select * from generate_series(1,200000); > create materialized view foobar3 as select * from foobar2; > create unique index on foobar3 (generate_series ); > analyze foobar3; > refresh materialized view CONCURRENTLY foobar3 ; > > > When I interrupt the refresh, I get a message including this line: > > CONTEXT: SQL statement "SELECT newdata FROM pg_temp_3.pg_temp_16420 newdata > WHERE newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16420 > newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=) > newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1" > > So I makes sense that the commit in question could have caused a change in > the execution plan. Because these are temp tables, I can't easily get my > hands on them to investigate further.
Ouch. A quadratic join. This looks like an invisible correlation problem. load 'auto_explain'; set auto_explain.log_min_duration = 0; set auto_explain.log_analyze = true; drop table if exists t cascade; create table t as select generate_series(1, 5000); create materialized view mv as select * from t; create unique index on mv(generate_series); analyze mv; refresh materialized view concurrently mv; HEAD: LOG: duration: 26101.612 ms plan: Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16452 newdata WHERE newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16452 newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=) newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1 Limit (cost=0.00..90.52 rows=1 width=28) (actual time=26101.608..26101.608 rows=0 loops=1) -> Nested Loop Semi Join (cost=0.00..225220.96 rows=2488 width=28) (actual time=26101.606..26101.606 rows=0 loops=1) Join Filter: ((newdata2.ctid <> newdata.ctid) AND (newdata.* *= newdata2.*)) Rows Removed by Join Filter: 25000000 -> Seq Scan on pg_temp_16452 newdata (cost=0.00..73.00 rows=4975 width=34) (actual time=0.022..15.448 rows=5000 loops=1) Filter: (newdata.* IS NOT NULL) -> Materialize (cost=0.00..97.88 rows=4975 width=34) (actual time=0.000..0.500 rows=5000 loops=5000) -> Seq Scan on pg_temp_16452 newdata2 (cost=0.00..73.00 rows=4975 width=34) (actual time=0.010..4.033 rows=5000 loops=1) Filter: (newdata2.* IS NOT NULL) And with commit 7ca25b7de6aefa5537e0dbe56541bc41c0464f97 reverted: LOG: duration: 36.358 ms plan: Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16470 newdata WHERE newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16470 newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=) newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1 Limit (cost=756.95..939.50 rows=1 width=28) (actual time=36.354..36.354 rows=0 loops=1) -> Merge Semi Join (cost=756.95..2947.51 rows=12 width=28) (actual time=36.352..36.352 rows=0 loops=1) Merge Cond: (newdata.* *= newdata2.*) Join Filter: (newdata2.ctid <> newdata.ctid) Rows Removed by Join Filter: 5000 -> Sort (cost=378.48..390.91 rows=4975 width=34) (actual time=9.622..10.300 rows=5000 loops=1) Sort Key: newdata.* USING *< Sort Method: quicksort Memory: 622kB -> Seq Scan on pg_temp_16470 newdata (cost=0.00..73.00 rows=4975 width=34) (actual time=0.021..4.986 rows=5000 loops=1) Filter: (newdata.* IS NOT NULL) -> Sort (cost=378.48..390.91 rows=4975 width=34) (actual time=7.378..8.010 rows=5000 loops=1) Sort Key: newdata2.* USING *< Sort Method: quicksort Memory: 622kB -> Seq Scan on pg_temp_16470 newdata2 (cost=0.00..73.00 rows=4975 width=34) (actual time=0.017..3.034 rows=5000 loops=1) Filter: (newdata2.* IS NOT NULL) -- Thomas Munro http://www.enterprisedb.com