On Wed, Mar 14, 2018 at 11:34 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > LOG: duration: 26101.612 ms plan: > Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16452 newdata WHERE > newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16452 > newdata2 WHERE newdata2 IS NOT NULL AND newdata2 > OPERATOR(pg_catalog.*=) newdata AND newdata2.ctid > OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1 > Limit (cost=0.00..90.52 rows=1 width=28) (actual > time=26101.608..26101.608 rows=0 loops=1) > -> Nested Loop Semi Join (cost=0.00..225220.96 rows=2488 width=28) > (actual time=26101.606..26101.606 rows=0 loops=1) > Join Filter: ((newdata2.ctid <> newdata.ctid) AND (newdata.* *= > newdata2.*)) > Rows Removed by Join Filter: 25000000 > -> Seq Scan on pg_temp_16452 newdata (cost=0.00..73.00 > rows=4975 width=34) (actual time=0.022..15.448 rows=5000 loops=1) > Filter: (newdata.* IS NOT NULL) > -> Materialize (cost=0.00..97.88 rows=4975 width=34) (actual > time=0.000..0.500 rows=5000 loops=5000) > -> Seq Scan on pg_temp_16452 newdata2 (cost=0.00..73.00 > rows=4975 width=34) (actual time=0.010..4.033 rows=5000 loops=1) > Filter: (newdata2.* IS NOT NULL)
This plan is chosen because we're looking for just one row (LIMIT 1) that has equal data but a different ctid. In this case we're not going to find one, so we'll pay the full enormous cost of the nested loop, but the startup cost is estimated as 0 and we think we are going to find a row straight away. That's because we don't know that it's unlikely for there to be a row with the same columns but a different ctid. There is a fundamental and complicated estimation problem lurking here of course and I'm not sure what to think about that yet. Maybe there is a very simple fix for this particular problem: --- a/src/backend/commands/matview.c +++ b/src/backend/commands/matview.c @@ -660,7 +660,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner, "(SELECT * FROM %s newdata2 WHERE newdata2 IS NOT NULL " "AND newdata2 OPERATOR(pg_catalog.*=) newdata " "AND newdata2.ctid OPERATOR(pg_catalog.<>) " - "newdata.ctid) LIMIT 1", + "newdata.ctid)", tempname, tempname); if (SPI_execute(querybuf.data, false, 1) != SPI_OK_SELECT) elog(ERROR, "SPI_exec failed: %s", querybuf.data); That gets me back to the sort-merge plan, but maybe it's too superficial. -- Thomas Munro http://www.enterprisedb.com