[HACKERS] why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?

2014-10-31 Thread Chris Rogers
I'm on PostgreSQL 9.3.  This should reproduce on any table with 100,000+
rows.  The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT
2, but I can't figure out why.

Limit 1:

EXPLAIN ANALYZE WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
), filter AS (
  SELECT rownum, true AS thing FROM base
) SELECT * FROM base LEFT JOIN filter USING (rownum) WHERE filter.thing
LIMIT 1

Result:

Limit  (cost=283512.19..283517.66 rows=1 width=2114) (actual
time=0.019..0.019 rows=1 loops=1)
  CTE base
-  WindowAgg  (cost=0.00..188702.69 rows=4740475 width=101) (actual
time=0.008..0.008 rows=1 loops=1)
  -  Seq Scan on a_big_table  (cost=0.00..129446.75 rows=4740475
width=101) (actual time=0.003..0.003 rows=1 loops=1)
  CTE filter
-  CTE Scan on base base_1  (cost=0.00..94809.50 rows=4740475 width=8)
(actual time=0.000..0.000 rows=1 loops=1)
  -  Nested Loop  (cost=0.00..307677626611.24 rows=56180269915 width=2114)
(actual time=0.018..0.018 rows=1 loops=1)
Join Filter: (base.rownum = filter.rownum)
-  CTE Scan on base  (cost=0.00..94809.50 rows=4740475 width=2113)
(actual time=0.011..0.011 rows=1 loops=1)
-  CTE Scan on filter  (cost=0.00..94809.50 rows=2370238 width=9)
(actual time=0.002..0.002 rows=1 loops=1)
  Filter: thing
Total runtime: 0.057 ms

Limit 2:

EXPLAIN ANALYZE WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
), filter AS (
  SELECT rownum, true AS thing FROM base
) SELECT * FROM base LEFT JOIN filter USING (rownum) WHERE filter.thing
LIMIT 2

Result:

Limit  (cost=283512.19..283523.14 rows=2 width=2114) (actual
time=0.018..14162.283 rows=2 loops=1)
  CTE base
-  WindowAgg  (cost=0.00..188702.69 rows=4740475 width=101) (actual
time=0.008..4443.359 rows=4714243 loops=1)
  -  Seq Scan on a_big_table  (cost=0.00..129446.75 rows=4740475
width=101) (actual time=0.002..1421.622 rows=4714243 loops=1)
  CTE filter
-  CTE Scan on base base_1  (cost=0.00..94809.50 rows=4740475 width=8)
(actual time=0.001..10214.684 rows=4714243 loops=1)
  -  Nested Loop  (cost=0.00..307677626611.24 rows=56180269915 width=2114)
(actual time=0.018..14162.280 rows=2 loops=1)
Join Filter: (base.rownum = filter.rownum)
Rows Removed by Join Filter: 4714243
-  CTE Scan on base  (cost=0.00..94809.50 rows=4740475 width=2113)
(actual time=0.011..0.028 rows=2 loops=1)
-  CTE Scan on filter  (cost=0.00..94809.50 rows=2370238 width=9)
(actual time=0.009..6595.770 rows=2357122 loops=2)
  Filter: thing
Total runtime: 14247.374 ms


Re: [HACKERS] why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?

2014-10-31 Thread Tom Lane
Chris Rogers teuk...@gmail.com writes:
 I'm on PostgreSQL 9.3.  This should reproduce on any table with 100,000+
 rows.  The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT
 2, but I can't figure out why.

This is not -hackers material.

The first row pulled from the nestloop LEFT JOIN is created by joining
the first output row from base to the first output row from filter
(which is indirectly also the first row from base).  So, cheap; we've
only had to read one row from a_big_table.

The second attempt to pull a row from the nestloop LEFT JOIN requires
evaluating all the rest of the output of the filter CTE, to see if there
are any more filter rows matching the first base row.  (There are not,
since the rownum output is unique, but the nestloop doesn't know that.)
That in turn causes scanning all the rest of base and so all the rest
of a_big_table.  Only after that do we get to the second base row
at which another join output row is possible.

If the planner were about an order of magnitude cleverer than it is,
it might realize that this would happen and switch to another plan ...
although TBH the only way I can see to not have a large startup cost
would be to somehow realize that the outputs of both CTEs are sorted
by rownum and hence can be mergejoined without an explicit sort step.
That would require more understanding of the behavior of row_number()
than it's got or probably should have.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers