Chris Rogers <teuk...@gmail.com> writes:
> I'm on PostgreSQL 9.3.  This should reproduce on any table with 100,000+
> rows.  The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT
> 2, but I can't figure out why.

This is not -hackers material.

The first row pulled from the nestloop LEFT JOIN is created by joining
the first output row from "base" to the first output row from "filter"
(which is indirectly also the first row from "base").  So, cheap; we've
only had to read one row from "a_big_table".

The second attempt to pull a row from the nestloop LEFT JOIN requires
evaluating all the rest of the output of the "filter" CTE, to see if there
are any more "filter" rows matching the first "base" row.  (There are not,
since the "rownum" output is unique, but the nestloop doesn't know that.)
That in turn causes scanning all the rest of "base" and so all the rest
of "a_big_table".  Only after that do we get to the second "base" row
at which another join output row is possible.

If the planner were about an order of magnitude cleverer than it is,
it might realize that this would happen and switch to another plan ...
although TBH the only way I can see to not have a large startup cost
would be to somehow realize that the outputs of both CTEs are sorted
by rownum and hence can be mergejoined without an explicit sort step.
That would require more understanding of the behavior of row_number()
than it's got or probably should have.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to