Am 21.12.2015 um 15:42 schrieb Tom Lane:
> Viktor Leis <l...@in.tum.de> writes:
>> I think it would be a good start to distinguish between nested loop
>> joins with and without a index.
> We do.
>> In my opinion, the latter should simply NEVER be chosen.
> So, if you're given a query with a non-equality join condition
> that doesn't match any index on either table, you think the planner
> should just fail? This is not realistic. Certainly nestloop with
> inner seqscan is likely to be slow, but that's already reflected
> in the cost estimates.
You are right that for non-equality joins there is no alternative, so
nested loop is obviously the right choice. However, that does not make
the selection of nested loop join in cases where a hash join would be
possible a good choice.
Please have a look at Figure 6 (page 6) in
http://www.vldb.org/pvldb/vol9/p204-leis.pdf Disabling nested loop
joins without index scan (going from (a) to (b)) results in great
improvements across the board. And even more importantly, it avoids
most of the cases where queries took unreasonably long and timed
out. Basically this amounts to the being able to run the query on
PostgreSQL or not.
The cost model does not save you because the estimated cardinality is
close to 1. Also note that a hash join is fast too if the estimate is
correct. Picking nested loop join in these situations is very risky
but there is practically no upside for this decision.
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: