On Tue, Dec 22, 2015 at 3:28 AM, Viktor Leis <l...@in.tum.de> wrote:
> I get 128.894ms vs. 183.724ms, i.e., a 43% slowdown for the hash
> join. However, let me stress that this is really the extreme case:
>
> - If the join has few matches (due to inserting a value different from
>   1 into r2), hash and nested loop join have pretty much the same
>   performance.
>
> - If you add just one more row to r2, the hash join is faster by a
>   similar margin.
>
> - Also if there is disk IO or network involved, I suspect that you
>   will see no performance differences.
>
> There are many difficult tradeoffs in any query optimizer, but I do
> not think picking nested loops where a hash join can be used is one of
> those. To me this seems more like a self-inflicted wound.
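As a toy illustration of the extreme case described above (this is a hypothetical sketch in Python, not the PostgreSQL executor; the relation contents are made up), the hash join pays a build cost that buys nothing when the inner relation has exactly one row, while the nested loop simply probes that one row directly:

```python
def nested_loop_join(outer, inner):
    """Naive nested loop: compare every inner row against every outer row."""
    out = []
    for o in outer:
        for i in inner:
            if o == i:
                out.append((o, i))
    return out

def hash_join(outer, inner):
    """Classic hash join: build a hash table on the inner side, then probe."""
    table = {}
    for i in inner:                        # build phase -- pure overhead
        table.setdefault(i, []).append(i)  # when the inner side is one row
    out = []
    for o in outer:                        # probe phase
        for i in table.get(o, []):
            out.append((o, i))
    return out

outer = [1] * 5   # many matching outer rows
inner = [1]       # single-row inner relation (the "r2" case above)
assert nested_loop_join(outer, inner) == hash_join(outer, inner)
```

Both produce the same result; the difference is that the hash join hashes every key twice (build and probe) and allocates a table, which is exactly the kind of fixed overhead that only pays off once the inner side grows past a handful of rows.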
Well, that's a fair perspective. I don't think there's any debate about
the fact that we sometimes pick nested loops when we shouldn't, or that
this can be very painful. Figuring out exactly what to do about that at
the code level is harder.

From my point of view, one interesting fact about database optimization
is that the numbers 0 and 1 are phenomenally important special cases.
It is often the case that a join will return at most 1 row per outer
row, or that an aggregate will generate exactly 1 group, or whatever.
And the code is littered with special cases - including Nested Loop -
that cater to making such cases fast. Those cases arise frequently
because people engineer their data so that they occur frequently.

If we could bias the planner against picking nested loops in cases
where they figure to win only a little but might conceivably lose a
lot, that would probably be a good idea. But it's not obvious exactly
how to figure that out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers