2015-12-22 9:28 GMT+01:00 Viktor Leis <l...@in.tum.de>: > On 12/22/2015 02:40 AM, Craig Ringer wrote: > > On 21 December 2015 at 23:57, Viktor Leis <l...@in.tum.de <mailto: > l...@in.tum.de>> wrote: > > > > > > > > Please have a look at Figure 6 (page 6) in > > http://www.vldb.org/pvldb/vol9/p204-leis.pdf Disabling nested loop > > joins without index scan (going from (a) to (b)) results in great > > improvements across the board. And even more importantly, it avoids > > most of the cases where queries took unreasonably long and timed > > out. Basically this amounts to the being able to run the query on > > PostgreSQL or not. > > > > > > For that data, yes. But you're ignoring other important cases. Small or > even 1-element lookup tables can be one where a nestloop over a seqscan > turns out to be by far the fastest way to do the job. > > This can really add up if it's deep in a subplan that's excuted > repeatedly, or if it's part of queries that get run very frequently on a > busy OLTP system. > Ok here's what I presume to be the extreme case: Joining a large table > with a 1-entry table. > > create table r1 (a int not null); > create table r2 (b int not null); > insert into r1 select 1 from generate_series(1,1000000); > insert into r2 values (1); > analyze r1; > analyze r2; > > set enable_mergejoin to off; > set enable_nestloop to on; > set enable_hashjoin to off; > explain select count(*) from r1, r2 where r1.a = r2.b; > \timing > select count(*) from r1, r2 where r1.a = r2.b; > \timing > > set enable_nestloop to off; > set enable_hashjoin to on; > explain select count(*) from r1, r2 where r1.a = r2.b; > \timing > select count(*) from r1, r2 where r1.a = r2.b; > \timing > > I get 128.894ms vs. 183.724ms, i.e., a 43% slowdown for the hash > join. However, let me stress that this is really the extreme case: >
> - If the join has few matches (due to inserting a value different from > 1 into r2), hash and nested loop join have pretty much the same > performance. > > - If you add just one more row to r2, the hash join is faster by a > similar margin. > > - Also if there is disk IO or network involved, I suspect that you > will see no performance differences. > > There are many difficult tradeoffs in any query optimizer, but I do > not think picking nested loops where a hash join can be used is one of > those. To me this seems more like a self-inflicted wound. > this is oversimplification :( Probably correct in OLAP, but wrong in OLTP. The seq scan enforced by hash join can be problematic. Regards Pavel > -- > Viktor Leis > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >