Greg Stark <[EMAIL PROTECTED]> writes: > Dennis Bjorklund <[EMAIL PROTECTED]> writes: > > > On 22 Sep 2004, Greg Stark wrote: > > > > > Actually this looks like it's arguably a bug to me. Why does the hash > > > join execute the sequential scan at all? Shouldn't it also like the > > > merge join recognize that the other hashed relation is empty and skip > > > the sequential scan entirely? > > > > I'm not sure you can classify that as a bug. It's just that he in one of > > the plans started with the empty scan and bacause of that didn't need > > the other, but with the hash join it started with the table that had 16 > > rows and then got to the empty one. > > No, postgres didn't do things in reverse order. It hashed the empty table and > then went ahead and checked every record of the non-empty table against the > empty hash table.
Alright, attached is a simple patch that changes this. I don't really know enough of the overall code to be sure this is safe. But from what I see of the hash join code it never returns any rows unless there's a match except for outer joins. So I think it should be safe. test=# create table a (a integer); CREATE TABLE test=# create table b (a integer); CREATE TABLE test=# set enable_mergejoin = off; SET test=# explain analyze select * from a natural join b; QUERY PLAN ----------------------------------------------------------------------------------------------------------- Hash Join (cost=22.50..345.00 rows=5000 width=4) (actual time=0.022..0.022 rows=0 loops=1) Hash Cond: ("outer".a = "inner".a) -> Seq Scan on a (cost=0.00..20.00 rows=1000 width=4) (never executed) -> Hash (cost=20.00..20.00 rows=1000 width=4) (actual time=0.005..0.005 rows=0 loops=1) -> Seq Scan on b (cost=0.00..20.00 rows=1000 width=4) (actual time=0.002..0.002 rows=0 loops=1) Total runtime: 0.089 ms (6 rows) By comparison, note the sequential scan doesn't show "never executed" on 7.4.3 (sorry, I didn't think to run the query against 8.0 before I compiled the patched version): QUERY PLAN ----------------------------------------------------------------------------------------------------------- Hash Join (cost=22.50..345.00 rows=5000 width=4) (actual time=0.881..0.881 rows=0 loops=1) Hash Cond: ("outer".a = "inner".a) -> Seq Scan on a (cost=0.00..20.00 rows=1000 width=4) (actual time=0.001..0.001 rows=0 loops=1) -> Hash (cost=20.00..20.00 rows=1000 width=4) (actual time=0.008..0.008 rows=0 loops=1) -> Seq Scan on b (cost=0.00..20.00 rows=1000 width=4) (actual time=0.004..0.004 rows=0 loops=1) Total runtime: 1.105 ms (6 rows)
Index: backend/executor/nodeHash.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/executor/nodeHash.c,v retrieving revision 1.86 diff -c -r1.86 nodeHash.c *** backend/executor/nodeHash.c 29 Aug 2004 04:12:31 -0000 1.86 --- backend/executor/nodeHash.c 22 Sep 2004 17:51:53 -0000 *************** *** 232,237 **** --- 232,238 ---- hashtable->buckets = NULL; hashtable->nbatch = nbatch; hashtable->curbatch = 0; + hashtable->ntup = 0; hashtable->innerBatchFile = NULL; hashtable->outerBatchFile = NULL; hashtable->innerBatchSize = NULL; *************** *** 493,498 **** --- 494,501 ---- heapTuple->t_len); hashTuple->next = hashtable->buckets[bucketno]; hashtable->buckets[bucketno] = hashTuple; + + hashtable->ntup ++; } else { Index: backend/executor/nodeHashjoin.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/executor/nodeHashjoin.c,v retrieving revision 1.64 diff -c -r1.64 nodeHashjoin.c *** backend/executor/nodeHashjoin.c 29 Aug 2004 05:06:42 -0000 1.64 --- backend/executor/nodeHashjoin.c 22 Sep 2004 17:51:54 -0000 *************** *** 127,132 **** --- 127,140 ---- hashNode->hashtable = hashtable; (void) ExecProcNode((PlanState *) hashNode); + /* An empty hash table can't return any matches */ + if (hashtable->nbatch == 0 && + hashtable->ntup == 0 && + node->js.jointype != JOIN_LEFT) + { + return NULL; + } + /* * Open temp files for outer batches, if needed. Note that file * buffers are palloc'd in regular executor context. Index: include/executor/hashjoin.h =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/include/executor/hashjoin.h,v retrieving revision 1.32 diff -c -r1.32 hashjoin.h *** include/executor/hashjoin.h 29 Aug 2004 04:13:06 -0000 1.32 --- include/executor/hashjoin.h 22 Sep 2004 17:52:04 -0000 *************** *** 57,62 **** --- 57,64 ---- int nbatch; /* number of batches; 0 means 1-pass join */ int curbatch; /* current batch #, or 0 during 1st pass */ + int ntup; /* Total number of tuples hashed in this batch */ + /* * all these arrays are allocated for the life of the hash join, but * only if nbatch > 0:
-- greg
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings