Hi,

Please see the following plan:

postgres=# explain select * from small_table left outer join big_table using
(id);
                                 QUERY PLAN
----------------------------------------------------------------------------
 Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
   Hash Cond: (small_table.id = big_table.id)
   ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
   ->  Hash  (cost=59142.00..59142.00 rows=4100000 width=8)
         ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=4100000
width=8)
(5 rows)

Here I have a puzzle, why not choose the small table to build hash table? It
can avoid multiple batches thus save significant I/O cost, isn't it?

We can perform this query in two phases:
1) inner join, using the small table to build hash table.
2) check whether each tuple in the hash table has matches before, which can
be done with another flag bit

The only compromise is the output order, due to the two separate phases. Not
sure whether the SQL standard requires it.

Thanks,
Li Jie

Reply via email to