"Tom Lane" <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> "Tom Lane" <[EMAIL PROTECTED]> writes:
>>> Nested Loop  (cost=5.39..198.81 rows=51 width=244)
>>>   ->  HashAggregate  (cost=1.06..1.11 rows=5 width=4)
>>>         ->  Seq Scan on int4_tbl b  (cost=0.00..1.05 rows=5 width=4)
>>>   ->  Bitmap Heap Scan on tenk1 a  (cost=4.33..39.41 rows=10 width=244)
>>>         Recheck Cond: (a.thousand = b.f1)
>>>         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..4.33 rows=10 width=0)
>>>               Index Cond: (a.thousand = b.f1)
>>> (7 rows)
>
>> Sure, but that's still re-executing the bitmap index scan 51 times --
>> possibly having to fetch the same records off disk repeatedly.

Sorry, 5 times, not 51.

> It's not fetching any record repeatedly, because the HashAggregate
> step eliminated duplicate keys on the other side.

Only because it's an IN query. If it were a normal join, which hash joins are
perfectly capable of handling, then it would be. As it is, it could be fetching
the same page repeatedly, just not precisely the same tuples.
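
For reference, the plan quoted above presumably came from the IN form of the
query, something like this (my reconstruction -- the original statement isn't
quoted here):

explain analyze select * from tenk1 a
  where thousand in (select f1 from int4_tbl b);

The HashAggregate over int4_tbl is the step that deduplicates the inner keys
before the nested loop probes tenk1.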

It happens that transforming this query to

explain analyze select * from tenk1 a
  where thousand = any (array(select f1 from int4_tbl b));

makes it run about 40% faster. That could just be because the hash table is
small enough that a linear search is more efficient than calling hashint4,
though. (Perhaps we should be protecting against that in dynahash, actually.)
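
For anyone who wants to reproduce the comparison, a sketch: run both forms
under \timing in psql and compare the reported times (numbers will obviously
vary by machine):

\timing on
select * from tenk1 a where thousand in (select f1 from int4_tbl b);
select * from tenk1 a where thousand = any (array(select f1 from int4_tbl b));

The second form is where the roughly 40% improvement shows up for me.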

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
