Re: [HACKERS] costing of hash join

Tom Lane Fri, 03 Jan 2014 14:52:16 -0800

Jeff Janes <jeff.ja...@gmail.com> writes:
> I'm trying to figure out why hash joins seem to be systematically underused
> in my hands.  In the case I am immediately looking at it prefers a merge
> join with both inputs getting seq scanned and sorted, despite the hash join
> being actually 2 to 3 times faster, where inputs and intermediate working
> sets are all in memory.  I normally wouldn't worry about a factor of 3
> error, but I see this a lot in many different situations.  The row
> estimates are very close to actual; the error is only in the CPU estimates.


Can you produce a test case for other people to look at?

What datatype(s) are the join keys?

> A hash join is charged cpu_tuple_cost for each inner tuple for inserting it
> into the hash table:

Doesn't seem like monkeying with that is going to account for a 3x error.
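
For scale, here is a rough sketch of the charge being discussed. It assumes the stock settings (cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025) and the hash-build terms as they appear to be written in the hash join costing code in costsize.c. The function name is hypothetical, and everything else in the cost model (input scans, the probe side, qual evaluation, batching) is deliberately ignored, so treat the numbers as illustrative only.

```python
# Stock planner cost constants (postgresql.conf defaults).
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025


def hash_build_cpu_cost(inner_rows, num_hashclauses=1,
                        cpu_tuple_cost=CPU_TUPLE_COST,
                        cpu_operator_cost=CPU_OPERATOR_COST):
    """Hypothetical helper: CPU charge for building the hash table.

    Returns (hashing_charge, insertion_charge), where
    hashing_charge   = cpu_operator_cost per hash clause per inner tuple
    insertion_charge = cpu_tuple_cost per inner tuple
    This is a simplification, not the planner's full formula.
    """
    hashing = cpu_operator_cost * num_hashclauses * inner_rows
    insertion = cpu_tuple_cost * inner_rows
    return hashing, insertion


# Example: one hash clause, one million inner tuples.
hashing, insertion = hash_build_cpu_cost(1_000_000)
print(f"hashing={hashing:.1f} insertion={insertion:.1f}")
```

Setting the insertion charge next to the total plan cost reported by EXPLAIN gives a quick sense of how far the estimate could move if that one term were changed.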

Have you tried using perf or oprofile or similar to see where the time is
actually, rather than theoretically, going?

                        regards, tom lane


