On Mon, Jan 27, 2014 at 10:00 AM, Simon Riggs <si...@2ndquadrant.com> wrote:

> On 27 January 2014 17:44, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
> > This topic is interesting - we found very bad performance with hashing
> large
> > tables with high work_mem. MergeJoin with quicksort was significantly
> > faster.
>
> I've seen this also.
>
> > I didn't deeper research - there is a possibility of virtualization
> > overhead.
>
> I took measurements and the effect was repeatable and happened for all
> sizes of work_mem, but nothing more to add.
>

I get similar results if I join on integers.  But joining on text, the hash
wins by a mile.

I use this as a simple test bed:

alter table pgbench_accounts drop CONSTRAINT pgbench_accounts_pkey;
update pgbench_accounts set filler = md5(aid::text);

set work_mem to whatever keeps the join off of disk for the given scale;
set enable_hashjoin to whatever;

select sum(a1.abalance*a2.abalance) from pgbench_accounts a1 join
pgbench_accounts a2 using (aid);
select sum(a1.abalance*a2.abalance) from pgbench_accounts a1 join
pgbench_accounts a2 using (filler);

hash integer:     1832.695 ms
merge integer:   1462.913 ms
hash text:          2353.115 ms
merge text:     11,218.628 ms

The cost estimates do not depend on the column used in the join despite a 6
fold difference in run time, so the planner is perhaps missing a trick
there.

Cheers,

Jeff

Reply via email to