On Mon, Jan 27, 2014 at 10:00 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 27 January 2014 17:44, Pavel Stehule <pavel.steh...@gmail.com> wrote: > > > This topic is interesting - we found very bad performance with hashing > large > > tables with high work_mem. MergeJoin with quicksort was significantly > > faster. > > I've seen this also. > > > I didn't deeper research - there is a possibility of virtualization > > overhead. > > I took measurements and the effect was repeatable and happened for all > sizes of work_mem, but nothing more to add. > I get similar results if I join on integers. But joining on text, the hash wins by a mile. I use this as a simple test bed: alter table pgbench_accounts drop CONSTRAINT pgbench_accounts_pkey; update pgbench_accounts set filler = md5(aid::text); set work_mem to whatever keeps the join off of disk for the given scale; set enable_hashjoin to whatever; select sum(a1.abalance*a2.abalance) from pgbench_accounts a1 join pgbench_accounts a2 using (aid); select sum(a1.abalance*a2.abalance) from pgbench_accounts a1 join pgbench_accounts a2 using (filler); hash integer: 1832.695 ms merge integer: 1462.913 ms hash text: 2353.115 ms merge text: 11,218.628 ms The cost estimates do not depend on the column used in the join despite a 6 fold difference in run time, so the planner is perhaps missing a trick there. Cheers, Jeff