Re: Planner chose a much slower plan in hashjoin, using a large table as the inner table.

2019-11-28 Thread Andy Fan
On Thu, Nov 28, 2019 at 7:19 PM Jinbao Chen wrote: > Hi Andy, > > I just test the query on 12.1. But pg use big_table as inner. > > demo=# explain (costs off) select * from t_small, t_big where a = b; > QUERY PLAN > > Hash Join >Hash Cond: (t

Re: Planner chose a much slower plan in hashjoin, using a large table as the inner table.

2019-11-28 Thread Jinbao Chen
Hi Andy, I just test the query on 12.1. But pg use big_table as inner. demo=# explain (costs off) select * from t_small, t_big where a = b; QUERY PLAN Hash Join Hash Cond: (t_small.a = t_big.b) -> Seq Scan on t_small -> Hash -

Re: Planner chose a much slower plan in hashjoin, using a large table as the inner table.

2019-11-28 Thread Andy Fan
On Fri, Nov 22, 2019 at 6:51 PM Jinbao Chen wrote: > Hi hackers, > > I have made a patch to fix the problem. > > Added the selection rate of the inner table non-empty bucket > > The planner will use big table as inner table in hash join > if small table have fewer unique values. But this plan is

Re: Planner chose a much slower plan in hashjoin, using a large table as the inner table.

2019-11-22 Thread Jinbao Chen
Hi hackers, I have made a patch to fix the problem. Added the selection rate of the inner table non-empty bucket The planner will use big table as inner table in hash join if small table have fewer unique values. But this plan is much slower than using small table as inner table. In general, th

Re: Planner chose a much slower plan in hashjoin, using a large table as the inner table.

2019-11-19 Thread Jinbao Chen
I think we have the same understanding of this issue. Sometimes use smaller costs on scanning the chain in bucket like below would be better. run_cost += outer_path_rows * some_small_probe_cost; run_cost += hash_qual_cost.per_tuple * approximate_tuple_count(); In some version of GreenPlum(a databa

Re: Planner chose a much slower plan in hashjoin, using a large table as the inner table.

2019-11-18 Thread Thomas Munro
On Mon, Nov 18, 2019 at 7:48 PM Jinbao Chen wrote: > In the test case above, the small table has 3000 tuples and 100 distinct > values on column ‘a’. > If we use small table as inner table. The chan length of the bucket is 30. > And we need to > search the whole chain on probing the hash table.

Planner chose a much slower plan in hashjoin, using a large table as the inner table.

2019-11-17 Thread Jinbao Chen
Hi Hackers, The planner will use big table as inner table in hash join if small table have fewer unique values. But this plan is much slower than using small table as inner table. This problem occurs on master branch without parallel scan. For example create table t_small(a int); create table t_