On 2023/4/3 19:44, Tomas Vondra wrote:
On 4/3/23 12:23, Quan Zongliang wrote:
Hi,

I found that the enable_hashjoin disables HashJoin completely.
It's in the function add_paths_to_joinrel:

if (enable_hashjoin || jointype == JOIN_FULL)
     hash_inner_and_outer(root, joinrel, outerrel, innerrel,
                 jointype, &extra);

Instead, it should add a disable cost to the cost calculation of
hashjoin. And now final_cost_hashjoin does the same thing:

if (!enable_hashjoin)
     startup_cost += disable_cost;


enable_mergejoin has the same problem.

Test case:

CREATE TABLE t_score_01(
s_id int,
s_score int,
s_course char(8),
c_id int);

CREATE TABLE t_student_01(
s_id int,
s_name char(8));

insert into t_score_01 values(
generate_series(1, 1000000), random()*100, 'course', generate_series(1,
1000000));

insert into t_student_01 values(generate_series(1, 1000000), 'name');

analyze t_score_01;
analyze t_student_01;

SET enable_hashjoin TO off;
SET enable_nestloop TO off;
SET enable_mergejoin TO off;

explain select count(*)
from t_student_01 a join t_score_01 b on a.s_id=b.s_id;

After disabling all three, the HashJoin path should still be chosen.


It's not clear to me why that behavior would be desirable? Why is this
an issue you need so solve?

Because someone noticed that when he set enable_hashjoin, enable_mergejoin and enable_nestloop to off. The statement seemed to get stuck (actually because it chose the NestedLoop path, which took a long long time to run). If enable_hashjoin and enable_nestloop disable generating these two paths. Then enable_nestloop should do the same thing, but it doesn't.

AFAIK the reason why some paths are actually disabled (not built at all)
while others are only penalized by adding disable_cost is that we need
to end up with at least one way to execute the query. So we pick a path
that we know is possible (e.g. seqscan) and hard-disable other paths.
But the always-possible path is only soft-disabled by disable_cost.

For joins, we do the same thing. The hash/merge joins may not be
possible, because the data types may not have hash/sort operators, etc.
Nestloop is always possible. So we soft-disable nestloop but
hard-disable hash/merge joins.

I doubt we want to change this behavior, unless there's a good reason to
do that ...
It doesn't have to change. Because selecting NestedLoop doesn't really get stuck either. It just takes too long to run.

I will change the patch status to Withdrawn.


regards




Reply via email to