Re: [HACKERS] Calculation of param_source_rels in add_paths_to_joinrel

Ashutosh Bapat Thu, 10 Nov 2016 23:01:42 -0800

On Sat, Nov 5, 2016 at 2:16 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> writes:
>> There's code in add_paths_to_joinrel() which computes the set of
>> target relations that should overlap parameterization of any proposed
>> join path.
>> ...
>> The calculations that follow are based on joinrel->relids (baserels
>> covered by the join) and SpecialJoinInfo list in PlannerInfo. It is
>> not based on specific combination of relations being joined or the
>> paths being generated. We should probably do this computation once and
>> store the result in the joinrel and use it multiple times. That way we
>> can avoid computing the same set again and again for every pair of
>> joining relations and their order. Any reasons why we don't do this?
>
> I'm not terribly excited about this.  The issue is strictly local to
> add_paths_to_joinrel, but putting that set in a global data structure
> makes it nonlocal, and makes it that much harder to tweak the algorithm
> if we think of a better way.  (In particular, I think it's not all that
> obvious that the set must be independent of which two subset relations
> we are currently joining.)


Right now it appears that for every subset of relations, we have
different param_source_rels, which is clearly not. It takes a bit of
time to understand that. Adding it to a global data structure will at
least make the current implementation clear i.e param_source_rels does
not change with subset of relations being joined.

>
> If you can show a measurable performance improvement from this change,
> then maybe those downsides are acceptable.  But I do not think we should
> commit it without a demonstrated performance benefit from the added
> complexity and loss of flexibility.

I couldn't find a measurable time difference with or without my patch,
so multiple computations of param_source_rels aren't taking noticeable
time. I used following queries to measure the planning time through
explain analyze.

create view pc_view as select c1.oid c1o, c2.oid c2o, c3.oid c3o from
pg_class c1, pg_class c2 left join pg_class c3 on (c2.oid = c3.oid)
where c1.oid = c2.oid and c1.oid = c3.oid and c1.relname = c3.relname;
select v1, v2, v3 from pc_view v1, pc_view v2 left join pc_view v3 on
(v2.c3o = v3.c1o), pc_view v4 where v1.c3o = v2.c2o and v1.c2o =
v4.c3o limit 0;

>
>> Also, the way this code has been written, the declaration of variable
>> sjinfo masks the earlier declaration with the same name. I am not sure
>> if that's intentional, but may be we should use another variable name
>> for the inner sjinfo. I have not included that change in the patch.
>
> Hmm, yeah, that's probably not terribly good coding practice.

Attached a patch to fix this.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index cc7384f..8815789 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -124,42 +124,42 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * is a join order restriction that prevents joining one of our input rels
 	 * directly to the parameter source rel instead of joining to the other
 	 * input rel.  (But see allow_star_schema_join().)	This restriction
 	 * reduces the number of parameterized paths we have to deal with at
 	 * higher join levels, without compromising the quality of the resulting
 	 * plan.  We express the restriction as a Relids set that must overlap the
 	 * parameterization of any proposed join path.
 	 */
 	foreach(lc, root->join_info_list)
 	{
-		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
+		SpecialJoinInfo *lc_sjinfo = (SpecialJoinInfo *) lfirst(lc);
 
 		/*
 		 * SJ is relevant to this join if we have some part of its RHS
 		 * (possibly not all of it), and haven't yet joined to its LHS.  (This
 		 * test is pretty simplistic, but should be sufficient considering the
 		 * join has already been proven legal.)  If the SJ is relevant, it
 		 * presents constraints for joining to anything not in its RHS.
 		 */
-		if (bms_overlap(joinrel->relids, sjinfo->min_righthand) &&
-			!bms_overlap(joinrel->relids, sjinfo->min_lefthand))
+		if (bms_overlap(joinrel->relids, lc_sjinfo->min_righthand) &&
+			!bms_overlap(joinrel->relids, lc_sjinfo->min_lefthand))
 			extra.param_source_rels = bms_join(extra.param_source_rels,
 										   bms_difference(root->all_baserels,
-													 sjinfo->min_righthand));
+												  lc_sjinfo->min_righthand));
 
 		/* full joins constrain both sides symmetrically */
-		if (sjinfo->jointype == JOIN_FULL &&
-			bms_overlap(joinrel->relids, sjinfo->min_lefthand) &&
-			!bms_overlap(joinrel->relids, sjinfo->min_righthand))
+		if (lc_sjinfo->jointype == JOIN_FULL &&
+			bms_overlap(joinrel->relids, lc_sjinfo->min_lefthand) &&
+			!bms_overlap(joinrel->relids, lc_sjinfo->min_righthand))
 			extra.param_source_rels = bms_join(extra.param_source_rels,
 										   bms_difference(root->all_baserels,
-													  sjinfo->min_lefthand));
+												   lc_sjinfo->min_lefthand));
 	}
 
 	/*
 	 * However, when a LATERAL subquery is involved, there will simply not be
 	 * any paths for the joinrel that aren't parameterized by whatever the
 	 * subquery is parameterized by, unless its parameterization is resolved
 	 * within the joinrel.  So we might as well allow additional dependencies
 	 * on whatever residual lateral dependencies the joinrel will have.
 	 */
 	extra.param_source_rels = bms_add_members(extra.param_source_rels,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Calculation of param_source_rels in add_paths_to_joinrel

Reply via email to