On 11/14/24 08:09, Richard Guo wrote:
On Wed, Nov 13, 2024 at 7:36 PM Andrei Lepikhov <lepi...@gmail.com> wrote:
On 11/13/24 13:49, Richard Guo wrote:
In thread [1], I try to add one more strategy that minimises the number
of comparison operator calls. It seems that it would work the same way
with the DISTINCT statement. Do you think it make sense in general and
can be a possible direction of improvement for the current patch?

I haven’t had a chance to follow that thread.  From a quick look at
that patch, it seems to improve the general costing logic for sorting.
If that’s the case, I think it would be beneficial in the areas where
we use cost_sort(), including in this patch.
Yes, the core of the discussion is cost calculation. It is a base for the final patch that adds a grouping strategy, according to which it may be profitable to put the column with max distinct value at the first position to make sorting less expensive. I think it makes sense to do the same in this DISTINCT feature, too (as a further improvement, of course).

disable features during severe slowdowns or bugs. It might make sense to
group them into a single 'Clause Reordering' parameter.
code.  If these GUCs are mainly for debugging, I think it's better to
keep them separate so that we can debug each optimization individually.
Ok.
See minor changes I propose in the attachment.

--
regards, Andrei Lepikhov
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 99063d0506..9006a14abc 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -2287,7 +2287,6 @@ static int
 pathkeys_useful_for_distinct(PlannerInfo *root, List *pathkeys)
 {
 	int			n_common_pathkeys;
-	ListCell   *lc;
 
 	/*
 	 * distinct_pathkeys may have become empty if all of the pathkeys were
@@ -2298,10 +2297,8 @@ pathkeys_useful_for_distinct(PlannerInfo *root, List *pathkeys)
 
 	/* walk the pathkeys and search for matching DISTINCT key */
 	n_common_pathkeys = 0;
-	foreach(lc, pathkeys)
+	foreach_node(PathKey, pathkey, pathkeys)
 	{
-		PathKey    *pathkey = (PathKey *) lfirst(lc);
-
 		/* no matching DISTINCT key, we're done */
 		if (!list_member_ptr(root->distinct_pathkeys, pathkey))
 			break;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2870ef58f0..c22e41e83b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5158,7 +5158,6 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 			Path	   *input_path = (Path *) lfirst(lc);
 			Path	   *sorted_path;
 			List	   *useful_pathkeys_list = NIL;
-			ListCell   *lc2;
 
 			useful_pathkeys_list =
 				get_useful_pathkeys_for_distinct(root,
@@ -5166,10 +5165,8 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 												 input_path->pathkeys);
 			Assert(list_length(useful_pathkeys_list) > 0);
 
-			foreach(lc2, useful_pathkeys_list)
+			foreach_node(List, useful_pathkeys, useful_pathkeys_list)
 			{
-				List	   *useful_pathkeys = (List *) lfirst(lc2);
-
 				sorted_path = make_ordered_path(root,
 												distinct_rel,
 												input_path,
@@ -5276,11 +5273,9 @@ get_useful_pathkeys_for_distinct(PlannerInfo *root, List *needed_pathkeys,
 {
 	List	   *useful_pathkeys_list = NIL;
 	List	   *useful_pathkeys = NIL;
-	ListCell   *lc;
 
 	/* always include the given 'needed_pathkeys' */
-	useful_pathkeys_list = lappend(useful_pathkeys_list,
-								   needed_pathkeys);
+	useful_pathkeys_list = lappend(useful_pathkeys_list, needed_pathkeys);
 
 	if (!enable_distinct_reordering)
 		return useful_pathkeys_list;
@@ -5294,10 +5289,8 @@ get_useful_pathkeys_for_distinct(PlannerInfo *root, List *needed_pathkeys,
 	 * list matches initial distinctClause pathkeys; otherwise, it won't have
 	 * the desired behavior.
 	 */
-	foreach(lc, path_pathkeys)
+	foreach_node(PathKey, pathkey, path_pathkeys)
 	{
-		PathKey    *pathkey = (PathKey *) lfirst(lc);
-
 		/*
 		 * The PathKey nodes are canonical, so they can be checked for
 		 * equality by simple pointer comparison.
@@ -5324,19 +5317,16 @@ get_useful_pathkeys_for_distinct(PlannerInfo *root, List *needed_pathkeys,
 		return useful_pathkeys_list;
 
 	/* Append the remaining PathKey nodes in needed_pathkeys */
-	useful_pathkeys = list_concat_unique_ptr(useful_pathkeys,
-											 needed_pathkeys);
+	useful_pathkeys = list_concat_unique_ptr(useful_pathkeys, needed_pathkeys);
 
 	/*
 	 * If the resulting pathkey list is the same as the 'needed_pathkeys',
 	 * just drop it.
 	 */
-	if (compare_pathkeys(needed_pathkeys,
-						 useful_pathkeys) == PATHKEYS_EQUAL)
+	if (compare_pathkeys(needed_pathkeys, useful_pathkeys) == PATHKEYS_EQUAL)
 		return useful_pathkeys_list;
 
-	useful_pathkeys_list = lappend(useful_pathkeys_list,
-								   useful_pathkeys);
+	useful_pathkeys_list = lappend(useful_pathkeys_list, useful_pathkeys);
 
 	return useful_pathkeys_list;
 }

Reply via email to