[PATCH] Teach planner to further optimize sort in distinct

Ankit Kumar Pandey Tue, 17 Jan 2023 11:28:18 -0800

Hi, this is an extension of the `teach planner to evaluate multiple windows in the optimal order` work, applied to the DISTINCT operation.

Based on the earlier discussion (https://www.postgresql.org/message-id/flat/CAApHDvr7rSCVXzGfVa1L9pLpkKj6-s8NynK8o%2B98X9sKjejnQQ%40mail.gmail.com#e01327a3053d9281c40f281ef7105b42):


> All I imagine you need to do for it
> is to invent a function in pathkeys.c which is along the lines of what
> pathkeys_count_contained_in() does, but returns a List of pathkeys
> which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
> that does not exist as a pathkey in keys1. In
> create_final_distinct_paths(), you can then perform an incremental
> sort on any input_path which has a non-empty return list and in
> create_incremental_sort_path(), you'll pass presorted_keys as the
> number of pathkeys in the path, and the required pathkeys the
> input_path->pathkeys + the pathkeys returned from the new function.
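
Read literally, the quoted description could be sketched as below. This is only a toy model under stated assumptions: a pathkey list is modelled as a C string with one character per key, and keys_not_in is a hypothetical name, not an existing PostgreSQL function and not the code in the attached patch.

```c
#include <string.h>

/*
 * Toy model, NOT PostgreSQL code: one char per pathkey.
 * keys_not_in is a hypothetical helper for illustration only.
 *
 * Writes into out the keys of keys1 that are absent from keys2,
 * preserving keys1 order.  Returns 0 (with out left empty) when keys2
 * holds a key that keys1 lacks, which models the "NIL" case of the
 * quoted description; returns 1 otherwise.
 */
static int
keys_not_in(const char *keys1, const char *keys2, char *out)
{
    size_t n = 0;

    out[0] = '\0';

    /* Bail out if keys2 has a key that does not exist in keys1. */
    for (const char *k = keys2; *k; k++)
        if (strchr(keys1, *k) == NULL)
            return 0;

    /* Collect the keys of keys1 that keys2 does not have. */
    for (const char *k = keys1; *k; k++)
        if (strchr(keys2, *k) == NULL)
            out[n++] = *k;
    out[n] = '\0';
    return 1;
}
```

Under this reading, with the needed keys (a,b,c) as keys1 and the input path's keys (b,a) as keys2, the helper yields (c), and input keys + (c) gives (b,a,c) as the required sort order.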


The wording here is a bit confusing:

"returns a List of pathkeys
which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
that does not exist as a pathkey in keys1."

Do you mean extracting the common keys, regardless of ordering?

Example: keys1 = (a,b,c), keys2 = (b,a)

returns (a,b)

and

keys1 = (a,b,c), keys2 = (d)

returns ()

which translates to

needed_pathkeys = (a,b,c) = keys2

input_pathkeys = (b,a) = keys1

returns (b,a) = common_keys

new needed_pathkeys = unique(common_keys + old needed_pathkeys)

=> new needed_pathkeys = (b,a,c)

The new needed_pathkeys now starts with input_pathkeys, so the input counts as presorted on those keys.

This is what I implemented in the patch.
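
The computation above can be sketched in a toy model. Assumptions: a pathkey list is modelled as a C string with one character per key, and common_keys and concat_unique are hypothetical stand-ins that only mimic what extract_common_pathkeys and list_concat_unique do in the patch; they are not the PostgreSQL List API.

```c
#include <string.h>

/*
 * Toy model, NOT PostgreSQL code: one char per pathkey.
 * Both helpers are hypothetical stand-ins for illustration.
 */

/* Keys present in both lists, in keys1 order (same nested-loop idea as
 * extract_common_pathkeys in the patch). */
static void
common_keys(const char *keys1, const char *keys2, char *out)
{
    size_t n = 0;

    for (const char *k = keys1; *k; k++)
        if (strchr(keys2, *k) != NULL)
            out[n++] = *k;
    out[n] = '\0';
}

/* Append to dst every key of src that dst does not already contain
 * (the behaviour assumed here for list_concat_unique). */
static void
concat_unique(char *dst, const char *src)
{
    size_t n = strlen(dst);

    for (const char *k = src; *k; k++)
        if (memchr(dst, *k, n) == NULL)
            dst[n++] = *k;
    dst[n] = '\0';
}
```

Calling common_keys("ba", "abc", buf) followed by concat_unique(buf, "abc") leaves buf holding "bac", i.e. the (b,a,c) ordering from the example, whose first two keys are the input path's keys.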


The patched version yields the following plans:

set enable_hashagg=0;
set enable_seqscan=0;

explain (costs off) select distinct relname,relkind,count(*) over(partition by relkind) from pg_Class;
                       QUERY PLAN
---------------------------------------------------------
 Unique
   ->  Incremental Sort
         Sort Key: relkind, relname, (count(*) OVER (?))
         Presorted Key: relkind
         ->  WindowAgg
               ->  Sort
                     Sort Key: relkind
                     ->  Seq Scan on pg_class
(8 rows)

explain (costs off) select distinct a, b, count(*) over (partition by b,a) from abcd;

                       QUERY PLAN
--------------------------------------------------------
 Unique
   ->  Incremental Sort
         Sort Key: b, a, (count(*) OVER (?))
         Presorted Key: b, a
         ->  WindowAgg
               ->  Incremental Sort
                     Sort Key: b, a
                     Presorted Key: b
                     ->  Index Scan using b_idx on abcd
(9 rows)

explain (costs off) select distinct a, b, count(*) over (partition by c,d) from abcd;

                       QUERY PLAN
--------------------------------------------------------
 Unique
   ->  Sort
         Sort Key: a, b, (count(*) OVER (?))
         ->  WindowAgg
               ->  Incremental Sort
                     Sort Key: c, d
                     Presorted Key: c
                     ->  Index Scan using c_idx on abcd
(8 rows)

The issue with index paths still remains: their pathkeys get purged by truncate_useless_pathkeys and hence are not available in create_final_distinct_paths for the above optimizations.



I have attached a patch for reference.


Thanks,

Ankit

diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 609df93dc9..13f6006577 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1968,3 +1968,32 @@ has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel)
 		return true;			/* might be able to use them for ordering */
 	return false;				/* definitely useless */
 }
+
+/*
+ * extract_common_pathkeys
+ *		Returns a List of the pathkeys that appear in both keys1 and keys2,
+ *		in keys1 order and regardless of their position in keys2.
+ *		Returns NIL if the two lists have no pathkey in common.
+ */
+List *
+extract_common_pathkeys(List *keys1, List *keys2)
+{
+	List *new_pk = NIL;
+	ListCell	*l1;
+	ListCell	*l2;
+	foreach(l1, keys1)
+	{
+		PathKey    *pathkey1 = (PathKey *) lfirst(l1);
+		foreach(l2, keys2)
+		{
+			PathKey    *pathkey2 = (PathKey *) lfirst(l2);
+			if (pathkey1 == pathkey2)
+			{
+				new_pk = lappend(new_pk, pathkey1);
+				break;
+			}
+		}
+	}
+	return new_pk;
+
+}
\ No newline at end of file
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 044fb24666..1802d28e75 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4844,11 +4844,28 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 			Path	   *sorted_path;
 			bool		is_sorted;
 			int			presorted_keys;
+			List		*common_keys;
 
 			is_sorted = pathkeys_count_contained_in(needed_pathkeys,
 													input_path->pathkeys,
 													&presorted_keys);
 
+			/*
+			 * Check if there are common pathkeys (regardless of ordering)
+			 */
+			common_keys = extract_common_pathkeys(input_path->pathkeys, needed_pathkeys);
+			
+			if (common_keys)
+			{
+				/*
+				 * Now that we have common keys, we can add these to path
+				 */
+				needed_pathkeys = list_concat_unique(common_keys, needed_pathkeys);
+				is_sorted = pathkeys_count_contained_in(needed_pathkeys,
+													input_path->pathkeys,
+													&presorted_keys);
+			}
+
 			if (is_sorted)
 				sorted_path = input_path;
 			else
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 65a3c35611..b1b700e067 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -248,6 +248,7 @@ extern List *truncate_useless_pathkeys(PlannerInfo *root,
 									   RelOptInfo *rel,
 									   List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+extern List *extract_common_pathkeys(List *keys1, List *keys2);
 extern List *append_pathkeys(List *target, List *source);
 extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 									   EquivalenceClass *eclass, Oid opfamily,
