Thanks for the patch. This (missing) optimization popped-up repeatedly recently,
and I was planning to look into it for PG12. So now I don't have to, because
you've done all the hard work ;-)
You are welcome. Actually one of out customers faced the problem with GROUP BY
column order and exactly with reordering without any indexes, you mean it as
problem 2). Index optimization was noticed by me later. But based on your
suggested patch's order I split the patch to index and non-index part and
second part depends of first one. They touch the same part of code and they
could not be independent
1) add_path() ensures that we only keep the one cheapest path sorted path for
each pathkeys. This patch somewhat defeats that because it considers additional
pathkeys (essentially any permutation of group keys) as interesting. So we may
end up with more paths in the list.
Seems, index scan here could give benefits here only if:
1) it's a index only scan
2) it's a index full (opposite to only) scan but physical order of heap is
close to logical index order (ie table is clustered)
In other cases costs of disk seeking will be very high. But on this stage of
planing we don't know that facts yet. So we couldn't make a good decision here
and should believe in add_path() logic.
> I wonder if we should make the add_path() logic smarter to recognize when two
> paths have different pathkeys but were generated to match the same grouping,
> to reduce the number of paths added by this optimization. Currently we do
that > for each pathkeys list independently, but we're considering many more
pathkeys > orderings ...
Hm. I tend to say no.
select .. from t1 group by a, b
union
select .. from t2 group by a, b
t1 and t2 could have different set of indexes and different distribution, so
locally it could be cheaper to use one index (for example, one defined as (b, a)
and second as (a,b,c,d) - second is larger) but totally - another (example:
second table doesn't have (b,a) index)
2) sort reordering based on ndistinct estimates
But thinking about this optimization, I'm worried it relies on a couple of
important assumptions. For now those decisions could have be made by the person
writing the SQL query, but this optimization makes that impossible. So we really
need to get this right.
Hm, sql by design should not be used that way, but, of course, it's used :(
For example, it seems to disregard that different data types have different
comparison costs. For example comparing bytea will be far more expensive
compared to int4, so it may be much more efficient to compare int4 columns
first, even if there are far fewer distinct values in them.
as I can see cost_sort() doesn't pay attention to this details. And it should be
a separated patch to improve that.
Also, simply sorting the columns by their ndistinct estimate is somewhat naive,
because it assumes the columns are independent. Imagine for example a table with
three columns:
So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to use
"(a,c,b)" than "(a,b,c)" but there is no way to decide this merely using
per-column ndistinct values. Luckily, we now have ndistinct multi-column
coefficients which could be used to decide this I believe (but I haven't tried).
Agree, but I don't know how to use it here. Except, may be:
1) first column - the column with bigger estimated number of groups
2) second column - the pair of (first, second) with bigger estimated number of
groups
3) third column - the triple of (first, second, third) with bigger ...
But seems even with that algorithm, ISTM, it could be implemented in cheaper
manner.
The real issue however is that this decision can't be made entirely locally.
Consider for example this:
explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
Which is clearly cheaper (at least according to the optimizer) than doing two
separate sorts. So the optimization actually does the wrong thing here, and it
needs to somehow consider the other ordering requirements (in this case ORDER
BY) - either by generating multiple paths with different orderings or some
heuristics.
Hm, thank you. I consider it is a bug of my implementation - basic idea was that
we try to match already existing or needed order and only if we fail or have
unmatched tail of pathkey list than we will try to find cheapest column order.
Fixed in v7 (0002-opt_group_by_index_and_order-v7.patch), but may be by naive
way: if we don't have a path pathkey first try to reorder columns accordingly to
order by clause. Test for your is also added.
I'm also wondering how freely we can change the group by result ordering. Assume
you disable hash aggregate and parallel query - currently we are guaranteed to
use group aggregate that produces exactly the ordering as specified in GROUP BY,
but this patch removes that "guarantee" and we can produce arbitrary permutation
of the ordering. But as I argued in other threads, such implicit guarantees are
really fragile, can silently break for arbitrary reasons (say, parallel query
will do just that) and can be easily fixed by adding a proper ORDER BY. So I
don't think this is an issue.
Agree. SQL by design doesn't give a warranty of particular order without
explicit ORDER BY clause.
The other random thought is how will/could this interact with the incremental
sorts patch. I know Alexander wanted to significantly limit where we actually
consider the incremental sort, but I'm not sure if this was one of those places
or not (is sure looks like a place where we would greatly benefit from it).
Seems, they should not directly interact. Patch tries to find cheapest column
order, Alexander's patch tries to make sort cheaper - they are a different
tasks. But I will try.
So to summarize this initial review - I do suggest splitting the patch into two
parts. One that does the index magic, and one for this reordering optimization.
The first part (additional indexes) seems quite fairly safe, likely to get
committable soon. The other part (ndistinct reordering) IMHO requires more
thought regarding costing and interaction with other query parts.
Thank you for review!
--
Teodor Sigaev E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 4f93afdebc..8897157817 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -380,6 +380,128 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
return n;
}
+/*
+ * Work struct from sorting pathkeys
+ */
+typedef struct PathKeyNumGroup
+{
+ PathKey *key;
+ double numGroups;
+ int order; /* just to make qsort stable */
+} PathKeyNumGroup;
+
+static int
+pathkey_numgroups_cmp(const void *a, const void *b)
+{
+ const PathKeyNumGroup *pka = (const PathKeyNumGroup*)a,
+ *pkb = (const PathKeyNumGroup*)b;
+
+ if (pka->numGroups == pkb->numGroups)
+ /* make qsort stable */
+ return (pka->order > pkb->order) ? 1 : -1;
+ return (pka->numGroups > pkb->numGroups) ? -1 : 1;
+}
+
+/*
+ * Order tail of list of group pathkeys by uniqueness descendetly. It allows to
+ * speedup sorting. Returns newly allocated lists, old ones stay untouched.
+ * startNum defines a head of list which order should be prevented.
+ */
+void
+get_cheapest_group_keys_order(PlannerInfo *root, double nrows,
+ List *target_list,
+ List **group_pathkeys, List **group_clauses,
+ int startNum)
+{
+ PathKeyNumGroup *keys;
+ int nkeys = list_length(*group_pathkeys) - startNum;
+ List *pathkeyExprList,
+ *new_group_pathkeys = NIL,
+ *new_group_clauses = NIL;
+ ListCell *cell;
+ int i = 0;
+
+ if (nkeys < 2)
+ return; /* nothing to do */
+
+ /*
+ * Will try to match ORDER BY pathkeys in hope that one sort is cheaper than
+ * two
+ */
+ if (startNum == 0 && root->sort_pathkeys)
+ {
+ startNum = group_keys_reorder_by_pathkeys(root->sort_pathkeys,
+ group_pathkeys,
+ group_clauses);
+
+ nkeys = list_length(*group_pathkeys) - startNum;
+ if (nkeys < 2)
+ return; /* nothing to do */
+ }
+
+ keys = palloc(nkeys * sizeof(*keys));
+ pathkeyExprList = list_make1(NULL);
+
+ /*
+ * for each pathkeys to be ordered we count a number of group to sort them
+ * in descending order
+ */
+ for_each_cell(cell, list_nth_cell(*group_pathkeys, startNum))
+ {
+ PathKey *pathkey = (PathKey *) lfirst(cell);
+ Node *pathkeyExpr;
+ SortGroupClause *sgc;
+
+ sgc = get_sortgroupref_clause(pathkey->pk_eclass->ec_sortref,
+ *group_clauses);
+ pathkeyExpr = get_sortgroupclause_expr(sgc, target_list);
+ linitial(pathkeyExprList) = pathkeyExpr;
+ keys[i].numGroups= estimate_num_groups(root, pathkeyExprList,
+ nrows, NULL);
+ keys[i].key = pathkey;
+ keys[i].order = i;
+
+ i++;
+ }
+
+ list_free(pathkeyExprList);
+
+ qsort(keys, nkeys, sizeof(*keys), pathkey_numgroups_cmp);
+
+ /*
+ * Construct result lists
+ */
+ i = 0;
+ foreach(cell, *group_pathkeys)
+ {
+ PathKey *pathkey;
+ SortGroupClause *sgc;
+
+ if (i < startNum)
+ /* presorted head */
+ pathkey = (PathKey *) lfirst(cell);
+ else
+ /* pathkey, sorted above */
+ pathkey = keys[i - startNum].key;
+
+ new_group_pathkeys = lappend(new_group_pathkeys, pathkey);
+
+ sgc = get_sortgroupref_clause(pathkey->pk_eclass->ec_sortref,
+ *group_clauses);
+ new_group_clauses = lappend(new_group_clauses, sgc);
+
+ i++;
+ }
+
+ pfree(keys);
+
+ /* Just append the rest GROUP BY clauses */
+ new_group_clauses = list_concat_unique_ptr(new_group_clauses, *group_clauses);
+
+ *group_pathkeys = new_group_pathkeys;
+ *group_clauses = new_group_clauses;
+}
+
/*
* get_cheapest_path_for_pathkeys
* Find the cheapest path (according to the specified criterion) that
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1e7809edf2..257aa5889c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6208,11 +6208,20 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
{
/* Sort the cheapest-total path if it isn't already sorted */
if (!is_sorted)
+ {
+ if (!parse->groupingSets)
+ get_cheapest_group_keys_order(root,
+ path->rows,
+ extra->targetList,
+ &group_pathkeys,
+ &group_clauses,
+ n_preordered_groups);
path = (Path *) create_sort_path(root,
grouped_rel,
path,
group_pathkeys,
-1.0);
+ }
/* Now decide what to stick atop it */
if (parse->groupingSets)
@@ -6286,6 +6295,12 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
{
if (path != partially_grouped_rel->cheapest_total_path)
continue;
+ get_cheapest_group_keys_order(root,
+ path->rows,
+ extra->targetList,
+ &group_pathkeys,
+ &group_clauses,
+ n_preordered_groups);
path = (Path *) create_sort_path(root,
grouped_rel,
path,
@@ -6560,11 +6575,19 @@ create_partial_grouping_paths(PlannerInfo *root,
{
/* Sort the cheapest partial path, if it isn't already */
if (!is_sorted)
+ {
+ get_cheapest_group_keys_order(root,
+ path->rows,
+ extra->targetList,
+ &group_pathkeys,
+ &group_clauses,
+ n_preordered_groups);
path = (Path *) create_sort_path(root,
partially_grouped_rel,
path,
group_pathkeys,
-1.0);
+ }
if (parse->hasAggs)
add_path(partially_grouped_rel, (Path *)
@@ -6611,11 +6634,19 @@ create_partial_grouping_paths(PlannerInfo *root,
/* Sort the cheapest partial path, if it isn't already */
if (!is_sorted)
+ {
+ get_cheapest_group_keys_order(root,
+ path->rows,
+ extra->targetList,
+ &group_pathkeys,
+ &group_clauses,
+ n_preordered_groups);
path = (Path *) create_sort_path(root,
partially_grouped_rel,
path,
group_pathkeys,
-1.0);
+ }
if (parse->hasAggs)
add_partial_path(partially_grouped_rel, (Path *)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 226b293622..c52a7c6133 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -193,6 +193,12 @@ extern bool pathkeys_contained_in(List *keys1, List *keys2);
extern int group_keys_reorder_by_pathkeys(List *pathkeys,
List **group_pathkeys,
List **group_clauses);
+extern void get_cheapest_group_keys_order(PlannerInfo *root,
+ double nrows,
+ List *target_list,
+ List **group_pathkeys,
+ List **group_clauses,
+ int n_preordered);
extern Path *get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
Relids required_outer,
CostSelector cost_criterion,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index e302dfbdce..f954790ea1 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2075,15 +2075,94 @@ SELECT
INTO btg
FROM
generate_series(1, 10000) i;
-CREATE INDEX ON btg(p, v);
VACUUM btg;
ANALYZE btg;
+-- GROUP BY optimization by reorder columns by frequency
SET enable_hashagg=off;
SET max_parallel_workers= 0;
SET max_parallel_workers_per_gather = 0;
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY p, v;
+ QUERY PLAN
+-----------------------------
+ GroupAggregate
+ Group Key: p, v
+ -> Sort
+ Sort Key: p, v
+ -> Seq Scan on btg
+(5 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p;
+ QUERY PLAN
+-----------------------------
+ GroupAggregate
+ Group Key: p, v
+ -> Sort
+ Sort Key: p, v
+ -> Seq Scan on btg
+(5 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, c;
+ QUERY PLAN
+-----------------------------
+ GroupAggregate
+ Group Key: p, c, v
+ -> Sort
+ Sort Key: p, c, v
+ -> Seq Scan on btg
+(5 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, c ORDER BY v, p, c;
+ QUERY PLAN
+-----------------------------
+ GroupAggregate
+ Group Key: v, p, c
+ -> Sort
+ Sort Key: v, p, c
+ -> Seq Scan on btg
+(5 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, d, c;
+ QUERY PLAN
+------------------------------
+ GroupAggregate
+ Group Key: p, c, d, v
+ -> Sort
+ Sort Key: p, c, d, v
+ -> Seq Scan on btg
+(5 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, d, c ORDER BY v, p, d ,c;
+ QUERY PLAN
+------------------------------
+ GroupAggregate
+ Group Key: v, p, d, c
+ -> Sort
+ Sort Key: v, p, d, c
+ -> Seq Scan on btg
+(5 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, d, c ORDER BY p, v, d ,c;
+ QUERY PLAN
+------------------------------
+ GroupAggregate
+ Group Key: p, v, d, c
+ -> Sort
+ Sort Key: p, v, d, c
+ -> Seq Scan on btg
+(5 rows)
+
+-- GROUP BY optimization by reorder columns by index scan
+CREATE INDEX ON btg(p, v);
SET enable_seqscan=off;
SET enable_bitmapscan=off;
--- GROUP BY optimization by reorder columns by index scan
+VACUUM btg;
EXPLAIN (COSTS off)
SELECT count(*) FROM btg GROUP BY p, v;
QUERY PLAN
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index b983f9c506..3915a837f0 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1140,7 +1140,7 @@ SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, pl
QUERY PLAN
--------------------------------------------------------------------------------
GroupAggregate
- Group Key: t1.c, t2.c, t3.c
+ Group Key: t1.c, t3.c, t2.c
-> Sort
Sort Key: t1.c, t3.c
-> Append
@@ -1284,7 +1284,7 @@ SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, ph
QUERY PLAN
--------------------------------------------------------------------------------
GroupAggregate
- Group Key: t1.c, t2.c, t3.c
+ Group Key: t1.c, t3.c, t2.c
-> Sort
Sort Key: t1.c, t3.c
-> Append
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 054a381dad..adecec8872 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -281,9 +281,9 @@ EXPLAIN (COSTS off)
QUERY PLAN
-----------------------------------
GroupAggregate
- Group Key: a, b
+ Group Key: b, a
-> Sort
- Sort Key: a, b
+ Sort Key: b, a
-> Seq Scan on ndistinct
(5 rows)
@@ -292,9 +292,9 @@ EXPLAIN (COSTS off)
QUERY PLAN
-----------------------------------
GroupAggregate
- Group Key: a, b, c
+ Group Key: b, a, c
-> Sort
- Sort Key: a, b, c
+ Sort Key: b, a, c
-> Seq Scan on ndistinct
(5 rows)
@@ -303,9 +303,9 @@ EXPLAIN (COSTS off)
QUERY PLAN
-----------------------------------
GroupAggregate
- Group Key: a, b, c, d
+ Group Key: d, b, a, c
-> Sort
- Sort Key: a, b, c, d
+ Sort Key: d, b, a, c
-> Seq Scan on ndistinct
(5 rows)
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 7ef703f3a7..061e3360c0 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -919,18 +919,44 @@ SELECT
INTO btg
FROM
generate_series(1, 10000) i;
-CREATE INDEX ON btg(p, v);
VACUUM btg;
ANALYZE btg;
+-- GROUP BY optimization by reorder columns by frequency
+
SET enable_hashagg=off;
SET max_parallel_workers= 0;
SET max_parallel_workers_per_gather = 0;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY p, v;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, c;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, c ORDER BY v, p, c;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, d, c;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, d, c ORDER BY v, p, d ,c;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, d, c ORDER BY p, v, d ,c;
+
+-- GROUP BY optimization by reorder columns by index scan
+
+CREATE INDEX ON btg(p, v);
SET enable_seqscan=off;
SET enable_bitmapscan=off;
+VACUUM btg;
--- GROUP BY optimization by reorder columns by index scan
EXPLAIN (COSTS off)
SELECT count(*) FROM btg GROUP BY p, v;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index b22b36ec0e..c073eb061a 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -686,7 +686,18 @@ get_eclass_for_sort_expr(PlannerInfo *root,
if (opcintype == cur_em->em_datatype &&
equal(expr, cur_em->em_expr))
- return cur_ec; /* Match! */
+ {
+ /*
+ * Match!
+ *
+ * Copy sortref if it wasn't set yet, it's possible if ec was
+ * constructed from WHERE clause, ie it doesn't have target
+ * reference at all
+ */
+ if (cur_ec->ec_sortref == 0 && sortref > 0)
+ cur_ec->ec_sortref = sortref;
+ return cur_ec;
+ }
}
}
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ec66cb9c3c..4f93afdebc 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -26,6 +26,7 @@
#include "optimizer/paths.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
@@ -327,6 +328,58 @@ pathkeys_contained_in(List *keys1, List *keys2)
return false;
}
+/*
+ * Reorder GROUP BY pathkeys and clauses to match order of pathkeys. Function
+ * returns new lists, original GROUP BY lists stay untouched.
+ */
+int
+group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
+ List **group_clauses)
+{
+ List *new_group_pathkeys= NIL,
+ *new_group_clauses = NIL;
+ ListCell *key;
+ int n;
+
+ if (pathkeys == NIL || *group_pathkeys == NIL)
+ return 0;
+
+ /*
+ * For each pathkey it tries to find corresponding GROUP BY pathkey and
+ * clause.
+ */
+ foreach(key, pathkeys)
+ {
+ PathKey *pathkey = (PathKey *) lfirst(key);
+ SortGroupClause *sgc;
+
+ /*
+ * Pathkey should use the same allocated struct, so, equiality of
+ * pointers is enough
+ */
+ if (!list_member_ptr(*group_pathkeys, pathkey))
+ break;
+
+ new_group_pathkeys = lappend(new_group_pathkeys, pathkey);
+
+ sgc = get_sortgroupref_clause(pathkey->pk_eclass->ec_sortref,
+ *group_clauses);
+ new_group_clauses = lappend(new_group_clauses, sgc);
+ }
+
+ n = list_length(new_group_pathkeys);
+
+ /*
+ * Just append the rest of pathkeys and clauses
+ */
+ *group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
+ *group_pathkeys);
+ *group_clauses = list_concat_unique_ptr(new_group_clauses,
+ *group_clauses);
+
+ return n;
+}
+
/*
* get_cheapest_path_for_pathkeys
* Find the cheapest path (according to the specified criterion) that
@@ -1611,6 +1664,39 @@ pathkeys_useful_for_ordering(PlannerInfo *root, List *pathkeys)
return 0; /* path ordering not useful */
}
+/*
+ * pathkeys_useful_for_grouping
+ * Count the number of pathkeys that are useful for grouping (instead of
+ * explicit sort)
+ *
+ * Group pathkeys could be reordered, so we don't bother about actual order in
+ * pathkeys
+ */
+static int
+pathkeys_useful_for_grouping(PlannerInfo *root, List *pathkeys)
+{
+ ListCell *key;
+ int n = 0;
+
+ if (root->group_pathkeys == NIL)
+ return 0; /* no special ordering requested */
+
+ if (pathkeys == NIL)
+ return 0; /* unordered path */
+
+ foreach(key, pathkeys)
+ {
+ PathKey *pathkey = (PathKey *) lfirst(key);
+
+ if (!list_member_ptr(root->group_pathkeys, pathkey))
+ break;
+
+ n++;
+ }
+
+ return n;
+}
+
/*
* truncate_useless_pathkeys
* Shorten the given pathkey list to just the useful pathkeys.
@@ -1625,6 +1711,9 @@ truncate_useless_pathkeys(PlannerInfo *root,
nuseful = pathkeys_useful_for_merging(root, rel, pathkeys);
nuseful2 = pathkeys_useful_for_ordering(root, pathkeys);
+ if (nuseful2 > nuseful)
+ nuseful = nuseful2;
+ nuseful2 = pathkeys_useful_for_grouping(root, pathkeys);
if (nuseful2 > nuseful)
nuseful = nuseful2;
@@ -1660,6 +1749,8 @@ has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel)
{
if (rel->joininfo != NIL || rel->has_eclass_joins)
return true; /* might be able to use pathkeys for merging */
+ if (root->group_pathkeys != NIL)
+ return true; /* might be able to use pathkeys for grouping */
if (root->query_pathkeys != NIL)
return true; /* might be able to use them for ordering */
return false; /* definitely useless */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 67a2c7a581..1e7809edf2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6182,9 +6182,28 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
{
Path *path = (Path *) lfirst(lc);
bool is_sorted;
+ List *group_pathkeys = root->group_pathkeys,
+ *group_clauses = parse->groupClause;
+ int n_preordered_groups;
+
+ if (parse->groupingSets)
+ {
+ /*
+ * prevent group pathkey rreordering, just check the same
+ * order paths pathkeys and group pathkeys
+ */
+ is_sorted = pathkeys_contained_in(group_pathkeys,
+ path->pathkeys);
+ }
+ else
+ {
+ n_preordered_groups =
+ group_keys_reorder_by_pathkeys(path->pathkeys,
+ &group_pathkeys,
+ &group_clauses);
+ is_sorted = (n_preordered_groups == list_length(group_pathkeys));
+ }
- is_sorted = pathkeys_contained_in(root->group_pathkeys,
- path->pathkeys);
if (path == cheapest_path || is_sorted)
{
/* Sort the cheapest-total path if it isn't already sorted */
@@ -6192,7 +6211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = (Path *) create_sort_path(root,
grouped_rel,
path,
- root->group_pathkeys,
+ group_pathkeys,
-1.0);
/* Now decide what to stick atop it */
@@ -6213,9 +6232,9 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
grouped_rel,
path,
grouped_rel->reltarget,
- parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ group_clauses ? AGG_SORTED : AGG_PLAIN,
AGGSPLIT_SIMPLE,
- parse->groupClause,
+ group_clauses,
havingQual,
agg_costs,
dNumGroups));
@@ -6230,7 +6249,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
create_group_path(root,
grouped_rel,
path,
- parse->groupClause,
+ group_clauses,
havingQual,
dNumGroups));
}
@@ -6251,19 +6270,26 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
foreach(lc, partially_grouped_rel->pathlist)
{
Path *path = (Path *) lfirst(lc);
+ List *group_pathkeys = root->group_pathkeys,
+ *group_clauses = parse->groupClause;
+ int n_preordered_groups;
+
+ n_preordered_groups = group_keys_reorder_by_pathkeys(path->pathkeys,
+ &group_pathkeys,
+ &group_clauses);
/*
* Insert a Sort node, if required. But there's no point in
* sorting anything but the cheapest path.
*/
- if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
+ if (n_preordered_groups != list_length(group_pathkeys))
{
if (path != partially_grouped_rel->cheapest_total_path)
continue;
path = (Path *) create_sort_path(root,
grouped_rel,
path,
- root->group_pathkeys,
+ group_pathkeys,
-1.0);
}
@@ -6273,9 +6299,9 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
grouped_rel,
path,
grouped_rel->reltarget,
- parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ group_clauses ? AGG_SORTED : AGG_PLAIN,
AGGSPLIT_FINAL_DESERIAL,
- parse->groupClause,
+ group_clauses,
havingQual,
agg_final_costs,
dNumGroups));
@@ -6284,7 +6310,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
create_group_path(root,
grouped_rel,
path,
- parse->groupClause,
+ group_clauses,
havingQual,
dNumGroups));
}
@@ -6521,9 +6547,15 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Path *path = (Path *) lfirst(lc);
bool is_sorted;
+ List *group_pathkeys = root->group_pathkeys,
+ *group_clauses = parse->groupClause;
+ int n_preordered_groups;
+
+ n_preordered_groups = group_keys_reorder_by_pathkeys(path->pathkeys,
+ &group_pathkeys,
+ &group_clauses);
+ is_sorted = (n_preordered_groups == list_length(group_pathkeys));
- is_sorted = pathkeys_contained_in(root->group_pathkeys,
- path->pathkeys);
if (path == cheapest_total_path || is_sorted)
{
/* Sort the cheapest partial path, if it isn't already */
@@ -6531,7 +6563,7 @@ create_partial_grouping_paths(PlannerInfo *root,
path = (Path *) create_sort_path(root,
partially_grouped_rel,
path,
- root->group_pathkeys,
+ group_pathkeys,
-1.0);
if (parse->hasAggs)
@@ -6540,9 +6572,9 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel,
path,
partially_grouped_rel->reltarget,
- parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ group_clauses ? AGG_SORTED : AGG_PLAIN,
AGGSPLIT_INITIAL_SERIAL,
- parse->groupClause,
+ group_clauses,
NIL,
agg_partial_costs,
dNumPartialGroups));
@@ -6551,7 +6583,7 @@ create_partial_grouping_paths(PlannerInfo *root,
create_group_path(root,
partially_grouped_rel,
path,
- parse->groupClause,
+ group_clauses,
NIL,
dNumPartialGroups));
}
@@ -6565,17 +6597,24 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Path *path = (Path *) lfirst(lc);
bool is_sorted;
+ List *group_pathkeys = root->group_pathkeys,
+ *group_clauses = parse->groupClause;
+ int n_preordered_groups;
+
+ n_preordered_groups = group_keys_reorder_by_pathkeys(path->pathkeys,
+ &group_pathkeys,
+ &group_clauses);
+ is_sorted = (n_preordered_groups == list_length(group_pathkeys));
- is_sorted = pathkeys_contained_in(root->group_pathkeys,
- path->pathkeys);
if (path == cheapest_partial_path || is_sorted)
{
+
/* Sort the cheapest partial path, if it isn't already */
if (!is_sorted)
path = (Path *) create_sort_path(root,
partially_grouped_rel,
path,
- root->group_pathkeys,
+ group_pathkeys,
-1.0);
if (parse->hasAggs)
@@ -6584,9 +6623,9 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel,
path,
partially_grouped_rel->reltarget,
- parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ group_clauses ? AGG_SORTED : AGG_PLAIN,
AGGSPLIT_INITIAL_SERIAL,
- parse->groupClause,
+ group_clauses,
NIL,
agg_partial_costs,
dNumPartialPartialGroups));
@@ -6595,7 +6634,7 @@ create_partial_grouping_paths(PlannerInfo *root,
create_group_path(root,
partially_grouped_rel,
path,
- parse->groupClause,
+ group_clauses,
NIL,
dNumPartialPartialGroups));
}
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde307ad..226b293622 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -190,6 +190,9 @@ typedef enum
extern PathKeysComparison compare_pathkeys(List *keys1, List *keys2);
extern bool pathkeys_contained_in(List *keys1, List *keys2);
+extern int group_keys_reorder_by_pathkeys(List *pathkeys,
+ List **group_pathkeys,
+ List **group_clauses);
extern Path *get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
Relids required_outer,
CostSelector cost_criterion,
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f85e913850..e302dfbdce 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2065,3 +2065,107 @@ SELECT balk(hundred) FROM tenk1;
(1 row)
ROLLBACK;
+-- GROUP BY optimization by reorder columns
+SELECT
+ i AS id,
+ i/2 AS p,
+ format('%60s', i%2) AS v,
+ i/4 AS c,
+ i/8 AS d
+ INTO btg
+FROM
+ generate_series(1, 10000) i;
+CREATE INDEX ON btg(p, v);
+VACUUM btg;
+ANALYZE btg;
+SET enable_hashagg=off;
+SET max_parallel_workers= 0;
+SET max_parallel_workers_per_gather = 0;
+SET enable_seqscan=off;
+SET enable_bitmapscan=off;
+-- GROUP BY optimization by reorder columns by index scan
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY p, v;
+ QUERY PLAN
+------------------------------------------------
+ GroupAggregate
+ Group Key: p, v
+ -> Index Only Scan using btg_p_v_idx on btg
+(3 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY p, v ORDER BY p, v;
+ QUERY PLAN
+------------------------------------------------
+ GroupAggregate
+ Group Key: p, v
+ -> Index Only Scan using btg_p_v_idx on btg
+(3 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p;
+ QUERY PLAN
+------------------------------------------------
+ GroupAggregate
+ Group Key: p, v
+ -> Index Only Scan using btg_p_v_idx on btg
+(3 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p ORDER BY p, v;
+ QUERY PLAN
+------------------------------------------------
+ GroupAggregate
+ Group Key: p, v
+ -> Index Only Scan using btg_p_v_idx on btg
+(3 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, c;
+ QUERY PLAN
+-------------------------------------------------
+ GroupAggregate
+ Group Key: p, v, c
+ -> Sort
+ Sort Key: p, v, c
+ -> Index Scan using btg_p_v_idx on btg
+(5 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, c ORDER BY p, v;
+ QUERY PLAN
+-------------------------------------------------
+ GroupAggregate
+ Group Key: p, v, c
+ -> Sort
+ Sort Key: p, v, c
+ -> Index Scan using btg_p_v_idx on btg
+(5 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, c, p, d;
+ QUERY PLAN
+-------------------------------------------------
+ GroupAggregate
+ Group Key: p, v, c, d
+ -> Sort
+ Sort Key: p, v, c, d
+ -> Index Scan using btg_p_v_idx on btg
+(5 rows)
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, c, p, d ORDER BY p, v;
+ QUERY PLAN
+-------------------------------------------------
+ GroupAggregate
+ Group Key: p, v, c, d
+ -> Sort
+ Sort Key: p, v, c, d
+ -> Index Scan using btg_p_v_idx on btg
+(5 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers;
+RESET max_parallel_workers_per_gather;
+RESET enable_seqscan;
+RESET enable_bitmapscan;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 506d0442d7..7ef703f3a7 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -907,3 +907,57 @@ EXPLAIN (COSTS OFF) SELECT balk(hundred) FROM tenk1;
SELECT balk(hundred) FROM tenk1;
ROLLBACK;
+
+-- GROUP BY optimization by reorder columns
+
+SELECT
+ i AS id,
+ i/2 AS p,
+ format('%60s', i%2) AS v,
+ i/4 AS c,
+ i/8 AS d
+ INTO btg
+FROM
+ generate_series(1, 10000) i;
+CREATE INDEX ON btg(p, v);
+
+VACUUM btg;
+ANALYZE btg;
+
+SET enable_hashagg=off;
+SET max_parallel_workers= 0;
+SET max_parallel_workers_per_gather = 0;
+SET enable_seqscan=off;
+SET enable_bitmapscan=off;
+
+-- GROUP BY optimization by reorder columns by index scan
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY p, v;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY p, v ORDER BY p, v;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p ORDER BY p, v;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, c;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, p, c ORDER BY p, v;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, c, p, d;
+
+EXPLAIN (COSTS off)
+SELECT count(*) FROM btg GROUP BY v, c, p, d ORDER BY p, v;
+
+RESET enable_hashagg;
+RESET max_parallel_workers;
+RESET max_parallel_workers_per_gather;
+RESET enable_seqscan;
+RESET enable_bitmapscan;
+