On 7/22/21 3:58 AM, Tomas Vondra wrote:
> I've simplified the costing a bit, and the attached version actually
> undoes all the "suspicious" plan changes in postgres_fdw. It changes one
> new plan, but that seems somewhat reasonable, as it pushes sort to the
> remote side.
I tried to justify the heap-sort part of the compute_cpu_sort_cost()
routine and realized that we may have a mistake here.
After a week of effort, I haven't found any research papers on the
dependence of bounded heap-sort time complexity on the number of
duplicates.
So I propose a self-made formula, based on simple logical steps:
1. We start from the basic formula: cost ~ N * LOG2(M), where M is the
number of output tuples.
2. Note that the full form of this formula is:
cost ~ N * LOG2(min{N,M})
3. In the multi-column case, the number of comparisons for each subsequent
column can be estimated by the same formula, applied to the number of
tuples per group:
comparisons ~ input * LOG2(min{input,M})
4. Note that for the case when M > input, we should adjust this formula a
bit:
comparisons ~ max{input,M} * LOG2(min{input,M})
Remember that in our case M << N.
So, the general formula for bounded heap sort can be written as:
cost ~ N * sum(max{n_i,M}/N * LOG2(min{n_i,M})), i = 1..ncols
where n_1 == N and n_i is the number of tuples per group, estimated on the
previous iteration.
An implementation of this approach is attached.
--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 68a32740d7..2c3cce57aa 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1872,6 +1872,15 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
*
* N * sum( Fk * log(Gk) )
*
+ * For bounded heap sort we don't have such duplicates-related research, so we
+ * invent a formula based on simple logic (M is the number of output tuples):
+ * For one column we can estimate the number of comparisons as:
+ * N * log(min{N,M})
+ * For the case of many columns we can estimate the number of comparisons by
+ * the formula:
+ * sum(max{n_i,M} * log(min{n_i,M})),
+ * where n_0 == N and n_i is the number of tuples per group from the previous step.
+ *
* Note: We also consider column width, not just the comparator cost.
*
* NOTE: some callers currently pass NIL for pathkeys because they
@@ -1881,7 +1890,7 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
 static Cost
 compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
 					  Cost comparison_cost, double tuples, double output_tuples,
-					  bool heapSort)
+					  bool bounded_sort)
 {
 	Cost		per_tuple_cost = 0.0;
 	ListCell   *lc;
@@ -1907,7 +1916,7 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
 	 * of this function, but I'm not sure. I suggest we introduce some simple
 	 * constants for that, instead of magic values.
 	 */
-	output_tuples = (heapSort) ? 2.0 * output_tuples : tuples;
+	output_tuples = (bounded_sort) ? 2.0 * output_tuples : tuples;
 
 	per_tuple_cost += 2.0 * cpu_operator_cost * LOG2(output_tuples);
 
 	/* add cost provided by caller */
@@ -1925,8 +1934,7 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
 	{
 		PathKey    *pathkey = (PathKey *) lfirst(lc);
 		EquivalenceMember *em;
-		double		nGroups,
-					correctedNGroups;
+		double		nGroups;
 		/*
 		 * We believe that equivalence members aren't very different, so, to
@@ -2000,24 +2008,16 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
 		 */
 		if (i >= nPresortedKeys)
 		{
-			if (heapSort)
-			{
-				double		heap_tuples;
-
-				/* have to keep at least one group, and a multiple of group size */
-				heap_tuples = Max(ceil(output_tuples / tuplesPerPrevGroup) * tuplesPerPrevGroup,
-								  tuplesPerPrevGroup);
-
-				/* so how many (whole) groups is that? */
-				correctedNGroups = ceil(heap_tuples / tuplesPerPrevGroup);
-			}
+			/*
+			 * Quick sort and 'top-N' sorting (bounded heap sort) algorithms
+			 * have different formulas for time complexity estimation.
+			 */
+			if (bounded_sort)
+				per_tuple_cost += totalFuncCost *
+					(Max(tuplesPerPrevGroup, output_tuples) / tuples) *
+					LOG2(2.0 * Min(tuplesPerPrevGroup, output_tuples));
 			else
-				/* all groups in the input */
-				correctedNGroups = nGroups;
-
-			correctedNGroups = Max(1.0, ceil(correctedNGroups));
-
-			per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);
+				per_tuple_cost += totalFuncCost * LOG2(nGroups);
 		}
 
 		i++;