On 7/22/21 3:58 AM, Tomas Vondra wrote:
> I've simplified the costing a bit, and the attached version actually
> undoes all the "suspicious" plan changes in postgres_fdw. It changes one
> new plan, but that seems somewhat reasonable, as it pushes sort to the
> remote side.

I tried to justify the heap-sort part of the compute_cpu_sort_cost() routine and realized that we may have a mistake there. After a week of effort, I haven't found any research papers on how the time complexity of bounded heap sort depends on the number of duplicates.

So I propose a formula of my own, based on simple logical reasoning:

1. Start from the basic formula: cost ~ N * LOG2(M), where N is the number of input tuples and M is the number of output tuples.
2. Observe that the full form of this formula is:

cost ~ N * LOG2(min{N,M})

3. In the multi-column case, the number of comparisons for each subsequent column can be estimated by the same formula, applied to the number of tuples per group left after the previous columns (call it 'input'):

comparisons ~ input * LOG2(min{input,M})

4. Note that for the case when M > input, we should adjust this formula a bit:

comparisons ~ max{input,M} * LOG2(min{input,M})

Remember that in our case M << N.
So the general formula for bounded heap sort can be written as:

cost ~ N * sum(max{n_i,M}/N * LOG2(min{n_i,M})), i = 1..ncols

where n_1 == N and n_i is the number of tuples per group, estimated from the previous iteration.
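
To sanity-check the formula, here is a tiny standalone C mock-up (not planner
code; N, M and the per-group counts n_i are invented numbers, just for
illustration):

#include <stdio.h>
#include <math.h>

#define LOG2(x) (log(x) / log(2.0))

int
main(void)
{
	double	N = 1000000.0;	/* input tuples (assumed) */
	double	M = 100.0;	/* output tuples, e.g. a LIMIT bound (assumed) */
	/* n_1 == N; the later per-group counts are invented estimates */
	double	n[] = {1000000.0, 10000.0, 50.0};
	int	ncols = sizeof(n) / sizeof(n[0]);
	double	sum = 0.0;
	int	i;

	/* cost ~ N * sum(max{n_i,M}/N * LOG2(min{n_i,M})), i = 1..ncols */
	for (i = 0; i < ncols; i++)
		sum += fmax(n[i], M) / N * LOG2(fmin(n[i], M));

	/* the last column exercises the max{n_i,M} branch, since 50 < M */
	printf("estimated comparisons: %.0f\n", N * sum);
	return 0;
}

For these numbers the first column dominates (about N * LOG2(M), roughly 6.6
million comparisons) and each later column adds only a small correction, which
matches the intuition that most ties are resolved by the leading key.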

An implementation of this approach is attached (and inlined below); a standalone mock-up of its per-column loop follows the diff.

--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 68a32740d7..2c3cce57aa 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1872,6 +1872,15 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
  *
  *     N * sum( Fk * log(Gk) )
  *
+ * For bounded heap sort we have no such duplicates-related research, so we
+ * invent a formula based on simple logic (M is the number of output tuples):
+ * For one column we can estimate the number of comparisons as:
+ * N * log(min{N,M})
+ * For the case of many columns we can naturally estimate the number of
+ * comparisons by the formula:
+ * sum(max{n_i,M} * log(min{n_i,M})),
+ * where n_0 == N and n_i is the number of tuples per group from the previous step.
+ *
  * Note: We also consider column width, not just the comparator cost.
  *
  * NOTE: some callers currently pass NIL for pathkeys because they
@@ -1881,7 +1890,7 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
 static Cost
 compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
                                          Cost comparison_cost, double tuples, double output_tuples,
-                                         bool heapSort)
+                                         bool bounded_sort)
 {
        Cost            per_tuple_cost = 0.0;
        ListCell        *lc;
@@ -1907,7 +1916,7 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
                 * of this function, but I'm not sure. I suggest we introduce some simple
                 * constants for that, instead of magic values.
                 */
-               output_tuples = (heapSort) ? 2.0 * output_tuples : tuples;
+               output_tuples = (bounded_sort) ? 2.0 * output_tuples : tuples;
                per_tuple_cost += 2.0 * cpu_operator_cost * LOG2(output_tuples);
 
                /* add cost provided by caller */
@@ -1925,8 +1934,7 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
        {
                PathKey                         *pathkey = (PathKey*) lfirst(lc);
                EquivalenceMember       *em;
-               double                           nGroups,
-                                                        correctedNGroups;
+               double                           nGroups;
 
                /*
                 * We believe that equivalence members aren't very different, so, to
@@ -2000,24 +2008,16 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
                 */
                if (i >= nPresortedKeys)
                {
-                       if (heapSort)
-                       {
-                               double heap_tuples;
-
-                               /* have to keep at least one group, and a multiple of group size */
-                               heap_tuples = Max(ceil(output_tuples / tuplesPerPrevGroup) * tuplesPerPrevGroup,
-                                                 tuplesPerPrevGroup);
-
-                               /* so how many (whole) groups is that? */
-                               correctedNGroups = ceil(heap_tuples / tuplesPerPrevGroup);
-                       }
+                       /*
+                        * Quick sort and 'top-N' sorting (bounded heap sort) algorithms
+                        * have different formulas for time complexity estimation.
+                        */
+                       if (bounded_sort)
+                               per_tuple_cost += totalFuncCost *
+                                                 (Max(tuplesPerPrevGroup, output_tuples) / tuples) *
+                                                 LOG2(2.0 * Min(tuplesPerPrevGroup, output_tuples));
                        else
-                               /* all groups in the input */
-                               correctedNGroups = nGroups;
-
-                       correctedNGroups = Max(1.0, ceil(correctedNGroups));
-
-                       per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);
+                               per_tuple_cost += totalFuncCost * LOG2(nGroups);
                }
 
                i++;

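For convenience, here is the per-column loop from the patch mocked up as a
standalone program (totalFuncCost and the tuplesPerPrevGroup series are
invented stand-ins, not real planner estimates; note the extra 2.0 inside
LOG2, kept in the style of the existing "2.0 * output_tuples" fudge factor
above):

#include <stdio.h>
#include <math.h>

#define LOG2(x)   (log(x) / log(2.0))
#define Max(a, b) ((a) > (b) ? (a) : (b))
#define Min(a, b) ((a) < (b) ? (a) : (b))

int
main(void)
{
	double	tuples = 1000000.0;	/* N (assumed) */
	double	output_tuples = 100.0;	/* M (assumed) */
	double	totalFuncCost = 1.0;	/* stand-in comparator cost */
	/* tuplesPerPrevGroup per column: starts at N, shrinks each step */
	double	groups[] = {1000000.0, 10000.0, 50.0};
	double	per_tuple_cost = 0.0;
	int	i;

	for (i = 0; i < 3; i++)
		per_tuple_cost += totalFuncCost *
			(Max(groups[i], output_tuples) / tuples) *
			LOG2(2.0 * Min(groups[i], output_tuples));

	printf("per-tuple cost: %g, total: %g\n",
		   per_tuple_cost, per_tuple_cost * tuples);
	return 0;
}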