On 07/12/2018 05:00 PM, Teodor Sigaev wrote:
>
>> One more thought about estimating the worst case - I wonder if simply
>> multiplying the per-tuple cost by 1.5 is the right approach. It does not
>> seem particularly principled, and it's trivially simple to construct
>> counter-examples defeating it (imagine columns with 99% of the rows
>> having the same value, and then many values in exactly one row - that
>> will defeat the ndistinct-based group size estimates).
>
> Agree. But right now that is the best we have, and the constant 1.5 could
> easily be changed to 1 or 10 - it is just an arbitrary value, chosen
> intuitively. As I mentioned in the v7 patch description, the new estimation
> function produces a ~10% larger estimate, and this constant should not be
> very large, because we could easily end up with an overestimate.
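To make that counter-example concrete (with made-up numbers): take a 1M-row
table where one value appears in 990,000 rows of the column and the
remaining 10,000 rows are all distinct. ndistinct is then ~10,001, so the
ndistinct-based average group is ~100 rows and the 1.5 multiplier only gets
us to ~150, while the largest group we may actually have to sort has
990,000 rows. No fixed constant can cover that gap.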
>
>>
>> The reason why constructing such counter-examples is so simple is that
>> this relies entirely on the ndistinct stats, ignoring the other stats we
>> already have. I think this might leverage the per-column MCV lists (and
>> eventually multi-column MCV lists, if that ever gets committed).
>>
>> We're estimating the number of tuples in a group (after fixing values in
>> the preceding columns), because that's what determines the number of
>> comparisons we need to make at a given stage. The patch does this by
>> estimating the average group size: it estimates the ndistinct of the
>> preceding columns, G(i-1), and computes ntuples/G(i-1).
>>
>> But consider that we also have MCV lists for each column - if there is a
>> very common value, it should be there. So let's say M(i) is the frequency
>> of the most common value in the i-th column, determined either from the
>> MCV list or as 1/ndistinct for that column. Then if m(i) is the minimum
>> of M(i) over the first i columns, the largest possible group of values
>> when comparing values in the i-th column is
>>
>> N * m(i-1)
>>
>> This seems like a better way to estimate the worst case. I don't know if
>> this should be used instead of NG(i), or if those two estimates should
>> be combined in some way.
> Agree, we definitely need an estimate of the largest group size to use in
> cost_sort. But I don't feel up to doing that myself - could you please
> attack the group size computation issue? Thank you.
>
Attached is a simple patch illustrating this MCV-based approach. I don't
claim it's finished / RFC, but hopefully it's sufficient to show what I
meant. I'm not sure how to plug it into the v8 of your patch at this
point, so I've simply added an elog to estimate_num_groups to also print
the estimate of the largest group for GROUP BY queries.
It's largely a simplified copy of estimate_num_groups() and there are a
couple of comments on how it might be improved further.
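To spell out the computation with concrete numbers, here is a tiny
self-contained toy sketch of the same min-of-frequencies logic (the
per-column frequencies are made up, and it ignores null_frac and the
per-relation handling, so it only shows the shape of the estimate, not
an authoritative implementation):

#include <stdio.h>

int
main(void)
{
	/* hypothetical worst-case per-column frequencies M(i), taken from
	 * the MCV list or as 1/ndistinct when there is no MCV */
	double		M[] = {0.99, 0.20, 0.01};
	int			ncols = 3;
	double		input_rows = 1000000.0;
	double		min_freq = 1.0;		/* m(0) = 1.0, nothing fixed yet */
	int			i;

	for (i = 0; i < ncols; i++)
	{
		/* the largest group compared at column i+1 spans N * m(i) rows */
		printf("column %d: largest group ~ %.0f rows\n",
			   i + 1, input_rows * min_freq);

		if (M[i] < min_freq)
			min_freq = M[i];
	}

	/* overall worst case, roughly what estimate_largest_group returns */
	printf("largest group overall ~ %.0f rows\n", input_rows * min_freq);

	return 0;
}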
>>
>> I think estimating the largest group we need to sort should be helpful
>> for the incremental sort patch, so I'm adding Alexander to this
>> thread.
> Agree, I think so. The suggested estimation algorithm should be modified
> to support leading preordered keys, and that form could be used directly
> in the GROUP BY and incremental sort patches.
Right. What I think the function estimating the group size could do in
the case of incremental sort is produce two values - the maximum size of
the leading group, and the maximum group size overall. The first one
would be useful for startup cost, the second one for total cost.
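For illustration only, here is a toy sketch of how those two values might
be derived from a single leading column's MCV list (the MCV entries are
made-up numbers, it ignores non-MCV values and NULLs, and none of this is
in the attached patch):

#include <stdio.h>

/* hypothetical MCV entry: a column value and its frequency */
typedef struct
{
	int			value;
	double		frequency;
} McvItem;

int
main(void)
{
	/* made-up MCV list for the leading sort key, ascending sort assumed */
	McvItem		mcv[] = {{42, 0.30}, {7, 0.05}, {13, 0.20}};
	int			nitems = 3;
	double		input_rows = 1000000.0;
	double		max_freq = 0.0;		/* largest group anywhere */
	double		first_freq = 0.0;	/* group that sorts first */
	int			min_value = 0;
	int			i;

	for (i = 0; i < nitems; i++)
	{
		if (mcv[i].frequency > max_freq)
			max_freq = mcv[i].frequency;

		/* the MCV item with the smallest value is the one the sort
		 * emits first, so it bounds the leading group */
		if (i == 0 || mcv[i].value < min_value)
		{
			min_value = mcv[i].value;
			first_freq = mcv[i].frequency;
		}
	}

	printf("leading group ~ %.0f rows (startup cost)\n",
		   input_rows * first_freq);
	printf("largest group ~ %.0f rows (total cost)\n",
		   input_rows * max_freq);

	return 0;
}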
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f1c78ffb65..2d91c17866 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3424,6 +3424,9 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
ListCell *l;
int i;
+ /* DEBUG */
+ elog(WARNING, "estimate_largest_group (%d exprs) %f", list_length(groupExprs), estimate_largest_group(root, groupExprs, input_rows));
+
/*
* We don't ever want to return an estimate of zero groups, as that tends
* to lead to division-by-zero and other unpleasantness. The input_rows
@@ -3735,6 +3738,265 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
}
/*
+ * estimate_largest_group - Estimate size of largest group of rows
+ *
+ * Given a query having a GROUP BY or ORDER BY clause, estimate the size
+ * of the largest group.
+ *
+ * Inputs:
+ * root - the query
+ * groupExprs - list of expressions being grouped by
+ * input_rows - number of rows estimated to arrive at the group/unique
+ * filter step
+ *
+ * Estimating the average size of a group is simple: it's input_rows/N,
+ * where N is the number of groups obtained from estimate_num_groups. But
+ * in various places we are more interested in a worst-case scenario, i.e.
+ * the size of the largest group of rows (which may be significantly
+ * larger than the average group).
+ *
+ * To estimate size of the largest group, we use MCV (and additional
+ * statistics like null_frac) for individual columns.
+ *
+ * Given the lack of any cross-correlation statistics in the system, it's
+ * impossible to do anything really trustworthy with GROUP BY and ORDER BY
+ * conditions involving multiple Vars.
+ *
+ * Our current approach is as follows:
+ * 1. Reduce the given expressions to a list of unique Vars used. For
+ * example, ORDER BY a, a + b is treated the same as ORDER BY a, b.
+ * It is clearly correct not to count the same Var more than once.
+ * It is also reasonable to treat f(x) the same as x: f() cannot
+ * increase the number of distinct values (unless it is volatile,
+ * which we consider unlikely for grouping or sorting), but it
+ * probably won't reduce the number of distinct values much either.
+ * As a special case, if the expression can be matched to an
+ * expressional index for which we have statistics, then we treat
+ * the whole expression as though it were just a Var.
+ * 2. For Vars within a single source rel, we determine the largest group
+ * for each (from MCV list, null_frac and ndistinct), and then compute
+ * the minimum of these values.
+ * 3. If there are Vars from multiple rels, we repeat step 2 for each such
+ *    rel and keep the overall minimum frequency across all of them.
+ */
+double
+estimate_largest_group(PlannerInfo *root, List *groupExprs, double input_rows)
+{
+ List *varinfos = NIL;
+ double srf_multiplier = 1.0;
+ ListCell *l;
+ int i;
+
+ /* frequency of the largest group */
+ double min_group_frequency = 1.0;
+ double max_group_rows;
+
+ /*
+ * We don't ever want to return a group size estimate of zero rows, as
+ * that tends to lead to division-by-zero and other unpleasantness. The
+ * input_rows estimate is usually already at least 1, but clamp it just
+ * in case it isn't.
+ */
+ input_rows = clamp_row_est(input_rows);
+
+ /*
+ * If no grouping columns, the whole input is a single group, so the
+ * largest group contains all input rows. (This can't happen for normal
+ * GROUP BY or DISTINCT, but is possible for corner cases with set ops.)
+ */
+ if (groupExprs == NIL)
+ return input_rows;
+
+ i = 0;
+ foreach(l, groupExprs)
+ {
+ Node *groupexpr = (Node *) lfirst(l);
+ double this_srf_multiplier;
+ VariableStatData vardata;
+ List *varshere;
+ ListCell *l2;
+
+ /*
+ * Set-returning functions in grouping columns are a bit problematic.
+ * The code below will effectively ignore their SRF nature and come up
+ * with a group size estimate as though they were scalar functions.
+ * We compensate by scaling up the end result by the largest SRF
+ * rowcount estimate. (This amounts to assuming that all of an SRF's
+ * outputs for a given input row fall into the same group, which is
+ * the worst case. In any case, it's probably
+ * pointless to worry too much about this without much better
+ * estimates for SRF output rowcounts than we have today.)
+ */
+ this_srf_multiplier = expression_returns_set_rows(groupexpr);
+ if (srf_multiplier < this_srf_multiplier)
+ srf_multiplier = this_srf_multiplier;
+
+ /*
+ * If examine_variable is able to deduce anything about the GROUP BY
+ * expression, treat it as a single variable even if it's really more
+ * complicated.
+ */
+ examine_variable(root, groupexpr, 0, &vardata);
+ if (HeapTupleIsValid(vardata.statsTuple) || vardata.isunique)
+ {
+ varinfos = add_unique_group_var(root, varinfos,
+ groupexpr, &vardata);
+ ReleaseVariableStats(vardata);
+ continue;
+ }
+ ReleaseVariableStats(vardata);
+
+ /*
+ * Else pull out the component Vars. Handle PlaceHolderVars by
+ * recursing into their arguments (effectively assuming that the
+ * PlaceHolderVar doesn't change the number of groups, which boils
+ * down to ignoring the possible addition of nulls to the result set).
+ */
+ varshere = pull_var_clause(groupexpr,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * If we find any variable-free GROUP BY item, then either it is a
+ * constant (and we can ignore it) or it contains a volatile function;
+ * in the latter case we punt and assume that each input row will
+ * yield a distinct group.
+ */
+ if (varshere == NIL)
+ {
+ if (contain_volatile_functions(groupexpr))
+ return 1.0; /* XXX Maybe return srf_multiplier instead? */
+ continue;
+ }
+
+ /*
+ * Else add variables to varinfos list
+ */
+ foreach(l2, varshere)
+ {
+ Node *var = (Node *) lfirst(l2);
+
+ examine_variable(root, var, 0, &vardata);
+ varinfos = add_unique_group_var(root, varinfos, var, &vardata);
+ ReleaseVariableStats(vardata);
+ }
+ }
+
+ /*
+ * If now no Vars, we must have an all-constant GROUP BY list, so the
+ * whole result is one large group.
+ *
+ * XXX Apply the srf_multiplier here?
+ */
+ if (varinfos == NIL)
+ return input_rows;
+
+ /*
+ * Group Vars by relation and estimate per-relation largest group.
+ *
+ * For each iteration of the outer loop, we process the frontmost Var in
+ * varinfos, plus all other Vars in the same relation. We remove these
+ * Vars from the newvarinfos list for the next iteration. This is the
+ * easiest way to group Vars of same rel together.
+ */
+ do
+ {
+ GroupVarInfo *varinfo1 = (GroupVarInfo *) linitial(varinfos);
+ List *newvarinfos = NIL;
+ List *relvarinfos = NIL;
+
+ /*
+ * Split the list of varinfos in two - one for the current rel, one
+ * for remaining Vars on other rels.
+ */
+ relvarinfos = lcons(varinfo1, relvarinfos);
+ for_each_cell(l, lnext(list_head(varinfos)))
+ {
+ GroupVarInfo *varinfo2 = (GroupVarInfo *) lfirst(l);
+
+ if (varinfo2->rel == varinfo1->rel)
+ {
+ /* varinfos on current rel */
+ relvarinfos = lcons(varinfo2, relvarinfos);
+ }
+ else
+ {
+ /* not time to process varinfo2 yet */
+ newvarinfos = lcons(varinfo2, newvarinfos);
+ }
+ }
+
+ /*
+ * Get the largest group estimate for the Vars of this rel. For
+ * each Var we look for MCV first - if there's one, we take the
+ * frequency of the first (most common) item. If there's no MCV
+ * for a Var, we assume all groups are about equal, and instead
+ * use 1/ndistinct.
+ *
+ * XXX This should probably consider null_frac.
+ */
+ foreach(l, relvarinfos)
+ {
+ GroupVarInfo *varinfo2 = (GroupVarInfo *) lfirst(l);
+ Node *var = varinfo2->var;
+ AttStatsSlot sslot;
+ VariableStatData vardata;
+ double group_frequency;
+
+ examine_variable(root, var, 0, &vardata);
+ varinfos = add_unique_group_var(root, varinfos, var, &vardata);
+
+ /* assume we have uniformly distributed groups */
+ group_frequency = 1.0 / varinfo2->ndistinct;
+
+ /* but if we have MCV list, we take the first value
+ *
+ * XXX We could also sort the MCV items, and take the first
+ * one (this would help for incremental sort, particularly
+ * when caring about startup cost, as e.g. for LIMIT). We
+ * could also produce two values - largest first group and
+ * largest group in general, and use both for costing.
+ */
+ if (HeapTupleIsValid(vardata.statsTuple) &&
+ get_attstatsslot(&sslot, vardata.statsTuple,
+ STATISTIC_KIND_MCV, InvalidOid,
+ ATTSTATSSLOT_NUMBERS))
+ group_frequency = sslot.numbers[0];
+
+ /*
+ * TODO perhaps consider null_frac here, depending on NULLS
+ * FIRST/LAST (again, more relevant to incremental sort).
+ */
+
+ if (group_frequency < min_group_frequency)
+ min_group_frequency = group_frequency;
+
+ ReleaseVariableStats(vardata);
+ }
+
+ varinfos = newvarinfos;
+ } while (varinfos != NIL);
+
+ /* compute the group size (frequency -> rows) */
+ max_group_rows = min_group_frequency * input_rows;
+
+ /* Now we can account for the effects of any SRFs */
+ max_group_rows *= srf_multiplier;
+
+ /* Round off */
+ max_group_rows = ceil(max_group_rows);
+
+ /* Guard against out-of-range answers */
+ if (max_group_rows > input_rows)
+ max_group_rows = input_rows;
+ if (max_group_rows < 1.0)
+ max_group_rows = 1.0;
+
+ return max_group_rows;
+}
+
+/*
* Estimate hash bucket statistics when the specified expression is used
* as a hash key for the given number of buckets.
*
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 95e44280c4..976df7720f 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -209,6 +209,9 @@ extern void mergejoinscansel(PlannerInfo *root, Node *clause,
extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
double input_rows, List **pgset);
+extern double estimate_largest_group(PlannerInfo *root, List *groupExprs,
+ double input_rows);
+
extern void estimate_hash_bucket_stats(PlannerInfo *root,
Node *hashkey, double nbuckets,
Selectivity *mcv_freq,