On Thu, Nov 07, 2019 at 01:38:20PM +0900, Kyotaro Horiguchi wrote:
Hello.
At Wed, 6 Nov 2019 20:58:49 +0100, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote in
>Here is a slightly polished v2 of the patch, the main difference being
>that computing clause_attnums was moved to a separate function.
>
This time with the attachment ;-)
This patch is a kind of straight-forward, which repeats what the
previous statext_mcv_clauselist_selectivity did as long as remaining
clauses matches any of MV-MCVs. Almost no regression in the cases
where zero or just one MV-MCV applies to the given clause list.
It applies cleanly on the current master and seems working as
expected.
I have some comments.
Could we have description in the documentation on what multiple
MV-MCVs are used in a query? And don't we need some regression tests?
Yes, regression tests are certainly needed - I though I've added them,
but it seems I failed to include them in the patch. Will fix.
I agree it's probably worth mentioning we can consider multiple stats,
but I'm a bit hesitant to put the exact rules how we pick the "best"
statistic to the docs. It's not 100% deterministic and it's likely
we'll need to tweak it a bit in the future.
I'd prefer showing the stats in EXPLAIN, but that's a separate patch.
+/*
+ * statext_mcv_clause_attnums
+ * Recalculate attnums from compatible but not-yet-estimated
clauses.
It returns attnums collected from multiple clause*s*. Is the name OK
with "clause_attnums"?
The comment says as if it checks the compatibility of each clause but
the work is done in the caller side. I'm not sure such strictness is
required, but it might be better that the comment represents what
exactly the function does.
But the incompatible clauses have the pre-computed attnums set to NULL,
so technically the comment is correct. But I'll clarify.
+ */
+static Bitmapset *
+statext_mcv_clause_attnums(int nclauses, Bitmapset **estimatedclauses,
+ Bitmapset **list_attnums)
The last two parameters are in the same type in notation but in
different actual types.. that is, one is a pointer to Bitmapset*, and
another is an array of Bitmaptset*. The code in the function itself
suggests that, but it would be helpful if a brief explanation of the
parameters is seen in the function comment.
OK, will explain in a comment.
+ /*
+ * Recompute attnums in the remaining clauses (we simply use
the bitmaps
+ * computed earlier, so that we don't have to inspect the
clauses again).
+ */
+ clauses_attnums =
statext_mcv_clause_attnums(list_length(clauses),
Couldn't we avoid calling this function twice with the same parameters
at the first round in the loop?
Hmmm, yeah. That's a good point.
+ foreach(l, clauses)
{
- stat_clauses = lappend(stat_clauses, (Node *)
lfirst(l));
- *estimatedclauses = bms_add_member(*estimatedclauses,
listidx);
+ /*
+ * If the clause is compatible with the selected
statistics, mark it
+ * as estimated and add it to the list to estimate.
+ */
+ if (list_attnums[listidx] != NULL &&
+ bms_is_subset(list_attnums[listidx],
stat->keys))
+ {
+ stat_clauses = lappend(stat_clauses, (Node *)
lfirst(l));
+ *estimatedclauses =
bms_add_member(*estimatedclauses, listidx);
+ }
The loop runs through all clauses every time. I agree that that is
better than using a copy of the clauses to avoid to step on already
estimated clauses, but maybe we need an Assertion that the listidx is
not a part of estimatedclauses to make sure no clauses are not
estimated twice.
Well, we can't really operate on a smaller "copy" of the list anyway,
because that would break the precalculation logic (the listidx value
would be incorrect for the new list), and tweaking it would be more
expensive than just iterating over all clauses. The assumption is that
we won't see extremely large number of clauses here.
Adding an assert seems reasonable. And maybe a comment why we should not
see any already-estimated clauses here.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services