Re: Using multiple extended statistics for estimates

Tomas Vondra Thu, 14 Nov 2019 12:05:07 -0800

On Thu, Nov 14, 2019 at 10:23:44AM -0800, Mark Dilger wrote:

On 11/14/19 7:55 AM, Tomas Vondra wrote:
On Wed, Nov 13, 2019 at 10:04:36AM -0800, Mark Dilger wrote:
On 11/13/19 7:28 AM, Tomas Vondra wrote:
Hi,

here's an updated patch, with some minor tweaks based on the review and
added tests (I ended up reworking those a bit, to make them more like
the existing ones).
Thanks, Tomas, for the new patch set!
Attached are my review comments so far, in the form of a patchapplied on top of yours.
Thanks.

1) It's not clear to me why adding 'const' to the List parameters would
  be useful? Can you explain?
When I first started reviewing the functions, I didn't know if thoselists were intended to be modified by the function. Adding 'const'helps document that the function does not intend to change them.


Hmmm, ok. I'll think about it, but we're not really using const* in this
way very much I think - at least not in the surrounding code.

2) I think you're right we can change find_strongest_dependency to do

   /* also skip weaker dependencies when attribute count matches */
   if (strongest->nattributes == dependency->nattributes &&
       strongest->degree >= dependency->degree)
       continue;

  That'll skip some additional dependencies, which seems OK.

3) It's not clear to me what you mean by

    * TODO: Improve this code comment.  Specifically, why would we
    * ignore that no rows will match?  It seems that such a discovery
    * would allow us to return an estimate of 0 rows, and that would
    * be useful.

  added to dependencies_clauselist_selectivity. Are you saying we
  should also compute selectivity estimates for individual clauses and
  use Min() as a limit? Maybe, but that seems unrelated to the patch.
I mean that the comment right above that TODO is hard to understand.You seem to be saying that it is good and proper to only take theselectivity estimate from the final clause in the list, but then go onto say that other clauses might prove that no rows will match. Sothat implies that by ignoring all but the last clause, we're ignoringsuch other clauses that prove no rows can match. But why would we beignoring those?
I am not arguing that your code is wrong. I'm just critiquing thehard-to-understand phrasing of that code comment.


Aha, I think I understand now - thanks for the explanation. You're right
the comment is trying to explain why just taking the last clause for a
given attnum is fine. I'll try to make the comment clearer.

For the case with equal Const values that should be mostly obvious, i.e.
"a=1 AND a=1 AND a=1" has the same selectivity as "a=1".

The case with different Const values is harder, unfortunately. It might
seem obvious that "a=1 AND a=2" means there are no matching rows, but
that heavily relies on the semantics of the equality operator. And we
can't simply compare the Const values either, I'm afraid, because there
are cases with cross-type operators like

 a = 1::int AND a = 1.0::numeric

where the Consts are of different type, yet both conditions can be true.

So it would be pretty tricky to do this, and the current code does not
even try to do that.

Instead, it just assumes that it's mostly fine to overestimate, because
then at runtime we'll simply end up with 0 rows here.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Using multiple extended statistics for estimates

Reply via email to