On 15/2/2024 18:10, Tomas Vondra wrote:


On 2/15/24 07:50, Andrei Lepikhov wrote:
On 18/12/2023 19:53, Tomas Vondra wrote:
On 12/18/23 11:40, Richard Guo wrote:
The challenge is where to get usable information about correlation
between columns. I only have a couple very rought ideas of what might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from

      ndistinct(b,c,d) / ndistinct(b,c)

If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as

      1 * ndistinct(b,c)
Did you mean here ndistinct(c,d) and the formula:
ndistinct(b,c,d) / ndistinct(c,d) ?

Yes, I think that's probably a more correct ... Essentially, the idea is
to estimate the change in number of distinct groups after adding a
column (or restricting it in some way).
Thanks, I got it. I just think how to implement such techniques with extensions just to test the idea in action. In the case of GROUP-BY we can use path hook, of course. But what if to invent a hook on clauselist estimation?
Do you implicitly bear in mind here the necessity of tracking clauses
that were applied to the data up to the moment of grouping?


I don't recall what exactly I considered two months ago when writing the
message, but I don't see why we would need to track that beyond what we
already have. Shouldn't it be enough for the grouping to simply inspect
the conditions on the lower levels?
Yes, exactly. I've thought about looking into baserestrictinfos and, if group-by references a subquery targetlist, into subqueries too.

--
regards,
Andrei Lepikhov
Postgres Professional



Reply via email to