Re: Merging statistics from children instead of re-sampling everything

Andrey V. Lepikhov Thu, 10 Feb 2022 20:29:57 -0800

On 2/11/22 03:37, Tomas Vondra wrote:

> That being said, this thread was not really about foreign partitions,
> but about re-analyzing inheritance trees in general. And sampling
> foreign partitions doesn't really solve that - we'll still do the
> sampling over and over.

IMO, to solve the problem we should do two things:
1. Avoid repeated partition scans in the case of an inheritance tree.

2. Avoid re-analyzing everything when only a small subset of partitions is actively changing.

For (1) I can imagine a solution like multiplexing: at the stage of deciding which relations to scan, group them and prepare the scan parameters so that multiple samples are produced in one pass. It looks like we need separate logic for analyzing partitioned tables; we should form and cache a sample for each partition before the analysis. This requires a prototype to understand the complexity of such a solution, and it can be done separately from (2).
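A minimal sketch of the "form and cache samples on each partition" idea, written as a Python model rather than PostgreSQL C code. It assumes each partition's sample is cached together with the partition's row count; the `PartitionSample` type and `merge_samples` function are hypothetical names for illustration. The parent-level sample is then assembled from the cached samples by proportional allocation, so the partitions are not scanned again.

```python
import random
from dataclasses import dataclass


@dataclass
class PartitionSample:
    """Hypothetical cached per-partition sample (not a PostgreSQL structure)."""
    total_rows: int  # estimated row count of the partition
    rows: list       # values sampled from the partition


def merge_samples(parts, target_size, rng=None):
    """Build a parent-level sample from cached partition samples.

    Each partition contributes rows in proportion to its share of the
    total row count, capped by how many rows its cached sample holds.
    """
    rng = rng if rng is not None else random.Random()
    total = sum(p.total_rows for p in parts)
    if total == 0:
        return []
    merged = []
    for p in parts:
        # Proportional allocation of the parent's target sample size.
        want = round(target_size * p.total_rows / total)
        take = min(want, len(p.rows))
        # Subsample the cached sample without replacement.
        merged.extend(rng.sample(p.rows, take))
    return merged
```

For example, with partitions of 9000 and 1000 rows and a target of 300, the sketch takes 270 rows from the first cached sample and 30 from the second. The approach only works if each cached sample is at least as large as its partition's share of the parent target; a smaller cached sample leaves that partition under-represented in the merged sample.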

Task (2) is more difficult to solve. Here we can store samples from each partition in the values[] field of pg_statistic, or in a dedicated table that stores a 'most probable values' snapshot of each table. The most difficult problem here, as you mentioned, is the ndistinct value. Is it possible to store not an exactly calculated value of ndistinct, but an 'expected value' based on analysis of the samples and histograms of the partitions? Such a value could also solve the problem of estimating the grouping of a SETOP result (and joins over them, etc.), where we have statistics only on the sources of the union.
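Per-partition ndistinct values cannot simply be summed, because the same value may appear in several partitions. A minimal sketch of what stored per-partition samples would allow: merge them into one approximately uniform parent sample, then apply the Haas-Stokes "Duj1" estimator, n*d / (n - f1 + f1*n/N), which is, to my knowledge, the formula ANALYZE uses on a single table's sample. This is a simpler stand-in, not the histogram-based 'expected value' proposed above, and `estimate_ndistinct` is a hypothetical name.

```python
from collections import Counter


def estimate_ndistinct(sample, total_rows):
    """Estimate distinct values in a relation of total_rows rows from a
    uniform random sample, using the Duj1 estimator
    n*d / (n - f1 + f1*n/N)."""
    n = len(sample)
    if n == 0 or total_rows <= 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)                                 # distinct values seen in the sample
    f1 = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    est = (n * d) / ((n - f1) + f1 * n / total_rows)
    # Clamp to the feasible range [d, N].
    return float(min(max(est, d), total_rows))
```

When the sample covers every row (n = N) the estimate reduces to d, and when no value occurs exactly once it also stays at d; the extrapolation beyond d is driven entirely by the singleton count f1.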


--
regards,
Andrey Lepikhov
Postgres Professional
