Re: 回复: An implementation of multi-key sort

Konstantin Knizhnik Sat, 06 Jul 2024 23:32:58 -0700


On 04/07/2024 3:45 pm, Yao Wang wrote:

Generally, the benefit of mksort is mainly from duplicated values and sort
keys: the more duplicated values and sort keys are, the bigger benefit it
gets.

...

1. Use distinct stats info of table to enable mksort

It's kind of heuristics: in optimizer, check Form_pg_statistic->stadistinct
of a table via pg_statistics. Enable mksort only when it is less than a
threshold.

The hacked code works, which need to modify a couple of interfaces of
optimizer. In addition, a complete solution should consider types and
distinct values of all columns, which might be too complex, and the benefit
seems not so big.

If mksort really provides advantage only when there are a lot ofduplicates (for prefix keys?) and of small fraction of duplicates thereis even some (small) regressionthen IMHO taking in account in planner information about estimatednumber of distinct values seems to be really important. What was aproblem with accessing this statistics and why it requires modificationof optimizer interfaces? There is `get_variable_numdistinct` functionwhich is defined and used only in selfuncs.c

Information about values distribution seems to be quite useful for choosing optimal sort algorithm. Not only for multi-key sortoptimization. For example if we know min.max value of sort key and it issmall, we can use O(N) algorithm for sorting. Also it can help toestimate when TOP-N search is preferable.

Right now Posgres creates special path for incremental sort. I am notsure if we also need to be separate path for mk-sort.But IMHO if we need to change some optimizer interfaces to be able totake in account statistic and choose preferred sort algorithm atplanning time, then it should be done.If mksort can increase sort more than two times (for large number ofduplicates), it will be nice to take it in account when choosing optimalplan.

Also in this case we do not need extra GUC for explicit enabling ofmksort. There are too many parameters for optimizer and adding one morewill make tuning more complex. So I prefer that decision is take buyoptimizer itself based on the available information, especially ifcriteria seems to be obvious.



Best regards,
Konstantin

Re: 回复: An implementation of multi-key sort

Reply via email to