On Thu, Dec 4, 2025 at 1:14 AM Sami Imseih <[email protected]> wrote:

> Can we drive the decision for what to do based on optimizer
> stats, i.e. n_distinct and row counts? Not sure what the calculation would
> be specifically, but something else to consider.
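
For concreteness, a minimal sketch of such a heuristic (with made-up names
and an arbitrary cutoff; not anything that exists in the planner) might
look like:

    #include <stdbool.h>

    /*
     * Purely illustrative sketch, not actual planner code: decide the sort
     * method from the estimated number of distinct values relative to the
     * estimated row count.  Names, parameters, and cutoff are hypothetical.
     */
    static bool
    choose_new_sort_method(double est_rows, double est_ndistinct)
    {
        if (est_rows <= 0)
            return false;

        /*
         * Low estimated cardinality keeps the existing heap method.  If
         * ndistinct is underestimated, this ratio shrinks and we fall back
         * to the heap method even where the new method would have won.
         */
        return (est_ndistinct / est_rows) > 0.01;   /* arbitrary cutoff */
    }
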
It's happened multiple times before that someone proposes a change that
makes sorting faster on some inputs, but turns out to regress on low
cardinality (I've done it myself). It seems to be pretty hard not to
regress that case. Occasionally the author proposes to take optimizer
stats into account, and that was rejected because cardinality stats are
often wildly wrong. Further, underestimation is far more common than
overestimation, in which case IIUC the planner would just continue to
choose the existing heap method.

> We can still provide the GUC to override the optimizer decisions,
> but at least the optimizer, given up-to-date stats, may get it right most
> of the time.

I don't have much faith that people will properly set a GUC whose
effects depend on the input characteristics and memory settings.

The new method might be a better overall trade-off, but we'd need some
more comprehensive measurements to know what we're dealing with.

--
John Naylor
Amazon Web Services
