On Sun, Sep 22, 2024 at 1:38 PM David Rowley <dgrowle...@gmail.com> wrote: > Just looking at the commit message: > > > The rationale is based on the assumption that incremental sort is > > always faster than full sort when there are presorted keys, a premise > > that has been applied in various parts of the code. This assumption > > does not always hold, particularly in cases with a large skew in the > > number of rows within the presorted groups. > > My understanding is that the worst case for incremental sort is the > same as sort, i.e. only 1 presorted group, which is the same effort to > sort. Is there something I'm missing?
I was thinking that when there’s a large skew in the number of rows per pre-sorted group, incremental sort might underperform full sort. As mentioned in the comments for cost_incremental_sort, it has to detect the sort groups, plus it needs to perform tuplesort_reset after each group. However, I doubt these factors would have a substantial impact on the performance of incremental sort. So maybe in the worst skewed groups case, incremental sort may still perform similarly to full sort. Another less obvious reason is that in cases of skewed groups, we may significantly underestimate the cost of incremental sort. This could result in choosing a more expensive subpath under the sort. Such as the example in [1], we end up with IndexScan->IncrementalSort rather than SeqScan->FullSort, while the former plan is more expensive to execute. However, this point does not affect this patch, because for a mergejoin, we only consider outerrel's cheapest-total-cost when we assume that an explicit sort atop is needed, i.e., we do not have a chance to select which subpath to use in this case. [1] https://postgr.es/m/CAMbWs49+CXsrgbq0LD1at-5jR=ahhn0ytdy9yvgxasmfndz...@mail.gmail.com Thanks Richard