On Thu, Jul 31, 2025 at 5:20 PM Andrei Lepikhov <lepi...@gmail.com> wrote: > On 27/7/2025 00:51, Alexander Korotkov wrote: > > On Tue, Jul 22, 2025 at 2:13 PM Andrei Lepikhov <lepi...@gmail.com > > I've another idea. cost_tuplesort() puts 2.0 under logarithm to prefer > > tuplesort over heapsort. I think we can adjust cost_gather_merge() and > > cost_merge_append() to do the same. 0001 patch implements that. I > > think the plan changes of 0001 might be reasonable since most cases deal > > with small rowsets. One thing concerns me: 0002 still affects one of > > the postgres_fdw checks. Could you, please, take a look? > Thanks for the idea! > I analysed your approach a little bit. > Initially, I ran the test script I had created previously [1] and > discovered that on a large scale (1e6 - 1e7 tuples), the plan still > defaults to MergeAppend, which deviates from the execution time (7190 ms > for Sort+Append and 8450 ms for MergeAppend+Sort). > > Attempting to find out the reason, I combined all the costs into a > single formula for each strategy: > > MergeAppend+Sort: > total_cost =CO*ntuples*(1+2*log(ntuples)) + Ccput * 0.5 * ntuples+ > 2*CO*N*log(N) + A > Sort+Append: > total_cost = CO*ntuples*(1+2*log(ntuples))+ Ccput * 0.5 * ntuples + A > > Terms: > - A - sum of total costs of underlying subtrees > - CO - cpu_operator_cost > - Ccput - cpu_tuple_cost > - N - number of subpaths (streams) > > Given the significant gap in total execution time between these > strategies, I believe it would be reasonable to introduce a coefficient > to the equation's 'ntuples' variable component that will keep the gap > between big quicksort and MergeAppend's heapsort out of the fuzzy factor > gap. > > Discovering papers on the value of constant in quicksort [2] and > heapsort [3], I realised that there is a difference. The constant's > value varies in a wide range: 1.3-1.5 for quicksort and 2-3 for > heapsort. Considering that we should change the current cost model as > little as possible, not to break the balance, we may just increase the > constant value for the heap sort to maintain a bare minimum gap between > strategies out of the fuzzy factor. In this case, the merge append > constant should be around 3.8 - 4.0. > > With this minor change, we see a shift in the regression tests. Most of > these changes were introduced by the new append strategy. Although I > haven't analysed these changes in depth yet, I believe they are all > related to the small data sets and should fade out on a larger scale. > > See this minor correction in the attachment. postgres_fdw tests are > stable now.
I have another idea. What if we allow MergeAppend paths only when at least one subpath is preordered. This trick also allow us to exclude MergeAppend(Sort) dominating Sort(Append). I see the regression tests changes now have much less volume and looks more reasonable. What do you think? Also, do you think get_cheapest_fractional_path_for_pathkeys_ext() and get_cheapest_path_for_pathkeys_ext() should consider incremental sort? The revised patch teaches them to do so. ------ Regards, Alexander Korotkov Supabase
v10-0001-Consider-an-explicit-sort-of-the-MergeAppend-sub.patch
Description: Binary data