Re: MergeAppend could consider sorting cheapest child path

Alexander Korotkov Mon, 01 Sep 2025 13:27:21 -0700

On Thu, Jul 31, 2025 at 5:20 PM Andrei Lepikhov <[email protected]> wrote:
> On 27/7/2025 00:51, Alexander Korotkov wrote:
> > On Tue, Jul 22, 2025 at 2:13 PM Andrei Lepikhov <[email protected]
> > I've another idea.  cost_tuplesort() puts 2.0 under logarithm to prefer
> > tuplesort over heapsort.  I think we can adjust cost_gather_merge() and
> > cost_merge_append() to do the same.  0001 patch implements that.  I
> > think the plan changes of 0001 might be reasonable since most cases deal
> > with small rowsets.  One thing concerns me: 0002 still affects one of
> > the postgres_fdw checks.  Could you, please, take a look?
> Thanks for the idea!
> I analysed your approach a little bit.
> Initially, I ran the test script I had created previously [1] and
> discovered that on a large scale (1e6 - 1e7 tuples), the plan still
> defaults to MergeAppend, which deviates from the execution time (7190 ms
> for Sort+Append and 8450 ms for MergeAppend+Sort).
>
> Attempting to find out the reason, I combined all the costs into a
> single formula for each strategy:
>
> MergeAppend+Sort:
> total_cost =CO*ntuples*(1+2*log(ntuples)) + Ccput * 0.5 * ntuples+
> 2*CO*N*log(N) + A
> Sort+Append:
> total_cost = CO*ntuples*(1+2*log(ntuples))+ Ccput * 0.5 * ntuples + A
>
> Terms:
> - A - sum of total costs of underlying subtrees
> - CO - cpu_operator_cost
> - Ccput - cpu_tuple_cost
> - N - number of subpaths (streams)
>
> Given the significant gap in total execution time between these
> strategies, I believe it would be reasonable to introduce a coefficient
> to the equation's 'ntuples' variable component that will keep the gap
> between big quicksort and MergeAppend's heapsort out of the fuzzy factor
> gap.
>
> Discovering papers on the value of constant in quicksort [2] and
> heapsort [3], I realised that there is a difference. The constant's
> value varies in a wide range: 1.3-1.5 for quicksort and 2-3 for
> heapsort. Considering that we should change the current cost model as
> little as possible, not to break the balance, we may just increase the
> constant value for the heap sort to maintain a bare minimum gap between
> strategies out of the fuzzy factor. In this case, the merge append
> constant should be around 3.8 - 4.0.
>
> With this minor change, we see a shift in the regression tests. Most of
> these changes were introduced by the new append strategy. Although I
> haven't analysed these changes in depth yet, I believe they are all
> related to the small data sets and should fade out on a larger scale.
>
> See this minor correction in the attachment. postgres_fdw tests are
> stable now.


I have another idea.  What if we allow MergeAppend paths only when at
least one subpath is preordered.  This trick also allow us to exclude
MergeAppend(Sort) dominating Sort(Append).  I see the regression tests
changes now have much less volume and looks more reasonable.  What do
you think?

Also, do you think get_cheapest_fractional_path_for_pathkeys_ext() and
get_cheapest_path_for_pathkeys_ext() should consider incremental sort?
 The revised patch teaches them to do so.

------
Regards,
Alexander Korotkov
Supabase

v10-0001-Consider-an-explicit-sort-of-the-MergeAppend-sub.patch
Description: Binary data

Re: MergeAppend could consider sorting cheapest child path

Reply via email to