Re: MergeAppend could consider sorting cheapest child path

Andrei Lepikhov Thu, 04 Sep 2025 10:54:08 -0700

On 2/9/2025 03:27, Richard Guo wrote:

On Tue, Sep 2, 2025 at 5:26 AM Alexander Korotkov <[email protected]> wrote:

I have another idea.  What if we allow MergeAppend paths only when at
least one subpath is preordered.  This trick also allow us to exclude
MergeAppend(Sort) dominating Sort(Append).  I see the regression tests
changes now have much less volume and looks more reasonable.  What do
you think?


I skimmed through the test case changes, and I'm not sure all of them
are actual improvements.  For example:

           ->  Append
-               ->  Foreign Scan on ftprt1_p1 t1_1
+               ->  Sort
+                     Sort Key: t1_1.a
+                     ->  Foreign Scan on ftprt1_p1 t1_1
                 ->  Foreign Scan on ftprt1_p2 t1_2

It seems that this patch moves the sort operation for ftprt1_p1 from
the remote server to local.  I'm not sure if this is an improvement,
or why it applies only to ftprt1_p1 and not to ftprt1_p2 (they have
very similar statistics).

I had a look into this case. The next stuff happens.

Initially, within generate_orderedappend_paths, the planner creates anAppend according to the 'match_partition_order' strategy, whichdominates the others.Next, pathlists of 'Foreign Scan on ftprt1_p1' and 'Foreign Scan onftprt1_p2' are different: the first one contains two paths:

1. startup_cost: 100.000, total_cost: 103.090, pathkeys: false
2. startup_cost: 102.880, total_cost: 103.110, pathkeys: true

And the second subpath has only one option to scan:
startup_cost: 100.000, total_cost: 103.660, pathkeys: true

Before, the optimiser always chose the path with pathkeys. However, thispatch attempts to do its best by comparing ForeignScan+Sort and ForeignScan.

Comparing the total path with the explicit Sort and pre-sorted one, we have:
- ForeignScan+Sort: startup_cost: 103.100, total_cost: 103.105
- Presorted: startup_cost: 102.880, total_cost: 103.110

And here is the issue: a difference in the third sign after decimalpoint. Let's check remote estimations with and without Sort:


With:
LockRows (cost=2.88..2.90 rows=1 width=25)
-> Sort (cost=2.88..2.89 rows=1 width=25)
Sort Key: t1.a
-> Seq Scan on public.fprt1_p1 t1 (cost=0.00..2.88 ...

Without:
LockRows (cost=0.00..2.88 rows=1 width=25)
-> Seq Scan on public.fprt1_p1 t1 (cost=0.00..2.88 ...

As you can see, according to these estimations, LockRows costs nothingwithout sorting and 0.1 with Sort. So, fluctuation was added byEXPLAIN's rounding.

What to do? At first, we can do nothing and just correct the output. ButI don't like unstable tests. We can adjust the query slightly toincrease the estimations or improve the estimation using extendedstatistics. I prefer the more elegant variant with extended statistics.See the attachment for a sketch on how to stabilise the output. Withthis patch applied before this feature, the test output stays the same.


Besides, I noticed that some plans have changed from an "Index Scan
with Index Cond" to a "Seq Scan with Filter + Sort".  I'm also not
sure whether this change results in better performance.

As you know, according to the cost model, SeqScan looks better on scansof tiny tables and full scans. I didn't delve as deeply into these casesyet as I did in the previous one, but it's clear that we're still seeingthe issue with tiny tables.


--
regards, Andrei Lepikhov

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index d3323b04676..cad0f35801e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10478,21 +10478,25 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 
't1_phv' phv, * FROM fprt1 WHERE
 (14 rows)
 
 -- test FOR UPDATE; partitionwise join does not apply
+CREATE STATISTICS stat1 ON (a % 25) FROM fprt1_p1;
+ANALYZE fprt1_p1;
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE 
t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  LockRows
-   ->  Nested Loop
-         Join Filter: (t1.a = t2.b)
-         ->  Append
-               ->  Foreign Scan on ftprt1_p1 t1_1
-               ->  Foreign Scan on ftprt1_p2 t1_2
-         ->  Materialize
+   ->  Sort
+         Sort Key: t1.a
+         ->  Hash Join
+               Hash Cond: (t2.b = t1.a)
                ->  Append
                      ->  Foreign Scan on ftprt2_p1 t2_1
                      ->  Foreign Scan on ftprt2_p2 t2_2
-(10 rows)
+               ->  Hash
+                     ->  Append
+                           ->  Foreign Scan on ftprt1_p1 t1_1
+                           ->  Foreign Scan on ftprt1_p2 t1_2
+(12 rows)
 
 SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE 
t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
   a  |  b  
@@ -10503,6 +10507,7 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON 
(t1.a = t2.b) WHERE t1.a
  400 | 400
 (4 rows)
 
+DROP STATISTICS stat1;
 RESET enable_partitionwise_join;
 -- ===================================================================
 -- test partitionwise aggregates
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql 
b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2c609e060b7..fbcec9dfb71 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3308,9 +3308,12 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' 
phv, * FROM fprt1 WHERE
 SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 
WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 
= 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
 
 -- test FOR UPDATE; partitionwise join does not apply
+CREATE STATISTICS stat1 ON (a % 25) FROM fprt1_p1;
+ANALYZE fprt1_p1;
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE 
t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
 SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE 
t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
+DROP STATISTICS stat1;
 
 RESET enable_partitionwise_join;

Re: MergeAppend could consider sorting cheapest child path

Reply via email to