On 2/3/2025 09:53, Alexander Korotkov wrote:
Hi, Andrei!

On Fri, Dec 6, 2024 at 10:13 AM Andrei Lepikhov <lepi...@gmail.com> wrote:

On 12/6/24 13:48, Andrei Lepikhov wrote:
On 11/2/24 01:18, Nikita Malakhov wrote:
I've corrected the failing test and created a patch at Commitfest:
https://commitfest.postgresql.org/51/5361/
I have played around with this feature, and it looks promising for such a
tiny change. It provides a 'bottom boundary' recommendation for Append
subpaths participating in the 'fractional branch' of paths. As far as I can
see, it works consistently with the plans created for plain tables filled
with similar data.
Regarding the proposal to change the SeqScan logic, IMO, Andy is right.
But that is a separate improvement, because it wouldn't work in the case of
LIMIT 10 or 100, as the newly added regression tests demonstrate.
I think this feature yields a sensible benefit for partitionwise paths.
Pushing this knowledge down into subpaths could also help postgres_fdw
reduce network traffic.

See the attached patch: regression tests are added, and the *_ext function is
removed; I don't think we would back-patch this into minor releases.

Thank you for revising this patch.  I have a couple of questions for you.
1. You revise get_cheapest_fractional_path() to reject parameterized
paths in all cases.  Are you sure this wouldn't affect other
callers of this function?
Yes, in my opinion, it doesn't affect any caller for now. As you can see, the routine is used for upper rels only, and such RelOptInfos can't currently contain parameterised paths. I already attempted to change the parameterisation code to let queries pull parameterisation from subqueries and subplans, but it is too invasive; it seems this logic will stay as it is for a long time.
I added a comment to explain this logic.
What's more, since all subpaths of an Append must have the same parameterisation, we simplify the case and just don't explore this option.
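
To make that rule concrete, here is a toy, self-contained sketch of the selection logic being discussed (ToyPath, fractional_cost and cheapest_fractional are made-up names for illustration, not the planner's types): any parameterized candidate is skipped, because a parameterized winner could not be used under an Append anyway.

#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a path; only the fields the rule needs. */
typedef struct ToyPath
{
    double      startup_cost;   /* cost before the first tuple is returned */
    double      total_cost;     /* cost to return all tuples */
    bool        parameterized;  /* stands in for path->param_info != NULL */
} ToyPath;

/* Fractional cost: linear interpolation between startup and total cost. */
static double
fractional_cost(const ToyPath *p, double fraction)
{
    return p->startup_cost + fraction * (p->total_cost - p->startup_cost);
}

/* Cheapest path for the given fraction, ignoring parameterized candidates. */
static const ToyPath *
cheapest_fractional(const ToyPath *paths, size_t npaths, double fraction)
{
    const ToyPath *best = NULL;

    for (size_t i = 0; i < npaths; i++)
    {
        if (paths[i].parameterized)
            continue;           /* never pick these for an Append subpath */
        if (best == NULL ||
            fractional_cost(&paths[i], fraction) < fractional_cost(best, fraction))
            best = &paths[i];
    }
    return best;
}

The real get_cheapest_fractional_path() additionally converts an absolute tuple count (tuple_fraction >= 1) into a fraction of the estimated row count before comparing costs; the skip-parameterized rule is the same.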

2. As usage of root->tuple_fraction outside of the final RelOptInfo has been
criticized, do you think we could limit this to some simple cases?  For
instance, check that the RelOptInfo is the final result relation for the
given root.
I believe that using tuple_fraction here is not an issue. Instead, it serves as a flag that allows the upper-level optimisation to consider additional options. The upper-level optimiser has more variants to examine and will select the optimal path based on the knowledge available at that level. Therefore, we're not introducing a mistake here; we're simply adding extra work in a narrow case. However, given that the planning process works strictly bottom-up, I don't see how we could avoid this additional workload.
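
As a rough illustration of why this extra work can pay off, here is how a fractional cost comparison behaves on invented numbers (the costs below are made up; the interpolation follows the startup-plus-fraction-of-the-rest convention that compare_fractional_path_costs() uses):

#include <stdio.h>

/* Fractional cost: startup plus the given fraction of the remaining cost. */
static double
fractional_cost(double startup, double total, double fraction)
{
    return startup + fraction * (total - startup);
}

int
main(void)
{
    /*
     * Invented costs for two ways to join one partition: a hash join
     * (high startup, cheap in total) and an index nested loop (near-zero
     * startup, more expensive in total).
     */
    const double hash_startup = 200.0, hash_total = 300.0;
    const double nloop_startup = 0.5, nloop_total = 900.0;
    const double fractions[] = {0.01, 0.1, 1.0};

    for (size_t i = 0; i < sizeof(fractions) / sizeof(fractions[0]); i++)
        printf("fraction %.2f: hash join %6.1f vs index nestloop %6.1f\n",
               fractions[i],
               fractional_cost(hash_startup, hash_total, fractions[i]),
               fractional_cost(nloop_startup, nloop_total, fractions[i]));

    /*
     * At fraction 0.01 the nested loop wins (9.5 vs 201.0); at 1.0 the
     * hash join wins (300.0 vs 900.0) -- the same LIMIT 1 / LIMIT 1000
     * flip the new regression tests demonstrate.
     */
    return 0;
}

At small fractions the startup cost dominates, which is why pushing tuple_fraction down to Append subpaths can flip the chosen child plans.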

--
regards, Andrei Lepikhov
From 4de8b1f9d8f22472c8811aad0f5853c436e79e4f Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepi...@gmail.com>
Date: Thu, 5 Dec 2024 15:15:49 +0700
Subject: [PATCH v2] Teach Append to consider tuple_fraction when accumulating
 subpaths.

This change enables more active usage of IndexScan and parameterised NestLoop
paths in partitioned cases under an Append node, in the same way as it already
works for plain tables. As the newly added regression tests demonstrate, it
should make the partitionwise technique smarter.

Having an indication of how many tuples are needed, it may be more effective
to use 'fractional branch' subpaths of the Append path list, which are
optimal for that specific number of tuples. Planning at a higher level, if
the optimiser needs all the tuples, it will choose non-fractional paths.
If, during execution, Append needs to return fewer tuples than declared by
tuple_fraction, using the 'intermediate' variant of paths is not harmful.
And it earns a considerable profit if a sensible set of tuples is selected.

The change to the existing regression test demonstrates the positive outcome
of this feature: instead of scanning the whole table, the optimiser prefers a
parameterised scan, being aware that the join has to produce only a single
tuple to satisfy the query.

Discussion: 
https://www.postgresql.org/message-id/flat/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA%40mail.gmail.com
---
 src/backend/optimizer/path/allpaths.c        |  18 ++-
 src/backend/optimizer/plan/planner.c         |   7 ++
 src/test/regress/expected/partition_join.out | 116 +++++++++++++++++++
 src/test/regress/expected/union.out          |  15 ++-
 src/test/regress/sql/partition_join.sql      |  21 ++++
 5 files changed, 167 insertions(+), 10 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b5bc9b602e2..d01acaf0b14 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1371,9 +1371,23 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
                 */
                if (rel->consider_startup && childrel->cheapest_startup_path != NULL)
                {
+                       Path       *cheapest_path;
+
+                       /*
+                        * Having an indication of how many tuples the query
+                        * should provide, the optimiser tries to choose the
+                        * path optimal for that specific number of tuples.
+                        */
+                       if (root->tuple_fraction > 0.0)
+                               cheapest_path =
+                                       get_cheapest_fractional_path(childrel,
+                                                                                root->tuple_fraction);
+                       else
+                               cheapest_path = childrel->cheapest_startup_path;
+
                        /* cheapest_startup_path must not be a parameterized path. */
-                       Assert(childrel->cheapest_startup_path->param_info == NULL);
-                       accumulate_append_subpath(childrel->cheapest_startup_path,
+                       Assert(cheapest_path->param_info == NULL);
+                       accumulate_append_subpath(cheapest_path,
                                                                          &startup_subpaths,
                                                                          NULL);
                }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 36ee6dd43de..1a3f2670208 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6414,6 +6414,10 @@ make_sort_input_target(PlannerInfo *root,
  *       Find the cheapest path for retrieving a specified fraction of all
  *       the tuples expected to be returned by the given relation.
  *
+ * Do not consider parameterized paths: the caller might use the result as
+ * an Append subpath, and all subpaths of an Append are required to have
+ * the same parameterization. See 5b7b551 for details.
+ *
  * We interpret tuple_fraction the same way as grouping_planner.
  *
  * We assume set_cheapest() has been run on the given rel.
@@ -6436,6 +6440,9 @@ get_cheapest_fractional_path(RelOptInfo *rel, double tuple_fraction)
        {
                Path       *path = (Path *) lfirst(l);
 
+               if (path->param_info)
+                       continue;
+
                if (path == rel->cheapest_total_path ||
                        compare_fractional_path_costs(best_path, path, tuple_fraction) <= 0)
                        continue;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 34b963ce6fe..fb17eb37b43 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -5260,6 +5260,122 @@ SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY x.id DE
                      Index Cond: (id = x_2.id)
 (11 rows)
 
+--
+-- Test Append's fractional paths
+--
+CREATE INDEX pht1_c_idx ON pht1(c);
+-- SeqScan might be the best choice if we need just a single tuple
+EXPLAIN (COSTS OFF) SELECT * FROM pht1 p1 JOIN pht1 p2 USING (c) LIMIT 1;
+                    QUERY PLAN                    
+--------------------------------------------------
+ Limit
+   ->  Append
+         ->  Nested Loop
+               Join Filter: (p1_1.c = p2_1.c)
+               ->  Seq Scan on pht1_p1 p1_1
+               ->  Materialize
+                     ->  Seq Scan on pht1_p1 p2_1
+         ->  Nested Loop
+               Join Filter: (p1_2.c = p2_2.c)
+               ->  Seq Scan on pht1_p2 p1_2
+               ->  Materialize
+                     ->  Seq Scan on pht1_p2 p2_2
+         ->  Nested Loop
+               Join Filter: (p1_3.c = p2_3.c)
+               ->  Seq Scan on pht1_p3 p1_3
+               ->  Materialize
+                     ->  Seq Scan on pht1_p3 p2_3
+(17 rows)
+
+-- Increase the number of tuples requested and an IndexScan will be chosen
+EXPLAIN (COSTS OFF) SELECT * FROM pht1 p1 JOIN pht1 p2 USING (c) LIMIT 100;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Limit
+   ->  Append
+         ->  Nested Loop
+               ->  Seq Scan on pht1_p1 p1_1
+               ->  Memoize
+                     Cache Key: p1_1.c
+                     Cache Mode: logical
+                     ->  Index Scan using pht1_p1_c_idx on pht1_p1 p2_1
+                           Index Cond: (c = p1_1.c)
+         ->  Nested Loop
+               ->  Seq Scan on pht1_p2 p1_2
+               ->  Memoize
+                     Cache Key: p1_2.c
+                     Cache Mode: logical
+                     ->  Index Scan using pht1_p2_c_idx on pht1_p2 p2_2
+                           Index Cond: (c = p1_2.c)
+         ->  Nested Loop
+               ->  Seq Scan on pht1_p3 p1_3
+               ->  Memoize
+                     Cache Key: p1_3.c
+                     Cache Mode: logical
+                     ->  Index Scan using pht1_p3_c_idx on pht1_p3 p2_3
+                           Index Cond: (c = p1_3.c)
+(23 rows)
+
+-- If almost all the data must be fetched, prefer SeqScan
+EXPLAIN (COSTS OFF) SELECT * FROM pht1 p1 JOIN pht1 p2 USING (c) LIMIT 1000;
+                    QUERY PLAN                    
+--------------------------------------------------
+ Limit
+   ->  Append
+         ->  Hash Join
+               Hash Cond: (p1_1.c = p2_1.c)
+               ->  Seq Scan on pht1_p1 p1_1
+               ->  Hash
+                     ->  Seq Scan on pht1_p1 p2_1
+         ->  Hash Join
+               Hash Cond: (p1_2.c = p2_2.c)
+               ->  Seq Scan on pht1_p2 p1_2
+               ->  Hash
+                     ->  Seq Scan on pht1_p2 p2_2
+         ->  Hash Join
+               Hash Cond: (p1_3.c = p2_3.c)
+               ->  Seq Scan on pht1_p3 p1_3
+               ->  Hash
+                     ->  Seq Scan on pht1_p3 p2_3
+(17 rows)
+
+SET max_parallel_workers_per_gather = 1;
+SET debug_parallel_query = on;
+-- Partial paths should also be smart enough to employ limits
+EXPLAIN (COSTS OFF) SELECT * FROM pht1 p1 JOIN pht1 p2 USING (c) LIMIT 100;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Gather
+   Workers Planned: 1
+   Single Copy: true
+   ->  Limit
+         ->  Append
+               ->  Nested Loop
+                     ->  Seq Scan on pht1_p1 p1_1
+                     ->  Memoize
+                           Cache Key: p1_1.c
+                           Cache Mode: logical
+                           ->  Index Scan using pht1_p1_c_idx on pht1_p1 p2_1
+                                 Index Cond: (c = p1_1.c)
+               ->  Nested Loop
+                     ->  Seq Scan on pht1_p2 p1_2
+                     ->  Memoize
+                           Cache Key: p1_2.c
+                           Cache Mode: logical
+                           ->  Index Scan using pht1_p2_c_idx on pht1_p2 p2_2
+                                 Index Cond: (c = p1_2.c)
+               ->  Nested Loop
+                     ->  Seq Scan on pht1_p3 p1_3
+                     ->  Memoize
+                           Cache Key: p1_3.c
+                           Cache Mode: logical
+                           ->  Index Scan using pht1_p3_c_idx on pht1_p3 p2_3
+                                 Index Cond: (c = p1_3.c)
+(26 rows)
+
+RESET debug_parallel_query;
+-- Remove indexes from the partitioned table and its partitions
+DROP INDEX pht1_c_idx CASCADE;
 -- cleanup
 DROP TABLE fract_t;
 RESET max_parallel_workers_per_gather;
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index caa8fe70a05..96962817ed4 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1472,18 +1472,17 @@ select t1.unique1 from tenk1 t1
 inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
    union all
 (values(1)) limit 1;
-                       QUERY PLAN                       
---------------------------------------------------------
+                             QUERY PLAN                              
+---------------------------------------------------------------------
  Limit
    ->  Append
          ->  Nested Loop
-               Join Filter: (t1.tenthous = t2.tenthous)
-               ->  Seq Scan on tenk1 t1
-               ->  Materialize
-                     ->  Seq Scan on tenk2 t2
-                           Filter: (thousand = 0)
+               ->  Seq Scan on tenk2 t2
+                     Filter: (thousand = 0)
+               ->  Index Scan using tenk1_thous_tenthous on tenk1 t1
+                     Index Cond: (tenthous = t2.tenthous)
          ->  Result
-(9 rows)
+(8 rows)
 
 -- Ensure there is no problem if cheapest_startup_path is NULL
 explain (costs off)
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 26b8e3d063f..515d94878ab 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -1225,6 +1225,27 @@ SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY x.id AS
 EXPLAIN (COSTS OFF)
 SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY x.id DESC LIMIT 10;
 
+--
+-- Test Append's fractional paths
+--
+
+CREATE INDEX pht1_c_idx ON pht1(c);
+-- SeqScan might be the best choice if we need just a single tuple
+EXPLAIN (COSTS OFF) SELECT * FROM pht1 p1 JOIN pht1 p2 USING (c) LIMIT 1;
+-- Increase the number of tuples requested and an IndexScan will be chosen
+EXPLAIN (COSTS OFF) SELECT * FROM pht1 p1 JOIN pht1 p2 USING (c) LIMIT 100;
+-- If almost all the data must be fetched, prefer SeqScan
+EXPLAIN (COSTS OFF) SELECT * FROM pht1 p1 JOIN pht1 p2 USING (c) LIMIT 1000;
+
+SET max_parallel_workers_per_gather = 1;
+SET debug_parallel_query = on;
+-- Partial paths should also be smart enough to employ limits
+EXPLAIN (COSTS OFF) SELECT * FROM pht1 p1 JOIN pht1 p2 USING (c) LIMIT 100;
+RESET debug_parallel_query;
+
+-- Remove indexes from the partitioned table and its partitions
+DROP INDEX pht1_c_idx CASCADE;
+
 -- cleanup
 DROP TABLE fract_t;
 
-- 
2.48.1
