Re: make add_paths_to_append_rel aware of startup cost
On Fri, 16 Feb 2024 at 01:09, David Rowley wrote: > > On Thu, 15 Feb 2024 at 21:42, Andy Fan wrote: > > I found the both plans have the same cost, I can't get the accurate > > cause of this after some hours research, but it is pretty similar with > > 7516056c584e3, so I uses a similar strategy to stable it. is it > > acceptable? > > It's pretty hard to say. I can only guess why this test would be > flapping like this. I see it's happened before on mylodon, so probably > not a cosmic ray. It's not like add_path() chooses a random path when > the costs are the same, so I wondered if something similar is going on > here that was going on that led to f03a9ca4. In particular, see [1]. While it's not conclusive proof, the following demonstrates that relpages dropping by just 1 page causes the join order to change. regression=# explain regression-# select t1.unique1 from tenk1 t1 regression-# inner join tenk2 t2 on t1.tenthous = t2.tenthous regression-# union all regression-# (values(1)) limit 1; QUERY PLAN -- Limit (cost=0.00..150.08 rows=1 width=4) -> Append (cost=0.00..1500965.01 rows=10001 width=4) -> Nested Loop (cost=0.00..1500915.00 rows=1 width=4) Join Filter: (t1.tenthous = t2.tenthous) -> Seq Scan on tenk1 t1 (cost=0.00..445.00 rows=1 width=8) -> Materialize (cost=0.00..495.00 rows=1 width=4) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=1 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) regression=# update pg_class set relpages=relpages - 1 where relname = 'tenk2'; UPDATE 1 regression=# explain regression-# select t1.unique1 from tenk1 t1 regression-# inner join tenk2 t2 on t1.tenthous = t2.tenthous regression-# union all regression-# (values(1)) limit 1; QUERY PLAN -- Limit (cost=0.00..150.52 rows=1 width=4) -> Append (cost=0.00..1505315.30 rows=10001 width=4) -> Nested Loop (cost=0.00..1505265.29 rows=1 width=4) Join Filter: (t1.tenthous = t2.tenthous) -> Seq Scan on tenk2 t2 (cost=0.00..445.29 rows=10029 width=4) -> Materialize (cost=0.00..495.00 rows=1 width=8) -> Seq Scan on tenk1 t1 (cost=0.00..445.00 rows=1 width=8) -> Result (cost=0.00..0.01 rows=1 width=4) I tried this with the proposed changes to the test and the plan did not change. I've pushed the change now. David
Re: make add_paths_to_append_rel aware of startup cost
David Rowley writes: > I'd be more happy using this one as percentage-wise, the cost > difference is much larger. +1 for the percentage-wise. > > I checked that the t2.thounsand = 0 query still tests the cheap > startup paths in add_paths_to_append_rel(). I get the same conclusion here. > > So, if nobody has any better ideas, I'm just going to push the " and > t2.thousand = 0" adjustment. LGTM. -- Best Regards Andy Fan
Re: make add_paths_to_append_rel aware of startup cost
On Thu, 15 Feb 2024 at 21:42, Andy Fan wrote: > I found the both plans have the same cost, I can't get the accurate > cause of this after some hours research, but it is pretty similar with > 7516056c584e3, so I uses a similar strategy to stable it. is it > acceptable? It's pretty hard to say. I can only guess why this test would be flapping like this. I see it's happened before on mylodon, so probably not a cosmic ray. It's not like add_path() chooses a random path when the costs are the same, so I wondered if something similar is going on here that was going on that led to f03a9ca4. In particular, see [1]. On master, I set a breakpoint in try_nestloop_path() to break on "outerrel->relid==1 && innerrel->relid==2". I see the total Nested Loop cost comes out the same with the join order reversed. Which is: -> Nested Loop (cost=0.00..1500915.00 rows=1 width=4) Doing the same with your patch applied, I get: -> Nested Loop (cost=0.00..600925.00 rows=4000 width=4) and forcing the join order to swap with the debugger, I see: -> Nested Loop (cost=0.00..600940.00 rows=4000 width=4) So there's a difference now, but it's quite small. If it was a problem like we had on [1], then since tenk1 and tenk2 have 345 pages (on my machine), if relpages is down 1 or 2 pages, we'll likely get more of a costing difference than 600925 vs 600940. If I use: explain select t1.unique1 from tenk1 t1 inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0 union all (values(1)) limit 1; I get: -> Nested Loop (cost=0.00..2415.03 rows=10 width=4) and with the join order reversed, I get: -> Nested Loop (cost=0.00..2440.00 rows=10 width=4) I'd be more happy using this one as percentage-wise, the cost difference is much larger. I don't quite have the will to go through proving what the actual problem is here. I think [1] already proved the relpages problem can (or could) happen. I checked that the t2.thounsand = 0 query still tests the cheap startup paths in add_paths_to_append_rel() and it does. If I flip startup_subpaths_valid to false in the debugger, the plan flips to: QUERY PLAN --- Limit (cost=470.12..514.00 rows=1 width=4) -> Append (cost=470.12..952.79 rows=11 width=4) -> Hash Join (cost=470.12..952.73 rows=10 width=4) Hash Cond: (t1.tenthous = t2.tenthous) -> Seq Scan on tenk1 t1 (cost=0.00..445.00 rows=1 width=8) -> Hash (cost=470.00..470.00 rows=10 width=4) -> Seq Scan on tenk2 t2 (cost=0.00..470.00 rows=10 width=4) Filter: (thousand = 0) -> Result (cost=0.00..0.01 rows=1 width=4) So, if nobody has any better ideas, I'm just going to push the " and t2.thousand = 0" adjustment. David [1] https://www.postgresql.org/message-id/4174.1563239552%40sss.pgh.pa.us
Re: make add_paths_to_append_rel aware of startup cost
Andy Fan writes: > Thomas Munro writes: > >> On Thu, Oct 5, 2023 at 9:07 PM David Rowley wrote: >>> Thanks. Pushed. >> >> FYI somehow this plan from a8a968a8212e flipped in this run: >> >> === dumping >> /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/regression.diffs >> === >> diff -U3 >> /home/bf/bf-build/mylodon/HEAD/pgsql/src/test/regress/expected/union.out >> /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/union.out >> --- /home/bf/bf-build/mylodon/HEAD/pgsql/src/test/regress/expected/union.out >> 2024-01-15 00:31:13.947555940 + >> +++ >> /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/union.out >> 2024-02-14 00:06:17.075584839 + >> @@ -1447,9 +1447,9 @@ >> -> Append >> -> Nested Loop >> Join Filter: (t1.tenthous = t2.tenthous) >> - -> Seq Scan on tenk1 t1 >> + -> Seq Scan on tenk2 t2 >> -> Materialize >> - -> Seq Scan on tenk2 t2 >> + -> Seq Scan on tenk1 t1 >> -> Result >> (8 rows) >> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=2024-02-14%2000%3A01%3A03 > > Thanks for this information! I will take a look at this. I found the both plans have the same cost, I can't get the accurate cause of this after some hours research, but it is pretty similar with 7516056c584e3, so I uses a similar strategy to stable it. is it acceptable? -- Best Regards Andy Fan >From 01c2b5ae76a4493f33b894afdd4a8d3ce31e1a3e Mon Sep 17 00:00:00 2001 From: "yizhi.fzh" Date: Thu, 15 Feb 2024 16:29:30 +0800 Subject: [PATCH v1 1/1] Attempt to stabilize new unionall + limit regression test This test was recently added in a8a968a8212e, and some times It appears to be unstable in regards to the join order presumably due to the relations at either side of the join being equal in side. Here we add a qual to make one of them smaller so the planner is more likely to choose to material the smaller of the two. Reported-by: Thomas Munro Author: Andy Fan Discussion: https://postgr.es/m/CA%2BhUKGLqC-NobKYfjxNM3Gexv9OJ-Fhvy9bugUcXsZjTqH7W%3DQ%40mail.gmail.com#88e6420b59a863403b9b67a0128fdacc --- src/test/regress/expected/union.out | 11 ++- src/test/regress/sql/union.sql | 4 ++-- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out index 73e320bad4..0e527b65b3 100644 --- a/src/test/regress/expected/union.out +++ b/src/test/regress/expected/union.out @@ -1438,8 +1438,8 @@ where (x = 0) or (q1 >= q2 and q1 <= q2); -- Ensure we get a Nested Loop join between tenk1 and tenk2 explain (costs off) select t1.unique1 from tenk1 t1 -inner join tenk2 t2 on t1.tenthous = t2.tenthous - union all +inner join tenk2 t2 on t1.tenthous = t2.tenthous and t1.ten > 5 +union all (values(1)) limit 1; QUERY PLAN @@ -1447,11 +1447,12 @@ inner join tenk2 t2 on t1.tenthous = t2.tenthous -> Append -> Nested Loop Join Filter: (t1.tenthous = t2.tenthous) - -> Seq Scan on tenk1 t1 + -> Seq Scan on tenk2 t2 -> Materialize - -> Seq Scan on tenk2 t2 + -> Seq Scan on tenk1 t1 + Filter: (ten > 5) -> Result -(8 rows) +(9 rows) -- Ensure there is no problem if cheapest_startup_path is NULL explain (costs off) diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql index 6c509ac80c..13ba9f21ad 100644 --- a/src/test/regress/sql/union.sql +++ b/src/test/regress/sql/union.sql @@ -548,8 +548,8 @@ where (x = 0) or (q1 >= q2 and q1 <= q2); -- Ensure we get a Nested Loop join between tenk1 and tenk2 explain (costs off) select t1.unique1 from tenk1 t1 -inner join tenk2 t2 on t1.tenthous = t2.tenthous - union all +inner join tenk2 t2 on t1.tenthous = t2.tenthous and t1.ten > 5 +union all (values(1)) limit 1; -- Ensure there is no problem if cheapest_startup_path is NULL -- 2.34.1
Re: make add_paths_to_append_rel aware of startup cost
Thomas Munro writes: > On Thu, Oct 5, 2023 at 9:07 PM David Rowley wrote: >> Thanks. Pushed. > > FYI somehow this plan from a8a968a8212e flipped in this run: > > === dumping > /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/regression.diffs > === > diff -U3 > /home/bf/bf-build/mylodon/HEAD/pgsql/src/test/regress/expected/union.out > /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/union.out > --- /home/bf/bf-build/mylodon/HEAD/pgsql/src/test/regress/expected/union.out > 2024-01-15 00:31:13.947555940 + > +++ > /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/union.out > 2024-02-14 00:06:17.075584839 + > @@ -1447,9 +1447,9 @@ > -> Append > -> Nested Loop > Join Filter: (t1.tenthous = t2.tenthous) > - -> Seq Scan on tenk1 t1 > + -> Seq Scan on tenk2 t2 > -> Materialize > - -> Seq Scan on tenk2 t2 > + -> Seq Scan on tenk1 t1 > -> Result > (8 rows) > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=2024-02-14%2000%3A01%3A03 Thanks for this information! I will take a look at this. -- Best Regards Andy Fan
Re: make add_paths_to_append_rel aware of startup cost
On Thu, Oct 5, 2023 at 9:07 PM David Rowley wrote: > Thanks. Pushed. FYI somehow this plan from a8a968a8212e flipped in this run: === dumping /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/regression.diffs === diff -U3 /home/bf/bf-build/mylodon/HEAD/pgsql/src/test/regress/expected/union.out /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/union.out --- /home/bf/bf-build/mylodon/HEAD/pgsql/src/test/regress/expected/union.out 2024-01-15 00:31:13.947555940 + +++ /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/union.out 2024-02-14 00:06:17.075584839 + @@ -1447,9 +1447,9 @@ -> Append -> Nested Loop Join Filter: (t1.tenthous = t2.tenthous) - -> Seq Scan on tenk1 t1 + -> Seq Scan on tenk2 t2 -> Materialize - -> Seq Scan on tenk2 t2 + -> Seq Scan on tenk1 t1 -> Result (8 rows) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon=2024-02-14%2000%3A01%3A03
Re: make add_paths_to_append_rel aware of startup cost
On Thu, 5 Oct 2023 at 14:11, Andy Fan wrote: > Patch LGTM. Thanks. Pushed. David
Re: make add_paths_to_append_rel aware of startup cost
On Wed, Oct 4, 2023 at 8:41 AM David Rowley wrote: > On Sun, 1 Oct 2023 at 21:26, Andy Fan wrote: > >> But overall, I'm more inclined to just go with the more simple "add a > >> cheap unordered startup append path if considering cheap startup > >> plans" version. I see your latest patch does both. So, I'd suggest two > >> patches as I do see the merit in keeping this simple and cheap. If we > >> can get the first part in and you still find cases where you're not > >> getting the most appropriate startup plan based on the tuple fraction, > >> then we can reconsider what extra complexity we should endure in the > >> code based on the example query where we've demonstrated the planner > >> is not choosing the best startup path appropriate to the given tuple > >> fraction. > > > > I think this is a fair point, I agree that your first part is good > enough to be > > committed first. Actually I tried a lot to make a test case which can > prove > > the value of cheapest fractional cost but no gain so far:( > > I've attached a patch with the same code as the previous patch but > this time including a regression test. > > I see no reason to not commit this so if anyone feels differently > please let me know. > > David > Patch LGTM. -- Best Regards Andy Fan
Re: make add_paths_to_append_rel aware of startup cost
On Sun, 1 Oct 2023 at 21:26, Andy Fan wrote: >> But overall, I'm more inclined to just go with the more simple "add a >> cheap unordered startup append path if considering cheap startup >> plans" version. I see your latest patch does both. So, I'd suggest two >> patches as I do see the merit in keeping this simple and cheap. If we >> can get the first part in and you still find cases where you're not >> getting the most appropriate startup plan based on the tuple fraction, >> then we can reconsider what extra complexity we should endure in the >> code based on the example query where we've demonstrated the planner >> is not choosing the best startup path appropriate to the given tuple >> fraction. > > I think this is a fair point, I agree that your first part is good enough to > be > committed first. Actually I tried a lot to make a test case which can prove > the value of cheapest fractional cost but no gain so far:( I've attached a patch with the same code as the previous patch but this time including a regression test. I see no reason to not commit this so if anyone feels differently please let me know. David consider_cheapest_startup_appendpath_v2.patch Description: Binary data
Re: make add_paths_to_append_rel aware of startup cost
Hi David, But overall, I'm more inclined to just go with the more simple "add a > cheap unordered startup append path if considering cheap startup > plans" version. I see your latest patch does both. So, I'd suggest two > patches as I do see the merit in keeping this simple and cheap. If we > can get the first part in and you still find cases where you're not > getting the most appropriate startup plan based on the tuple fraction, > then we can reconsider what extra complexity we should endure in the > code based on the example query where we've demonstrated the planner > is not choosing the best startup path appropriate to the given tuple > fraction. > I think this is a fair point, I agree that your first part is good enough to be committed first. Actually I tried a lot to make a test case which can prove the value of cheapest fractional cost but no gain so far:( -- Best Regards Andy Fan
Re: make add_paths_to_append_rel aware of startup cost
On Mon, 18 Sept 2023 at 22:55, Andy Fan wrote: > Here is an updated version to show what's in my mind. My current thoughts on this are that the fractional cost part adds quite a bit more complexity than the minimalistic approach of just also considering the cheapest startup path. There's also quite a bit I don't like about the extra code you've added: 1. RelOptInfo.tuple_fraction is not given a default value in locations where we do makeNode(RelOptInfo); 2. This is very poorly documented and badly named. Also seems to have a typo "stopper" + /* Like order by, group by, distinct and so. */ + bool has_stoper_op; With that, nobody has a hope of knowing if some new operation should set that value to true or false. I think it needs to define the meaning, which I think (very roughly) is "does the query require any additional upper-planner operations which could require having to read more tuples from the final join relation than the number of tuples which are read from the final upper rel." 3. get_fractional_path_cost() goes to the trouble of setting total_rows then does not use it. 4. I don't see why it's ok to take the total_rows from the first Path in the list in get_cheapest_fractional_path_ext(). What if another Path has some other value? But overall, I'm more inclined to just go with the more simple "add a cheap unordered startup append path if considering cheap startup plans" version. I see your latest patch does both. So, I'd suggest two patches as I do see the merit in keeping this simple and cheap. If we can get the first part in and you still find cases where you're not getting the most appropriate startup plan based on the tuple fraction, then we can reconsider what extra complexity we should endure in the code based on the example query where we've demonstrated the planner is not choosing the best startup path appropriate to the given tuple fraction. David
Re: make add_paths_to_append_rel aware of startup cost
Hi, > What do you think about this in my last reply? "If my above > analysis is correct, I think the best way to handle this is if there > is no more tables to join, we use cheapest fraction cost for all > the kinds of relations, including plain relation, append rel, > subquery and so on, If we have more tables to join, we use > cheapest startup cost.". This is what is in my mind now. > > Here is an updated version to show what's in my mind. -- Best Regards Andy Fan v3-0001-make-add_paths_to_append_rel-aware-of-startup_cos.patch Description: Binary data
Re: make add_paths_to_append_rel aware of startup cost
On Mon, Sep 18, 2023 at 11:58 AM David Rowley wrote: > On Mon, 18 Sept 2023 at 01:42, Andy Fan wrote: > > On Fri, Sep 15, 2023 at 3:15 PM David Rowley > wrote: > >> Instead of doing that, why don't you just create a completely new > >> AppendPath containing all the cheapest_startup_paths and add that to > >> the append rel. You can optimise this and only do it when > >> rel->consider_startup is true. > >> > >> Does the attached do anything less than what your patch does? > > > > > > We can work like this, but there is one difference from what > > my current patch does. It is cheapest_startup_path vs cheapest > > fraction path. For example if we have the following 3 paths with > > all of the estimated rows is 100 and the tuple_fraction is 10. > > Yeah, it's true that the version I wrote didn't consider the > fractional part, but I didn't really see it as worse than what you > did. It looks like you're assuming that every append child will have the same number of tuples read from it, but it seems to me that it > would only be valid to use the fractional part for the first child. The path chosen for subsequent child paths would, if done correctly, > need to account for the estimated rows from the previous child paths. > Actually this is consistent with what generate_union_paths does now. /* * If plain UNION, tell children to fetch all tuples. * * Note: in UNION ALL, we pass the top-level tuple_fraction unmodified to * each arm of the UNION ALL. One could make a case for reducing the * tuple fraction for later arms (discounting by the expected size of the * earlier arms' results) but it seems not worth the trouble. The normal * case where tuple_fraction isn't already zero is a LIMIT at top level, * and passing it down as-is is usually enough to get the desired result * of preferring fast-start plans. */ if (!op->all) root->tuple_fraction = 0.0; UNION ALL is pretty like append rel. It's not valid here to copy the code in generate_orderedappend_paths() > as MergeAppend won't necessarily exhaust the first child subpath first > like Append will. > Not sure which code you are referring to, but the code I refer to much is generate_union_paths and set_subquery_pathlist. > > Path 1: startup_cost = 60, total_cost = 80 -- cheapest total cost. > > Path 2: startup_cost = 10, total_cost = 1000 -- cheapest startup cost > > Path 3: startup_cost = 20, total_cost = 90 -- cheapest fractional cost > > > > So with the patch you propose, Path 1 & Path 3 are chosen to build > > append path. but with my current patch, Only path 3 is kept. IIUC, > > path 3 should be the best one in this case. > > I assume you mean mine would build AppendPaths for 1+2, not 1+3. Yes, it should be 1+2. > > You mentioned: > > > I just find it is hard to build the case for Identifier 2/3/4. > > I wonder if this is because generate_orderedappend_paths() handles > startup paths and most cases will that need a cheap startup plan will > require some sort of pathkeys. > Probably yes. > The example you mentioned of: > > (select * from tenk1 order by hundred) > UNION ALL > (select * from tenk1 order by hundred) > limit 3; > > I don't find this to be a compellingly real-world case. The planner > is under no obligation to output rows from the 1st branch of the UNION > ALL before the 2nd one. If the user cared about that then they'd have > instead added a top-level ORDER BY, in which case the planner seems > happy to use the index scan: > Sorry about the test case, here is the one with more compelling real-world. with s1 as (select * from tenk1 join tenk2 using (hundred)), s2 as (select * from tenk1 join tenk2 using (hundred)) select * from s1 union all select * from s2 limit 3; It would be good if you could provide a bit more detail on the cases > you want to improve here. For example, if your #4 case, you have > "WHERE xxx". I don't know if "xxx" is just a base qual or if there's a > correlation to the outer query in there. > for the #4, the quickest test case is select * from tenk1 where exists ( with s1 as (select * from tenk1 join tenk2 using (hundred)), s2 as (select * from tenk1 join tenk2 using (hundred)) select * from s1 union all select * from s2 where random() > 0.4); random() is used to make it can't be pull-up. and exists implies LIMIT 1; Another concern I have with your patch is that it seems to be under > the impression that there being further joins to evaluate at this > query level is the only reason that we would have to pull more than > the tuple fraction number of rows from the query. What gives you the > confidence that's the only reason we may want to pull more than the > tuple fraction of tuples from the append child? > I think you are talking about something like ORDER BY, GROUP BY clause, I do overlook it. but if we admit cheapest fractional cost is a right factor to consider, this issue is not unresolvable since parse is at hand. At last, you ignore the part of
Re: make add_paths_to_append_rel aware of startup cost
On Mon, 18 Sept 2023 at 01:42, Andy Fan wrote: > On Fri, Sep 15, 2023 at 3:15 PM David Rowley wrote: >> Instead of doing that, why don't you just create a completely new >> AppendPath containing all the cheapest_startup_paths and add that to >> the append rel. You can optimise this and only do it when >> rel->consider_startup is true. >> >> Does the attached do anything less than what your patch does? > > > We can work like this, but there is one difference from what > my current patch does. It is cheapest_startup_path vs cheapest > fraction path. For example if we have the following 3 paths with > all of the estimated rows is 100 and the tuple_fraction is 10. Yeah, it's true that the version I wrote didn't consider the fractional part, but I didn't really see it as worse than what you did. It looks like you're assuming that every append child will have the same number of tuples read from it, but it seems to me that it would only be valid to use the fractional part for the first child. The path chosen for subsequent child paths would, if done correctly, need to account for the estimated rows from the previous child paths. It's not valid here to copy the code in generate_orderedappend_paths() as MergeAppend won't necessarily exhaust the first child subpath first like Append will. > Path 1: startup_cost = 60, total_cost = 80 -- cheapest total cost. > Path 2: startup_cost = 10, total_cost = 1000 -- cheapest startup cost > Path 3: startup_cost = 20, total_cost = 90 -- cheapest fractional cost > > So with the patch you propose, Path 1 & Path 3 are chosen to build > append path. but with my current patch, Only path 3 is kept. IIUC, > path 3 should be the best one in this case. I assume you mean mine would build AppendPaths for 1+2, not 1+3. You mentioned: > I just find it is hard to build the case for Identifier 2/3/4. I wonder if this is because generate_orderedappend_paths() handles startup paths and most cases will that need a cheap startup plan will require some sort of pathkeys. The example you mentioned of: (select * from tenk1 order by hundred) UNION ALL (select * from tenk1 order by hundred) limit 3; I don't find this to be a compellingly real-world case. The planner is under no obligation to output rows from the 1st branch of the UNION ALL before the 2nd one. If the user cared about that then they'd have instead added a top-level ORDER BY, in which case the planner seems happy to use the index scan: regression=# explain (costs off) (select * from tenk1) UNION ALL (select * from tenk1) order by hundred limit 3; QUERY PLAN - Limit -> Merge Append Sort Key: tenk1.hundred -> Index Scan using tenk1_hundred on tenk1 -> Index Scan using tenk1_hundred on tenk1 tenk1_1 It would be good if you could provide a bit more detail on the cases you want to improve here. For example, if your #4 case, you have "WHERE xxx". I don't know if "xxx" is just a base qual or if there's a correlation to the outer query in there. Another concern I have with your patch is that it seems to be under the impression that there being further joins to evaluate at this query level is the only reason that we would have to pull more than the tuple fraction number of rows from the query. What gives you the confidence that's the only reason we may want to pull more than the tuple fraction of tuples from the append child? David
Re: make add_paths_to_append_rel aware of startup cost
Hi David, Thanks for taking a look at this! On Fri, Sep 15, 2023 at 3:15 PM David Rowley wrote: > On Thu, 7 Sept 2023 at 04:37, Andy Fan wrote: > > Currently add_paths_to_append_rel overlooked the startup cost for > creating > > append path, so it may have lost some optimization chances. After a > glance, > > the following 4 identifiers can be impacted. > > > - We shouldn't do the optimization if there are still more tables to > join, > > the reason is similar to has_multiple_baserels(root) in > > set_subquery_pathlist. But the trouble here is we may inherit multiple > > levels to build an appendrel, so I have to keep the 'top_relids' all > the > > time and compare it with PlannerInfo.all_baserels. If they are the > same, > > then it is the case we want to optimize. > > I think you've likely gone to the trouble of trying to determine if > there are any joins pending because you're considering using a cheap > startup path *instead* of the cheapest total path and you don't want > to do that when some join will cause all the rows to be read thus > making the plan more expensive if a cheap startup path was picked. > Yes, that's true. However it is not something we can't resolve, one of the solutions is just like what I did in the patch. but currently the main stuff which confuses me is if it is the right thing to give up the optimization if it has more tables to join (just like set_subquery_pathlist did). > Instead of doing that, why don't you just create a completely new > AppendPath containing all the cheapest_startup_paths and add that to > the append rel. You can optimise this and only do it when > rel->consider_startup is true. > > Does the attached do anything less than what your patch does? > We can work like this, but there is one difference from what my current patch does. It is cheapest_startup_path vs cheapest fraction path. For example if we have the following 3 paths with all of the estimated rows is 100 and the tuple_fraction is 10. Path 1: startup_cost = 60, total_cost = 80 -- cheapest total cost. Path 2: startup_cost = 10, total_cost = 1000 -- cheapest startup cost Path 3: startup_cost = 20, total_cost = 90 -- cheapest fractional cost So with the patch you propose, Path 1 & Path 3 are chosen to build append path. but with my current patch, Only path 3 is kept. IIUC, path 3 should be the best one in this case. We might also argue why Path 3 is kept in the first place (the children level), I think pathkey might be one option. and even path 3 is discarded somehow, I think only if it is the best one, we should keep it ideally. Another tiny factor of this is your propose isn't consistent with what set_subquery_pathlist which uses cheapest fractional cost and my proposal isn't consistent plain rel which uses cheapest startup cost. We can't say which one is better, though. If my above analysis is correct, I think the best way to handle this is if there is no more tables to join, we use cheapest fraction cost for all the kinds of relations, including plain relation, append rel, subquery and so on. If we have more tables to join, we use cheapest startup cost. On the implementation side, I want to use RelOptInfo.tuple_fraction instead of RelOptInfo.consider_startup. tuple_fraction = -1 means startup cost should not be considered. tuple_fraction = 0 means cheapest startup cost should be used. tuple_franction > 0 means cheapest fraction cost should be used. I still don't pay enough attention to consider_param_startup in RelOptInfo, I'm feeling the above strategy will not generate too much overhead to the planner for now while it can provides a better plan sometimes. -- Best Regards Andy Fan
Re: make add_paths_to_append_rel aware of startup cost
On Thu, 7 Sept 2023 at 04:37, Andy Fan wrote: > Currently add_paths_to_append_rel overlooked the startup cost for creating > append path, so it may have lost some optimization chances. After a glance, > the following 4 identifiers can be impacted. > - We shouldn't do the optimization if there are still more tables to join, > the reason is similar to has_multiple_baserels(root) in > set_subquery_pathlist. But the trouble here is we may inherit multiple > levels to build an appendrel, so I have to keep the 'top_relids' all the > time and compare it with PlannerInfo.all_baserels. If they are the same, > then it is the case we want to optimize. I think you've likely gone to the trouble of trying to determine if there are any joins pending because you're considering using a cheap startup path *instead* of the cheapest total path and you don't want to do that when some join will cause all the rows to be read thus making the plan more expensive if a cheap startup path was picked. Instead of doing that, why don't you just create a completely new AppendPath containing all the cheapest_startup_paths and add that to the append rel. You can optimise this and only do it when rel->consider_startup is true. Does the attached do anything less than what your patch does? David consider_cheapest_startup_appendpath.patch Description: Binary data
Re: make add_paths_to_append_rel aware of startup cost
> - We shouldn't do the optimization if there are still more tables to join, > the reason is similar to has_multiple_baserels(root) in > set_subquery_pathlist. > After some internal discussion, we have 2 different choices here. Let's call the current choice as method-a, and the new choice as method-b. Method-b will just ignore the "no more tables to join "limitation and build the append path with both cheapest startup cost and cheapest total cost, this is pretty like the method we joins a plain relation with another relation. The uneasy part is it is the cheapest start up cost rather than the cheapest fractional cost. method-a is pretty same as what set_subquery_pathlist is doing, which has a limitation on "no more tables to join" and has no the "cheapest startup cost" part. Ideally we can apply both strategies if we don't consider the effort. If there are no more tables to join, we use method-a. otherwise use method-b. With this thinking, we can even apply the same strategy to plain relations as well. However, I am not sure if the "cheapest startup cost" is a real problem. If it is not, we can apply method-b directly and modify set_subquery_pathlist to do the same for consistency. -- Best Regards Andy Fan
make add_paths_to_append_rel aware of startup cost
Hi: This thread is a refactor of thread [1] for easier communication. Currently add_paths_to_append_rel overlooked the startup cost for creating append path, so it may have lost some optimization chances. After a glance, the following 4 identifiers can be impacted. Identifier 1: SELECT .. FROM v1 UNION ALL SELECT .. FROM v2 LIMIT 3; Identifier 2: SELECT * FROM p .. LIMIT 3; p is a partitioned table. Identifier 3: SELECT * FROM p JOIN q using (partkey) LIMIT 3; If we did the partition-wise-join, then we lost the chances for a better plan. Identifier 4: -- EXISTS implies LIMIT 1; SELECT * FROM foo WHERE EXISTS (SELECT 1 FROM append_rel_v_not_pullup_able WHERE xxx); However, after I completed my patch and wanted to build some real queries to prove my idea, I just find it is hard to build the case for Identifier 2/3/4. But the Improvement for Identifier 1 is easy and my real user case in work is Identifier 1 as well. So a patch is attached for this case, it will use fractional costs rather than total costs if needed. The following items needs more attention during development. - We shouldn't do the optimization if there are still more tables to join, the reason is similar to has_multiple_baserels(root) in set_subquery_pathlist. But the trouble here is we may inherit multiple levels to build an appendrel, so I have to keep the 'top_relids' all the time and compare it with PlannerInfo.all_baserels. If they are the same, then it is the case we want to optimize. - add_partial_path doesn't consider the startup costs by design, I didn't rethink too much about this, but the idea of "choose a path which let each worker produces the top-N tuples as fast as possible" looks reasonable, and even though add_partial_path doesn't consider startup cost, it is still possible that childrel keeps more than 1 partial paths due to any reasons except startup_cost, for example PathKey. then we still have chances to choose the cheapest fractional path among them. The same strategy also applies to mixed partial and non-partial append paths. - Due to the complexity of add_paths_to_append_rel, 3 arguments have to be added to get_cheapest_fractional_path... Path * get_cheapest_fractional_path(RelOptInfo *rel, double tuple_fraction, bool allow_parameterized, bool look_partial, bool must_parallel_safe) Cases can be improved. Here is the simplest test case, but it will not be hard to provide more cases for Identifier 1. (select * from tenk1 order by hundred) UNION ALL (select * from tenk1 order by hundred) limit 3; master: 8.096ms. patched: 0.204ms. The below user case should be more reasonable for real business. with a as (select * from t1 join t2..), b as (select * from t1 join t3 ..) select * from a union all select * from b limit 3; The patch would also have impacts on identifier 2/3/4, even though I can't make a demo sql which can get benefits from this patch, I also added some test cases for code coverage purposes. Any feedback is welcome! [1] https://www.postgresql.org/message-id/flat/CAKU4AWqEnzhUTxopVhENC3vs6NnYV32+e6GSBtp1rAv0ZNX=m...@mail.gmail.com -- Best Regards Andy Fan v1-0001-make-add_paths_to_append_rel-aware-of-startup-cos.patch Description: Binary data