Re: ModifyTable overheads in generic plans
On Wed, Apr 7, 2021 at 5:18 PM Amit Langote wrote: > On Wed, Apr 7, 2021 at 8:24 AM Tom Lane wrote: > > I also could not get excited about postponing initialization of RETURNING > > or WITH CHECK OPTIONS expressions. I grant that that can be helpful > > when those features are used, but I doubt that RETURNING is used that > > heavily, and WITH CHECK OPTIONS is surely next door to nonexistent > > in performance-critical queries. If the feature isn't used, the cost > > of the existing code is about zero. So I couldn't see that it was worth > > the amount of code thrashing and risk of new bugs involved. > > Okay. > > > The bit you > > noted about EXPLAIN missing a subplan is pretty scary in this connection; > > I'm not at all sure that that's just cosmetic. > > Yeah, this and... I looked into this and can't see why this isn't just cosmetic as far as ModifyTable is concerned. "EXPLAIN missing a subplan" here just means that ModifyTableState.PlanState.subPlan is not set. Besides ExplainNode(), only ExecReScan() looks at PlanState.subPlan, and that does not seem relevant to ModifyTable, because it doesn't support rescanning. I don't see any such problems with creating RETURNING projections on-demand either. > > (Having said that, I'm wondering if there are bugs in these cases for > > cross-partition updates that target a previously-not-used partition. > > So we might have things to fix anyway.) > > ...this would need to be looked at a bit more closely, which I'll try > to do sometime later this week. Given the above, I can't think of any undiscovered problems related to WCO and RETURNING expression states in the cases where cross-partition updates target partitions that need to be initialized by ExecInitPartitionInfo(). Here is the result for the test case in updatable_views.sql modified to use partitioning and cross-partition updates: CREATE TABLE base_tbl (a int) partition by range (a); CREATE TABLE base_tbl1 PARTITION OF base_tbl FOR VALUES FROM (1) TO (6); CREATE TABLE base_tbl2 PARTITION OF base_tbl FOR VALUES FROM (6) TO (11); CREATE TABLE base_tbl3 PARTITION OF base_tbl FOR VALUES FROM (11) TO (15); CREATE TABLE ref_tbl (a int PRIMARY KEY); INSERT INTO ref_tbl SELECT * FROM generate_series(1,10); CREATE VIEW rw_view1 AS SELECT * FROM base_tbl b WHERE EXISTS(SELECT 1 FROM ref_tbl r WHERE r.a = b.a) WITH CHECK OPTION; INSERT INTO rw_view1 VALUES (1); INSERT 0 1 INSERT INTO rw_view1 VALUES (11); ERROR: new row violates check option for view "rw_view1" DETAIL: Failing row contains (11). -- Both are cross-partition updates where the target relation is -- lazily initialized in ExecInitPartitionInfo(), along with the WCO -- qual ExprState UPDATE rw_view1 SET a = a + 5 WHERE a = 1; UPDATE 1 UPDATE rw_view1 SET a = a + 5 WHERE a = 6; ERROR: new row violates check option for view "rw_view1" DETAIL: Failing row contains (11). EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5); QUERY PLAN -- Insert on base_tbl b -> Result (2 rows) EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5 WHERE a = 1; QUERY PLAN Update on base_tbl b Update on base_tbl1 b_1 -> Nested Loop -> Index Scan using ref_tbl_pkey on ref_tbl r Index Cond: (a = 1) -> Seq Scan on base_tbl1 b_1 Filter: (a = 1) (7 rows) EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5 WHERE a = 6; QUERY PLAN Update on base_tbl b Update on base_tbl2 b_1 -> Nested Loop -> Index Scan using ref_tbl_pkey on ref_tbl r Index Cond: (a = 6) -> Seq Scan on base_tbl2 b_1 Filter: (a = 6) (7 rows) Patch attached. I tested the performance benefit of doing this by modifying the update query used in earlier benchmarks to have a RETURNING * clause, getting the following TPS numbers: -Mprepared (plan_cache_mode=force_generic_plan) nparts 10cols 20cols 40cols HEAD 64 10909 90677171 128 690356244161 256 374830562219 1024953 738 427 Patched 64 13817 13395 12754 128 927191028279 256 534552075083 1024146314431389 Also, I don't see much impact of checking if (node->returningLists) in the per-result-rel initialization loop in the common cases where there's no RETURNING. -- Amit Langote EDB: http://www.enterprisedb.com 0001-Initialize-WITH-CHECK-OPTIONS-and-RETURNING-expressi.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Thu, Apr 8, 2021 at 1:54 PM David Rowley wrote: > On Thu, 8 Apr 2021 at 15:32, Amit Langote wrote: > > There's 10-20% improvement in this case too for various partition > > counts, which really has more to do with 86dc90056 than the work done > > here. > > I'm not sure of the exact query you're running, The query is basically this: \set a random(1, 100) update test_table set b = :a where a = :a; > but I imagine the > reason that it wasn't that slow with custom plans was down to > 428b260f87. Right, 428b260f87 is certainly why we are seeing numbers this big at all. However, I was saying that 86dc90056 is what makes v14 HEAD run about 10-20% faster than *v13.2* in this benchmark. Note that inheritance_planner() in v13, which, although not as bad as it used to be in v11, is still more expensive than a single grouping_planner() call for a given query that we now get thanks to 86dc90056. > So the remaining slowness for the generic plan case with large numbers > of partitions in the plan vs custom plans plan-time pruning is a) > locking run-time pruned partitions; and; b) permission checks during > executor startup? Actually, we didn't move ahead with making the ResulRelInfos themselves lazily as I had proposed in the original patch, so ExecInitModifyTable() still builds ResultRelInfos for all partitions. Although we did move initializations of some fields of it out of ExecInitModifyTable() --- commits a1115fa0, c5b7ba4e, saving a decent amount of time spent there. We need to study closely whether initializing foreign partition's updates (direct or otherwise) lazily doesn't produce wrong semantics before we can do that and we need the ResultRelInfos to pass to those APIs. -- Amit Langote EDB: http://www.enterprisedb.com
Re: ModifyTable overheads in generic plans
On Thu, 8 Apr 2021 at 15:32, Amit Langote wrote: > There's 10-20% improvement in this case too for various partition > counts, which really has more to do with 86dc90056 than the work done > here. I'm not sure of the exact query you're running, but I imagine the reason that it wasn't that slow with custom plans was down to 428b260f87. So the remaining slowness for the generic plan case with large numbers of partitions in the plan vs custom plans plan-time pruning is a) locking run-time pruned partitions; and; b) permission checks during executor startup? Aside from the WCO and RETURNING stuff you mentioned, I mean. David
Re: ModifyTable overheads in generic plans
On Thu, Apr 8, 2021 at 3:02 AM Tom Lane wrote: > Robert Haas writes: > > On Wed, Apr 7, 2021 at 12:34 PM Tom Lane wrote: > >> Indeed, that's a pretty impressive comparison. > > > +1. That looks like a big improvement. > > > In a vacuum, you'd hope that partitioning a table would make things > > faster rather than slower, when only one partition is implicated. Or > > at least that the speed would stay about the same. And, while this is > > a lot better, we're clearly not there yet. So I hope that, in future > > releases, we can continue to find ways to whittle down the overhead. > > Note that this test case includes plan_cache_mode = force_generic_plan, > so it's deliberately kneecapping our ability to tell that "only one > partition is implicated". For the record, here are the numbers for plan_cache_mode = auto. (Actually, plancache.c always goes with custom planning for partitioned tables.) v13.2 nparts 10cols 20cols 40cols 64 13391 12140 10958 128 13436 12297 10643 256 12564 12294 10355 102411450 10600 9020 v14dev HEAD 64 14925 14648 13361 128 14379 14333 13138 256 14478 14246 13316 102412744 12621 11579 There's 10-20% improvement in this case too for various partition counts, which really has more to do with 86dc90056 than the work done here. -- Amit Langote EDB: http://www.enterprisedb.com
Re: ModifyTable overheads in generic plans
On Thu, Apr 8, 2021 at 1:34 AM Tom Lane wrote: > Amit Langote writes: > > Also, I think we should update the commentary around ri_projectNew a > > bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple > > should be touching it and the associated slots. > > Hm. I pushed your comment fixes in nodeModifyTable.c, but not this > change, because it seemed to be more verbose and not really an > improvement. Why are these fields any more hands-off than any others? > Besides which, there certainly is other code touching ri_oldTupleSlot. Oops, that's right. > Anyway, I've marked the CF entry closed, because I think this is about > as far as we can get for v14. I'm not averse to revisiting the > RETURNING and WITH CHECK OPTIONS issues later, but it looks to me like > that needs more study. Sure, I will look into that. -- Amit Langote EDB: http://www.enterprisedb.com
Re: ModifyTable overheads in generic plans
Robert Haas writes: > On Wed, Apr 7, 2021 at 12:34 PM Tom Lane wrote: >> Indeed, that's a pretty impressive comparison. > +1. That looks like a big improvement. > In a vacuum, you'd hope that partitioning a table would make things > faster rather than slower, when only one partition is implicated. Or > at least that the speed would stay about the same. And, while this is > a lot better, we're clearly not there yet. So I hope that, in future > releases, we can continue to find ways to whittle down the overhead. Note that this test case includes plan_cache_mode = force_generic_plan, so it's deliberately kneecapping our ability to tell that "only one partition is implicated". I think things would often be better in production cases. No argument that there's not work left to do, though. regards, tom lane
Re: ModifyTable overheads in generic plans
On Wed, Apr 7, 2021 at 12:34 PM Tom Lane wrote: > > v13.2 > > 64 323127472217 > > 128 152812691121 > > 256 709 652 491 > > 102496 78 67 > > > v14dev HEAD > > 64 14835 14360 14563 > > 128 946996019490 > > 256 552353835268 > > 1024148214151366 > > > Clearly, we've made some very good progress here. Thanks. > > Indeed, that's a pretty impressive comparison. +1. That looks like a big improvement. In a vacuum, you'd hope that partitioning a table would make things faster rather than slower, when only one partition is implicated. Or at least that the speed would stay about the same. And, while this is a lot better, we're clearly not there yet. So I hope that, in future releases, we can continue to find ways to whittle down the overhead. -- Robert Haas EDB: http://www.enterprisedb.com
Re: ModifyTable overheads in generic plans
Amit Langote writes: > Also, I think we should update the commentary around ri_projectNew a > bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple > should be touching it and the associated slots. Hm. I pushed your comment fixes in nodeModifyTable.c, but not this change, because it seemed to be more verbose and not really an improvement. Why are these fields any more hands-off than any others? Besides which, there certainly is other code touching ri_oldTupleSlot. Anyway, I've marked the CF entry closed, because I think this is about as far as we can get for v14. I'm not averse to revisiting the RETURNING and WITH CHECK OPTIONS issues later, but it looks to me like that needs more study. > I reran my usual benchmark and got the following numbers, this time > comparing v13.2 against the latest HEAD: > nparts 10cols 20cols 40cols > v13.2 > 64 323127472217 > 128 152812691121 > 256 709 652 491 > 102496 78 67 > v14dev HEAD > 64 14835 14360 14563 > 128 946996019490 > 256 552353835268 > 1024148214151366 > Clearly, we've made some very good progress here. Thanks. Indeed, that's a pretty impressive comparison. regards, tom lane
Re: ModifyTable overheads in generic plans
On Wed, Apr 7, 2021 at 8:24 AM Tom Lane wrote: > Amit Langote writes: > > On Mon, Apr 5, 2021 at 1:43 AM Tom Lane wrote: > >> OK. Do you want to pull out the bits of the patch that we can still > >> do without postponing BeginDirectModify? > > > I ended up with the attached, whereby ExecInitResultRelation() is now > > performed for all relations before calling ExecInitNode() on the > > subplan. As mentioned, I moved other per-result-rel initializations > > into the same loop that does ExecInitResultRelation(), while moving > > code related to some initializations into initialize-on-first-access > > accessor functions for the concerned fields. I chose to do that for > > ri_WIthCheckOptionExprs, ri_projectReturning, and ri_projectNew. > > I pushed the parts of this that I thought were safe and productive. Thank you. +/* + * ExecInitInsertProjection + * Do one-time initialization of projection data for INSERT tuples. + * + * INSERT queries may need a projection to filter out junk attrs in the tlist. + * + * This is "one-time" for any given result rel, but we might touch + * more than one result rel in the course of a partitioned INSERT. I don't think we need this last bit for INSERT, because the result rels for leaf partitions will never have to go through ExecInitInsertProjection(). Leaf partitions are never directly fed tuples that ExecModifyTable() extracts out of the subplan, because those tuples will have gone through the root target table's projection before being passed to tuple routing. So, if INSERTs will ever need a projection, only the partitioned table being inserted into will need to have one built for. Also, I think we should update the commentary around ri_projectNew a bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple should be touching it and the associated slots. + * This is "one-time" for any given result rel, but we might touch more than + * one result rel in the course of a partitioned UPDATE, and each one needs + * its own projection due to possible column order variation. Minor quibble, but should we write it as "...in the course of an inherited UPDATE"? Attached patch contains these changes. > The business about moving the subplan tree initialization to after > calling FDWs' BeginForeignModify functions seems to me to be a > nonstarter. Existing FDWs are going to expect their scan initializations > to have been done first. I'm surprised that postgres_fdw seemed to > need only a one-line fix; it could have been far worse. The amount of > trouble that could cause is absolutely not worth it to remove one loop > over the result relations. Okay, that sounds fair. After all, we write this about 'mtstate' in the description of BeginForeignModify(), which I had failed to notice: "mtstate is the overall state of the ModifyTable plan node being executed; global data about the plan and execution state is available via this structure." > I also could not get excited about postponing initialization of RETURNING > or WITH CHECK OPTIONS expressions. I grant that that can be helpful > when those features are used, but I doubt that RETURNING is used that > heavily, and WITH CHECK OPTIONS is surely next door to nonexistent > in performance-critical queries. If the feature isn't used, the cost > of the existing code is about zero. So I couldn't see that it was worth > the amount of code thrashing and risk of new bugs involved. Okay. > The bit you > noted about EXPLAIN missing a subplan is pretty scary in this connection; > I'm not at all sure that that's just cosmetic. Yeah, this and... > (Having said that, I'm wondering if there are bugs in these cases for > cross-partition updates that target a previously-not-used partition. > So we might have things to fix anyway.) ...this would need to be looked at a bit more closely, which I'll try to do sometime later this week. > Anyway, looking at the test case you posted at the very top of this > thread, I was getting this with HEAD on Friday: > > nparts TPS > 0 12152 > 10 8672 > 100 2753 > 1000314 > > and after the two patches I just pushed, it looks like: > > 0 12105 > 10 9928 > 100 5433 > 1000938 > > So while there's certainly work left to do, that's not bad for > some low-hanging-fruit grabbing. Yes, certainly. I reran my usual benchmark and got the following numbers, this time comparing v13.2 against the latest HEAD: nparts 10cols 20cols 40cols v13.2 64 323127472217 128 152812691121 256 709 652 491 102496 78 67 v14dev HEAD 64 14835 14360 14563 128 946996019490 256 552353835268 1024148214151366 Clearly, we've made some very good progress here. Thanks. -- Amit Langote EDB: http://www.enterprisedb.com projectNew-comment-fixes.patch Description: Binary data
Re: ModifyTable overheads in generic plans
Hi, On 2021-04-06 19:24:11 -0400, Tom Lane wrote: > I also could not get excited about postponing initialization of RETURNING > or WITH CHECK OPTIONS expressions. I grant that that can be helpful > when those features are used, but I doubt that RETURNING is used that > heavily, and WITH CHECK OPTIONS is surely next door to nonexistent > in performance-critical queries. FWIW, there's a number of ORMs etc that use it on every insert (there's not really a better way to get the serial when you also want to do pipelining). > npartsTPS > 0 12152 > 108672 > 100 2753 > 1000 314 > > and after the two patches I just pushed, it looks like: > > 0 12105 > 109928 > 100 5433 > 1000 938 > > So while there's certainly work left to do, that's not bad for > some low-hanging-fruit grabbing. Nice. 3x at the upper end is pretty good. Greetings, Andres Freund
Re: ModifyTable overheads in generic plans
Amit Langote writes: > On Mon, Apr 5, 2021 at 1:43 AM Tom Lane wrote: >> OK. Do you want to pull out the bits of the patch that we can still >> do without postponing BeginDirectModify? > I ended up with the attached, whereby ExecInitResultRelation() is now > performed for all relations before calling ExecInitNode() on the > subplan. As mentioned, I moved other per-result-rel initializations > into the same loop that does ExecInitResultRelation(), while moving > code related to some initializations into initialize-on-first-access > accessor functions for the concerned fields. I chose to do that for > ri_WIthCheckOptionExprs, ri_projectReturning, and ri_projectNew. I pushed the parts of this that I thought were safe and productive. The business about moving the subplan tree initialization to after calling FDWs' BeginForeignModify functions seems to me to be a nonstarter. Existing FDWs are going to expect their scan initializations to have been done first. I'm surprised that postgres_fdw seemed to need only a one-line fix; it could have been far worse. The amount of trouble that could cause is absolutely not worth it to remove one loop over the result relations. I also could not get excited about postponing initialization of RETURNING or WITH CHECK OPTIONS expressions. I grant that that can be helpful when those features are used, but I doubt that RETURNING is used that heavily, and WITH CHECK OPTIONS is surely next door to nonexistent in performance-critical queries. If the feature isn't used, the cost of the existing code is about zero. So I couldn't see that it was worth the amount of code thrashing and risk of new bugs involved. The bit you noted about EXPLAIN missing a subplan is pretty scary in this connection; I'm not at all sure that that's just cosmetic. (Having said that, I'm wondering if there are bugs in these cases for cross-partition updates that target a previously-not-used partition. So we might have things to fix anyway.) Anyway, looking at the test case you posted at the very top of this thread, I was getting this with HEAD on Friday: nparts TPS 0 12152 10 8672 100 2753 1000314 and after the two patches I just pushed, it looks like: 0 12105 10 9928 100 5433 1000938 So while there's certainly work left to do, that's not bad for some low-hanging-fruit grabbing. regards, tom lane
Re: ModifyTable overheads in generic plans
On Mon, Apr 5, 2021 at 1:43 AM Tom Lane wrote: > Amit Langote writes: > > On Sun, Apr 4, 2021 at 10:20 AM Tom Lane wrote: > >> In some desultory performance testing here, it seemed like a > >> significant part of the cost is ExecOpenIndices, and I don't see > >> a reason offhand why we could not delay/skip that. I also concur > >> with delaying construction of ri_ChildToRootMap and the > >> partition_tuple_routing data structures, since many queries will > >> never need those at all. > > > As I mentioned in [1], creating ri_projectNew can be expensive too, > > especially as column count (and partition count for the generic plan > > case) grows. I think we should have an static inline > > initialize-on-first-access accessor function for that field too. > > > Actually, I remember considering having such accessor functions (all > > static inline) for ri_WithCheckOptionExprs, ri_projectReturning, > > ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT > > UPDATE) as well, prompted by Heikki's comments earlier in the > > discussion. I also remember, before even writing this patch, not > > liking that WCO and RETURNING expressions are initialized in their own > > separate loops, rather than being part of the earlier loop that says: > > Sure, we might as well try to improve the cosmetics here. > > >> Anyway, I think the way to proceed for now is to grab the low-hanging > >> fruit of things that clearly won't change any semantics. But tail end > >> of the dev cycle is no time to be making really fundamental changes > >> in how FDW direct modify works. > > > I agree. > > OK. Do you want to pull out the bits of the patch that we can still > do without postponing BeginDirectModify? I ended up with the attached, whereby ExecInitResultRelation() is now performed for all relations before calling ExecInitNode() on the subplan. As mentioned, I moved other per-result-rel initializations into the same loop that does ExecInitResultRelation(), while moving code related to some initializations into initialize-on-first-access accessor functions for the concerned fields. I chose to do that for ri_WIthCheckOptionExprs, ri_projectReturning, and ri_projectNew. ExecInitNode() is called on the subplan (to set outerPlanState(mtstate) that is) after all of the per-result-rel initializations are done. One of the initializations is calling BeginForeignModify() for non-direct modifications, an API to which we currently pass mtstate. Moving that to before setting outerPlanState(mtstate) so as to be in the same loop as other initializations had me worried just a little bit given a modification I had to perform in postgresBeginForeignModify(): @@ -1879,7 +1879,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate, rte, resultRelInfo, mtstate->operation, - outerPlanState(mtstate)->plan, + outerPlan(mtstate->ps.plan), query, target_attrs, values_end_len, Though I think that this is harmless, because I'd think that the implementers of this API shouldn't really rely too strongly on assuming that outerPlanState(mtstate) is valid when it is called, if postgres_fdw's implementation is any indication. Another slightly ugly bit is the dependence of direct modify API on ri_projectReturning being set even if it doesn't care for anything else in the ResultRelInfo. So in ExecInitModifyTable() ri_projectReturning initialization is not skipped for directly-modified foreign result relations. Notes on regression test changes: * Initializing WCO quals during execution instead of during ExecInitNode() of ModifyTable() causes a couple of regression test changes in updatable_view.out that were a bit unexpected for me -- Subplans that are referenced in WCO quals are no longer shown in the plain EXPLAIN output. Even though that's a user-visible change, maybe we can live with that? * ri_RootResultRelInfo in *all* child relations instead of just in tuple-routing result relations has caused changes to inherit.out and privileges.out. I think that's basically down to ExecConstraints() et al doing one thing for child relations in which ri_RootResultRelInfo is set and another for those in which it is not. Now it's set in *all* child relations, so it always does the former thing. I remember having checked that those changes are only cosmetic when I first encountered them. * Moving PartitionTupleRouting initialization to be done lazily for cross-partition update cases causes changes to update.out. They have to do with the fact that the violations of the actual target table's partition constraint are now shown as such, instead of reporting them as occurring on one of the leaf partitions. Again, only cosmetic. > Another thing we could consider,
Re: ModifyTable overheads in generic plans
Amit Langote writes: > On Sun, Apr 4, 2021 at 10:20 AM Tom Lane wrote: >> In some desultory performance testing here, it seemed like a >> significant part of the cost is ExecOpenIndices, and I don't see >> a reason offhand why we could not delay/skip that. I also concur >> with delaying construction of ri_ChildToRootMap and the >> partition_tuple_routing data structures, since many queries will >> never need those at all. > As I mentioned in [1], creating ri_projectNew can be expensive too, > especially as column count (and partition count for the generic plan > case) grows. I think we should have an static inline > initialize-on-first-access accessor function for that field too. > Actually, I remember considering having such accessor functions (all > static inline) for ri_WithCheckOptionExprs, ri_projectReturning, > ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT > UPDATE) as well, prompted by Heikki's comments earlier in the > discussion. I also remember, before even writing this patch, not > liking that WCO and RETURNING expressions are initialized in their own > separate loops, rather than being part of the earlier loop that says: Sure, we might as well try to improve the cosmetics here. >> Anyway, I think the way to proceed for now is to grab the low-hanging >> fruit of things that clearly won't change any semantics. But tail end >> of the dev cycle is no time to be making really fundamental changes >> in how FDW direct modify works. > I agree. OK. Do you want to pull out the bits of the patch that we can still do without postponing BeginDirectModify? Another thing we could consider, perhaps, is keeping the behavior the same for foreign tables but postponing init of local ones. To avoid opening the relations to figure out which kind they are, we'd have to rely on the RTE copies of relkind, which is a bit worrisome --- I'm not certain that those are guaranteed to be up-to-date --- but it's probably okay since there is no way to convert a regular table to foreign or vice versa. Anyway, that idea seems fairly messy so I'm inclined to just pursue the lower-hanging fruit for now. regards, tom lane
Re: ModifyTable overheads in generic plans
On Sun, Apr 4, 2021 at 10:20 AM Tom Lane wrote: > Amit Langote writes: > > On Thu, Apr 1, 2021 at 3:12 AM Tom Lane wrote: > >> Amit Langote writes: > > [ v14-0002-Initialize-result-relation-information-lazily.patch ] > >> Needs YA rebase over 86dc90056. > > > Done. > > I spent some time looking this over. Thanks. > There are bits of it we can > adopt without too much trouble, but I'm afraid that 0001 (delay > FDW BeginDirectModify until the first actual update) is a nonstarter, > which makes the main idea of delaying ExecInitResultRelation unworkable. > > My fear about 0001 is that it will destroy any hope of direct updates > on different remote partitions executing with consistent semantics > (i.e. compatible snapshots), because some row updates triggered by the > local query may have already happened before a given partition gets to > start its remote query. Maybe we can work around that, but I do not > want to commit a major restructuring that assumes we can dodge this > problem when we don't yet even have a fix for cross-partition updates > that does rely on the assumption of synchronous startup. Hmm, okay, I can understand the concern. > In some desultory performance testing here, it seemed like a > significant part of the cost is ExecOpenIndices, and I don't see > a reason offhand why we could not delay/skip that. I also concur > with delaying construction of ri_ChildToRootMap and the > partition_tuple_routing data structures, since many queries will > never need those at all. As I mentioned in [1], creating ri_projectNew can be expensive too, especially as column count (and partition count for the generic plan case) grows. I think we should have an static inline initialize-on-first-access accessor function for that field too. Actually, I remember considering having such accessor functions (all static inline) for ri_WithCheckOptionExprs, ri_projectReturning, ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT UPDATE) as well, prompted by Heikki's comments earlier in the discussion. I also remember, before even writing this patch, not liking that WCO and RETURNING expressions are initialized in their own separate loops, rather than being part of the earlier loop that says: /* * Do additional per-result-relation initialization. */ for (i = 0; i < nrels; i++) { I guess ri_RowIdAttNo initialization can go into the same loop. > > * PartitionTupleRouting.subplan_resultrel_htab is removed in favor > > of using ModifyTableState.mt_resultOidHash to look up an UPDATE > > result relation by OID. > > Hmm, that sounds promising too, though I didn't look at the details. > > Anyway, I think the way to proceed for now is to grab the low-hanging > fruit of things that clearly won't change any semantics. But tail end > of the dev cycle is no time to be making really fundamental changes > in how FDW direct modify works. I agree. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/CA+HiwqHLUNhMxy46Mrb04VJpN=hudm9bd7xdz6f5h2o4imx...@mail.gmail.com
Re: ModifyTable overheads in generic plans
Amit Langote writes: > On Thu, Apr 1, 2021 at 3:12 AM Tom Lane wrote: >> Amit Langote writes: > [ v14-0002-Initialize-result-relation-information-lazily.patch ] >> Needs YA rebase over 86dc90056. > Done. I spent some time looking this over. There are bits of it we can adopt without too much trouble, but I'm afraid that 0001 (delay FDW BeginDirectModify until the first actual update) is a nonstarter, which makes the main idea of delaying ExecInitResultRelation unworkable. My fear about 0001 is that it will destroy any hope of direct updates on different remote partitions executing with consistent semantics (i.e. compatible snapshots), because some row updates triggered by the local query may have already happened before a given partition gets to start its remote query. Maybe we can work around that, but I do not want to commit a major restructuring that assumes we can dodge this problem when we don't yet even have a fix for cross-partition updates that does rely on the assumption of synchronous startup. In some desultory performance testing here, it seemed like a significant part of the cost is ExecOpenIndices, and I don't see a reason offhand why we could not delay/skip that. I also concur with delaying construction of ri_ChildToRootMap and the partition_tuple_routing data structures, since many queries will never need those at all. > * PartitionTupleRouting.subplan_resultrel_htab is removed in favor > of using ModifyTableState.mt_resultOidHash to look up an UPDATE > result relation by OID. Hmm, that sounds promising too, though I didn't look at the details. Anyway, I think the way to proceed for now is to grab the low-hanging fruit of things that clearly won't change any semantics. But tail end of the dev cycle is no time to be making really fundamental changes in how FDW direct modify works. regards, tom lane
Re: ModifyTable overheads in generic plans
On Thu, Apr 1, 2021 at 10:12 PM Amit Langote wrote: > > On Thu, Apr 1, 2021 at 3:12 AM Tom Lane wrote: > > Amit Langote writes: > > > [ v14-0002-Initialize-result-relation-information-lazily.patch ] > > Needs YA rebase over 86dc90056. > > Done. I will post the updated results for -Mprepared benchmarks I did > in the other thread shortly. Test details: pgbench -n -T60 -Mprepared -f nojoin.sql nojoin.sql: \set a random(1, 100) update test_table t set b = :a where a = :a; * test_table has 40 columns and partitions as shown below * plan_cache_mode = force_generic_plan Results: nparts master patched 64 626217118 128 344912082 256 17227643 1024359 2099 * tps figures shown are the median of 3 runs. So, drastic speedup can be seen by even just not creating ResultRelInfos for child relations that are not updated, as the patch does. I haven't yet included any changes for AcquireExecutorLocks() and ExecCheckRTPerms() bottlenecks that still remain and cause the drop in tps as partition count increases. -- Amit Langote EDB: http://www.enterprisedb.com
Re: ModifyTable overheads in generic plans
On Thu, Apr 1, 2021 at 3:12 AM Tom Lane wrote: > Amit Langote writes: > > [ v14-0002-Initialize-result-relation-information-lazily.patch ] > Needs YA rebase over 86dc90056. Done. I will post the updated results for -Mprepared benchmarks I did in the other thread shortly. -- Amit Langote EDB: http://www.enterprisedb.com v15-0001-Set-ForeignScanState.resultRelInfo-lazily.patch Description: Binary data v15-0002-Initialize-result-relation-information-lazily.patch Description: Binary data
Re: ModifyTable overheads in generic plans
Amit Langote writes: > [ v14-0002-Initialize-result-relation-information-lazily.patch ] Needs YA rebase over 86dc90056. regards, tom lane
Re: ModifyTable overheads in generic plans
On Mon, Jan 25, 2021 at 2:23 PM Amit Langote wrote: > On Tue, Dec 22, 2020 at 5:16 PM Amit Langote wrote: > > On Mon, Dec 7, 2020 at 3:53 PM Amit Langote wrote: > > > On Thu, Nov 12, 2020 at 5:04 PM Amit Langote > > > wrote: > > > > Attached new 0002 which does these adjustments. I went with > > > > ri_RootTargetDesc to go along with ri_RelationDesc. > > > > > > > > Also, I have updated the original 0002 (now 0003) to make > > > > GetChildToRootMap() use ri_RootTargetDesc instead of > > > > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even > > > > AfterTriggerSaveEvent() can now use that function. This allows us to > > > > avoid having to initialize ri_ChildToRootMap anywhere but inside > > > > GetChildRootMap(), with that long comment defending doing so. :-) > > > > > > These needed to be rebased due to recent copy.c upheavals. Attached. > > > > Needed to be rebased again. > > And again, this time over the recent batch insert API related patches. Another rebase. I've dropped what was patch 0001 in the previous set, because I think it has been rendered unnecessary due to recently committed changes. However, the rebase led to a couple of additional regression test output changes that I think are harmless. The changes are caused by the fact that ri_RootResultRelInfo now gets initialized in *all* child result relations, not just those that participate in tuple routing. -- Amit Langote EDB: http://www.enterprisedb.com v14-0001-Set-ForeignScanState.resultRelInfo-lazily.patch Description: Binary data v14-0002-Initialize-result-relation-information-lazily.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Tue, Dec 22, 2020 at 5:16 PM Amit Langote wrote: > On Mon, Dec 7, 2020 at 3:53 PM Amit Langote wrote: > > On Thu, Nov 12, 2020 at 5:04 PM Amit Langote > > wrote: > > > Attached new 0002 which does these adjustments. I went with > > > ri_RootTargetDesc to go along with ri_RelationDesc. > > > > > > Also, I have updated the original 0002 (now 0003) to make > > > GetChildToRootMap() use ri_RootTargetDesc instead of > > > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even > > > AfterTriggerSaveEvent() can now use that function. This allows us to > > > avoid having to initialize ri_ChildToRootMap anywhere but inside > > > GetChildRootMap(), with that long comment defending doing so. :-) > > > > These needed to be rebased due to recent copy.c upheavals. Attached. > > Needed to be rebased again. And again, this time over the recent batch insert API related patches. -- Amit Langote EDB: http://www.enterprisedb.com v13-0001-Set-ForeignScanState.resultRelInfo-lazily.patch Description: Binary data v13-0002-Rethink-ResultRelInfo.ri_PartitionRoot.patch Description: Binary data v13-0003-Initialize-result-relation-information-lazily.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Mon, Dec 7, 2020 at 3:53 PM Amit Langote wrote: > > On Thu, Nov 12, 2020 at 5:04 PM Amit Langote wrote: > > Attached new 0002 which does these adjustments. I went with > > ri_RootTargetDesc to go along with ri_RelationDesc. > > > > Also, I have updated the original 0002 (now 0003) to make > > GetChildToRootMap() use ri_RootTargetDesc instead of > > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even > > AfterTriggerSaveEvent() can now use that function. This allows us to > > avoid having to initialize ri_ChildToRootMap anywhere but inside > > GetChildRootMap(), with that long comment defending doing so. :-) > > These needed to be rebased due to recent copy.c upheavals. Attached. Needed to be rebased again. -- Amit Langote EDB: http://www.enterprisedb.com v12-0001-Set-ForeignScanState.resultRelInfo-lazily.patch Description: Binary data v12-0002-Rethink-ResultRelInfo.ri_PartitionRoot.patch Description: Binary data v12-0003-Initialize-result-relation-information-lazily.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Thu, Nov 12, 2020 at 5:04 PM Amit Langote wrote: > Attached new 0002 which does these adjustments. I went with > ri_RootTargetDesc to go along with ri_RelationDesc. > > Also, I have updated the original 0002 (now 0003) to make > GetChildToRootMap() use ri_RootTargetDesc instead of > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even > AfterTriggerSaveEvent() can now use that function. This allows us to > avoid having to initialize ri_ChildToRootMap anywhere but inside > GetChildRootMap(), with that long comment defending doing so. :-) These needed to be rebased due to recent copy.c upheavals. Attached. -- Amit Langote EDB: http://www.enterprisedb.com v11-0001-Set-ForeignScanState.resultRelInfo-lazily.patch Description: Binary data v11-0002-Rethink-ResultRelInfo.ri_PartitionRoot.patch Description: Binary data v11-0003-Initialize-result-relation-information-lazily.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Wed, Nov 11, 2020 at 10:14 PM Heikki Linnakangas wrote: > I'm still a bit confused and unhappy about the initialization of > ResultRelInfos and the various fields in them. We've made progress in > the previous patches, but it's still a bit messy. > > > /* > >* If transition tuples will be captured, initialize a map to > > convert > >* child tuples into the format of the table mentioned in the > > query > >* (root relation), because the transition tuple store can > > only store > >* tuples in the root table format. However for INSERT, the > > map is > >* only initialized for a given partition when the partition > > itself is > >* first initialized by ExecFindPartition. Also, this map is > > also > >* needed if an UPDATE ends up having to move tuples across > >* partitions, because in that case the child tuple to be > > moved first > >* needs to be converted into the root table's format. In > > that case, > >* we use GetChildToRootMap() to either create one from > > scratch if > >* we didn't already create it here. > >* > >* Note: We cannot always initialize this map lazily, that > > is, use > >* GetChildToRootMap(), because AfterTriggerSaveEvent(), > > which needs > >* the map, doesn't have access to the "target" relation that > > is > >* needed to create the map. > >*/ > > if (mtstate->mt_transition_capture && operation != CMD_INSERT) > > { > > Relationrelation = > > resultRelInfo->ri_RelationDesc; > > RelationtargetRel = > > mtstate->rootResultRelInfo->ri_RelationDesc; > > > > resultRelInfo->ri_ChildToRootMap = > > > > convert_tuples_by_name(RelationGetDescr(relation), > > > > RelationGetDescr(targetRel)); > > /* First time creating the map for this result > > relation. */ > > Assert(!resultRelInfo->ri_ChildToRootMapValid); > > resultRelInfo->ri_ChildToRootMapValid = true; > > } > > The comment explains that AfterTriggerSaveEvent() cannot use > GetChildToRootMap(), because it doesn't have access to the root target > relation. But there is a field for that in ResultRelInfo: > ri_PartitionRoot. However, that's only set up when we do partition routing. > > How about we rename ri_PartitionRoot to e.g ri_RootTarget, and set it > always, even for non-partition inheritance? We have that information > available when we initialize the ResultRelInfo, so might as well. Yeah, I agree it's better to use ri_PartitionRoot more generally like you describe here. > Some code currently checks ri_PartitionRoot, to determine if a tuple > that's been inserted, has been routed. For example: > > > /* > >* Also check the tuple against the partition constraint, if > > there is > >* one; except that if we got here via tuple-routing, we > > don't need to > >* if there's no BR trigger defined on the partition. > >*/ > > if (resultRelationDesc->rd_rel->relispartition && > > (resultRelInfo->ri_PartitionRoot == NULL || > >(resultRelInfo->ri_TrigDesc && > > > > resultRelInfo->ri_TrigDesc->trig_insert_before_row))) > > ExecPartitionCheck(resultRelInfo, slot, estate, true); > > So if we set ri_PartitionRoot always, we would need some other way to > determine if the tuple at hand has actually been routed or not. But > wouldn't that be a good thing anyway? Isn't it possible that the same > ResultRelInfo is sometimes used for routed tuples, and sometimes for > tuples that have been inserted/updated "directly"? > ExecLookupUpdateResultRelByOid() sets that field lazily, so I think it > would be possible to get here with ri_PartitionRoot either set or not, > depending on whether an earlier cross-partition update was routed to the > table. ri_RelationDesc != ri_PartitionRoot gives whether the result relation is the original target relation of the query or not, so checking that should be enough here. > The above check is just an optimization, to skip unnecessary > ExecPartitionCheck() calls, but I think this snippet in > ExecConstraints() needs to get this right: > > > /* > >* If the tuple has been routed, it's been > > converted to the > >* partition's rowtype, which might differ > > from the root > >* table's. We must convert it back to
Re: ModifyTable overheads in generic plans
I'm still a bit confused and unhappy about the initialization of ResultRelInfos and the various fields in them. We've made progress in the previous patches, but it's still a bit messy. /* * If transition tuples will be captured, initialize a map to convert * child tuples into the format of the table mentioned in the query * (root relation), because the transition tuple store can only store * tuples in the root table format. However for INSERT, the map is * only initialized for a given partition when the partition itself is * first initialized by ExecFindPartition. Also, this map is also * needed if an UPDATE ends up having to move tuples across * partitions, because in that case the child tuple to be moved first * needs to be converted into the root table's format. In that case, * we use GetChildToRootMap() to either create one from scratch if * we didn't already create it here. * * Note: We cannot always initialize this map lazily, that is, use * GetChildToRootMap(), because AfterTriggerSaveEvent(), which needs * the map, doesn't have access to the "target" relation that is * needed to create the map. */ if (mtstate->mt_transition_capture && operation != CMD_INSERT) { Relationrelation = resultRelInfo->ri_RelationDesc; RelationtargetRel = mtstate->rootResultRelInfo->ri_RelationDesc; resultRelInfo->ri_ChildToRootMap = convert_tuples_by_name(RelationGetDescr(relation), RelationGetDescr(targetRel)); /* First time creating the map for this result relation. */ Assert(!resultRelInfo->ri_ChildToRootMapValid); resultRelInfo->ri_ChildToRootMapValid = true; } The comment explains that AfterTriggerSaveEvent() cannot use GetChildToRootMap(), because it doesn't have access to the root target relation. But there is a field for that in ResultRelInfo: ri_PartitionRoot. However, that's only set up when we do partition routing. How about we rename ri_PartitionRoot to e.g ri_RootTarget, and set it always, even for non-partition inheritance? We have that information available when we initialize the ResultRelInfo, so might as well. Some code currently checks ri_PartitionRoot, to determine if a tuple that's been inserted, has been routed. For example: /* * Also check the tuple against the partition constraint, if there is * one; except that if we got here via tuple-routing, we don't need to * if there's no BR trigger defined on the partition. */ if (resultRelationDesc->rd_rel->relispartition && (resultRelInfo->ri_PartitionRoot == NULL || (resultRelInfo->ri_TrigDesc && resultRelInfo->ri_TrigDesc->trig_insert_before_row))) ExecPartitionCheck(resultRelInfo, slot, estate, true); So if we set ri_PartitionRoot always, we would need some other way to determine if the tuple at hand has actually been routed or not. But wouldn't that be a good thing anyway? Isn't it possible that the same ResultRelInfo is sometimes used for routed tuples, and sometimes for tuples that have been inserted/updated "directly"? ExecLookupUpdateResultRelByOid() sets that field lazily, so I think it would be possible to get here with ri_PartitionRoot either set or not, depending on whether an earlier cross-partition update was routed to the table. The above check is just an optimization, to skip unnecessary ExecPartitionCheck() calls, but I think this snippet in ExecConstraints() needs to get this right: /* * If the tuple has been routed, it's been converted to the * partition's rowtype, which might differ from the root * table's. We must convert it back to the root table's * rowtype so that val_desc shown error message matches the * input tuple. */ if (resultRelInfo->ri_PartitionRoot) { AttrMap*map; rel = resultRelInfo->ri_PartitionRoot; tupdesc = RelationGetDescr(rel);
Re: ModifyTable overheads in generic plans
Thanks for the review. On Wed, Nov 11, 2020 at 5:55 PM Heikki Linnakangas wrote: > On 10/11/2020 17:32, Heikki Linnakangas wrote: > > On 10/11/2020 13:12, Amit Langote wrote: > >> On second thought, it seems A would amount to merely a cosmetic > >> adjustment of the API, nothing more. B seems to get the job done for > >> me and also doesn't unnecessarily break compatibility, so I've updated > >> 0001 to implement B. Please give it a look. > > > > Looks good at a quick glance. It is a small API break that > > BeginDirectModify() is now called during execution, not at executor > > startup, but I don't think that's going to break FDWs in practice. One > > could argue, though, that if we're going to change the API, we should do > > it more loudly. So changing the arguments might be a good thing. > > > > The BeginDirectModify() and BeginForeignModify() interfaces are > > inconsistent, but that's not this patch's fault. I wonder if we could > > move the call to BeginForeignModify() also to ForeignNext(), though? And > > BeginForeignScan() too, while we're at it. > > With these patches, BeginForeignModify() and BeginDirectModify() are > both called during execution, before the first > IterateForeignScan/IterateDirectModify call. The documentation for > BeginForeignModify() needs to be updated, it still claims that it's run > at executor startup, but that's not true after these patches. So that > needs to be updated. Good point, I've updated the patch to note that. > I think that's a good thing, because it means that BeginForeignModify() > and BeginDirectModify() are called at the same stage, from the FDW's > point of view. Even though BeginDirectModify() is called from > ForeignNext(), and BeginForeignModify() from ExecModifyTable(), that > difference isn't visible to the FDW; both are after executor startup but > before the first Iterate call. Right. -- Amit Langote EDB: http://www.enterprisedb.com v9-0001-Set-ForeignScanState.resultRelInfo-lazily.patch Description: Binary data v9-0002-Initialize-result-relation-information-lazily.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On 10/11/2020 17:32, Heikki Linnakangas wrote: On 10/11/2020 13:12, Amit Langote wrote: On second thought, it seems A would amount to merely a cosmetic adjustment of the API, nothing more. B seems to get the job done for me and also doesn't unnecessarily break compatibility, so I've updated 0001 to implement B. Please give it a look. Looks good at a quick glance. It is a small API break that BeginDirectModify() is now called during execution, not at executor startup, but I don't think that's going to break FDWs in practice. One could argue, though, that if we're going to change the API, we should do it more loudly. So changing the arguments might be a good thing. The BeginDirectModify() and BeginForeignModify() interfaces are inconsistent, but that's not this patch's fault. I wonder if we could move the call to BeginForeignModify() also to ForeignNext(), though? And BeginForeignScan() too, while we're at it. With these patches, BeginForeignModify() and BeginDirectModify() are both called during execution, before the first IterateForeignScan/IterateDirectModify call. The documentation for BeginForeignModify() needs to be updated, it still claims that it's run at executor startup, but that's not true after these patches. So that needs to be updated. I think that's a good thing, because it means that BeginForeignModify() and BeginDirectModify() are called at the same stage, from the FDW's point of view. Even though BeginDirectModify() is called from ForeignNext(), and BeginForeignModify() from ExecModifyTable(), that difference isn't visible to the FDW; both are after executor startup but before the first Iterate call. - Heikki
Re: ModifyTable overheads in generic plans
On 10/11/2020 13:12, Amit Langote wrote: On Wed, Nov 4, 2020 at 11:32 AM Amit Langote wrote: On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas wrote: A) We could change FDW API so that BeginDirectModify takes the same arguments as BeginForeignModify(). That avoids the assumption that it's a ForeignScan node, because BeginForeignModify() doesn't take ForeignScanState as argument. That would be consistent, which is nice. But I think we'd somehow still need to pass the ResultRelInfo to the corresponding ForeignScan, and I'm not sure how. Maybe ForeignScan doesn't need to contain any result relation info then? ForeignScan.operation != CMD_SELECT is enough to tell it to call IterateDirectModify() as today. Hmm, I misspoke. We do still need ForeignScanState.resultRelInfo, because the IterateDirectModify() API uses it to return the remotely inserted/updated/deleted tuple for the RETURNING projection performed by ExecModifyTable(). B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first call to ForeignNext(). C) Accept the Assertion. And add an elog() check in the planner for that with a proper error message. I'm leaning towards B), but maybe there's some better solution I didn't think of? Perhaps changing the API would make sense in any case, it is a bit weird as it is. Backwards-incompatible API changes are not nice, but I don't think there are many FDWs out there that implement the DirectModify functions. And those functions are pretty tightly coupled with the executor and ModifyTable node details anyway, so I don't feel like we can, or need to, guarantee that they stay unchanged across major versions. B is not too bad, but I tend to prefer doing A too. On second thought, it seems A would amount to merely a cosmetic adjustment of the API, nothing more. B seems to get the job done for me and also doesn't unnecessarily break compatibility, so I've updated 0001 to implement B. Please give it a look. Looks good at a quick glance. It is a small API break that BeginDirectModify() is now called during execution, not at executor startup, but I don't think that's going to break FDWs in practice. One could argue, though, that if we're going to change the API, we should do it more loudly. So changing the arguments might be a good thing. The BeginDirectModify() and BeginForeignModify() interfaces are inconsistent, but that's not this patch's fault. I wonder if we could move the call to BeginForeignModify() also to ForeignNext(), though? And BeginForeignScan() too, while we're at it. Overall, this is probably fine as it is though. I'll review more thorougly tomorrow. - Heikki
Re: ModifyTable overheads in generic plans
On Wed, Nov 4, 2020 at 11:32 AM Amit Langote wrote: > On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas wrote: > > On 03/11/2020 10:27, Amit Langote wrote: > > > Please check the attached if that looks better. > > > > Great, thanks! Yeah, I like that much better. > > > > This makes me a bit unhappy: > > > > > > > > /* Also let FDWs init themselves for foreign-table result > > > rels */ > > > if (resultRelInfo->ri_FdwRoutine != NULL) > > > { > > > if (resultRelInfo->ri_usesFdwDirectModify) > > > { > > > ForeignScanState *fscan = (ForeignScanState > > > *) mtstate->mt_plans[i]; > > > > > > /* > > >* For the FDW's convenience, set the > > > ForeignScanState node's > > >* ResultRelInfo to let the FDW know which > > > result relation it > > >* is going to work with. > > >*/ > > > Assert(IsA(fscan, ForeignScanState)); > > > fscan->resultRelInfo = resultRelInfo; > > > > > > resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags); > > > } > > > else if > > > (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL) > > > { > > > List *fdw_private = (List *) > > > list_nth(node->fdwPrivLists, i); > > > > > > > > > resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate, > > > > > > resultRelInfo, > > > > > > fdw_private, > > > > > > i, > > > > > > eflags); > > > } > > > } > > > > If you remember, I was unhappy with a similar assertion in the earlier > > patches [1]. I'm not sure what to do instead though. A few options: > > > > A) We could change FDW API so that BeginDirectModify takes the same > > arguments as BeginForeignModify(). That avoids the assumption that it's > > a ForeignScan node, because BeginForeignModify() doesn't take > > ForeignScanState as argument. That would be consistent, which is nice. > > But I think we'd somehow still need to pass the ResultRelInfo to the > > corresponding ForeignScan, and I'm not sure how. > > Maybe ForeignScan doesn't need to contain any result relation info > then? ForeignScan.operation != CMD_SELECT is enough to tell it to > call IterateDirectModify() as today. Hmm, I misspoke. We do still need ForeignScanState.resultRelInfo, because the IterateDirectModify() API uses it to return the remotely inserted/updated/deleted tuple for the RETURNING projection performed by ExecModifyTable(). > > B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first > > call to ForeignNext(). > > > > C) Accept the Assertion. And add an elog() check in the planner for that > > with a proper error message. > > > > I'm leaning towards B), but maybe there's some better solution I didn't > > think of? Perhaps changing the API would make sense in any case, it is a > > bit weird as it is. Backwards-incompatible API changes are not nice, but > > I don't think there are many FDWs out there that implement the > > DirectModify functions. And those functions are pretty tightly coupled > > with the executor and ModifyTable node details anyway, so I don't feel > > like we can, or need to, guarantee that they stay unchanged across major > > versions. > > B is not too bad, but I tend to prefer doing A too. On second thought, it seems A would amount to merely a cosmetic adjustment of the API, nothing more. B seems to get the job done for me and also doesn't unnecessarily break compatibility, so I've updated 0001 to implement B. Please give it a look. -- Amit Langote EDB: http://www.enterprisedb.com v8-0001-Set-ForeignScanState.resultRelInfo-lazily.patch Description: Binary data v8-0002-Initialize-result-relation-information-lazily.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Wed, Nov 4, 2020 at 11:32 AM Amit Langote wrote: > On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas wrote: > > A) We could change FDW API so that BeginDirectModify takes the same > > arguments as BeginForeignModify(). That avoids the assumption that it's > > a ForeignScan node, because BeginForeignModify() doesn't take > > ForeignScanState as argument. That would be consistent, which is nice. > > But I think we'd somehow still need to pass the ResultRelInfo to the > > corresponding ForeignScan, and I'm not sure how. > > Maybe ForeignScan doesn't need to contain any result relation info > then? ForeignScan.operation != CMD_SELECT is enough to tell it to > call IterateDirectModify() as today. > > > B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first > > call to ForeignNext(). > > > > C) Accept the Assertion. And add an elog() check in the planner for that > > with a proper error message. > > > > I'm leaning towards B), but maybe there's some better solution I didn't > > think of? Perhaps changing the API would make sense in any case, it is a > > bit weird as it is. Backwards-incompatible API changes are not nice, but > > I don't think there are many FDWs out there that implement the > > DirectModify functions. And those functions are pretty tightly coupled > > with the executor and ModifyTable node details anyway, so I don't feel > > like we can, or need to, guarantee that they stay unchanged across major > > versions. > > B is not too bad, but I tend to prefer doing A too. How about I update the 0001 patch to implement A? -- Amit Langote EDB: http://www.enterprisedb.com
Re: ModifyTable overheads in generic plans
On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas wrote: > On 03/11/2020 10:27, Amit Langote wrote: > > Please check the attached if that looks better. > > Great, thanks! Yeah, I like that much better. > > This makes me a bit unhappy: > > > > > /* Also let FDWs init themselves for foreign-table result > > rels */ > > if (resultRelInfo->ri_FdwRoutine != NULL) > > { > > if (resultRelInfo->ri_usesFdwDirectModify) > > { > > ForeignScanState *fscan = (ForeignScanState > > *) mtstate->mt_plans[i]; > > > > /* > >* For the FDW's convenience, set the > > ForeignScanState node's > >* ResultRelInfo to let the FDW know which > > result relation it > >* is going to work with. > >*/ > > Assert(IsA(fscan, ForeignScanState)); > > fscan->resultRelInfo = resultRelInfo; > > > > resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags); > > } > > else if > > (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL) > > { > > List *fdw_private = (List *) > > list_nth(node->fdwPrivLists, i); > > > > > > resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate, > > > >resultRelInfo, > > > >fdw_private, > > > >i, > > > >eflags); > > } > > } > > If you remember, I was unhappy with a similar assertion in the earlier > patches [1]. I'm not sure what to do instead though. A few options: > > A) We could change FDW API so that BeginDirectModify takes the same > arguments as BeginForeignModify(). That avoids the assumption that it's > a ForeignScan node, because BeginForeignModify() doesn't take > ForeignScanState as argument. That would be consistent, which is nice. > But I think we'd somehow still need to pass the ResultRelInfo to the > corresponding ForeignScan, and I'm not sure how. Maybe ForeignScan doesn't need to contain any result relation info then? ForeignScan.operation != CMD_SELECT is enough to tell it to call IterateDirectModify() as today. > B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first > call to ForeignNext(). > > C) Accept the Assertion. And add an elog() check in the planner for that > with a proper error message. > > I'm leaning towards B), but maybe there's some better solution I didn't > think of? Perhaps changing the API would make sense in any case, it is a > bit weird as it is. Backwards-incompatible API changes are not nice, but > I don't think there are many FDWs out there that implement the > DirectModify functions. And those functions are pretty tightly coupled > with the executor and ModifyTable node details anyway, so I don't feel > like we can, or need to, guarantee that they stay unchanged across major > versions. B is not too bad, but I tend to prefer doing A too. -- Amit Langote EDB: http://www.enterprisedb.com
Re: ModifyTable overheads in generic plans
On 03/11/2020 10:27, Amit Langote wrote: Please check the attached if that looks better. Great, thanks! Yeah, I like that much better. This makes me a bit unhappy: /* Also let FDWs init themselves for foreign-table result rels */ if (resultRelInfo->ri_FdwRoutine != NULL) { if (resultRelInfo->ri_usesFdwDirectModify) { ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[i]; /* * For the FDW's convenience, set the ForeignScanState node's * ResultRelInfo to let the FDW know which result relation it * is going to work with. */ Assert(IsA(fscan, ForeignScanState)); fscan->resultRelInfo = resultRelInfo; resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags); } else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL) { List *fdw_private = (List *) list_nth(node->fdwPrivLists, i); resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate, resultRelInfo, fdw_private, i, eflags); } } If you remember, I was unhappy with a similar assertion in the earlier patches [1]. I'm not sure what to do instead though. A few options: A) We could change FDW API so that BeginDirectModify takes the same arguments as BeginForeignModify(). That avoids the assumption that it's a ForeignScan node, because BeginForeignModify() doesn't take ForeignScanState as argument. That would be consistent, which is nice. But I think we'd somehow still need to pass the ResultRelInfo to the corresponding ForeignScan, and I'm not sure how. B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first call to ForeignNext(). C) Accept the Assertion. And add an elog() check in the planner for that with a proper error message. I'm leaning towards B), but maybe there's some better solution I didn't think of? Perhaps changing the API would make sense in any case, it is a bit weird as it is. Backwards-incompatible API changes are not nice, but I don't think there are many FDWs out there that implement the DirectModify functions. And those functions are pretty tightly coupled with the executor and ModifyTable node details anyway, so I don't feel like we can, or need to, guarantee that they stay unchanged across major versions. [1] https://www.postgresql.org/message-id/19c23dd9-89ce-75a3-9105-5fc05a46f94a%40iki.fi - Heikki
Re: ModifyTable overheads in generic plans
On Mon, Nov 2, 2020 at 10:53 PM Amit Langote wrote: > On Mon, Nov 2, 2020 at 10:19 PM Heikki Linnakangas wrote: > > (/me reads patch further) I presume that's what you referred to in the > > commit message: > > > > > Also, extend this lazy initialization approach to some of the > > > individual fields of ResultRelInfo such that even for the result > > > relations that are initialized, those fields are only initialized on > > > first access. While no performance improvement is to be expected > > > there, it can lead to a simpler initialization logic of the > > > ResultRelInfo itself, because the conditions for whether a given > > > field is needed or not tends to look confusing. One side-effect > > > of this is that any "SubPlans" referenced in the expressions of > > > those fields are also lazily initialized and hence changes the > > > output of EXPLAIN (without ANALYZE) in some regression tests. > > > > > > I'm now curious what the initialization logic would look like, if we > > initialized those fields in ExecGetResultRelation(). At a quick glance > > on the conditions on when those initializations are done in the patch > > now, it would seem pretty straightforward. If the target list contains > > any junk columns, initialize junk filter, and if > > ModifyTable->returningLists is set, initialize RETURNING list. Maybe I'm > > missing something. > > Yeah, it's not that complicated to initialize those things in > ExecGetResultRelation(). In fact, ExecGetResultRelation() (or its > subroutine ExecBuildResultRelation()) housed those initializations in > the earlier versions of this patch, but I changed that after our > discussion about being lazy about initializing as much stuff as we > can. Maybe I should revert that? Please check the attached if that looks better. -- Amit Langote EDB: http://www.enterprisedb.com v7-0001-Call-BeginDirectModify-from-ExecInitModifyTable.patch Description: Binary data v7-0002-Initialize-result-relation-information-lazily.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Mon, Nov 2, 2020 at 10:19 PM Heikki Linnakangas wrote: > On 30/10/2020 08:13, Amit Langote wrote: > > /* > > * Perform WITH CHECK OPTIONS check, if any. > > */ > > static void > > ExecProcessWithCheckOptions(ModifyTableState *mtstate, ResultRelInfo > > *resultRelInfo, > > TupleTableSlot *slot, > > WCOKind wco_kind) > > { > > ModifyTable *node = (ModifyTable *) mtstate->ps.plan; > > EState *estate = mtstate->ps.state; > > > > if (node->withCheckOptionLists == NIL) > > return; > > > > /* Initialize expression state if not already done. */ > > if (resultRelInfo->ri_WithCheckOptions == NIL) > > { > > int whichrel = resultRelInfo - > > mtstate->resultRelInfo; > > List *wcoList; > > List *wcoExprs = NIL; > > ListCell *ll; > > > > Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans); > > wcoList = (List *) list_nth(node->withCheckOptionLists, > > whichrel); > > foreach(ll, wcoList) > > { > > WithCheckOption *wco = (WithCheckOption *) lfirst(ll); > > ExprState *wcoExpr = ExecInitQual((List *) wco->qual, > > > > >ps); > > > > wcoExprs = lappend(wcoExprs, wcoExpr); > > } > > > > resultRelInfo->ri_WithCheckOptions = wcoList; > > resultRelInfo->ri_WithCheckOptionExprs = wcoExprs; > > } > > > > /* > >* ExecWithCheckOptions() will skip any WCOs which are not of the kind > >* we are looking for at this point. > >*/ > > ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate); > > } > > Can we do this initialization in ExecGetResultRelation()? That would > seem much more straightforward. Is there any advantage to delaying it > here? And same thing with the junk filter and the RETURNING list. > > (/me reads patch further) I presume that's what you referred to in the > commit message: > > > Also, extend this lazy initialization approach to some of the > > individual fields of ResultRelInfo such that even for the result > > relations that are initialized, those fields are only initialized on > > first access. While no performance improvement is to be expected > > there, it can lead to a simpler initialization logic of the > > ResultRelInfo itself, because the conditions for whether a given > > field is needed or not tends to look confusing. One side-effect > > of this is that any "SubPlans" referenced in the expressions of > > those fields are also lazily initialized and hence changes the > > output of EXPLAIN (without ANALYZE) in some regression tests. > > > I'm now curious what the initialization logic would look like, if we > initialized those fields in ExecGetResultRelation(). At a quick glance > on the conditions on when those initializations are done in the patch > now, it would seem pretty straightforward. If the target list contains > any junk columns, initialize junk filter, and if > ModifyTable->returningLists is set, initialize RETURNING list. Maybe I'm > missing something. Yeah, it's not that complicated to initialize those things in ExecGetResultRelation(). In fact, ExecGetResultRelation() (or its subroutine ExecBuildResultRelation()) housed those initializations in the earlier versions of this patch, but I changed that after our discussion about being lazy about initializing as much stuff as we can. Maybe I should revert that? -- Amit Langote EDB: http://www.enterprisedb.com
Re: ModifyTable overheads in generic plans
On 30/10/2020 08:13, Amit Langote wrote: /* * Perform WITH CHECK OPTIONS check, if any. */ static void ExecProcessWithCheckOptions(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, TupleTableSlot *slot, WCOKind wco_kind) { ModifyTable *node = (ModifyTable *) mtstate->ps.plan; EState *estate = mtstate->ps.state; if (node->withCheckOptionLists == NIL) return; /* Initialize expression state if not already done. */ if (resultRelInfo->ri_WithCheckOptions == NIL) { int whichrel = resultRelInfo - mtstate->resultRelInfo; List *wcoList; List *wcoExprs = NIL; ListCell *ll; Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans); wcoList = (List *) list_nth(node->withCheckOptionLists, whichrel); foreach(ll, wcoList) { WithCheckOption *wco = (WithCheckOption *) lfirst(ll); ExprState *wcoExpr = ExecInitQual((List *) wco->qual, >ps); wcoExprs = lappend(wcoExprs, wcoExpr); } resultRelInfo->ri_WithCheckOptions = wcoList; resultRelInfo->ri_WithCheckOptionExprs = wcoExprs; } /* * ExecWithCheckOptions() will skip any WCOs which are not of the kind * we are looking for at this point. */ ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate); } Can we do this initialization in ExecGetResultRelation()? That would seem much more straightforward. Is there any advantage to delaying it here? And same thing with the junk filter and the RETURNING list. (/me reads patch further) I presume that's what you referred to in the commit message: Also, extend this lazy initialization approach to some of the individual fields of ResultRelInfo such that even for the result relations that are initialized, those fields are only initialized on first access. While no performance improvement is to be expected there, it can lead to a simpler initialization logic of the ResultRelInfo itself, because the conditions for whether a given field is needed or not tends to look confusing. One side-effect of this is that any "SubPlans" referenced in the expressions of those fields are also lazily initialized and hence changes the output of EXPLAIN (without ANALYZE) in some regression tests. I'm now curious what the initialization logic would look like, if we initialized those fields in ExecGetResultRelation(). At a quick glance on the conditions on when those initializations are done in the patch now, it would seem pretty straightforward. If the target list contains any junk columns, initialize junk filter, and if ModifyTable->returningLists is set, initialize RETURNING list. Maybe I'm missing something. - Heikki
Re: ModifyTable overheads in generic plans
Attached updated patches based on recent the discussion at: * Re: partition routing layering in nodeModifyTable.c * https://www.postgresql.org/message-id/CA%2BHiwqHpmMjenQqNpMHrhg3DRhqqQfby2RCT1HWVwMin3_5vMA%40mail.gmail.com 0001 adjusts how ForeignScanState.resultRelInfo is initialized for use by direct modify operations. 0002 refactors ResultRelInfo initialization do be don lazily on first use I call these v6, because the last version posted on this thread was v5, even though it went through a couple of iterations on the above thread. Sorry about the confusion. -- Amit Langote EDB: http://www.enterprisedb.com v6-0001-Call-BeginDirectModify-from-ExecInitModifyTable.patch Description: Binary data v6-0002-Initialize-result-relation-information-lazily.patch Description: Binary data
Re: ModifyTable overheads in generic plans
Hello, On Fri, Aug 7, 2020 at 9:26 PM Amit Langote wrote: > On Tue, Aug 4, 2020 at 3:15 PM Amit Langote wrote: > > On Sat, Aug 1, 2020 at 4:46 AM Robert Haas wrote: > > > On Fri, Jun 26, 2020 at 8:36 AM Amit Langote > > > wrote: > > > > 0001 and 0002 are preparatory patches. > > > > > > I read through these patches a bit but it's really unclear what the > > > point of them is. I think they need better commit messages, or better > > > comments, or both. > > > > Thanks for taking a look. Sorry about the lack of good commentary, > > which I have tried to address in the attached updated version. I > > extracted one more part as preparatory from the earlier 0003 patch, so > > there are 4 patches now. > > > > Also as discussed with Daniel, I have changed the patches so that they > > can be applied on plain HEAD instead of having to first apply the > > patches at [1]. Without runtime pruning for UPDATE/DELETE proposed in > > [1], optimizing ResultRelInfo creation by itself does not improve the > > performance/scalability by that much, but the benefit of lazily > > creating ResultRelInfos seems clear so I think maybe it's okay to > > pursue this independently. > > Per cfbot's automatic patch tester, there were some issues in the 0004 patch: > > nodeModifyTable.c: In function ‘ExecModifyTable’: > 1529nodeModifyTable.c:2484:24: error: ‘junkfilter’ may be used > uninitialized in this function [-Werror=maybe-uninitialized] > 1530 junkfilter->jf_junkAttNo, > 1531^ > 1532nodeModifyTable.c:2309:14: note: ‘junkfilter’ was declared here > 1533 JunkFilter *junkfilter; > 1534 ^ > 1535cc1: all warnings being treated as errors > 1536: recipe for target 'nodeModifyTable.o' failed > 1537make[3]: *** [nodeModifyTable.o] Error 1 > > Fixed in the attached updated version Needed a rebase due to f481d28232. Attached updated patches. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com v5-0004-Initialize-result-relation-information-lazily.patch Description: Binary data v5-0001-Revise-how-FDWs-obtain-result-relation-informatio.patch Description: Binary data v5-0002-Don-t-make-root-ResultRelInfo-for-insert-queries.patch Description: Binary data v5-0003-Revise-child-to-root-tuple-conversion-map-managem.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Tue, Aug 4, 2020 at 3:15 PM Amit Langote wrote: > On Sat, Aug 1, 2020 at 4:46 AM Robert Haas wrote: > > On Fri, Jun 26, 2020 at 8:36 AM Amit Langote > > wrote: > > > 0001 and 0002 are preparatory patches. > > > > I read through these patches a bit but it's really unclear what the > > point of them is. I think they need better commit messages, or better > > comments, or both. > > Thanks for taking a look. Sorry about the lack of good commentary, > which I have tried to address in the attached updated version. I > extracted one more part as preparatory from the earlier 0003 patch, so > there are 4 patches now. > > Also as discussed with Daniel, I have changed the patches so that they > can be applied on plain HEAD instead of having to first apply the > patches at [1]. Without runtime pruning for UPDATE/DELETE proposed in > [1], optimizing ResultRelInfo creation by itself does not improve the > performance/scalability by that much, but the benefit of lazily > creating ResultRelInfos seems clear so I think maybe it's okay to > pursue this independently. Per cfbot's automatic patch tester, there were some issues in the 0004 patch: nodeModifyTable.c: In function ‘ExecModifyTable’: 1529nodeModifyTable.c:2484:24: error: ‘junkfilter’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 1530 junkfilter->jf_junkAttNo, 1531^ 1532nodeModifyTable.c:2309:14: note: ‘junkfilter’ was declared here 1533 JunkFilter *junkfilter; 1534 ^ 1535cc1: all warnings being treated as errors 1536: recipe for target 'nodeModifyTable.o' failed 1537make[3]: *** [nodeModifyTable.o] Error 1 Fixed in the attached updated version -- Amit Langote EnterpriseDB: http://www.enterprisedb.com v4-0003-Revise-child-to-root-tuple-conversion-map-managem.patch Description: Binary data v4-0002-Don-t-make-root-ResultRelInfo-for-insert-queries.patch Description: Binary data v4-0004-Initialize-result-relation-information-lazily.patch Description: Binary data v4-0001-Revise-how-FDWs-obtain-result-relation-informatio.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Sat, Aug 1, 2020 at 4:46 AM Robert Haas wrote: > On Fri, Jun 26, 2020 at 8:36 AM Amit Langote wrote: > > 0001 and 0002 are preparatory patches. > > I read through these patches a bit but it's really unclear what the > point of them is. I think they need better commit messages, or better > comments, or both. Thanks for taking a look. Sorry about the lack of good commentary, which I have tried to address in the attached updated version. I extracted one more part as preparatory from the earlier 0003 patch, so there are 4 patches now. Also as discussed with Daniel, I have changed the patches so that they can be applied on plain HEAD instead of having to first apply the patches at [1]. Without runtime pruning for UPDATE/DELETE proposed in [1], optimizing ResultRelInfo creation by itself does not improve the performance/scalability by that much, but the benefit of lazily creating ResultRelInfos seems clear so I think maybe it's okay to pursue this independently. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com v3-0003-Revise-child-to-root-tuple-conversion-map-managem.patch Description: Binary data v3-0004-Initialize-result-relation-information-lazily.patch Description: Binary data v3-0001-Revise-how-FDWs-obtain-result-relation-informatio.patch Description: Binary data v3-0002-Don-t-make-root-ResultRelInfo-for-insert-queries.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Fri, Jun 26, 2020 at 8:36 AM Amit Langote wrote: > 0001 and 0002 are preparatory patches. I read through these patches a bit but it's really unclear what the point of them is. I think they need better commit messages, or better comments, or both. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ModifyTable overheads in generic plans
> On 1 Jul 2020, at 15:38, Amit Langote wrote: > Another thing I could do is decouple the patches to discuss here from > the patches of the other thread, which should be possible and might be > good to avoid back and forth between the two threads. It sounds like it would make it easier for reviewers, so if it's possible with a reasonable effort it might be worth it. I've moved this entry to the next CF for now. cheers ./daniel
Re: ModifyTable overheads in generic plans
> On 1 Jul 2020, at 08:30, Amit Langote wrote: > > On Fri, Jun 26, 2020 at 9:36 PM Amit Langote wrote: >> I would like to discuss a refactoring patch that builds on top of the >> patches at [1] to address $subject. > > I forgot to update a place in postgres_fdw causing one of its tests to crash. > > Fixed in the attached updated version. The attached 0003 fails to apply to current HEAD, please submit another rebased version. Marking the entry as Waiting on Author in the meantime. cheers ./daniel
Re: ModifyTable overheads in generic plans
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote wrote: > I would like to discuss a refactoring patch that builds on top of the > patches at [1] to address $subject. I forgot to update a place in postgres_fdw causing one of its tests to crash. Fixed in the attached updated version. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com v2-0003-Delay-initializing-UPDATE-DELETE-ResultRelInfos.patch Description: Binary data v2-0002-Do-not-set-rootResultRelIndex-unnecessarily.patch Description: Binary data v2-0001-Revise-how-some-FDW-executor-APIs-obtain-ResultRe.patch Description: Binary data
Re: ModifyTable overheads in generic plans
On Mon, Jun 29, 2020 at 10:39 AM David Rowley wrote: > > On Sat, 27 Jun 2020 at 00:36, Amit Langote wrote: > > 2. ExecCheckRTPerms(): checks permissions of *all* partitions before > > executing the plan tree, but maybe it's okay to check only the ones > > that will be accessed > > I don't think it needs to be quite as complex as that. > expand_single_inheritance_child will set the > RangeTblEntry.requiredPerms to 0, so we never need to check > permissions on a partition. The overhead of permission checking when > there are many partitions is just down to the fact that > ExecCheckRTPerms() loops over the entire rangetable and calls > ExecCheckRTEPerms for each one. ExecCheckRTEPerms() does have very > little work to do when requiredPerms is 0, but the loop itself and the > function call overhead show up when you remove the other bottlenecks. I had forgotten that we set requiredPerms to 0 for the inheritance child tables. > I have a patch somewhere that just had the planner add the RTindexes > with a non-zero requiredPerms and set that in the plan so that > ExecCheckRTPerms could just look at the ones that actually needed > something checked. There's a slight disadvantage there that for > queries to non-partitioned tables that we need to build a Bitmapset > that has all items from the rangetable. That's likely a small > overhead, but not free, so perhaps there is a better way. I can't think of anything for this that doesn't involve having one more list of RTEs or bitmapset of RT indexes in PlannedStmt. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Re: ModifyTable overheads in generic plans
On Sat, 27 Jun 2020 at 00:36, Amit Langote wrote: > 2. ExecCheckRTPerms(): checks permissions of *all* partitions before > executing the plan tree, but maybe it's okay to check only the ones > that will be accessed I don't think it needs to be quite as complex as that. expand_single_inheritance_child will set the RangeTblEntry.requiredPerms to 0, so we never need to check permissions on a partition. The overhead of permission checking when there are many partitions is just down to the fact that ExecCheckRTPerms() loops over the entire rangetable and calls ExecCheckRTEPerms for each one. ExecCheckRTEPerms() does have very little work to do when requiredPerms is 0, but the loop itself and the function call overhead show up when you remove the other bottlenecks. I have a patch somewhere that just had the planner add the RTindexes with a non-zero requiredPerms and set that in the plan so that ExecCheckRTPerms could just look at the ones that actually needed something checked. There's a slight disadvantage there that for queries to non-partitioned tables that we need to build a Bitmapset that has all items from the rangetable. That's likely a small overhead, but not free, so perhaps there is a better way. David
Re: ModifyTable overheads in generic plans
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote wrote: > I would like to discuss a refactoring patch that builds on top of the > patches at [1] to address $subject. I've added this to the next CF: https://commitfest.postgresql.org/28/2621/ -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
ModifyTable overheads in generic plans
Hi, I would like to discuss a refactoring patch that builds on top of the patches at [1] to address $subject. To get an idea for what eliminating these overheads looks like, take a look at the following benchmarking results. Note 1: I've forced the use of generic plan by setting plan_cache_mode to 'force_generic_plan' Note 2: The individual TPS figures as measured are quite noisy, though I just want to show the rough trend with increasing number of partitions. pgbench -i -s 10 --partitions={0, 10, 100, 1000} pgbench -T120 -f test.sql -M prepared test.sql: \set aid random(1, 100) update pgbench_accounts set abalance = abalance + 1 where aid = :aid; Without any of the patches: 0 tps = 13045.485121 (excluding connections establishing) 10 tps = 9358.157433 (excluding connections establishing) 100 tps = 1878.274500 (excluding connections establishing) 1000tps = 84.684695 (excluding connections establishing) The slowdown as the partition count increases can be explained by the fact that UPDATE and DELETE can't currently use runtime partition pruning. So, even if any given transaction is only updating a single tuple in a single partition, the plans for *all* partitions are being initialized and also the ResultRelInfos. That is, a lot of useless work being done in InitPlan() and ExecInitModifyTable(). With the patches at [1] (latest 0001+0002 posted there), whereby the generic plan for UPDATE can now perform runtime pruning, numbers can be seen to improve, slightly: 0 tps = 12743.487196 (excluding connections establishing) 10 tps = 12644.240748 (excluding connections establishing) 100 tps = 4158.123345 (excluding connections establishing) 1000tps = 391.248067 (excluding connections establishing) So even though runtime pruning enabled by those patches ensures that the useless plans are left untouched by the executor, the ResultRelInfos are still being made assuming *all* result relations will be processed. With the attached patches (0001+0002+0003) that I want to discuss here in this thread, numbers are further improved: 0 tps = 13419.283168 (excluding connections establishing) 10 tps = 12588.016095 (excluding connections establishing) 100 tps = 8560.824225 (excluding connections establishing) 1000tps = 1926.553901 (excluding connections establishing) 0001 and 0002 are preparatory patches. 0003 teaches nodeModifyTable.c to make the ResultRelInfo for a given result relation lazily, that is, when the plan producing tuples to be updated/deleted actually produces one that belongs to that relation. So, if a transaction only updates one tuple, then only one ResultRelInfo would be made. For larger partition counts, that saves significant amount of work. However, there's one new loop in ExecInitModifyTable() added by the patches at [1] that loops over all partitions, which I haven't been able to eliminate so far and I'm seeing it cause significant bottleneck at higher partition counts. The loop is meant to create a hash table that maps result relation OIDs to their offsets in the PlannedStmt.resultRelations list. We need this mapping, because the ResultRelInfos are accessed from the query-global array using that offset. One approach that was mentioned by David Rowley at [1] to not have do this mapping is to make the result relation's scan node's targetlist emit the relation's RT index or ordinal position to begin with, instead of the table OID, but I haven't figured out a way to do that. Having taken care of the ModifyTable overheads (except the one mentioned in the last paragraph), a few more bottlenecks are seen to pop up at higher partition counts. Basically, they result from doing some pre-execution actions on relations contained in the plan by traversing the flat range table in whole. 1. AcquireExecutorLocks(): locks *all* partitions before executing the plan tree but runtime pruning allows to skip scanning all but one 2. ExecCheckRTPerms(): checks permissions of *all* partitions before executing the plan tree, but maybe it's okay to check only the ones that will be accessed Problem 1 has been discussed before and David Rowley even developed a patch that was discussed at [2]. The approach taken in the patch was to delay locking of the partitions contained in a generic plan that are potentially runtime pruneable, although as also described in the linked thread, that approach has a race condition whereby a concurrent session may invalidate the generic plan by altering a partition in the window between when AcquireExecutorLocks() runs on the plan and the plan is executed. Another solution suggested to me by Robert Haas in an off-list discussion is to teach AcquireExecutorLocks() or the nearby code to perform EXTERN parameter based pruning before passing the plan tree to the executor and lock partitions that survive that pruning. It's perhaps doable if we refactor the ExecFindInitialMatchingSubPlans() to not require a full-blown