Hi, (added Fujita-san)
I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos to use for partition routing targets. Specifically, the bug occurs when UPDATE targets include a foreign partition that is locally modified (as opposed to being modified directly on the remove server) and its ResultRelInfo (called subplan result rel in the source code) is reused for tuple routing: -- setup create extension postgres_fdw ; create server loopback foreign data wrapper postgres_fdw; create user mapping for current_user server loopback; create table p (a int) partition by list (a); create table p1 partition of p for values in (1); create table p2base (a int check (a = 2)); create foreign table p2 partition of p for values in (2) server loopback options (table_name 'p2base'); insert into p values (1), (2); -- see in the plan that foreign partition p2 is locally modified explain verbose update p set a = 2 from (values (1), (2)) s(x) where a = s.x returning *; QUERY PLAN ───────────────────────────────────────────────────────────────────────────────── Update on public.p (cost=0.05..236.97 rows=50 width=42) Output: p1.a, "*VALUES*".column1 Update on public.p1 Foreign Update on public.p2 Remote SQL: UPDATE public.p2base SET a = $2 WHERE ctid = $1 RETURNING a -> Hash Join (cost=0.05..45.37 rows=26 width=42) Output: 2, p1.ctid, "*VALUES*".*, "*VALUES*".column1 Hash Cond: (p1.a = "*VALUES*".column1) -> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10) Output: p1.ctid, p1.a -> Hash (cost=0.03..0.03 rows=2 width=32) Output: "*VALUES*".*, "*VALUES*".column1 -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=32) Output: "*VALUES*".*, "*VALUES*".column1 -> Hash Join (cost=100.05..191.59 rows=24 width=42) Output: 2, p2.ctid, "*VALUES*".*, "*VALUES*".column1 Hash Cond: (p2.a = "*VALUES*".column1) -> Foreign Scan on public.p2 (cost=100.00..182.27 rows=2409 width=10) Output: p2.ctid, p2.a Remote SQL: SELECT a, ctid FROM public.p2base FOR UPDATE -> Hash (cost=0.03..0.03 rows=2 width=32) Output: "*VALUES*".*, "*VALUES*".column1 -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=32) Output: "*VALUES*".*, "*VALUES*".column1 -- as opposed to directly on remote side (because there's no local join) explain verbose update p set a = 2 returning *; QUERY PLAN ───────────────────────────────────────────────────────────────────────────── Update on public.p (cost=0.00..227.40 rows=5280 width=10) Output: p1.a Update on public.p1 Foreign Update on public.p2 -> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10) Output: 2, p1.ctid -> Foreign Update on public.p2 (cost=100.00..191.90 rows=2730 width=10) Remote SQL: UPDATE public.p2base SET a = 2 RETURNING a (8 rows) Running the first update query crashes: update p set a = 2 from (values (1), (2)) s(x) where a = s.x returning tableoid::regclass, *; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. The problem seems to occur because ExecSetupPartitionTupleRouting thinks it can reuse p2's ResultRelInfo for tuple routing. In this case, it can't be used, because its ri_FdwState contains information that will be needed for postgresExecForeignUpdate to work, but it's still used today. Because it's assigned to be used for tuple routing, its ri_FdwState will be overwritten by postgresBeginForeignInsert that's invoked by the tuple routing code using the aforementioned ResultRelInfo. Crash occurs when postgresExecForeignUpdate () can't find the information it's expecting in the ri_FdwState. The solution is to teach ExecSetupPartitionTupleRouting to avoid using a subplan result rel if its ri_FdwState is already set. I was wondering if it should also check ri_usesFdwDirectModify is true, but in that case, ri_FdwState is unused, so it's perhaps safe for tuple routing code to scribble on it. I have attached 2 patches, one for PG 11 where the problem first appeared and another for HEAD. The patch for PG 11 is significantly bigger due to having to handle the complexities of mapping of subplan result rel indexes to leaf partition indexes. Most of that complexity is gone in HEAD due to 3f2393edefa5, so the patch for HEAD is much simpler. I've also added a test in postgres_fdw.sql to exercise this test case. Thanks, Amit
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index cdd788f825..b108aad897 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -7730,6 +7730,45 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *; 1 | qux triggered ! (1 row) +-- Check case where the foreign partition is a subplan target rel and +-- foreign parttion is locally modified (target table being joined +-- locally prevents a direct/remote modification plan). +explain (verbose, costs off) +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *; + QUERY PLAN +------------------------------------------------------------------------------ + Update on public.utrtest + Output: remp.a, remp.b, "*VALUES*".column1 + Foreign Update on public.remp + Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b + Update on public.locp + -> Hash Join + Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1 + Hash Cond: (remp.a = "*VALUES*".column1) + -> Foreign Scan on public.remp + Output: remp.b, remp.ctid, remp.a + Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE + -> Hash + Output: "*VALUES*".*, "*VALUES*".column1 + -> Values Scan on "*VALUES*" + Output: "*VALUES*".*, "*VALUES*".column1 + -> Hash Join + Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1 + Hash Cond: (locp.a = "*VALUES*".column1) + -> Seq Scan on public.locp + Output: locp.b, locp.ctid, locp.a + -> Hash + Output: "*VALUES*".*, "*VALUES*".column1 + -> Values Scan on "*VALUES*" + Output: "*VALUES*".*, "*VALUES*".column1 +(24 rows) + +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *; + a | b | x +---+-----------------+--- + 1 | qux triggered ! | 1 +(1 row) + delete from utrtest; insert into utrtest values (2, 'qux'); -- Check case where the foreign partition isn't a subplan target rel diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 813286bba5..a27783cede 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -1932,6 +1932,13 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *; -- The new values are concatenated with ' triggered !' update utrtest set a = 1 where a = 1 or a = 2 returning *; +-- Check case where the foreign partition is a subplan target rel and +-- foreign parttion is locally modified (target table being joined +-- locally prevents a direct/remote modification plan). +explain (verbose, costs off) +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *; +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *; + delete from utrtest; insert into utrtest values (2, 'qux'); diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c index 011e3cff1a..090f86c340 100644 --- a/src/backend/executor/execPartition.c +++ b/src/backend/executor/execPartition.c @@ -136,6 +136,11 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel) { update_rri = mtstate->resultRelInfo; num_update_rri = list_length(node->plans); + /* + * Each of these will be set to either the leaf partition index + * to which a given subplan resultrel is mapped to or -1 if it + * cannot be used for tuple routing. + */ proute->subplan_partition_offsets = palloc(num_update_rri * sizeof(int)); proute->num_subplan_partition_offsets = num_update_rri; @@ -174,18 +179,29 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel) if (update_rri_index < num_update_rri && RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid) { - leaf_part_rri = &update_rri[update_rri_index]; - /* - * This is required in order to convert the partition's tuple to - * be compatible with the root partitioned table's tuple - * descriptor. When generating the per-subplan result rels, this - * was not set. + * Skip if an FDW is already using it. We must use a different + * one for tuple routing, because the latter will need its own + * FdwState. */ - leaf_part_rri->ri_PartitionRoot = rel; + if (update_rri[update_rri_index].ri_FdwState == NULL) + { + leaf_part_rri = &update_rri[update_rri_index]; - /* Remember the subplan offset for this ResultRelInfo */ - proute->subplan_partition_offsets[update_rri_index] = i; + /* + * This is required in order to convert the partition's tuple + * to be compatible with the root partitioned table's tuple + * descriptor. When generating the per-subplan result rels, + * this was not set. + */ + leaf_part_rri->ri_PartitionRoot = rel; + + /* Remember the subplan offset for this ResultRelInfo */ + proute->subplan_partition_offsets[update_rri_index] = i; + } + else + /* Mark this subplan's resultrel as unused. */ + proute->subplan_partition_offsets[update_rri_index] = -1; update_rri_index++; } @@ -195,8 +211,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel) } /* - * For UPDATE, we should have found all the per-subplan resultrels in the - * leaf partitions. (If this is an INSERT, both values will be zero.) + * For UPDATE, we should have gone through all the per-subplan resultrels + * while matching against leaf partitions. (If this is an INSERT, both + * values will be zero.) */ Assert(update_rri_index == num_update_rri); @@ -892,21 +909,26 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate, resultRelInfo); /* - * If this result rel is one of the UPDATE subplan result rels, let - * ExecEndPlan() close it. For INSERT or COPY, - * proute->subplan_partition_offsets will always be NULL. Note that - * the subplan_partition_offsets array and the partitions array have - * the partitions in the same order. So, while we iterate over - * partitions array, we also iterate over the - * subplan_partition_offsets array in order to figure out which of the - * result rels are present in the UPDATE subplans. + * If this partition routing result rel is one of the UPDATE subplan + * result rels, don't close it here, ExecEndPlan() will close it. + * For INSERT or COPY, proute->subplan_partition_offsets will always + * be NULL, because there are no subplan result rels in that case. + * Note that since partitions appear in the same order in both the + * subplan_partition_offsets array and the the partitions array, it's + * okay to match them like this. */ if (proute->subplan_partition_offsets && - subplan_index < proute->num_subplan_partition_offsets && - proute->subplan_partition_offsets[subplan_index] == i) + subplan_index < proute->num_subplan_partition_offsets) { - subplan_index++; - continue; + /* skip subplan result rels that were not reused. */ + while (proute->subplan_partition_offsets[subplan_index] < 0) + subplan_index++; + + if (proute->subplan_partition_offsets[subplan_index] == i) + { + subplan_index++; + continue; + } } ExecCloseIndices(resultRelInfo);
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 42108bd3d4..6471d69063 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -7828,6 +7828,45 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *; 1 | qux triggered ! (1 row) +-- Check case where the foreign partition is a subplan target rel and +-- foreign parttion is locally modified (target table being joined +-- locally prevents a direct/remote modification plan). +explain (verbose, costs off) +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *; + QUERY PLAN +------------------------------------------------------------------------------ + Update on public.utrtest + Output: remp.a, remp.b, "*VALUES*".column1 + Foreign Update on public.remp + Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b + Update on public.locp + -> Hash Join + Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1 + Hash Cond: (remp.a = "*VALUES*".column1) + -> Foreign Scan on public.remp + Output: remp.b, remp.ctid, remp.a + Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE + -> Hash + Output: "*VALUES*".*, "*VALUES*".column1 + -> Values Scan on "*VALUES*" + Output: "*VALUES*".*, "*VALUES*".column1 + -> Hash Join + Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1 + Hash Cond: (locp.a = "*VALUES*".column1) + -> Seq Scan on public.locp + Output: locp.b, locp.ctid, locp.a + -> Hash + Output: "*VALUES*".*, "*VALUES*".column1 + -> Values Scan on "*VALUES*" + Output: "*VALUES*".*, "*VALUES*".column1 +(24 rows) + +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *; + a | b | x +---+-----------------+--- + 1 | qux triggered ! | 1 +(1 row) + delete from utrtest; insert into utrtest values (2, 'qux'); -- Check case where the foreign partition isn't a subplan target rel diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index eb9d1ad59d..a18b0cc097 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -1971,6 +1971,13 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *; -- The new values are concatenated with ' triggered !' update utrtest set a = 1 where a = 1 or a = 2 returning *; +-- Check case where the foreign partition is a subplan target rel and +-- foreign parttion is locally modified (target table being joined +-- locally prevents a direct/remote modification plan). +explain (verbose, costs off) +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *; +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *; + delete from utrtest; insert into utrtest values (2, 'qux'); diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c index e41801662b..967a8f78b6 100644 --- a/src/backend/executor/execPartition.c +++ b/src/backend/executor/execPartition.c @@ -466,6 +466,14 @@ ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate, Oid partoid = RelationGetRelid(rri->ri_RelationDesc); SubplanResultRelHashElem *elem; + /* + * Skip if an FDW is already using it. We must use a different + * one for tuple routing, because the latter will need its own + * FdwState. + */ + if (rri->ri_FdwState) + continue; + elem = (SubplanResultRelHashElem *) hash_search(htab, &partoid, HASH_ENTER, &found); Assert(!found);