Hi,

(added Fujita-san)

I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
to use for partition routing targets.  Specifically, the bug occurs when
UPDATE targets include a foreign partition that is locally modified (as
opposed to being modified directly on the remove server) and its
ResultRelInfo (called subplan result rel in the source code) is reused for
tuple routing:

-- setup
create extension postgres_fdw ;
create server loopback foreign data wrapper postgres_fdw;
create user mapping for current_user server loopback;
create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2base (a int check (a = 2));
create foreign table p2 partition of p for values in (2) server loopback
options (table_name 'p2base');
insert into p values (1), (2);

-- see in the plan that foreign partition p2 is locally modified
explain verbose update p set a = 2 from (values (1), (2)) s(x) where a =
s.x returning *;
                                   QUERY PLAN

─────────────────────────────────────────────────────────────────────────────────
 Update on public.p  (cost=0.05..236.97 rows=50 width=42)
   Output: p1.a, "*VALUES*".column1
   Update on public.p1
   Foreign Update on public.p2
     Remote SQL: UPDATE public.p2base SET a = $2 WHERE ctid = $1 RETURNING a
   ->  Hash Join  (cost=0.05..45.37 rows=26 width=42)
         Output: 2, p1.ctid, "*VALUES*".*, "*VALUES*".column1
         Hash Cond: (p1.a = "*VALUES*".column1)
         ->  Seq Scan on public.p1  (cost=0.00..35.50 rows=2550 width=10)
               Output: p1.ctid, p1.a
         ->  Hash  (cost=0.03..0.03 rows=2 width=32)
               Output: "*VALUES*".*, "*VALUES*".column1
               ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2
width=32)
                     Output: "*VALUES*".*, "*VALUES*".column1
   ->  Hash Join  (cost=100.05..191.59 rows=24 width=42)
         Output: 2, p2.ctid, "*VALUES*".*, "*VALUES*".column1
         Hash Cond: (p2.a = "*VALUES*".column1)
         ->  Foreign Scan on public.p2  (cost=100.00..182.27 rows=2409
width=10)
               Output: p2.ctid, p2.a
               Remote SQL: SELECT a, ctid FROM public.p2base FOR UPDATE
         ->  Hash  (cost=0.03..0.03 rows=2 width=32)
               Output: "*VALUES*".*, "*VALUES*".column1
               ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2
width=32)
                     Output: "*VALUES*".*, "*VALUES*".column1


-- as opposed to directly on remote side (because there's no local join)
explain verbose update p set a = 2 returning *;
                                 QUERY PLAN
─────────────────────────────────────────────────────────────────────────────
 Update on public.p  (cost=0.00..227.40 rows=5280 width=10)
   Output: p1.a
   Update on public.p1
   Foreign Update on public.p2
   ->  Seq Scan on public.p1  (cost=0.00..35.50 rows=2550 width=10)
         Output: 2, p1.ctid
   ->  Foreign Update on public.p2  (cost=100.00..191.90 rows=2730 width=10)
         Remote SQL: UPDATE public.p2base SET a = 2 RETURNING a
(8 rows)

Running the first update query crashes:

update p set a = 2 from (values (1), (2)) s(x) where a = s.x returning
tableoid::regclass, *;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The problem seems to occur because ExecSetupPartitionTupleRouting thinks
it can reuse p2's ResultRelInfo for tuple routing.  In this case, it can't
be used, because its ri_FdwState contains information that will be needed
for postgresExecForeignUpdate to work, but it's still used today.  Because
it's assigned to be used for tuple routing, its ri_FdwState will be
overwritten by postgresBeginForeignInsert that's invoked by the tuple
routing code using the aforementioned ResultRelInfo.  Crash occurs when
postgresExecForeignUpdate () can't find the information it's expecting in
the ri_FdwState.

The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
subplan result rel if its ri_FdwState is already set.  I was wondering if
it should also check ri_usesFdwDirectModify is true, but in that case,
ri_FdwState is unused, so it's perhaps safe for tuple routing code to
scribble on it.

I have attached 2 patches, one for PG 11 where the problem first appeared
and another for HEAD.  The patch for PG 11 is significantly bigger due to
having to handle the complexities of mapping of subplan result rel indexes
to leaf partition indexes.  Most of that complexity is gone in HEAD due to
3f2393edefa5, so the patch for HEAD is much simpler.  I've also added a
test in postgres_fdw.sql to exercise this test case.

Thanks,
Amit
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index cdd788f825..b108aad897 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7730,6 +7730,45 @@ update utrtest set a = 1 where a = 1 or a = 2 returning 
*;
  1 | qux triggered !
 (1 row)
 
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Update on public.utrtest
+   Output: remp.a, remp.b, "*VALUES*".column1
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+   Update on public.locp
+   ->  Hash Join
+         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->  Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->  Hash Join
+         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->  Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)
+
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+ a |        b        | x 
+---+-----------------+---
+ 1 | qux triggered ! | 1
+(1 row)
+
 delete from utrtest;
 insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition isn't a subplan target rel
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql 
b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 813286bba5..a27783cede 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1932,6 +1932,13 @@ update utrtest set a = 1 where a = 1 or a = 2 returning 
*;
 -- The new values are concatenated with ' triggered !'
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
 
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+
 delete from utrtest;
 insert into utrtest values (2, 'qux');
 
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 011e3cff1a..090f86c340 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -136,6 +136,11 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, 
Relation rel)
        {
                update_rri = mtstate->resultRelInfo;
                num_update_rri = list_length(node->plans);
+               /*
+                * Each of these will be set to either the leaf partition index
+                * to which a given subplan resultrel is mapped to or -1 if it
+                * cannot be used for tuple routing.
+                */
                proute->subplan_partition_offsets =
                        palloc(num_update_rri * sizeof(int));
                proute->num_subplan_partition_offsets = num_update_rri;
@@ -174,18 +179,29 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, 
Relation rel)
                if (update_rri_index < num_update_rri &&
                        
RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
                {
-                       leaf_part_rri = &update_rri[update_rri_index];
-
                        /*
-                        * This is required in order to convert the partition's 
tuple to
-                        * be compatible with the root partitioned table's tuple
-                        * descriptor.  When generating the per-subplan result 
rels, this
-                        * was not set.
+                        * Skip if an FDW is already using it.  We must use a 
different
+                        * one for tuple routing, because the latter will need 
its own
+                        * FdwState.
                         */
-                       leaf_part_rri->ri_PartitionRoot = rel;
+                       if (update_rri[update_rri_index].ri_FdwState == NULL)
+                       {
+                               leaf_part_rri = &update_rri[update_rri_index];
 
-                       /* Remember the subplan offset for this ResultRelInfo */
-                       proute->subplan_partition_offsets[update_rri_index] = i;
+                               /*
+                                * This is required in order to convert the 
partition's tuple
+                                * to be compatible with the root partitioned 
table's tuple
+                                * descriptor.  When generating the per-subplan 
result rels,
+                                * this was not set.
+                                */
+                               leaf_part_rri->ri_PartitionRoot = rel;
+
+                               /* Remember the subplan offset for this 
ResultRelInfo */
+                               
proute->subplan_partition_offsets[update_rri_index] = i;
+                       }
+                       else
+                               /* Mark this subplan's resultrel as unused. */
+                               
proute->subplan_partition_offsets[update_rri_index] = -1;
 
                        update_rri_index++;
                }
@@ -195,8 +211,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, 
Relation rel)
        }
 
        /*
-        * For UPDATE, we should have found all the per-subplan resultrels in 
the
-        * leaf partitions.  (If this is an INSERT, both values will be zero.)
+        * For UPDATE, we should have gone through all the per-subplan 
resultrels
+        * while matching against leaf partitions.  (If this is an INSERT, both
+        * values will be zero.)
         */
        Assert(update_rri_index == num_update_rri);
 
@@ -892,21 +909,26 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
                                                                                
                                   resultRelInfo);
 
                /*
-                * If this result rel is one of the UPDATE subplan result rels, 
let
-                * ExecEndPlan() close it. For INSERT or COPY,
-                * proute->subplan_partition_offsets will always be NULL. Note 
that
-                * the subplan_partition_offsets array and the partitions array 
have
-                * the partitions in the same order. So, while we iterate over
-                * partitions array, we also iterate over the
-                * subplan_partition_offsets array in order to figure out which 
of the
-                * result rels are present in the UPDATE subplans.
+                * If this partition routing result rel is one of the UPDATE 
subplan
+                * result rels, don't close it here, ExecEndPlan() will close 
it.
+                * For INSERT or COPY, proute->subplan_partition_offsets will 
always
+                * be NULL, because there are no subplan result rels in that 
case.
+                * Note that since partitions appear in the same order in both 
the
+                * subplan_partition_offsets array and the the partitions 
array, it's
+                * okay to match them like this.
                 */
                if (proute->subplan_partition_offsets &&
-                       subplan_index < proute->num_subplan_partition_offsets &&
-                       proute->subplan_partition_offsets[subplan_index] == i)
+                       subplan_index < proute->num_subplan_partition_offsets)
                {
-                       subplan_index++;
-                       continue;
+                       /* skip subplan result rels that were not reused. */
+                       while (proute->subplan_partition_offsets[subplan_index] 
< 0)
+                               subplan_index++;
+
+                       if (proute->subplan_partition_offsets[subplan_index] == 
i)
+                       {
+                               subplan_index++;
+                               continue;
+                       }
                }
 
                ExecCloseIndices(resultRelInfo);
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 42108bd3d4..6471d69063 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7828,6 +7828,45 @@ update utrtest set a = 1 where a = 1 or a = 2 returning 
*;
  1 | qux triggered !
 (1 row)
 
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Update on public.utrtest
+   Output: remp.a, remp.b, "*VALUES*".column1
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+   Update on public.locp
+   ->  Hash Join
+         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->  Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->  Hash Join
+         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->  Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)
+
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+ a |        b        | x 
+---+-----------------+---
+ 1 | qux triggered ! | 1
+(1 row)
+
 delete from utrtest;
 insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition isn't a subplan target rel
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql 
b/contrib/postgres_fdw/sql/postgres_fdw.sql
index eb9d1ad59d..a18b0cc097 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1971,6 +1971,13 @@ update utrtest set a = 1 where a = 1 or a = 2 returning 
*;
 -- The new values are concatenated with ' triggered !'
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
 
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+
 delete from utrtest;
 insert into utrtest values (2, 'qux');
 
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index e41801662b..967a8f78b6 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -466,6 +466,14 @@ ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
                Oid                     partoid = 
RelationGetRelid(rri->ri_RelationDesc);
                SubplanResultRelHashElem   *elem;
 
+               /*
+                * Skip if an FDW is already using it.  We must use a different
+                * one for tuple routing, because the latter will need its own
+                * FdwState.
+                */
+               if (rri->ri_FdwState)
+                       continue;
+
                elem = (SubplanResultRelHashElem *)
                        hash_search(htab, &partoid, HASH_ENTER, &found);
                Assert(!found);

Reply via email to