Re: Segfault on logical replication to partitioned table with foreign children

2022-10-31 Thread ilya . v . gladyshev


On Sun, 2022-10-30 at 09:39 -0400, Tom Lane wrote:
> Dilip Kumar  writes:
> > Yes, this looks like a bug and your fix seems correct to me.  It
> > would
> > be nice to add a test case for this scenario.
> 
> A test case doesn't seem that exciting to me.  If we were trying to
> make it actually work, then yeah, but throwing an error isn't that
> useful to test.  The code will be exercised by replication to a
> regular partitioned table (I assume we do have tests for that).
> 
> What I'm wondering about is whether we can refactor this code
> to avoid so many usually-useless catalog lookups.  Pulling the
> namespace name, in particular, is expensive and we generally
> are not going to need the result.  In the child-rel case it'd
> be much better to pass the opened relation and let the error-check
> subroutine work from that.  Maybe we should just do it like that
> at the start of the recursion, too?  Or pass the relid and let
> the subroutine look up the names only in the error case.

Sure, I think passing in the opened relation is a good idea.

> A completely different line of thought is that this doesn't seem
> like a terribly bulletproof fix, since children could get added to
> a partitioned table after we look.  Maybe it'd be better to check
> the relkind at the last moment before we do something that depends
> on a child table being a relation.

These checks are run both on subscription DDL commands, which is good
to get some early feedback, and inside logical_rel_open(), right before
something useful is about to get done to the relation, so we should be
good here. I think some tests would actually be nice to verify this,
but I don't really have a strong opinion about it.

I'll refactor the patch and post a bit later.






Segfault on logical replication to partitioned table with foreign children

2022-10-28 Thread ilya . v . gladyshev
Hi,

Right now it is possible to add a partitioned table with foreign tables
as its children as a target of a subscription. It can lead to an assert
(or a segfault, if compiled without asserts) on a logical replication
worker when the worker attempts to insert the data received via
replication into the foreign table. Reproduce with caution, the worker
is going to crash and restart indefinitely. The setup:

Publisher on 5432 port:

CREATE TABLE parent (id int, num int);
CREATE PUBLICATION parent_pub FOR TABLE parent;

Subscriber on 5433 port:

CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback foreign data wrapper postgres_fdw options (host
'127.0.0.1', port '5433', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
CREATE TABLE parent (id int, num int) partition by range (id);
CREATE FOREIGN TABLE p1 PARTITION OF parent DEFAULT SERVER loopback;
CREATE TABLE p1_loc(id int, num int);
CREATE SUBSCRIPTION parent_sub CONNECTION 'host=127.0.0.1 port=5432
dbname=postgres' PUBLICATION parent_pub;

Then run an insert on the publisher: INSERT INTO parent VALUES (1, 1);

This will cause a segfault or raise an assert, because inserting into
foreign tables via logical replication is not possible. The solution I
propose is to add recursive checks of relkind for children of a target,
if the target is a partitioned table. I have attached a patch for this
and managed to reproduce this on REL_14_STABLE as well, not sure if a
patch for that version is also needed.

Kind Regards,
Ilya Gladyshev

From 96f1fce8fb50f527b958de13a60f7324dd1ef052 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Fri, 28 Oct 2022 23:25:20 +0400
Subject: [PATCH] check relkind of subscription target recursively

---
 src/backend/commands/subscriptioncmds.c| 13 ---
 src/backend/executor/execReplication.c | 45 ++
 src/backend/replication/logical/relation.c |  7 ++--
 src/include/executor/executor.h|  4 +-
 4 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index f0cec2ad5e..e2dd6425e4 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -703,9 +703,10 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 
 relid = RangeVarGetRelid(rv, AccessShareLock, false);
 
-/* Check for supported relkind. */
-CheckSubscriptionRelkind(get_rel_relkind(relid),
-		 rv->schemaname, rv->relname);
+/* Check for supported relkind recursively. */
+CheckSubscriptionRelation(relid,
+		  get_rel_relkind(relid),
+		  rv->schemaname, rv->relname);
 
 AddSubscriptionRelState(subid, relid, table_state,
 		InvalidXLogRecPtr);
@@ -864,9 +865,9 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data,
 
 			relid = RangeVarGetRelid(rv, AccessShareLock, false);
 
-			/* Check for supported relkind. */
-			CheckSubscriptionRelkind(get_rel_relkind(relid),
-	 rv->schemaname, rv->relname);
+			/* Check for supported relkind recursively. */
+			CheckSubscriptionRelation(relid, get_rel_relkind(relid),
+	  rv->schemaname, rv->relname);
 
 			pubrel_local_oids[off++] = relid;
 
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 6014f2e248..98bc4a6618 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -16,9 +16,11 @@
 
 #include "access/genam.h"
 #include "access/relscan.h"
+#include "access/table.h"
 #include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "catalog/pg_inherits.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
@@ -645,13 +647,7 @@ CheckCmdReplicaIdentity(Relation rel, CmdType cmd)
  errhint("To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.")));
 }
 
-
-/*
- * Check if we support writing into specific relkind.
- *
- * The nspname and relname are only needed for error reporting.
- */
-void
+static void
 CheckSubscriptionRelkind(char relkind, const char *nspname,
 		 const char *relname)
 {
@@ -662,3 +658,38 @@ CheckSubscriptionRelkind(char relkind, const char *nspname,
 		nspname, relname),
  errdetail_relkind_not_supported(relkind)));
 }
+
+/*
+ * Recursively check if we support writing into specific relkind.
+ *
+ * The nspname and relname are only needed for error reporting.
+ */
+void
+CheckSubscriptionRelation(Oid relid, char relkind, const char *nspname,
+		  const char *relname)
+{
+	CheckSubscriptionRelkind(relkind, nspname, relname);
+
+	if (relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		List *inheritors;
+		ListCell *lc;
+
+		inheritors = find_all_inheritors(relid,
+		 AccessShareLock,
+		 NULL);
+		foreach (lc, inheritors)
+		{
+			Oid child_oid = lfirst_oid(lc);
+			Relation child_rel = RelationIdGetRelation(child_