Hi, On Fri, Jul 03, 2026 at 03:45:34PM +0530, Amit Kapila wrote: > On Fri, Jul 3, 2026 at 1:38 PM Zhijie Hou (Fujitsu) > <[email protected]> wrote: > > > > On Friday, July 3, 2026 1:53 PM Bertrand Drouvot > > <[email protected]> wrote: > > > > > > > but given the patch's simplicity, I recommend backpatching. > > > > > > That's right but that would only improve error messages. That said, > > > looking > > > closer, they are elog() ones, so "not expected" to occur so yeah backpatch > > > does make sense. > > > > +1 for backpatching, even if it's rare, the "ERROR: tuple concurrently > > updated" > > message seems confusing to me. > > > > I also think backpatching makes sense. BTW, I have a comment:
Thanks for looking at it! > + heap_freetuple(tup); > + tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, ObjectIdGetDatum(MyDatabaseId), > + CStringGetDatum(stmt->subname)); > > heap_freetuple() could be done before acquiring the lock, is there a > reason to keep it after lock? No particular reason, could be done before. Done in 0001 attached. > > > > > > > That said, what about also fixing DropSubscription() like in the 0002 > > > attached? > > > (that would also produce those elog() messages in case of concurrent DROP > > > or > > > ALTER). > > > > For the patch, I'm not sure if we must repeat the checks twice. Could we > > simply move the original checks to after we take the lock? At least, the > > GetSubscription() call and the password check can be moved there and old > > codes > > can be deleted. > > > > Isn't the same true for the AlterSubscription() case as well? I think there is no need to lock if we are later going to disallow changing the subscription data due to the password_required/superuser check. That said moving it as suggested by Hou-san, does simplify the code and the lock is not held for long, so done that way in 0001. > Also, I > noticed that AlterPublication() does the same trick but it uses > PUBLICATIONOID cacheid, so shouldn't we use SUBSCRIPTIONOID cacheid > here as well? I think this is to prevent the case where the same name > pub/sub is recreated after lock. Oh right and I did it that way in 0001 and 0002. But while doing this and looking closely, I'm not sure AlterPublication() does it right. Indeed, in theory, the OID could have been re-used too (between the time we did the name resolution and the time we lock the publication). I think what is needed is something similar to RangeVarGetRelidExtended(), means do the name resolution, acl check (ownership) and lock acquisition, all in unison. That's what 0003 is trying to achieve for the subscription and 0004 for the publication. What do you think? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
>From 37c9434730537f124223aed18e874103ad37969e Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <[email protected]> Date: Fri, 3 Jul 2026 12:28:42 +0000 Subject: [PATCH v3 1/4] Re-read subscription state after lock in AlterSubscription AlterSubscription() reads the subscription's catalog state via GetSubscription() before acquiring AccessExclusiveLock on the subscription object. A concurrent session that commits a DROP or ALTER between the read and the lock acquisition leaves the other session acting with stale information once it unblocks. Fix by moving the GetSubscription() call, the password_required privilege check, and the local variable assignments to after LockSharedObject(), with a re-read of the subscription tuple to ensure we operate on current catalog state. Remark: The ownership check is intentionally not re-done after the lock because AlterSubscriptionOwner() does not take AccessExclusiveLock on the subscription object: it only takes RowExclusiveLock on the pg_subscription catalog table. This means ownership can change regardless of our lock, making a re-check after lock acquisition pointless. The existing "tuple concurrently updated" error from CatalogTupleUpdate() already provides a protection if ownership changes concurrently. Author: Bertrand Drouvot <[email protected]> Reviewed-by: Dilip Kumar <[email protected]> Reviewed-by: Hayato Kuroda (Fujitsu) <[email protected]> Reviewed-by: Zhijie Hou <[email protected]> Reviewed-by: Amit Kapila <[email protected]> Discussion: https://postgr.es/m/akZUpiDa1UfmzYxL%40bdtpg --- src/backend/commands/subscriptioncmds.c | 30 +++++++++++++++++++------ 1 file changed, 23 insertions(+), 7 deletions(-) 100.0% src/backend/commands/ diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 4292e7fb8f4..517d46f47f9 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -1686,6 +1686,25 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, orig_conninfo_needed = false; } + heap_freetuple(tup); + + /* Lock the subscription so nobody else can do anything with it. */ + LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock); + + /* + * Re-read the subscription tuple after acquiring the lock. A concurrent + * DROP or ALTER may have committed before we acquired the lock. + */ + tup = SearchSysCacheCopy1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid)); + + if (!HeapTupleIsValid(tup)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("subscription \"%s\" does not exist", + stmt->subname))); + + form = (Form_pg_subscription) GETSTRUCT(tup); + /* * Skip ACL checks on the subscription's foreign server, if any. If * changing the server (or replacing it with a raw connection), then the @@ -1695,11 +1714,6 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, */ sub = GetSubscription(subid, false, orig_conninfo_needed, false); - retain_dead_tuples = sub->retaindeadtuples; - origin = sub->origin; - max_retention = sub->maxretention; - retention_active = sub->retentionactive; - /* * Don't allow non-superuser modification of a subscription with * password_required=false. @@ -1710,8 +1724,10 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, errmsg("password_required=false is superuser-only"), errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser."))); - /* Lock the subscription so nobody else can do anything with it. */ - LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock); + retain_dead_tuples = sub->retaindeadtuples; + origin = sub->origin; + max_retention = sub->maxretention; + retention_active = sub->retentionactive; /* Form a new tuple. */ memset(values, 0, sizeof(values)); -- 2.34.1
>From 58b9cfaa5f806d527ad36904f8b3b67c05b70478 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <[email protected]> Date: Fri, 3 Jul 2026 11:54:29 +0000 Subject: [PATCH v3 2/4] Re-read subscription state after lock in DropSubscription Similarly to what has been done for AlterSubscription() in XXXX, re-read the subscription tuple after LockSharedObject() in DropSubscription(). A concurrent DROP or ALTER may have committed while we were waiting for the lock. Without a re-read, DropSubscription would deal with invalid data, which currently produces a confusing "tuple concurrently updated" elog() from CatalogTupleDelete(). Author: Bertrand Drouvot <[email protected]> Reviewed-by: Zhijie Hou <[email protected]> Reviewed-by: Amit Kapila <[email protected]> Discussion: https://postgr.es/m/akZUpiDa1UfmzYxL%40bdtpg --- src/backend/commands/subscriptioncmds.c | 36 ++++++++++++++++++------- 1 file changed, 27 insertions(+), 9 deletions(-) 100.0% src/backend/commands/ diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 517d46f47f9..e23b366a87d 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -2567,17 +2567,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel) return; } - datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup, - Anum_pg_subscription_subconninfo, &isnull); - if (!isnull) - subconninfo = TextDatumGetCString(datum); - form = (Form_pg_subscription) GETSTRUCT(tup); subid = form->oid; - subowner = form->subowner; - subserver = form->subserver; - subconflictlogrelid = form->subconflictlogrelid; - must_use_password = !superuser_arg(subowner) && form->subpasswordrequired; /* must be owner */ if (!object_ownercheck(SubscriptionRelationId, subid, GetUserId())) @@ -2587,12 +2578,39 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel) /* DROP hook for the subscription being removed */ InvokeObjectDropHook(SubscriptionRelationId, subid, 0); + ReleaseSysCache(tup); + /* * Lock the subscription so nobody else can do anything with it (including * the replication workers). */ LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock); + /* + * Re-read the subscription tuple after acquiring the lock. A concurrent + * ALTER or DROP may have committed before we acquired the lock. + */ + tup = SearchSysCache1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid)); + + if (!HeapTupleIsValid(tup)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("subscription \"%s\" does not exist", + stmt->subname))); + + form = (Form_pg_subscription) GETSTRUCT(tup); + subowner = form->subowner; + subserver = form->subserver; + subconflictlogrelid = form->subconflictlogrelid; + must_use_password = !superuser_arg(subowner) && form->subpasswordrequired; + + datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup, + Anum_pg_subscription_subconninfo, &isnull); + if (!isnull) + subconninfo = TextDatumGetCString(datum); + else + subconninfo = NULL; + /* Get subname */ datum = SysCacheGetAttrNotNull(SUBSCRIPTIONOID, tup, Anum_pg_subscription_subname); -- 2.34.1
>From 0915ae18376961d702695bb6617b8c04b0e50bf1 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <[email protected]> Date: Fri, 3 Jul 2026 14:03:13 +0000 Subject: [PATCH v3 3/4] Add invalidation-based retry loop for Alter/Drop Subscription Following the approach of RangeVarGetRelidExtended() for relations, add a retry loop that includes name resolution, ownership check, and lock acquisition in AlterSubscription() and DropSubscription(). The loop records SharedInvalidMessageCounter, resolves the subscription name to an OID, checks ownership, then locks the subscription. If the invalidation counter changed (indicating concurrent DDL), we save the current OID and retry. On the next iteration, if the name still resolves to the same OID, we're done (already holding the correct lock). If it resolves to a different OID, we release the old lock and acquire the new one. This mirrors RangeVarGetRelidExtended()'s behavior: the lock is kept across retries to avoid a window where another session could have committed concurrent DDL modifying the ownership and/or the name resolution. Author: Bertrand Drouvot <[email protected]> Reviewed-by: Discussion: https://postgr.es/m/akZUpiDa1UfmzYxL%40bdtpg --- src/backend/commands/subscriptioncmds.c | 167 +++++++++++++++++------- 1 file changed, 118 insertions(+), 49 deletions(-) 100.0% src/backend/commands/ diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index e23b366a87d..866341c2cfb 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -50,6 +50,7 @@ #include "replication/walsender.h" #include "replication/worker_internal.h" #include "storage/lmgr.h" +#include "storage/sinval.h" #include "utils/acl.h" #include "utils/builtins.h" #include "utils/guc.h" @@ -1592,23 +1593,67 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, rel = table_open(SubscriptionRelationId, RowExclusiveLock); - /* Fetch the existing tuple. */ - tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, ObjectIdGetDatum(MyDatabaseId), - CStringGetDatum(stmt->subname)); + /* + * Lock the subscription so nobody else can do anything with it. + * + * Like RangeVarGetRelidExtended() does for relations, we resolve the + * name, check ownership, and lock inside a loop. If invalidation messages + * arrive (indicating concurrent DDL), we retry. We keep the lock held + * across retries and only release it if the name resolves to a different + * OID on the next iteration. + */ + { + Oid oldSubId = InvalidOid; + bool retry = false; - if (!HeapTupleIsValid(tup)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg("subscription \"%s\" does not exist", - stmt->subname))); + for (;;) + { + uint64 inval_count = SharedInvalidMessageCounter; - form = (Form_pg_subscription) GETSTRUCT(tup); - subid = form->oid; + tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, + ObjectIdGetDatum(MyDatabaseId), + CStringGetDatum(stmt->subname)); - /* must be owner */ - if (!object_ownercheck(SubscriptionRelationId, subid, GetUserId())) - aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_SUBSCRIPTION, - stmt->subname); + if (!HeapTupleIsValid(tup)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("subscription \"%s\" does not exist", + stmt->subname))); + + form = (Form_pg_subscription) GETSTRUCT(tup); + subid = form->oid; + + if (!object_ownercheck(SubscriptionRelationId, subid, + GetUserId())) + aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_SUBSCRIPTION, + stmt->subname); + + /* + * If upon retry we get the same OID, the invalidation messages + * did not change the final answer. So we're done. If we got a + * different OID, unlock the old one and lock the new one below. + */ + if (retry) + { + if (subid == oldSubId) + break; + UnlockSharedObject(SubscriptionRelationId, oldSubId, 0, + AccessExclusiveLock); + } + + LockSharedObject(SubscriptionRelationId, subid, 0, + AccessExclusiveLock); + + /* If no invalidation messages, we're done. */ + if (inval_count == SharedInvalidMessageCounter) + break; + + /* Something may have changed, retry. */ + retry = true; + oldSubId = subid; + heap_freetuple(tup); + } + } /* parse and check options */ switch (stmt->kind) @@ -1686,25 +1731,6 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, orig_conninfo_needed = false; } - heap_freetuple(tup); - - /* Lock the subscription so nobody else can do anything with it. */ - LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock); - - /* - * Re-read the subscription tuple after acquiring the lock. A concurrent - * DROP or ALTER may have committed before we acquired the lock. - */ - tup = SearchSysCacheCopy1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid)); - - if (!HeapTupleIsValid(tup)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg("subscription \"%s\" does not exist", - stmt->subname))); - - form = (Form_pg_subscription) GETSTRUCT(tup); - /* * Skip ACL checks on the subscription's foreign server, if any. If * changing the server (or replacing it with a raw connection), then the @@ -2570,11 +2596,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel) form = (Form_pg_subscription) GETSTRUCT(tup); subid = form->oid; - /* must be owner */ - if (!object_ownercheck(SubscriptionRelationId, subid, GetUserId())) - aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_SUBSCRIPTION, - stmt->subname); - /* DROP hook for the subscription being removed */ InvokeObjectDropHook(SubscriptionRelationId, subid, 0); @@ -2583,20 +2604,68 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel) /* * Lock the subscription so nobody else can do anything with it (including * the replication workers). + * + * Like RangeVarGetRelidExtended() does for relations, we resolve the + * name, check ownership, and lock inside a loop. If invalidation messages + * arrive (indicating concurrent DDL), we retry. We keep the lock held + * across retries and only release it if the name resolves to a different + * OID on the next iteration. */ - LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock); + { + Oid oldSubId = InvalidOid; + bool retry = false; - /* - * Re-read the subscription tuple after acquiring the lock. A concurrent - * ALTER or DROP may have committed before we acquired the lock. - */ - tup = SearchSysCache1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid)); + for (;;) + { + uint64 inval_count = SharedInvalidMessageCounter; - if (!HeapTupleIsValid(tup)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg("subscription \"%s\" does not exist", - stmt->subname))); + tup = SearchSysCache2(SUBSCRIPTIONNAME, + ObjectIdGetDatum(MyDatabaseId), + CStringGetDatum(stmt->subname)); + + if (!HeapTupleIsValid(tup)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("subscription \"%s\" does not exist", + stmt->subname))); + + form = (Form_pg_subscription) GETSTRUCT(tup); + subid = form->oid; + + if (!object_ownercheck(SubscriptionRelationId, subid, + GetUserId())) + { + ReleaseSysCache(tup); + aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_SUBSCRIPTION, + stmt->subname); + } + + /* + * If upon retry we get the same OID, the invalidation messages + * did not change the final answer. So we're done. If we got a + * different OID, unlock the old one and lock the new one below. + */ + if (retry) + { + if (subid == oldSubId) + break; + UnlockSharedObject(SubscriptionRelationId, oldSubId, 0, + AccessExclusiveLock); + } + + LockSharedObject(SubscriptionRelationId, subid, 0, + AccessExclusiveLock); + + /* If no invalidation messages, we're done. */ + if (inval_count == SharedInvalidMessageCounter) + break; + + /* Something may have changed, retry. */ + retry = true; + oldSubId = subid; + ReleaseSysCache(tup); + } + } form = (Form_pg_subscription) GETSTRUCT(tup); subowner = form->subowner; -- 2.34.1
>From f8f68c999dbd52ed566a9953b35292a1218015fe Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <[email protected]> Date: Fri, 3 Jul 2026 14:46:42 +0000 Subject: [PATCH v3 4/4] Add invalidation-based retry loop for AlterPublication Apply the same RangeVarGetRelidExtended() style retry loop to AlterPublication()'s tables/schemas branch that was added for subscriptions in commit XXXX. Previously, this branch resolved the publication name and checked ownership at the top of AlterPublication(), then locked and re-read by OID. This left a window where concurrent DDL could have modified the ownership and/or the name resolution Now the tables/schemas branch has its own complete retry loop: name resolution, ownership check, and lock acquisition all inside the loop. Author: Bertrand Drouvot <[email protected]> Reviewed-by: Discussion: https://postgr.es/m/akZUpiDa1UfmzYxL%40bdtpg --- src/backend/commands/publicationcmds.c | 105 +++++++++++++++++-------- 1 file changed, 72 insertions(+), 33 deletions(-) 100.0% src/backend/commands/ diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 440adb356ad..dfd707bc7d7 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -39,6 +39,7 @@ #include "parser/parse_relation.h" #include "rewrite/rewriteHandler.h" #include "storage/lmgr.h" +#include "storage/sinval.h" #include "utils/acl.h" #include "utils/builtins.h" #include "utils/inval.h" @@ -1662,54 +1663,92 @@ AlterPublication(ParseState *pstate, AlterPublicationStmt *stmt) rel = table_open(PublicationRelationId, RowExclusiveLock); - tup = SearchSysCacheCopy1(PUBLICATIONNAME, - CStringGetDatum(stmt->pubname)); + if (stmt->options) + { + tup = SearchSysCacheCopy1(PUBLICATIONNAME, + CStringGetDatum(stmt->pubname)); - if (!HeapTupleIsValid(tup)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg("publication \"%s\" does not exist", - stmt->pubname))); + if (!HeapTupleIsValid(tup)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("publication \"%s\" does not exist", + stmt->pubname))); - pubform = (Form_pg_publication) GETSTRUCT(tup); + pubform = (Form_pg_publication) GETSTRUCT(tup); - /* must be owner */ - if (!object_ownercheck(PublicationRelationId, pubform->oid, GetUserId())) - aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_PUBLICATION, - stmt->pubname); + /* must be owner */ + if (!object_ownercheck(PublicationRelationId, pubform->oid, + GetUserId())) + aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_PUBLICATION, + stmt->pubname); - if (stmt->options) AlterPublicationOptions(pstate, stmt, rel, tup); + } else { List *relations = NIL; List *exceptrelations = NIL; List *schemaidlist = NIL; - Oid pubid = pubform->oid; + Oid pubid; - ObjectsInPublicationToOids(stmt->pubobjects, pstate, &relations, - &exceptrelations, &schemaidlist); + /* + * Lock the publication so nobody else can do anything with it. + * + * Like RangeVarGetRelidExtended() does for relations, we resolve the + * name, check ownership, and lock inside a loop. If invalidation + * messages arrive (indicating concurrent DDL), we retry. We keep the + * lock held across retries and only release it if the name resolves + * to a different OID on the next iteration. + */ + { + Oid oldPubId = InvalidOid; + bool retry = false; - CheckAlterPublication(stmt, tup, relations, schemaidlist); + for (;;) + { + uint64 inval_count = SharedInvalidMessageCounter; - heap_freetuple(tup); + tup = SearchSysCacheCopy1(PUBLICATIONNAME, + CStringGetDatum(stmt->pubname)); - /* Lock the publication so nobody else can do anything with it. */ - LockDatabaseObject(PublicationRelationId, pubid, 0, - AccessExclusiveLock); + if (!HeapTupleIsValid(tup)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("publication \"%s\" does not exist", + stmt->pubname))); - /* - * It is possible that by the time we acquire the lock on publication, - * concurrent DDL has removed it. We can test this by checking the - * existence of publication. We get the tuple again to avoid the risk - * of any publication option getting changed. - */ - tup = SearchSysCacheCopy1(PUBLICATIONOID, ObjectIdGetDatum(pubid)); - if (!HeapTupleIsValid(tup)) - ereport(ERROR, - errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg("publication \"%s\" does not exist", - stmt->pubname)); + pubform = (Form_pg_publication) GETSTRUCT(tup); + pubid = pubform->oid; + + if (!object_ownercheck(PublicationRelationId, pubid, + GetUserId())) + aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_PUBLICATION, + stmt->pubname); + + if (retry) + { + if (pubid == oldPubId) + break; + UnlockDatabaseObject(PublicationRelationId, oldPubId, 0, + AccessExclusiveLock); + } + + LockDatabaseObject(PublicationRelationId, pubid, 0, + AccessExclusiveLock); + + if (inval_count == SharedInvalidMessageCounter) + break; + + retry = true; + oldPubId = pubid; + heap_freetuple(tup); + } + } + + ObjectsInPublicationToOids(stmt->pubobjects, pstate, &relations, + &exceptrelations, &schemaidlist); + + CheckAlterPublication(stmt, tup, relations, schemaidlist); relations = list_concat(relations, exceptrelations); AlterPublicationTables(stmt, tup, relations, pstate->p_sourcetext, -- 2.34.1
