On Thu, Jul 17, 2025 at 4:21 PM Amit Kapila <amit.kapil...@gmail.com> wrote: >
> It seems the patch assumes that the above lock is sufficient because > AlterSubscription will take an AcessExclusiveLock on the same > subscription. So, with this proposal, if both of those become > serialized then the other locking order may not matter. Am I correct > or is there any other theory you have in mind? > > If that is the theory then I think we are missing cases where the > apply worker and Alter subscription operates on different > subscriptions. > > Consider AlterSubscription_refresh() takes first AccessExclusiveLock > on SubscriptionRelRelationId and then ExclusiveLock on > ReplicationOriginRelationId via replorigin_drop_by_name() . The apply > worker first takes ExclusiveLock on ReplicationOriginRelationId via > replorigin_drop_by_name() and then RowExclusiveLock on > SubscriptionRelRelationId via UpdateSubscriptionRelState(). Won't such > a scenario taking conflicting locks in reverse order can lead to > deadlock at least in PG15? Yes, this is correct. I have also verified this in my testing that when there is a second subscription, a deadlock can still happen with my previous patch. I have now modified the patch in tablesync worker to take locks on both SubscriptionRelationId and SubscriptionRelRelationId prior to taking lock on ReplicationOriginRelationId. I have also found that there is a similar problem in DropSubscription() where lock on SubscriptionRelRelationId is not taken before dropping the tracking origin. I've also modified the signature of UpdateSubscriptionRelState to take a bool "lock_needed" which if false, the SubscriptionRelationId is not locked. I've made a new version of the patch on PG_15. regards, Ajin Cherian Fujitsu Australia
v3-0001-Fix-a-deadlock-during-ALTER-SUBSCRIPTION.-DROP-PU.patch
Description: Binary data