On Tue, Apr 1, 2025 at 2:09 PM Amit Kapila wrote:

> 
> On Mon, Mar 31, 2025 at 5:0 PM Zhijie Hou (Fujitsu) 
> <houzj.f...@fujitsu.com> wrote:
> >
> > Thanks for the comments, they have been addressed in V2.
> >
> 
> In the PG17 patch, we only have a check preventing failover and 
> two_phase in CreateSubscription. Don't we need a similar check for 
> AlterSubscription?
> 
> Apart from the above, I have a few suggestions for changes in docs, 
> error messages, and comments for both versions. See attached.

Thanks for the comments.

Here is the V3 patch set which addressed all the comments.

I also added a testcase for ALTER SUB in 0002 as suggested by
Kuroda-san off-list.

Additionally, I found that a test failed in PG17 following this
patch because it involved creating a subscription with both failover and
two-phase enabled. Since that test was designed to test the upgrade of
replication slots and is not directly related to failover functionality, I
decided to disable the failover option for test case.

Best Regards,
Hou zj

Attachment: v3-0001-Fix-slot-synchronization-with-two_phase-decoding-.patch
Description: v3-0001-Fix-slot-synchronization-with-two_phase-decoding-.patch

From 0c9a8086913ca2e13106df386544821f39c1d41c Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.f...@cn.fujitsu.com>
Date: Mon, 31 Mar 2025 16:28:57 +0800
Subject: [PATCH v3] Disallow enabling failover for a replication slot that
 enables two-phase decoding

This commit fixes a bug for slot synchronization with logical replication
slots that enabled two_phase decoding. As it stands, transactions prepared
before two-phase decoding is enabled may fail to replicate to the subscriber
after being committed on a promoted standby following a failover.

The issue arises because the two_phase_at field of a slot, which tracks the LSN
from which two-phase decoding starts, is not synchronized to standby servers.
Without this field, the logical decoding might incorrectly identify prepared
transaction as already replicated to the subscriber, causing them to be
skipped.

To address the issue on HEAD, this commit makes the two_phase_at field of the 
slot
visible in the pg_replication_slots view and enables the slot synchronization
to copy this value to the corresponding synced slot on the standby server.

The bug has been present since the introduction of slot synchronization in
PostgreSQL 17. However, due to the need for catalog changes, backpatching this
fix is not feasible. Instead, to prevent the risk of losing prepared
transactions in prior versions, we now disallow enabling failover and two-phase
decoding together for a replication slot.
---
 contrib/test_decoding/expected/slot.out       |  2 ++
 contrib/test_decoding/sql/slot.sql            |  1 +
 src/backend/commands/subscriptioncmds.c       | 25 +++++++++++++++++
 src/backend/replication/slot.c                | 28 +++++++++++++++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  6 ++--
 .../t/040_standby_failover_slots_sync.pl      | 23 +++++++++++++++
 src/test/regress/expected/subscription.out    |  3 ++
 src/test/regress/sql/subscription.sql         |  4 +++
 8 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out 
b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..8fd762dea85 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', 
true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_twophase_true_slot', 
'test_decoding', false, true, true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql 
b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 
'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 
'test_decoding', false, false);
 SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', 
true, false, true);
+SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_twophase_true_slot', 
'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..6db3c9347fa 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,18 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
                                 errmsg("password_required=false is 
superuser-only"),
                                 errhint("Subscriptions with the 
password_required option set to false may only be created or modified by the 
superuser.")));
 
+       /*
+        * Do not allow users to enable failover and two_phase option together.
+        *
+        * See comments atop the similar check in ReplicationSlotCreate() for
+        * detailed reasons.
+        */
+       if (opts.twophase && opts.failover)
+               ereport(ERROR,
+                               errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                               errmsg("\"%s\" and \"%s\" are mutually 
exclusive options",
+                                          "failover", "two_phase"));
+
        /*
         * If built with appropriate switch, whine when regression-testing
         * conventions for subscription names are violated.
@@ -1245,6 +1257,19 @@ AlterSubscription(ParseState *pstate, 
AlterSubscriptionStmt *stmt,
                                                                 errmsg("cannot 
set %s for enabled subscription",
                                                                                
"failover")));
 
+                                       /*
+                                        * Do not allow users to enable 
failover and two_phase
+                                        * option together.
+                                        *
+                                        * See comments atop the similar check 
in
+                                        * ReplicationSlotCreate() for detailed 
reasons.
+                                        */
+                                       if (sub->twophasestate != 
LOGICALREP_TWOPHASE_STATE_DISABLED &&
+                                               opts.failover)
+                                               ereport(ERROR,
+                                                               
errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                                               errmsg("cannot 
enable failover for a two_phase enabled subscription"));
+
                                        /*
                                         * The changed failover option of the 
slot can't be rolled
                                         * back.
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..f40028bd941 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,22 @@ ReplicationSlotCreate(const char *name, bool db_specific,
                        ereport(ERROR,
                                        errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                                        errmsg("cannot enable failover for a 
temporary replication slot"));
+
+               /*
+                * Do not allow users to enable failover for slots that enable
+                * two-phase decoding.
+                *
+                * This is because the two_phase_at field of a slot, which 
tracks the
+                * LSN from which two-phase decoding starts, is not 
synchronized to
+                * standby servers. Without this field, the logical decoding 
might
+                * incorrectly identify prepared transaction as already 
replicated to
+                * the subscriber, causing them to be skipped.
+                */
+               if (two_phase)
+                       ereport(ERROR,
+                                       errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                       errmsg("\"%s\" and \"%s\" are mutually 
exclusive options",
+                                                  "failover", "two_phase"));
        }
 
        /*
@@ -848,6 +864,18 @@ ReplicationSlotAlter(const char *name, bool failover)
                                errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                                errmsg("cannot enable failover for a temporary 
replication slot"));
 
+       /*
+        * Do not allow users to enable failover for slots that enable two-phase
+        * decoding.
+        *
+        * See comments atop the similar check in ReplicationSlotCreate() for
+        * detailed reasons.
+        */
+       if (failover && MyReplicationSlot->data.two_phase)
+               ereport(ERROR,
+                               errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                               errmsg("cannot enable failover for a two-phase 
enabled replication slot"));
+
        if (MyReplicationSlot->data.failover != failover)
        {
                SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl 
b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
        'postgres', qq[
        CREATE TABLE tbl (a int);
-       CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION 
regress_pub WITH (two_phase = 'true', failover = 'true')
+       CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION 
regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old 
cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-       "SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+       "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl 
b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..6e68ed5d4bb 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,29 @@ my ($result, $stdout, $stderr) = 
$subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
        "altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option cannot be enabled for a two_phase enabled
+# subscription.
+##################################################
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+       "CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' 
PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, create_slot = false, 
enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+       "ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~ /ERROR:  cannot enable failover for a two_phase enabled 
subscription/,
+       "Enabling failover is not allowed for a two_phase enabled 
subscription");
+
+# Drop the subscription
+$subscriber1->safe_psql(
+       'postgres', q{
+ALTER SUBSCRIPTION regress_mysub2 SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_mysub2;
+});
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby 
server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out 
b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..bbd83244ba9 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,9 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' 
PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql 
b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' 
PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.30.0.windows.2

Reply via email to