On Fri, May 23, 2025 at 08:55:27PM +0530, vignesh C wrote:
> This issue can be consistently reproduced by injecting a delay (e.g.,
> 3 seconds) in tap_sub's walsender while decoding the commit of
> 'mygid'. A patch to demonstrate this behavior is provided at
> 021_two_phase_test_failure_reproduce.patch. The test can be fixed by
> explicitly waiting for both subscriptions to catch up before dropping
> either. A patch implementing this fix is attached.

+    if (parsed->twophase_xid && strcmp(parsed->twophase_gid, "mygid") == 0 &&
+        strcmp(NameStr(MyReplicationSlot->data.name), "tap_sub") == 0)
+        sleep(3);
+

Smart filtering to prove your point.

> +# Wait for both subscribers to catchup
>  $node_publisher->wait_for_catchup($appname_copy);
> +$node_publisher->wait_for_catchup($appname);
> +
> +# Make sure there are no prepared transactions on the subscriber
> +$result = $node_subscriber->safe_psql('postgres',
> +    "SELECT count(*) FROM pg_prepared_xacts;");
> +is($result, qq(0), 'should be no prepared transactions on subscriber');

Yes, agreed.  Your suggested fix looks sensible, and the extra check
on pg_prepared_xacts on the subscriber side can be useful for
debugging.  I'll take care of that, if there are no objections.
--
Michael
