On Wed, Apr 2, 2025 at 2:06 PM Bertrand Drouvot
<[email protected]> wrote:
>
> Hi Kuroda-san,
>
> On Wed, Apr 02, 2025 at 07:16:25AM +0000, Hayato Kuroda (Fujitsu) wrote:
>
> As far v4-0001:
>
> === 1
>
> +# would advance an active replication slot's catalog_xmin
>
> s/would/could/? I mean the system also needs to be "slow" enough (so the
> sleep() in the reproducer)
>
> === 2
>
> +# Injection_point is used to avoid seeing an xl_running_xacts even here. In
> +# scenario 5, we verify the case that the backend process detects the page has
> +# enough tuples; thus, page pruning happens. If the record is generated just
> +# before doing on-pruning, the catalog_xmin of the active slot would be
> +# updated; hence, the conflict would not occur.
>
> Not sure we need to explain what scenario 5 does here, but that does not hurt
> if you feel the need.
>
> Also maybe mention the last update in the comment and add some nuance (like
> proposed in === 1), something like?
>
> "
> # Injection_point is used to avoid seeing a xl_running_xacts here. Indeed,
> # if it is generated between the last 2 updates then the catalog_xmin of the active
> # slot could be updated; hence, the conflict could not occur.
> "
>
I have changed it based on your suggestions and made a few other
changes in the comments. Please see attached.
*
+ if (IS_INJECTION_POINT_ATTACHED("log-running-xacts"))
It is better to name the injection point skip-log-running-xacts, as that
name better reflects its usage.
--
With Regards,
Amit Kapila.
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 0e621e9996a..7fa8d9247e0 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -1288,21 +1288,17 @@ LogStandbySnapshot(void)

 	Assert(XLogStandbyInfoActive());

-	/* For testing slot invalidation */
 #ifdef USE_INJECTION_POINTS
-	if (IS_INJECTION_POINT_ATTACHED("log-running-xacts"))
+	if (IS_INJECTION_POINT_ATTACHED("skip-log-running-xacts"))
 	{
 		/*
-		 * RUNNING_XACTS could move slots's xmin forward and cause random
-		 * failures in some tests. Skip generating to avoid it.
-		 *
-		 * XXX What value should we return here? Originally this returns the
-		 * inserted location of RUNNING_XACT record. Based on that, here
-		 * returns the latest insert location for now.
+		 * This record could move slot's xmin forward during decoding, leading
+		 * to unpredictable results, so skip it when requested by the test.
 		 */
 		return GetInsertRecPtr();
 	}
 #endif
+
 	/*
 	 * Get details of any AccessExclusiveLocks being held at the moment.
 	 */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 96dd0340e8d..39a8797a7cb 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -246,10 +246,10 @@ sub check_for_invalidation
 # VACUUM command, $sql the sql to launch before triggering the vacuum and
 # $to_vac the relation to vacuum.
 #
-# Note that injection_point is used to avoid seeing a xl_running_xacts that
-# would advance an active replication slot's catalog_xmin. Advancing the active
-# replication slot's catalog_xmin would break some tests that expect the active
-# slot to conflict with the catalog xmin horizon.
+# Note that the injection_point avoids seeing a xl_running_xacts that could
+# advance an active replication slot's catalog_xmin. Advancing the active
+# replication slot's catalog_xmin would break some tests that expect the
+# active slot to conflict with the catalog xmin horizon.
 sub wait_until_vacuum_can_remove
 {
 	my ($vac_option, $sql, $to_vac) = @_;
@@ -257,7 +257,7 @@ sub wait_until_vacuum_can_remove
 	# Note that from this point the checkpointer and bgwriter will skip writing
 	# xl_running_xacts record.
 	$node_primary->safe_psql('testdb',
-		"SELECT injection_points_attach('log-running-xacts', 'error');");
+		"SELECT injection_points_attach('skip-log-running-xacts', 'error');");

 	# Get the current xid horizon,
 	my $xid_horizon = $node_primary->safe_psql('testdb',
@@ -281,7 +281,7 @@ sub wait_until_vacuum_can_remove

 	# Resume generating the xl_running_xacts record
 	$node_primary->safe_psql('testdb',
-		"SELECT injection_points_detach('log-running-xacts');");
+		"SELECT injection_points_detach('skip-log-running-xacts');");
 }

 ########################
@@ -790,13 +790,12 @@ $logstart = -s $node_standby->logfile;
 reactive_slots_change_hfs_and_wait_for_xmins('no_conflict_', 'pruning_', 0,
 	0);

-# Injection_point is used to avoid seeing an xl_running_xacts even here. In
-# scenario 5, we verify the case that the backend process detects the page has
-# enough tuples; thus, page pruning happens. If the record is generated just
-# before doing on-pruning, the catalog_xmin of the active slot would be
-# updated; hence, the conflict would not occur.
+# Injection_point avoids seeing an xl_running_xacts even here. This is required
+# because if it is generated between the last two updates, then the catalog_xmin
+# of the active slot could be updated, and hence, the conflict won't occur. See
+# comments atop wait_until_vacuum_can_remove.
 $node_primary->safe_psql('testdb',
-	"SELECT injection_points_attach('log-running-xacts', 'error');");
+	"SELECT injection_points_attach('skip-log-running-xacts', 'error');");

 # This should trigger the conflict
 $node_primary->safe_psql('testdb',
@@ -812,7 +811,7 @@ $node_primary->wait_for_replay_catchup($node_standby);

 # Resume generating the xl_running_xacts record
 $node_primary->safe_psql('testdb',
-	"SELECT injection_points_detach('log-running-xacts');");
+	"SELECT injection_points_detach('skip-log-running-xacts');");

 # Check invalidation in the logfile and in pg_stat_database_conflicts
 check_for_invalidation('pruning_', $logstart, 'with on-access pruning');