On Mon, Jan 19, 2026 at 10:38 PM shveta malik <[email protected]> wrote:
>
> On Wed, Jan 14, 2026 at 11:24 PM Masahiko Sawada <[email protected]> 
> wrote:
> >
> > I've attached the updated patch.
> >
>
> Thank You for the patch. I like the idea of optimization. Few initial 
> comments:

Thank you for reviewing the patch!

>
> 1)
> + * The query returns the slot names and their caught-up status in
> + * the same order as the results collected by
> + * get_old_cluster_logical_slot_infos(). If this query is changed,
>
> I could not find the function get_old_cluster_logical_slot_infos(), do
> you mean get_old_cluster_logical_slot_infos_query()?

It seems to be an oversight in commit 6d3d2e8e541f0; I think it should
refer to get_db_rel_and_slot_infos().

>
> 2)
> "  WHERE database = current_database() AND "
> "    slot_type = 'logical' AND "
>
> Is there a reason why database = current_database() is placed before
> slot_type = 'logical'? I am not sure how the PostgreSQL optimizer and
> executor will order these predicates, but from the first look,
> slot_type = 'logical' appears cheaper and could be placed first,
> consistent with the ordering used at other places.

Changed.

>
> 3)
> Shouldn’t we add a sanity check inside
> get_old_cluster_logical_slot_infos_query() to ensure that when
> skip_caught_up_check is true, we are on PostgreSQL 18 or lower? This
> would make the function safer for future use if it's called elsewhere.
> I understand the caller already performs a similar check, but I think
> it's more appropriate here since we call
> binary_upgrade_logical_slot_has_caught_up() from inside, which doesn’t
> even exist on newer versions.

What kind of sanity check did you mean? We could add a check with
pg_fatal(), but the outcome seems almost the same to me: pg_upgrade
would still fail with an error about the missing
binary_upgrade_logical_slot_has_caught_up() function.

>
> 4)
> +# Check the file content. While both test_slot1 and test_slot2 should be reporting
> +# that they have unconsumed WAL records, test_slot3 should not be reported as
> +# it has caught up.
>
> Can you please elaborate the reason behind test_slot3 not being
> reported? Also mention in the comment if possible.

We advance test_slot3 to the current WAL LSN before executing
pg_upgrade, so test_slot3 should have consumed all pending WAL
records. Please refer to the following changes:

 # Preparations for the subsequent test:
-# 1. Generate extra WAL records. At this point neither test_slot1 nor
-#   test_slot2 has consumed them.
+# 1. Generate extra WAL records. At this point none of the slots has consumed them.
 #
 # 2. Advance the slot test_slot2 up to the current WAL location, but test_slot1
 #   still has unconsumed WAL records.
 #
 # 3. Emit a non-transactional message. This will cause test_slot2 to detect the
 #   unconsumed WAL record.
+#
+# 4. Advance the slot test_slot3 up to the current WAL location.
 $oldpub->start;
 $oldpub->safe_psql(
    'postgres', qq[
        CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
        SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
-       SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
+       SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message', true);
+       SELECT pg_replication_slot_advance('test_slot3', pg_current_wal_lsn());

I believe that the following new comment explains the reason well:

+# Check the file content. While both test_slot1 and test_slot2 should be reporting
+# that they have unconsumed WAL records, test_slot3 should not be reported as
+# it has caught up.

I've attached the updated patch.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment: v6-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch
