On Wed, Mar 18, 2026 at 4:08 PM Ashutosh Sharma <[email protected]> wrote:
>
> Hi,
>
> Thanks again for reviewing the test cases. Please find my comments inline 
> below:
>
> On Wed, Mar 18, 2026 at 12:01 PM shveta malik <[email protected]> wrote:
> >
> >
> > Few comments:
> >
> > 1)
> > +
> > +# Physical replication slots for the two standbys
> > +$primary->safe_psql('postgres',
> > + "SELECT pg_create_physical_replication_slot('sb1_slot');");
> > +$primary->safe_psql('postgres',
> > + "SELECT pg_create_physical_replication_slot('sb2_slot');");
> > +$primary->safe_psql('postgres',
> > + "SELECT pg_create_physical_replication_slot('first_slot');");
> >
> > Can we please write a comment atop 'first_slot' creation for its purpose.
> > The comment '+# Physical replication slots for the two standbys'
> > confuses as we are creating 3 slots following that comment.
> >
>
> Moved this to PART E (the test-case where this slot is being used).
> This is also suggested in one of your comments below.
>
> > 2)
> > +# PART A: Plain list (ALL mode) blocks when any slot is unavailable
> >
> > +# PART C: Verify plain list (ALL mode) requires ALL slots
> >
> > I think we can merge the 2 tests. Is there a specific reason we have
> > it in 2 different sections (am I missing something?).
> >
> > Part A can first test blocking on inactive standby (which it is doing
> > already) and then restart standby and can check that logical decoding
> > now proceeds. Part C can then be removed.
> >
>
> I did not remove PART C entirely, however, I removed the portions that
> overlapped with PART A.
>
> I would suggest not just evaluating each test case independently, it
> would be useful to also consider the overall flow to understand why
> PART B follows PART A, and so on.
>
> Viewed sequentially, PART A tests a unique scenario while also setting
> up the environment needed for PART B (quorum mode). Similarly, PART C
> covers a different scenario and prepares the setup for the test case
> in PART D. This approach helps avoid unnecessary stop/start cycles of
> the synchronous standby.
>
> > 3)
> > Same with these 2:
> > +# PART B: ANY mode (quorum) — logical decoding proceeds with N-of-M slots
> >
> > +# PART D: ANY mode — verify it works with either standby
> >
> > Part B tests that it works when standby1 is down.
> > Part D tests that it works when standby2  is down and also works when
> > both are up.
> >
> > I think we don't need these duplicate test cases (unless I am missing
> > something?), we can merge Part D to Part B itself i.e. test with
> > standby1 down and up in a single test.
> >
>
> Removed PART D. It was almost similar to PART B.
>
> > 4)
> > These 2 tests are basically testing the same thing. Do we need both?
> > +# FIRST 1 should skip sb1_slot (unavailable) and use sb2_slot.
> >
> > +# Test that FIRST 1 is different from plain list — FIRST 1 succeeds
> > with one down.
> >
>
> Yes, they are actually the same. Hence removed the latter one.
>
> >
> > 5)
> > I see that we only need 'first_slot' for the last test and do not have
> > even a standby associated with it. So we can even move it's creation
> > to that test itself instead of creating it in the beginning and adding
> > comments to explain 'why'.
> >
>
> This is related to comment 1 above.
>
> PFA patch that addresses above comments.

I was reviewing the latest patch and here are few comments/suggestions

1.
+       <para>
+        The keyword <literal>FIRST</literal>, coupled with
+        <replaceable class="parameter">num_sync</replaceable>, specifies
+        priority-based semantics. Logical decoding will wait for the first
+        <replaceable class="parameter">num_sync</replaceable> available
+        physical slots in priority order (the order they appear in the list).
+        If a slot is missing, logical, invalidated, or inactive, it will be
+        skipped. However, if a slot exists and is valid and active but has
+        not yet caught up, the system will wait for it rather than skipping
+        to lower-priority slots.
+       </para>

This explains that it will skip missing, logical, invalidated, or
inactive slots while processing the list and deciding on which first N
slots to wait for, but I could not understand what would be the
behavior if after skipping a few invalid slots the remaining valid
slots are less than N, will it proceed for decoding?  I think this
should be clarified in docs.

2. I see a lot of overlapping code between
check_synchronized_standby_slots() and
check_synchronous_standby_names(), can we do some refactoring to make
a common code?

I have just pass through the patch and will try to complete the review soon.

-- 
Regards,
Dilip Kumar
Google


Reply via email to