Re: Implement waiting for wal lsn replay: reloaded

Xuneng Zhou Fri, 26 Dec 2025 08:15:49 -0800

Hi Chao,

Thanks a lot for your review!


On Fri, Dec 26, 2025 at 4:25 PM Chao Li <[email protected]> wrote:
>
>
>
> > On Dec 19, 2025, at 10:49, Xuneng Zhou <[email protected]> wrote:
> >
> > Hi,
> >
> > On Thu, Dec 18, 2025 at 8:25 PM Alexander Korotkov <[email protected]> 
> > wrote:
> >>
> >> On Thu, Dec 18, 2025 at 2:24 PM Xuneng Zhou <[email protected]> wrote:
> >>> On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <[email protected]> 
> >>> wrote:
> >>>>
> >>>> Hi, Xuneng!
> >>>>
> >>>> On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <[email protected]> wrote:
> >>>>> Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> >>>>> statement in v5 patch 1.
> >>>>
> >>>> Thank you for your work on this patchset.  Generally, it looks like
> >>>> good and quite straightforward extension of the current functionality.
> >>>> But this patch adds 4 new unreserved keywords to our grammar.  Do you
> >>>> think we can put mode into with options clause?
> >>>>
> >>>
> >>> Thanks for pointing this out. Yeah, 4 unreserved keywords add
> >>> complexity to the parser and it may not be worthwhile since replay is
> >>> expected to be the common use scenario. Maybe we can do something like
> >>> this:
> >>>
> >>> -- Default (REPLAY mode)
> >>> WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');
> >>>
> >>> -- Explicit REPLAY mode
> >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');
> >>>
> >>> -- WRITE mode
> >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');
> >>>
> >>> If no mode is set explicitly in the options clause, it defaults to
> >>> replay. I'll update the patch per your suggestion.
> >>
> >> This is exactly what I meant.  Please, go ahead.
> >>
> >
> > Here is the updated patch set (v7). Please check.
> >
> > --
> > Best,
> > Xuneng
> > <v7-0001-Extend-xlogwait-infrastructure-with-write-and-flu.patch><v7-0004-Use-WAIT-FOR-LSN-in.patch><v7-0003-Add-tab-completion-for-WAIT-FOR-LSN-MODE-option.patch><v7-0002-Add-MODE-option-to-WAIT-FOR-LSN-command.patch>
>
> Hi Xuneng,
>
> A solid patch! Just a few small comments:
>
> 1 - 0001
> ```
> +XLogRecPtr
> +GetCurrentLSNForWaitType(WaitLSNType lsnType)
> +{
> +       switch (lsnType)
> +       {
> +               case WAIT_LSN_TYPE_STANDBY_REPLAY:
> +                       return GetXLogReplayRecPtr(NULL);
> +
> +               case WAIT_LSN_TYPE_STANDBY_WRITE:
> +                       return GetWalRcvWriteRecPtr();
> +
> +               case WAIT_LSN_TYPE_STANDBY_FLUSH:
> +                       return GetWalRcvFlushRecPtr(NULL, NULL);
> +
> +               case WAIT_LSN_TYPE_PRIMARY_FLUSH:
> +                       return GetFlushRecPtr(NULL);
> +       }
> +
> +       elog(ERROR, "invalid LSN wait type: %d", lsnType);
> +       pg_unreachable();
> +}
> ```
>
> As you add pg_unreachable() in the new function GetCurrentLSNForWaitType(), 
> I’m thinking if we should just do an Assert(), I saw every existing related 
> function has done such an assert, for example addLSNWaiter(), it does 
> “Assert(i >= 0 && i < WAIT_LSN_TYPE_COUNT);”. I guess we can just following 
> the current mechanism to verify lsnType. So, for GetCurrentLSNForWaitType(), 
> we can just add a default clause and Assert(false).

My take is that Assert(false) alone might not be enough here, since
assertions vanish in non-assert builds. An unexpected lsnType is a
real bug even in production, so keeping a hard error plus
pg_unreachable() seems to be a safer pattern. It also acts as a
guardrail for future extensions — if new wait types are added without
updating this code, we’ll fail loudly rather than silently returning
an incorrect LSN. Assert(i >= 0 && i < WAIT_LSN_TYPE_COUNT) was added
to the top of the function.

> 2 - 0002
> ```
> +                       else
> +                               ereport(ERROR,
> +                                               
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                                errmsg("unrecognized value 
> for WAIT option \"%s\": \"%s\"",
> +                                                               "MODE", 
> mode_str),
> ```
>
> I wonder why don’t we directly put MODE into the error message?

Yeah, putting MODE into the error message is cleaner. It's done in v8.

> 3 - 0002
> ```
>                 case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
>                         if (throw)
>                         {
> +                               const           WaitLSNTypeDesc *desc = 
> &WaitLSNTypeDescs[lsnType];
> +                               XLogRecPtr      currentLSN = 
> GetCurrentLSNForWaitType(lsnType);
> +
>                                 if (PromoteIsTriggered())
>                                 {
>                                         ereport(ERROR,
>                                                         
> errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                                         errmsg("recovery is 
> not in progress"),
> -                                                       errdetail("Recovery 
> ended before replaying target LSN %X/%08X; last replay LSN %X/%08X.",
> +                                                       errdetail("Recovery 
> ended before target LSN %X/%08X was %s; last %s LSN %X/%08X.",
>                                                                           
> LSN_FORMAT_ARGS(lsn),
> -                                                                         
> LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))));
> +                                                                         
> desc->verb,
> +                                                                         
> desc->noun,
> +                                                                         
> LSN_FORMAT_ARGS(currentLSN)));
>                                 }
>                                 else
>                                         ereport(ERROR,
>                                                         
> errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                                         errmsg("recovery is 
> not in progress"),
> -                                                       errhint("Waiting for 
> the replay LSN can only be executed during recovery."));
> +                                                       errhint("Waiting for 
> the %s LSN can only be executed during recovery.",
> +                                                                       
> desc->noun));
>                         }
> ```
>
> currentLSN is only used in the if clause, thus it can be defined inside the 
> if clause.

+ 1.

> 3 - 0002
> ```
> +       /*
> +        * If we wrote an LSN that someone was waiting for then walk over the
> +        * shared memory array and set latches to notify the waiters.
> +        */
> +       if (waitLSNState &&
> +               (LogstreamResult.Write >=
> +                
> pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
> +               WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, 
> LogstreamResult.Write);
> ```
>
> Do we need to mention "walk over the shared memory array and set latches” in 
> the comment? The logic belongs to WaitLSNWakeup(). What about if the wake up 
> logic changes in future, then this comment would become stale. So I think we 
> only need to mention “notify the waiters”.
>

It makes sense to me. They are incorporated into v8.

>
> 4 - 0003
> ```
> +       /*
> +        * Handle parenthesized option list.  This fires when we're in an
> +        * unfinished parenthesized option list.  get_previous_words treats a
> +        * completed parenthesized option list as one word, so the above test 
> is
> +        * correct.  mode takes a string value ('replay', 'write', 'flush'),
> +        * timeout takes a string value, no_throw takes no value.
> +        */
>         else if (HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*") &&
>                          !HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", 
> "(*)"))
>         {
> -               /*
> -                * This fires if we're in an unfinished parenthesized option 
> list.
> -                * get_previous_words treats a completed parenthesized option 
> list as
> -                * one word, so the above test is correct.
> -                */
>                 if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
> -                       COMPLETE_WITH("timeout", "no_throw");
> -
> -               /*
> -                * timeout takes a string value, no_throw takes no value. We 
> don't
> -                * offer completions for these values.
> -                */
> ```
>
> The new comment has lost the meaning of “We don’t offer completions for these 
> values (timeout and no_throw)”, to be explicit, I feel we can retain the 
> sentence.

 The sentence is retained.

> 5 - 0004
> ```
> +       my $isrecovery =
> +         $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> +       chomp($isrecovery);
>         croak "unknown mode $mode for 'wait_for_catchup', valid modes are "
>           . join(', ', keys(%valid_modes))
>           unless exists($valid_modes{$mode});
> @@ -3347,9 +3350,6 @@ sub wait_for_catchup
>         }
>         if (!defined($target_lsn))
>         {
> -               my $isrecovery =
> -                 $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> -               chomp($isrecovery);
> ```
>
> I wonder why pull up pg_is_in_recovery to an early place and unconditionally 
> call it?
>

This seems unnecessary. I also realized that my earlier approach in
patch 4 may have been semantically incorrect — it could end up waiting
for the LSN to replay/write/flush on the node itself, rather than on
the downstream standby, which defeats the purpose of
wait_for_catchup(). Patch 4 attempts to address this by running WAIT
FOR LSN on the standby itself.

Support for primary-flush waiting and the refactoring of existing
modes have been also incorporated in v8 following Alexander’s
feedback. The major updates are in patches 2 and 4. Please check.

--
Best,
Xuneng

v8-0001-Extend-xlogwait-infrastructure-with-write-and-flu.patch
Description: Binary data

v8-0003-Add-tab-completion-for-WAIT-FOR-LSN-MODE-option.patch
Description: Binary data

v8-0004-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch
Description: Binary data

v8-0002-Add-MODE-option-to-WAIT-FOR-LSN-command.patch
Description: Binary data

Re: Implement waiting for wal lsn replay: reloaded

Reply via email to