On Thu, Mar 12, 2026 at 1:21 PM Japin Li <[email protected]> wrote:
>
>
> Hi, Alexander
>
> On Wed, 11 Mar 2026 at 16:24, Alexander Korotkov <[email protected]> wrote:
> > Hi!
> >
> > On Wed, Mar 4, 2026 at 10:47 AM Andrey Silitskiy
> > <[email protected]> wrote:
> >> On Wed, 03 Mar 2026 Japin Li <japinli(at)hotmail(dot)com> wrote:
> >>  > At first glance, wal_sender_shutdown_timeout seems to have the wrong
> >>  > type.
> >>
> >> Fixed.
> >
> > I've revised this patch fixing grammar in commit message, comments and
> > documentation.
> >
> > I think the current patch addresses all the main concerns raised in
> > the thread.  The patch doesn't unconditionally change the behavior: it
> > introduces a new GUC, which could be set on per-connection basis, and
> > also affects physical WAL senders.  The GUC specifies timeout, which
> > gives user a flexibility.  The default value of the GUC is -1
> > (disabled).  So, no behavior change by default.  Also, it doesn't
> > require replication protocol change.  New WalSndDoneImmediate() sends
> > done message to the receiver just like WalSndDone().  So, existing
> > clients should be OK.
>
> Thanks for updating the patch.
>
> 1.
> The shutdown_request_timestamp is used only in WalSndCheckShutdownTimeOut().
> Would it make sense to declare it inside this function instead?
>
> 2.
> +static void
> +WalSndDoneImmediate()
> +{
>
> We should add `void` to the parameter list here to match the declaration:
>
> >
> > I'm going to push this if no objections.

Thanks for working on this patch!

I'm not sure introducing a timeout for walsender shutdown is a good idea,
or whether users can easily determine an appropriate timeout value.

With the patch, when I set wal_sender_shutdown_timeout to 1ms and
wal_sender_timeout to 1min, the walsender still took about 26s to shut
down in my test, and only then reported wal_sender_shutdown_timeout
expiration. This suggests the timeout may not be handled correctly.
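
As a point of reference, a shutdown timeout of this kind is usually
checked along the following lines. This is only a standalone sketch,
not the patch's code: the ShutdownTimer struct, the function name
shutdown_timeout_expired(), and the plain millisecond counters (instead
of TimestampTz) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the walsender's shutdown bookkeeping. */
typedef struct ShutdownTimer
{
    bool    started;        /* has a shutdown request been seen yet? */
    int64_t request_ms;     /* time the shutdown request was first seen */
} ShutdownTimer;

/*
 * Return true once timeout_ms has elapsed since the first call made
 * with a non-negative timeout.  A timeout of -1 disables the check,
 * matching the default described upthread.
 */
static bool
shutdown_timeout_expired(ShutdownTimer *t, int64_t now_ms, int timeout_ms)
{
    if (timeout_ms < 0)
        return false;

    if (!t->started)
    {
        /* First check after the shutdown request: start the clock. */
        t->started = true;
        t->request_ms = now_ms;
    }

    return now_ms - t->request_ms >= timeout_ms;
}
```

A check like this can only fire when it is actually reached, so if the
caller is blocked elsewhere (for example, waiting on a full send
buffer), the effective delay is bounded by how often the check runs
rather than by the timeout value. Whether that is what happens in the
patch is not something this sketch can show.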

My test steps were:
1. Set up a logical replication environment.
2. Lock the table in ACCESS EXCLUSIVE mode on the subscriber.
3. Load a large amount of data into the table on the publisher.
   As a result, the logical apply worker cannot receive new data due
   to the lock wait, so the data is not replicated and the send buffer
   becomes full.
4. Shut down the publisher.

2026-03-13 13:24:24 JST [postmaster] LOG:  received fast shutdown request
2026-03-13 13:24:24 JST [postmaster] LOG:  aborting any active transactions
2026-03-13 13:24:24 JST [postmaster] LOG:  background worker "logical replication launcher" (PID 34410) exited with exit code 1
2026-03-13 13:24:24 JST [checkpointer] LOG:  shutting down
2026-03-13 13:24:50 JST [walsender] WARNING:  walsender shutting down due to wal_sender_shutdown_timeout expiration
2026-03-13 13:24:50 JST [walsender] HINT:  Replication may be incomplete.
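
The delay can be read off the log above: the shutdown request is logged
at 13:24:24 and the walsender WARNING at 13:24:50. A small standalone
helper makes the arithmetic explicit. The function names are
hypothetical, and both timestamps are assumed to fall on the same day.

```c
#include <stdio.h>

/* Convert an "HH:MM:SS" log time to seconds since midnight; -1 on parse error. */
static int
log_time_to_seconds(const char *hms)
{
    int h, m, s;

    if (sscanf(hms, "%d:%d:%d", &h, &m, &s) != 3)
        return -1;
    return h * 3600 + m * 60 + s;
}

/* Seconds between two same-day log times; -1 if either fails to parse. */
static int
log_elapsed_seconds(const char *start, const char *end)
{
    int a = log_time_to_seconds(start);
    int b = log_time_to_seconds(end);

    if (a < 0 || b < 0)
        return -1;
    return b - a;
}
```

log_elapsed_seconds("13:24:24", "13:24:50") returns 26, that is,
26000ms against a configured wal_sender_shutdown_timeout of 1ms.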

Regards,

-- 
Fujii Masao

