Dear Hackers,
I think I have reproduced the test failure. The test fails because the
walsender is stuck waiting in WalSndDoneImmediate -> ereport, with the stack
shown below. It seems that it tries to send the message to the replica and
flush it, but the replica is hung.
#0 0x00007a4b37f2a037 in epoll_wait
#1 0x000056855317a2e8 in WaitEventSetWaitBlock
#2 WaitEventSetWait
#3 0x0000568552feea8e in secure_write
#4 0x0000568552ff5666 in internal_flush_buffer
#5 0x0000568552ff5966 in internal_flush
#6 socket_flush ()
#7 socket_flush ()
#8 0x00005685532ff1b3 in send_message_to_frontend (edata=<optimized out>)
#9 EmitErrorReport ()
#10 0x00005685532ff6dd in errfinish
#11 0x000056855312cc9c in WalSndDoneImmediate () at walsender.c:3625
I propose removing the ereport call from WalSndDoneImmediate, so that the
walsender does not block while flushing a message to a replica that is not
reading.
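To illustrate the mechanism outside of PostgreSQL, here is a minimal standalone
sketch. It is not walsender code, and the function name is made up for this
example. One end of a socket pair plays the hung replica and never reads; the
other end keeps writing until the kernel buffer is full. The socket is set
non-blocking so the demo can observe the stall as EAGAIN instead of hanging.
Judging by the stack above, that is the point at which secure_write would
start waiting in WaitEventSetWait.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: write to a socket whose peer
 * never reads and report how the attempt ends.
 *
 * Returns 1 if the send buffer filled up and write() failed with
 * EAGAIN/EWOULDBLOCK, 0 on any other outcome.
 */
int
peer_that_never_reads_stalls_sender(void)
{
    int     sv[2];
    char    chunk[4096];
    int     would_block = 0;
    int     flags;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;

    /* Non-blocking, so the demo sees the stall instead of hanging in it. */
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1)
    {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }

    memset(chunk, 'x', sizeof(chunk));

    /* sv[1] is the "hung replica": open, but it never calls read(). */
    for (int i = 0; i < 100000; i++)    /* upper bound, just in case */
    {
        ssize_t n = write(sv[0], chunk, sizeof(chunk));

        if (n >= 0)
            continue;           /* data accepted, possibly partially */
        if (errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            would_block = 1;    /* buffer full: a flush would wait here */
        break;
    }

    close(sv[0]);
    close(sv[1]);
    return would_block;
}
```

A caller can simply check that peer_that_never_reads_stalls_sender() returns 1.
With a blocking socket, or with a wait for write-readiness as in the stack
above, the sender would sit at that point until the peer reads or the
connection is closed.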
With best regards,
Vitaly
On 1/19/26 15:41, Fujii Masao wrote:
On Sun, Jan 18, 2026 at 1:20 AM Andrey Silitskiy wrote:
On Jan 9, 2026 at 10:04 AM Fujii Masao
<masao(dot)fujii(at)gmail(dot)com> wrote:
Why do we need to send a "done" message to the receiver here?
Since delivery isn't guaranteed in immediate mode, it seems of limited
value.
It seems to me that it is better to send a message in cases where it is
possible, so as not to raise errors on the subscriber during a clean shutdown.
And when this is not possible, exit the process without waiting.
For the immediate mode, would it make sense to log that the walsender is
terminating in immediate mode and that WAL replication may be incomplete,
so users can more easily understand what happened?
Added to the latest patch.
Thanks for updating the patch!
cfbot is reporting a test failure. Could you please look into it and
fix the issue?
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F6234
Regards,