On 2025/05/13 0:47, Andrey Borodin wrote:
Moved off from "Small fixes needed by high-availability tools"

On 12 May 2025, at 01:33, Amit Kapila <amit.kapil...@gmail.com> wrote:

On Fri, May 2, 2025 at 6:30 PM Andrey Borodin <x4...@yandex-team.ru> wrote:

3. Allow reading LSN written by walreceiver, but not flushed yet

Problem: if we have synchronous_standby_names = ANY(node1,node2), node2 might
be ahead of node1 by flush LSN, but behind it by written LSN. If we do a failover
we choose node2 instead of node1 and lose data recently committed with
synchronous_commit=remote_write.
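
To make the scenario concrete, here is a minimal sketch of how an HA tool might
rank standbys by LSN. The node names follow the example above; the LSN values
and the helper names (parse_lsn, most_advanced) are invented for illustration
and are not part of PostgreSQL or of any existing tool.

```python
def parse_lsn(text):
    """Convert an LSN in PostgreSQL's 'X/Y' hex notation to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# Hypothetical positions reported by the two standbys (illustrative values).
standbys = {
    "node1": {"write": "0/5000200", "flush": "0/5000000"},
    "node2": {"write": "0/5000100", "flush": "0/5000100"},
}


def most_advanced(nodes, key):
    """Return the name of the node with the highest LSN for the given key."""
    return max(nodes, key=lambda name: parse_lsn(nodes[name][key]))


by_flush = most_advanced(standbys, "flush")  # candidate when ranking by flush LSN
by_write = most_advanced(standbys, "write")  # candidate when ranking by written LSN
```

With these assumed values, ranking by flush LSN picks node2 while ranking by
written LSN picks node1, which is the mismatch described above.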

In this case, doesn't the flush LSN typically catch up to the write LSN on node2
after a few seconds? Even if the walreceiver exits while there's still written
but unflushed WAL, it looks like WalRcvDie() ensures everything is flushed by
calling XLogWalRcvFlush(). So, isn't it safe to rely on the flush LSN when
selecting the most advanced node? No?


Caveat: we already have a function pg_last_wal_receive_lsn(), which in fact
returns the flushed LSN, not the written one. I propose to add a new function
which returns the LSN actually written. The internals of this function are
already implemented (GetWalRcvWriteRecPtr()), but unused.

GetWalRcvWriteRecPtr() returns walrcv->writtenUpto, which can move backward
when the walreceiver restarts. Is this behavior OK for your purpose?
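
As a rough illustration of what a backward-moving value means for a consumer,
the sketch below shows how a monitoring tool could detect the regression
between two polls. WriteLsnTracker and its methods are hypothetical names, and
whether keeping the highest observed value is actually safe for failover
decisions is an assumption, not something established in this thread.

```python
class WriteLsnTracker:
    """Remember the highest written LSN seen so far (as an integer)."""

    def __init__(self):
        self.highest = 0

    def observe(self, lsn):
        """Record a polled LSN; return True if it is lower than an earlier one."""
        regressed = lsn < self.highest
        self.highest = max(self.highest, lsn)
        return regressed


tracker = WriteLsnTracker()
first = tracker.observe(0x5000200)   # initial poll
second = tracker.observe(0x5000100)  # lower value, e.g. after a walreceiver restart
```

Here the second poll is flagged as a regression and tracker.highest keeps the
earlier, larger value.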

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
