Hi Alvaro,

I just started reading this thread, but maybe you can confirm or refute my
understanding of what was done.

In the first email you write

> As mentioned in the course of thread [1], we're missing a fix for
> streaming replication to avoid sending records that the primary hasn't
> fully flushed yet. This patch is a first attempt at fixing that problem
> by retreating the LSN reported as FlushPtr whenever a segment is
> registered, based on the understanding that if no registration exists
> then the LogwrtResult.Flush pointer can be taken at face value; but if a
> registration exists, then we have to stream only till the start LSN of
> that registered entry.

So did we end up holding back the wal_sender so that it does not send
anything that is not confirmed as flushed on master?

Are there measurements of how much this slows down replication compared to
allowing sending the moment it is written in buffers but not necessarily
flushed locally?

Did we investigate the possibility of sending as fast as possible and
controlling the flush synchronisation by sending separate flush pointers
*both* ways?

And maybe there was even an alternative considered where we are looking at
a more general durability, for example 2-out-of-3 where the primary is one
of the 3 and not necessarily the most durable one?

-----
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are hiring.
Contact me if interested.

On Fri, Sep 24, 2021 at 4:33 AM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>
> On 2021-Sep-23, Alvaro Herrera wrote:
>
> > However, I notice now that the pg_rewind tests reproducibly fail in
> > branch 14 for reasons I haven't yet understood. It's strange that no
> > other branch fails, even when run quite a few times.
>
> Turns out that this is a real bug (setting EndOfLog seems insufficient).
> I'm looking into it.
>
> --
> Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
> "No necesitamos banderas
> No reconocemos fronteras" (Jorge González)