Dear Horiguchi-san, Amit,

> > On Tue, Dec 13, 2022 at 7:35 AM Kyotaro Horiguchi
> > <horikyota....@gmail.com> wrote:
> > >
> > > At Mon, 12 Dec 2022 18:10:00 +0530, Amit Kapila
> <amit.kapil...@gmail.com> wrote in
> > Yeah, I think ideally it will timeout but if we have a solution like
> > during delay, we keep sending ping messages time-to-time, it should
> > work fine. However, that needs to be verified. Do you see any reasons
> > why that won't work?
I have implemented and tested a PoC in which workers wake up every wal_receiver_timeout/2 and send a keepalive. It basically works, but I found two problems. Do you have any good suggestions for them?

1) In the current PoC, each worker calculates its sending interval from its own wal_receiver_timeout, and no keepalives are sent when that parameter is set to zero. This means the walsender may time out when wal_sender_timeout on the publisher and wal_receiver_timeout on the subscriber differ. Suppose that wal_sender_timeout is 2min, wal_receiver_timeout is 5min, and min_apply_delay is 10min. The worker on the subscriber will wake up every 2.5min and send a keepalive, but the walsender exits before the message reaches the publisher. One idea to avoid this is to send the subscriber's min_apply_delay option to the publisher and compare it with wal_sender_timeout there, but that may not be sufficient, because the XXX_timeout GUC parameters could be modified later.

2) The issue reported by Vignesh-san [1] still remains. As I have already analyzed [2], the root cause is that the flushed WAL position is not updated and sent to the publisher. Even if workers send keepalive messages to the publisher during the delay, the flush position reported in them does not advance.

[1]: https://www.postgresql.org/message-id/CALDaNm1vT8qNBqHivtAgYur-5-YkwF026VHtw9srd4fsdeaufA%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/TYAPR01MB5866F6BE7399E6343A96E016F51C9%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED