On Mon, Sep 13, 2021 at 7:01 AM Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> Hello.
>
> As reported in [1] it seems that walsender can suffer timeout in
> certain cases. It is not clearly confirmed, but I suspect that
> there's the case where LogicalRepApplyLoop keeps running the innermost
> loop without receiving keepalive packet for longer than
> wal_sender_timeout (not wal_receiver_timeout).
>

Why is that happening? In the previous investigation in this area [1],
your tests revealed that we always send a keepalive after reading a WAL
page. So even if the transaction is large, we should send some
keepalives in between.

The other thing that I am not able to understand from Abhishek's reply
[2] is why increasing wal_sender_timeout/wal_receiver_timeout leads to
the removal of required WAL segments. As per my understanding, we
shouldn't remove WAL unless we get confirmation that the subscriber has
processed it.

[1] - https://www.postgresql.org/message-id/20210610.150016.1709823354377067679.horikyota.ntt%40gmail.com
[2] - https://www.postgresql.org/message-id/CAEDsCzjEHLxgqa4d563CKFwSbgBvvnM91Cqfq_qoZDXCkyOsiw%40mail.gmail.com

Note - I have added Abhishek to see if he has answers to any of these
questions.

-- 
With Regards,
Amit Kapila.