I have observed that currently, in case of a network break between master and standby, the walsender process gets terminated immediately, whereas the walreceiver detects the breakage only after a long time. The main reason I could see is the replication_timeout configuration parameter: walsender checks replication_timeout, and if there is no communication from the other side within replication_timeout, it treats that as a condition to terminate the walsender. There is no such mechanism in walreceiver. It fails only in the send() socket call made from XLogWalRcvSendReply(), and only after that function has been called many times. Internally, send() keeps succeeding until the socket's send buffer is full, so data keeps accumulating there even though the other side has not received it.
Shouldn't there be a mechanism in walreceiver so that it can detect a network failure sooner?

Basic steps to observe the above behavior:
1. Both master and standby machines are connected normally.
2. Then use the command "ifconfig ip down" to bring the network cards of master and standby down.

Observation: the master detects that the connection is broken, but the standby does not, and it keeps showing the channel as connected for a long time.

With Regards,
Amit Kapila