Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

Heikki Linnakangas Tue, 02 Oct 2012 01:27:14 -0700

On 02.10.2012 10:36, Amit kapila wrote:

On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:

So let's think how this should ideally work from a user's point of view.
I think there should be just two settings: walsender_timeout and
walreceiver_timeout. walsender_timeout specifies how long a walsender
will keep a connection open if it doesn't hear from the walreceiver, and
walreceiver_timeout is the same for walreceiver. The system should
figure out itself how often to send keepalive messages so that those
timeouts are not reached.


This implies that we should remove wal_receiver_status_interval.
Currently it is also used for the reply message to data sent by the
sender, which reports up to what point the receiver has flushed. So if
we remove this variable, the receiver might start sending that message
sooner than required.
Is that okay behavior?

I guess we should keep that setting, then, so that you can get status
updates more often than would be required for heartbeat purposes.

In walsender, after half of walsender_timeout has elapsed and we haven't
received anything from the client, the walsender process should send a
"ping" message to the client. Whenever the client receives a Ping, it
replies. The walreceiver does the same; when half of walreceiver_timeout
has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
resets the timer in both ends, regardless of which side initiated it, so
if e.g. walsender_timeout < walreceiver_timeout, the client will never
have to initiate a Ping message, because walsender will always reach the
walsender_timeout/2 point first and initiate the heartbeat message.
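
The half-timeout rule described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the Endpoint class, the simulate function and the zero-latency link are hypothetical and are not taken from the PostgreSQL source. Only the setting names walsender_timeout and walreceiver_timeout come from the proposal.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """One end of the replication connection (hypothetical model)."""
    name: str
    timeout: float            # seconds of silence before the peer is given up on
    last_heard: float = 0.0   # time the last message from the peer arrived
    pings_sent: int = 0

    def ping_due(self, now: float) -> bool:
        # Initiate a Ping once half of the timeout has elapsed in silence.
        return now - self.last_heard >= self.timeout / 2


def simulate(sender_timeout: float, receiver_timeout: float,
             duration: float, step: float = 1.0):
    """Return (pings sent by walsender, pings sent by walreceiver)."""
    sender = Endpoint("walsender", sender_timeout)
    receiver = Endpoint("walreceiver", receiver_timeout)
    now = 0.0
    while now < duration:
        now += step
        for initiator, peer in ((sender, receiver), (receiver, sender)):
            if initiator.ping_due(now):
                initiator.pings_sent += 1
                # Assumption: zero-latency link, so the Ping and its reply
                # arrive at once and both ends have just heard from the
                # other. The roundtrip resets the timer on both sides.
                initiator.last_heard = now
                peer.last_heard = now
    return sender.pings_sent, receiver.pings_sent
```

With simulate(10, 30, 60) the walsender reaches its 5-second half-timeout first every time, so the walreceiver never has to initiate a Ping. Swapping the two timeouts reverses the roles.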


Just to clarify, walsender should reset its timer after it gets a reply
from the receiver to the message it sent.


Right.

walreceiver should reset its timer after sending the reply to a
heartbeat message.

Similarly, the timers will be reset when the receiver has sent the
heartbeat message.

walreceiver should reset the timer when it *receives* any message from
walsender. If it sends the reply right away, I guess that's the same
thing, but I'd phrase it so that it's the reception of a message from
the other end that resets the timer.
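
The distinction between resetting on reception and resetting on sending could be expressed roughly like this. It is a minimal sketch, and the PeerTimer class and its method names are hypothetical, chosen only to illustrate the phrasing above.

```python
class PeerTimer:
    """Tracks how long it has been since the other end was last heard from."""

    def __init__(self, timeout: float, now: float = 0.0):
        self.timeout = timeout
        self.last_received = now

    def on_receive(self, now: float) -> None:
        # Any message from the peer proves the connection is alive.
        self.last_received = now

    def on_send(self, now: float) -> None:
        # Deliberately a no-op: sending proves nothing about the peer.
        pass

    def expired(self, now: float) -> bool:
        return now - self.last_received >= self.timeout
```

If the reply is sent immediately after the message arrives, both events happen at nearly the same time, which is why the two phrasings behave the same in practice.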

The Ping/Pong messages don't necessarily need to be new message types,
we can use the message types we currently have, perhaps with an
additional flag attached to them, to request the other side to reply
immediately.


Can't we decide whether to reply immediately based on the message type,
since these message types will be unique?

To clarify my understanding,
1. the heartbeat message from the walsender side will be the keepalive
message ('k'), and from the walreceiver side it will be the Hot Standby
feedback message ('h').
2. the reply message from the walreceiver side will be the current reply
message ('r').

Yep. I wonder why we need separate message types for Hot Standby
Feedback 'h' and Reply 'r', though. Seems it would be simpler to have
just one message type that includes all the fields from both messages.

3. currently there is no reply kind of message from walsender, so do we
need to introduce a new message for it, or can we use an existing
message? If new, do we need to send any additional information along
with it? If existing, can we use the keepalive message itself as the
reply message, but with an additional byte to indicate that it is a
reply?

Hmm, I think I'd prefer to use the existing Keepalive message 'k', with
an additional flag.
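
To make the idea of a keepalive with an additional flag concrete, here is a rough sketch of how such a message might be encoded and decoded. The field layout (type byte, WAL end position, send time, one flag byte) and the function names are assumptions made for illustration only. They are not presented as the actual wire format.

```python
import struct

# Hypothetical layout, network byte order, no padding:
#   1 byte   message type, always b"k"
#   8 bytes  WAL end position (unsigned)
#   8 bytes  send time (signed)
#   1 byte   flag: 1 = please reply immediately, 0 = no reply needed
KEEPALIVE_FMT = "!cQqB"


def encode_keepalive(wal_end: int, send_time: int, reply_requested: bool) -> bytes:
    return struct.pack(KEEPALIVE_FMT, b"k", wal_end, send_time,
                       1 if reply_requested else 0)


def decode_keepalive(msg: bytes):
    """Return (wal_end, send_time, reply_requested)."""
    msg_type, wal_end, send_time, flag = struct.unpack(KEEPALIVE_FMT, msg)
    if msg_type != b"k":
        raise ValueError("not a keepalive message")
    return wal_end, send_time, bool(flag)
```

The receiving side would check the flag and, when it is set, send its reply right away instead of waiting for its usual status interval.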


- Heikki


