
Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-11-13 Thread Fujii Masao

On Tue, Nov 13, 2012 at 1:06 PM, Amit kapila amit.kap...@huawei.com wrote:
 On Monday, November 12, 2012 8:23 PM Fujii Masao wrote:
 On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
 On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote:
  On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
  On 19.10.2012 14:42, Amit kapila wrote:
   On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
   Before implementing the timeout parameter, I think that it's better to change
   both pg_basebackup background process and pg_receivexlog so that

 BTW, IIRC the walsender has no timeout mechanism while sending
 backup data to pg_basebackup. So it would also be useful to implement a
 timeout mechanism for the walsender during backup.

 Yes, it's useful, but for the walsender the main problem is that it uses a
 blocking send call to send the data.
 I have tried using the tcp_keepalive settings, but the send call doesn't
 return in case of a network break.
 The only way I could get it out was to change
 /proc/sys/net/ipv4/tcp_retries2 with the command
 echo 8 > /proc/sys/net/ipv4/tcp_retries2
 As per the recommendation, its value should be at least 8 (equivalent to 100
 sec).

 Do you have any idea how it can be achieved?

 What about using pq_putmessage_noblock()?

 I will try this, but do you know why blocking mode was used to send files
 in the first place?
 I ask because I am a little worried that changing it might break some design
 decision that was made when sending files was implemented as blocking.

I'm afraid I don't know why. I guess that using non-blocking mode complicates
the code, so in the first version of pg_basebackup the blocking mode
was adopted.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-11-12 Thread Fujii Masao

On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
 On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote:
  On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
  On 19.10.2012 14:42, Amit kapila wrote:
   On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
   Before implementing the timeout parameter, I think that it's better to change
   both the pg_basebackup background process and pg_receivexlog so that they
   send back the reply message immediately when they receive a keepalive
   message requesting a reply. Currently, they always ignore such keepalive
   messages, so their status interval parameter (-s) must always be set to
   a value less than the replication timeout. We can avoid this troublesome
   parameter setting by introducing the same logic as walreceiver into both
   the pg_basebackup background process and pg_receivexlog.
   
   Please find attached the patch that makes the modification you mentioned
   (send an immediate reply to a keepalive).
   Both pg_basebackup and pg_receivexlog use the same function ReceiveXlogStream,
   so a single change addresses the issue for both.
  
  Thanks, committed this one after shuffling it around the changes I
  committed yesterday. I also updated the docs to no longer claim that the -s
  option is required to avoid timeout disconnects.
  
  Thank you.
  However, I think the issue is still not completely solved.
  pg_basebackup/pg_receivexlog can still take a long time to
  detect a network break, as they have no concept of a timeout. To address that, I
  sent a proposal, which is at the end of this mail chain:
  http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs
  
  Do you think there is any need to introduce such a mechanism in
  pg_basebackup/pg_receivexlog?

 Are you planning to introduce the timeout mechanism in the pg_basebackup
 main process, or in the background process? It would be useful to implement both.

 By background process, do you mean ReceiveXlogStream?
 I plan to do it for both.

 I think for the background process, it can be done in a way similar to what we
 have done for walreceiver.

Yes.

 But I have some doubts about how to do it for the main process:

 Logic similar to walreceiver's cannot be used in case the network goes down
 while fetching the other database files from the server.
 The reason is that, to receive the data files, PQgetCopyData() is
 called in synchronous mode, so it keeps waiting indefinitely until it
 gets some data.
 To solve this issue, I can think of the following options:
 1. Make this call asynchronous as well (but I am not sure about the impact of this).

+1

Walreceiver already calls PQgetCopyData() asynchronously. ISTM you can
solve the issue in a similar way to walreceiver.

 2. In function pqWait, instead of passing the hard-coded value -1 (i.e. infinite
 wait), we can pass some finite time. This time can be received as a command
 line argument from the respective utility and set in the PGconn structure.
 To have the timeout value in PGconn, we can:
 a. Add a new parameter in PGconn to indicate the receive timeout.
 b. Use the existing parameter connect_timeout for the receive timeout
 as well, but this may lead to confusion.
 3. Any other better option?

 Apart from the above issue, if the network goes down while connecting,
 the utility might hang, because connect_timeout is NULL by default
 and connectDBComplete will wait indefinitely for the
 connection to succeed.
 So shall we have a separate command line argument for this as well, or is
 there some other way you would suggest?

Yes, I think that we should add something like a --conninfo option to
pg_basebackup and pg_receivexlog. With such an option, which accepts a
conninfo string, we can easily set not only connect_timeout but also sslmode,
application_name, ...

 BTW, IIRC the walsender has no timeout mechanism while sending
 backup data to pg_basebackup. So it would also be useful to implement a
 timeout mechanism for the walsender during backup.

 Yes, it's useful, but for the walsender the main problem is that it uses a
 blocking send call to send the data.
 I have tried using the tcp_keepalive settings, but the send call doesn't
 return in case of a network break.
 The only way I could get it out was to change
 /proc/sys/net/ipv4/tcp_retries2 with the command
 echo 8 > /proc/sys/net/ipv4/tcp_retries2
 As per the recommendation, its value should be at least 8 (equivalent to 100
 sec).

 Do you have any idea how it can be achieved?

What about using pq_putmessage_noblock()?
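To illustrate the general idea behind a non-blocking send (a standalone sketch, not the backend's pq_putmessage_noblock()): the socket is switched to non-blocking mode, and when the kernel send buffer is full the sender waits in poll() for a bounded time instead of sleeping inside send(). send_with_timeout() is a hypothetical helper, and the sketch assumes the process ignores SIGPIPE.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send len bytes without ever blocking inside send().
 * If the kernel buffer stays full (the peer is not draining data) for
 * timeout_ms, give up. Returns 0 on success, -1 on socket error, -2 on
 * timeout. Leaves the socket in non-blocking mode. Because flags are 0,
 * a closed peer raises SIGPIPE, which the caller is assumed to ignore.
 */
static int
send_with_timeout(int sock, const char *buf, size_t len, int timeout_ms)
{
    int flags = fcntl(sock, F_GETFL, 0);

    if (flags < 0 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    while (len > 0)
    {
        ssize_t n = send(sock, buf, len, 0);

        if (n > 0)
        {
            buf += n;           /* partial writes are normal here */
            len -= (size_t) n;
            continue;
        }
        if (n == 0)
            return -1;          /* unexpected for a stream socket */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
        {
            struct pollfd pfd = {.fd = sock, .events = POLLOUT};
            int rc = poll(&pfd, 1, timeout_ms);

            if (rc == 0)
                return -2;      /* buffer never drained: treat link as broken */
            if (rc < 0 && errno != EINTR)
                return -1;
            continue;           /* writable again, or send() reports the error */
        }
        return -1;
    }
    return 0;
}
```

The design choice is that the timeout applies to each stall rather than to the whole transfer, so a slow but live receiver is not cut off.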

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-11-08 Thread Heikki Linnakangas


On 19.10.2012 14:42, Amit kapila wrote:

On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:

Before implementing the timeout parameter, I think that it's better to change
both the pg_basebackup background process and pg_receivexlog so that they
send back the reply message immediately when they receive a keepalive
message requesting a reply. Currently, they always ignore such keepalive
messages, so their status interval parameter (-s) must always be set to
a value less than the replication timeout. We can avoid this troublesome
parameter setting by introducing the same logic as walreceiver into both
the pg_basebackup background process and pg_receivexlog.


Please find attached the patch that makes the modification you mentioned
(send an immediate reply to a keepalive).
Both pg_basebackup and pg_receivexlog use the same function ReceiveXlogStream,
so a single change addresses the issue for both.


Thanks, committed this one after shuffling it around the changes I
committed yesterday. I also updated the docs to no longer claim that the -s
option is required to avoid timeout disconnects.


- Heikki



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-11-08 Thread Amit Kapila

On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
 On 19.10.2012 14:42, Amit kapila wrote:
  On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
  Before implementing the timeout parameter, I think that it's better to change
  both the pg_basebackup background process and pg_receivexlog so that they
  send back the reply message immediately when they receive a keepalive
  message requesting a reply. Currently, they always ignore such keepalive
  messages, so their status interval parameter (-s) must always be set to
  a value less than the replication timeout. We can avoid this troublesome
  parameter setting by introducing the same logic as walreceiver into both
  the pg_basebackup background process and pg_receivexlog.
 
  Please find attached the patch that makes the modification you mentioned
  (send an immediate reply to a keepalive).
  Both pg_basebackup and pg_receivexlog use the same function ReceiveXlogStream,
  so a single change addresses the issue for both.
 
 Thanks, committed this one after shuffling it around the changes I
 committed yesterday. I also updated the docs to no longer claim that the -s
 option is required to avoid timeout disconnects.

Thank you.
However, I think the issue is still not completely solved.
pg_basebackup/pg_receivexlog can still take a long time to
detect a network break, as they have no concept of a timeout. To address that, I
sent a proposal, which is at the end of this mail chain:
http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs

Do you think there is any need to introduce such a mechanism in
pg_basebackup/pg_receivexlog?

With Regards,
Amit Kapila.




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-11-08 Thread Fujii Masao

On Thu, Nov 8, 2012 at 2:22 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 16.10.2012 15:31, Heikki Linnakangas wrote:

 On 15.10.2012 19:31, Fujii Masao wrote:

 On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 On 15.10.2012 13:13, Heikki Linnakangas wrote:


 Oh, I didn't remember that we've documented the specific structs that we
 pass around. It's quite bogus anyway to explain the messages the way we
 do currently, as they are actually dependent on the underlying
 architecture's endianness and padding. I think we should refactor the
 protocol to not transmit raw structs, but use pq_sendint and friends to
 construct the messages. This was discussed earlier (see
 http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com),
 and I think there's consensus that 9.3 would be a good time to do that,
 as we changed the XLogRecPtr format anyway.


 This is what I came up with. The replication protocol is now
 architecture-independent. The WAL format itself is still
 architecture-dependent, of course, but this is useful if you want
 to e.g. use pg_receivexlog to back up a server that runs on a different
 platform.

 I chose the int64 format to transmit timestamps, even when compiled with
 --disable-integer-datetimes.

 Please review if you have the time.


 Thanks for the patch!

 When I ran pg_receivexlog, I encountered the following error.


 Yeah, clearly I didn't test this near enough...

 I fixed the bugs you bumped into, new version attached.


 Committed this now, after fixing a few more bugs that came up during
 testing.

As I suggested upthread, pg_basebackup and pg_receivexlog no longer
need to check integer_datetimes before establishing the connection,
thanks to this commit. If this is right, the attached patch should be applied.
The patch just removes the check of integer_datetimes by pg_basebackup
and pg_receivexlog.

Regards,

-- 
Fujii Masao


dont_check_integer_datetimes_v1.patch
Description: Binary data


Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-11-08 Thread Fujii Masao

On Fri, Nov 9, 2012 at 1:40 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Nov 8, 2012 at 2:22 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 On 16.10.2012 15:31, Heikki Linnakangas wrote:

 On 15.10.2012 19:31, Fujii Masao wrote:

 On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 On 15.10.2012 13:13, Heikki Linnakangas wrote:


 Oh, I didn't remember that we've documented the specific structs that we
 pass around. It's quite bogus anyway to explain the messages the way we
 do currently, as they are actually dependent on the underlying
 architecture's endianness and padding. I think we should refactor the
 protocol to not transmit raw structs, but use pq_sendint and friends to
 construct the messages. This was discussed earlier (see
 http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com),
 and I think there's consensus that 9.3 would be a good time to do that,
 as we changed the XLogRecPtr format anyway.


 This is what I came up with. The replication protocol is now
 architecture-independent. The WAL format itself is still
 architecture-dependent, of course, but this is useful if you want
 to e.g. use pg_receivexlog to back up a server that runs on a different
 platform.

 I chose the int64 format to transmit timestamps, even when compiled with
 --disable-integer-datetimes.

 Please review if you have the time.


 Thanks for the patch!

 When I ran pg_receivexlog, I encountered the following error.


 Yeah, clearly I didn't test this near enough...

 I fixed the bugs you bumped into, new version attached.


 Committed this now, after fixing a few more bugs that came up during
 testing.

 As I suggested upthread, pg_basebackup and pg_receivexlog no longer
 need to check integer_datetimes before establishing the connection,
 thanks to this commit. If this is right, the attached patch should be applied.
 The patch just removes the check of integer_datetimes by pg_basebackup
 and pg_receivexlog.

Another comment that I made upthread is:


In XLogWalRcvSendReply() and XLogWalRcvSendHSFeedback(),
GetCurrentTimestamp() is called twice. I think that we can skip the
latter call if integer datetimes are enabled, because the return values of
GetCurrentTimestamp() and GetCurrentIntegerTimestamp() are in the
same format. It's worth reducing the number of GetCurrentTimestamp()
calls, I think.


The attached patch removes the redundant GetCurrentTimestamp() call
from XLogWalRcvSendReply() and XLogWalRcvSendHSFeedback()
when built with --enable-integer-datetimes.
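A minimal sketch of the proposed saving, assuming an integer-datetimes build in which the local bookkeeping timestamp and the reply's send time share the same int64 microsecond format. All names here (TimestampTz as a plain int64 typedef, get_current_timestamp(), struct reply_state, send_reply()) are hypothetical stand-ins rather than walreceiver code, and the fake clock simply counts how often it is called.

```c
#include <stdint.h>

typedef int64_t TimestampTz;    /* assumption: integer-datetimes build */

/* Hypothetical stand-in for the clock; counts calls so the saving shows. */
static int clock_calls = 0;

static TimestampTz
get_current_timestamp(void)
{
    clock_calls++;
    return (TimestampTz) clock_calls * 1000000;     /* fake microseconds */
}

/* Hypothetical reply bookkeeping, not walreceiver's actual variables. */
struct reply_state
{
    TimestampTz last_sent;      /* local: when the last reply went out */
    TimestampTz wire_sendtime;  /* value written into the reply message */
};

/*
 * Read the clock once and reuse it: with integer datetimes the local
 * timestamp and the int64 wire timestamp share one format, so a second
 * clock call would only cost time.
 */
static void
send_reply(struct reply_state *st)
{
    TimestampTz now = get_current_timestamp();

    st->last_sent = now;
    st->wire_sendtime = now;
}
```

With float datetimes the two values would differ in representation, which is why the patch limits the shortcut to --enable-integer-datetimes builds.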

Regards,

-- 
Fujii Masao


reduce_get_current_timestamp_v1.patch
Description: Binary data


Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-11-08 Thread Fujii Masao

On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
 On 19.10.2012 14:42, Amit kapila wrote:
  On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
  Before implementing the timeout parameter, I think that it's better to change
  both the pg_basebackup background process and pg_receivexlog so that they
  send back the reply message immediately when they receive a keepalive
  message requesting a reply. Currently, they always ignore such keepalive
  messages, so their status interval parameter (-s) must always be set to
  a value less than the replication timeout. We can avoid this troublesome
  parameter setting by introducing the same logic as walreceiver into both
  the pg_basebackup background process and pg_receivexlog.
 
  Please find attached the patch that makes the modification you mentioned
  (send an immediate reply to a keepalive).
  Both pg_basebackup and pg_receivexlog use the same function ReceiveXlogStream,
  so a single change addresses the issue for both.

 Thanks, committed this one after shuffling it around the changes I
 committed yesterday. I also updated the docs to no longer claim that the -s
 option is required to avoid timeout disconnects.

 Thank you.
 However, I think the issue is still not completely solved.
 pg_basebackup/pg_receivexlog can still take a long time to
 detect a network break, as they have no concept of a timeout. To address that, I
 sent a proposal, which is at the end of this mail chain:
 http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs

 Do you think there is any need to introduce such a mechanism in
 pg_basebackup/pg_receivexlog?

Are you planning to introduce the timeout mechanism in the pg_basebackup
main process, or in the background process? It would be useful to implement both.

BTW, IIRC the walsender has no timeout mechanism while sending
backup data to pg_basebackup. So it would also be useful to implement a
timeout mechanism for the walsender during backup.

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-11-08 Thread Amit Kapila

On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
 On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote:
  On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
  On 19.10.2012 14:42, Amit kapila wrote:
   On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
   Before implementing the timeout parameter, I think that it's better to change
   both the pg_basebackup background process and pg_receivexlog so that they
   send back the reply message immediately when they receive a keepalive
   message requesting a reply. Currently, they always ignore such keepalive
   messages, so their status interval parameter (-s) must always be set to
   a value less than the replication timeout. We can avoid this troublesome
   parameter setting by introducing the same logic as walreceiver into both
   the pg_basebackup background process and pg_receivexlog.
   
   Please find attached the patch that makes the modification you mentioned
   (send an immediate reply to a keepalive).
   Both pg_basebackup and pg_receivexlog use the same function ReceiveXlogStream,
   so a single change addresses the issue for both.
  
  Thanks, committed this one after shuffling it around the changes I
  committed yesterday. I also updated the docs to no longer claim that the -s
  option is required to avoid timeout disconnects.
  
  Thank you.
  However, I think the issue is still not completely solved.
  pg_basebackup/pg_receivexlog can still take a long time to
  detect a network break, as they have no concept of a timeout. To address that, I
  sent a proposal, which is at the end of this mail chain:
  http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs
  
  Do you think there is any need to introduce such a mechanism in
  pg_basebackup/pg_receivexlog?
 
 Are you planning to introduce the timeout mechanism in the pg_basebackup
 main process, or in the background process? It would be useful to implement both.

By background process, do you mean ReceiveXlogStream?
I plan to do it for both.

I think for the background process, it can be done in a way similar to what we
have done for walreceiver.
But I have some doubts about how to do it for the main process:

Logic similar to walreceiver's cannot be used in case the network goes down
while fetching the other database files from the server.
The reason is that, to receive the data files, PQgetCopyData() is
called in synchronous mode, so it keeps waiting indefinitely until it
gets some data.
To solve this issue, I can think of the following options:
1. Make this call asynchronous as well (but I am not sure about the impact of this).
2. In function pqWait, instead of passing the hard-coded value -1 (i.e. infinite
wait), we can pass some finite time. This time can be received as a command
line argument from the respective utility and set in the PGconn structure.
To have the timeout value in PGconn, we can:
a. Add a new parameter in PGconn to indicate the receive timeout.
b. Use the existing parameter connect_timeout for the receive timeout
as well, but this may lead to confusion.
3. Any other better option?

Apart from the above issue, if the network goes down while connecting,
the utility might hang, because connect_timeout is NULL by default
and connectDBComplete will wait indefinitely for the
connection to succeed.
So shall we have a separate command line argument for this as well, or is
there some other way you would suggest?

 BTW, IIRC the walsender has no timeout mechanism while sending
 backup data to pg_basebackup. So it would also be useful to implement a
 timeout mechanism for the walsender during backup.

Yes, it's useful, but for the walsender the main problem is that it uses a
blocking send call to send the data.
I have tried using the tcp_keepalive settings, but the send call doesn't
return in case of a network break.
The only way I could get it out was to change
/proc/sys/net/ipv4/tcp_retries2 with the command
echo 8 > /proc/sys/net/ipv4/tcp_retries2
As per the recommendation, its value should be at least 8 (equivalent to 100
sec).

Do you have any idea how it can be achieved?
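One option not raised in the thread, offered only as a hedged pointer: on Linux (since 2.6.37) the per-socket TCP_USER_TIMEOUT option limits how long transmitted data may stay unacknowledged before the kernel aborts the connection, at which point a blocked send() fails with ETIMEDOUT. That would bound the wait per connection instead of changing the system-wide tcp_retries2. The helper name set_user_timeout() is hypothetical, and the sketch is Linux-specific.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (Linux-only): have the kernel abort the connection
 * if sent data stays unacknowledged for longer than timeout_ms, instead of
 * relying on the system-wide /proc/sys/net/ipv4/tcp_retries2 setting.
 * Returns 0 on success, -1 on error (errno set by setsockopt).
 */
static int
set_user_timeout(int sock, unsigned int timeout_ms)
{
#ifdef TCP_USER_TIMEOUT
    return setsockopt(sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
#else
    (void) sock;
    (void) timeout_ms;
    return -1;                  /* option not available on this platform */
#endif
}
```

Unlike keepalives, which only probe an idle connection, this timer covers exactly the case described above: data queued for sending that the peer never acknowledges.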

With Regards,
Amit Kapila.




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-11-07 Thread Heikki Linnakangas

On 16.10.2012 15:31, Heikki Linnakangas wrote:

On 15.10.2012 19:31, Fujii Masao wrote:

On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

On 15.10.2012 13:13, Heikki Linnakangas wrote:

Oh, I didn't remember that we've documented the specific structs that we
pass around. It's quite bogus anyway to explain the messages the way we
do currently, as they are actually dependent on the underlying
architecture's endianness and padding. I think we should refactor the
protocol to not transmit raw structs, but use pq_sendint and friends to
construct the messages. This was discussed earlier (see
http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com),
and I think there's consensus that 9.3 would be a good time to do that,
as we changed the XLogRecPtr format anyway.
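To illustrate what architecture-independent means here: instead of copying a raw struct, whose byte order and padding depend on the platform, each field is written explicitly in network byte order, which is what pq_sendint64() does on the server side. The functions below are a hypothetical, self-contained sketch of that encoding, not PostgreSQL's implementation.

```c
#include <stdint.h>

/* Write v into buf as 8 bytes, most significant byte first. */
static void
put_int64_be(unsigned char *buf, int64_t v)
{
    uint64_t u = (uint64_t) v;

    for (int i = 7; i >= 0; i--)
    {
        buf[i] = (unsigned char) (u & 0xFF);
        u >>= 8;
    }
}

/* Read 8 big-endian bytes back into an int64, independent of host order. */
static int64_t
get_int64_be(const unsigned char *buf)
{
    uint64_t u = 0;

    for (int i = 0; i < 8; i++)
        u = (u << 8) | buf[i];
    return (int64_t) u;
}
```

Because both sides agree on the byte order and no padding is transmitted, a little-endian client can decode what a big-endian server sent, and vice versa.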

This is what I came up with. The replication protocol is now
architecture-independent. The WAL format itself is still
architecture-dependent, of course, but this is useful if you want
to e.g. use pg_receivexlog to back up a server that runs on a different
platform.

I chose the int64 format to transmit timestamps, even when compiled with
--disable-integer-datetimes.
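For a build configured with --disable-integer-datetimes, timestamps are floating-point seconds since the PostgreSQL epoch (2000-01-01), so sending them as int64 means converting at the edges. A minimal sketch, assuming the wire value is in microseconds; float_ts_to_int64() and int64_ts_to_float() are hypothetical names, not PostgreSQL functions.

```c
#include <stdint.h>

#define USECS_PER_SEC INT64_C(1000000)

/* Float seconds -> int64 microseconds, rounded to the nearest microsecond. */
static int64_t
float_ts_to_int64(double seconds)
{
    double usecs = seconds * (double) USECS_PER_SEC;

    /* round half away from zero without needing libm */
    return (int64_t) (usecs >= 0 ? usecs + 0.5 : usecs - 0.5);
}

/* int64 microseconds -> float seconds, for the receiving side. */
static double
int64_ts_to_float(int64_t usecs)
{
    return (double) usecs / (double) USECS_PER_SEC;
}
```

Fixing one wire format means a float-datetimes client and an integer-datetimes server no longer need matching settings just to exchange these timestamps.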

Please review if you have the time.

Thanks for the patch!

When I ran pg_receivexlog, I encountered the following error.

Yeah, clearly I didn't test this near enough...

I fixed the bugs you bumped into, new version attached.

Committed this now, after fixing a few more bugs that came up during
testing. Next, I'll take a look at the patch you sent for adding
timeouts to pg_basebackup and pg_receivexlog
(http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED@szxeml509-mbs)

- Heikki


Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-18 Thread Fujii Masao

On Wed, Oct 17, 2012 at 8:46 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Monday, October 15, 2012 3:43 PM Heikki Linnakangas wrote:
 On 13.10.2012 19:35, Fujii Masao wrote:
  On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
  hlinnakan...@vmware.com  wrote:
  Ok, thanks. Committed.
 
  I found one typo. The attached patch fixes that typo.

 Thanks, fixed.

  ISTM you need to update the protocol.sgml because you added
  the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.



  Is it worth adding the same mechanism (send back the reply immediately
  if walsender requests a reply) into pg_basebackup and pg_receivexlog?

 Good catch. Yes, they should be taught about this too. I'll look into
 doing that too.

 If you have not started on this and have no objection, I can pick it up
 and complete it.

 For both (pg_basebackup and pg_receivexlog), we need to get a timeout
 parameter from the user on the command line, as there is no configuration
 file here. The new option could be -t (the parameter name could be
 recvtimeout).

 The main changes will be in the function ReceiveXlogStream(), which is
 common to both pg_basebackup and pg_receivexlog. The handling will be done
 in the same way as we have done in walreceiver.

 Suggestions/Comments?

Before implementing the timeout parameter, I think that it's better to change
both the pg_basebackup background process and pg_receivexlog so that they
send back the reply message immediately when they receive a keepalive
message requesting a reply. Currently, they always ignore such keepalive
messages, so their status interval parameter (-s) must always be set to
a value less than the replication timeout. We can avoid this troublesome
parameter setting by introducing the same logic as walreceiver into both
the pg_basebackup background process and pg_receivexlog.
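A hedged sketch of the client-side check being proposed, assuming the keepalive layout implied later in this thread: a 'k' byte, two int64 fields in network byte order (the server's end of WAL and its send time), and a one-byte reply-requested flag. keepalive_wants_reply() is a hypothetical helper; a client would call it on each CopyData payload and, when it returns 1, send its status reply at once rather than waiting for the -s interval.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 'k', int64 walEnd, int64 sendTime, byte replyRequested. */
#define KEEPALIVE_LEN (1 + 8 + 8 + 1)

static int64_t
read_be64(const unsigned char *p)
{
    uint64_t u = 0;

    for (int i = 0; i < 8; i++)
        u = (u << 8) | p[i];
    return (int64_t) u;
}

/*
 * Hypothetical: returns 1 if msg is a keepalive asking for an immediate
 * reply, 0 if it is a keepalive that does not, and -1 if it is not a
 * well-formed keepalive. *wal_end receives the server's end of WAL.
 */
static int
keepalive_wants_reply(const unsigned char *msg, size_t len, int64_t *wal_end)
{
    if (len < KEEPALIVE_LEN || msg[0] != 'k')
        return -1;
    *wal_end = read_be64(msg + 1);
    /* msg + 9 holds the server's send time, unused here */
    return msg[17] != 0;
}
```

Replying only when asked keeps the -s interval as the normal cadence while still letting the server's timeout logic get a prompt answer.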

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-18 Thread Fujii Masao

On Tue, Oct 16, 2012 at 9:31 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 15.10.2012 19:31, Fujii Masao wrote:

 On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
 hlinnakan...@vmware.com  wrote:

 On 15.10.2012 13:13, Heikki Linnakangas wrote:


 Oh, I didn't remember that we've documented the specific structs that we
 pass around. It's quite bogus anyway to explain the messages the way we
 do currently, as they are actually dependent on the underlying
 architecture's endianness and padding. I think we should refactor the
 protocol to not transmit raw structs, but use pq_sendint and friends to
 construct the messages. This was discussed earlier (see
 http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com),
 and I think there's consensus that 9.3 would be a good time to do that,
 as we changed the XLogRecPtr format anyway.



 This is what I came up with. The replication protocol is now
 architecture-independent. The WAL format itself is still
 architecture-dependent, of course, but this is useful if you want
 to e.g. use pg_receivexlog to back up a server that runs on a different
 platform.

 I chose the int64 format to transmit timestamps, even when compiled with
 --disable-integer-datetimes.

 Please review if you have the time.


 Thanks for the patch!

 When I ran pg_receivexlog, I encountered the following error.


 Yeah, clearly I didn't test this near enough...

 I fixed the bugs you bumped into, new version attached.

Thanks for updating the patch!

Should we remove the integer_datetimes check from the pg_basebackup
background process and pg_receivexlog? Currently, they always check
it and fail if its value differs between the client and the
server. Thanks to the patch, ISTM this check is no longer
required.

+   pq_sendint64(&reply_message, GetCurrentIntegerTimestamp());

In XLogWalRcvSendReply() and XLogWalRcvSendHSFeedback(),
GetCurrentTimestamp() is called twice. I think that we can skip the
latter call if integer datetimes are enabled, because the return values of
GetCurrentTimestamp() and GetCurrentIntegerTimestamp() are in the
same format. It's worth reducing the number of GetCurrentTimestamp()
calls, I think.

elog(DEBUG2, "sending write %X/%X flush %X/%X apply %X/%X",
-    (uint32) (reply_message.write >> 32), (uint32) reply_message.write,
-    (uint32) (reply_message.flush >> 32), (uint32) reply_message.flush,
-    (uint32) (reply_message.apply >> 32), (uint32) reply_message.apply);
+    (uint32) (writePtr >> 32), (uint32) writePtr,
+    (uint32) (flushPtr >> 32), (uint32) flushPtr,
+    (uint32) (applyPtr >> 32), (uint32) applyPtr);

elog(DEBUG2, "write %X/%X flush %X/%X apply %X/%X",
-    (uint32) (reply.write >> 32), (uint32) reply.write,
-    (uint32) (reply.flush >> 32), (uint32) reply.flush,
-    (uint32) (reply.apply >> 32), (uint32) reply.apply);
+    (uint32) (writePtr >> 32), (uint32) writePtr,
+    (uint32) (flushPtr >> 32), (uint32) flushPtr,
+    (uint32) (applyPtr >> 32), (uint32) applyPtr);
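The %X/%X convention in these messages prints a 64-bit WAL location as its high and low 32-bit halves. A small self-contained sketch of the same formatting; format_lsn() is a hypothetical helper, and XLogRecPtr is redeclared here only so the example compiles on its own.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;    /* redeclared here: a 64-bit WAL location */

/* Hypothetical helper: print ptr as "high/low" 32-bit halves in hex. */
static int
format_lsn(char *buf, size_t size, XLogRecPtr ptr)
{
    return snprintf(buf, size, "%X/%X",
                    (unsigned int) (uint32_t) (ptr >> 32),
                    (unsigned int) (uint32_t) ptr);
}
```

For example, the location 0x100000028 prints as 1/28.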

Isn't it worth logging not only the WAL locations but also the replyRequested
flag in these debug messages?

The rest of the patch looks good to me.

 +   hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64);
 +   hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char);

 Should these be macros, to avoid calculation overhead?


 The compiler will calculate this at compilation time, it's going to be a
 constant at runtime.

Yes, you're right.
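Heikki's point can be checked mechanically: sizeof is evaluated at compile time, so each sum is an integer constant expression. In the sketch below (hypothetical macro names), a C11 _Static_assert and a file-scope array bound, both of which reject non-constant expressions, accept the sums.

```c
#include <stdint.h>

/*
 * The two header-length expressions from the quoted patch, written as
 * macros (hypothetical names). sizeof is evaluated by the compiler, so
 * each sum is an integer constant expression with no runtime cost.
 */
#define HDRLEN_THREE_INT64        (sizeof(int64_t) + sizeof(int64_t) + sizeof(int64_t))
#define HDRLEN_TWO_INT64_AND_CHAR (sizeof(int64_t) + sizeof(int64_t) + sizeof(char))

/* C11 static assertions only accept compile-time constants. */
_Static_assert(HDRLEN_THREE_INT64 == 24, "three 8-byte fields");
_Static_assert(HDRLEN_TWO_INT64_AND_CHAR == 17, "two 8-byte fields plus one byte");

/* A file-scope array bound is another context that requires a constant. */
static char header_buf[HDRLEN_TWO_INT64_AND_CHAR];
```

So writing the sums inline, as the patch does, costs nothing at runtime compared with macros.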

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-17 Thread Amit Kapila

 On Monday, October 15, 2012 3:43 PM Heikki Linnakangas wrote:
 On 13.10.2012 19:35, Fujii Masao wrote:
  On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
  hlinnakan...@vmware.com  wrote:
  Ok, thanks. Committed.
 
  I found one typo. The attached patch fixes that typo.
 
 Thanks, fixed.
 
  ISTM you need to update the protocol.sgml because you added
  the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.


 
  Is it worth adding the same mechanism (send back the reply immediately
  if walsender requests a reply) into pg_basebackup and pg_receivexlog?
 
 Good catch. Yes, they should be taught about this too. I'll look into
 doing that too.

If you have not started on this and have no objection, I can pick it up
and complete it.

For both (pg_basebackup and pg_receivexlog), we need to get a timeout
parameter from the user on the command line, as there is no configuration
file here. The new option could be -t (the parameter name could be
recvtimeout).

The main changes will be in the function ReceiveXlogStream(), which is
common to both pg_basebackup and pg_receivexlog. The handling will be done
in the same way as we have done in walreceiver.
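A minimal sketch of the timeout bookkeeping such a change could add to the receive loop, loosely following the walreceiver approach of remembering when data last arrived. The struct and function names are hypothetical, times are milliseconds from a monotonic clock, and a timeout of 0 keeps the current behavior of waiting forever.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive-loop state for a proposed -t/recvtimeout option. */
struct recv_state
{
    int64_t last_recv_ms;       /* when any message last arrived */
    int64_t timeout_ms;         /* 0 = wait forever, as today */
};

/* Call whenever a message (WAL data or keepalive) arrives. */
static void
note_receive(struct recv_state *st, int64_t now_ms)
{
    st->last_recv_ms = now_ms;
}

/* True if the server has been silent for longer than the timeout. */
static bool
receive_timed_out(const struct recv_state *st, int64_t now_ms)
{
    return st->timeout_ms > 0 && now_ms - st->last_recv_ms > st->timeout_ms;
}
```

Because the server sends keepalives while idle, silence longer than the timeout is a reasonable signal that the network, not just the workload, has stopped.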

Suggestions/Comments?

With Regards,
Amit Kapila.




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-17 Thread Amit Kapila

On Wednesday, October 17, 2012 5:16 PM Amit Kapila wrote:
  On Monday, October 15, 2012 3:43 PM Heikki Linnakangas wrote:
  On 13.10.2012 19:35, Fujii Masao wrote:
   On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
   hlinnakan...@vmware.com  wrote:
   Ok, thanks. Committed.
  
   I found one typo. The attached patch fixes that typo.
 
  Thanks, fixed.
 
   ISTM you need to update the protocol.sgml because you added
   the field 'replyRequested' to WalSndrMessage and
 StandbyReplyMessage.
 
 
 
   Is it worth adding the same mechanism (send back the reply immediately
   if walsender requests a reply) into pg_basebackup and pg_receivexlog?
 
  Good catch. Yes, they should be taught about this too. I'll look into
  doing that too.
 
 If you have not started on it and have no objection, I can pick this up
 and complete it.
 
 For both pg_basebackup and pg_receivexlog, we need to take a timeout
 parameter from the user on the command line, as there is no configuration
 file for these tools. The new option could be -t (parameter name could be
 recvtimeout).
 
 The main changes will be in ReceiveXlogStream(), which is common to both
 pg_basebackup and pg_receivexlog. The handling will be done the same way
 as we have done in walreceiver.

Some other functions in pg_basebackup, the ones that receive the data files,
also need similar handling.

With Regards,
Amit Kapila.




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-16 Thread Heikki Linnakangas


On 15.10.2012 19:31, Fujii Masao wrote:

On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:

On 15.10.2012 13:13, Heikki Linnakangas wrote:


Oh, I didn't remember that we've documented the specific structs that we
pass around. It's quite bogus anyway to explain the messages the way we
do currently, as they are actually dependent on the underlying
architecture's endianness and padding. I think we should refactor the
protocol to not transmit raw structs, but use pq_sendint and friends to
construct the messages. This was discussed earlier (see
http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com),
and I think there's consensus that 9.3 would be a good time to do that as we
changed the XLogRecPtr format anyway.



This is what I came up with. The replication protocol is now
architecture-independent. The WAL format itself is still
architecture-dependent, of course, but this is useful if you want to e.g.
use pg_receivexlog to back up a server that runs on a different platform.

I chose the int64 format to transmit timestamps, even when compiled with
--disable-integer-datetimes.

Please review if you have the time.


Thanks for the patch!

When I ran pg_receivexlog, I encountered the following error.


Yeah, clearly I didn't test this near enough...

I fixed the bugs you bumped into, new version attached.


+   hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64);
+   hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char);

These should be macros, to avoid calculation overhead?


The compiler will calculate this at compilation time, it's going to be a 
constant at runtime.


- Heikki
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3d72a16..5a32517 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1366,7 +1366,8 @@ The commands accepted in walsender mode are:
   WAL data is sent as a series of CopyData messages.  (This allows
   other information to be intermixed; in particular the server can send
   an ErrorResponse message if it encounters a failure after beginning
-  to stream.)  The payload in each CopyData message follows this format:
+  to stream.)  The payload of each CopyData message from server to the
+  client contains a message of one of the following formats:
  /para
 
  para
@@ -1390,34 +1391,32 @@ The commands accepted in walsender mode are:
   /varlistentry
   varlistentry
   term
-  Byte8
+  Int64
   /term
   listitem
   para
-  The starting point of the WAL data in this message, given in
-  XLogRecPtr format.
+  The starting point of the WAL data in this message.
   /para
   /listitem
   /varlistentry
   varlistentry
   term
-  Byte8
+  Int64
   /term
   listitem
   para
-  The current end of WAL on the server, given in
-  XLogRecPtr format.
+  The current end of WAL on the server.
   /para
   /listitem
   /varlistentry
   varlistentry
   term
-  Byte8
+  Int64
   /term
   listitem
   para
-  The server's system clock at the time of transmission,
-  given in TimestampTz format.
+  The server's system clock at the time of transmission, as
+  microseconds since midnight on 2000-01-01.
   /para
   /listitem
   /varlistentry
@@ -1445,25 +1444,12 @@ The commands accepted in walsender mode are:
continuation records can be sent in different CopyData messages.
  /para
  para
-   Note that all fields within the WAL data and the above-described header
-   will be in the sending server's native format.  Endianness, and the
-   format for the timestamp, are unpredictable unless the receiver has
-   verified that the sender's system identifier matches its own
-   filenamepg_control/ contents.
- /para
- para
If the WAL sender process is terminated normally (during postmaster
shutdown), it will send a CommandComplete message before exiting.
This might not happen during an abnormal shutdown, of course.
  /para
 
  para
-   The receiving process can send replies back to the sender at any time,
-   using one of the following message formats (also in the payload of a
-   CopyData message):
- /para
-
- para
   variablelist
   varlistentry
   term
@@ -1495,12 +1481,23 @@ The commands accepted in walsender mode are:
   /varlistentry
   varlistentry
   term
-  Byte8
+  Int64
   /term
   listitem
   para
-  The server's system clock at the time of transmission,
-  given in TimestampTz format.
+  The server's system clock at the time of transmission, as
+  microseconds since midnight on 2000-01-01.
+  /para
+

Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-15 Thread Heikki Linnakangas


On 13.10.2012 19:35, Fujii Masao wrote:

On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:

Ok, thanks. Committed.


I found one typo. The attached patch fixes that typo.


Thanks, fixed.


ISTM you need to update the protocol.sgml because you added
the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.


Oh, I didn't remember that we've documented the specific structs that we
pass around. It's quite bogus anyway to explain the messages the way we
do currently, as they are actually dependent on the underlying
architecture's endianness and padding. I think we should refactor the
protocol to not transmit raw structs, but use pq_sendint and friends to
construct the messages. This was discussed earlier (see
http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com),
and I think there's consensus that 9.3 would be a good time to do that as we
changed the XLogRecPtr format anyway.


I'll look into doing that.


Is it worth adding the same mechanism (send back the reply immediately
if walsender requests a reply) into pg_basebackup and pg_receivexlog?


Good catch. Yes, they should be taught about this too. I'll look into 
doing that too.


- Heikki



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-15 Thread Heikki Linnakangas


On 15.10.2012 13:13, Heikki Linnakangas wrote:

On 13.10.2012 19:35, Fujii Masao wrote:

ISTM you need to update the protocol.sgml because you added
the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.


Oh, I didn't remember that we've documented the specific structs that we
pass around. It's quite bogus anyway to explain the messages the way we
do currently, as they are actually dependent on the underlying
architecture's endianness and padding. I think we should refactor the
protocol to not transmit raw structs, but use pq_sendint and friends to
construct the messages. This was discussed earlier (see
http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com),
and I think there's consensus that 9.3 would be a good time to do that as we
changed the XLogRecPtr format anyway.


This is what I came up with. The replication protocol is now
architecture-independent. The WAL format itself is still
architecture-dependent, of course, but this is useful if you want to
e.g. use pg_receivexlog to back up a server that runs on a different
platform.


I chose the int64 format to transmit timestamps, even when compiled with 
--disable-integer-datetimes.
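A minimal sketch of what an int64 wire timestamp means, assuming the format described in the patched protocol.sgml (microseconds since midnight on 2000-01-01) and assuming that a --disable-integer-datetimes build holds a double of seconds since the same epoch, which the sender would convert before transmitting. Both helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Seconds from the Unix epoch (1970-01-01) to 2000-01-01 00:00 UTC. */
#define UNIX_TO_Y2K_EPOCH_SECS  INT64_C(946684800)
#define USECS_PER_SEC           INT64_C(1000000)

/* Hypothetical: Unix seconds + microseconds -> int64 wire timestamp. */
static int64_t
unix_to_wire_timestamp(int64_t unix_secs, int64_t usecs)
{
    return (unix_secs - UNIX_TO_Y2K_EPOCH_SECS) * USECS_PER_SEC + usecs;
}

/*
 * Hypothetical: a float timestamp (assumed to be seconds since 2000-01-01)
 * -> int64 microseconds, rounded to the nearest microsecond.
 */
static int64_t
float_to_wire_timestamp(double secs_since_y2k)
{
    double      usecs = secs_since_y2k * 1e6;

    return (int64_t) (usecs >= 0 ? usecs + 0.5 : usecs - 0.5);
}
```

Sending one integer format regardless of build options means the receiver never has to guess how the sender was compiled.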


Please review if you have the time.

- Heikki
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3d72a16..5a32517 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1366,7 +1366,8 @@ The commands accepted in walsender mode are:
   WAL data is sent as a series of CopyData messages.  (This allows
   other information to be intermixed; in particular the server can send
   an ErrorResponse message if it encounters a failure after beginning
-  to stream.)  The payload in each CopyData message follows this format:
+  to stream.)  The payload of each CopyData message from server to the
+  client contains a message of one of the following formats:
  /para
 
  para
@@ -1390,34 +1391,32 @@ The commands accepted in walsender mode are:
   /varlistentry
   varlistentry
   term
-  Byte8
+  Int64
   /term
   listitem
   para
-  The starting point of the WAL data in this message, given in
-  XLogRecPtr format.
+  The starting point of the WAL data in this message.
   /para
   /listitem
   /varlistentry
   varlistentry
   term
-  Byte8
+  Int64
   /term
   listitem
   para
-  The current end of WAL on the server, given in
-  XLogRecPtr format.
+  The current end of WAL on the server.
   /para
   /listitem
   /varlistentry
   varlistentry
   term
-  Byte8
+  Int64
   /term
   listitem
   para
-  The server's system clock at the time of transmission,
-  given in TimestampTz format.
+  The server's system clock at the time of transmission, as
+  microseconds since midnight on 2000-01-01.
   /para
   /listitem
   /varlistentry
@@ -1445,25 +1444,12 @@ The commands accepted in walsender mode are:
continuation records can be sent in different CopyData messages.
  /para
  para
-   Note that all fields within the WAL data and the above-described header
-   will be in the sending server's native format.  Endianness, and the
-   format for the timestamp, are unpredictable unless the receiver has
-   verified that the sender's system identifier matches its own
-   filenamepg_control/ contents.
- /para
- para
If the WAL sender process is terminated normally (during postmaster
shutdown), it will send a CommandComplete message before exiting.
This might not happen during an abnormal shutdown, of course.
  /para
 
  para
-   The receiving process can send replies back to the sender at any time,
-   using one of the following message formats (also in the payload of a
-   CopyData message):
- /para
-
- para
   variablelist
   varlistentry
   term
@@ -1495,12 +1481,23 @@ The commands accepted in walsender mode are:
   /varlistentry
   varlistentry
   term
-  Byte8
+  Int64
   /term
   listitem
   para
-  The server's system clock at the time of transmission,
-  given in TimestampTz format.
+  The server's system clock at the time of transmission, as
+  microseconds since midnight on 2000-01-01.
+  /para
+  /listitem
+  /varlistentry
+  varlistentry
+  term
+  Byte1
+  /term
+  listitem
+  para
+  1 means that the client should reply to this message as soon as
+  possible, to avoid a timeout disconnect. 0 otherwise.
   /para
   /listitem
   /varlistentry
@@ -1512,6 +1509,12 @@ The commands accepted in walsender mode are:
  /para
 
  para
+   The receiving process can send replies back to the sender at any

Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-15 Thread Fujii Masao

On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 15.10.2012 13:13, Heikki Linnakangas wrote:

 On 13.10.2012 19:35, Fujii Masao wrote:

 ISTM you need to update the protocol.sgml because you added
 the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.


 Oh, I didn't remember that we've documented the specific structs that we
 pass around. It's quite bogus anyway to explain the messages the way we
 do currently, as they are actually dependent on the underlying
 architecture's endianness and padding. I think we should refactor the
 protocol to not transmit raw structs, but use pq_sendint and friends to
 construct the messages. This was discussed earlier (see
 http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com),
 and I think there's consensus that 9.3 would be a good time to do that as we
 changed the XLogRecPtr format anyway.


 This is what I came up with. The replication protocol is now
 architecture-independent. The WAL format itself is still
 architecture-dependent, of course, but this is useful if you want to e.g.
 use pg_receivexlog to back up a server that runs on a different platform.

 I chose the int64 format to transmit timestamps, even when compiled with
 --disable-integer-datetimes.

 Please review if you have the time.

Thanks for the patch!

When I ran pg_receivexlog, I encountered the following error.

$ pg_receivexlog -D hoge
pg_receivexlog: unexpected termination of replication stream: ERROR:
no data left in message

pg_basebackup -X stream caused the same error.

$ pg_basebackup -D hoge -X stream -c fast
pg_basebackup: could not send feedback packet: no COPY in progress
pg_basebackup: child process exited with error 1

In walreceiver.c, tmpbuf is allocated on every XLogWalRcvProcessMsg() call.
Shouldn't it be allocated just once and reused until the end, to reduce
palloc overhead?

+   hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64);
+   hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char);

These should be macros, to avoid calculation overhead?

+   /* Construct the message and send it. */
+   resetStringInfo(&reply_message);
+   pq_sendbyte(&reply_message, 'h');
+   pq_sendint(&reply_message, xmin, 4);
+   pq_sendint(&reply_message, nextEpoch, 4);
+   walrcv_send(reply_message.data, reply_message.len);

You seem to have forgotten to send the sendTime.
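To make the missing field concrete, here is a sketch of the hot standby feedback message built with sendTime included. The field order (Byte1 'h', Int64 sendTime, Int32 xmin, Int32 epoch, all big-endian) is an assumption based on the review comment above, and build_hs_feedback() and put_be() are hypothetical helpers, not the walreceiver code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: type byte + sendTime + xmin + epoch. */
#define HS_FEEDBACK_LEN (1 + 8 + 4 + 4)

/* Hypothetical: store the low 'nbytes' bytes of 'val' big-endian at p. */
static unsigned char *
put_be(unsigned char *p, uint64_t val, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--)
    {
        p[i] = (unsigned char) (val & 0xFF);
        val >>= 8;
    }
    return p + nbytes;
}

/* Hypothetical: fill buf (HS_FEEDBACK_LEN bytes) and return bytes written. */
static size_t
build_hs_feedback(unsigned char *buf, int64_t send_time,
                  uint32_t xmin, uint32_t epoch)
{
    unsigned char *p = buf;

    *p++ = 'h';
    p = put_be(p, (uint64_t) send_time, 8);    /* the field noted as missing */
    p = put_be(p, xmin, 4);
    p = put_be(p, epoch, 4);
    assert((size_t) (p - buf) == HS_FEEDBACK_LEN);
    return (size_t) (p - buf);
}
```

HS_FEEDBACK_LEN, like the hdrlen expressions discussed in this thread, is folded to a constant by the compiler.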

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-13 Thread Fujii Masao

On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 11.10.2012 13:17, Amit Kapila wrote:

 How does this look now?


 The Patch is fine and test results are also fine.


 Ok, thanks. Committed.

I found one typo. The attached patch fixes that typo.

ISTM you need to update the protocol.sgml because you added
the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.

Is it worth adding the same mechanism (send back the reply immediately
if walsender requests a reply) into pg_basebackup and pg_receivexlog?

Regards,

-- 
Fujii Masao


typo.patch
Description: Binary data


Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-11 Thread Amit Kapila

  On Wednesday, October 10, 2012 9:15 PM Heikki Linnakangas wrote:
 On 04.10.2012 13:12, Amit kapila wrote:
  Following changes are done to support replication timeout in sender as
 well as receiver:
 
  1. One new configuration parameter wal_receiver_timeout is added to
 detect timeout at receiver task.
  2. Existing parameter replication_timeout is renamed to
 wal_sender_timeout.
 
 Ok. The other option would be to have just one GUC, I'm open to
 bikeshedding on this one. On one hand, there's no reason the timeouts
 have to be the same, so it would be nice to have separate settings, but on
 the other hand, I can't imagine a case where a single setting wouldn't
 work just as well.

I think they need to be separate in the following case:

1. M1 (Master), S1 (Standby 1), S2 (Standby 2)
2. S1 is a standby of M1, and S2 is a standby of S1: a simple case of
   cascaded replication.
3. M1 and S1 are on a local network, but S2 is at a geographically distant
   location (that is, the network between M1 and S1 is fast, while the one
   between S1 and S2 is very slow).
4. In this case, the user might want to configure different timeouts for
   the sender and the receiver on S1.

 Attached is an updated patch. I reverted the merging of message types
 and fixed a bunch of cosmetic issues. There was one bug: in the main
 loop of walreceiver, you send the ping message on every wakeup after
 enough time has passed since last reception. That means that if the
 server doesn't reply promptly, you send a new ping message every 100 ms
 (NAPTIME_PER_CYCLE), until it gets a reply. Walsender had the same
 issue, but it was not quite as severe there because the naptime was
 longer. Fixed that.

Thanks.

 
 How does this look now?

The Patch is fine and test results are also fine.

With Regards,
Amit Kapila.




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-11 Thread Heikki Linnakangas


On 11.10.2012 13:17, Amit Kapila wrote:

How does this look now?


The Patch is fine and test results are also fine.


Ok, thanks. Committed.

- Heikki



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-10 Thread Heikki Linnakangas


On 04.10.2012 13:12, Amit kapila wrote:

Following changes are done to support replication timeout in sender as well as 
receiver:

1. One new configuration parameter wal_receiver_timeout is added to detect 
timeout at receiver task.
2. Existing parameter replication_timeout is renamed to wal_sender_timeout.


Ok. The other option would be to have just one GUC, I'm open to
bikeshedding on this one. On one hand, there's no reason the timeouts
have to be the same, so it would be nice to have separate settings, but on
the other hand, I can't imagine a case where a single setting wouldn't
work just as well.



3. The PrimaryKeepaliveMessage structure is modified to add one more field
   indicating whether the keep-alive is of type 'r' (i.e. reply) or 'h'
   (i.e. heartbeat).
4. The sender now sends a keep-alive message to the standby if it has been
   idle for at least half of wal_sender_timeout. In this case it sends a
   keep-alive of type 'h'.
5. Once the standby receives a keep-alive, it needs to send an immediate
   reply to the primary to indicate that the connection is alive.
6. The Reply message (which sends the WAL offset) and the Feedback message
   (which sends the oldest transaction) are merged into a single Reply
   message. So the StandbyReplyMessage structure gains two more fields,
   xmin and epoch, and the StandbyHSFeedbackMessage structure loses its xmin
   and epoch fields (as these have moved to StandbyReplyMessage).
7. Because of the change in step 6, once the receiver task receives some
   data from the primary, it only sends the Reply message.


Oh I see. That's not what I meant by combining the keep-alive and hs
feedback messages; I imagined that the heartbeats would *also* use the
same message type. I.e. there would be only a single message type from
standby to primary, used for:


1. updating the receive/apply pointer
2. HS feedback
3. pinging the server when wal_receiver_timeout is approaching
4. replying to pings from the server.

Since we didn't quite achieve that, it seems best to leave out this merging
of reply and HS feedback message types, to keep the patch small. We
might still want to do that, but better do that as a separate patch.



8. The same Reply message is sent in step 5 and step 7, but in step 5 the
   reply is sent immediately, whereas in step 7 it is sent only if
   wal_receiver_status_interval has elapsed (this part is the same as before).
9. Similar to the sender, if the receiver finds itself idle for at least
   half of the configured wal_receiver_timeout, it sends the hot standby
   heartbeat. This heartbeat has been modified to send only sendTime.
10. Once the sender task receives a heartbeat message from the standby, it
    immediately sends back a reply, as a keep-alive message of type 'r'.
11. If no message is received from the standby within wal_sender_timeout,
    the sender task considers the network broken.
12. If no message is received from the primary within wal_receiver_timeout,
    the receiver task considers the network broken.


Attached is an updated patch. I reverted the merging of message types
and fixed a bunch of cosmetic issues. There was one bug: in the main
loop of walreceiver, you send the ping message on every wakeup after
enough time has passed since last reception. That means that if the
server doesn't reply promptly, you send a new ping message every 100 ms
(NAPTIME_PER_CYCLE), until it gets a reply. Walsender had the same
issue, but it was not quite as severe there because the naptime was
longer. Fixed that.


How does this look now?

- Heikki
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 2236,2245  include 'filename'
 /listitem
/varlistentry
  
!  varlistentry id=guc-replication-timeout xreflabel=replication_timeout
!   termvarnamereplication_timeout/varname (typeinteger/type)/term
indexterm
!primaryvarnamereplication_timeout/ configuration parameter/primary
/indexterm
listitem
 para
--- 2236,2245 
 /listitem
/varlistentry
  
!  varlistentry id=guc-wal-sender-timeout xreflabel=wal_sender_timeout
!   termvarnamewal_sender_timeout/varname (typeinteger/type)/term
indexterm
!primaryvarnamewal_sender_timeout/ configuration parameter/primary
/indexterm
listitem
 para
***
*** 2251,2262  include 'filename'
  the filenamepostgresql.conf/ file or on the server command line.
  The default value is 60 seconds.
 /para
-para
- To prevent connections from being terminated prematurely,
- xref linkend=guc-wal-receiver-status-interval
- must be enabled on the standby, and its value must be less than the
- value of varnamereplication_timeout/.
-/para
/listitem
   /varlistentry
  
--- 2251,2256 
***
*** 2474,2484

Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-09 Thread Robert Haas

On Mon, Oct 8, 2012 at 10:42 AM, Amit Kapila amit.kap...@huawei.com wrote:
 How about following:
 1. replication_client_timeout -- shouldn't it be client as new configuration
 is for wal receiver
 2. replication_standby_timeout

ISTM that the client and the standby are the same thing.

 If we introduce a new parameter for wal receiver, wouldn't
 replication_timeout be confusing for user?

Maybe.  I actually don't think that I understand what problem we're
trying to solve here.  If the connection between the master and the
standby is lost, shouldn't the standby realize that it's no longer
receiving keepalives from the master and terminate the connection?  I
thought I had tested this at some point and it was working, so either
it's subsequently gotten broken again or the scenario you're talking
about is different in some way that I don't currently understand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-09 Thread Amit Kapila

On Tuesday, October 09, 2012 6:00 PM Robert Haas wrote:
 On Mon, Oct 8, 2012 at 10:42 AM, Amit Kapila amit.kap...@huawei.com
 wrote:
  How about following:
  1. replication_client_timeout -- shouldn't it be client as new
 configuration
  is for wal receiver
  2. replication_standby_timeout
 
 ISTM that the client and the standby are the same thing.

Yes, they are the same, but maybe one of them (replication_standby_timeout)
would be easier for users to understand.

 
  If we introduce a new parameter for wal receiver, wouldn't
  replication_timeout be confusing for user?
 
 Maybe.  

 I actually don't think that I understand what problem we're
 trying to solve here.  If the connection between the master and the
 standby is lost, shouldn't the standby realize that it's no longer
 receiving keepalives from the master and terminate the connection? 

For the wal receiver, keepalives are just another kind of message. When it
finds that it has not received any message, it sends a reply/feedback
message to the master every wal_receiver_status_interval. But even though
the wal receiver sends a reply after every wal_receiver_status_interval,
the socket send() doesn't fail. It fails only after many send calls,
because until the socket's internal buffer is full, send() keeps
accumulating data even if the other side has not received it.
That's the reason we decided to introduce a timeout parameter in the wal
receiver, similar to what we currently have in walsender.
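The effect described above can be reproduced locally. This sketch uses a Unix-domain socket pair rather than TCP over a real network, and a hypothetical helper name; it only shows that send() keeps succeeding while the peer reads nothing, until the kernel buffer fills, at which point a blocking send() would simply hang:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many bytes send() accepts on 'sock'
 * while the peer never calls recv().  Returns -1 on an unexpected error.
 */
static long
bytes_accepted_before_full(int sock)
{
    char        chunk[1024] = {0};
    long        total = 0;
    int         flags = fcntl(sock, F_GETFL, 0);

    /* Non-blocking, so we get EAGAIN instead of hanging when full. */
    if (flags < 0 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    for (;;)
    {
        ssize_t     n = send(sock, chunk, sizeof(chunk), 0);

        if (n > 0)
            total += n;         /* "success", though nobody has read it */
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;       /* buffer full */
        else if (n < 0 && errno == EINTR)
            continue;
        else
            return -1;
    }
}
```

A successful send() therefore says nothing about whether the other end is alive; only the absence of incoming messages for a bounded time does, which is what a receiver-side timeout measures.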

 I
 thought I had tested this at some point and it was working, so either
 it's subsequently gotten broken again or the scenario you're talking
 about is different in some way that I don't currently understand.

The standby takes much longer to detect the break, around 15 minutes,
whereas the master detects it much sooner, in 2-3 minutes, mainly
because of the timeout functionality in walsender.

With Regards,
Amit Kapila.




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-08 Thread Robert Haas

On Thu, Oct 4, 2012 at 6:12 AM, Amit kapila amit.kap...@huawei.com wrote:
 1. One new configuration parameter wal_receiver_timeout is added to detect 
 timeout at receiver task.
 2. Existing parameter replication_timeout is renamed to wal_sender_timeout.

-1 from me on a backward compatibility break here.  I don't know what
else to call the new GUC (replication_server_timeout?) but I'm not
excited about breaking existing conf files, nor do I particularly like
the proposed new names.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-08 Thread Amit Kapila

 On Monday, October 08, 2012 7:38 PM Robert Haas wrote:
 On Thu, Oct 4, 2012 at 6:12 AM, Amit kapila amit.kap...@huawei.com
 wrote:
  1. One new configuration parameter wal_receiver_timeout is added to
 detect timeout at receiver task.
  2. Existing parameter replication_timeout is renamed to
 wal_sender_timeout.
 
 -1 from me on a backward compatibility break here.  I don't know what
 else to call the new GUC (replication_server_timeout?) but I'm not
 excited about breaking existing conf files, nor do I particularly like
 the proposed new names.

How about one of the following:
1. replication_client_timeout -- shouldn't it say "client", as the new
   configuration parameter is for the wal receiver?
2. replication_standby_timeout

If we introduce a new parameter for the wal receiver, wouldn't the name
replication_timeout be confusing for users?

With Regards,
Amit Kapila.




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-04 Thread Amit Kapila



 -Original Message-
 From: pgsql-bugs-ow...@postgresql.org [mailto:pgsql-bugs-
 ow...@postgresql.org] On Behalf Of Amit kapila
 Sent: Thursday, October 04, 2012 3:43 PM
 To: Heikki Linnakangas
 Cc: Fujii Masao; pgsql-b...@postgresql.org; pgsql-hackers@postgresql.org
 Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w
 breakdown
 
 On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote:
 On 02.10.2012 10:36, Amit kapila wrote:
  On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
  So let's think how this should ideally work from a user's point of
 view.
  I think there should be just two settings: walsender_timeout and
  walreceiver_timeout. walsender_timeout specifies how long a
  walsender will keep a connection open if it doesn't hear from the

 
 Thank you for suggestions.
 I have addressed your suggestions in patch attached with this mail.
 
 Following changes are done to support replication timeout in sender as
 well as receiver:


Testing done for the patch

1. Verified the values of the new configuration parameter and the renamed
   configuration parameter using the SHOW command (both SHOW of the
   specific parameter and SHOW ALL).
2. Verified the new configuration parameter in --describe-config.
3. Verified the new name of the existing parameter replication_timeout in
   --describe-config.
4. Start the primary and standby nodes with the default timeouts and leave
   them idle for some time.
   They should not error out with a network break error.
5. a. Start the primary and standby nodes with the default timeouts, then
      bring down the network.
   b. Both sender and receiver should detect the network breakdown at
      almost the same time.
   c. Once the network is up again, the connection should be re-established
      successfully.
6. a. Start the primary and standby nodes with wal_sender_timeout less than
      wal_receiver_timeout, then bring down the network.
   b. The sender should detect the network breakdown before the receiver
      task.
   c. Once the network is up again, the connection should be re-established
      successfully.
7. a. Start the primary and standby nodes with wal_receiver_timeout less
      than wal_sender_timeout, then bring down the network.
   b. The receiver should detect the network breakdown before the sender
      task.
   c. Once the network is up again, the connection should be re-established
      successfully.
8. a. In test case 6, set wal_receiver_status_interval to more than
      wal_receiver_timeout, and hence more than wal_sender_timeout.
   b. Then bring the network down.
   c. The sender task should detect the network breakdown once
      wal_sender_timeout has elapsed.
   d. Once the network is up again, the connection should be re-established
      successfully.
   The intent of this test is to check that detecting a network break via
   wal_sender_timeout does not depend on wal_receiver_status_interval.

All the above tests passed.

With Regards,
Amit Kapila.
 




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-02 Thread Heikki Linnakangas


On 02.10.2012 10:36, Amit kapila wrote:

On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:

So let's think how this should ideally work from a user's point of view.
I think there should be just two settings: walsender_timeout and
walreceiver_timeout. walsender_timeout specifies how long a walsender
will keep a connection open if it doesn't hear from the walreceiver, and
walreceiver_timeout is the same for walreceiver. The system should
figure out itself how often to send keepalive messages so that those
timeouts are not reached.


By this, it implies that we should remove wal_receiver_status_interval.
Currently it is also used for the reply message to data sent by the sender,
which reports up to what point the receiver has flushed. So if we remove this
variable, the receiver might start sending that message sooner than required.
Is that okay behavior?


I guess we should keep that setting, then, so that you can get status 
updates more often than would be required for heartbeat purposes.
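One way to read this, as a minimal sketch only: the walreceiver would send its next message after whichever comes first, wal_receiver_status_interval or half of walreceiver_timeout, so the status interval can still ask for more frequent updates than heartbeats alone require. The helper name is hypothetical, and both inputs are assumed to be in milliseconds with 0 meaning "disabled".

```c
/*
 * Hypothetical helper: how long (ms) the walreceiver may stay silent before
 * it must send a status/heartbeat message. Both inputs are assumed to be in
 * milliseconds, with 0 meaning "disabled". Returns -1 if neither applies.
 */
static long
receiver_send_interval_ms(long status_interval_ms, long receiver_timeout_ms)
{
    long heartbeat_ms = receiver_timeout_ms > 0 ? receiver_timeout_ms / 2 : -1;

    if (status_interval_ms <= 0)
        return heartbeat_ms;            /* only heartbeats drive sending */
    if (heartbeat_ms < 0)
        return status_interval_ms;      /* no receiver timeout configured */
    return status_interval_ms < heartbeat_ms ? status_interval_ms : heartbeat_ms;
}
```

For example, a 10 s status interval with a 60 s timeout keeps the 10 s cadence, while a 60 s status interval with the same timeout is shortened to the 30 s heartbeat point.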



In walsender, after half of walsender_timeout has elapsed and we haven't
received anything from the client, the walsender process should send a
ping message to the client. Whenever the client receives a Ping, it
replies. The walreceiver does the same; when half of walreceiver_timeout
has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip
resets the timer in both ends, regardless of which side initiated it, so
if e.g. walsender_timeout < walreceiver_timeout, the client will never
have to initiate a Ping message, because walsender will always reach the
walsender_timeout/2 point first and initiate the heartbeat message.
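The scheme above can be checked with a small simulation. This is a sketch under simplifying assumptions, not walsender/walreceiver code: zero network latency, 1 ms time steps, no WAL traffic, and a hypothetical Endpoint type. Each side sends a Ping after half its timeout of silence, the peer replies at once, and receiving any message resets that side's timer.

```c
#include <stdbool.h>

/* Hypothetical model of one end of the replication connection. */
typedef struct
{
    long timeout_ms;    /* walsender_timeout or walreceiver_timeout */
    long last_recv_ms;  /* when this side last heard from its peer */
    int  pings_sent;    /* heartbeats this side initiated */
    bool ping_pending;  /* sent a Ping, no reply yet */
} Endpoint;

/*
 * Simulate an idle connection for duration_ms. Returns the time (ms) at
 * which one side declared the connection dead, or 0 if neither did.
 */
static long
simulate_idle(Endpoint *snd, Endpoint *rcv, long duration_ms, bool link_up)
{
    Endpoint *sides[2] = {snd, rcv};

    for (long now = 1; now <= duration_ms; now++)
    {
        for (int i = 0; i < 2; i++)
        {
            Endpoint *self = sides[i];
            Endpoint *peer = sides[1 - i];
            long      silent = now - self->last_recv_ms;

            if (silent >= self->timeout_ms)
                return now;             /* timeout reached: give up */

            if (!self->ping_pending && silent >= self->timeout_ms / 2)
            {
                self->pings_sent++;
                self->ping_pending = true;
                if (link_up)
                {
                    /* Peer receives the Ping: its timer resets. */
                    peer->last_recv_ms = now;
                    peer->ping_pending = false;
                    /* We receive the immediate Pong: our timer resets. */
                    self->last_recv_ms = now;
                    self->ping_pending = false;
                }
            }
        }
    }
    return 0;
}
```

With walsender_timeout = 10 s and walreceiver_timeout = 60 s over two idle minutes, the walsender initiates every Ping (one every 5 s) and the walreceiver never needs to; with the link down, the walsender gives up at its 10 s timeout.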


Just to clarify, walsender should reset the timer after it gets a reply from the
receiver to the message it sent.


Right.


walreceiver should reset the timer after sending a reply to a heartbeat message.
 Similarly to the above, the timers will be reset when the receiver sends the
heartbeat message.


walreceiver should reset the timer when it *receives* any message from 
walsender. If it sends the reply right away, I guess that's the same 
thing, but I'd phrase it so that it's the reception of a message from 
the other end that resets the timer.



The Ping/Pong messages don't necessarily need to be new message types,
we can use the message types we currently have, perhaps with an
additional flag attached to them, to request the other side to reply
immediately.


Can't we decide whether to send a reply immediately based on the message type,
since these message types will be unique?

To clarify my understanding,
1. The heartbeat message from the walsender side will be the keepalive message ('k'),
and from the walreceiver side it will be the Hot Standby feedback message ('h').
2. The reply message from the walreceiver side will be the current reply message ('r').


Yep. I wonder why we need separate message types for Hot Standby Feedback 
'h' and Reply 'r', though. Seems it would be simpler to have just one 
message type that includes all the fields from both messages.



3. Currently there is no reply-type message from walsender, so do we need to
introduce a new message for it, or can we reuse an existing one?
 If new, do we need to send any additional information along with it? If we
 reuse an existing one, can we use the keepalive message itself as the reply
 message, with an additional byte to indicate that it is a reply?


Hmm, I think I'd prefer to use the existing Keepalive message 'k', with 
an additional flag.
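A sketch of what a keepalive 'k' message with an extra reply-requested byte could look like on the wire. The layout (message type, 8-byte WAL end position, 8-byte send timestamp, 1-byte flag, all big-endian) and the helper names are assumptions for illustration, not the actual protocol definition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 1 type byte + 8 walEnd + 8 sendTime + 1 flag byte. */
#define KEEPALIVE_MSG_LEN 18

/* Write v into buf in network (big-endian) byte order. */
static void
put_uint64_be(unsigned char *buf, uint64_t v)
{
    for (int i = 7; i >= 0; i--)
    {
        buf[i] = (unsigned char) (v & 0xFF);
        v >>= 8;
    }
}

/* Hypothetical encoder for a keepalive 'k' message with a reply flag. */
static size_t
encode_keepalive(unsigned char *buf, uint64_t wal_end, int64_t send_time,
                 bool reply_requested)
{
    buf[0] = 'k';
    put_uint64_be(buf + 1, wal_end);
    put_uint64_be(buf + 9, (uint64_t) send_time);
    buf[17] = reply_requested ? 1 : 0;
    return KEEPALIVE_MSG_LEN;
}

/* Receiver side: does this message ask us to answer right away? */
static bool
keepalive_wants_reply(const unsigned char *buf, size_t len)
{
    return len >= KEEPALIVE_MSG_LEN && buf[0] == 'k' && buf[17] != 0;
}
```

The walreceiver would answer immediately only when the flag is set, so ordinary keepalives need not trigger extra traffic.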


- Heikki



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-01 Thread Heikki Linnakangas


On 21.09.2012 14:18, Amit kapila wrote:

On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote:
On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila amit.kap...@huawei.com wrote:


Approach-2 :
Provide a variable wal_send_status_interval, such that if it is 0, then
the current behavior would prevail, and if it is non-zero, then a KeepAlive
message would be sent at least once every such interval.
The modified code of WALSendLoop will be as follows:


snip

Which way do you think is better, or do you have any other idea for handling this?



I think #2 is better because it's more intuitive to a user.


Please find a patch attached for implementation of Approach-2.


Hmm, I think we need to step back a bit. I've never liked the way 
replication_timeout works, where it's the user's responsibility to set 
wal_receiver_status_interval < replication_timeout. It's not very 
user-friendly. I'd rather not copy that same design to this walreceiver 
timeout. If there are two different timeouts like that, it's even worse, 
because it's easy to confuse the two.
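To make that burden concrete: with the existing settings the user must keep wal_receiver_status_interval below replication_timeout, and (assuming the 9.2 units) the former is in seconds while the latter is in milliseconds. A minimal sketch of the check a user or tool would have to perform; the helper name is hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical sanity check. Assumes replication_timeout is in milliseconds
 * (0 = disabled) and wal_receiver_status_interval is in seconds
 * (0 = disabled). On an idle connection the standby's status messages must
 * arrive before the walsender gives up, or the connection is dropped even
 * though the network is fine.
 */
static bool
status_interval_ok(int wal_receiver_status_interval_s, int replication_timeout_ms)
{
    if (replication_timeout_ms <= 0)
        return true;                /* walsender never times out */
    if (wal_receiver_status_interval_s <= 0)
        return false;               /* no status messages while idle */
    return (long) wal_receiver_status_interval_s * 1000L < replication_timeout_ms;
}
```

The unit conversion inside the comparison is exactly the kind of detail a user can get wrong, which supports letting the system derive keepalive timing from the timeouts instead.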


So let's think how this should ideally work from a user's point of view. 
I think there should be just two settings: walsender_timeout and 
walreceiver_timeout. walsender_timeout specifies how long a walsender 
will keep a connection open if it doesn't hear from the walreceiver, and 
walreceiver_timeout is the same for walreceiver. The system should 
figure out itself how often to send keepalive messages so that those 
timeouts are not reached.


In walsender, after half of walsender_timeout has elapsed and we haven't 
received anything from the client, the walsender process should send a 
ping message to the client. Whenever the client receives a Ping, it 
replies. The walreceiver does the same; when half of walreceiver_timeout 
has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip 
resets the timer in both ends, regardless of which side initiated it, so 
if e.g. walsender_timeout < walreceiver_timeout, the client will never 
have to initiate a Ping message, because walsender will always reach the 
walsender_timeout/2 point first and initiate the heartbeat message.


The Ping/Pong messages don't necessarily need to be new message types, 
we can use the message types we currently have, perhaps with an 
additional flag attached to them, to request the other side to reply 
immediately.


- Heikki



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-01 Thread Robert Haas

On Mon, Oct 1, 2012 at 6:38 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm, I think we need to step back a bit. I've never liked the way
 replication_timeout works, where it's the user's responsibility to set
 wal_receiver_status_interval < replication_timeout. It's not very
 user-friendly. I'd rather not copy that same design to this walreceiver
 timeout. If there are two different timeouts like that, it's even worse,
 because it's easy to confuse the two.

I agree, but also note that wal_receiver_status_interval serves
another user-visible purpose as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-01 Thread Fujii Masao

On Mon, Oct 1, 2012 at 7:38 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm, I think we need to step back a bit. I've never liked the way
 replication_timeout works, where it's the user's responsibility to set
 wal_receiver_status_interval < replication_timeout. It's not very
 user-friendly. I'd rather not copy that same design to this walreceiver
 timeout. If there are two different timeouts like that, it's even worse,
 because it's easy to confuse the two.

Agreed.

I'd like to specify the replication timeout like we do TCP keepalives, i.e.,
what about introducing something like following parameters?

walsender_keepalives_idle
walsender_keepalives_interval
walsender_keepalives_count
walreceiver_keepalives_idle
walreceiver_keepalives_interval
walreceiver_keepalives_count

I believe many users are basically familiar with TCP keepalives and how to
specify it. So I think that this approach would be intuitive to users. Also
this approach includes your proposal. If you specify

walsender_keepalives_idle = walsender_timeout / 2
walsender_keepalives_interval = -1 (disable; Ping is never sent
again if there is no reply after first Ping is sent)
walsender_keepalives_count = 1

the replication timeout works as you proposed. But of course the downside
of this approach is that the number of parameters for the replication
timeout increases from two (replication_timeout and
wal_receiver_status_interval) to six, and those parameters are
confusingly similar to the existing tcp_keepalives parameters, which might
confuse users further. One idea to solve this problem is to use the
existing tcp_keepalives parameter values for the replication timeout.
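Under the usual TCP keepalive model these proposed settings would mirror (on Linux, tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes), the first probe goes out after idle seconds of silence, further probes follow every interval, and the peer is declared dead once count probes go unanswered, i.e. roughly at idle + count × interval. A minimal sketch of that arithmetic; the helper names are hypothetical, and the proposed "-1 = never Ping again" variant is not modelled.

```c
/*
 * Hypothetical helpers modelling plain TCP-keepalive semantics (seconds).
 * A probe is first sent after idle_s of silence, then every interval_s;
 * the peer is declared dead interval_s after the count-th unanswered probe.
 * Non-positive settings are treated as "disabled" here.
 */
static long
keepalive_dead_after(long idle_s, long interval_s, int count)
{
    if (idle_s <= 0 || interval_s <= 0 || count <= 0)
        return -1;                      /* keepalives disabled */
    return idle_s + (long) count * interval_s;
}

/* How many probes have been sent after elapsed_s seconds of silence. */
static int
keepalive_probes_sent(long idle_s, long interval_s, int count, long elapsed_s)
{
    if (keepalive_dead_after(idle_s, interval_s, count) < 0 ||
        elapsed_s < idle_s)
        return 0;

    long n = 1 + (elapsed_s - idle_s) / interval_s;

    return n > count ? count : (int) n;
}
```

With idle = 60, interval = 10 and count = 3, probes go out at 60, 70 and 80 seconds, and the connection is given up at 90 seconds: three settings to reason about instead of one timeout.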

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-01 Thread Robert Haas

On Mon, Oct 1, 2012 at 12:57 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I believe many users are basically familiar with TCP keepalives and how to
 specify it. So I think that this approach would be intuitive to users.

My experience is that many users are unfamiliar with TCP keepalives
and that when given the options they tend to do it wrong.  I think a
simpler system would be better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-01 Thread Alvaro Herrera

Excerpts from Robert Haas's message of Mon Oct 01 21:02:54 -0300 2012:
 On Mon, Oct 1, 2012 at 12:57 PM, Fujii Masao masao.fu...@gmail.com wrote:
  I believe many users are basically familiar with TCP keepalives and how to
  specify it. So I think that this approach would be intuitive to users.
 
 My experience is that many users are unfamiliar with TCP keepalives
 and that when given the options they tend to do it wrong.  I think a
 simpler system would be better.

+1

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-09-18 Thread Fujii Masao

On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila amit.kap...@huawei.com wrote:
 To define the behavior correctly, according to me there are 2 options now:

 Approach-1 :
 Document that both timeout parameters (sender and receiver) should be
 greater than wal_receiver_status_interval.
 If both are greater, then I think it might never time out due to idleness.

In this approach, keepalive messages are sent every wal_receiver_status_interval?

 Approach-2 :
 Provide a variable wal_send_status_interval, such that if it is 0, then
 the current behavior would prevail, and if it is non-zero, then a KeepAlive
 message would be sent at least once every such interval.
 The modified code of WALSendLoop will be as follows:
snip
 Which way do you think is better, or do you have any other idea for handling this?

I think #2 is better because it's more intuitive to a user.

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-09-18 Thread Amit Kapila

On Tuesday, September 18, 2012 6:03 PM Fujii Masao wrote:
On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila amit.kap...@huawei.com wrote:
 To define the behavior correctly, I think there are two options now:

 Approach-1 :
 Document that both timeout parameters (sender and receiver) should be
 greater than wal_receiver_status_interval.
 If both are greater, then I think it might never time out due to idleness.

 In this approach, keepalive messages are sent every
 wal_receiver_status_interval?
  Every wal_receiver_status_interval or sleeptime, whichever is smaller.

 Approach-2 :
 Provide a variable wal_send_status_interval, such that if it is 0, then
 the current behavior would prevail, and if it is non-zero, then a KeepAlive
 message would be sent at least once every such interval.
 The modified code of WALSendLoop will be as follows:
snip
 Which way do you think is better, or do you have any other idea for handling this?

 I think #2 is better because it's more intuitive to a user.

I shall update the patch as per Approach-2 and upload it.
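The sleep-time rule in the Approach-2 WalSndLoop change quoted in this thread can be distilled into a pure function, which makes the three cases easy to see. A minimal sketch, assuming both settings are in milliseconds; the function name is hypothetical.

```c
/*
 * Hypothetical distillation of the Approach-2 sleep-time rule (all values
 * in milliseconds):
 *  - wal_send_status_interval > 0: sleep that long (a keepalive is sent
 *    whenever no output is pending);
 *  - otherwise, with replication_timeout > 0: wake ten times per timeout;
 *  - otherwise: fall back to the 10 s default.
 */
static long
walsnd_sleeptime_ms(long wal_send_status_interval, long replication_timeout)
{
    if (wal_send_status_interval > 0)
        return wal_send_status_interval;
    if (replication_timeout > 0)
        return 1 + replication_timeout / 10;
    return 10000;
}
```

Note that when wal_send_status_interval is set it takes precedence over the replication_timeout-derived wakeup, so a large value would make the walsender check for its own timeout less often.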

With Regards,
Amit Kapila.





Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-09-17 Thread Amit Kapila

On Sunday, September 16, 2012 12:14 AM Fujii Masao wrote:
On Sat, Sep 15, 2012 at 4:26 PM, Amit kapila amit.kap...@huawei.com wrote:
 On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote:
 On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila amit.kap...@huawei.com
wrote:

 On Thursday, September 13, 2012 10:57 PM Fujii Masao
 On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila amit.kap...@huawei.com
wrote:
 On Wednesday, September 12, 2012 10:15 PM Fujii Masao
 On Wed, Sep 12, 2012 at 8:54 PM,  amit.kap...@huawei.com wrote:
 The following bug has been logged on the website:

 I would like to implement such a feature for walreceiver, but I am unsure
 whether to use the same configuration parameter (replication_timeout) for
 walreceiver as for the master, or to introduce a new configuration
 parameter (receiver_replication_timeout).

I like the latter. I believe some users want to set different timeout
values, for example, when the master and standby servers are placed in
the same room but a cascaded standby is placed on another continent.

 Thank you for your suggestion. I have implemented it as you suggested, with
 a separate timeout parameter for walreceiver.
 The main changes are:
 1. Introduce a new configuration parameter, wal_receiver_replication_timeout,
 for walreceiver.
 2. In WalReceiverMain(), if there has been no communication for
 wal_receiver_replication_timeout, exit the walreceiver.
 This is the same as the walsender functionality.

 As this is a feature, I am submitting the attached patch to the upcoming
 CommitFest.

 Suggestions/Comments?

 You also need to change walsender so that it periodically sends the heartbeat
 message, like walreceiver does every wal_receiver_status_interval. Otherwise,
 walreceiver will wrongly detect a timeout whenever there is no traffic on
 the master.

 Won't the current keepalive message from walsender suffice for that need?

 No. Though the keepalive interval should be smaller than the timeout,
 IIRC there is no way to specify the keepalive interval now.

To define the behavior correctly, I think there are two options now:

Approach-1 :
Document that both timeout parameters (sender and receiver) should be
greater than wal_receiver_status_interval.
If both are greater, then I think it might never time out due to idleness.

Approach-2 :
Provide a variable wal_send_status_interval, such that if it is 0, then
the current behavior would prevail, and if it is non-zero, then a KeepAlive
message would be sent at least once every such interval.
The modified code of WALSendLoop will be as follows:

    TimestampTz timeout = 0;
    long        sleeptime = 10000;  /* 10 s */
    int         wakeEvents;

    /*
     * sleeptime should be equal to wal_send_status_interval if it is
     * non-zero; otherwise it defaults to 10 s.
     */
    if (wal_send_status_interval > 0)
    {
        sleeptime = wal_send_status_interval;
    }

    wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
        WL_SOCKET_READABLE | WL_TIMEOUT;

    if (pq_is_send_pending())
        wakeEvents |= WL_SOCKET_WRITEABLE;
    else if (wal_send_status_interval > 0)
    {
        WalSndKeepalive(output_message);
        /* Try to flush pending output to the client */
        if (pq_flush_if_writable() != 0)
            break;
    }

    /* Determine time until replication timeout */
    if (replication_timeout > 0)
    {
        timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
                                              replication_timeout);

        if (wal_send_status_interval <= 0)
        {
            sleeptime = 1 + (replication_timeout / 10);
        }
    }

    /* Sleep until something happens or replication timeout */
    WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
                      MyProcPort->sock, sleeptime);

    /*
     * Check for replication timeout.  Note we ignore the corner case
     * possibility that the client replied just as we reached the
     * timeout ... he's supposed to reply *before* that.
     */
    if (replication_timeout > 0 &&
        GetCurrentTimestamp() >= timeout)
    {

Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-09-15 Thread Fujii Masao

On Sat, Sep 15, 2012 at 4:26 PM, Amit kapila amit.kap...@huawei.com wrote:
 On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote:
 On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila amit.kap...@huawei.com wrote:

 On Thursday, September 13, 2012 10:57 PM Fujii Masao
 On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Wednesday, September 12, 2012 10:15 PM Fujii Masao
 On Wed, Sep 12, 2012 at 8:54 PM,  amit.kap...@huawei.com wrote:
 The following bug has been logged on the website:

 I would like to implement such a feature for walreceiver, but I am unsure
 whether to use the same configuration parameter (replication_timeout) for
 walreceiver as for the master, or to introduce a new configuration
 parameter (receiver_replication_timeout).

I like the latter. I believe some users want to set different timeout
values, for example, when the master and standby servers are placed in
the same room but a cascaded standby is placed on another continent.

 Thank you for your suggestion. I have implemented it as you suggested, with
 a separate timeout parameter for walreceiver.
 The main changes are:
 1. Introduce a new configuration parameter, wal_receiver_replication_timeout,
 for walreceiver.
 2. In WalReceiverMain(), if there has been no communication for
 wal_receiver_replication_timeout, exit the walreceiver.
 This is the same as the walsender functionality.

 As this is a feature, I am submitting the attached patch to the upcoming
 CommitFest.

 Suggestions/Comments?

 You also need to change walsender so that it periodically sends the heartbeat
 message, like walreceiver does every wal_receiver_status_interval. Otherwise,
 walreceiver will wrongly detect a timeout whenever there is no traffic on
 the master.

 Won't the current keepalive message from walsender suffice for that need?

No. Though the keepalive interval should be smaller than the timeout,
IIRC there is no way to specify the keepalive interval now.

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-09-14 Thread Fujii Masao

On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila amit.kap...@huawei.com wrote:

 On Thursday, September 13, 2012 10:57 PM Fujii Masao
 On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Wednesday, September 12, 2012 10:15 PM Fujii Masao
 On Wed, Sep 12, 2012 at 8:54 PM,  amit.kap...@huawei.com wrote:
 The following bug has been logged on the website:

 Bug reference:  7534
 Logged by:  Amit Kapila
 Email address:  amit.kap...@huawei.com
 PostgreSQL version: 9.2.0
 Operating system:   Suse 10
 Description:

 1. The master and standby machines are connected normally.
 2. Then use the command ifconfig ip down to bring down the network cards
 of the master and standby.

 Observation:
 The master can detect the abnormal connection, but the standby cannot, and
 shows the channel as connected for a long time.


 I would like to implement such a feature for walreceiver, but I am unsure
 whether to use the same configuration parameter (replication_timeout) for
 walreceiver as for the master, or to introduce a new configuration
 parameter (receiver_replication_timeout).

I like the latter. I believe some users want to set different timeout
values, for example, when the master and standby servers are placed in
the same room but a cascaded standby is placed on another continent.

 Thank you for your suggestion. I have implemented it as you suggested, with
 a separate timeout parameter for walreceiver.
 The main changes are:
 1. Introduce a new configuration parameter, wal_receiver_replication_timeout,
 for walreceiver.
 2. In WalReceiverMain(), if there has been no communication for
 wal_receiver_replication_timeout, exit the walreceiver.
 This is the same as the walsender functionality.

 As this is a feature, I am submitting the attached patch to the upcoming
 CommitFest.

 Suggestions/Comments?
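The check described in item 2 might look like the following minimal sketch. The timestamp type and function name are hypothetical, and a timeout of 0 is assumed to disable the check, as replication_timeout = 0 does on the walsender side.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampMs;    /* hypothetical: milliseconds since some epoch */

/*
 * Hypothetical walreceiver-side check: true if nothing at all has been
 * received from the walsender for at least timeout_ms. A non-positive
 * timeout disables the check.
 */
static bool
walrcv_timed_out(TimestampMs last_recv, TimestampMs now, int64_t timeout_ms)
{
    return timeout_ms > 0 && now - last_recv >= timeout_ms;
}
```

On its own, this check would fire on an idle master unless the walsender also sends periodic heartbeats that update last_recv, which is the point raised in the reply.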

You also need to change walsender so that it periodically sends the heartbeat
message, like walreceiver does every wal_receiver_status_interval. Otherwise,
walreceiver will wrongly detect a timeout whenever there is no traffic on the
master.

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-09-13 Thread Fujii Masao

On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Wednesday, September 12, 2012 10:15 PM Fujii Masao
 On Wed, Sep 12, 2012 at 8:54 PM,  amit.kap...@huawei.com wrote:
 The following bug has been logged on the website:

 Bug reference:  7534
 Logged by:  Amit Kapila
 Email address:  amit.kap...@huawei.com
 PostgreSQL version: 9.2.0
 Operating system:   Suse 10
 Description:

 1. The master and standby machines are connected normally.
 2. Then use the command ifconfig ip down to bring down the network cards
 of the master and standby.

 Observation:
 The master can detect the abnormal connection, but the standby cannot, and
 shows the channel as connected for a long time.

 What about setting keepalives_xxx libpq parameters?

 http://www.postgresql.org/docs/devel/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS

 Keepalives are not a perfect solution for the termination of connection,
 but
 it would help to a certain extent.
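For reference, the libpq keepalive parameters mentioned here (keepalives, keepalives_idle, keepalives_interval, keepalives_count) can be set in the connection string the walreceiver uses (primary_conninfo). A minimal sketch that builds such a string; the helper name, host and user are placeholders, and the values are only examples.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build a libpq connection string with TCP keepalives
 * enabled. The host and user are placeholders. Returns the snprintf result,
 * i.e. the length the full string needs, excluding the terminating NUL;
 * the output is truncated (but NUL-terminated) if size is too small.
 */
static int
build_conninfo(char *buf, size_t size, const char *host, int idle_s,
               int interval_s, int count)
{
    return snprintf(buf, size,
                    "host=%s user=replicator keepalives=1 "
                    "keepalives_idle=%d keepalives_interval=%d "
                    "keepalives_count=%d",
                    host, idle_s, interval_s, count);
}
```

As the following reply notes, keepalives alone did not help here because walreceiver was blocked trying to send its status, which is what motivates an application-level timeout.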

 We tried enabling keepalives, but it didn't work, maybe because
 walreceiver is trying to send the receiver status.
 It fails to send it only after many retries.

 If you need something like a walreceiver version of replication_timeout,
 such a feature has not been implemented yet.
 Please feel free to implement that!

 I would like to implement such a feature for walreceiver, but I am unsure
 whether to use the same configuration parameter (replication_timeout) for
 walreceiver as for the master, or to introduce a new configuration
 parameter (receiver_replication_timeout).

I like the latter. I believe some users want to set different timeout
values, for example, when the master and standby servers are placed in
the same room but a cascaded standby is placed on another continent.

Regards,

-- 
Fujii Masao



Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-09-12 Thread Amit Kapila

On Wednesday, September 12, 2012 10:15 PM Fujii Masao
On Wed, Sep 12, 2012 at 8:54 PM,  amit.kap...@huawei.com wrote:
 The following bug has been logged on the website:

 Bug reference:  7534
 Logged by:  Amit Kapila
 Email address:  amit.kap...@huawei.com
 PostgreSQL version: 9.2.0
 Operating system:   Suse 10
 Description:

 1. The master and standby machines are connected normally.
 2. Then use the command ifconfig ip down to bring down the network cards
 of the master and standby.

 Observation:
 The master can detect the abnormal connection, but the standby cannot, and
 shows the channel as connected for a long time.

 What about setting keepalives_xxx libpq parameters?

http://www.postgresql.org/docs/devel/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS

 Keepalives are not a perfect solution for the termination of connection,
but
 it would help to a certain extent. 

We tried enabling keepalives, but it didn't work, maybe because
walreceiver is trying to send the receiver status.
It fails to send it only after many retries.

 If you need something like a walreceiver version of replication_timeout,
 such a feature has not been implemented yet.
 Please feel free to implement that!

 I would like to implement such a feature for walreceiver, but I am unsure
 whether to use the same configuration parameter (replication_timeout) for
 walreceiver as for the master, or to introduce a new configuration
 parameter (receiver_replication_timeout).

 The only point in having different timeout parameters for walsender and
 walreceiver is the case of a standby that has both a walsender and a
 walreceiver, to send logs to a cascaded standby; in that case somebody
 might want different timeout parameters for walsender and walreceiver.
 OTOH, having too many parameters will create confusion. My opinion is to
 have one timeout parameter for both walsender and walreceiver.

Please let me know your suggestions/opinions about this.

Note: I am CCing pgsql-hackers, as this will be a feature request.

With Regards,
Amit Kapila.




