Re: [HACKERS] Replication server timeout patch

2011-03-31 Thread Heikki Linnakangas

On 31.03.2011 05:46, Fujii Masao wrote:

On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas robertmh...@gmail.com wrote:

On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao masao.fu...@gmail.com wrote:

On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 30.03.2011 10:58, Fujii Masao wrote:


On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
+A value of zero means wait forever.  This parameter can only be
set in

The first sentence sounds misleading. Even if you set the parameter to
zero,
replication connections can be terminated because of keepalive or socket
error.


Hmm, should I change it back to "A value of zero disables the timeout"? Any
better suggestions?


I like that. But I'd appreciate it if anyone can suggest something better.


Maybe sticking the word "mechanism" in there would be a bit better.
"A value of zero disables the timeout mechanism"?


I'm OK with that. Or, what about "A value of zero turns this off", which is
used in statement_timeout, for the sake of consistency?


Committed Robert's suggestion.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Heikki Linnakangas

On 29.03.2011 07:55, Fujii Masao wrote:

On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

pq_flush_if_writable() calls internal_flush() without using PG_TRY block.
This seems unsafe because for example pgwin32_waitforsinglesocket()
called by secure_write() can throw ERROR.


Perhaps it's time to give up on the assumption that the socket is in
blocking mode except within those two functions. Attached patch adds the
pq_set_nonblocking() function from your patch, and adds calls to it before
all secure_read/write operations to put the socket in the right mode.
There's only a few of those operations.


Sounds good.

+   pq_set_nonblocking(false); /* XXX: Is this required? */

No. Since secure_close and close_SSL don't use MyProcPort->sock and
MyProcPort->noblock, which can be changed in pq_set_nonblocking,
I don't think that is required.


Ok, I took that out.


+   pq_putmessage_noblock('d', msgbuf, 1 + sizeof(WalDataMessageHeader) + 
nbytes);

Don't we need to check the return value of pq_putmessage_noblock? That
can return EOF when trouble happens (for example the send system call fails).


No, pq_putmessage_noblock doesn't call send() because it enlarges the
buffer to make sure the message fits, and it doesn't do anything else that
could fail. I changed its return type to void, and added an
Assert() to check that the pq_putmessage() call it does internally
indeed doesn't fail.
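
For reference, such a function can be sketched roughly as follows (an
illustration of the description above, assuming the PqSendBuffer /
PqSendBufferSize / PqSendPointer variables from the patch and the usual
one-byte type plus four-byte length framing; the committed code may differ
in detail):

void
pq_putmessage_noblock(char msgtype, const char *s, size_t len)
{
	int		res;
	int		required;

	/*
	 * Sketch only: enlarge the send buffer first so the whole message is
	 * guaranteed to fit; pq_putmessage() then merely copies bytes into the
	 * buffer and never has to flush (i.e. never calls send()), so it cannot
	 * fail here.
	 */
	required = PqSendPointer + 1 + 4 + len;	/* type byte + length word + body */
	if (required > PqSendBufferSize)
	{
		PqSendBuffer = repalloc(PqSendBuffer, required);
		PqSendBufferSize = required;
	}

	res = pq_putmessage(msgtype, s, len);
	Assert(res == 0);	/* cannot fail once the message fits in the buffer */
}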



Should we use COMMERROR instead of ERROR if we fail to put the socket in the
right mode?


Maybe.


I made it COMMERROR. ERRORs are sent to the client, and you could get 
into infinite recursion if sending the ERROR requires setting the 
blocking mode again.
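
To illustrate the recursion hazard, a pq_set_nonblocking() shaped roughly
like this (a sketch based on the discussion above, assuming the
pg_set_noblock()/pg_set_block() port wrappers and the MyProcPort->noblock
flag; not necessarily the committed function) reports failure at COMMERROR,
so the message only reaches the server log and never goes back through the
very code path that is being reconfigured:

static void
pq_set_nonblocking(bool nonblocking)
{
	if (MyProcPort->noblock == nonblocking)
		return;					/* already in the requested mode */

	if (nonblocking)
	{
		if (!pg_set_noblock(MyProcPort->sock))
			ereport(COMMERROR,
					(errmsg("could not set socket to non-blocking mode: %m")));
	}
	else
	{
		if (!pg_set_block(MyProcPort->sock))
			ereport(COMMERROR,
					(errmsg("could not set socket to blocking mode: %m")));
	}

	MyProcPort->noblock = nonblocking;
}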


Committed with those changes. I also reworded the docs a bit.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Fujii Masao
On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 +       pq_putmessage_noblock('d', msgbuf, 1 +
 sizeof(WalDataMessageHeader) + nbytes);

 Don't we need to check the return value of pq_putmessage_noblock? That
 can return EOF when trouble happens (for example the send system call
 fails).

 No, pq_putmessage_noblock doesn't call send() because it enlarges the buffer
 to make sure the message fits, and it doesn't do anything else that could
 fail. I changed its return type to void, and added an Assert() to check that
 the pq_putmessage() call it does internally indeed doesn't fail.

Oh, you're right.

 Committed with those changes. I also reworded the docs a bit.

Thanks a lot!

+A value of zero means wait forever.  This parameter can only be set in

The first sentence sounds misleading. Even if you set the parameter to zero,
replication connections can be terminated because of keepalive or socket error.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Heikki Linnakangas

On 30.03.2011 10:58, Fujii Masao wrote:

On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:
+A value of zero means wait forever.  This parameter can only be set in

The first sentence sounds misleading. Even if you set the parameter to zero,
replication connections can be terminated because of keepalive or socket error.


Hmm, should I change it back to "A value of zero disables the timeout"?
Any better suggestions?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Fujii Masao
On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 30.03.2011 10:58, Fujii Masao wrote:

 On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:
 +        A value of zero means wait forever.  This parameter can only be
 set in

 The first sentence sounds misleading. Even if you set the parameter to
 zero,
 replication connections can be terminated because of keepalive or socket
 error.

 Hmm, should I change it back to "A value of zero disables the timeout"? Any
 better suggestions?

I like that. But I'd appreciate it if anyone can suggest something better.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Robert Haas
On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 30.03.2011 10:58, Fujii Masao wrote:

 On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:
 +        A value of zero means wait forever.  This parameter can only be
 set in

 The first sentence sounds misleading. Even if you set the parameter to
 zero,
 replication connections can be terminated because of keepalive or socket
 error.

 Hmm, should I change it back to "A value of zero disables the timeout"? Any
 better suggestions?

 I like that. But I'd appreciate it if anyone can suggest something better.

Maybe sticking the word "mechanism" in there would be a bit better.
"A value of zero disables the timeout mechanism"?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Fujii Masao
On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 30.03.2011 10:58, Fujii Masao wrote:

 On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:
 +        A value of zero means wait forever.  This parameter can only be
 set in

 The first sentence sounds misleading. Even if you set the parameter to
 zero,
 replication connections can be terminated because of keepalive or socket
 error.

 Hmm, should I change it back to "A value of zero disables the timeout"? Any
 better suggestions?

 I like that. But I'd appreciate it if anyone can suggest something better.

 Maybe sticking the word "mechanism" in there would be a bit better.
 "A value of zero disables the timeout mechanism"?

I'm OK with that. Or, what about "A value of zero turns this off", which is
used in statement_timeout, for the sake of consistency?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-03-29 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas
 Should we use COMMERROR instead of ERROR if we fail to put the socket in the
 right mode?

 Maybe.

COMMERROR exists to keep us from trying to send an error report down a
failed socket.  I would assume (perhaps wrongly) that
walsender/walreceiver don't try to push error reports across the socket
anyway, only to the postmaster log.  If correct, there is no need for
COMMERROR, and using it just muddies the code.

regards, tom lane



Re: [HACKERS] Replication server timeout patch

2011-03-29 Thread Robert Haas
On Tue, Mar 29, 2011 at 9:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas
 Should we use COMMERROR instead of ERROR if we fail to put the socket in the
 right mode?

 Maybe.

 COMMERROR exists to keep us from trying to send an error report down a
 failed socket.  I would assume (perhaps wrongly) that
 walsender/walreceiver don't try to push error reports across the socket
 anyway, only to the postmaster log.  If correct, there is no need for
 COMMERROR, and using it just muddies the code.

I don't think that's how it works.  The error the server sends is
copied into some of the messages in the client log, which is really
useful for debugging.

ERROR: can't connect to the server (server said: you're not authorized)

...or something like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication server timeout patch

2011-03-29 Thread Fujii Masao
On Wed, Mar 30, 2011 at 1:04 AM, Robert Haas robertmh...@gmail.com wrote:
 COMMERROR exists to keep us from trying to send an error report down a
 failed socket.  I would assume (perhaps wrongly) that
 walsender/walreceiver don't try to push error reports across the socket
 anyway, only to the postmaster log.  If correct, there is no need for
 COMMERROR, and using it just muddies the code.

 I don't think that's how it works.  The error the server sends is
 copied into some of the messages in the client log, which is really
 useful for debugging.

 ERROR: can't connect to the server (server said: you're not authorized)

 ...or something like that.

Yes. Walsender sends its error message to walreceiver, and walreceiver
writes it down to the server log. For example;

FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
segment 00010016 has already been removed

The second FATAL message is sent from walsender.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-03-28 Thread Heikki Linnakangas

On 24.03.2011 15:24, Fujii Masao wrote:

On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

I don't much like the API for this. Walsender shouldn't need to know about
the details of the FE/BE protocol, pq_putbytes_if_available() seems too low
level to be useful.

I think a better API would be to have a non-blocking version of
pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so
that when the message doesn't fit in the output buffer in pq_putmessage(),
the buffer is enlarged instead of trying to flush it.

Attached is a patch using that approach. This is a much smaller patch, and
easier to understand.


Agreed. Thanks for improving the patch.

pq_flush_if_writable() calls internal_flush() without using PG_TRY block.
This seems unsafe because for example pgwin32_waitforsinglesocket()
called by secure_write() can throw ERROR.


Perhaps it's time to give up on the assumption that the socket is in 
blocking mode except within those two functions. Attached patch adds the 
pq_set_nonblocking() function from your patch, and adds calls to it 
before all secure_read/write operations to put the socket in the right 
mode. There's only a few of those operations.
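
For illustration, the non-blocking flush can then be shaped roughly like
this (a sketch under the patch's assumptions: PqSendStart and PqSendPointer
delimit the unsent part of the send buffer, PqCommBusy guards against
reentrant calls, and internal_flush() stops at EAGAIN/EWOULDBLOCK instead of
blocking once the socket is in non-blocking mode; the committed code may
differ):

int
pq_flush_if_writable(void)
{
	int		res;

	/* Quick exit if there is nothing pending in the send buffer */
	if (PqSendStart == PqSendPointer)
		return 0;

	/* No-op if reentrant call */
	if (PqCommBusy)
		return 0;

	/* Temporarily put the socket into non-blocking mode before flushing */
	pq_set_nonblocking(true);

	PqCommBusy = true;
	res = internal_flush();		/* leaves unsent data buffered if the socket
								 * would block right now */
	PqCommBusy = false;
	return res;
}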


Should we use COMMERROR instead of ERROR if we fail to put the socket in 
the right mode?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0ebee6..3192ef7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2019,6 +2019,28 @@ SET ENABLE_SEQSCAN TO OFF;
    </para>
   </listitem>
  </varlistentry>
+
+ <varlistentry id="guc-replication-timeout" xreflabel="replication_timeout">
+  <term><varname>replication_timeout</varname> (<type>integer</type>)</term>
+  <indexterm>
+   <primary><varname>replication_timeout</> configuration parameter</primary>
+  </indexterm>
+  <listitem>
+   <para>
+    Specifies the maximum time, in milliseconds, to wait for the reply
+    from the standby before terminating replication.  This is useful for
+    the primary server to detect the standby crash or network outage.
+    A value of zero turns this off.  This parameter can only be set in
+    the <filename>postgresql.conf</> file or on the server command line.
+    The default value is 60 seconds.
+   </para>
+   <para>
+    To make the timeout work properly, <xref linkend="guc-wal-receiver-status-interval">
+    must be enabled on the standby, and its value must be less than the
+    value of <varname>replication_timeout</>.
+   </para>
+  </listitem>
+ </varlistentry>
  </variablelist>
 </sect2>
 
@@ -2216,6 +2238,11 @@ SET ENABLE_SEQSCAN TO OFF;
    the <filename>postgresql.conf</> file or on the server command line.
    The default value is 10 seconds.
   </para>
+  <para>
+   When <xref linkend="guc-replication-timeout"> is enabled on the primary,
+   <varname>wal_receiver_status_interval</> must be enabled, and its value
+   must be less than the value of <varname>replication_timeout</>.
+  </para>
  </listitem>
 </varlistentry>
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 3c7b05b..db313a8 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -55,10 +55,12 @@
  *		pq_peekbyte		- peek at next byte from connection
  *		pq_putbytes		- send bytes to connection (not flushed until pq_flush)
  *		pq_flush		- flush pending output
+ *		pq_flush_if_writable - flush pending output if writable without blocking
  *		pq_getbyte_if_available - get a byte if available without blocking
  *
  * message-level I/O (and old-style-COPY-OUT cruft):
  *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
+ *		pq_putmessage_noblock - buffer a normal message without blocking (suppressed in COPY OUT mode)
  *		pq_startcopyout - inform libpq that a COPY OUT transfer is beginning
  *		pq_endcopyout	- end a COPY OUT transfer
  *
@@ -92,6 +94,7 @@
 #include miscadmin.h
 #include storage/ipc.h
 #include utils/guc.h
+#include utils/memutils.h
 
 /*
  * Configuration options
@@ -105,15 +108,21 @@ static char sock_path[MAXPGPATH];
 
 
 /*
- * Buffers for low-level I/O
+ * Buffers for low-level I/O.
+ *
+ * The receive buffer is fixed size. Send buffer is usually 8k, but can be
+ * enlarged by pq_putmessage_noblock() if the message doesn't fit otherwise.
  */
 
-#define PQ_BUFFER_SIZE 8192
+#define PQ_SEND_BUFFER_SIZE 8192
+#define PQ_RECV_BUFFER_SIZE 8192
 
-static char PqSendBuffer[PQ_BUFFER_SIZE];
+static char *PqSendBuffer;
+static int	PqSendBufferSize;	/* Size send buffer */
 static int	PqSendPointer;		/* Next index to store a byte in PqSendBuffer */
+static int	PqSendStart;		/* Next index to send a byte in PqSendBuffer */
 
-static char PqRecvBuffer[PQ_BUFFER_SIZE];
+static char PqRecvBuffer[PQ_RECV_BUFFER_SIZE];
 static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
 static 

Re: [HACKERS] Replication server timeout patch

2011-03-25 Thread Robert Haas
On Wed, Mar 23, 2011 at 6:33 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 16.03.2011 11:11, Fujii Masao wrote:

 On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao masao.fu...@gmail.com
  wrote:

 Agreed. I'll change the patch.

 Done. I attached the updated patch.

 I don't much like the API for this. Walsender shouldn't need to know about
 the details of the FE/BE protocol, pq_putbytes_if_available() seems too low
 level to be useful.

 I think a better API would be to have a non-blocking version of
 pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so
 that when the message doesn't fit in the output buffer in pq_putmessage(),
 the buffer is enlarged instead of trying to flush it.

 Attached is a patch using that approach. This is a much smaller patch, and
 easier to understand. I'm not totally happy with the walsender main loop, it
 seems to work as it is, but the logic has become quite complicated. Ideas
 welcome on how to simplify that.

Heikki, are you planning to commit this, either with or without
further revisions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication server timeout patch

2011-03-24 Thread Fujii Masao
On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I don't much like the API for this. Walsender shouldn't need to know about
 the details of the FE/BE protocol, pq_putbytes_if_available() seems too low
 level to be useful.

 I think a better API would be to have a non-blocking version of
 pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so
 that when the message doesn't fit in the output buffer in pq_putmessage(),
 the buffer is enlarged instead of trying to flush it.

 Attached is a patch using that approach. This is a much smaller patch, and
 easier to understand.

Agreed. Thanks for improving the patch.

pq_flush_if_writable() calls internal_flush() without using PG_TRY block.
This seems unsafe because for example pgwin32_waitforsinglesocket()
called by secure_write() can throw ERROR.

 I'm not totally happy with the walsender main loop, it
 seems to work as it is, but the logic has become quite complicated. Ideas
 welcome on how to simplify that.

As the patch I proposed before did, how about letting XLogSend(), instead
of WalSndLoop(), call pq_flush_if_writable() when there is pending data
in the output buffer?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-03-23 Thread Heikki Linnakangas

On 16.03.2011 11:11, Fujii Masao wrote:

On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao masao.fu...@gmail.com wrote:

Agreed. I'll change the patch.


Done. I attached the updated patch.


I don't much like the API for this. Walsender shouldn't need to know 
about the details of the FE/BE protocol, pq_putbytes_if_available() 
seems too low level to be useful.


I think a better API would be to have a non-blocking version of 
pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, 
so that when the message doesn't fit in the output buffer in 
pq_putmessage(), the buffer is enlarged instead of trying to flush it.


Attached is a patch using that approach. This is a much smaller patch, 
and easier to understand. I'm not totally happy with the walsender main 
loop, it seems to work as it is, but the logic has become quite 
complicated. Ideas welcome on how to simplify that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0ebee6..3192ef7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2019,6 +2019,28 @@ SET ENABLE_SEQSCAN TO OFF;
    </para>
   </listitem>
  </varlistentry>
+
+ <varlistentry id="guc-replication-timeout" xreflabel="replication_timeout">
+  <term><varname>replication_timeout</varname> (<type>integer</type>)</term>
+  <indexterm>
+   <primary><varname>replication_timeout</> configuration parameter</primary>
+  </indexterm>
+  <listitem>
+   <para>
+    Specifies the maximum time, in milliseconds, to wait for the reply
+    from the standby before terminating replication.  This is useful for
+    the primary server to detect the standby crash or network outage.
+    A value of zero turns this off.  This parameter can only be set in
+    the <filename>postgresql.conf</> file or on the server command line.
+    The default value is 60 seconds.
+   </para>
+   <para>
+    To make the timeout work properly, <xref linkend="guc-wal-receiver-status-interval">
+    must be enabled on the standby, and its value must be less than the
+    value of <varname>replication_timeout</>.
+   </para>
+  </listitem>
+ </varlistentry>
  </variablelist>
 </sect2>
 
@@ -2216,6 +2238,11 @@ SET ENABLE_SEQSCAN TO OFF;
    the <filename>postgresql.conf</> file or on the server command line.
    The default value is 10 seconds.
   </para>
+  <para>
+   When <xref linkend="guc-replication-timeout"> is enabled on the primary,
+   <varname>wal_receiver_status_interval</> must be enabled, and its value
+   must be less than the value of <varname>replication_timeout</>.
+  </para>
  </listitem>
 </varlistentry>
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 3c7b05b..b6dc8cc 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -56,9 +56,11 @@
  *		pq_putbytes		- send bytes to connection (not flushed until pq_flush)
  *		pq_flush		- flush pending output
  *		pq_getbyte_if_available - get a byte if available without blocking
+ *		pq_flush_if_writable	- flush pending output if writable without blocking
  *
  * message-level I/O (and old-style-COPY-OUT cruft):
  *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
+ *		pq_putmessage_noblock - buffer a normal message without blocking (suppressed in COPY OUT mode)
  *		pq_startcopyout - inform libpq that a COPY OUT transfer is beginning
  *		pq_endcopyout	- end a COPY OUT transfer
  *
@@ -92,6 +94,7 @@
 #include miscadmin.h
 #include storage/ipc.h
 #include utils/guc.h
+#include utils/memutils.h
 
 /*
  * Configuration options
@@ -108,12 +111,15 @@ static char sock_path[MAXPGPATH];
  * Buffers for low-level I/O
  */
 
-#define PQ_BUFFER_SIZE 8192
+#define PQ_SEND_BUFFER_SIZE 8192
+#define PQ_RECV_BUFFER_SIZE 8192
 
-static char PqSendBuffer[PQ_BUFFER_SIZE];
+static char *PqSendBuffer;
+static int	PqSendBufferSize;
 static int	PqSendPointer;		/* Next index to store a byte in PqSendBuffer */
+static int	PqSendStart;		/* Next index to send a byte in PqSendBuffer */
 
-static char PqRecvBuffer[PQ_BUFFER_SIZE];
+static char PqRecvBuffer[PQ_RECV_BUFFER_SIZE];
 static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
 static int	PqRecvLength;		/* End of data available in PqRecvBuffer */
 
@@ -142,7 +148,9 @@ static int	Setup_AF_UNIX(void);
 void
 pq_init(void)
 {
-	PqSendPointer = PqRecvPointer = PqRecvLength = 0;
+	PqSendBufferSize = PQ_SEND_BUFFER_SIZE;
+	PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize);
+	PqSendPointer = PqSendStart = PqRecvPointer = PqRecvLength = 0;
 	PqCommBusy = false;
 	DoingCopyOut = false;
 	on_proc_exit(pq_close, 0);
@@ -762,7 +770,7 @@ pq_recvbuf(void)
 		int			r;
 
 		r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
-		PQ_BUFFER_SIZE - PqRecvLength);
+		PQ_RECV_BUFFER_SIZE - PqRecvLength);
 
		if (r < 0)
 		{
@@ -1138,10 +1146,10 @@ 

Re: [HACKERS] Replication server timeout patch

2011-03-16 Thread Fujii Masao
On Sat, Mar 12, 2011 at 4:34 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao masao.fu...@gmail.com wrote:
 I think we should consider making this change for 9.1.  This is a real
 wart, and it's going to become even more of a problem with sync rep, I
 think.

 Yeah, that's a welcome! Please feel free to review the patch.

 I discussed this with Heikki on IM.

 I think we should rip all the GUC change stuff out of this patch and
 just decree that if you set a timeout, you get a timeout.  If you set
 this inconsistently with wal_receiver_status_interval, then you'll get
 lots of disconnects.  But that's your problem.  This may seem a little
 unfriendly, but the logic in here is quite complex and still isn't
 going to really provide that much protection against bad
 configurations.  The only realistic alternative I see is to define
 replication_timeout as a multiple of the client's
 wal_receiver_status_interval, but that seems quite annoyingly
 unfriendly.  A single replication_timeout that applies to all slaves
 doesn't cover every configuration someone might want, but it's simple
 and easy to understand and should cover 95% of cases.  If we find that
 it's really necessary to be able to customize it further, then we
 might go the route of adding the much-discussed standby registration
 stuff, where there's a separate config file or system table where you
 can stipulate that when a walsender with application_name=foo
 connects, you want it to get wal_receiver_status_interval=$FOO.  But I
 think that complexity can certainly wait until 9.2 or later.

 I also think that the default for replication_timeout should not be 0.
  Something like 60s seems about right.  That way, if you just use the
 default settings, you'll get pretty sane behavior - a connectivity
 hiccup that lasts more than a minute will bounce the client.  We've
 already gotten reports of people who thought they were replicating
 when they really weren't, and had to fiddle with settings and struggle
 to try to make it robust.  This should make things a lot nicer for
 people out of the box, but it won't if it's disabled out of the box.

 On another note, there doesn't appear to be any need to change the
 return value of WaitLatchOrSocket().

Agreed. I'll change the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-03-16 Thread Fujii Masao
On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Agreed. I'll change the patch.

Done. I attached the updated patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


replication_timeout_v6.patch
Description: Binary data



Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Fujii Masao
On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Why does internal_flush_if_writable compute bufptr differently from
 internal_flush?  And shouldn't it be static?

 It seems to me that this ought to be refactored so that you don't
 duplicate so much code.  Maybe static int internal_flush(bool
 nonblocking).

 I don't think that the while (bufptr < bufend) loop needs to contain
 the code to set and clear the nonblocking state.  You could do the
 whole loop with nonblocking mode turned on and then reenable it just
 once at the end.  Besides possibly being clearer, that would be more
 efficient and leave less room for unexpected failures.

 All these comments seem to make sense. Will fix. Thanks!

 Done. I attached the updated patch.

 I rebased the patch against current git master.

I added this replication timeout patch to the next CF.

Let me explain why this feature is required, for the benefit of future review:

Without this feature, walsender might unexpectedly remain alive for a while
when the standby crashes or a network outage happens. TCP keepalive can
improve this situation to a certain extent, but it's not perfect. A remaining
walsender can cause some problems.

For example, when hot_standby_feedback is enabled, such a remaining
walsender would prevent the oldest xmin from advancing and interfere with
vacuuming on the master. As another example, when you use synchronous
replication and the walsender in SYNC mode gets stuck, no synchronous
standby candidate can switch to SYNC mode until that walsender exits,
and all transactions would pause.

This feature causes walsender to exit when there is no reply from the
standby before the replication timeout expires. That way we can avoid the
above problems.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Robert Haas
On Fri, Mar 11, 2011 at 8:14 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Why does internal_flush_if_writable compute bufptr differently from
 internal_flush?  And shouldn't it be static?

 It seems to me that this ought to be refactored so that you don't
 duplicate so much code.  Maybe static int internal_flush(bool
 nonblocking).

 I don't think that the while (bufptr < bufend) loop needs to contain
 the code to set and clear the nonblocking state.  You could do the
 whole loop with nonblocking mode turned on and then reenable it just
 once at the end.  Besides possibly being clearer, that would be more
 efficient and leave less room for unexpected failures.

 All these comments seem to make sense. Will fix. Thanks!

 Done. I attached the updated patch.

 I rebased the patch against current git master.

 I added this replication timeout patch to the next CF.

 Let me explain why this feature is required, for the benefit of future review:

 Without this feature, walsender might unexpectedly remain alive for a while
 when the standby crashes or a network outage happens. TCP keepalive can
 improve this situation to a certain extent, but it's not perfect. A remaining
 walsender can cause some problems.

 For example, when hot_standby_feedback is enabled, such a remaining
 walsender would prevent the oldest xmin from advancing and interfere with
 vacuuming on the master. As another example, when you use synchronous
 replication and the walsender in SYNC mode gets stuck, no synchronous
 standby candidate can switch to SYNC mode until that walsender exits,
 and all transactions would pause.

 This feature causes walsender to exit when there is no reply from the
 standby before the replication timeout expires. That way we can avoid the
 above problems.

I think we should consider making this change for 9.1.  This is a real
wart, and it's going to become even more of a problem with sync rep, I
think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Fujii Masao
On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas robertmh...@gmail.com wrote:
 I added this replication timeout patch to the next CF.

 Let me explain why this feature is required, for the benefit of future review:

 Without this feature, walsender might unexpectedly remain alive for a while
 when the standby crashes or a network outage happens. TCP keepalive can
 improve this situation to a certain extent, but it's not perfect. A remaining
 walsender can cause some problems.

 For example, when hot_standby_feedback is enabled, such a remaining
 walsender would prevent the oldest xmin from advancing and interfere with
 vacuuming on the master. As another example, when you use synchronous
 replication and the walsender in SYNC mode gets stuck, no synchronous
 standby candidate can switch to SYNC mode until that walsender exits,
 and all transactions would pause.

 This feature causes walsender to exit when there is no reply from the
 standby before the replication timeout expires. That way we can avoid the
 above problems.

 I think we should consider making this change for 9.1.  This is a real
 wart, and it's going to become even more of a problem with sync rep, I
 think.

Yeah, that's welcome! Please feel free to review the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Bruce Momjian
Fujii Masao wrote:
 On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas robertmh...@gmail.com wrote:
  I added this replication timeout patch to the next CF.
 
  Let me explain why this feature is required, for the benefit of future review:
 
  Without this feature, walsender might unexpectedly remain alive for a while
  when the standby crashes or a network outage happens. TCP keepalive can
  improve this situation to a certain extent, but it's not perfect. A remaining
  walsender can cause some problems.
 
  For example, when hot_standby_feedback is enabled, such a remaining
  walsender would prevent the oldest xmin from advancing and interfere with
  vacuuming on the master. As another example, when you use synchronous
  replication and the walsender in SYNC mode gets stuck, no synchronous
  standby candidate can switch to SYNC mode until that walsender exits,
  and all transactions would pause.
 
  This feature causes walsender to exit when there is no reply from the
  standby before the replication timeout expires. That way we can avoid the
  above problems.
 
  I think we should consider making this change for 9.1.  This is a real
  wart, and it's going to become even more of a problem with sync rep, I
  think.
 
 Yeah, that's welcome! Please feel free to review the patch.

It is already in the next commitfest, so if someone wants to add it as
an open 9.1 item, go ahead.  I am unclear on this so I am not adding it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Robert Haas
On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao masao.fu...@gmail.com wrote:
 I think we should consider making this change for 9.1.  This is a real
 wart, and it's going to become even more of a problem with sync rep, I
 think.

 Yeah, that's welcome! Please feel free to review the patch.

I discussed this with Heikki on IM.

I think we should rip all the GUC change stuff out of this patch and
just decree that if you set a timeout, you get a timeout.  If you set
this inconsistently with wal_receiver_status_interval, then you'll get
lots of disconnects.  But that's your problem.  This may seem a little
unfriendly, but the logic in here is quite complex and still isn't
going to really provide that much protection against bad
configurations.  The only realistic alternative I see is to define
replication_timeout as a multiple of the client's
wal_receiver_status_interval, but that seems quite annoyingly
unfriendly.  A single replication_timeout that applies to all slaves
doesn't cover every configuration someone might want, but it's simple
and easy to understand and should cover 95% of cases.  If we find that
it's really necessary to be able to customize it further, then we
might go the route of adding the much-discussed standby registration
stuff, where there's a separate config file or system table where you
can stipulate that when a walsender with application_name=foo
connects, you want it to get wal_receiver_status_interval=$FOO.  But I
think that complexity can certainly wait until 9.2 or later.

I also think that the default for replication_timeout should not be 0.
 Something like 60s seems about right.  That way, if you just use the
default settings, you'll get pretty sane behavior - a connectivity
hiccup that lasts more than a minute will bounce the client.  We've
already gotten reports of people who thought they were replicating
when they really weren't, and had to fiddle with settings and struggle
to try to make it robust.  This should make things a lot nicer for
people out of the box, but it won't if it's disabled out of the box.

On another note, there doesn't appear to be any need to change the
return value of WaitLatchOrSocket().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication server timeout patch

2011-03-06 Thread Fujii Masao
On Sun, Mar 6, 2011 at 3:23 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote:
 There are two things that I think are pretty clear.  If the receiver
 has wal_receiver_status_interval=0, then we should ignore
 replication_timeout for that connection.

 The patch still doesn't check that wal_receiver_status_interval
 is set up properly. I'll implement that later.

 Done. I attached the updated patch.

 Why does internal_flush_if_writable compute bufptr differently from
 internal_flush?  And shouldn't it be static?

 It seems to me that this ought to be refactored so that you don't
 duplicate so much code.  Maybe static int internal_flush(bool
 nonblocking).

 I don't think that the while (bufptr < bufend) loop needs to contain
 the code to set and clear the nonblocking state.  You could do the
 whole loop with nonblocking mode turned on and then reenable it just
 once at the end.  Besides possibly being clearer, that would be more
 efficient and leave less room for unexpected failures.

All these comments seem to make sense. Will fix. Thanks!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-03-05 Thread Robert Haas
On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote:
 There are two things that I think are pretty clear.  If the receiver
 has wal_receiver_status_interval=0, then we should ignore
 replication_timeout for that connection.

 The patch still doesn't check that wal_receiver_status_interval
 is set up properly. I'll implement that later.

 Done. I attached the updated patch.

Why does internal_flush_if_writable compute bufptr differently from
internal_flush?  And shouldn't it be static?

It seems to me that this ought to be refactored so that you don't
duplicate so much code.  Maybe static int internal_flush(bool
nonblocking).

I don't think that the while (bufptr < bufend) loop needs to contain
the code to set and clear the nonblocking state.  You could do the
whole loop with nonblocking mode turned on and then reenable it just
once at the end.  Besides possibly being clearer, that would be more
efficient and leave less room for unexpected failures.
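
A rough sketch of that refactoring (an illustration only, reusing the
PqSendStart/PqSendPointer bookkeeping from the patch and the
pq_set_nonblocking() helper discussed elsewhere in the thread; the actual
code may well differ) could look like this:

static int
internal_flush(bool nonblocking)
{
	char	   *bufptr = PqSendBuffer + PqSendStart;
	char	   *bufend = PqSendBuffer + PqSendPointer;

	/* Choose the socket mode once, before the loop, not inside it */
	pq_set_nonblocking(nonblocking);

	while (bufptr < bufend)
	{
		int			r = secure_write(MyProcPort, bufptr, bufend - bufptr);

		if (r <= 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted by a signal: just retry */
			if (nonblocking && (errno == EAGAIN || errno == EWOULDBLOCK))
				return 0;		/* not writable: keep the rest buffered */
			return EOF;			/* hard failure */
		}
		bufptr += r;
		PqSendStart += r;
	}

	/* Everything was sent: reset the buffer pointers */
	PqSendStart = PqSendPointer = 0;
	return 0;
}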

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication server timeout patch

2011-02-28 Thread Fujii Masao
On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote:
 There are two things that I think are pretty clear.  If the receiver
 has wal_receiver_status_interval=0, then we should ignore
 replication_timeout for that connection.

 The patch still doesn't check that wal_receiver_status_interval
 is set up properly. I'll implement that later.

Done. I attached the updated patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


replication_timeout_v3.patch
Description: Binary data



Re: [HACKERS] Replication server timeout patch

2011-02-26 Thread Fujii Masao
On Fri, Feb 18, 2011 at 12:10 PM, Robert Haas robertmh...@gmail.com wrote:
 IMHO, that's so broken as to be useless.

 I would really like to have a solution to this problem, though.
 Relying on TCP keepalives is weak.

Agreed.

I updated the replication timeout patch which I submitted before.
http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com

Since the patch implements also non-blocking send functions,
the timeout can work properly even when the send buffer has
been filled up.

 There are two things that I think are pretty clear.  If the receiver
 has wal_receiver_status_interval=0, then we should ignore
 replication_timeout for that connection.

The patch still doesn't check that wal_receiver_status_interval
is set up properly. I'll implement that later.

Regards,


-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 2015,2020  SET ENABLE_SEQSCAN TO OFF;
--- 2015,2042 
      </para>
     </listitem>
    </varlistentry>
+ 
+  <varlistentry id="guc-replication-timeout" xreflabel="replication_timeout">
+   <term><varname>replication_timeout</varname> (<type>integer</type>)</term>
+   <indexterm>
+    <primary><varname>replication_timeout</> configuration parameter</primary>
+   </indexterm>
+   <listitem>
+    <para>
+     Specifies the maximum time, in milliseconds, to wait for the reply
+     from the standby before terminating replication.  This is useful for
+     the primary server to detect the standby crash or network outage.
+     A value of zero (the default) turns this off.  This parameter can
+     only be set in the <filename>postgresql.conf</> file or on the server
+     command line.
+    </para>
+    <para>
+     To make the timeout work properly, <xref linkend="guc-wal-receiver-status-interval">
+     must be enabled on the standby, and its value must be less than the
+     value of <varname>replication_timeout</>.
+    </para>
+   </listitem>
+  </varlistentry>
   </variablelist>
  </sect2>
  
***
*** 2125,2130  SET ENABLE_SEQSCAN TO OFF;
--- 2147,2157 
     the <filename>postgresql.conf</> file or on the server command line.
     The default value is 10 seconds.
    </para>
+   <para>
+    When <xref linkend="guc-replication-timeout"> is enabled on the primary,
+    <varname>wal_receiver_status_interval</> must be enabled, and its value
+    must be less than the value of <varname>replication_timeout</>.
+   </para>
    </listitem>
   </varlistentry>
  
*** a/src/backend/libpq/pqcomm.c
--- b/src/backend/libpq/pqcomm.c
***
*** 56,61 
--- 56,63 
   *		pq_putbytes		- send bytes to connection (not flushed until pq_flush)
   *		pq_flush		- flush pending output
   *		pq_getbyte_if_available - get a byte if available without blocking
+  *		pq_putbytes_if_writable	- send bytes to connection if writable without blocking
+  *		pq_flush_if_writable	- flush pending output if writable without blocking
   *
   * message-level I/O (and old-style-COPY-OUT cruft):
   *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
***
*** 112,117  static char sock_path[MAXPGPATH];
--- 114,120 
  
  static char PqSendBuffer[PQ_BUFFER_SIZE];
  static int	PqSendPointer;		/* Next index to store a byte in PqSendBuffer */
+ static int	PqSendStart;		/* Next index to send a byte in PqSendBuffer */
  
  static char PqRecvBuffer[PQ_BUFFER_SIZE];
  static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
***
*** 128,133  static bool DoingCopyOut;
--- 131,137 
  static void pq_close(int code, Datum arg);
  static int	internal_putbytes(const char *s, size_t len);
  static int	internal_flush(void);
+ static int	internal_flush_if_writable(void);
  
  #ifdef HAVE_UNIX_SOCKETS
  static int	Lock_AF_UNIX(unsigned short portNumber, char *unixSocketName);
***
*** 1153,1158  internal_putbytes(const char *s, size_t len)
--- 1157,1212 
  }
  
  /* 
+  *		pq_putbytes_if_writable - send bytes to connection (not flushed
+  *			until pq_flush), if writable
+  *
+  * Returns the number of bytes written without blocking, or EOF if trouble.
+  * 
+  */
+ int
+ pq_putbytes_if_writable(const char *s, size_t len)
+ {
+ 	size_t		amount;
+ 	size_t		nwritten = 0;
+ 
+ 	/* Should not be called by old-style COPY OUT */
+ 	Assert(!DoingCopyOut);
+ 	/* No-op if reentrant call */
+ 	if (PqCommBusy)
+ 		return 0;
+ 	PqCommBusy = true;
+ 
+ 	while (len > 0)
+ 	{
+ 		/* If buffer is full, then flush it out */
+ 		if (PqSendPointer >= PQ_BUFFER_SIZE)
+ 		{
+ 			int		r;
+ 
+ 			r = internal_flush_if_writable();
+ 			if (r == 0)
+ break;
+ 			if (r == EOF)
+ 			{
+ PqCommBusy = false;
+ return r;
+ 			}
+ 		}
+ 		

Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Simon Riggs
On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote:
 On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina dan...@heroku.com wrote:
  On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
  On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote:
  Context diff equivalent attached.
 
  Thanks for the patch!
 
  As I said before, the timeout which this patch provides doesn't work well
  when the walsender gets blocked in sending WAL. At first, we would
  need to implement a non-blocking write function as an infrastructure
  of the replication timeout, I think.
  http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com
 
  Interesting point...if that's accepted as required-for-commit, what
  are the perceptions of the odds that, presuming I can write the code
  quickly enough, that there's enough infrastructure/ports already in
  postgres to allow for a non-blocking write on all our supported
  platforms?
 
 I'm not sure if there's already enough infrastructure for a non-blocking
 write. But the patch which I submitted before might help to implement that.
 http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com

So, in summary, the position is that we have a timeout, but that timeout
doesn't work in all cases. But it does work in some, so that seems
enough for me to say let's commit. Not committing gives us nothing at
all, which is as much use as a chocolate teapot.

I will be looking to commit this tomorrow morning, unless I hear some
clear No comments, with reasons.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Robert Haas
On Thu, Feb 17, 2011 at 4:21 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote:
 On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina dan...@heroku.com wrote:
  On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com 
  wrote:
  On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote:
  Context diff equivalent attached.
 
  Thanks for the patch!
 
  As I said before, the timeout which this patch provides doesn't work well
  when the walsender gets blocked in sending WAL. At first, we would
  need to implement a non-blocking write function as an infrastructure
  of the replication timeout, I think.
  http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com
 
  Interesting point...if that's accepted as required-for-commit, what
  are the perceptions of the odds that, presuming I can write the code
  quickly enough, that there's enough infrastructure/ports already in
  postgres to allow for a non-blocking write on all our supported
  platforms?

 I'm not sure if there's already enough infrastructure for a non-blocking
 write. But the patch which I submitted before might help to implement that.
 http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com

 So, in summary, the position is that we have a timeout, but that timeout
 doesn't work in all cases. But it does work in some, so that seems
 enough for me to say let's commit. Not committing gives us nothing at
 all, which is as much use as a chocolate teapot.

 I will be looking to commit this tomorrow morning, unless I hear some
 clear No comments, with reasons.

I guess the question is whether it works in 10% of cases or 95% of
cases.  In the first case there's probably no point in pretending we
have a feature if it doesn't really work.  In the second case, it
might make sense.  But I don't have a good feeling for which it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Josh Berkus

 So, in summary, the position is that we have a timeout, but that timeout
 doesn't work in all cases. But it does work in some, so that seems
 enough for me to say let's commit. Not committing gives us nothing at
 all, which is as much use as a chocolate teapot.

Can someone summarize the cases where it does and doesn't work?
There's been a longish gap in this thread.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Simon Riggs
On Thu, 2011-02-17 at 16:42 -0500, Robert Haas wrote:
 
  So, in summary, the position is that we have a timeout, but that timeout
  doesn't work in all cases. But it does work in some, so that seems
  enough for me to say let's commit. Not committing gives us nothing at
  all, which is as much use as a chocolate teapot.
 
  I will be looking to commit this tomorrow morning, unless I hear some
  clear No comments, with reasons.
 
 I guess the question is whether it works in 10% of cases or 95% of
 cases.  In the first case there's probably no point in pretending we
 have a feature if it doesn't really work.  In the second case, it
 might make sense.  But I don't have a good feeling for which it is.

Well, I guess the people that wanted to wait forever may get their wish.

For sync rep, I intend to put in place a client timeout, which we do
have code for. The server side timeout still makes sense, but it's not a
requirement for sync rep.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Fujii Masao
On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus j...@agliodbs.com wrote:
 So, in summary, the position is that we have a timeout, but that timeout
 doesn't work in all cases. But it does work in some, so that seems
 enough for me to say let's commit. Not committing gives us nothing at
 all, which is as much use as a chocolate teapot.

 Can someone summarize the cases where it does and doesn't work?
 There's been a longish gap in this thread.

I'm afraid the timeout doesn't work when walsender gets blocked while sending
WAL because the send buffer has filled up. IOW, it doesn't work when the
standby becomes unresponsive while WAL keeps being generated on the master.
Since walsender tries to continue sending the WAL while the standby is
unresponsive, the send buffer gets filled up and the blocking send function
(e.g., pq_flush) blocks the walsender.

OTOH, if the standby becomes unresponsive when there is no workload
generating WAL, the timeout would work.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Robert Haas
On Thu, Feb 17, 2011 at 9:10 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus j...@agliodbs.com wrote:
 So, in summary, the position is that we have a timeout, but that timeout
 doesn't work in all cases. But it does work in some, so that seems
 enough for me to say let's commit. Not committing gives us nothing at
 all, which is as much use as a chocolate teapot.

 Can someone summarize the cases where it does and doesn't work?
 There's been a longish gap in this thread.

 I'm afraid the timeout doesn't work when walsender gets blocked while sending
 WAL because the send buffer has filled up. IOW, it doesn't work when the
 standby becomes unresponsive while WAL keeps being generated on the master.
 Since walsender tries to continue sending the WAL while the standby is
 unresponsive, the send buffer gets filled up and the blocking send function
 (e.g., pq_flush) blocks the walsender.

 OTOH, if the standby becomes unresponsive when there is no workload
 generating WAL, the timeout would work.

IMHO, that's so broken as to be useless.

I would really like to have a solution to this problem, though.
Relying on TCP keepalives is weak.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication server timeout patch

2011-02-15 Thread Robert Haas
On Mon, Feb 14, 2011 at 5:13 PM, Daniel Farina dan...@heroku.com wrote:
 On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote:
 Context diff equivalent attached.

 Thanks for the patch!

 As I said before, the timeout which this patch provides doesn't work well
 when the walsender gets blocked in sending WAL. At first, we would
 need to implement a non-blocking write function as an infrastructure
 of the replication timeout, I think.
 http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com

 Interesting point...if that's accepted as required-for-commit, what
 are the perceptions of the odds that, presuming I can write the code
 quickly enough, that there's enough infrastructure/ports already in
 postgres to allow for a non-blocking write on all our supported
 platforms?

Are you working on this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-15 Thread Fujii Masao
On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina dan...@heroku.com wrote:
 On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote:
 Context diff equivalent attached.

 Thanks for the patch!

 As I said before, the timeout which this patch provides doesn't work well
 when the walsender gets blocked in sending WAL. First, we would
 need to implement a non-blocking write function as infrastructure
 for the replication timeout, I think.
 http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com

 Interesting point...if that's accepted as required-for-commit, what
 are the perceptions of the odds that, presuming I can write the code
 quickly enough, that there's enough infrastructure/ports already in
 postgres to allow for a non-blocking write on all our supported
 platforms?

I'm not sure if there's already enough infrastructure for a non-blocking
write. But the patch which I submitted before might help to implement that.
http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com
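
For what it's worth, the portable piece looks small. A sketch, assuming only
fcntl() on Unix and ioctlsocket() on Windows (set_socket_nonblocking is an
illustrative name, not an existing API):

#ifdef WIN32
#include <winsock2.h>
#else
#include <fcntl.h>
#endif
#include <stdbool.h>

static bool
set_socket_nonblocking(int sock)
{
#ifndef WIN32
	int		flags = fcntl(sock, F_GETFL, 0);

	if (flags < 0)
		return false;
	return fcntl(sock, F_SETFL, flags | O_NONBLOCK) >= 0;
#else
	/* On Windows, FIONBIO toggles non-blocking mode on a socket. */
	u_long	on = 1;

	return ioctlsocket(sock, FIONBIO, &on) == 0;
#endif
}

The remaining work would be in the buffering layer (flushing only what the
socket will accept), not in the platform support itself.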

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-14 Thread Fujii Masao
On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote:
 Context diff equivalent attached.

Thanks for the patch!

As I said before, the timeout which this patch provides doesn't work well
when the walsender gets blocked in sending WAL. First, we would
need to implement a non-blocking write function as infrastructure
for the replication timeout, I think.
http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-14 Thread Daniel Farina
On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote:
 Context diff equivalent attached.

 Thanks for the patch!

 As I said before, the timeout which this patch provides doesn't work well
 when the walsender gets blocked in sending WAL. First, we would
 need to implement a non-blocking write function as infrastructure
 for the replication timeout, I think.
 http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com

Interesting point...if that's accepted as required-for-commit, what
are the perceptions of the odds that, presuming I can write the code
quickly enough, that there's enough infrastructure/ports already in
postgres to allow for a non-blocking write on all our supported
platforms?

--
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-14 Thread Simon Riggs
On Mon, 2011-02-14 at 14:13 -0800, Daniel Farina wrote:
 On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
  On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote:
  Context diff equivalent attached.
 
  Thanks for the patch!
 
  As I said before, the timeout which this patch provides doesn't work well
  when the walsender gets blocked in sending WAL. First, we would
  need to implement a non-blocking write function as infrastructure
  for the replication timeout, I think.
  http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com

I wasn't aware that had been raised before. Thanks for noting it again.

I guess that's why you thought "wait forever" was a good idea ;-)

 Interesting point...if that's accepted as required-for-commit, what
 are the perceptions of the odds that, presuming I can write the code
 quickly enough, that there's enough infrastructure/ports already in
 postgres to allow for a non-blocking write on all our supported
 platforms?

I'd like to see what you come up with. I would rate that as important,
though not essential for sync replication.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Replication server timeout patch

2011-02-11 Thread Daniel Farina
Hello list,

I split this out of the synchronous replication patch for independent
review. I'm dashing out the door, so I haven't put it on the CF yet or
anything, but I just wanted to get it out there...I'll be around in
Not Too Long to finish any other details.

--
fdr
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** SET ENABLE_SEQSCAN TO OFF;
*** 2121,2126 ****
--- 2121,2140 ----
      </listitem>
     </varlistentry>
  
+      <varlistentry id="guc-replication-timeout-server" xreflabel="replication_timeout_server">
+       <term><varname>replication_timeout_server</varname> (<type>integer</type>)</term>
+       <indexterm>
+        <primary><varname>replication_timeout_server</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+         If the primary server does not receive a reply from a standby server
+         within <varname>replication_timeout_server</> seconds then the
+         primary will terminate the replication connection.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
   </variablelist>
  </sect2>
 </sect1>
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*************** bool		am_walsender = false;		/* Am I a walsender process ? */
*** 73,78 ****
--- 73,80 ----
  /* User-settable parameters for walsender */
  int			max_wal_senders = 0;	/* the maximum number of concurrent walsenders */
  int			WalSndDelay = 200;	/* max sleep time between some actions */
+ int			replication_timeout_server; /* If the receiver takes too long, time
+ 		 * out and die after this duration */
  
  /*
   * These variables are used similarly to openLogFile/Id/Seg/Off,
*************** static uint32 sendOff = 0;
*** 89,94 ****
--- 91,99 ----
   */
  static XLogRecPtr sentPtr = {0, 0};
  
+ /* Remembers the last time the standby has notified the primary of progress */
+ static TimestampTz last_reply_timestamp;
+ 
  /* Flags set by signal handlers for later service in main loop */
  static volatile sig_atomic_t got_SIGHUP = false;
  volatile sig_atomic_t walsender_shutdown_requested = false;
*************** WalSndHandshake(void)
*** 250,255 ****
--- 255,265 ----
  		 errmsg("invalid standby handshake message type %d", firstchar)));
  		}
  	}
+ 
+ 	/*
+  * Initialize our timeout checking mechanism.
+  */
+ 	last_reply_timestamp = GetCurrentTimestamp();
  }
  
  /*
*************** WalSndLoop(void)
*** 616,632 ****
  
  			if (!XLogSend(output_message, caughtup))
  break;
! 			if (caughtup && !got_SIGHUP && !walsender_ready_to_stop && !walsender_shutdown_requested)
  			{
! /*
!  * XXX: We don't really need the periodic wakeups anymore,
!  * WaitLatchOrSocket should reliably wake up as soon as
!  * something interesting happens.
!  */
  
  /* Sleep */
! WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
!   WalSndDelay * 1000L);
  			}
  		}
  		else
--- 626,650 ----
  
  			if (!XLogSend(output_message, caughtup))
  break;
! 			if (caughtup && !got_SIGHUP && !walsender_ready_to_stop &&
! 				!walsender_shutdown_requested)
  			{
! long timeout;
! 
! if (replication_timeout_server == -1)
! 	timeout = -1L;
! else
! 	timeout = 100L * replication_timeout_server;
  
  /* Sleep */
! if (WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
! 	  timeout) == 0)
! {
! 	ereport(LOG,
! 			(errmsg("streaming replication timeout after %d s",
! 	replication_timeout_server)));
! 	break;
! }
  			}
  		}
  		else
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** static struct config_int ConfigureNamesInt[] =
*** 1484,1489 ****
--- 1484,1499 ----
  	},
  
  	{
+ 		{"replication_timeout_server", PGC_SIGHUP, WAL_SETTINGS,
+ 		 gettext_noop("Replication connection will timeout after this duration."),
+ 		 NULL,
+ 		 GUC_UNIT_S
+ 		},
+ 		&replication_timeout_server,
+ 		30, -1, INT_MAX, NULL, NULL
+ 	},
+ 
+ 	{
  		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
  			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
  			NULL,
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 203,208 ****
--- 203,209 ----
  	# when reading streaming WAL;
  	# -1 allows indefinite delay
  #wal_receiver_status_interval = 10s	# replies at least this often, 0 disables
+ #replication_timeout_server = 120	# -1 means wait forever
  
  
  #------------------------------------------------------------------------------
*** a/src/include/replication/walsender.h
--- b/src/include/replication/walsender.h
*************** extern volatile sig_atomic_t walsender_ready_to_stop;
*** 70,75 ****
--- 70,76 ----
  /* user-settable parameters */
  extern int	WalSndDelay;
  extern int	max_wal_senders;
+ extern int	replication_timeout_server;
  
  extern int	WalSenderMain(void);
  extern void WalSndSignals(void);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Robert Haas
On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina drfar...@acm.org wrote:
 I split this out of the synchronous replication patch for independent
 review. I'm dashing out the door, so I haven't put it on the CF yet or
 anything, but I just wanted to get it out there...I'll be around in
 Not Too Long to finish any other details.

This looks like a useful and separately committable change.

However, it looks to me like this renders wal_sender_delay aka
WalSndDelay completely unused.  If we don't need that GUC any more, we
should rip it out completely.

The comment in WalSndHandshake should have a tab at the beginning of
every line.  Right now the first line has a tab and the rest have
spaces.

The first hunk in WalSndLoop is a meaningless whitespace change.

I wonder if we ought to just call this replication_timeout, rather
than replication_timeout_server.  Simon's patch (from which this was
extracted) also has replication_timeout_client, but the two aren't
symmetrical.  The replication_timeout_client in this patch is the
amount of time after which the master acknowledges the commit even
though the synchronous standby hasn't acked yet.  So it only applies
to the synchronous replication case, whereas this is useful for both
synchronous and asynchronous replication.  I'm inclined to think that
knob is utterly useless anyway; surely waiting more than zero will
reduce the throughput of the system to some minute fraction of its
normal value, while waiting less than infinity throws out the data
guarantee that made you pick synchronous replication in the first
place.  Even if we do decide to keep that knob, I don't think we'll
want the value to be symmetric with this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Heikki Linnakangas

On 11.02.2011 22:11, Robert Haas wrote:

On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farinadrfar...@acm.org  wrote:

I split this out of the synchronous replication patch for independent
review. I'm dashing out the door, so I haven't put it on the CF yet or
anything, but I just wanted to get it out there...I'll be around in
Not Too Long to finish any other details.


This looks like a useful and separately committable change.


Hmm, so this patch implements a watchdog, where the master disconnects 
the standby if the heartbeat from the standby stops for more than 
'replication_[server]_timeout' seconds. The standby sends the heartbeat 
every wal_receiver_status_interval seconds.


It would be nice if the master and standby could negotiate those 
settings. As the patch stands, it's easy to have a pathological 
configuration where replication_server_timeout <
wal_receiver_status_interval, so that the master repeatedly disconnects 
the standby because it doesn't reply in time. Maybe the standby should 
report how often it's going to send a heartbeat, and master should wait 
for that long + some safety margin. Or maybe the master should tell the 
standby how often it should send the heartbeat?
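
As a sketch of the negotiation idea (names are illustrative only, nothing
here is in the patch): the standby could report its status interval in the
handshake, and the master could refuse to apply a timeout shorter than a
couple of missed heartbeats:

static long
EffectiveReplicationTimeoutSec(int replication_timeout_sec,
							   int standby_status_interval_sec)
{
	/* standby sends no heartbeat at all: don't apply the watchdog */
	if (standby_status_interval_sec == 0 || replication_timeout_sec <= 0)
		return -1;		/* -1 means "wait forever", as in the patch */

	/* allow at least two missed heartbeats plus a small safety margin */
	return Max(replication_timeout_sec,
			   2 * standby_status_interval_sec + 1);
}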


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Robert Haas
On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 11.02.2011 22:11, Robert Haas wrote:

 On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farinadrfar...@acm.org  wrote:

 I split this out of the synchronous replication patch for independent
 review. I'm dashing out the door, so I haven't put it on the CF yet or
 anything, but I just wanted to get it out there...I'll be around in
 Not Too Long to finish any other details.

 This looks like a useful and separately committable change.

 Hmm, so this patch implements a watchdog, where the master disconnects the
 standby if the heartbeat from the standby stops for more than
 'replication_[server]_timeout' seconds. The standby sends the heartbeat
 every wal_receiver_status_interval seconds.

 It would be nice if the master and standby could negotiate those settings.
 As the patch stands, it's easy to have a pathological configuration where
 replication_server_timeout < wal_receiver_status_interval, so that the
 master repeatedly disconnects the standby because it doesn't reply in time.
 Maybe the standby should report how often it's going to send a heartbeat,
 and master should wait for that long + some safety margin. Or maybe the
 master should tell the standby how often it should send the heartbeat?

I guess the biggest use case for that behavior would be in a case
where you have two standbys, one of which doesn't send a heartbeat and
the other of which does.  Then you really can't rely on a single
timeout.

Maybe we could change the server parameter to indicate what multiple
of wal_receiver_status_interval causes a hangup, and then change the
client to notify the server what value it's using.  But that gets
complicated, because the value could be changed while the standby is
running.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Daniel Farina
On Fri, Feb 11, 2011 at 12:11 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina drfar...@acm.org wrote:
 I split this out of the synchronous replication patch for independent
 review. I'm dashing out the door, so I haven't put it on the CF yet or
 anything, but I just wanted to get it out there...I'll be around in
 Not Too Long to finish any other details.

 This looks like a useful and separately committable change.

 However, it looks to me like this renders wal_sender_delay aka
 WalSndDelay completely unused.  If we don't need that GUC any more, we
 should rip it out completely.

Indeed; I have cleaned this up.

 The comment in WalSndHandshake should have a tab at the beginning of
 every line.  Right now the first line has a tab and the rest have
 spaces.

Also correct. Done.

 The first hunk in WalSndLoop is a meaningless whitespace change.

I was trying to get it under 80 columns wide, but yes, it is unnecessary.

I think this closes out the small fry.

I have rebased my splitorific branch to reflect these changes:

https://github.com/fdr/postgres/commits/splitorific

Context diff equivalent attached.

-- 
fdr


0001-Split-and-rename-out-server-timeout-of-clients-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Robert Haas
On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 11.02.2011 22:11, Robert Haas wrote:

 On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farinadrfar...@acm.org  wrote:

 I split this out of the synchronous replication patch for independent
 review. I'm dashing out the door, so I haven't put it on the CF yet or
 anything, but I just wanted to get it out there...I'll be around in
 Not Too Long to finish any other details.

 This looks like a useful and separately committable change.

 Hmm, so this patch implements a watchdog, where the master disconnects the
 standby if the heartbeat from the standby stops for more than
 'replication_[server]_timeout' seconds. The standby sends the heartbeat
 every wal_receiver_status_interval seconds.

 It would be nice if the master and standby could negotiate those settings.
 As the patch stands, it's easy to have a pathological configuration where
 replication_server_timeout < wal_receiver_status_interval, so that the
 master repeatedly disconnects the standby because it doesn't reply in time.
 Maybe the standby should report how often it's going to send a heartbeat,
 and master should wait for that long + some safety margin. Or maybe the
 master should tell the standby how often it should send the heartbeat?

 I guess the biggest use case for that behavior would be in a case
 where you have two standbys, one of which doesn't send a heartbeat and
 the other of which does.  Then you really can't rely on a single
 timeout.

 Maybe we could change the server parameter to indicate what multiple
 of wal_receiver_status_interval causes a hangup, and then change the
 client to notify the server what value it's using.  But that gets
 complicated, because the value could be changed while the standby is
 running.

On reflection I'm deeply uncertain this is a good idea.  It's pretty
hopeless to suppose that we can keep the user from choosing parameter
settings which will cause them problems, and there are certainly far
stupider things they could do than set replication_timeout <
wal_receiver_status_interval.  They could, for example, set fsync=off
or work_mem=4GB or checkpoint_segments=3 (never mind that we ship that
last one out of the box).  Any of those settings have the potential to
thoroughly destroy their system in one way or another, and there's not
a darn thing we can do about it.  Setting up some kind of handshake
system based on a multiple of the wal_receiver_status_interval is
going to be complex, and it's not necessarily going to deliver the
behavior someone wants anyway.  If someone has
wal_receiver_status_interval=10 on one system and =30 on another
system, does it therefore follow that the timeouts should also be
different by 3X?  Perhaps, but it's non-obvious.

There are two things that I think are pretty clear.  If the receiver
has wal_receiver_status_interval=0, then we should ignore
replication_timeout for that connection.  And also we need to make
sure that the replication_timeout can't kill off a connection that is
in the middle of streaming a large base backup.  Maybe we should try
to get those two cases right and not worry about the rest.  Dan, can
you check whether the base backup thing is a problem with this as
implemented?
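
A sketch of what getting just those two cases right might look like (every
name here other than replication_timeout is made up for illustration):

static bool
ReplicationTimeoutApplies(bool standby_sends_status, bool sending_base_backup)
{
	if (replication_timeout <= 0)
		return false;		/* timeout disabled */
	if (!standby_sends_status)
		return false;		/* standby runs with wal_receiver_status_interval = 0 */
	if (sending_base_backup)
		return false;		/* never cut off a long base backup */
	return true;
}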

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Daniel Farina
On Feb 11, 2011 8:20 PM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas robertmh...@gmail.com
wrote:
  On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas
  heikki.linnakan...@enterprisedb.com wrote:
  On 11.02.2011 22:11, Robert Haas wrote:
 
  On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farinadrfar...@acm.org
 wrote:
 
  I split this out of the synchronous replication patch for independent
  review. I'm dashing out the door, so I haven't put it on the CF yet
or
  anything, but I just wanted to get it out there...I'll be around in
  Not Too Long to finish any other details.
 
  This looks like a useful and separately committable change.
 
  Hmm, so this patch implements a watchdog, where the master disconnects
the
  standby if the heartbeat from the standby stops for more than
  'replication_[server]_timeout' seconds. The standby sends the heartbeat
  every wal_receiver_status_interval seconds.
 
  It would be nice if the master and standby could negotiate those
settings.
  As the patch stands, it's easy to have a pathological configuration
where
  replication_server_timeout < wal_receiver_status_interval, so that the
  master repeatedly disconnects the standby because it doesn't reply in
time.
  Maybe the standby should report how often it's going to send a
heartbeat,
  and master should wait for that long + some safety margin. Or maybe the
  master should tell the standby how often it should send the heartbeat?
 
  I guess the biggest use case for that behavior would be in a case
  where you have two standbys, one of which doesn't send a heartbeat and
  the other of which does.  Then you really can't rely on a single
  timeout.
 
  Maybe we could change the server parameter to indicate what multiple
  of wal_receiver_status_interval causes a hangup, and then change the
  client to notify the server what value it's using.  But that gets
  complicated, because the value could be changed while the standby is
  running.

 On reflection I'm deeply uncertain this is a good idea.  It's pretty
 hopeless to suppose that we can keep the user from choosing parameter
 settings which will cause them problems, and there are certainly far
 stupider things they could do than set replication_timeout <
 wal_receiver_status_interval.  They could, for example, set fsync=off
 or work_mem=4GB or checkpoint_segments=3 (never mind that we ship that
 last one out of the box).  Any of those settings have the potential to
 thoroughly destroy their system in one way or another, and there's not
 a darn thing we can do about it.  Setting up some kind of handshake
 system based on a multiple of the wal_receiver_status_interval is
 going to be complex, and it's not necessarily going to deliver the
 behavior someone wants anyway.  If someone has
 wal_receiver_status_interval=10 on one system and =30 on another
 system, does it therefore follow that the timeouts should also be
 different by 3X?  Perhaps, but it's non-obvious.

 There are two things that I think are pretty clear.  If the receiver
 has wal_receiver_status_interval=0, then we should ignore
 replication_timeout for that connection.  And also we need to make
 sure that the replication_timeout can't kill off a connection that is
 in the middle of streaming a large base backup.  Maybe we should try
 to get those two cases right and not worry about the rest.  Dan, can
 you check whether the base backup thing is a problem with this as
 implemented?

Yes, I will have something to say come Saturday.

--
fdr