Re: [HACKERS] Replication server timeout patch
On 31.03.2011 05:46, Fujii Masao wrote:
> On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>> On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
>>> <heikki.linnakan...@enterprisedb.com> wrote:
>>>> On 30.03.2011 10:58, Fujii Masao wrote:
>>>>> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
>>>>> <heikki.linnakan...@enterprisedb.com> wrote:
>>>>>> + A value of zero means wait forever. This parameter can only be set in
>>>>> The first sentence sounds misleading. Even if you set the parameter to
>>>>> zero, replication connections can be terminated because of a keepalive
>>>>> or socket error.
>>>> Hmm, should I change it back to "A value of zero disables the timeout"?
>>>> Any better suggestions?
>>> I like that. But I'd appreciate it if anyone suggests something better.
>> Maybe sticking the word "mechanism" in there would be a bit better. "A
>> value of zero disables the timeout mechanism"?
> I'm OK with that. Or, what about "A value of zero turns this off", which is
> used for statement_timeout, for the sake of consistency?

Committed Robert's suggestion.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On 29.03.2011 07:55, Fujii Masao wrote:
> On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
>>> pq_flush_if_writable() calls internal_flush() without using a PG_TRY
>>> block. This seems unsafe because, for example,
>>> pgwin32_waitforsinglesocket() called by secure_write() can throw an ERROR.
>> Perhaps it's time to give up on the assumption that the socket is in
>> blocking mode except within those two functions. The attached patch adds
>> the pq_set_nonblocking() function from your patch, and adds calls to it
>> before all secure_read/write operations to put the socket in the right
>> mode. There are only a few of those operations.
> Sounds good.
>
> + pq_set_nonblocking(false); /* XXX: Is this required? */
>
> No. Since secure_close and close_SSL don't use MyProcPort->sock and
> MyProcPort->noblock, which can be changed in pq_set_nonblocking, I don't
> think that is required.

Ok, I took that out.

> + pq_putmessage_noblock('d', msgbuf, 1 + sizeof(WalDataMessageHeader) + nbytes);
>
> Don't we need to check the return value of pq_putmessage_noblock? That can
> return EOF when trouble happens (for example, the send system call fails).

No, pq_putmessage_noblock doesn't call send(), because it enlarges the buffer
to make sure the message fits, and it does nothing else that could fail. I
changed its return type to void, and added an Assert() to check that the
pq_putmessage() call it does internally indeed doesn't fail.

>> Should we use COMMERROR instead of ERROR if we fail to put the socket in
>> the right mode?
> Maybe.

I made it COMMERROR. ERRORs are sent to the client, and you could get into
infinite recursion if sending the ERROR requires setting the blocking mode
again.

Committed with those changes. I also reworded the docs a bit.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
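Heikki's point above — that pq_putmessage_noblock() cannot fail because it first enlarges the send buffer so the whole message is guaranteed to fit — can be sketched outside the backend like this. This is a standalone model with made-up names (SendBuf, buf_put_noblock); the real code works on PqSendBuffer in src/backend/libpq/pqcomm.c and uses the backend's own allocator rather than plain realloc:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A toy send buffer modeled on PqSendBuffer: grows, never blocks on append. */
typedef struct
{
	char	   *data;
	size_t		size;	/* allocated size; starts at 8192 like PQ_SEND_BUFFER_SIZE */
	size_t		used;	/* bytes queued but not yet flushed */
} SendBuf;

static void
buf_init(SendBuf *b)
{
	b->size = 8192;
	b->data = malloc(b->size);
	b->used = 0;
}

/*
 * Like pq_putmessage_noblock: enlarge the buffer to fit, then append.
 * Because the data is guaranteed to fit, the append cannot fail, which is
 * why the real function could drop its return value to void.
 */
static void
buf_put_noblock(SendBuf *b, const char *msg, size_t len)
{
	if (b->used + len > b->size)
	{
		size_t		required = b->used + len;

		while (b->size < required)
			b->size *= 2;		/* grow geometrically */
		b->data = realloc(b->data, b->size);
	}
	memcpy(b->data + b->used, msg, len);
	b->used += len;				/* nothing here can block or fail */
}
```

The key property is that once the buffer is large enough, appending is pure memcpy bookkeeping: no send() call, no syscall that could return EAGAIN, so the caller needs no return-value check.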
Re: [HACKERS] Replication server timeout patch
On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
>>> + pq_putmessage_noblock('d', msgbuf, 1 + sizeof(WalDataMessageHeader) + nbytes);
>>
>> Don't we need to check the return value of pq_putmessage_noblock? That can
>> return EOF when trouble happens (for example, the send system call fails).
>
> No, pq_putmessage_noblock doesn't call send(), because it enlarges the
> buffer to make sure the message fits, and it does nothing else that could
> fail. I changed its return type to void, and added an Assert() to check
> that the pq_putmessage() call it does internally indeed doesn't fail.

Oh, you're right.

> Committed with those changes. I also reworded the docs a bit.

Thanks a lot!

> + A value of zero means wait forever. This parameter can only be set in

The first sentence sounds misleading. Even if you set the parameter to zero,
replication connections can be terminated because of a keepalive or socket
error.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Replication server timeout patch
On 30.03.2011 10:58, Fujii Masao wrote:
> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
>> + A value of zero means wait forever. This parameter can only be set in
> The first sentence sounds misleading. Even if you set the parameter to
> zero, replication connections can be terminated because of a keepalive or
> socket error.

Hmm, should I change it back to "A value of zero disables the timeout"?
Any better suggestions?

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Replication server timeout patch
On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> On 30.03.2011 10:58, Fujii Masao wrote:
>>> + A value of zero means wait forever. This parameter can only be set in
>> The first sentence sounds misleading. Even if you set the parameter to
>> zero, replication connections can be terminated because of a keepalive or
>> socket error.
> Hmm, should I change it back to "A value of zero disables the timeout"?
> Any better suggestions?

I like that. But I'd appreciate it if anyone suggests something better.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Replication server timeout patch
On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
>> On 30.03.2011 10:58, Fujii Masao wrote:
>>>> + A value of zero means wait forever. This parameter can only be set in
>>> The first sentence sounds misleading. Even if you set the parameter to
>>> zero, replication connections can be terminated because of a keepalive
>>> or socket error.
>> Hmm, should I change it back to "A value of zero disables the timeout"?
>> Any better suggestions?
> I like that. But I'd appreciate it if anyone suggests something better.

Maybe sticking the word "mechanism" in there would be a bit better. "A value
of zero disables the timeout mechanism"?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Replication server timeout patch
On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
>> <heikki.linnakan...@enterprisedb.com> wrote:
>>> On 30.03.2011 10:58, Fujii Masao wrote:
>>>>> + A value of zero means wait forever. This parameter can only be set in
>>>> The first sentence sounds misleading. Even if you set the parameter to
>>>> zero, replication connections can be terminated because of a keepalive
>>>> or socket error.
>>> Hmm, should I change it back to "A value of zero disables the timeout"?
>>> Any better suggestions?
>> I like that. But I'd appreciate it if anyone suggests something better.
> Maybe sticking the word "mechanism" in there would be a bit better. "A
> value of zero disables the timeout mechanism"?

I'm OK with that. Or, what about "A value of zero turns this off", which is
used for statement_timeout, for the sake of consistency?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Replication server timeout patch
Fujii Masao <masao.fu...@gmail.com> writes:
> On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas wrote:
>> Should we use COMMERROR instead of ERROR if we fail to put the socket in
>> the right mode?
> Maybe.

COMMERROR exists to keep us from trying to send an error report down a
failed socket.  I would assume (perhaps wrongly) that walsender/walreceiver
don't try to push error reports across the socket anyway, only to the
postmaster log.  If correct, there is no need for COMMERROR, and using it
just muddies the code.

			regards, tom lane
Re: [HACKERS] Replication server timeout patch
On Tue, Mar 29, 2011 at 9:24 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fu...@gmail.com> writes:
>> On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas wrote:
>>> Should we use COMMERROR instead of ERROR if we fail to put the socket in
>>> the right mode?
>> Maybe.
>
> COMMERROR exists to keep us from trying to send an error report down a
> failed socket. I would assume (perhaps wrongly) that walsender/walreceiver
> don't try to push error reports across the socket anyway, only to the
> postmaster log. If correct, there is no need for COMMERROR, and using it
> just muddies the code.

I don't think that's how it works. The error the server sends is copied into
some of the messages in the client log, which is really useful for debugging.

ERROR: can't connect to the server (server said: you're not authorized)

...or something like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Replication server timeout patch
On Wed, Mar 30, 2011 at 1:04 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> COMMERROR exists to keep us from trying to send an error report down a
>> failed socket. I would assume (perhaps wrongly) that
>> walsender/walreceiver don't try to push error reports across the socket
>> anyway, only to the postmaster log. If correct, there is no need for
>> COMMERROR, and using it just muddies the code.
> I don't think that's how it works. The error the server sends is copied
> into some of the messages in the client log, which is really useful for
> debugging.
>
> ERROR: can't connect to the server (server said: you're not authorized)
>
> ...or something like that.

Yes. Walsender sends its error message to walreceiver, and walreceiver
writes it down to the server log. For example:

FATAL: could not receive data from WAL stream: FATAL: requested WAL segment
00010016 has already been removed

The second FATAL message is sent from walsender.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Replication server timeout patch
On 24.03.2011 15:24, Fujii Masao wrote:
> On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
>> I don't much like the API for this. Walsender shouldn't need to know
>> about the details of the FE/BE protocol; pq_putbytes_if_available()
>> seems too low level to be useful. I think a better API would be to have
>> a non-blocking version of pq_putmessage(). We can make the output buffer
>> in pqcomm.c resizeable, so that when the message doesn't fit in the
>> output buffer in pq_putmessage(), the buffer is enlarged instead of
>> trying to flush it. Attached is a patch using that approach. This is a
>> much smaller patch, and easier to understand.
> Agreed. Thanks for improving the patch.
>
> pq_flush_if_writable() calls internal_flush() without using a PG_TRY
> block. This seems unsafe because, for example,
> pgwin32_waitforsinglesocket() called by secure_write() can throw an ERROR.

Perhaps it's time to give up on the assumption that the socket is in
blocking mode except within those two functions. The attached patch adds the
pq_set_nonblocking() function from your patch, and adds calls to it before
all secure_read/write operations to put the socket in the right mode. There
are only a few of those operations.

Should we use COMMERROR instead of ERROR if we fail to put the socket in the
right mode?

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0ebee6..3192ef7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2019,6 +2019,28 @@ SET ENABLE_SEQSCAN TO OFF;
        </para>
       </listitem>
      </varlistentry>
+
+     <varlistentry id="guc-replication-timeout" xreflabel="replication_timeout">
+      <term><varname>replication_timeout</varname> (<type>integer</type>)</term>
+      <indexterm>
+       <primary><varname>replication_timeout</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the maximum time, in milliseconds, to wait for the reply
+        from the standby before terminating replication.  This is useful for
+        the primary server to detect the standby crash or network outage.
+        A value of zero turns this off.  This parameter can only be set in
+        the <filename>postgresql.conf</> file or on the server command line.
+        The default value is 60 seconds.
+       </para>
+       <para>
+        To make the timeout work properly, <xref linkend="guc-wal-receiver-status-interval">
+        must be enabled on the standby, and its value must be less than the
+        value of <varname>replication_timeout</>.
+       </para>
+      </listitem>
+     </varlistentry>
     </variablelist>
    </sect2>
@@ -2216,6 +2238,11 @@ SET ENABLE_SEQSCAN TO OFF;
        the <filename>postgresql.conf</> file or on the server command line.
        The default value is 10 seconds.
       </para>
+      <para>
+       When <xref linkend="guc-replication-timeout"> is enabled on the primary,
+       <varname>wal_receiver_status_interval</> must be enabled, and its value
+       must be less than the value of <varname>replication_timeout</>.
+      </para>
      </listitem>
     </varlistentry>
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 3c7b05b..db313a8 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -55,10 +55,12 @@
  *		pq_peekbyte		- peek at next byte from connection
  *		pq_putbytes		- send bytes to connection (not flushed until pq_flush)
  *		pq_flush		- flush pending output
+ *		pq_flush_if_writable - flush pending output if writable without blocking
  *		pq_getbyte_if_available - get a byte if available without blocking
  *
  * message-level I/O (and old-style-COPY-OUT cruft):
  *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
+ *		pq_putmessage_noblock - buffer a normal message without blocking (suppressed in COPY OUT mode)
  *		pq_startcopyout - inform libpq that a COPY OUT transfer is beginning
  *		pq_endcopyout	- end a COPY OUT transfer
  *
@@ -92,6 +94,7 @@
 #include "miscadmin.h"
 #include "storage/ipc.h"
 #include "utils/guc.h"
+#include "utils/memutils.h"
 
 /*
  * Configuration options
@@ -105,15 +108,21 @@ static char sock_path[MAXPGPATH];
 
 /*
- * Buffers for low-level I/O
+ * Buffers for low-level I/O.
+ *
+ * The receive buffer is fixed size. Send buffer is usually 8k, but can be
+ * enlarged by pq_putmessage_noblock() if the message doesn't fit otherwise.
  */
-#define PQ_BUFFER_SIZE 8192
+#define PQ_SEND_BUFFER_SIZE 8192
+#define PQ_RECV_BUFFER_SIZE 8192
 
-static char PqSendBuffer[PQ_BUFFER_SIZE];
+static char *PqSendBuffer;
+static int	PqSendBufferSize;	/* Size send buffer */
 static int	PqSendPointer;		/* Next index to store a byte in PqSendBuffer */
+static int	PqSendStart;		/* Next index to send a byte in PqSendBuffer */
 
-static char PqRecvBuffer[PQ_BUFFER_SIZE];
+static char PqRecvBuffer[PQ_RECV_BUFFER_SIZE];
 static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
 static
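The pq_flush_if_writable() contract being added in this patch — flush as much as a non-blocking socket will take, remember how far we got, and return without ever blocking — can be modeled with a pluggable writer standing in for secure_write(). The callback type, the mock socket, and the function names here are illustrative, not the backend's actual API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Writer callback: returns bytes accepted, or -1 with errno == EAGAIN
 * when it would block (mimicking send() on a non-blocking socket). */
typedef long (*write_fn) (const char *buf, size_t len, void *state);

/*
 * Flush as much of buf[*start .. end) as possible without blocking.
 * Returns 0 when everything was sent, 1 when data remains (would block),
 * -1 on hard error -- loosely the contract of pq_flush_if_writable().
 */
static int
flush_if_writable(const char *buf, size_t *start, size_t end,
				  write_fn wr, void *state)
{
	while (*start < end)
	{
		long		n = wr(buf + *start, end - *start, state);

		if (n < 0)
		{
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return 1;		/* socket not writable: give up, don't block */
			return -1;			/* real error */
		}
		*start += (size_t) n;
	}
	*start = 0;					/* buffer drained; reset like PqSendStart */
	return 0;
}

/* A mock socket that accepts at most 4 bytes per call, up to a cap,
 * then reports "would block". */
struct mock
{
	size_t		accepted;
	size_t		cap;
};

static long
mock_send(const char *buf, size_t len, void *state)
{
	struct mock *m = state;

	(void) buf;
	if (m->accepted >= m->cap)
	{
		errno = EAGAIN;
		return -1;
	}
	size_t		n = len < 4 ? len : 4;

	if (m->accepted + n > m->cap)
		n = m->cap - m->accepted;
	m->accepted += n;
	return (long) n;
}
```

The point of the `*start` cursor is exactly the PqSendStart variable the patch introduces: a partially sent buffer keeps its unsent tail in place, and the next wait-for-writable cycle resumes where the last one stopped.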
Re: [HACKERS] Replication server timeout patch
On Wed, Mar 23, 2011 at 6:33 AM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> On 16.03.2011 11:11, Fujii Masao wrote:
>> On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>> Agreed. I'll change the patch.
>> Done. I attached the updated patch.
>
> I don't much like the API for this. Walsender shouldn't need to know about
> the details of the FE/BE protocol; pq_putbytes_if_available() seems too
> low level to be useful. I think a better API would be to have a
> non-blocking version of pq_putmessage(). We can make the output buffer in
> pqcomm.c resizeable, so that when the message doesn't fit in the output
> buffer in pq_putmessage(), the buffer is enlarged instead of trying to
> flush it. Attached is a patch using that approach. This is a much smaller
> patch, and easier to understand.
>
> I'm not totally happy with the walsender main loop; it seems to work as it
> is, but the logic has become quite complicated. Ideas welcome on how to
> simplify that.

Heikki, are you planning to commit this, either with or without further
revisions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Replication server timeout patch
On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> I don't much like the API for this. Walsender shouldn't need to know about
> the details of the FE/BE protocol; pq_putbytes_if_available() seems too
> low level to be useful. I think a better API would be to have a
> non-blocking version of pq_putmessage(). We can make the output buffer in
> pqcomm.c resizeable, so that when the message doesn't fit in the output
> buffer in pq_putmessage(), the buffer is enlarged instead of trying to
> flush it. Attached is a patch using that approach. This is a much smaller
> patch, and easier to understand.

Agreed. Thanks for improving the patch.

pq_flush_if_writable() calls internal_flush() without using a PG_TRY block.
This seems unsafe because, for example, pgwin32_waitforsinglesocket() called
by secure_write() can throw an ERROR.

> I'm not totally happy with the walsender main loop; it seems to work as it
> is, but the logic has become quite complicated. Ideas welcome on how to
> simplify that.

As the patch I proposed before did, how about letting XLogSend() rather than
WalSndLoop() call pq_flush_if_writable() when there is pending data in the
output buffer?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Replication server timeout patch
On 16.03.2011 11:11, Fujii Masao wrote:
> On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> Agreed. I'll change the patch.
> Done. I attached the updated patch.

I don't much like the API for this. Walsender shouldn't need to know about
the details of the FE/BE protocol; pq_putbytes_if_available() seems too low
level to be useful. I think a better API would be to have a non-blocking
version of pq_putmessage(). We can make the output buffer in pqcomm.c
resizeable, so that when the message doesn't fit in the output buffer in
pq_putmessage(), the buffer is enlarged instead of trying to flush it.
Attached is a patch using that approach. This is a much smaller patch, and
easier to understand.

I'm not totally happy with the walsender main loop; it seems to work as it
is, but the logic has become quite complicated. Ideas welcome on how to
simplify that.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0ebee6..3192ef7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2019,6 +2019,28 @@ SET ENABLE_SEQSCAN TO OFF;
        </para>
       </listitem>
      </varlistentry>
+
+     <varlistentry id="guc-replication-timeout" xreflabel="replication_timeout">
+      <term><varname>replication_timeout</varname> (<type>integer</type>)</term>
+      <indexterm>
+       <primary><varname>replication_timeout</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the maximum time, in milliseconds, to wait for the reply
+        from the standby before terminating replication.  This is useful for
+        the primary server to detect the standby crash or network outage.
+        A value of zero turns this off.  This parameter can only be set in
+        the <filename>postgresql.conf</> file or on the server command line.
+        The default value is 60 seconds.
+       </para>
+       <para>
+        To make the timeout work properly, <xref linkend="guc-wal-receiver-status-interval">
+        must be enabled on the standby, and its value must be less than the
+        value of <varname>replication_timeout</>.
+       </para>
+      </listitem>
+     </varlistentry>
     </variablelist>
    </sect2>
@@ -2216,6 +2238,11 @@ SET ENABLE_SEQSCAN TO OFF;
        the <filename>postgresql.conf</> file or on the server command line.
        The default value is 10 seconds.
       </para>
+      <para>
+       When <xref linkend="guc-replication-timeout"> is enabled on the primary,
+       <varname>wal_receiver_status_interval</> must be enabled, and its value
+       must be less than the value of <varname>replication_timeout</>.
+      </para>
      </listitem>
     </varlistentry>
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 3c7b05b..b6dc8cc 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -56,9 +56,11 @@
  *		pq_putbytes		- send bytes to connection (not flushed until pq_flush)
  *		pq_flush		- flush pending output
  *		pq_getbyte_if_available - get a byte if available without blocking
+ *		pq_flush_if_writable - flush pending output if writable without blocking
  *
  * message-level I/O (and old-style-COPY-OUT cruft):
  *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
+ *		pq_putmessage_noblock - buffer a normal message without blocking (suppressed in COPY OUT mode)
  *		pq_startcopyout - inform libpq that a COPY OUT transfer is beginning
  *		pq_endcopyout	- end a COPY OUT transfer
  *
@@ -92,6 +94,7 @@
 #include "miscadmin.h"
 #include "storage/ipc.h"
 #include "utils/guc.h"
+#include "utils/memutils.h"
 
 /*
  * Configuration options
@@ -108,12 +111,15 @@ static char sock_path[MAXPGPATH];
  * Buffers for low-level I/O
  */
-#define PQ_BUFFER_SIZE 8192
+#define PQ_SEND_BUFFER_SIZE 8192
+#define PQ_RECV_BUFFER_SIZE 8192
 
-static char PqSendBuffer[PQ_BUFFER_SIZE];
+static char *PqSendBuffer;
+static int	PqSendBufferSize;
 static int	PqSendPointer;		/* Next index to store a byte in PqSendBuffer */
+static int	PqSendStart;		/* Next index to send a byte in PqSendBuffer */
 
-static char PqRecvBuffer[PQ_BUFFER_SIZE];
+static char PqRecvBuffer[PQ_RECV_BUFFER_SIZE];
 static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
 static int	PqRecvLength;		/* End of data available in PqRecvBuffer */
@@ -142,7 +148,9 @@ static int	Setup_AF_UNIX(void);
 void
 pq_init(void)
 {
-	PqSendPointer = PqRecvPointer = PqRecvLength = 0;
+	PqSendBufferSize = PQ_SEND_BUFFER_SIZE;
+	PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize);
+	PqSendPointer = PqSendStart = PqRecvPointer = PqRecvLength = 0;
 	PqCommBusy = false;
 	DoingCopyOut = false;
 	on_proc_exit(pq_close, 0);
@@ -762,7 +770,7 @@ pq_recvbuf(void)
 		int			r;
 
 		r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
-						PQ_BUFFER_SIZE - PqRecvLength);
+						PQ_RECV_BUFFER_SIZE - PqRecvLength);
 
 		if (r < 0)
 		{
@@ -1138,10 +1146,10 @@
Re: [HACKERS] Replication server timeout patch
On Sat, Mar 12, 2011 at 4:34 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>> I think we should consider making this change for 9.1. This is a real
>>> wart, and it's going to become even more of a problem with sync rep, I
>>> think.
>> Yeah, that's welcome! Please feel free to review the patch.
>
> I discussed this with Heikki on IM. I think we should rip all the
> GUC-change stuff out of this patch and just decree that if you set a
> timeout, you get a timeout. If you set this inconsistently with
> wal_receiver_status_interval, then you'll get lots of disconnects; but
> that's your problem. This may seem a little unfriendly, but the logic in
> here is quite complex and still isn't going to really provide that much
> protection against bad configurations. The only realistic alternative I
> see is to define replication_timeout as a multiple of the client's
> wal_receiver_status_interval, but that seems quite annoyingly unfriendly.
>
> A single replication_timeout that applies to all slaves doesn't cover
> every configuration someone might want, but it's simple and easy to
> understand and should cover 95% of cases. If we find that it's really
> necessary to be able to customize it further, then we might go the route
> of adding the much-discussed standby registration stuff, where there's a
> separate config file or system table where you can stipulate that when a
> walsender with application_name=foo connects, you want it to get
> wal_receiver_status_interval=$FOO. But I think that complexity can
> certainly wait until 9.2 or later.
>
> I also think that the default for replication_timeout should not be 0.
> Something like 60s seems about right. That way, if you just use the
> default settings, you'll get pretty sane behavior - a connectivity hiccup
> that lasts more than a minute will bounce the client. We've already gotten
> reports of people who thought they were replicating when they really
> weren't, and had to fiddle with settings and struggle to try to make it
> robust. This should make things a lot nicer for people out of the box, but
> it won't if it's disabled out of the box.
>
> On another note, there doesn't appear to be any need to change the return
> value of WaitLatchOrSocket().

Agreed. I'll change the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
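The timeout rule Robert proposes above — disconnect the standby once no reply has arrived for replication_timeout milliseconds, with zero disabling the mechanism entirely — reduces to a one-line predicate. A sketch with plain millisecond counters (the server actually compares TimestampTz values inside the walsender loop; the function name is mine):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Has the standby gone silent for too long?  replication_timeout_ms is the
 * GUC value in milliseconds; 0 means the timeout mechanism is disabled.
 * last_reply_ms is when we last heard a status message from the standby.
 */
static bool
standby_timed_out(int64_t last_reply_ms, int64_t now_ms,
				  int replication_timeout_ms)
{
	if (replication_timeout_ms == 0)
		return false;			/* zero turns the mechanism off */
	return now_ms - last_reply_ms >= replication_timeout_ms;
}
```

With Robert's suggested default of 60s, a standby that stops replying is bounced after a minute, while a value of zero reproduces the old wait-forever behavior.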
Re: [HACKERS] Replication server timeout patch
On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> Agreed. I'll change the patch.

Done. I attached the updated patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

replication_timeout_v6.patch
Description: Binary data
Re: [HACKERS] Replication server timeout patch
On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>>> Why does internal_flush_if_writable compute bufptr differently from
>>>> internal_flush? And shouldn't it be static? It seems to me that this
>>>> ought to be refactored so that you don't duplicate so much code. Maybe
>>>> static int internal_flush(bool nonblocking). I don't think that the
>>>> while (bufptr < bufend) loop needs to contain the code to set and clear
>>>> the nonblocking state. You could do the whole loop with nonblocking
>>>> mode turned on and then reenable it just once at the end. Besides
>>>> possibly being clearer, that would be more efficient and leave less
>>>> room for unexpected failures.
>>> All these comments seem to make sense. Will fix. Thanks!
>> Done. I attached the updated patch.
> I rebased the patch against current git master.

I added this replication timeout patch to the next CF. Let me explain why
this feature is required, for future review:

Without this feature, walsender might unexpectedly remain alive for a while
when the standby crashes or a network outage happens. TCP keepalive can
improve this situation to a certain extent, but it's not perfect.

A remaining walsender can cause some problems. For example, when
hot_standby_feedback is enabled, such a remaining walsender would prevent
the oldest xmin from advancing and interfere with vacuuming on the master.
Or, when you use synchronous replication and the walsender in SYNC mode gets
stuck, no synchronous standby candidate can switch to SYNC mode until that
walsender exits, and all the transactions would pause.

This feature causes walsender to exit when there is no reply from the
standby before the replication timeout expires. Then we can avoid the above
problems.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Replication server timeout patch
On Fri, Mar 11, 2011 at 8:14 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>> On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>>>> Why does internal_flush_if_writable compute bufptr differently from
>>>>> internal_flush? And shouldn't it be static? It seems to me that this
>>>>> ought to be refactored so that you don't duplicate so much code. Maybe
>>>>> static int internal_flush(bool nonblocking). I don't think that the
>>>>> while (bufptr < bufend) loop needs to contain the code to set and
>>>>> clear the nonblocking state. You could do the whole loop with
>>>>> nonblocking mode turned on and then reenable it just once at the end.
>>>>> Besides possibly being clearer, that would be more efficient and leave
>>>>> less room for unexpected failures.
>>>> All these comments seem to make sense. Will fix. Thanks!
>>> Done. I attached the updated patch.
>> I rebased the patch against current git master.
>
> I added this replication timeout patch to the next CF. Let me explain why
> this feature is required, for future review:
>
> Without this feature, walsender might unexpectedly remain alive for a
> while when the standby crashes or a network outage happens. TCP keepalive
> can improve this situation to a certain extent, but it's not perfect.
>
> A remaining walsender can cause some problems. For example, when
> hot_standby_feedback is enabled, such a remaining walsender would prevent
> the oldest xmin from advancing and interfere with vacuuming on the master.
> Or, when you use synchronous replication and the walsender in SYNC mode
> gets stuck, no synchronous standby candidate can switch to SYNC mode until
> that walsender exits, and all the transactions would pause.
>
> This feature causes walsender to exit when there is no reply from the
> standby before the replication timeout expires. Then we can avoid the
> above problems.

I think we should consider making this change for 9.1. This is a real wart,
and it's going to become even more of a problem with sync rep, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Replication server timeout patch
On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> I added this replication timeout patch to the next CF. Let me explain why
>> this feature is required, for future review:
>>
>> Without this feature, walsender might unexpectedly remain alive for a
>> while when the standby crashes or a network outage happens. TCP keepalive
>> can improve this situation to a certain extent, but it's not perfect.
>>
>> A remaining walsender can cause some problems. For example, when
>> hot_standby_feedback is enabled, such a remaining walsender would prevent
>> the oldest xmin from advancing and interfere with vacuuming on the
>> master. Or, when you use synchronous replication and the walsender in
>> SYNC mode gets stuck, no synchronous standby candidate can switch to SYNC
>> mode until that walsender exits, and all the transactions would pause.
>>
>> This feature causes walsender to exit when there is no reply from the
>> standby before the replication timeout expires. Then we can avoid the
>> above problems.
> I think we should consider making this change for 9.1. This is a real
> wart, and it's going to become even more of a problem with sync rep, I
> think.

Yeah, that's welcome! Please feel free to review the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Replication server timeout patch
Fujii Masao wrote: On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas robertmh...@gmail.com wrote: I added this replication timeout patch into next CF. I explain why this feature is required for the future review; Without this feature, walsender might unexpectedly remain for a while when the standby crashes or the network outage happens. TCP keepalive can improve this situation to a certain extent, but it's not perfect. Remaining walsender can cause some problems. For example, when hot_standby_feedback is enabled, such a remaining walsender would prevent oldest xmin from advancing and interfere with vacuuming on the master. For example, when you use synchronous replication and walsender in SYNC mode gets stuck, any synchronous standby candidate cannot switch to SYNC mode until that walsender exits, and all the transactions would pause. This feature causes walsender to exit when there is no reply from the standby before the replication timeout expires. Then we can avoid the above problems. I think we should consider making this change for 9.1. This is a real wart, and it's going to become even more of a problem with sync rep, I think. Yeah, that's a welcome! Please feel free to review the patch. It is already in the next commitfest, so if someone wants to add it as an open 9.1 item, go ahead. I am unclear of this so I am not adding it. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Replication server timeout patch
On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao masao.fu...@gmail.com wrote: I think we should consider making this change for 9.1. This is a real wart, and it's going to become even more of a problem with sync rep, I think. Yeah, that's a welcome! Please feel free to review the patch. I discussed this with Heikki on IM. I think we should rip all the GUC change stuff out of this patch and just decree that if you set a timeout, you get a timeout. If you set this inconsistently with wal_receiver_status_interval, then you'll get lots of disconnects. But that's your problem. This may seem a little unfriendly, but the logic in here is quite complex and still isn't going to really provide that much protection against bad configurations. The only realistic alternative I see is to define replication_timeout as a multiple of the client's wal_receiver_status_interval, but that seems quite annoyingly unfriendly. A single replication_timeout that applies to all slaves doesn't cover every configuration someone might want, but it's simple and easy to understand and should cover 95% of cases. If we find that it's really necessary to be able to customize it further, then we might go the route of adding the much-discussed standby registration stuff, where there's a separate config file or system table where you can stipulate that when a walsender with application_name=foo connects, you want it to get wal_receiver_status_interval=$FOO. But I think that complexity can certainly wait until 9.2 or later. I also think that the default for replication_timeout should not be 0. Something like 60s seems about right. That way, if you just use the default settings, you'll get pretty sane behavior - a connectivity hiccup that lasts more than a minute will bounce the client. We've already gotten reports of people who thought they were replicating when they really weren't, and had to fiddle with settings and struggle to try to make it robust. 
This should make things a lot nicer for people out of the box, but it won't if it's disabled out of the box. On another note, there doesn't appear to be any need to change the return value of WaitLatchOrSocket(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
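The configuration relationship Robert describes — one timeout on the primary, constrained to be larger than the standby's status interval — looks roughly like this. This is an illustrative sketch only: the parameter names are the ones used in this thread, the values are the defaults Robert proposes, and the exact syntax and units accepted may differ between server versions.

```ini
# postgresql.conf on the primary (illustrative values)
replication_timeout = 60s            # disconnect a standby that has sent no
                                     # reply for this long; 0 disables it

# postgresql.conf on the standby (illustrative values)
wal_receiver_status_interval = 10s   # send status replies at least this
                                     # often; must be enabled and well below
                                     # the primary's replication_timeout,
                                     # or you get spurious disconnects
```

With the "if you set a timeout, you get a timeout" rule, the server no longer tries to validate this pairing; misconfiguring the two simply produces frequent disconnects on that connection.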
Re: [HACKERS] Replication server timeout patch
On Sun, Mar 6, 2011 at 3:23 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote: There are two things that I think are pretty clear. If the receiver has wal_receiver_status_interval=0, then we should ignore replication_timeout for that connection. The patch still doesn't check that wal_receiver_status_interval is set up properly. I'll implement that later. Done. I attached the updated patch. Why does internal_flush_if_writable compute bufptr differently from internal_flush? And shouldn't it be static? It seems to me that this ought to be refactored so that you don't duplicate so much code. Maybe static int internal_flush(bool nonblocking). I don't think that the while (bufptr < bufend) loop needs to contain the code to set and clear the nonblocking state. You could do the whole loop with nonblocking mode turned on and then reenable it just once at the end. Besides possibly being clearer, that would be more efficient and leave less room for unexpected failures. All these comments seem to make sense. Will fix. Thanks! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Replication server timeout patch
On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote: There are two things that I think are pretty clear. If the receiver has wal_receiver_status_interval=0, then we should ignore replication_timeout for that connection. The patch still doesn't check that wal_receiver_status_interval is set up properly. I'll implement that later. Done. I attached the updated patch. Why does internal_flush_if_writable compute bufptr differently from internal_flush? And shouldn't it be static? It seems to me that this ought to be refactored so that you don't duplicate so much code. Maybe static int internal_flush(bool nonblocking). I don't think that the while (bufptr < bufend) loop needs to contain the code to set and clear the nonblocking state. You could do the whole loop with nonblocking mode turned on and then reenable it just once at the end. Besides possibly being clearer, that would be more efficient and leave less room for unexpected failures. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Replication server timeout patch
On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote: There are two things that I think are pretty clear. If the receiver has wal_receiver_status_interval=0, then we should ignore replication_timeout for that connection. The patch still doesn't check that wal_receiver_status_interval is set up properly. I'll implement that later. Done. I attached the updated patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center replication_timeout_v3.patch Description: Binary data
Re: [HACKERS] Replication server timeout patch
On Fri, Feb 18, 2011 at 12:10 PM, Robert Haas robertmh...@gmail.com wrote: IMHO, that's so broken as to be useless. I would really like to have a solution to this problem, though. Relying on TCP keepalives is weak. Agreed. I updated the replication timeout patch which I submitted before. http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com Since the patch implements also non-blocking send functions, the timeout can work properly even when the send buffer has been filled up. There are two things that I think are pretty clear. If the receiver has wal_receiver_status_interval=0, then we should ignore replication_timeout for that connection. The patch still doesn't check that wal_receiver_status_interval is set up properly. I'll implement that later. Regards, Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 2015,2020 SET ENABLE_SEQSCAN TO OFF; --- 2015,2042 /para /listitem /varlistentry + + varlistentry id=guc-replication-timeout xreflabel=replication_timeout + termvarnamereplication_timeout/varname (typeinteger/type)/term + indexterm +primaryvarnamereplication_timeout/ configuration parameter/primary + /indexterm + listitem +para + Specifies the maximum time, in milliseconds, to wait for the reply + from the standby before terminating replication. This is useful for + the primary server to detect the standby crash or network outage. + A value of zero (the default) turns this off. This parameter can + only be set in the filenamepostgresql.conf/ file or on the server + command line. +/para +para + To make the timeout work properly, xref linkend=guc-wal-receiver-status-interval + must be enabled on the standby, and its value must be less than the + value of varnamereplication_timeout/. 
+/para + /listitem + /varlistentry /variablelist /sect2 *** *** 2125,2130 SET ENABLE_SEQSCAN TO OFF; --- 2147,2157 the filenamepostgresql.conf/ file or on the server command line. The default value is 10 seconds. /para + para +When xref linkend=guc-replication-timeout is enabled on the primary, +varnamewal_receiver_status_interval/ must be enabled, and its value +must be less than the value of varnamereplication_timeout/. + /para /listitem /varlistentry *** a/src/backend/libpq/pqcomm.c --- b/src/backend/libpq/pqcomm.c *** *** 56,61 --- 56,63 * pq_putbytes - send bytes to connection (not flushed until pq_flush) * pq_flush - flush pending output * pq_getbyte_if_available - get a byte if available without blocking + * pq_putbytes_if_writable - send bytes to connection if writable without blocking + * pq_flush_if_writable - flush pending output if writable without blocking * * message-level I/O (and old-style-COPY-OUT cruft): * pq_putmessage - send a normal message (suppressed in COPY OUT mode) *** *** 112,117 static char sock_path[MAXPGPATH]; --- 114,120 static char PqSendBuffer[PQ_BUFFER_SIZE]; static int PqSendPointer; /* Next index to store a byte in PqSendBuffer */ + static int PqSendStart; /* Next index to send a byte in PqSendBuffer */ static char PqRecvBuffer[PQ_BUFFER_SIZE]; static int PqRecvPointer; /* Next index to read a byte from PqRecvBuffer */ *** *** 128,133 static bool DoingCopyOut; --- 131,137 static void pq_close(int code, Datum arg); static int internal_putbytes(const char *s, size_t len); static int internal_flush(void); + static int internal_flush_if_writable(void); #ifdef HAVE_UNIX_SOCKETS static int Lock_AF_UNIX(unsigned short portNumber, char *unixSocketName); *** *** 1153,1158 internal_putbytes(const char *s, size_t len) --- 1157,1212 } /* + * pq_putbytes_if_writable - send bytes to connection (not flushed + * until pq_flush), if writable + * + * Returns the number of bytes written without blocking, or EOF if trouble. 
+ * + */ + int + pq_putbytes_if_writable(const char *s, size_t len) + { + size_t amount; + size_t nwritten = 0; + + /* Should not be called by old-style COPY OUT */ + Assert(!DoingCopyOut); + /* No-op if reentrant call */ + if (PqCommBusy) + return 0; + PqCommBusy = true; + + while (len > 0) + { + /* If buffer is full, then flush it out */ + if (PqSendPointer >= PQ_BUFFER_SIZE) + { + int r; + + r = internal_flush_if_writable(); + if (r == 0) + break; + if (r == EOF) + { + PqCommBusy = false; + return r; + } + } +
Re: [HACKERS] Replication server timeout patch
On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote: On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the walsender gets blocked in sending WAL. At first, we would need to implement a non-blocking write function as an infrastructure of the replication timeout, I think. http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com Interesting point...if that's accepted as required-for-commit, what are the perceptions of the odds that, presuming I can write the code quickly enough, that there's enough infrastructure/ports already in postgres to allow for a non-blocking write on all our supported platforms? I'm not sure if there's already enough infrastructure for a non-blocking write. But the patch which I submitted before might help to implement that. http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say let's commit. Not committing gives us nothing at all, which is as much use as a chocolate teapot. I will be looking to commit this tomorrow morning, unless I hear some clear No comments, with reasons. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Thu, Feb 17, 2011 at 4:21 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote: On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the walsender gets blocked in sending WAL. At first, we would need to implement a non-blocking write function as an infrastructure of the replication timeout, I think. http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com Interesting point...if that's accepted as required-for-commit, what are the perceptions of the odds that, presuming I can write the code quickly enough, that there's enough infrastructure/ports already in postgres to allow for a non-blocking write on all our supported platforms? I'm not sure if there's already enough infrastructure for a non-blocking write. But the patch which I submitted before might help to implement that. http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say let's commit. Not committing gives us nothing at all, which is as much use as a chocolate teapot. I will be looking to commit this tomorrow morning, unless I hear some clear No comments, with reasons. I guess the question is whether it works in 10% of cases or 95% of cases. In the first case there's probably no point in pretending we have a feature if it doesn't really work. In the second case, it might make sense. But I don't have a good feeling for which it is. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Replication server timeout patch
So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say let's commit. Not committing gives us nothing at all, which is as much use as a chocolate teapot. Can someone summarize the cases where it does and doesn't work? There's been a longish gap in this thread. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Replication server timeout patch
On Thu, 2011-02-17 at 16:42 -0500, Robert Haas wrote: So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say let's commit. Not committing gives us nothing at all, which is as much use as a chocolate teapot. I will be looking to commit this tomorrow morning, unless I hear some clear No comments, with reasons. I guess the question is whether it works in 10% of cases or 95% of cases. In the first case there's probably no point in pretending we have a feature if it doesn't really work. In the second case, it might make sense. But I don't have a good feeling for which it is. Well, I guess the people that wanted to wait forever may get their wish. For sync rep, I intend to put in place a client timeout, which we do have code for. The server side timeout still makes sense, but it's not a requirement for sync rep. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus j...@agliodbs.com wrote: So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say let's commit. Not committing gives us nothing at all, which is as much use as a chocolate teapot. Can someone summarize the cases where it does and doesn't work? There's been a longish gap in this thread. The timeout doesn't work when walsender gets blocked during sending the WAL because the send buffer has been filled up, I'm afraid. IOW, it doesn't work when the standby becomes unresponsive while WAL is generated on the master one after another. Since walsender tries to continue sending the WAL while the standby is unresponsive, the send buffer gets filled up and the blocking send function (e.g., pq_flush) blocks the walsender. OTOH, if the standby becomes unresponsive when there is no workload which causes WAL, the timeout would work. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
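The failure mode Fujii describes can be reproduced outside PostgreSQL with a plain POSIX socket pair: once the peer stops reading, the kernel send buffer fills, and a blocking send() would hang the sender indefinitely — exactly the state a stuck walsender ends up in. With O_NONBLOCK the call instead fails with EAGAIN, so the sender regains control and can keep checking its timeout. This is an illustrative sketch (the function name is mine, not anything from pqcomm.c):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write fixed chunks into 'sock' until the kernel buffer is full.
 * Returns the number of successful writes, or -1 on a hard error.
 * A blocking socket would never return from the final send(); a
 * non-blocking one fails with EAGAIN/EWOULDBLOCK instead. */
int fill_until_eagain(int sock)
{
    char chunk[4096];
    int  writes = 0;

    memset(chunk, 'x', sizeof(chunk));
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);

    for (;;)
    {
        ssize_t n = send(sock, chunk, sizeof(chunk), 0);

        if (n < 0)
        {
            /* Buffer full: this is where a blocking sender stalls. */
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? writes : -1;
        }
        writes++;
    }
}
```

This is the infrastructure gap the later non-blocking pq_flush_if_writable work in this thread addresses: the walsender must hit this EAGAIN path, rather than block, to ever notice that the timeout has expired.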
Re: [HACKERS] Replication server timeout patch
On Thu, Feb 17, 2011 at 9:10 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus j...@agliodbs.com wrote: So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say let's commit. Not committing gives us nothing at all, which is as much use as a chocolate teapot. Can someone summarize the cases where it does and doesn't work? There's been a longish gap in this thread. The timeout doesn't work when walsender gets blocked during sending the WAL because the send buffer has been filled up, I'm afraid. IOW, it doesn't work when the standby becomes unresponsive while WAL is generated on the master one after another. Since walsender tries to continue sending the WAL while the standby is unresponsive, the send buffer gets filled up and the blocking send function (e.g., pq_flush) blocks the walsender. OTOH, if the standby becomes unresponsive when there is no workload which causes WAL, the timeout would work. IMHO, that's so broken as to be useless. I would really like to have a solution to this problem, though. Relying on TCP keepalives is weak. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Mon, Feb 14, 2011 at 5:13 PM, Daniel Farina dan...@heroku.com wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the walsender gets blocked in sending WAL. At first, we would need to implement a non-blocking write function as an infrastructure of the replication timeout, I think. http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com Interesting point...if that's accepted as required-for-commit, what are the perceptions of the odds that, presuming I can write the code quickly enough, that there's enough infrastructure/ports already in postgres to allow for a non-blocking write on all our supported platforms? Are you working on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the walsender gets blocked in sending WAL. At first, we would need to implement a non-blocking write function as an infrastructure of the replication timeout, I think. http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com Interesting point...if that's accepted as required-for-commit, what are the perceptions of the odds that, presuming I can write the code quickly enough, that there's enough infrastructure/ports already in postgres to allow for a non-blocking write on all our supported platforms? I'm not sure if there's already enough infrastructure for a non-blocking write. But the patch which I submitted before might help to implement that. http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the walsender gets blocked in sending WAL. At first, we would need to implement a non-blocking write function as an infrastructure of the replication timeout, I think. http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Replication server timeout patch
On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the walsender gets blocked in sending WAL. At first, we would need to implement a non-blocking write function as an infrastructure of the replication timeout, I think. http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com Interesting point...if that's accepted as required-for-commit, what are the perceptions of the odds that, presuming I can write the code quickly enough, that there's enough infrastructure/ports already in postgres to allow for a non-blocking write on all our supported platforms? -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Mon, 2011-02-14 at 14:13 -0800, Daniel Farina wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the walsender gets blocked in sending WAL. At first, we would need to implement a non-blocking write function as an infrastructure of the replication timeout, I think. http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com I wasn't aware that had been raised before. Thanks for noting it again. I guess that's why you thought wait forever was a good idea ;-) Interesting point...if that's accepted as required-for-commit, what are the perceptions of the odds that, presuming I can write the code quickly enough, that there's enough infrastructure/ports already in postgres to allow for a non-blocking write on all our supported platforms? I'd like to see what you come up with. I would rate that as important, though not essential for sync replication. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Replication server timeout patch
Hello list, I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details. -- fdr *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 2121,2126 SET ENABLE_SEQSCAN TO OFF; --- 2121,2140 /listitem /varlistentry + varlistentry id=guc-replication-timeout-server xreflabel=replication_timeout_server + termvarnamereplication_timeout_server/varname (typeinteger/type)/term + indexterm +primaryvarnamereplication_timeout_server/ configuration parameter/primary + /indexterm + listitem +para + If the primary server does not receive a reply from a standby server + within varnamereplication_timeout_server/ seconds then the + primary will terminate the replication connection. +/para + /listitem + /varlistentry + /variablelist /sect2 /sect1 *** a/src/backend/replication/walsender.c --- b/src/backend/replication/walsender.c *** *** 73,78 bool am_walsender = false; /* Am I a walsender process ? 
*/ --- 73,80 /* User-settable parameters for walsender */ int max_wal_senders = 0; /* the maximum number of concurrent walsenders */ int WalSndDelay = 200; /* max sleep time between some actions */ + int replication_timeout_server; /* If the receiver takes too long, time + * out and die after this duration */ /* * These variables are used similarly to openLogFile/Id/Seg/Off, *** *** 89,94 static uint32 sendOff = 0; --- 91,99 */ static XLogRecPtr sentPtr = {0, 0}; + /* Remembers the last time the standby has notified the primary of progress */ + static TimestampTz last_reply_timestamp; + /* Flags set by signal handlers for later service in main loop */ static volatile sig_atomic_t got_SIGHUP = false; volatile sig_atomic_t walsender_shutdown_requested = false; *** *** 250,255 WalSndHandshake(void) --- 255,265 errmsg(invalid standby handshake message type %d, firstchar))); } } + + /* + * Initialize our timeout checking mechanism. + */ + last_reply_timestamp = GetCurrentTimestamp(); } /* *** *** 616,632 WalSndLoop(void) if (!XLogSend(output_message, caughtup)) break; ! if (caughtup !got_SIGHUP !walsender_ready_to_stop !walsender_shutdown_requested) { ! /* ! * XXX: We don't really need the periodic wakeups anymore, ! * WaitLatchOrSocket should reliably wake up as soon as ! * something interesting happens. ! */ /* Sleep */ ! WaitLatchOrSocket(MyWalSnd-latch, MyProcPort-sock, ! WalSndDelay * 1000L); } } else --- 626,650 if (!XLogSend(output_message, caughtup)) break; ! if (caughtup !got_SIGHUP !walsender_ready_to_stop ! !walsender_shutdown_requested) { ! long timeout; ! ! if (replication_timeout_server == -1) ! timeout = -1L; ! else ! timeout = 100L * replication_timeout_server; /* Sleep */ ! if (WaitLatchOrSocket(MyWalSnd-latch, MyProcPort-sock, ! timeout) == 0) ! { ! ereport(LOG, ! (errmsg(streaming replication timeout after %d s, ! replication_timeout_server))); ! break; ! 
} } } else *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *** *** 1484,1489 static struct config_int ConfigureNamesInt[] = --- 1484,1499 }, { + {replication_timeout_server, PGC_SIGHUP, WAL_SETTINGS, + gettext_noop(Replication connection will timeout after this duration.), + NULL, + GUC_UNIT_S + }, + replication_timeout_server, + 30, -1, INT_MAX, NULL, NULL + }, + + { {temp_buffers, PGC_USERSET, RESOURCES_MEM, gettext_noop(Sets the maximum number of temporary buffers used by each session.), NULL, *** a/src/backend/utils/misc/postgresql.conf.sample --- b/src/backend/utils/misc/postgresql.conf.sample *** *** 203,208 --- 203,209 # when reading streaming WAL; # -1 allows indefinite delay #wal_receiver_status_interval = 10s # replies at least this often, 0 disables + #replication_timeout_server = 120 # -1 means wait forever #-- *** a/src/include/replication/walsender.h --- b/src/include/replication/walsender.h *** *** 70,75 extern volatile sig_atomic_t walsender_ready_to_stop; --- 70,76 /* user-settable parameters */ extern int WalSndDelay; extern int max_wal_senders; + extern int replication_timeout_server; extern int WalSenderMain(void); extern void WalSndSignals(void); -- Sent via pgsql-hackers mailing
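The sleep computation in the patch above replaces the existing WalSndDelay * 1000L conversion (milliseconds to microseconds), which suggests WaitLatchOrSocket's timeout is in microseconds; if so, the patch's 100L * replication_timeout_server factor looks off by four orders of magnitude for a GUC measured in seconds. A hypothetical helper making the intended conversion explicit (the function is mine, not part of the posted patch, and the unit reading is an inference from the surrounding code):

```c
#include <assert.h>

/* Hypothetical helper: convert the replication_timeout_server GUC
 * (seconds; -1 = wait forever per the posted docs) into the microsecond
 * timeout that WaitLatchOrSocket appears to expect, judging from the
 * WalSndDelay * 1000L (ms -> us) conversion it replaces. */
static long
timeout_to_usecs(int timeout_s)
{
    if (timeout_s == -1)
        return -1L;                     /* wait in the latch forever */
    return 1000000L * (long) timeout_s; /* seconds -> microseconds */
}
```

With a helper like this, the WalSndLoop hunk reduces to passing timeout_to_usecs(replication_timeout_server) to WaitLatchOrSocket and treating a 0 return (timeout, per the patch's check) as grounds to log and exit.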
Re: [HACKERS] Replication server timeout patch
On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina drfar...@acm.org wrote: I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details. This looks like a useful and separately committable change. However, it looks to me like this renders wal_sender_delay aka WalSndDelay completely unused. If we don't need that GUC any more, we should rip it out completely. The comment in WalSndHandshake should have a tab at the beginning of every line. Right now the first line has a tab and the rest have spaces. The first hunk in WalSndLoop is a meaningless whitespace change. I wonder if we ought to just call this replication_timeout, rather than replication_timeout_server. Simon's patch (from which this was extracted) also has replication_timeout_client, but the two aren't symmetrical. The replication_timeout_client in this patch is the amount of time after which the master acknowledges the commit even though the synchronous standby hasn't acked yet. So it only applies to the synchronous replication case, whereas this is useful for both synchronous and asynchronous replication. I'm inclined to think that knob is utterly useless anyway; surely waiting more than zero will reduce the throughput of the system to some minute fraction of its normal value, while waiting less than infinity throws out the data guarantee that made you pick synchronous replication in the first place. Even if we do decide to keep that knob, I don't think we'll want the value to be symmetric with this one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Replication server timeout patch
Hello list,

I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details.

-- 
fdr

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 2121,2126 **** SET ENABLE_SEQSCAN TO OFF;
--- 2121,2140 ----
       </listitem>
      </varlistentry>
  
+     <varlistentry id="guc-replication-timeout-server" xreflabel="replication_timeout_server">
+      <term><varname>replication_timeout_server</varname> (<type>integer</type>)</term>
+      <indexterm>
+       <primary><varname>replication_timeout_server</varname> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        If the primary server does not receive a reply from a standby server
+        within <varname>replication_timeout_server</varname> seconds then the
+        primary will terminate the replication connection.
+       </para>
+      </listitem>
+     </varlistentry>
+ 
      </variablelist>
     </sect2>
    </sect1>
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 73,78 **** bool		am_walsender = false;	/* Am I a walsender process ? */
--- 73,80 ----
  
  /* User-settable parameters for walsender */
  int			max_wal_senders = 0;	/* the maximum number of concurrent walsenders */
  int			WalSndDelay = 200;		/* max sleep time between some actions */
+ int			replication_timeout_server;	/* If the receiver takes too long, time
+ 										 * out and die after this duration */
  
  /*
   * These variables are used similarly to openLogFile/Id/Seg/Off,
***************
*** 89,94 **** static uint32 sendOff = 0;
--- 91,99 ----
   */
  static XLogRecPtr sentPtr = {0, 0};
  
+ /* Remembers the last time the standby has notified the primary of progress */
+ static TimestampTz last_reply_timestamp;
+ 
  /* Flags set by signal handlers for later service in main loop */
  static volatile sig_atomic_t got_SIGHUP = false;
  volatile sig_atomic_t walsender_shutdown_requested = false;
***************
*** 250,255 **** WalSndHandshake(void)
--- 255,265 ----
  					 errmsg("invalid standby handshake message type %d", firstchar)));
  		}
  	}
+ 
+ 	/*
+ 	 * Initialize our timeout checking mechanism.
+ 	 */
+ 	last_reply_timestamp = GetCurrentTimestamp();
  }
  
  /*
***************
*** 616,632 **** WalSndLoop(void)
  		if (!XLogSend(output_message, &caughtup))
  			break;
  
! 		if (caughtup && !got_SIGHUP && !walsender_ready_to_stop && !walsender_shutdown_requested)
  		{
! 			/*
! 			 * XXX: We don't really need the periodic wakeups anymore,
! 			 * WaitLatchOrSocket should reliably wake up as soon as
! 			 * something interesting happens.
! 			 */
  
  			/* Sleep */
! 			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
! 							  WalSndDelay * 1000L);
  		}
  	}
  	else
--- 626,650 ----
  		if (!XLogSend(output_message, &caughtup))
  			break;
  
! 		if (caughtup && !got_SIGHUP && !walsender_ready_to_stop &&
! 			!walsender_shutdown_requested)
  		{
! 			long		timeout;
! 
! 			if (replication_timeout_server == -1)
! 				timeout = -1L;
! 			else
! 				timeout = 1000000L * replication_timeout_server;
  
  			/* Sleep */
! 			if (WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
! 								  timeout) == 0)
! 			{
! 				ereport(LOG,
! 						(errmsg("streaming replication timeout after %d s",
! 								replication_timeout_server)));
! 				break;
! 			}
  		}
  	}
  	else
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 1484,1489 **** static struct config_int ConfigureNamesInt[] =
--- 1484,1499 ----
  	},
  
  	{
+ 		{"replication_timeout_server", PGC_SIGHUP, WAL_SETTINGS,
+ 			gettext_noop("Replication connection will timeout after this duration."),
+ 			NULL,
+ 			GUC_UNIT_S
+ 		},
+ 		&replication_timeout_server,
+ 		30, -1, INT_MAX, NULL, NULL
+ 	},
+ 
+ 	{
  		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
  			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
  			NULL,
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 203,208 ****
--- 203,209 ----
  					# when reading streaming WAL;
  					# -1 allows indefinite delay
  #wal_receiver_status_interval = 10s	# replies at least this often, 0 disables
+ #replication_timeout_server = 120	# -1 means wait forever
  
  #------------------------------------------------------------------------------
*** a/src/include/replication/walsender.h
--- b/src/include/replication/walsender.h
***************
*** 70,75 **** extern volatile sig_atomic_t walsender_ready_to_stop;
--- 70,76 ----
  /* user-settable parameters */
  extern int	WalSndDelay;
  extern int	max_wal_senders;
+ extern int	replication_timeout_server;
  
  extern int	WalSenderMain(void);
  extern void WalSndSignals(void);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On 11.02.2011 22:11, Robert Haas wrote: On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina <drfar...@acm.org> wrote: I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details. This looks like a useful and separately committable change. Hmm, so this patch implements a watchdog, where the master disconnects the standby if the heartbeat from the standby stops for more than 'replication_[server]_timeout' seconds. The standby sends the heartbeat every wal_receiver_status_interval seconds. It would be nice if the master and standby could negotiate those settings. As the patch stands, it's easy to have a pathological configuration where replication_server_timeout < wal_receiver_status_interval, so that the master repeatedly disconnects the standby because it doesn't reply in time. Maybe the standby should report how often it's going to send a heartbeat, and master should wait for that long + some safety margin. Or maybe the master should tell the standby how often it should send the heartbeat? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11.02.2011 22:11, Robert Haas wrote: On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina <drfar...@acm.org> wrote: I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details. This looks like a useful and separately committable change. Hmm, so this patch implements a watchdog, where the master disconnects the standby if the heartbeat from the standby stops for more than 'replication_[server]_timeout' seconds. The standby sends the heartbeat every wal_receiver_status_interval seconds. It would be nice if the master and standby could negotiate those settings. As the patch stands, it's easy to have a pathological configuration where replication_server_timeout < wal_receiver_status_interval, so that the master repeatedly disconnects the standby because it doesn't reply in time. Maybe the standby should report how often it's going to send a heartbeat, and master should wait for that long + some safety margin. Or maybe the master should tell the standby how often it should send the heartbeat? I guess the biggest use case for that behavior would be in a case where you have two standbys, one of which doesn't send a heartbeat and the other of which does. Then you really can't rely on a single timeout. Maybe we could change the server parameter to indicate what multiple of wal_receiver_status_interval causes a hangup, and then change the client to notify the server what value it's using. But that gets complicated, because the value could be changed while the standby is running. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Fri, Feb 11, 2011 at 12:11 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina drfar...@acm.org wrote: I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details. This looks like a useful and separately committable change. However, it looks to me like this renders wal_sender_delay aka WalSndDelay completely unused. If we don't need that GUC any more, we should rip it out completely. Indeed; I have cleaned this up. The comment in WalSndHandshake should have a tab at the beginning of every line. Right now the first line has a tab and the rest have spaces. Also correct. Done. The first hunk in WalSndLoop is a meaningless whitespace change. I was trying to get it under 80 columns wide, but yes, it is unnecessary. I think this closes out the small fry. I have rebased my splitorific branch to reflect these changes: https://github.com/fdr/postgres/commits/splitorific Context diff equivalent attached. -- fdr 0001-Split-and-rename-out-server-timeout-of-clients-2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11.02.2011 22:11, Robert Haas wrote: On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina <drfar...@acm.org> wrote: I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details. This looks like a useful and separately committable change. Hmm, so this patch implements a watchdog, where the master disconnects the standby if the heartbeat from the standby stops for more than 'replication_[server]_timeout' seconds. The standby sends the heartbeat every wal_receiver_status_interval seconds. It would be nice if the master and standby could negotiate those settings. As the patch stands, it's easy to have a pathological configuration where replication_server_timeout < wal_receiver_status_interval, so that the master repeatedly disconnects the standby because it doesn't reply in time. Maybe the standby should report how often it's going to send a heartbeat, and master should wait for that long + some safety margin. Or maybe the master should tell the standby how often it should send the heartbeat? I guess the biggest use case for that behavior would be in a case where you have two standbys, one of which doesn't send a heartbeat and the other of which does. Then you really can't rely on a single timeout. Maybe we could change the server parameter to indicate what multiple of wal_receiver_status_interval causes a hangup, and then change the client to notify the server what value it's using. But that gets complicated, because the value could be changed while the standby is running. On reflection I'm deeply uncertain this is a good idea. 
It's pretty hopeless to suppose that we can keep the user from choosing parameter settings which will cause them problems, and there are certainly far stupider things they could do than set replication_timeout < wal_receiver_status_interval. They could, for example, set fsync=off or work_mem=4GB or checkpoint_segments=3 (never mind that we ship that last one out of the box). Any of those settings have the potential to thoroughly destroy their system in one way or another, and there's not a darn thing we can do about it. Setting up some kind of handshake system based on a multiple of the wal_receiver_status_interval is going to be complex, and it's not necessarily going to deliver the behavior someone wants anyway. If someone has wal_receiver_status_interval=10 on one system and =30 on another system, does it therefore follow that the timeouts should also be different by 3X? Perhaps, but it's non-obvious. There are two things that I think are pretty clear. If the receiver has wal_receiver_status_interval=0, then we should ignore replication_timeout for that connection. And also we need to make sure that the replication_timeout can't kill off a connection that is in the middle of streaming a large base backup. Maybe we should try to get those two cases right and not worry about the rest. Dan, can you check whether the base backup thing is a problem with this as implemented? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication server timeout patch
On Feb 11, 2011 8:20 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11.02.2011 22:11, Robert Haas wrote: On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina <drfar...@acm.org> wrote: I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details. This looks like a useful and separately committable change. Hmm, so this patch implements a watchdog, where the master disconnects the standby if the heartbeat from the standby stops for more than 'replication_[server]_timeout' seconds. The standby sends the heartbeat every wal_receiver_status_interval seconds. It would be nice if the master and standby could negotiate those settings. As the patch stands, it's easy to have a pathological configuration where replication_server_timeout < wal_receiver_status_interval, so that the master repeatedly disconnects the standby because it doesn't reply in time. Maybe the standby should report how often it's going to send a heartbeat, and master should wait for that long + some safety margin. Or maybe the master should tell the standby how often it should send the heartbeat? I guess the biggest use case for that behavior would be in a case where you have two standbys, one of which doesn't send a heartbeat and the other of which does. Then you really can't rely on a single timeout. Maybe we could change the server parameter to indicate what multiple of wal_receiver_status_interval causes a hangup, and then change the client to notify the server what value it's using. But that gets complicated, because the value could be changed while the standby is running. On reflection I'm deeply uncertain this is a good idea. 
It's pretty hopeless to suppose that we can keep the user from choosing parameter settings which will cause them problems, and there are certainly far stupider things they could do than set replication_timeout < wal_receiver_status_interval. They could, for example, set fsync=off or work_mem=4GB or checkpoint_segments=3 (never mind that we ship that last one out of the box). Any of those settings have the potential to thoroughly destroy their system in one way or another, and there's not a darn thing we can do about it. Setting up some kind of handshake system based on a multiple of the wal_receiver_status_interval is going to be complex, and it's not necessarily going to deliver the behavior someone wants anyway. If someone has wal_receiver_status_interval=10 on one system and =30 on another system, does it therefore follow that the timeouts should also be different by 3X? Perhaps, but it's non-obvious. There are two things that I think are pretty clear. If the receiver has wal_receiver_status_interval=0, then we should ignore replication_timeout for that connection. And also we need to make sure that the replication_timeout can't kill off a connection that is in the middle of streaming a large base backup. Maybe we should try to get those two cases right and not worry about the rest. Dan, can you check whether the base backup thing is a problem with this as implemented? Yes, I will have something to say come Saturday. -- fdr