Re: [HACKERS] loss of transactions in streaming replication
On Mon, Oct 24, 2011 at 8:40 AM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Oct 21, 2011 at 12:01 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Oct 20, 2011 at 9:51 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas robertmh...@gmail.com wrote: OK, so this is an artifact of the changes to make libpq communication bidirectional. But I'm still confused about where the error is coming from. In your OP, you wrote In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. So that implied to me that this is only going to trigger if you have a shutdown together with an awkwardly-timed error. But your scenario for reproducing this problem doesn't seem to involve an error. Yes, my scenario doesn't cause any real error. My original description was misleading. The following would be closer to the truth: In 9.2dev and 9.1, when walreceiver detects the termination of replication connection while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. Ah, OK. I think I now agree that this is a bug and that we should fix and back-patch. The patch that I posted before is well-formed enough to be adopted? Does this still need to be worked on? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Fri, Oct 21, 2011 at 12:01 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Oct 20, 2011 at 9:51 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas robertmh...@gmail.com wrote: OK, so this is an artifact of the changes to make libpq communication bidirectional. But I'm still confused about where the error is coming from. In your OP, you wrote In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. So that implied to me that this is only going to trigger if you have a shutdown together with an awkwardly-timed error. But your scenario for reproducing this problem doesn't seem to involve an error. Yes, my scenario doesn't cause any real error. My original description was misleading. The following would be closer to the truth: In 9.2dev and 9.1, when walreceiver detects the termination of replication connection while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. Ah, OK. I think I now agree that this is a bug and that we should fix and back-patch. The patch that I posted before is well-formed enough to be adopted? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas robertmh...@gmail.com wrote: OK, so this is an artifact of the changes to make libpq communication bidirectional. But I'm still confused about where the error is coming from. In your OP, you wrote In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. So that implied to me that this is only going to trigger if you have a shutdown together with an awkwardly-timed error. But your scenario for reproducing this problem doesn't seem to involve an error. Yes, my scenario doesn't cause any real error. My original description was misleading. The following would be closer to the truth: In 9.2dev and 9.1, when walreceiver detects the termination of replication connection while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Thu, Oct 20, 2011 at 9:51 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas robertmh...@gmail.com wrote: OK, so this is an artifact of the changes to make libpq communication bidirectional. But I'm still confused about where the error is coming from. In your OP, you wrote In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. So that implied to me that this is only going to trigger if you have a shutdown together with an awkwardly-timed error. But your scenario for reproducing this problem doesn't seem to involve an error. Yes, my scenario doesn't cause any real error. My original description was misleading. The following would be closer to the truth: In 9.2dev and 9.1, when walreceiver detects the termination of replication connection while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. Ah, OK. I think I now agree that this is a bug and that we should fix and back-patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Wed, Oct 19, 2011 at 11:28 AM, Robert Haas robertmh...@gmail.com wrote: Convince me. :-) Yeah, I try. My reading of the situation is that you're talking about a problem that will only occur if, while the master is in the process of shutting down, a network error occurs. No. This happens even if a network error doesn't occur. I can reproduce the issue by doing the following: 1. Set up streaming replication master and standby with archive setting. 2. Run pgbench -i 3. Shuts down the master with fast mode. Then I can see that the latest WAL file in the master's pg_xlog doesn't exist in the standby's one. The WAL record which was lost was the shutdown checkpoint one. When smart or fast shutdown is requested, the master tries to write and send the WAL switch (if archiving is enabled) and shutdown checkpoint record. Because of the problem I described, the WAL switch record arrives at the standby but the shutdown checkpoint does not. I am not sure it's a good idea to convolute the code to handle that case, because (1) there are going to be many similar situations where nothing within our power is sufficient to prevent WAL from failing to make it to the standby and Shutting down the master is not a rare case. So I think it's worth doing something. (2) for this marginal improvement, you're giving up including PQerrorMessage(streamConn) in the error message that ultimately gets omitted, which seems like a substantial regression as far as debuggability is concerned. I think that it's possible to include PQerrorMessage() in the error message. Will change the patch. Even if we do decide that we want the change in behavior, I see no compelling reason to back-patch it. Stable releases are supposed to be stable, not change behavior because we thought of something we like better than what we originally released. The original behavior, in 9.0, is that all outstanding WAL are replicated to the standby when the master shuts down normally. But ISTM the behavior was changed unexpectedly in 9.1. So I think that it should be back-patched to 9.1 to revert the behavior to the original. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Wed, Oct 19, 2011 at 3:31 PM, Fujii Masao masao.fu...@gmail.com wrote: (2) for this marginal improvement, you're giving up including PQerrorMessage(streamConn) in the error message that ultimately gets omitted, which seems like a substantial regression as far as debuggability is concerned. I think that it's possible to include PQerrorMessage() in the error message. Will change the patch. Attached is the updated version of the patch. When walreceiver fails to send data to WAL stream, it emits WARNING with the message including PQerrorMessage(), and also it emits the following DETAIL message: Walreceiver process will be terminated after all available data have been received from WAL stream. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c --- b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c *** *** 49,55 static char *recvBuf = NULL; static bool libpqrcv_connect(char *conninfo, XLogRecPtr startpoint); static bool libpqrcv_receive(int timeout, unsigned char *type, char **buffer, int *len); ! static void libpqrcv_send(const char *buffer, int nbytes); static void libpqrcv_disconnect(void); /* Prototypes for private functions */ --- 49,55 static bool libpqrcv_connect(char *conninfo, XLogRecPtr startpoint); static bool libpqrcv_receive(int timeout, unsigned char *type, char **buffer, int *len); ! static bool libpqrcv_send(const char *buffer, int nbytes); static void libpqrcv_disconnect(void); /* Prototypes for private functions */ *** *** 404,417 libpqrcv_receive(int timeout, unsigned char *type, char **buffer, int *len) /* * Send a message to XLOG stream. * ! * ereports on error. */ ! static void libpqrcv_send(const char *buffer, int nbytes) { if (PQputCopyData(streamConn, buffer, nbytes) = 0 || PQflush(streamConn)) ! ereport(ERROR, (errmsg(could not send data to WAL stream: %s, ! PQerrorMessage(streamConn; } --- 404,429 /* * Send a message to XLOG stream. * ! * Return true if successfully, false otherwise. */ ! static bool libpqrcv_send(const char *buffer, int nbytes) { + /* + * Even if we could not send data to WAL stream, we don't emit ERROR + * just yet. There might be still data available in the receive buffer. We + * emit ERROR after all available data have been received and flushed to + * disk. Otherwise, such outstanding data would be lost. + */ if (PQputCopyData(streamConn, buffer, nbytes) = 0 || PQflush(streamConn)) ! { ! ereport(WARNING, (errmsg(could not send data to WAL stream: %s, ! PQerrorMessage(streamConn)), ! errdetail(Walreceiver process will be terminated after all available data have been received from WAL stream.))); ! return false; ! } ! ! return true; } *** a/src/backend/replication/walreceiver.c --- b/src/backend/replication/walreceiver.c *** *** 96,101 static struct --- 96,104 static StandbyReplyMessage reply_message; static StandbyHSFeedbackMessage feedback_message; + /* Did previous attempt to send data to WAL stream fail? */ + static bool walrcv_send_error = false; + /* * About SIGTERM handling: * *** *** 328,333 WalReceiverMain(void) --- 331,345 else { /* + * If we didn't receive anything new after we had failed to send data + * to WAL stream, we can guarantee that there is no data available in + * the receive buffer. So we at last emit ERROR. + */ + if (walrcv_send_error) + ereport(ERROR, + (errmsg(terminating walreceiver process due to failure to send data to WAL stream))); + + /* * We didn't receive anything new, but send a status update to the * master anyway, to report any progress in applying WAL. */ *** *** 627,632 XLogWalRcvSendReply(void) --- 639,651 wal_receiver_status_interval * 1000)) return; + /* + * If previous attempt to send data to WAL stream has failed, there is + * nothing to do here because this attempt would fail again. + */ + if (walrcv_send_error) + return; + /* Construct a new message */ reply_message.write = LogstreamResult.Write; reply_message.flush = LogstreamResult.Flush; *** *** 641,647 XLogWalRcvSendReply(void) /* Prepend with the message type and send it. */ buf[0] = 'r'; memcpy(buf[1], reply_message, sizeof(StandbyReplyMessage)); ! walrcv_send(buf, sizeof(StandbyReplyMessage) + 1); } /* --- 660,666 /* Prepend with the message type and send it. */ buf[0] = 'r'; memcpy(buf[1], reply_message, sizeof(StandbyReplyMessage)); ! walrcv_send_error = !walrcv_send(buf, sizeof(StandbyReplyMessage) + 1); } /* *** *** 682,687 XLogWalRcvSendHSFeedback(void) --- 701,714 return;
Re: [HACKERS] loss of transactions in streaming replication
On Wed, Oct 19, 2011 at 2:31 AM, Fujii Masao masao.fu...@gmail.com wrote: My reading of the situation is that you're talking about a problem that will only occur if, while the master is in the process of shutting down, a network error occurs. No. This happens even if a network error doesn't occur. I can reproduce the issue by doing the following: 1. Set up streaming replication master and standby with archive setting. 2. Run pgbench -i 3. Shuts down the master with fast mode. Then I can see that the latest WAL file in the master's pg_xlog doesn't exist in the standby's one. The WAL record which was lost was the shutdown checkpoint one. When smart or fast shutdown is requested, the master tries to write and send the WAL switch (if archiving is enabled) and shutdown checkpoint record. Because of the problem I described, the WAL switch record arrives at the standby but the shutdown checkpoint does not. Oh, that's not good. The original behavior, in 9.0, is that all outstanding WAL are replicated to the standby when the master shuts down normally. But ISTM the behavior was changed unexpectedly in 9.1. So I think that it should be back-patched to 9.1 to revert the behavior to the original. Which commit broke this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Wed, Oct 19, 2011 at 9:44 PM, Robert Haas robertmh...@gmail.com wrote: The original behavior, in 9.0, is that all outstanding WAL are replicated to the standby when the master shuts down normally. But ISTM the behavior was changed unexpectedly in 9.1. So I think that it should be back-patched to 9.1 to revert the behavior to the original. Which commit broke this? d3d414696f39e2b57072fab3dd4fa11e465be4ed b186523fd97ce02ffbb7e21d5385a047deeef4f6 The former introduced problematic libpqrcv_send() (which was my mistake...), and the latter is the first user of it. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Wed, Oct 19, 2011 at 10:41 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Oct 19, 2011 at 9:44 PM, Robert Haas robertmh...@gmail.com wrote: The original behavior, in 9.0, is that all outstanding WAL are replicated to the standby when the master shuts down normally. But ISTM the behavior was changed unexpectedly in 9.1. So I think that it should be back-patched to 9.1 to revert the behavior to the original. Which commit broke this? d3d414696f39e2b57072fab3dd4fa11e465be4ed b186523fd97ce02ffbb7e21d5385a047deeef4f6 The former introduced problematic libpqrcv_send() (which was my mistake...), and the latter is the first user of it. OK, so this is an artifact of the changes to make libpq communication bidirectional. But I'm still confused about where the error is coming from. In your OP, you wrote In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. So that implied to me that this is only going to trigger if you have a shutdown together with an awkwardly-timed error. But your scenario for reproducing this problem doesn't seem to involve an error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Fri, Oct 14, 2011 at 7:51 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Oct 13, 2011 at 10:08 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Oct 12, 2011 at 10:29 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao masao.fu...@gmail.com wrote: In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. This might lead to loss of transactions because such remaining data are not received by walreceiver :( Won't it just reconnect? Yes if the master is running normally. OTOH, if the master is not running (i.e., failover case), the standby cannot receive again the data which it failed to receive. I found this issue when I shut down the master. When the master shuts down, it sends the shutdown checkpoint record, but I found that the standby failed to receive it. Patch attached. The patch changes walreceiver so that it doesn't emit ERROR just yet even if it fails to send data to WAL stream. Then, after all available data have been received and flushed to the disk, it emits ERROR. If the patch is OK, it should be backported to v9.1. Convince me. :-) My reading of the situation is that you're talking about a problem that will only occur if, while the master is in the process of shutting down, a network error occurs. I am not sure it's a good idea to convolute the code to handle that case, because (1) there are going to be many similar situations where nothing within our power is sufficient to prevent WAL from failing to make it to the standby and (2) for this marginal improvement, you're giving up including PQerrorMessage(streamConn) in the error message that ultimately gets omitted, which seems like a substantial regression as far as debuggability is concerned. Even if we do decide that we want the change in behavior, I see no compelling reason to back-patch it. Stable releases are supposed to be stable, not change behavior because we thought of something we like better than what we originally released. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Thu, Oct 13, 2011 at 10:08 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Oct 12, 2011 at 10:29 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao masao.fu...@gmail.com wrote: In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. This might lead to loss of transactions because such remaining data are not received by walreceiver :( Won't it just reconnect? Yes if the master is running normally. OTOH, if the master is not running (i.e., failover case), the standby cannot receive again the data which it failed to receive. I found this issue when I shut down the master. When the master shuts down, it sends the shutdown checkpoint record, but I found that the standby failed to receive it. Patch attached. The patch changes walreceiver so that it doesn't emit ERROR just yet even if it fails to send data to WAL stream. Then, after all available data have been received and flushed to the disk, it emits ERROR. If the patch is OK, it should be backported to v9.1. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c --- b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c *** *** 49,55 static char *recvBuf = NULL; static bool libpqrcv_connect(char *conninfo, XLogRecPtr startpoint); static bool libpqrcv_receive(int timeout, unsigned char *type, char **buffer, int *len); ! static void libpqrcv_send(const char *buffer, int nbytes); static void libpqrcv_disconnect(void); /* Prototypes for private functions */ --- 49,55 static bool libpqrcv_connect(char *conninfo, XLogRecPtr startpoint); static bool libpqrcv_receive(int timeout, unsigned char *type, char **buffer, int *len); ! static bool libpqrcv_send(const char *buffer, int nbytes); static void libpqrcv_disconnect(void); /* Prototypes for private functions */ *** *** 404,417 libpqrcv_receive(int timeout, unsigned char *type, char **buffer, int *len) /* * Send a message to XLOG stream. * ! * ereports on error. */ ! static void libpqrcv_send(const char *buffer, int nbytes) { if (PQputCopyData(streamConn, buffer, nbytes) = 0 || PQflush(streamConn)) ! ereport(ERROR, ! (errmsg(could not send data to WAL stream: %s, ! PQerrorMessage(streamConn; } --- 404,428 /* * Send a message to XLOG stream. * ! * Return true if successfully, false otherwise. */ ! static bool libpqrcv_send(const char *buffer, int nbytes) { + /* + * Even if we could not send data to WAL stream, we don't emit ERROR + * just yet. There might be still data available in the receive buffer. We + * emit ERROR after all available data have been received and flushed to + * disk. Otherwise, such outstanding data would be lost. + * + * XXX: Should the result of PQerrorMessage() be returned so that the + * caller can report it? This doesn't seem worthwhile because in most cases, + * before reporting that, we will get another error and emit ERROR while + * trying to process all available data. + */ if (PQputCopyData(streamConn, buffer, nbytes) = 0 || PQflush(streamConn)) ! return false; ! ! return true; } *** a/src/backend/replication/walreceiver.c --- b/src/backend/replication/walreceiver.c *** *** 96,101 static struct --- 96,104 static StandbyReplyMessage reply_message; static StandbyHSFeedbackMessage feedback_message; + /* Did previous attempt to send data to WAL stream fail? */ + static bool walrcv_send_error = false; + /* * About SIGTERM handling: * *** *** 328,333 WalReceiverMain(void) --- 331,345 else { /* + * If we didn't receive anything new after we had failed to send data + * to WAL stream, we can guarantee that there is no data available in + * the receive buffer. So we at last emit ERROR. + */ + if (walrcv_send_error) + ereport(ERROR, + (errmsg(could not send data to WAL stream))); + + /* * We didn't receive anything new, but send a status update to the * master anyway, to report any progress in applying WAL. */ *** *** 627,632 XLogWalRcvSendReply(void) --- 639,651 wal_receiver_status_interval * 1000)) return; + /* + * If previous attempt to send data to WAL stream has failed, there is + * nothing to do here because this attempt would fail again. + */ + if (walrcv_send_error) + return; + /* Construct a new message */ reply_message.write = LogstreamResult.Write; reply_message.flush = LogstreamResult.Flush; *** *** 641,647 XLogWalRcvSendReply(void) /* Prepend with the message type and send it. */ buf[0] = 'r'; memcpy(buf[1],
Re: [HACKERS] loss of transactions in streaming replication
On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao masao.fu...@gmail.com wrote: In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. This might lead to loss of transactions because such remaining data are not received by walreceiver :( Won't it just reconnect? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] loss of transactions in streaming replication
On Wed, Oct 12, 2011 at 10:29 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao masao.fu...@gmail.com wrote: In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. This might lead to loss of transactions because such remaining data are not received by walreceiver :( Won't it just reconnect? Yes if the master is running normally. OTOH, if the master is not running (i.e., failover case), the standby cannot receive again the data which it failed to receive. I found this issue when I shut down the master. When the master shuts down, it sends the shutdown checkpoint record, but I found that the standby failed to receive it. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers