Re: [HACKERS] Bug in walsender when calling out to do_pg_stop_backup (and others?)

Heikki Linnakangas Sat, 03 Dec 2011 08:38:04 -0800

On 19.10.2011 19:41, Greg Jaskiewicz wrote:

On 19 Oct 2011, at 18:28, Florian Pflug wrote:

All the other flags which indicate cancellation reasons are set from signal 
handers, I believe. We could of course mark as ClientConnectionLostPending as 
volatile just to be consistent. Not sure whether that's a good idea, or not. It 
might prevent a mistake should we ever add code to detect lost connections 
asynchronously (i.e., from somewhere else than pq_flush). And the cost is 
probably negligible, because CHECK_FOR_INTERRUPTS tests for InterruptPending 
before calling ProcessInterrupts(), so we only pay the cost of volatile if 
there's actually an interrupt pending. But I still think it's better to add 
qualifies such a volatile only when really necessary. A comment about why it 
*isn't* volatile is probably in order, though, so I'll add that in the next 
version of the patch.

Makes sense.

I had to ask, because it sticks out. And indeed there is a possibility that 
someone will one day will try to use from signal handler, etc.

Let's just mark it as volatile. It's not clear to me that we'll neverset or read the flag while in a signal handler. We probably don't, butwhat if ImmediateInterruptOK is 'true', for example, just when we failto send a response and try to set the flags. In that case, we have to becareful that ClientConnectionLost is set before InterruptPending (whichwe can only guarantee if both are volatile, I believe), otherwise asignal might arrive when we've just set InterruptPending, but not yetClientConnectionLost. ProcessInterrupts() would clear InterruptPending,but not see ClientConnectionLost, so we would lose the interrupt.

That's not serious, and the window for that happening would be extremelysmall, and I don't think it can actually happen as the code stands,because we never try to send anything while ImmediateInterruptOK ==true. But better safe than sorry.

One small change I'd like to make is to treat the loss of connectionmore as a new "top-level" event, rather than as a new reason for querycancellation. A lost connection is more like receiving SIGTERM, really.Please take a look at the attached patch. I've been testing this bydoing "SELECT pg_sleep(1), a from generate_series(1,1000) a;", andkilling the connection with "killall -9 psql".

One remaining issue I have with this is the wording of the errormessage. The closest existing message we have is in old-mode COPY:


ereport(FATAL,
        (errcode(ERRCODE_CONNECTION_FAILURE),
         errmsg("connection lost during COPY to stdout")));

In the patch, I made the new message just "connection lost", but doesanyone else have a better opinion on that? Perhaps it should be"connection lost while sending response to client". Or "connection lostduring execution of statement", but that might not be accurate if we'renot executing a statement at the moment, but just sending a "sync"message or similar.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index b83a2ef..c36cf31 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1247,9 +1247,13 @@ internal_flush(void)
 
 			/*
 			 * We drop the buffered data anyway so that processing can
-			 * continue, even though we'll probably quit soon.
+			 * continue, even though we'll probably quit soon. We also
+			 * set a flag that'll cause the next CHECK_FOR_INTERRUPTS
+			 * to terminate the connection.
 			 */
 			PqSendStart = PqSendPointer = 0;
+			ClientConnectionLost = 1;
+			InterruptPending = 1;
 			return EOF;
 		}
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 976a832..13ea594 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2823,6 +2823,19 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
 			 errmsg("terminating connection due to administrator command")));
 	}
+	if (ClientConnectionLost)
+	{
+		QueryCancelPending = false;		/* lost connection trumps QueryCancel */
+		ImmediateInterruptOK = false;	/* not idle anymore */
+		DisableNotifyInterrupt();
+		DisableCatchupInterrupt();
+		/* As in quickdie, don't risk sending to client during auth */
+		if (ClientAuthInProgress && whereToSendOutput == DestRemote)
+			whereToSendOutput = DestNone;
+		ereport(FATAL,
+				(errcode(ERRCODE_CONNECTION_FAILURE),
+				 errmsg("connection lost")));
+	}
 	if (QueryCancelPending)
 	{
 		QueryCancelPending = false;
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 9ce64e6..9417c7a 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -29,6 +29,7 @@ ProtocolVersion FrontendProtocol;
 volatile bool InterruptPending = false;
 volatile bool QueryCancelPending = false;
 volatile bool ProcDiePending = false;
+volatile bool ClientConnectionLost = false;
 volatile bool ImmediateInterruptOK = false;
 volatile uint32 InterruptHoldoffCount = 0;
 volatile uint32 CritSectionCount = 0;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 4ee08fe..8048fc1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -55,6 +55,12 @@
  * course, only if the interrupt holdoff counter is zero).	See the
  * related code for details.
  *
+ * A lost connection is handled similarly, although the ClientConnectionLost
+ * flag is not set in a signal handler, but whenever we get a nonrecoverable
+ * error writing to the socket. If there was a signal for a broken
+ * connection, we could make use of it by setting ClientConnectionLost in
+ * the signal handler.
+ *
  * A related, but conceptually distinct, mechanism is the "critical section"
  * mechanism.  A critical section not only holds off cancel/die interrupts,
  * but causes any ereport(ERROR) or ereport(FATAL) to become ereport(PANIC)
@@ -70,6 +76,8 @@ extern PGDLLIMPORT volatile bool InterruptPending;
 extern volatile bool QueryCancelPending;
 extern volatile bool ProcDiePending;
 
+extern volatile bool ClientConnectionLost;
+
 /* these are marked volatile because they are examined by signal handlers: */
 extern volatile bool ImmediateInterruptOK;
 extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Bug in walsender when calling out to do_pg_stop_backup (and others?)

Reply via email to