Re: [HACKERS] Dangling Client Backend Process

Robert Haas Thu, 22 Oct 2015 13:28:33 -0700

On Tue, Oct 20, 2015 at 11:42 PM, Rajeev rastogi
<rajeev.rast...@huawei.com> wrote:
> Agreed. Attached is the patch with changes.


Well, I'm not buying this extra PostmasterIsAlive() call on every pass
through the main loop.  That seems more expensive than we can really
justify. Checking this when we're already calling WaitLatchOrSocket is
basically free, but the other part is not.
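
To illustrate the "basically free" point, here is a minimal standalone
sketch (not PostgreSQL code, and not how WaitLatchOrSocket is actually
implemented).  It assumes the parent holds the write end of a pipe and
the child polls the read end alongside the client socket, so parent
death is noticed in the same system call we were going to block in
anyway.  The function name and the EV_* flags are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result flags, loosely modeled on the WL_* bits. */
#define EV_SOCKET_READABLE	(1 << 0)
#define EV_PARENT_DEATH		(1 << 1)

/*
 * Block until sock_fd is readable or the parent-liveness pipe is closed.
 *
 * Assumption: death_fd is the read end of a pipe whose write end is held
 * only by the parent, and nothing is ever written to it.  Any readiness
 * on death_fd therefore means the write end has been closed.
 *
 * timeout_ms < 0 waits forever.  Returns a bitmask of EV_* flags, 0 on
 * timeout, or -1 on a poll() error.
 */
int
wait_socket_or_parent_death(int sock_fd, int death_fd, int timeout_ms)
{
	struct pollfd fds[2];
	int			events = 0;
	int			rc;

	assert(sock_fd >= 0 && death_fd >= 0);

	fds[0].fd = sock_fd;
	fds[0].events = POLLIN;
	fds[0].revents = 0;
	fds[1].fd = death_fd;
	fds[1].events = POLLIN;
	fds[1].revents = 0;

	/* One system call covers both conditions; retry if interrupted. */
	do
	{
		rc = poll(fds, 2, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc < 0)
		return -1;

	if (fds[0].revents & (POLLIN | POLLHUP | POLLERR))
		events |= EV_SOCKET_READABLE;
	if (fds[1].revents & (POLLIN | POLLHUP | POLLERR))
		events |= EV_PARENT_DEATH;

	return events;
}
```

A caller would test the result for EV_PARENT_DEATH before looking at
the socket and bail out if it is set.  A liveness check made outside
the wait, by contrast, costs an additional call on every pass through
the loop.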

Here's a version with that removed and some changes to the comments.
I still don't think this is quite working right, though, because
here's what happened when I killed the postmaster:

rhaas=# select 1;
 ?column?
----------
        1
(1 row)

rhaas=# \watch
Watch every 2s    Thu Oct 22 16:24:10 2015

 ?column?
----------
        1
(1 row)

Watch every 2s    Thu Oct 22 16:24:12 2015

 ?column?
----------
        1
(1 row)

Watch every 2s    Thu Oct 22 16:24:14 2015

 ?column?
----------
        1
(1 row)

Watch every 2s    Thu Oct 22 16:24:16 2015

 ?column?
----------
        1
(1 row)

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Note that the new FATAL message doesn't actually show up on the client
(it did show up in the log); psql only reports the dropped connection.
I guess that may be inevitable if we're blocked in secure_write(), but
if we're in secure_read() maybe it should work?  I haven't investigated
why it doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 26d8faa..089435d 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -36,7 +36,7 @@
 #include "tcop/tcopprot.h"
 #include "utils/memutils.h"
 #include "storage/proc.h"
-
+#include "storage/ipc.h"
 
 char	   *ssl_cert_file;
 char	   *ssl_key_file;
@@ -144,9 +144,31 @@ retry:
 		Assert(waitfor);
 
 		w = WaitLatchOrSocket(MyLatch,
-							  WL_LATCH_SET | waitfor,
+							  WL_LATCH_SET | WL_POSTMASTER_DEATH | waitfor,
 							  port->sock, 0);
 
+		/*
+		 * If the postmaster has died, it's not safe to continue running,
+		 * because it is the postmaster's job to kill us if some other backend
+		 * exits uncleanly.  Moreover, we won't run very well in this state;
+		 * helper processes like walwriter and the bgwriter will exit, so
+		 * performance may be poor.  Finally, if we don't exit, pg_ctl will
+		 * be unable to restart the postmaster without manual intervention,
+		 * so no new connections can be accepted.  Exiting clears the deck
+		 * for a postmaster restart.
+		 *
+		 * (Note that we only make this check when we would otherwise sleep
+		 * on our latch.  We might still continue running for a while if the
+		 * postmaster is killed in mid-query, or even through multiple queries
+		 * if we never have to wait for read.  We don't want to burn too many
+		 * cycles checking for this very rare condition, and this should cause
+		 * us to exit quickly in most cases.)
+		 */
+		if (w & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					errmsg("terminating connection due to unexpected postmaster exit")));
+
 		/* Handle interrupt. */
 		if (w & WL_LATCH_SET)
 		{
@@ -223,9 +245,15 @@ retry:
 		Assert(waitfor);
 
 		w = WaitLatchOrSocket(MyLatch,
-							  WL_LATCH_SET | waitfor,
+							  WL_LATCH_SET | WL_POSTMASTER_DEATH | waitfor,
 							  port->sock, 0);
 
+		/* See comments in secure_read. */
+		if (w & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					errmsg("terminating connection due to unexpected postmaster exit")));
+
 		/* Handle interrupt. */
 		if (w & WL_LATCH_SET)
 		{
