Re: libpq maligning postgres stability

Jelte Fennema-Nio Wed, 10 Jun 2026 23:46:40 -0700

On Tue Jun 9, 2026 at 11:44 PM CEST, Andres Freund wrote:

> Note that the tests don't pass with this applied:
>
> https://github.com/postgresql-cfbot/postgresql/actions/runs/27235961657

Hmmm, I'm pretty sure I ran the tests locally before submitting the
patch. Maybe I only ran the regress suite...

Attached is a new version of the patch that includes fixes for the test
failures. This also required a small change to postgres_fdw (see commit
message for details).

From 7a3c7e57a029e7679521b9224fa891677bfc86ef Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <[email protected]>
Date: Tue, 26 May 2026 09:05:56 +0200
Subject: [PATCH v2] libpq: Consider a connection with a FATAL error to be
 closed

This starts marking a connection as closed (i.e. CONNECTION_BAD) when
the client receives a FATAL or PANIC error. Previously, any FATAL error
would get the "server closed the connection unexpectedly" text appended,
like so:

FATAL:  57P01: terminating connection due to administrator command
LOCATION:  ProcessInterrupts, postgres.c:3431
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

This addition to the error is just plain incorrect: the server told the
client that it was closing the connection, so the closure is neither
unexpected nor caused by the server terminating abnormally. It also
makes the error harder for a client to parse, because the client loses
the ability to use PQresultErrorField on the final PGresult.
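
For illustration only, here is a minimal, self-contained C sketch of the
classification this patch adds (the real change lives in
pqGetErrorNotice3()). It assumes the protocol v3 ErrorResponse body
layout: a sequence of fields, each a type byte followed by a
NUL-terminated string, ending with a single zero byte. It picks out the
non-localized severity (field 'V', which libpq exposes as
PG_DIAG_SEVERITY_NONLOCALIZED) and the SQLSTATE ('C'), and treats FATAL
or PANIC as the server announcing that it is closing the connection. The
names error_fields, parse_error_body() and closes_connection() are
hypothetical and not part of libpq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper type (not libpq API): the ErrorResponse fields this
 * sketch cares about.  Pointers refer into the caller's buffer; NULL if the
 * field was absent.
 */
typedef struct
{
	const char *severity_nonlocalized;	/* field type 'V' */
	const char *sqlstate;				/* field type 'C' */
	const char *message;				/* field type 'M' */
} error_fields;

/*
 * Walk a message body made of (type byte, NUL-terminated string) pairs that
 * ends with a single zero byte.  Returns false if the body is truncated.
 */
static bool
parse_error_body(const char *body, size_t len, error_fields *out)
{
	size_t		pos = 0;

	assert(body != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (pos < len)
	{
		char		type = body[pos++];
		const char *value;
		const char *nul;

		if (type == '\0')
			return true;		/* terminator: all fields consumed */

		value = body + pos;
		nul = memchr(value, '\0', len - pos);
		if (nul == NULL)
			return false;		/* field string runs past the body */
		pos = (size_t) (nul - body) + 1;

		if (type == 'V')
			out->severity_nonlocalized = value;
		else if (type == 'C')
			out->sqlstate = value;
		else if (type == 'M')
			out->message = value;
		/* other field types are ignored in this sketch */
	}
	return false;				/* no terminating zero byte */
}

/*
 * Mirrors the patch's check: FATAL or PANIC means the server is about to
 * close the connection.  The non-localized severity is used so the result
 * does not depend on the server's lc_messages setting.
 */
static bool
closes_connection(const error_fields *f)
{
	return f->severity_nonlocalized != NULL &&
		(strcmp(f->severity_nonlocalized, "FATAL") == 0 ||
		 strcmp(f->severity_nonlocalized, "PANIC") == 0);
}
```

For the message shown above, the body would carry V=FATAL and C=57P01,
so closes_connection() returns true and the client can treat the closure
as expected instead of appending the "unexpectedly" text.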

Because of this change, postgres_fdw needed a small adjustment too. It
special-cased ERRCODE_CONNECTION_FAILURE as the signal to reconnect, but
when the remote backend is terminated by an administrator the error is
now reported as ERRCODE_ADMIN_SHUTDOWN on Linux. This updates the check
to reconnect on any error code, as long as the error actually comes from
the foreign server. That behavioral change aligns better with the
concern Tom Lane raised in the thread that introduced this check[1].
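
A rough C model of the old and new reconnect conditions may make the
difference concrete. It is a sketch under stated assumptions, not the
postgres_fdw code: conn_status and caught_error are hypothetical
stand-ins for libpq's ConnStatusType and the relevant ErrorData fields,
SQLSTATEs are kept as strings rather than PostgreSQL's packed error
codes, and the conditions are written as "should retry" instead of the
negated re-throw test used in GetConnection().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for libpq's ConnStatusType. */
typedef enum
{
	CONN_STATUS_OK,
	CONN_STATUS_BAD
} conn_status;

/* Hypothetical stand-in for the ErrorData fields the check inspects. */
typedef struct
{
	const char *sqlstate;		/* five-character SQLSTATE, e.g. "57P01" */
	const char *funcname;		/* function that raised the error, or NULL */
} caught_error;

/* Old rule: reconnect only for connection_failure (SQLSTATE 08006). */
static bool
should_retry_old(conn_status status, int xact_depth, const caught_error *err)
{
	assert(err != NULL && err->sqlstate != NULL);
	return strcmp(err->sqlstate, "08006") == 0 &&
		status == CONN_STATUS_BAD &&
		xact_depth == 0;
}

/*
 * New rule: any SQLSTATE qualifies, provided the connection is bad, no
 * remote transaction is open yet (xact_depth == 0), and the error came from
 * the remote-error reporting path (pgfdw_report_internal in the patch).
 */
static bool
should_retry_new(conn_status status, int xact_depth, const caught_error *err)
{
	assert(err != NULL);
	return status == CONN_STATUS_BAD &&
		xact_depth == 0 &&
		err->funcname != NULL &&
		strcmp(err->funcname, "pgfdw_report_internal") == 0;
}
```

With these rules, a FATAL 57P01 (ERRCODE_ADMIN_SHUTDOWN) reported
through pgfdw_report_internal() is retried under the new rule but not the
old one, while a local error such as out-of-memory (53200) raised
elsewhere is re-thrown under both.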

[1]: https://www.postgresql.org/message-id/3483167.1602614058%40sss.pgh.pa.us
---
 contrib/postgres_fdw/Makefile                 |  2 +-
 contrib/postgres_fdw/connection.c             | 30 ++++----
 .../postgres_fdw/expected/postgres_fdw.out    | 45 +-----------
 .../expected/postgres_fdw_fatal.out           | 68 ++++++++++++++++++
 .../expected/postgres_fdw_fatal_1.out         | 71 +++++++++++++++++++
 contrib/postgres_fdw/meson.build              |  1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 40 +----------
 .../postgres_fdw/sql/postgres_fdw_fatal.sql   | 63 ++++++++++++++++
 src/bin/psql/t/001_basic.pl                   | 62 +++++++++++++---
 src/interfaces/libpq/fe-protocol3.c           | 26 +++++++
 10 files changed, 303 insertions(+), 105 deletions(-)
 create mode 100644 contrib/postgres_fdw/expected/postgres_fdw_fatal.out
 create mode 100644 contrib/postgres_fdw/expected/postgres_fdw_fatal_1.out
 create mode 100644 contrib/postgres_fdw/sql/postgres_fdw_fatal.sql

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index b8c78b58804..c417aa69493 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql postgres_fdw--1.1--1.2.sql postgres_fdw--1.2--1.3.sql
 
-REGRESS = postgres_fdw query_cancel
+REGRESS = postgres_fdw query_cancel postgres_fdw_fatal
 ISOLATION = eval_plan_qual
 ISOLATION_OPTS = --load-extension=postgres_fdw
 TAP_TESTS = 1
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index aab21695979..f7e9ba5ce9c 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -310,20 +310,24 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 		/*
 		 * Determine whether to try to reestablish the connection.
 		 *
-		 * After a broken connection is detected in libpq, any error other
-		 * than connection failure (e.g., out-of-memory) can be thrown
-		 * somewhere between return from libpq and the expected ereport() call
-		 * in pgfdw_report_error(). In this case, since PQstatus() indicates
-		 * CONNECTION_BAD, checking only PQstatus() causes the false detection
-		 * of connection failure. To avoid this, we also verify that the
-		 * error's sqlstate is ERRCODE_CONNECTION_FAILURE. Note that also
-		 * checking only the sqlstate can cause another false detection
-		 * because pgfdw_report_error() may report ERRCODE_CONNECTION_FAILURE
-		 * for any libpq-originated error condition.
+		 * We retry only if the remote connection has actually been lost:
+		 * libpq has marked it CONNECTION_BAD and the error we caught is one
+		 * that pgfdw_report_error() raised to report a remote failure.
+		 * Checking the origin is essential: once libpq has flagged
+		 * CONNECTION_BAD, an unrelated error (e.g., out-of-memory, or a local
+		 * statement cancel) can be thrown somewhere between the return from
+		 * libpq and that expected ereport(). Such an error also sees
+		 * CONNECTION_BAD, so relying on PQstatus() alone would wrongly
+		 * swallow it and retry. The error's funcname tells us where it was
+		 * raised: every remote failure, including a FATAL that the server
+		 * sends as it closes the connection, is reported from
+		 * pgfdw_report_internal(), whereas a stray local error is raised
+		 * elsewhere. (Keep this name in sync with that function.)
 		 */
-		if (errdata->sqlerrcode != ERRCODE_CONNECTION_FAILURE ||
-			PQstatus(entry->conn) != CONNECTION_BAD ||
-			entry->xact_depth > 0)
+		if (PQstatus(entry->conn) != CONNECTION_BAD ||
+			entry->xact_depth > 0 ||
+			errdata->funcname == NULL ||
+			strcmp(errdata->funcname, "pgfdw_report_internal") != 0)
 		{
 			MemoryContextSwitchTo(ecxt);
 			PG_RE_THROW();
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index e90289e4ab1..16dd89b9b4e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10797,49 +10797,6 @@ PREPARE TRANSACTION 'fdw_tpc';
 ERROR:  cannot PREPARE a transaction that has operated on postgres_fdw foreign tables
 ROLLBACK;
 WARNING:  there is no transaction in progress
--- ===================================================================
--- reestablish new connection
--- ===================================================================
--- Change application_name of remote connection to special one
--- so that we can easily terminate the connection later.
-ALTER SERVER loopback OPTIONS (application_name 'fdw_retry_check');
--- Make sure we have a remote connection.
-SELECT 1 FROM ft1 LIMIT 1;
- ?column? 
-----------
-        1
-(1 row)
-
--- Terminate the remote connection and wait for the termination to complete.
--- (If a cache flush happens, the remote connection might have already been
--- dropped; so code this step in a way that doesn't fail if no connection.)
-DO $$ BEGIN
-PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
-	WHERE application_name = 'fdw_retry_check';
-END $$;
--- This query should detect the broken connection when starting new remote
--- transaction, reestablish new connection, and then succeed.
-BEGIN;
-SELECT 1 FROM ft1 LIMIT 1;
- ?column? 
-----------
-        1
-(1 row)
-
--- If we detect the broken connection when starting a new remote
--- subtransaction, we should fail instead of establishing a new connection.
--- Terminate the remote connection and wait for the termination to complete.
-DO $$ BEGIN
-PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
-	WHERE application_name = 'fdw_retry_check';
-END $$;
-SAVEPOINT s;
--- The text of the error might vary across platforms, so only show SQLSTATE.
-\set VERBOSITY sqlstate
-SELECT 1 FROM ft1 LIMIT 1;    -- should fail
-ERROR:  08006
-\set VERBOSITY default
-COMMIT;
 -- =============================================================================
 -- test connection invalidation cases and postgres_fdw_get_connections function
 -- =============================================================================
@@ -12947,7 +12904,7 @@ SELECT 1 FROM postgres_fdw_disconnect_all();
         1
 (1 row)
 
-ALTER SERVER loopback OPTIONS (SET application_name 'fdw_conn_check');
+ALTER SERVER loopback OPTIONS (application_name 'fdw_conn_check');
 SELECT 1 FROM ft1 LIMIT 1;
  ?column? 
 ----------
diff --git a/contrib/postgres_fdw/expected/postgres_fdw_fatal.out b/contrib/postgres_fdw/expected/postgres_fdw_fatal.out
new file mode 100644
index 00000000000..6eaecf508a6
--- /dev/null
+++ b/contrib/postgres_fdw/expected/postgres_fdw_fatal.out
@@ -0,0 +1,68 @@
+-- ===================================================================
+-- reestablish a connection broken while idle
+-- ===================================================================
+-- When a cached connection's remote backend goes away while the connection
+-- sits idle, postgres_fdw should transparently reconnect at the start of a new
+-- remote transaction, but must fail (not reconnect) when starting a remote
+-- subtransaction.  The SQLSTATE of that failure depends on whether the server's
+-- FATAL message reaches us before the dead connection is reset, which varies by
+-- platform, so this test has alternative expected outputs and lives on its own.
+SELECT current_database() AS current_database,
+  current_setting('port') AS current_port \gset
+-- Use a dedicated server with a distinctive application_name so we can find and
+-- terminate exactly this test's remote backend.
+CREATE SERVER fdw_retry_server FOREIGN DATA WRAPPER postgres_fdw
+  OPTIONS (dbname :'current_database', port :'current_port',
+           application_name 'fdw_retry_check');
+CREATE USER MAPPING FOR CURRENT_USER SERVER fdw_retry_server;
+CREATE TABLE fdw_retry_local (c1 int);
+INSERT INTO fdw_retry_local VALUES (1);
+CREATE FOREIGN TABLE fdw_retry_ft (c1 int)
+  SERVER fdw_retry_server OPTIONS (table_name 'fdw_retry_local');
+-- Make sure we have a remote connection.
+SELECT 1 FROM fdw_retry_ft LIMIT 1;
+ ?column? 
+----------
+        1
+(1 row)
+
+-- Terminate the remote connection and wait for the termination to complete.
+-- (If a cache flush happens, the remote connection might have already been
+-- dropped; so code this step in a way that doesn't fail if no connection.)
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
+	WHERE application_name = 'fdw_retry_check';
+END $$;
+-- This query should detect the broken connection when starting a new remote
+-- transaction, reestablish a new connection, and then succeed.
+BEGIN;
+SELECT 1 FROM fdw_retry_ft LIMIT 1;
+ ?column? 
+----------
+        1
+(1 row)
+
+-- If we detect the broken connection when starting a new remote
+-- subtransaction, we should fail instead of establishing a new connection.
+-- Terminate the remote connection and wait for the termination to complete.
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
+	WHERE application_name = 'fdw_retry_check';
+END $$;
+SAVEPOINT s;
+-- This should fail.  On most platforms libpq receives the server's FATAL
+-- message before the dead connection is torn down, so we report it verbatim
+-- ("terminating connection due to administrator command") -- in particular
+-- without the bogus "server closed the connection unexpectedly" text.  The
+-- exact message and SQLSTATE can differ on platforms where the connection is
+-- reset before the message arrives (e.g. Windows), so this test has
+-- alternative expected output files.
+SELECT 1 FROM fdw_retry_ft LIMIT 1;    -- should fail
+ERROR:  terminating connection due to administrator command
+CONTEXT:  remote SQL command: SAVEPOINT s2
+COMMIT;
+-- Clean up
+DROP FOREIGN TABLE fdw_retry_ft;
+DROP TABLE fdw_retry_local;
+DROP USER MAPPING FOR CURRENT_USER SERVER fdw_retry_server;
+DROP SERVER fdw_retry_server;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw_fatal_1.out b/contrib/postgres_fdw/expected/postgres_fdw_fatal_1.out
new file mode 100644
index 00000000000..95c1058f707
--- /dev/null
+++ b/contrib/postgres_fdw/expected/postgres_fdw_fatal_1.out
@@ -0,0 +1,71 @@
+-- ===================================================================
+-- reestablish a connection broken while idle
+-- ===================================================================
+-- When a cached connection's remote backend goes away while the connection
+-- sits idle, postgres_fdw should transparently reconnect at the start of a new
+-- remote transaction, but must fail (not reconnect) when starting a remote
+-- subtransaction.  The SQLSTATE of that failure depends on whether the server's
+-- FATAL message reaches us before the dead connection is reset, which varies by
+-- platform, so this test has alternative expected outputs and lives on its own.
+SELECT current_database() AS current_database,
+  current_setting('port') AS current_port \gset
+-- Use a dedicated server with a distinctive application_name so we can find and
+-- terminate exactly this test's remote backend.
+CREATE SERVER fdw_retry_server FOREIGN DATA WRAPPER postgres_fdw
+  OPTIONS (dbname :'current_database', port :'current_port',
+           application_name 'fdw_retry_check');
+CREATE USER MAPPING FOR CURRENT_USER SERVER fdw_retry_server;
+CREATE TABLE fdw_retry_local (c1 int);
+INSERT INTO fdw_retry_local VALUES (1);
+CREATE FOREIGN TABLE fdw_retry_ft (c1 int)
+  SERVER fdw_retry_server OPTIONS (table_name 'fdw_retry_local');
+-- Make sure we have a remote connection.
+SELECT 1 FROM fdw_retry_ft LIMIT 1;
+ ?column? 
+----------
+        1
+(1 row)
+
+-- Terminate the remote connection and wait for the termination to complete.
+-- (If a cache flush happens, the remote connection might have already been
+-- dropped; so code this step in a way that doesn't fail if no connection.)
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
+	WHERE application_name = 'fdw_retry_check';
+END $$;
+-- This query should detect the broken connection when starting a new remote
+-- transaction, reestablish a new connection, and then succeed.
+BEGIN;
+SELECT 1 FROM fdw_retry_ft LIMIT 1;
+ ?column? 
+----------
+        1
+(1 row)
+
+-- If we detect the broken connection when starting a new remote
+-- subtransaction, we should fail instead of establishing a new connection.
+-- Terminate the remote connection and wait for the termination to complete.
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
+	WHERE application_name = 'fdw_retry_check';
+END $$;
+SAVEPOINT s;
+-- This should fail.  On most platforms libpq receives the server's FATAL
+-- message before the dead connection is torn down, so we report it verbatim
+-- ("terminating connection due to administrator command") -- in particular
+-- without the bogus "server closed the connection unexpectedly" text.  The
+-- exact message and SQLSTATE can differ on platforms where the connection is
+-- reset before the message arrives (e.g. Windows), so this test has
+-- alternative expected output files.
+SELECT 1 FROM fdw_retry_ft LIMIT 1;    -- should fail
+ERROR:  server closed the connection unexpectedly
+	This probably means the server terminated abnormally
+	before or while processing the request.
+invalid socket
+CONTEXT:  remote SQL command: SAVEPOINT s2
+COMMIT;
+-- Clean up
+DROP FOREIGN TABLE fdw_retry_ft;
+DROP TABLE fdw_retry_local;
+DROP USER MAPPING FOR CURRENT_USER SERVER fdw_retry_server;
+DROP SERVER fdw_retry_server;
diff --git a/contrib/postgres_fdw/meson.build b/contrib/postgres_fdw/meson.build
index 3e2ed06b766..a20438e5062 100644
--- a/contrib/postgres_fdw/meson.build
+++ b/contrib/postgres_fdw/meson.build
@@ -39,6 +39,7 @@ tests += {
     'sql': [
       'postgres_fdw',
       'query_cancel',
+      'postgres_fdw_fatal',
     ],
     'regress_args': ['--dlpath', meson.project_build_root() / 'src/test/regress'],
   },
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index dfc58beb0d2..d37e64e075f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3504,44 +3504,6 @@ SELECT count(*) FROM ft1;
 PREPARE TRANSACTION 'fdw_tpc';
 ROLLBACK;
 
--- ===================================================================
--- reestablish new connection
--- ===================================================================
-
--- Change application_name of remote connection to special one
--- so that we can easily terminate the connection later.
-ALTER SERVER loopback OPTIONS (application_name 'fdw_retry_check');
-
--- Make sure we have a remote connection.
-SELECT 1 FROM ft1 LIMIT 1;
-
--- Terminate the remote connection and wait for the termination to complete.
--- (If a cache flush happens, the remote connection might have already been
--- dropped; so code this step in a way that doesn't fail if no connection.)
-DO $$ BEGIN
-PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
-	WHERE application_name = 'fdw_retry_check';
-END $$;
-
--- This query should detect the broken connection when starting new remote
--- transaction, reestablish new connection, and then succeed.
-BEGIN;
-SELECT 1 FROM ft1 LIMIT 1;
-
--- If we detect the broken connection when starting a new remote
--- subtransaction, we should fail instead of establishing a new connection.
--- Terminate the remote connection and wait for the termination to complete.
-DO $$ BEGIN
-PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
-	WHERE application_name = 'fdw_retry_check';
-END $$;
-SAVEPOINT s;
--- The text of the error might vary across platforms, so only show SQLSTATE.
-\set VERBOSITY sqlstate
-SELECT 1 FROM ft1 LIMIT 1;    -- should fail
-\set VERBOSITY default
-COMMIT;
-
 -- =============================================================================
 -- test connection invalidation cases and postgres_fdw_get_connections function
 -- =============================================================================
@@ -4603,7 +4565,7 @@ SET debug_discard_caches TO '0';
 \set VERBOSITY sqlstate
 
 SELECT 1 FROM postgres_fdw_disconnect_all();
-ALTER SERVER loopback OPTIONS (SET application_name 'fdw_conn_check');
+ALTER SERVER loopback OPTIONS (application_name 'fdw_conn_check');
 SELECT 1 FROM ft1 LIMIT 1;
 
 -- Since the remote server is still connected, "closed" should be FALSE,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw_fatal.sql b/contrib/postgres_fdw/sql/postgres_fdw_fatal.sql
new file mode 100644
index 00000000000..2ee2db76a5f
--- /dev/null
+++ b/contrib/postgres_fdw/sql/postgres_fdw_fatal.sql
@@ -0,0 +1,63 @@
+-- ===================================================================
+-- reestablish a connection broken while idle
+-- ===================================================================
+-- When a cached connection's remote backend goes away while the connection
+-- sits idle, postgres_fdw should transparently reconnect at the start of a new
+-- remote transaction, but must fail (not reconnect) when starting a remote
+-- subtransaction.  The SQLSTATE of that failure depends on whether the server's
+-- FATAL message reaches us before the dead connection is reset, which varies by
+-- platform, so this test has alternative expected outputs and lives on its own.
+
+SELECT current_database() AS current_database,
+  current_setting('port') AS current_port \gset
+
+-- Use a dedicated server with a distinctive application_name so we can find and
+-- terminate exactly this test's remote backend.
+CREATE SERVER fdw_retry_server FOREIGN DATA WRAPPER postgres_fdw
+  OPTIONS (dbname :'current_database', port :'current_port',
+           application_name 'fdw_retry_check');
+CREATE USER MAPPING FOR CURRENT_USER SERVER fdw_retry_server;
+CREATE TABLE fdw_retry_local (c1 int);
+INSERT INTO fdw_retry_local VALUES (1);
+CREATE FOREIGN TABLE fdw_retry_ft (c1 int)
+  SERVER fdw_retry_server OPTIONS (table_name 'fdw_retry_local');
+
+-- Make sure we have a remote connection.
+SELECT 1 FROM fdw_retry_ft LIMIT 1;
+
+-- Terminate the remote connection and wait for the termination to complete.
+-- (If a cache flush happens, the remote connection might have already been
+-- dropped; so code this step in a way that doesn't fail if no connection.)
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
+	WHERE application_name = 'fdw_retry_check';
+END $$;
+
+-- This query should detect the broken connection when starting a new remote
+-- transaction, reestablish a new connection, and then succeed.
+BEGIN;
+SELECT 1 FROM fdw_retry_ft LIMIT 1;
+
+-- If we detect the broken connection when starting a new remote
+-- subtransaction, we should fail instead of establishing a new connection.
+-- Terminate the remote connection and wait for the termination to complete.
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
+	WHERE application_name = 'fdw_retry_check';
+END $$;
+SAVEPOINT s;
+-- This should fail.  On most platforms libpq receives the server's FATAL
+-- message before the dead connection is torn down, so we report it verbatim
+-- ("terminating connection due to administrator command") -- in particular
+-- without the bogus "server closed the connection unexpectedly" text.  The
+-- exact message and SQLSTATE can differ on platforms where the connection is
+-- reset before the message arrives (e.g. Windows), so this test has
+-- alternative expected output files.
+SELECT 1 FROM fdw_retry_ft LIMIT 1;    -- should fail
+COMMIT;
+
+-- Clean up
+DROP FOREIGN TABLE fdw_retry_ft;
+DROP TABLE fdw_retry_local;
+DROP USER MAPPING FOR CURRENT_USER SERVER fdw_retry_server;
+DROP SERVER fdw_retry_server;
diff --git a/src/bin/psql/t/001_basic.pl b/src/bin/psql/t/001_basic.pl
index bbd330216ae..fb2cb8ad29a 100644
--- a/src/bin/psql/t/001_basic.pl
+++ b/src/bin/psql/t/001_basic.pl
@@ -133,23 +133,69 @@ NOTIFY foo, 'bar';",
 	qr/^Asynchronous notification "foo" with payload "bar" received from server process with PID \d+\.$/,
 	'notification with payload');
 
-# test behavior and output on server crash
+# Test behavior and output when the server closes the connection with a FATAL
+# error.  Because the server told us it was closing the connection, we should
+# show its message verbatim and must not append the generic "server closed the
+# connection unexpectedly" text, which would both mislead users and hide the
+# server's own error from clients parsing it.
 my ($ret, $out, $err) = $node->psql('postgres',
 		"SELECT 'before' AS running;\n"
 	  . "SELECT pg_terminate_backend(pg_backend_pid());\n"
 	  . "SELECT 'AFTER' AS not_running;\n");
 
-is($ret, 2, 'server crash: psql exit code');
-like($out, qr/before/, 'server crash: output before crash');
-unlike($out, qr/AFTER/, 'server crash: no output after crash');
+is($ret, 2, 'FATAL termination: psql exit code');
+like($out, qr/before/, 'FATAL termination: output before termination');
+unlike($out, qr/AFTER/, 'FATAL termination: no output after termination');
 like(
 	$err,
 	qr/psql:<stdin>:2: FATAL:  terminating connection due to administrator command
-psql:<stdin>:2: server closed the connection unexpectedly
-	This probably means the server terminated abnormally
-	before or while processing the request.
 psql:<stdin>:2: error: connection to server was lost/,
-	'server crash: error message');
+	'FATAL termination: error message');
+unlike(
+	$err,
+	qr/server closed the connection unexpectedly/,
+	'FATAL termination: no bogus unexpected-closure message');
+
+# In contrast, when the backend genuinely crashes it has no chance to send a
+# message, so here psql *should* still fall back to the generic "server closed
+# the connection unexpectedly" report.  Use a dedicated node and a long-lived
+# psql session, since we need an established connection whose backend we can
+# SIGKILL out from under it.  background_psql() is unsuitable here: psql exits
+# once the connection is lost, so we drive the session with IPC::Run directly
+# and watch stderr with pump_until(), which tolerates psql exiting.
+my $crash_node = PostgreSQL::Test::Cluster->new('crash');
+$crash_node->init;
+$crash_node->start;
+
+my $crash_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);
+my ($crash_stdin, $crash_stdout, $crash_stderr) = ('', '', '');
+my $crash_psql = IPC::Run::start(
+	[
+		'psql', '--no-psqlrc', '--quiet', '--no-align', '--tuples-only',
+		'--file' => '-',
+		'--dbname' => $crash_node->connstr('postgres')
+	],
+	'<' => \$crash_stdin,
+	'>' => \$crash_stdout,
+	'2>' => \$crash_stderr,
+	$crash_timeout);
+
+$crash_stdin .= "SELECT pg_backend_pid();\n";
+pump_until($crash_psql, $crash_timeout, \$crash_stdout, qr/[[:digit:]]+[\r\n]$/m);
+my $crash_pid = $crash_stdout;
+chomp $crash_pid;
+
+# SIGKILL leaves the backend no opportunity to tell the client anything.
+PostgreSQL::Test::Utils::system_or_bail('pg_ctl', 'kill', 'KILL', $crash_pid);
+
+# The next query notices that the connection is gone.
+$crash_stdin .= "SELECT 1;\n";
+ok( pump_until(
+		$crash_psql, $crash_timeout, \$crash_stderr,
+		qr/server closed the connection unexpectedly|could not send data to server|connection to server was lost/
+	),
+	'genuine crash: unexpected closure reported');
+$crash_psql->finish;
 
 # test \errverbose
 #
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 78ffb1025d0..f73ba94dbee 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -955,6 +955,32 @@ pqGetErrorNotice3(PGconn *conn, bool isError)
 					sizeof(conn->last_sqlstate));
 		else if (id == PG_DIAG_STATEMENT_POSITION)
 			have_position = true;
+		else if (id == PG_DIAG_SEVERITY_NONLOCALIZED &&
+				 (strcmp(workBuf.data, "FATAL") == 0 ||
+				  strcmp(workBuf.data, "PANIC") == 0))
+		{
+			/*
+			 * A FATAL or PANIC from the server means the backend is going to
+			 * tear the connection down right after delivering this message.
+			 * Mark the connection bad immediately so callers that drain
+			 * results (PQexecFinish, PQexecStart's discard loop, etc.) stop
+			 * reading from the socket after receiving this result. Further
+			 * reads from the socket will receive an EOF, which would cause us
+			 * to incorrectly report this as an unexpected connection closure
+			 * by appending "server closed the connection unexpectedly ..." to
+			 * the server's own error message. We read SEVERITY_NONLOCALIZED
+			 * rather than SEVERITY so the check is independent of the
+			 * server's lc_messages setting.
+			 *
+			 * We do this regardless of "isError", i.e. even when the message
+			 * arrives while we are idle and is being delivered to the notice
+			 * processor rather than returned as a result.  A FATAL/PANIC
+			 * always means the connection is going away, so a client that
+			 * drains input (e.g. before sending its next command) can detect
+			 * the closure here instead of only when a later read hits EOF.
+			 */
+			conn->status = CONNECTION_BAD;
+		}
 	}
 
 	/*

base-commit: e18b0cb7344cb4bd28468f6c0aeeb9b9241d30aa
-- 
2.54.0
