(Alexander just reminded me of this off-list)

On 09/03/2023 20:51, Tom Lane wrote:
In [1] I wrote:

PG Bug reporting form <nore...@postgresql.org> writes:
The following script:
[ leaks a file descriptor per error ]

Yeah, at least on platforms where WaitEventSets own kernel file
descriptors.  I don't think it's postgres_fdw's fault though,
but that of ExecAppendAsyncEventWait, which is ignoring the
possibility of failing partway through.  It looks like it'd be
sufficient to add a PG_CATCH or PG_FINALLY block there to make
sure the WaitEventSet is disposed of properly --- fortunately,
it doesn't need to have any longer lifespan than that one
function.

Here's a patch to do that, for the back branches.

After further thought that seems like a pretty ad-hoc solution.
We probably can do no better in the back branches, but shouldn't
we start treating WaitEventSets as ResourceOwner-managed resources?
Otherwise, transient WaitEventSets are going to be a permanent
source of headaches.

Agreed. The current signature of CreateWaitEventSet is:

WaitEventSet *
CreateWaitEventSet(MemoryContext context, int nevents)

Passing a MemoryContext makes little sense when the WaitEventSet also holds file descriptors. With anything other than TopMemoryContext, you need to arrange for proper cleanup with PG_TRY-PG_CATCH or by avoiding ereport() calls. And once you've arranged for cleanup, the memory context doesn't matter much anymore.
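
To illustrate why the error path needs explicit cleanup, here is a toy model in plain C. It does not use any PostgreSQL headers: the sk_* names are hypothetical, setjmp/longjmp stands in for the PG_TRY/ereport() machinery, and a counter stands in for the kernel file descriptors. It is only a sketch of the failure mode, not how the backend implements it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins, not PostgreSQL code. */
static jmp_buf *sk_exception_stack = NULL;	/* plays the role of PG_exception_stack */
int			sk_open_fds = 0;	/* simulated descriptors currently open */

/* Simulated ereport(ERROR): jump to the innermost handler. */
static void
sk_throw(void)
{
	assert(sk_exception_stack != NULL);
	longjmp(*sk_exception_stack, 1);
}

/* No cleanup on error: the simulated descriptor stays open. */
void
sk_wait_leaky(int fail)
{
	sk_open_fds++;				/* "CreateWaitEventSet" */
	if (fail)
		sk_throw();
	sk_open_fds--;				/* "FreeWaitEventSet", skipped on error */
}

/* Cleanup runs on both paths, then the error is re-thrown. */
void
sk_wait_guarded(int fail)
{
	jmp_buf    *saved = sk_exception_stack;
	jmp_buf		local;
	volatile int failed = 0;

	sk_open_fds++;				/* "CreateWaitEventSet" */
	if (setjmp(local) == 0)
	{
		sk_exception_stack = &local;
		if (fail)
			sk_throw();
	}
	else
		failed = 1;
	sk_exception_stack = saved;	/* restore the outer handler */
	sk_open_fds--;				/* the "finally" part */
	if (failed)
		sk_throw();				/* propagate to the outer handler */
}

/* Top-level handler: returns 1 if fn(fail) threw, 0 otherwise. */
int
sk_run(void (*fn) (int), int fail)
{
	jmp_buf		top;
	volatile int threw = 0;

	if (setjmp(top) == 0)
	{
		sk_exception_stack = &top;
		fn(fail);
	}
	else
		threw = 1;
	sk_exception_stack = NULL;
	return threw;
}
```

Calling sk_run(sk_wait_leaky, 1) leaves sk_open_fds at 1 after the error, while sk_run(sk_wait_guarded, 1) brings it back to 0. Note that in the guarded version the memory context plays no part in getting the descriptor closed.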

Let's change it so that it's always allocated in TopMemoryContext, but pass a ResourceOwner instead:

WaitEventSet *
CreateWaitEventSet(ResourceOwner owner, int nevents)

And use owner == NULL to mean session lifetime.
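
A rough sketch of the ownership semantics this would give, again as a standalone toy rather than backend code. ToyResourceOwner, ToyWaitEventSet and the toy_* functions are all hypothetical, the fixed-size array is an arbitrary simplification, and a counter stands in for the file descriptors; the real ResourceOwner machinery works differently in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; not PostgreSQL's ResourceOwner or WaitEventSet. */
#define TOY_MAX_OWNED 8

struct ToyResourceOwner;

typedef struct ToyWaitEventSet
{
	struct ToyResourceOwner *owner;	/* NULL means session lifetime */
	int			nevents;
} ToyWaitEventSet;

typedef struct ToyResourceOwner
{
	ToyWaitEventSet *sets[TOY_MAX_OWNED];
	int			nsets;
} ToyResourceOwner;

int			toy_open_fds = 0;	/* simulated descriptors currently open */

/* Create a set and, if an owner is given, remember it there. */
ToyWaitEventSet *
toy_create_wait_event_set(ToyResourceOwner *owner, int nevents)
{
	ToyWaitEventSet *set;

	if (owner != NULL && owner->nsets >= TOY_MAX_OWNED)
		return NULL;
	set = malloc(sizeof(ToyWaitEventSet));
	if (set == NULL)
		return NULL;
	set->owner = owner;
	set->nevents = nevents;
	toy_open_fds++;
	if (owner != NULL)
		owner->sets[owner->nsets++] = set;
	return set;
}

/* Explicit free: forget the set in its owner, then close and release it. */
void
toy_free_wait_event_set(ToyWaitEventSet *set)
{
	ToyResourceOwner *owner = set->owner;

	if (owner != NULL)
	{
		for (int i = 0; i < owner->nsets; i++)
		{
			if (owner->sets[i] == set)
			{
				owner->nsets--;
				owner->sets[i] = owner->sets[owner->nsets];
				break;
			}
		}
	}
	toy_open_fds--;
	free(set);
}

/* What an owner would do on release (e.g. at error cleanup). */
void
toy_resource_owner_release(ToyResourceOwner *owner)
{
	while (owner->nsets > 0)
		toy_free_wait_event_set(owner->sets[owner->nsets - 1]);
}
```

With this shape, a caller that errors out before reaching its explicit free still gets the set closed when the owner is released, and a set created with owner == NULL stays around until it is freed explicitly.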

--
Heikki Linnakangas
Neon (https://neon.tech)
From b9ea609855b838369cddb33e4045ac91603dd726 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Wed, 15 Nov 2023 23:44:56 +0100
Subject: [PATCH v1 1/1] Fix resource leak when a FDW's ForeignAsyncRequest
 function fails

If an error is thrown after calling CreateWaitEventSet(), the memory
of a WaitEventSet is free'd as it's allocated in the short-lived
memory context, but the file descriptor (on epoll- or kqueue-based
systems) or handles (on Windows) that it contains are leaked. Use
PG_TRY-FINALLY to ensure it gets freed.

In passing, fix a misleading comment on what the 'nevents' argument
to WaitEventSetWait means.

The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.

Report by Alexander Lakhin, analysis and suggestion for the fix by
Tom Lane.

Discussion: https://www.postgresql.org/message-id/17828-122da8cba2323...@postgresql.org
Discussion: https://www.postgresql.org/message-id/472235.1678387...@sss.pgh.pa.us
---
 .../postgres_fdw/expected/postgres_fdw.out    |  7 ++
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  6 ++
 src/backend/executor/nodeAppend.c             | 66 +++++++++++--------
 3 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 64bcc66b8d..22cae37a1e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10809,6 +10809,13 @@ SELECT * FROM result_tbl ORDER BY a;
 (2 rows)
 
 DELETE FROM result_tbl;
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+ERROR:  relation "public.non_existent_table" does not exist
+CONTEXT:  remote SQL command: SELECT a, b, c FROM public.non_existent_table
+DROP FOREIGN TABLE async_p_broken;
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2d14eeadb5..075da4ff86 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3607,6 +3607,12 @@ INSERT INTO result_tbl SELECT a, b, 'AAA' || c FROM async_pt WHERE b === 505;
 SELECT * FROM result_tbl ORDER BY a;
 DELETE FROM result_tbl;
 
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+DROP FOREIGN TABLE async_p_broken;
+
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..99818d3ebc 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -1025,43 +1025,51 @@ ExecAppendAsyncEventWait(AppendState *node)
 	/* We should never be called when there are no valid async subplans. */
 	Assert(node->as_nasyncremain > 0);
 
+	Assert(node->as_eventset == NULL);
 	node->as_eventset = CreateWaitEventSet(CurrentMemoryContext, nevents);
-	AddWaitEventToSet(node->as_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
-					  NULL, NULL);
-
-	/* Give each waiting subplan a chance to add an event. */
-	i = -1;
-	while ((i = bms_next_member(node->as_asyncplans, i)) >= 0)
+	PG_TRY();
 	{
-		AsyncRequest *areq = node->as_asyncrequests[i];
+		AddWaitEventToSet(node->as_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
+						  NULL, NULL);
 
-		if (areq->callback_pending)
-			ExecAsyncConfigureWait(areq);
-	}
+		/* Give each waiting subplan a chance to add an event. */
+		i = -1;
+		while ((i = bms_next_member(node->as_asyncplans, i)) >= 0)
+		{
+			AsyncRequest *areq = node->as_asyncrequests[i];
 
-	/*
-	 * No need for further processing if there are no configured events other
-	 * than the postmaster death event.
-	 */
-	if (GetNumRegisteredWaitEvents(node->as_eventset) == 1)
+			if (areq->callback_pending)
+				ExecAsyncConfigureWait(areq);
+		}
+
+		/*
+		 * No need for further processing if there are no configured events
+		 * other than the postmaster death event.
+		 */
+		if (GetNumRegisteredWaitEvents(node->as_eventset) == 1)
+		{
+			FreeWaitEventSet(node->as_eventset);
+			node->as_eventset = NULL;
+			return;
+		}
+
+		/* Return at most EVENT_BUFFER_SIZE events in one call. */
+		if (nevents > EVENT_BUFFER_SIZE)
+			nevents = EVENT_BUFFER_SIZE;
+
+		/*
+		 * If the timeout is -1, wait until at least one event occurs.  If the
+		 * timeout is 0, poll for events, but do not wait at all.
+		 */
+		noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
+									 nevents, WAIT_EVENT_APPEND_READY);
+	}
+	PG_FINALLY();
 	{
 		FreeWaitEventSet(node->as_eventset);
 		node->as_eventset = NULL;
-		return;
 	}
-
-	/* We wait on at most EVENT_BUFFER_SIZE events. */
-	if (nevents > EVENT_BUFFER_SIZE)
-		nevents = EVENT_BUFFER_SIZE;
-
-	/*
-	 * If the timeout is -1, wait until at least one event occurs.  If the
-	 * timeout is 0, poll for events, but do not wait at all.
-	 */
-	noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
-								 nevents, WAIT_EVENT_APPEND_READY);
-	FreeWaitEventSet(node->as_eventset);
-	node->as_eventset = NULL;
+	PG_END_TRY();
 	if (noccurred == 0)
 		return;
 
-- 
2.39.2
