Hi, As discussed in [1], in postgres_fdw the cached connections to remote servers can stay until the lifetime of the local session without getting a chance to disconnect (connection leak), if the underlying user mapping or foreign server is dropped in another session. Here are few scenarios how this can happen:
Use case 1: 1) Run a foreign query in session 1 with server 1 and user mapping 1 2) Drop user mapping 1 in another session 2, an invalidation message gets generated which will have to be processed by all the sessions 3) Run the foreign query again in session 1, at the start of txn the cached entry gets invalidated via pgfdw_inval_callback() (as part of invalidation message processing). Whatever may be the type of foreign query (select, update, explain, delete, insert, analyze etc.), upon next call to GetUserMapping() from postgres_fdw.c, the cache lookup fails(with ERROR: user mapping not found for "XXXX") since the user mapping 1 has been dropped in session 2 and the query will also fail before reaching GetConnection() where the connections associated with the invalidated entries would have got disconnected. So, the connection associated with invalidated entry would remain until the local session exits. Use case 2: 1) Run a foreign query in session 1 with server 1 and user mapping 1 2) Try to drop foreign server 1, then we would not be allowed because of dependency. Use CASCADE so that dependent objects i.e. user mapping 1 and foreign tables get dropped along with foreign server 1. 3) Run the foreign query again in session 1, at the start of txn, the cached entry gets invalidated via pgfdw_inval_callback() and the query fails because there is no foreign table. Note that the remote connection remains open in session 1 until the local session exits. To solve the above connection leak problem, it looks like the right place to close all the invalid connections is pgfdw_xact_callback(), once registered, which gets called at the end of every txn in the current session(by then all the sub txns also would have been finished). Note that if there are too many invalidated entries, then the following txn has to close all of them, but that's okay than having leaked connections and it's a one time job for the following one txn. Attaching a patch for the same. Thoughts? [1] - https://www.postgresql.org/message-id/flat/CALj2ACUOQYs%2BssxkxRvZ6Ja5%2Bsfc6a-s_0e-Nv2A591hEyOgiw%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
From 76ae4aad5c78ace5ab348c024e724669057e755f Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com> Date: Mon, 14 Dec 2020 22:39:46 +0530 Subject: [PATCH v1] postgres_fdw - cached connection leaks if the associated user mapping is dropped In postgres_fdw the cached connections to remote servers can stay until the lifetime of the local session, if the underlying user mapping is dropped in another session. To solve the above connection leak problem, it looks like the right place to close all the invalid connections is pgfdw_xact_callback(), once registered, which gets called at the end of every txn in the current session(by then all the sub txns also would have been finished). Note that if there are too many invalidated entries, then the following txn has to bear running this extra disconnect code, but that's okay than having leaked connections. --- contrib/postgres_fdw/connection.c | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index ab3226287d..e7ec914e48 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -73,6 +73,12 @@ static unsigned int prep_stmt_number = 0; /* tracks whether any work is needed in callback functions */ static bool xact_got_connection = false; +/* + * Tracks whether there exists at least one invalid connection in the + * connection cache. + */ +static bool have_invalid_connections = false; + /* prototypes of private functions */ static void make_new_connection(ConnCacheEntry *entry, UserMapping *user); static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user); @@ -788,8 +794,15 @@ pgfdw_xact_callback(XactEvent event, void *arg) HASH_SEQ_STATUS scan; ConnCacheEntry *entry; - /* Quick exit if no connections were touched in this transaction. */ - if (!xact_got_connection) + /* + * Quick exit if no connections were touched in this transaction and there + * are no invalid connections in the cache. If there are any invalid + * connections exists in the cache, this is the safe place to close all of + * them so that they will not be leaked in case underlying user mappings or + * foreign servers are dropped in other sessions because of which they + * may not have a chance to get disconnected in GetConnection. + */ + if (!xact_got_connection && !have_invalid_connections) return; /* @@ -943,12 +956,14 @@ pgfdw_xact_callback(XactEvent event, void *arg) entry->xact_depth = 0; /* - * If the connection isn't in a good idle state, discard it to - * recover. Next GetConnection will open a new connection. + * If the connection isn't in a good idle state or it is marked as + * invalid, then discard it to recover. Next GetConnection will open a + * new connection. */ if (PQstatus(entry->conn) != CONNECTION_OK || PQtransactionStatus(entry->conn) != PQTRANS_IDLE || - entry->changing_xact_state) + entry->changing_xact_state || + entry->invalidated) { elog(DEBUG3, "discarding connection %p", entry->conn); disconnect_pg_server(entry); @@ -962,6 +977,9 @@ pgfdw_xact_callback(XactEvent event, void *arg) */ xact_got_connection = false; + /* We are done closing all the invalidated connections so reset. */ + have_invalid_connections = false; + /* Also reset cursor numbering for next transaction */ cursor_number = 0; } @@ -1105,7 +1123,10 @@ pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue) entry->server_hashvalue == hashvalue) || (cacheid == USERMAPPINGOID && entry->mapping_hashvalue == hashvalue)) + { entry->invalidated = true; + have_invalid_connections = true; + } } } -- 2.25.1