On 2020/12/12 3:01, Bharath Rupireddy wrote:
On Fri, Dec 11, 2020 at 11:01 PM Fujii Masao
<masao.fu...@oss.nttdata.com> wrote:
On 2020/12/11 19:16, Bharath Rupireddy wrote:
On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
+                       /* We only look for active and open remote connections. 
*/
+                       if (entry->invalidated || !entry->conn)
+                               continue;

We should return even invalidated entry because it has still cached connection?


I checked this point earlier, for invalidated connections, the tuple
returned from the cache is also invalid and the following error will
be thrown. So, we can not get the server name for that user mapping.
Cache entries too would have been invalidated after the connection is
marked as invalid in pgfdw_inval_callback().

umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
if (!HeapTupleIsValid(umaptup))
        elog(ERROR, "cache lookup failed for user mapping with OID %u",
entry->key);


I further checked on returning invalidated connections in the output
of the function. Actually, the reason I'm seeing a null tuple from sys
cache (and hence the error "cache lookup failed for user mapping with
OID xxxx") for an invalidated connection is that the user mapping
(with OID entry->key that exists in the cache) is getting dropped, so
the sys cache returns null tuple. The use case is as follows:

1) Create a server, role, and user mapping of the role with the server
2) Run a foreign table query, so that the connection related to the
server gets cached
3) Issue DROP OWNED BY for the role created, since the user mapping is
dependent on that role, it gets dropped from the catalogue table and
an invalidation message will be pending to clear the sys cache
associated with that user mapping.
4) Now, if I do select * from postgres_fdw_get_connections() or for
that matter any query, at the beginning the txn
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
gets called and marks the cached entry as invalidated. Remember the
reason for this invalidation message is that the user mapping with the
OID entry->key is dropped from 3). Later in
postgres_fdw_get_connections(), when we search the sys cache with
entry->key for that invalidated connection, since the user mapping is
dropped from the system, null tuple is returned.

Thanks for the analysis! This means that the cached connection invalidated by 
drop of server or user mapping will not be closed even by the subsequent access 
to the foreign server and will remain until the backend exits. Right?

It will be first marked as invalidated via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback(),
and on the next use of that connection invalidated connections are
disconnected and reconnected.

I was thinking that in the case of drop of user mapping or server, 
hash_search(ConnnectionHash) in GetConnection() cannot find the cached connection entry 
invalidated by that drop. Because "user->umid" used as hash key is changed. So 
I was thinking that that invalidated connection will not be closed nor reconnected.



     if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
     {
         elog(DEBUG3, "closing connection %p for option changes to take effect",
              entry->conn);
         disconnect_pg_server(entry);
     }

If so, this seems like a connection-leak bug, at least for me.... Thought?


It's not a leak. The comment before pgfdw_inval_callback() [1]
explains why we can not immediately close/disconnect the connections
in pgfdw_inval_callback() after marking them as invalidated.

*If* invalidated connection cannot be close immediately even in the case of 
drop of server or user mapping, we can defer it to the subsequent call to 
GetConnection(). That is, GetConnection() closes not only the target 
invalidated connection but also the other all invalidated connections. Of 
course, invalidated connections will remain until subsequent GetConnection() is 
called, though.



Here is the scenario how in the midst of a txn we get invalidation
messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
happens):

1) select from foreign table with server1, usermapping1 in session1
2) begin a top txn in session1, run a few foreign queries that open up
sub txns internally. meanwhile alter/drop server1/usermapping1 in
session2, then at each start of sub txn also we get to process the
invalidation messages via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
So, if we disconnect right after marking invalidated in
pgfdw_inval_callback, that's a problem since we are in a sub txn under
a top txn.

Maybe. But what is the actual problem here?

OTOH, if cached connection should not be close in the middle of transaction, 
postgres_fdw_disconnect() also should be disallowed to be executed during 
transaction?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply via email to