On 2021/01/29 14:12, Bharath Rupireddy wrote:
On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
<masao.fu...@oss.nttdata.com> wrote:
On 2021/01/29 11:09, Tom Lane wrote:
Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> writes:
On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
telling us is that the patch's behavior is unstable in the face
of unexpected cache flushes.
Thanks a lot! It looks like the syscache invalidation messages are
generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
which pgfdw_inval_callback gets called many times in which the cached
entries are marked as invalid and closed if they are not used in the
txn. The new function postgres_fdw_get_connections outputs the
information of the cached connections such as name if the connection
is still open and their validity. Hence the output of the
postgres_fdw_get_connections became unstable in the buildfarm member.
I will further analyze making tests stable, meanwhile any suggestions
are welcome.
I do not think you should regard this as "we need to hack the test
to make it stable". I think you should regard this as "this is a
bug". A cache flush should not cause user-visible state changes.
In particular, the above analysis implies that you think a cache
flush is equivalent to end-of-transaction, which it absolutely
is not.
Also, now that I've looked at pgfdw_inval_callback, it scares
the heck out of me. Actually disconnecting a connection during
a cache inval callback seems quite unsafe --- what if that happens
while we're using the connection?
If the connection is still used in the transaction, pgfdw_inval_callback()
marks it as invalidated and doesn't close it. So I was not thinking that
this is so unsafe.
The disconnection code in pgfdw_inval_callback() was added in commit
e3ebcca843 to fix connection leak issue, and it's back-patched. If this
change is really unsafe, we need to revert it immediately at least from back
branches because the next minor release is scheduled soon.
I think we can remove disconnect_pg_server in pgfdw_inval_callback and
make entries only invalidated. Anyways, those connections can get
closed at the end of main txn in pgfdw_xact_callback. Thoughts?
But this revives the connection leak issue. So isn't it better to
to do that after we confirm that the current code is really unsafe?
If okay, I can make a patch for this.
BTW, even if we change pgfdw_inval_callback() so that it doesn't close
the connection at all, ISTM that the results of postgres_fdw_get_connections()
would not be stable because entry->invalidated would vary based on
whether CLOBBER_CACHE_ALWAYS is used or not.
Yes, after the above change (removing disconnect_pg_server in
pgfdw_inval_callback), our tests don't get stable because
postgres_fdw_get_connections shows the valid state of the connections.
I think we can change postgres_fdw_get_connections so that it only
shows the active connections server name but not valid state. Because,
the valid state is something dependent on the internal state change
and is not consistent with the user expectation but we are exposing it
to the user. Thoughts?
I don't think that's enough because even the following simple
queries return the different results, depending on whether
CLOBBER_CACHE_ALWAYS is used or not.
SELECT * FROM ft6; -- ft6 is the foreign table
SELECT server_name FROM postgres_fdw_get_connections();
When CLOBBER_CACHE_ALWAYS is used, postgres_fdw_get_connections()
returns no records because the connection is marked as invalidated,
and then closed at xact callback in SELECT query. Otherwise,
postgres_fdw_get_connections() returns at least one connection that
was established in the SELECT query.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION